from:"Alejandro Colomar"

Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar

Hi Jiri, Steven,

On Fri, Aug 16, 2024 at 08:55:47PM GMT, Alejandro Colomar wrote:
> > hi,
> > there are no args for x86.. it's there just to note that it might
> > be different on other archs, so not sure what man page should say
> > in such case.. keeping (void) is fine with me
> 
> Hmmm, then I'll remove that paragraph.  If that function is implemented
> in another arch and the args are different, we can change the manual
> page then.
> 
> > 
> > > 
> > > Please add the changes proposed below to your patch (tweak anything if
> > > you consider it appropriate) and send it as v10.
> > 
> > it looks good to me, thanks a lot
> > 
> > Acked-by: Jiri Olsa

I have applied your patch with the tweaks I mentioned, and added several
tags to the commit message.

It's currently here:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=977e3eecbb81b7398defc4e4f41810ca31d63c1b>

and will $soon be pushed to master.

Have a lovely night!
Alex


-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature

Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar

On Fri, Aug 16, 2024 at 07:03:59PM GMT, Jiri Olsa wrote:
> On Fri, Aug 16, 2024 at 01:42:26PM +0200, Alejandro Colomar wrote:
> > Hi Steven, Jiri,
> > 
> > On Wed, Aug 07, 2024 at 04:27:34PM GMT, Steven Rostedt wrote:
> > > Just in case nobody pinged you, the rest of the series is now in Linus's
> > > tree.
> > 
> > Thanks for the ping!
> > 
> > I have prepared some tweaks to the patch (see below).
> > Also, I have some doubts.  The prototype shows that it has no arguments
> > (void), but the text said that arguments, if any, are arch-specific.
> > Does any arch have arguments?  Should we use a variadic prototype (...)?
> 
> hi,
> there are no args for x86.. it's there just to note that it might
> be different on other archs, so not sure what man page should say
> in such case.. keeping (void) is fine with me

Hmmm, then I'll remove that paragraph.  If that function is implemented
in another arch and the args are different, we can change the manual
page then.
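
For illustration, here is a minimal sketch of what the (void) prototype
and the SIGILL rule mean in practice.  It assumes x86_64, and it assumes
that the syscall number is 335 when the headers do not define
__NR_uretprobe; the enum and the function name are made up for this
example.  The call happens in a forked child, since the page says it
should never be called directly:

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 335 is the x86_64 number; other architectures have none yet. */
#if defined __x86_64__ && !defined __NR_uretprobe
# define __NR_uretprobe 335
#endif

/* Hypothetical result codes for this sketch. */
enum probe_result {
	PROBE_SIGILL,		/* child was killed by SIGILL */
	PROBE_RETURNED,		/* syscall returned (e.g. failed with ENOSYS) */
	PROBE_OTHER,		/* fork/wait failure or another signal */
	PROBE_UNSUPPORTED	/* no syscall number known for this arch */
};

/* Call uretprobe() directly in a child and report what happened to it. */
enum probe_result call_uretprobe_directly(void)
{
#ifdef __NR_uretprobe
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return PROBE_OTHER;
	if (pid == 0) {
		syscall(__NR_uretprobe);	/* no arguments: int uretprobe(void); */
		_exit(0);			/* reached only if no signal arrived */
	}
	if (waitpid(pid, &status, 0) == -1)
		return PROBE_OTHER;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGILL)
		return PROBE_SIGILL;
	if (WIFEXITED(status))
		return PROBE_RETURNED;
	return PROBE_OTHER;
#else
	return PROBE_UNSUPPORTED;
#endif
}
```

On a kernel that has the syscall I would expect PROBE_SIGILL; on an
older kernel the call should simply fail and the child exits normally.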

> 
> > 
> > Please add the changes proposed below to your patch (tweak anything if
> > you consider it appropriate) and send it as v10.
> 
> it looks good to me, thanks a lot
> 
> Acked-by: Jiri Olsa

Thanks!

Have a lovely day!
Alex

> 
> jirka
> 
> > 
> > Have a lovely day!
> > Alex
> > 
> > 
> > diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
> > index cf1c2b0d8..51b566998 100644
> > --- i/man/man2/uretprobe.2
> > +++ w/man/man2/uretprobe.2
> > @@ -7,50 +7,43 @@ .SH NAME
> >  uretprobe \- execute pending return uprobes
> >  .SH SYNOPSIS
> >  .nf
> > -.B int uretprobe(void)
> > +.B int uretprobe(void);
> >  .fi
> >  .SH DESCRIPTION
> > -The
> >  .BR uretprobe ()
> > > -system call is an alternative to breakpoint instructions for triggering return
> > -uprobe consumers.
> > +is an alternative to breakpoint instructions
> > +for triggering return uprobe consumers.
> >  .P
> >  Calls to
> >  .BR uretprobe ()
> > > -system call are only made from the user-space trampoline provided by the kernel.
> > +are only made from the user-space trampoline provided by the kernel.
> >  Calls from any other place result in a
> >  .BR SIGILL .
> > -.SH RETURN VALUE
> > -The
> > +.P
> > +Details of the arguments (if any) passed to
> >  .BR uretprobe ()
> > -system call return value is architecture-specific.
> > +are architecture-specific.
> > +.SH RETURN VALUE
> > +The return value is architecture-specific.
> >  .SH ERRORS
> >  .TP
> >  .B SIGILL
> > -The
> >  .BR uretprobe ()
> > -system call was called by a user-space program.
> > +was called by a user-space program.
> >  .SH VERSIONS
> > -Details of the
> > -.BR uretprobe ()
> > -system call behavior vary across systems.
> > +The behavior varies across systems.
> >  .SH STANDARDS
> >  None.
> >  .SH HISTORY
> > -TBD
> > -.SH NOTES
> > -The
> > +Linux 6.11.
> > +.P
> >  .BR uretprobe ()
> > -system call was initially introduced for the x86_64 architecture
> > +was initially introduced for the x86_64 architecture
> >  where it was shown to be faster than breakpoint traps.
> >  It might be extended to other architectures.
> > -.P
> > -The
> > +.SH CAVEATS
> >  .BR uretprobe ()
> > -system call exists only to allow the invocation of return uprobe consumers.
> > +exists only to allow the invocation of return uprobe consumers.
> >  It should
> >  .B never
> >  be called directly.
> > -Details of the arguments (if any) passed to
> > -.BR uretprobe ()
> > -and the return value are architecture-specific.
> > 
> > -- 
> > <https://www.alejandro-colomar.es/>
> 

-- 
<https://www.alejandro-colomar.es/>



Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-08-16 Thread Alejandro Colomar

Hi Steven, Jiri,

On Wed, Aug 07, 2024 at 04:27:34PM GMT, Steven Rostedt wrote:
> Just in case nobody pinged you, the rest of the series is now in Linus's
> tree.

Thanks for the ping!

I have prepared some tweaks to the patch (see below).
Also, I have some doubts.  The prototype shows that it has no arguments
(void), but the text said that arguments, if any, are arch-specific.
Does any arch have arguments?  Should we use a variadic prototype (...)?

Please add the changes proposed below to your patch (tweak anything if
you consider it appropriate) and send it as v10.

Have a lovely day!
Alex


diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
index cf1c2b0d8..51b566998 100644
--- i/man/man2/uretprobe.2
+++ w/man/man2/uretprobe.2
@@ -7,50 +7,43 @@ .SH NAME
 uretprobe \- execute pending return uprobes
 .SH SYNOPSIS
 .nf
-.B int uretprobe(void)
+.B int uretprobe(void);
 .fi
 .SH DESCRIPTION
-The
 .BR uretprobe ()
-system call is an alternative to breakpoint instructions for triggering return
-uprobe consumers.
+is an alternative to breakpoint instructions
+for triggering return uprobe consumers.
 .P
 Calls to
 .BR uretprobe ()
-system call are only made from the user-space trampoline provided by the kernel.
+are only made from the user-space trampoline provided by the kernel.
 Calls from any other place result in a
 .BR SIGILL .
-.SH RETURN VALUE
-The
+.P
+Details of the arguments (if any) passed to
 .BR uretprobe ()
-system call return value is architecture-specific.
+are architecture-specific.
+.SH RETURN VALUE
+The return value is architecture-specific.
 .SH ERRORS
 .TP
 .B SIGILL
-The
 .BR uretprobe ()
-system call was called by a user-space program.
+was called by a user-space program.
 .SH VERSIONS
-Details of the
-.BR uretprobe ()
-system call behavior vary across systems.
+The behavior varies across systems.
 .SH STANDARDS
 None.
 .SH HISTORY
-TBD
-.SH NOTES
-The
+Linux 6.11.
+.P
 .BR uretprobe ()
-system call was initially introduced for the x86_64 architecture
+was initially introduced for the x86_64 architecture
 where it was shown to be faster than breakpoint traps.
 It might be extended to other architectures.
-.P
-The
+.SH CAVEATS
 .BR uretprobe ()
-system call exists only to allow the invocation of return uprobe consumers.
+exists only to allow the invocation of return uprobe consumers.
 It should
 .B never
 be called directly.
-Details of the arguments (if any) passed to
-.BR uretprobe ()
-and the return value are architecture-specific.

-- 




Re: [PATCHv8 9/9] man2: Add uretprobe syscall page

2024-06-11 Thread Alejandro Colomar

Hi,

On Tue, Jun 11, 2024 at 11:30:22PM GMT, Masami Hiramatsu wrote:
> On Tue, 11 Jun 2024 13:21:58 +0200
> Jiri Olsa  wrote:
> 
> > Adding man page for new uretprobe syscall.
> > 
> > Acked-by: Andrii Nakryiko 
> > Reviewed-by: Alejandro Colomar 
> > Signed-off-by: Jiri Olsa 
> 
> This looks good to me.
> 
> Reviewed-by: Masami Hiramatsu (Google) 
> 
> And this needs to be picked by linux-man@ project.

Yup; please ping me when the rest is merged and I should pick it.

Have a lovely day!
Alex

> 
> Thank you,
> 
> > ---
> >  man/man2/uretprobe.2 | 56 
> >  1 file changed, 56 insertions(+)
> >  create mode 100644 man/man2/uretprobe.2
> > 
> > diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> > new file mode 100644
> > index ..cf1c2b0d852e
> > --- /dev/null
> > +++ b/man/man2/uretprobe.2
> > @@ -0,0 +1,56 @@
> > +.\" Copyright (C) 2024, Jiri Olsa 
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uretprobe \- execute pending return uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uretprobe(void)
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR uretprobe ()
> > > +system call is an alternative to breakpoint instructions for triggering return
> > +uprobe consumers.
> > +.P
> > +Calls to
> > +.BR uretprobe ()
> > > +system call are only made from the user-space trampoline provided by the kernel.
> > +Calls from any other place result in a
> > +.BR SIGILL .
> > +.SH RETURN VALUE
> > +The
> > +.BR uretprobe ()
> > +system call return value is architecture-specific.
> > +.SH ERRORS
> > +.TP
> > +.B SIGILL
> > +The
> > +.BR uretprobe ()
> > +system call was called by a user-space program.
> > +.SH VERSIONS
> > +Details of the
> > +.BR uretprobe ()
> > +system call behavior vary across systems.
> > +.SH STANDARDS
> > +None.
> > +.SH HISTORY
> > +TBD
> > +.SH NOTES
> > +The
> > +.BR uretprobe ()
> > +system call was initially introduced for the x86_64 architecture
> > +where it was shown to be faster than breakpoint traps.
> > +It might be extended to other architectures.
> > +.P
> > +The
> > +.BR uretprobe ()
> > +system call exists only to allow the invocation of return uprobe consumers.
> > +It should
> > +.B never
> > +be called directly.
> > +Details of the arguments (if any) passed to
> > +.BR uretprobe ()
> > +and the return value are architecture-specific.
> > -- 
> > 2.45.1
> > 
> 
> 
> -- 
> Masami Hiramatsu (Google) 
> 

-- 
<https://www.alejandro-colomar.es/>



Re: [PATCHv6 9/9] man2: Add uretprobe syscall page

2024-05-22 Thread Alejandro Colomar

Hi Jirka,

On Wed, May 22, 2024 at 09:54:58AM GMT, Jiri Olsa wrote:
> ok, thanks
> 
> jirka
> 
> 
> ---
> diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> new file mode 100644
> index ..5b5f340b59b6
> --- /dev/null
> +++ b/man/man2/uretprobe.2
> @@ -0,0 +1,56 @@
> +.\" Copyright (C) 2024, Jiri Olsa 
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +system call is an alternative to breakpoint instructions for triggering return
> +uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +system call are only made from the user-space trampoline provided by the kernel.
> +Calls from any other place result in a
> +.BR SIGILL .
> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +system call return value is architecture-specific.
> +.SH ERRORS
> +.TP
> +.B SIGILL
> +The
> +.BR uretprobe ()
> +system call was called by user.

Maybe 'a user-space program'?
Anyway, LGTM.  Thanks!

Reviewed-by: Alejandro Colomar 

Have a lovely day!
Alex

> +.SH VERSIONS
> +Details of the
> +.BR uretprobe ()
> +system call behavior vary across systems.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +system call was initially introduced for the x86_64 architecture
> +where it was shown to be faster than breakpoint traps.
> +It might be extended to other architectures.
> +.P
> +The
> +.BR uretprobe ()
> +system call exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.

-- 
<https://www.alejandro-colomar.es/>



Re: [PATCHv6 9/9] man2: Add uretprobe syscall page

2024-05-21 Thread Alejandro Colomar

Hi Jirka,

On Tue, May 21, 2024 at 10:24:30PM GMT, Jiri Olsa wrote:
> how about the change below?

Much better.  I still have a few comments below.  :-)

> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> new file mode 100644
> index ..959b7a47102b
> --- /dev/null
> +++ b/man/man2/uretprobe.2
> @@ -0,0 +1,55 @@
> +.\" Copyright (C) 2024, Jiri Olsa 
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +system call is an alternative to breakpoint instructions for triggering return
> +uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +system call are only made from the user-space trampoline provided by the kernel.
> +Calls from any other place result in a
> +.BR SIGILL .
> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +system call return value is architecture-specific.
> +.SH ERRORS
> +.BR SIGILL

This should be a tagged paragraph, preceded by '.TP'.  See any manual
page with an ERRORS section for an example.

Also, BR is Bold alternating with Roman, but this is just bold, so it
should use '.B'.

.TP
.B SIGILL
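
A rough sketch of how one could check for this mechanically, in case
it helps.  The function names are invented for this example, and the
argument counting is naive (it does not handle quoted arguments):

```c
#include <ctype.h>
#include <string.h>

/* Count whitespace-separated arguments after the macro name. */
int count_macro_args(const char *line)
{
	int n = 0;
	const char *p = line;

	while (*p && !isspace((unsigned char)*p))	/* skip the macro name */
		p++;
	while (*p) {
		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		n++;
		while (*p && !isspace((unsigned char)*p))
			p++;
	}
	return n;
}

/* Return 1 if the line is '.BR' with a single argument, where '.B' suffices. */
int br_should_be_b(const char *line)
{
	return strncmp(line, ".BR", 3) == 0
	    && (line[3] == '\0' || isspace((unsigned char)line[3]))
	    && count_macro_args(line) == 1;
}
```

With this, '.BR SIGILL' is flagged, while '.BR uretprobe ()' and
'.BR SIGILL .' are not, since they really alternate bold and roman.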

> +The
> +.BR uretprobe ()
> +system call was called by user.
> +.SH VERSIONS
> +Details of the
> +.BR uretprobe ()
> +system call behavior vary across systems.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +system call was initially introduced for the x86_64 architecture where it was shown

We have a strong-ish limit at column 80.  Please break after
'architecture', which is a clause boundary.
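
If you want to check the margin before sending, something like this
small sketch would do.  The macro and function names are made up here,
and it counts bytes rather than display columns, which is only the same
thing for ASCII source:

```c
#include <string.h>

#define RIGHT_MARGIN 80	/* the strong-ish limit mentioned above */

/* Return 1 if the source line (without its newline) is past the margin. */
int exceeds_right_margin(const char *line)
{
	size_t len = strcspn(line, "\n");	/* ignore a trailing newline */

	return len > RIGHT_MARGIN;
}
```

The line in question is 83 bytes long without the leading '+' of the
diff, so it is flagged; broken after 'architecture' it is 64.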

Have a lovely night!
Alex

> +to be faster than breakpoint traps.
> +It might be extended to other architectures.
> +.P
> +The
> +.BR uretprobe ()
> +system call exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.
> 

-- 




Re: [PATCHv6 9/9] man2: Add uretprobe syscall page

2024-05-21 Thread Alejandro Colomar

Hi Jiri,

On Tue, May 21, 2024 at 12:48:25PM GMT, Jiri Olsa wrote:
> Adding man page for new uretprobe syscall.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  man2/uretprobe.2 | 50 
>  1 file changed, 50 insertions(+)
>  create mode 100644 man2/uretprobe.2
> 
> diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> new file mode 100644
> index ..690fe3b1a44f
> --- /dev/null
> +++ b/man2/uretprobe.2
> @@ -0,0 +1,50 @@
> +.\" Copyright (C) 2024, Jiri Olsa 
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi

What header file provides this system call?

> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +syscall is an alternative to breakpoint instructions for
> +triggering return uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +suscall are only made from the user-space trampoline provided by the kernel.

s/suscall/system call/

> +Calls from any other place result in a
> +.BR SIGILL .

Maybe add an ERRORS section?

> +

We don't use blank lines; it causes a groff(1) warning, and other
problems.  Instead, use '.P'.

> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +syscall return value is architecture-specific.
> +

.P

> +.SH VERSIONS
> +This syscall is not specified in POSIX,

Redundant with "STANDARDS: None.".

> +and details of its behavior vary across systems.

Keep this.

> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +syscall was initially introduced for the x86_64 architecture where it was shown
> +to be faster than breakpoint traps. It might be extended to other architectures.

Please use semantic newlines.

$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
   Use semantic newlines
 In the source of a manual page, new sentences should be started on
 new lines, long sentences should be split  into  lines  at  clause
 breaks  (commas,  semicolons, colons, and so on), and long clauses
 should be split at phrase boundaries.  This convention,  sometimes
 known as "semantic newlines", makes it easier to see the effect of
 patches, which often operate at the level of individual sentences,
 clauses, or phrases.
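
As a rough aid, the first part of that rule (new sentences start on new
lines) can be sketched as a heuristic check.  The function name is
invented for this example, and abbreviations followed by a capitalized
word would be false positives:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return a pointer to the first place where a new sentence starts in the
 * middle of a line: '.', '!' or '?', then one or more spaces, then an
 * uppercase letter.  Return NULL if there is no such place.
 */
const char *mid_line_sentence(const char *line)
{
	for (const char *p = line; *p; p++) {
		const char *q;

		if (*p != '.' && *p != '!' && *p != '?')
			continue;
		q = p + 1;
		while (*q == ' ')
			q++;
		if (q > p + 1 && isupper((unsigned char)*q))
			return q;
	}
	return NULL;
}
```

For the line quoted above, it points at "It might be extended", which is
where the source line should have been broken.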

> +.P
> +The
> +.BR uretprobe ()
> +syscall exists only to allow the invocation of return uprobe consumers.

s/syscall/system call/

> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.
> -- 
> 2.44.0

Have a lovely day!
Alex

-- 




Re: [PATCHv4 7/7] man2: Add uretprobe syscall page

2024-05-02 Thread Alejandro Colomar

Hi Jiri,

On Thu, May 02, 2024 at 10:13:12PM +0200, Jiri Olsa wrote:
> > You could add a HISTORY section.
> 
> ok, IIUC for this syscall it should contain just kernel version where
> it got merged, right?

Yep.

> 
> > 
> > Have a lovely day!
> 
> thanks for review,
> jirka

Thanks for the page.

Have a lovely night!
Alex

-- 

A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.


Re: [PATCHv4 7/7] man2: Add uretprobe syscall page

2024-05-02 Thread Alejandro Colomar

Hi Jiri,

On Thu, May 02, 2024 at 02:23:13PM +0200, Jiri Olsa wrote:
> Adding man page for new uretprobe syscall.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  man2/uretprobe.2 | 45 +
>  1 file changed, 45 insertions(+)
>  create mode 100644 man2/uretprobe.2
> 
> diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> new file mode 100644
> index ..08fe6a670430
> --- /dev/null
> +++ b/man2/uretprobe.2
> @@ -0,0 +1,45 @@
> +.\" Copyright (C) 2024, Jiri Olsa 
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +Kernel is using
> +.BR uretprobe()
> +syscall to trigger uprobe return probe consumers instead of using
> +standard breakpoint instruction.
> +

Please use .P instead of a blank.  See man-pages(7):

   Formatting conventions (general)
 Paragraphs should be separated by suitable markers (usually either
 .P or .IP).  Do not separate paragraphs using blank lines, as this
 results in poor rendering in some output formats  (such  as  Post‐
 Script and PDF).

> +The uretprobe syscall is not supposed to be called directly by user, it's allowed

s/by user/by the user/

> +to be invoked only through user space trampoline provided by kernel.

s/user space/user-space/

Missing a few 'the' too, here and in the rest of the page.

> +When called from outside of this trampoline, the calling process will receive
> +.BR SIGILL .
> +
> +.SH RETURN VALUE
> +.BR uretprobe()

You're missing a space here:

.BR uretprobe ()

> +return value is specific for given architecture.
> +
> +.SH VERSIONS
> +This syscall is not specified in POSIX,
> +and details of its behavior vary across systems.
> +.SH STANDARDS
> +None.

You could add a HISTORY section.

Have a lovely day!
Alex

> +.SH NOTES
> +.BR uretprobe()
> +syscall is initially introduced on x86-64 architecture, because doing syscall
> +is faster than doing breakpoint trap on it. It might be extended to other
> +architectures.
> +
> +.BR uretprobe()
> +syscall exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are specific for given architecture.
> -- 
> 2.44.0
> 
> 

-- 

A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.



Re: [PATCH] set_thread_area.2: Add C-SKY document

2023-10-15 Thread Alejandro Colomar

Hi Guo,

On Sun, Oct 15, 2023 at 11:07:32AM -0400, guo...@kernel.org wrote:
> From: Guo Ren 
> 
> C-SKY only needs set_thread_area, no need for get_thread_area, the
> same as MIPS.
> 
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> ---

Patch applied.


Thanks!
Alex

>  man2/set_thread_area.2 | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/man2/set_thread_area.2 b/man2/set_thread_area.2
> index 02f65e0418f2..c43a92eb447a 100644
> --- a/man2/set_thread_area.2
> +++ b/man2/set_thread_area.2
> @@ -26,7 +26,7 @@ Standard C library
>  .B "int syscall(SYS_get_thread_area);"
>  .BI "int syscall(SYS_set_thread_area, unsigned long " tp );
>  .PP
> -.B #elif defined __mips__
> +.B #elif defined(__mips__ || defined __csky__)

I removed the parentheses here.
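
The reason: the preprocessor's 'defined' operator takes a single macro
name, with or without parentheses, so 'defined(__mips__ || defined
__csky__)' does not parse.  A small sketch of the working form follows;
the function name is made up, and the strings are the prototypes as
shown in the SYNOPSIS:

```c
/* Return the set_thread_area() prototype documented for the build target. */
const char *set_thread_area_prototype(void)
{
#if defined __i386__ || defined __x86_64__
	return "int set_thread_area(struct user_desc *u_info);";
#elif defined __m68k__
	return "int set_thread_area(unsigned long tp);";
#elif defined __mips__ || defined __csky__	/* one 'defined' per macro */
	return "int set_thread_area(unsigned long addr);";
#else
	return "(not available on this architecture)";
#endif
}
```

'defined(__mips__) || defined(__csky__)' would work just as well; what
matters is that each 'defined' names exactly one macro.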

>  .PP
>  .BI "int syscall(SYS_set_thread_area, unsigned long " addr );
>  .PP
> @@ -42,17 +42,17 @@ These calls provide architecture-specific support for a thread-local storage
>  implementation.
>  At the moment,
>  .BR set_thread_area ()
> -is available on m68k, MIPS, and x86 (both 32-bit and 64-bit variants);
> +is available on m68k, MIPS, C-SKY, and x86 (both 32-bit and 64-bit variants);
>  .BR get_thread_area ()
>  is available on m68k and x86.
>  .PP
> -On m68k and MIPS,
> +On m68k, MIPS and C-SKY,
>  .BR set_thread_area ()
>  allows storing an arbitrary pointer (provided in the
>  .B tp
>  argument on m68k and in the
>  .B addr
> -argument on MIPS)
> +argument on MIPS and C-SKY)
>  in the kernel data structure associated with the calling thread;
>  this pointer can later be retrieved using
>  .BR get_thread_area ()
> @@ -139,7 +139,7 @@ return 0 on success, and \-1 on failure, with
>  .I errno
>  set to indicate the error.
>  .PP
> -On MIPS and m68k,
> +On C-SKY, MIPS and m68k,
>  .BR set_thread_area ()
>  always returns 0.
>  On m68k,
> -- 
> 2.36.1
> 

-- 




Re: set_thread_area.2: csky architecture undocumented

2023-10-14 Thread Alejandro Colomar

Hi Guo,

On Tue, Nov 24, 2020 at 08:07:07PM +0800, Guo Ren wrote:

Huh, 3 years already!  I've had this in my head for all this time; just
didn't find the energy to act on it.

> Thx Michael & Alejandro,
> 
> Yes, the man page has no csky's.

I've applied a patch to add initial documentation for it:
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=a63979eb24aaf73f4be5799cc9579f84a3874b7d>

> 
> C-SKY has abiv1 and abiv2.
> For abiv1: There is no register for saving the tls, so we use trap 3 to get
> the tls and use set_thread_area to init ti->tp_value.
> For abiv2: r31 is the tls register. We can read r31 directly to
> get the tls and use set_thread_area to init the reg->tls value.
> 
> In glibc:
> # ifdef __CSKYABIV2__
> /* Define r31 as thread pointer register.  */
> #  define READ_THREAD_POINTER() \
>         mov r0, r31;
> # else
> #  define READ_THREAD_POINTER() \
>         trap 3;
> # endif
> 
> /* Code to initially initialize the thread pointer.  This might need
>    special attention since 'errno' is not yet available and if the
>    operation can cause a failure 'errno' must not be touched.  */
> # define TLS_INIT_TP(tcbp) \
>   ({ INTERNAL_SYSCALL_DECL (err);                               \
>      long result_var;                                           \
>      result_var = INTERNAL_SYSCALL (set_thread_area, err, 1,    \
>                     (char *) (tcbp) + TLS_TCB_OFFSET);          \
>      INTERNAL_SYSCALL_ERROR_P (result_var, err)                 \
>        ? "unknown error" : NULL; })
> 
> In kernel:
> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
> {
>         struct thread_info *ti = task_thread_info(current);
>         struct pt_regs *reg = current_pt_regs();
> 
>         reg->tls = addr;
>         ti->tp_value = addr;
> 
>         return 0;
> }
> 
> Any comments are welcome :)

I'm sorry, but I have little understanding of this syscall, and that
sounds like gibberish to me :)

Feel free to send a patch to improve the documentation for csky.

Cheers,
Alex

> 
> 
> On Tue, Nov 24, 2020 at 5:51 PM Michael Kerrisk (man-pages)
>  wrote:
> >
> > Hi Alex,
> >
> > On 11/23/20 10:31 PM, Alejandro Colomar (man-pages) wrote:
> > > Hi Michael,
> > >
> > > SYNOPSIS
> > >#include 
> > >
> > >#if defined __i386__ || defined __x86_64__
> > ># include 
> > >
> > >int get_thread_area(struct user_desc *u_info);
> > >int set_thread_area(struct user_desc *u_info);
> > >
> > >#elif defined __m68k__
> > >
> > >int get_thread_area(void);
> > >int set_thread_area(unsigned long tp);
> > >
> > >#elif defined __mips__
> > >
> > >int set_thread_area(unsigned long addr);
> > >
> > >#endif
> > >
> > >Note: There are no glibc wrappers for these system  calls;  see
> > >NOTES.
> > >
> > >
> > > $ grep -rn 'SYSCALL_DEFINE.*et_thread_area'
> > > arch/csky/kernel/syscall.c:6:
> > > SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
> > > arch/mips/kernel/syscall.c:86:
> > > SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
> > > arch/x86/kernel/tls.c:191:
> > > SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, u_info)
> > > arch/x86/kernel/tls.c:243:
> > > SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, u_info)
> > > arch/x86/um/tls_32.c:277:
> > > SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, user_desc)
> > > arch/x86/um/tls_32.c:325:
> > > SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, user_desc)
> > >
> > >
> > > See kernel commit 4859bfca11c7d63d55175bcd85a75d6cee4b7184
> > >
> > >
> > > I'd change
> > > -  #elif defined __mips__
> > > > +  #elif defined __mips__ || defined __csky__
> > >
> > > and then change the rest of the text to add csky when appropriate.
> > > Am I correct?
> >
> > AFAICT, you are correct. I think the reason that csky is missing is
> > that the architecture was added after this manual page was added.
> >
> > Thanks,
> >
> > Michael
> >
> >
> > --
> > Michael Kerrisk
> > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> > Linux/UNIX System Programming Training: http://man7.org/training/
> 
> 
> 
> --
> Best Regards
>  Guo Ren
> 
> ML: https://lore.kernel.org/linux-csky/

-- 
<https://www.alejandro-colomar.es/>



Re: [PATCH v5 0/4] man2: udpate mm/userfaultfd manpages to latest

2021-04-01 Thread Alejandro Colomar (man-pages)


Hi Peter,

On 3/30/21 12:18 AM, Peter Xu wrote:

v5:
- add r-bs for Mike R.
- Fix spelling mistake "diable" [Mike R.]
- s/Starting from/Since/ for patch 2 (also replaced two existing ones in the
   same file) [Alex]
- s/un-write-protect/write-unprotect/ [Alex]
- s/The process was interrupted and need to retry/The process was interrupted;
   retry this call/ in the last patch. [Alex]

v4:
- Fixed a few "subordinate clauses" (SC) cases [Alex]
- Reword in ioctl_userfaultfd.2 to use bold font for the two modes referenced,
   so as to be clear on what is "both" referring to [Alex]

v3:
- Don't use "Currently", instead add "(since x.y)" mark where proper [Alex]
- Always use semantic newlines across the whole patchset [Alex]
- Use quote when possible, rather than escapes [Alex]
- Fix one missing replacement of ".BR" -> ".B" [Alex]
- Some other trivial rephrases here and there when fixing up above

v2 changes:
- Fix wordings as suggested [MikeR]
- convert ".BR" to ".B" where proper for the patchset [Alex]
- rearrange a few lines in the last two patches where they got messed up
- document more things, e.g. UFFDIO_COPY_MODE_WP; and also on how to resolve a
   wr-protect page fault.

There're two features missing in current manpage, namely:

   (1) Userfaultfd Thread-ID feature
   (2) Userfaultfd write protect mode

There's also a 3rd one which was just contributed from Axel - Axel, I think it
would be great if you can add that part too, probably after the whole
hugetlbfs/shmem minor mode reaches the linux master branch.

Please review, thanks.

Peter Xu (4):
   userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
   userfaultfd.2: Add write-protect mode
   ioctl_userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs
   ioctl_userfaultfd.2: Add write-protect mode docs


I applied all 4 patches (with a few minor cosmetic fixes to 1/4 and 4/4,
some of them about the 80-col right margin):
<https://github.com/alejandro-colomar/man-pages/tree/eb8f2001d493d458d08b9b87605ed2ac453c7f5f>


Thanks!

Alex



  man2/ioctl_userfaultfd.2 |  89 +++-
  man2/userfaultfd.2   | 121 +++++--
  2 files changed, 203 insertions(+), 7 deletions(-)




--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v4 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-29 Thread Alejandro Colomar (man-pages)

Hi Peter,

On 3/29/21 11:51 PM, Peter Xu wrote:
> On Thu, Mar 25, 2021 at 10:32:20PM +0100, Alejandro Colomar (man-pages) wrote:
>>>>> +.TP
>>>>> +.B ENOENT
>>>>> +The range specified in
>>>>> +.I range
>>>>> +is not valid.
>>>>
>>>> I'm not sure how this is different from the wording above in EINVAL.  An
>>>> "otherwise invalid range" was already giving EINVAL?
>>>
>>> This can be returned when vma is not found (mwriteprotect_range()):
>>>
>>>         err = -ENOENT;
>>>         dst_vma = find_dst_vma(dst_mm, start, len);
>>>
>>>         if (!dst_vma)
>>>                 goto out_unlock;
>>>
>>> I think maybe I could simply remove this entry, because from an user app
>>> developer pov I'd only be interested in specific error that I'd be able to
>>> detect and (even better) recover from.  For such error I'd say there's not much
>>> to do besides failing the app.
>>
>> If there's any possibility that the error can happen, it should be
>> documented, even if it's to say "Fatal error; abort!".  Just try to explain
>> the causes and how to avoid causing them and/or possibly what to do when
>> they happen (abort?).
> 
> Okay.  Would you mind me keeping my original wording?  Because IMHO that
> exactly does what you said as "trying to explain the causes" and so on:
> 
> .B ENOENT
> The range specified in
> .I range
> is not valid.
> For example, the virtual address does not exist,
> or not registered with userfaultfd write-protect mode.
> 
> It's indeed slightly duplicated with EINVAL, but if you don't agree with the
> wording meanwhile if you don't agree on overlapping of the errors, then what I
> need is not reworking this patchset, but proposing a kernel patch to change the
> error retval to make them match. I am not against proposing a kernel patch, but
> I just don't see it extremely necessary.
> 
> For my own experience on working with the kernel, the return value sometimes is
> not that strict - say, it's hard to control every single bit of the possible
> return code of a syscall/ioctl to reflect everything matching the document.  We
> should always try to do it accurate but it seems not easy to me.  It's also
> hard to write up the document that 100% matching the kernel code, because at
> least that'll require a full-path workthrough of every single piece of kernel
> code that the syscall/ioctl has called, so as to collect all the errors, then
> summarize their meanings.  That could be a lot of work.

Yes, that's fine.  I was only curious about the overlap, but if they do
overlap, that's it.

>>>>> +For example, the virtual address does not exist,
>>>>> +or not registered with userfaultfd write-protect mode.
>>>>> +.TP
>>>>> +.B EFAULT
>>>>> +Encountered a generic fault during processing.
>>>>
>>>> What is a "generic fault"?
>>>
>>> For example when the user copy failed due to some reason.  See
>>> userfaultfd_writeprotect():
>>>
>>>         if (copy_from_user(&uffdio_wp, user_uffdio_wp,
>>>                            sizeof(struct uffdio_writeprotect)))
>>>                 return -EFAULT;
>>>
>>> But I didn't check other places, generally I'd return -EFAULT if I can't
>>> find a proper other replacement which has a clearer meaning.
>>>
>>> I don't think this is really helpful to user app too because no user app
>>> would start to read this -EFAULT to do anything useful.. how about I drop
>>> it too if you think the description is confusing?
>>
>> Same as above.
> 
> Above copy_from_user() is the only place that could trigger -EFAULT so far I
> can find.  So either I can change above into:
> 
> .TP
> .B EFAULT
> Failure on copying ioctl parameters into the kernel.
> 
> Would you think it okay (before I repost)?  I'd still prefer my original
> wording because I bet 90% user developer may not even know what does it mean
> when the kernel cannot copy the user parameter, and what he/she can do with
> it..  However if you think it's proper I'll use it.

Okay, I'll take your original words.  Maybe all this "extra" info could
go into the commit message.  I'll wait for your resend with the a-b and
the minor changes :-)

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] mremap.2: MREMAP_DONTUNMAP to reflect to supported mappings

2021-03-25 Thread Alejandro Colomar (man-pages)


Hello Brian,

Is this already merged in Linux?  I guess not, as I've seen a patch of 
yours for the kernel, right?


Thanks,

Alex

On 3/23/21 7:25 PM, Brian Geffon wrote:

mremap(2) now supports MREMAP_DONTUNMAP with mapping types other
than private anonymous.

Signed-off-by: Brian Geffon 
---
  man2/mremap.2 | 13 ++---
  1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/man2/mremap.2 b/man2/mremap.2
index 3ed0c0c0a..72acbc111 100644
--- a/man2/mremap.2
+++ b/man2/mremap.2
@@ -118,16 +118,6 @@ This flag, which must be used in conjunction with
  remaps a mapping to a new address but does not unmap the mapping at
  .IR old_address .
  .IP
-The
-.B MREMAP_DONTUNMAP
-flag can be used only with private anonymous mappings
-(see the description of
-.BR MAP_PRIVATE
-and
-.BR MAP_ANONYMOUS
-in
-.BR mmap (2)).
-.IP
  After completion,
  any access to the range specified by
  .IR old_address
@@ -227,7 +217,8 @@ was specified, but one or more pages in the range specified by
  .IR old_address
  and
  .IR old_size
-were not private anonymous;
+were part of a special mapping or the mapping is one that
+does not support merging or expanding;
  .IP *
  .B MREMAP_DONTUNMAP
  was specified and



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v4 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-25 Thread Alejandro Colomar (man-pages)


Hi Peter,

On 3/23/21 8:16 PM, Peter Xu wrote:

On Tue, Mar 23, 2021 at 07:11:04PM +0100, Alejandro Colomar (man-pages) wrote:

+.TP
+.B UFFDIO_COPY_MODE_WP
+Copy the page with read-only permission.
+This allows the user to trap the next write to the page,
+which will block and generate another write-protect userfault message.


s/write-protect/write-protected/
?


I think here "write-protect" is the wording I wanted to use, it is the name of
the type of the message in plain text.


Okay.



[...]


+.B EAGAIN
+The process was interrupted and need to retry.


Maybe: "The process was interrupted; retry this call."?
I don't know what other pages say about this kind of error.


Frankly I see no difference between the two..  If you prefer the latter, I can
switch.


I understand yours, but technically it's a bit incorrect:  The subject 
of the sentence changes: in "The process was interrupted" it's the 
process, and in "need to retry" it's [you].  By separating the sentence 
into two, it's more natural. :)
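
For illustration only, "retry this call" could look roughly like the sketch
below from the caller's side.  The names retry_on_eagain(), attempt_fn and
fake_ioctl() are hypothetical and appear nowhere in the patch; fake_ioctl()
merely stands in for the real ioctl(2) so that the loop can be exercised
without a userfaultfd file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: one attempt, returns 0 or -1 with errno set. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Call op(ctx) until it succeeds, fails with an error other than
 * EAGAIN, or max_tries attempts have been made.
 */
static int retry_on_eagain(attempt_fn op, void *ctx, int max_tries)
{
    int ret = -1;

    assert(op != NULL && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        ret = op(ctx);
        if (ret == 0 || errno != EAGAIN)
            break;
    }
    return ret;
}

/* Stand-in for the ioctl: fails with EAGAIN until *remaining hits 0. */
static int fake_ioctl(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

A real caller would pass a callback that wraps the actual ioctl(2) and would
probably also bound the number of retries, as done here.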







+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.


I'm not sure how this is different from the wording above in EINVAL.  An
"otherwise invalid range" was already giving EINVAL?


This can be returned when vma is not found (mwriteprotect_range()):

err = -ENOENT;
dst_vma = find_dst_vma(dst_mm, start, len);

if (!dst_vma)
        goto out_unlock;

I think maybe I could simply remove this entry, because from an user app
developer pov I'd only be interested in specific error that I'd be able to
detect and (even better) recover from.  For such error I'd say there's not much
to do besides failing the app.


If there's any possibility that the error can happen, it should be 
documented, even if it's to say "Fatal error; abort!".  Just try to 
explain the causes and how to avoid causing them and/or possibly what to 
do when they happen (abort?).







+For example, the virtual address does not exist,
+or not registered with userfaultfd write-protect mode.
+.TP
+.B EFAULT
+Encountered a generic fault during processing.


What is a "generic fault"?


For example when the user copy failed due to some reason.  See
userfaultfd_writeprotect():

if (copy_from_user(&uffdio_wp, user_uffdio_wp,
                   sizeof(struct uffdio_writeprotect)))
        return -EFAULT;

But I didn't check other places, generally I'd return -EFAULT if I can't find a
proper other replacement which has a clearer meaning.

I don't think this is really helpful to user app too because no user app would
start to read this -EFAULT to do anything useful.. how about I drop it too if
you think the description is confusing?


Same as above.

Thanks,

Alex


--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v4 2/4] userfaultfd.2: Add write-protect mode

2021-03-23 Thread Alejandro Colomar (man-pages)

he flag
+.B UFFDIO_WRITEPROTECT_MODE_WP
+cleared upon the faulted page or range.
+.PP
+Write-protect mode only supports private anonymous memory.
  .SS Reading from the userfaultfd structure
  Each
  .BR read (2)
@@ -364,8 +460,12 @@ flag (see
  .BR ioctl_userfaultfd (2))
  and this flag is set, this a write fault;
  otherwise it is a read fault.
-.\"
-.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
+.TP
+.B UFFD_PAGEFAULT_FLAG_WP
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_WP
+flag, when this bit is set it means it's a write-protect fault.
+Otherwise it's a page missing fault.
  .RE
  .TP
  .I pagefault.feat.pid




--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v4 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-23 Thread Alejandro Colomar (man-pages)


Hi Peter,

Please see a few comments below.

Thanks,

Alex

On 3/22/21 11:08 PM, Peter Xu wrote:

Userfaultfd write-protect mode is supported starting from Linux 5.7.

Signed-off-by: Peter Xu 
---
  man2/ioctl_userfaultfd.2 | 84 ++--
  1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index d4a8375b8..5419687a6 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -234,6 +234,11 @@ operation is supported.
  The
  .B UFFDIO_UNREGISTER
  operation is supported.
+.TP
+.B 1 << _UFFDIO_WRITEPROTECT
+The
+.B UFFDIO_WRITEPROTECT
+operation is supported.
  .PP
  This
  .BR ioctl (2)
@@ -322,9 +327,6 @@ Track page faults on missing pages.
  .B UFFDIO_REGISTER_MODE_WP
  Track page faults on write-protected pages.
  .PP
-Currently, the only supported mode is
-.BR UFFDIO_REGISTER_MODE_MISSING .
-.PP
  If the operation is successful, the kernel modifies the
  .I ioctls
  bit-mask field to indicate which
@@ -443,6 +445,16 @@ operation:
  .TP
  .B UFFDIO_COPY_MODE_DONTWAKE
  Do not wake up the thread that waits for page-fault resolution
+.TP
+.B UFFDIO_COPY_MODE_WP
+Copy the page with read-only permission.
+This allows the user to trap the next write to the page,
+which will block and generate another write-protect userfault message.


s/write-protect/write-protected/
?


+This is only used when both
+.B UFFDIO_REGISTER_MODE_MISSING
+and
+.B UFFDIO_REGISTER_MODE_WP
+modes are enabled for the registered range.
  .PP
  The
  .I copy
@@ -654,6 +666,72 @@ field of the
  structure was not a multiple of the system page size; or
  .I len
  was zero; or the specified range was otherwise invalid.
+.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
+Write-protect or write-unprotect an userfaultfd registered memory range
+registered with mode
+.BR UFFDIO_REGISTER_MODE_WP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_writeprotect
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_writeprotect {
+struct uffdio_range range;  /* Range to change write permission */
+__u64 mode; /* Mode to change write permission */
+};
+.EE
+.in
+There're two mode bits that are supported in this structure:
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_WP
+When this mode bit is set, the ioctl will be a write-protect operation upon the
+memory range specified by
+.IR range .
+Otherwise it'll be a write-unprotect operation upon the specified range,
+which can be used to resolve an userfaultfd write-protect page fault.
+.TP
+.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after the operation.
+This could only be specified if
+.B UFFDIO_WRITEPROTECT_MODE_WP
+is not specified.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I uffdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid.
+.TP
+.B EAGAIN
+The process was interrupted and need to retry.


Maybe: "The process was interrupted; retry this call."?
I don't know what other pages say about this kind of error.


+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.


I'm not sure how this is different from the wording above in EINVAL.  An 
"otherwise invalid range" was already giving EINVAL?



+For example, the virtual address does not exist,
+or not registered with userfaultfd write-protect mode.
+.TP
+.B EFAULT
+Encountered a generic fault during processing.


What is a "generic fault"?


  .SH RETURN VALUE
  See descriptions of the individual operations, above.
  .SH ERRORS




--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v3 4/4] ioctl_userfaultfd.2: Add write-protect mode docs

2021-03-19 Thread Alejandro Colomar (man-pages)

Hi Peter,

A few more comments below.

Thanks,

Alex

On 3/10/21 11:23 PM, Peter Xu wrote:
> Userfaultfd write-protect mode is supported starting from Linux 5.7.
> 
> Signed-off-by: Peter Xu 
> ---
>  man2/ioctl_userfaultfd.2 | 81 ++--
>  1 file changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
> index d4a8375b8..d8380896a 100644
> --- a/man2/ioctl_userfaultfd.2
> +++ b/man2/ioctl_userfaultfd.2
> @@ -234,6 +234,11 @@ operation is supported.
>  The
>  .B UFFDIO_UNREGISTER
>  operation is supported.
> +.TP
> +.B 1 << _UFFDIO_WRITEPROTECT
> +The
> +.B UFFDIO_WRITEPROTECT
> +operation is supported.
>  .PP
>  This
>  .BR ioctl (2)
> @@ -322,9 +327,6 @@ Track page faults on missing pages.
>  .B UFFDIO_REGISTER_MODE_WP
>  Track page faults on write-protected pages.
>  .PP
> -Currently, the only supported mode is
> -.BR UFFDIO_REGISTER_MODE_MISSING .
> -.PP
>  If the operation is successful, the kernel modifies the
>  .I ioctls
>  bit-mask field to indicate which
> @@ -443,6 +445,13 @@ operation:
>  .TP
>  .B UFFDIO_COPY_MODE_DONTWAKE
>  Do not wake up the thread that waits for page-fault resolution
> +.TP
> +.B UFFDIO_COPY_MODE_WP
> +Copy the page with read-only permission.
> +This allows the user to trap the next write to the page, which will block and

Break at the comma instead.

> +generate another write-protect userfault message.
> +This is only used in conjunction with write-protect mode when both missing and

"when both missing"

both what?

> +write-protect modes are enabled.
>  .PP
>  The
>  .I copy
> @@ -654,6 +663,72 @@ field of the
>  structure was not a multiple of the system page size; or
>  .I len
>  was zero; or the specified range was otherwise invalid.
> +.SS UFFDIO_WRITEPROTECT (Since Linux 5.7)
> +Write-protect or write-unprotect an userfaultfd registered memory range
> +registered with mode
> +.BR UFFDIO_REGISTER_MODE_WP .
> +.PP
> +The
> +.I argp
> +argument is a pointer to a
> +.I uffdio_writeprotect
> +structure as shown below:
> +.PP
> +.in +4n
> +.EX
> +struct uffdio_writeprotect {
> +struct uffdio_range range;  /* Range to change write permission */
> +__u64 mode; /* Mode to change write permission */
> +};
> +.EE
> +.in
> +There're two mode bits that are supported in this structure:
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +When this mode bit is set, the ioctl will be a write-protect operation upon the
> +memory range specified by
> +.IR range .
> +Otherwise it'll be a write-unprotect operation upon the specified range, which

Break at the comma instead.

> +can be used to resolve an userfaultfd write-protect page fault.
> +.TP
> +.B UFFDIO_WRITEPROTECT_MODE_DONTWAKE
> +When this mode bit is set, do not wake up any thread that waits for page-fault

Break at the comma.

> +resolution after the operation.
> +This could only be specified if
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +is not specified.
> +.PP
> +This
> +.BR ioctl (2)
> +operation returns 0 on success.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the error.
> +Possible errors include:
> +.TP
> +.B EINVAL
> +The
> +.I start
> +or the
> +.I len
> +field of the
> +.I uffdio_range
> +structure was not a multiple of the system page size; or
> +.I len
> +was zero; or the specified range was otherwise invalid.
> +.TP
> +.B EAGAIN
> +The process was interrupted and need to retry.
> +.TP
> +.B ENOENT
> +The range specified in
> +.I range
> +is not valid.
> +For example, the virtual address does not exist, or not registered with

Better break at the comma.

> +userfaultfd write-protect mode.
> +.TP
> +.B EFAULT
> +Encountered a generic fault during processing.
>  .SH RETURN VALUE
>  See descriptions of the individual operations, above.
>  .SH ERRORS
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v3 2/4] userfaultfd.2: Add write-protect mode

2021-03-19 Thread Alejandro Colomar (man-pages)

> +will be with
> +.B UFFD_PAGEFAULT_FLAG_WP
> +flag set.  Note: since only writes can trigger such kind of fault,

Break at the point above too.

> +write-protect messages will always be with
> +.B UFFD_PAGEFAULT_FLAG_WRITE
> +bit set too along with bit
> +.BR UFFD_PAGEFAULT_FLAG_WP .
> +.PP
> +To resolve a write-protection page fault, the user should initiate another
> +.B UFFDIO_WRITEPROTECT
> +ioctl, whose
> +.I uffd_msg.pagefault.flags
> +should have the flag
> +.B UFFDIO_WRITEPROTECT_MODE_WP
> +cleared upon the faulted page or range.
> +.PP
> +Write-protect mode only supports private anonymous memory.
>  .SS Reading from the userfaultfd structure
>  Each
>  .BR read (2)
> @@ -364,8 +459,12 @@ flag (see
>  .BR ioctl_userfaultfd (2))
>  and this flag is set, this a write fault;
>  otherwise it is a read fault.
> -.\"
> -.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
> +.TP
> +.B UFFD_PAGEFAULT_FLAG_WP
> +If the address is in a range that was registered with the
> +.B UFFDIO_REGISTER_MODE_WP
> +flag, when this bit is set it means it's a write-protect fault.  Otherwise it's
> +a page missing fault.

Break at the point.

>  .RE
>  .TP
>  .I pagefault.feat.pid
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: AW: [RFC v3 04/17] arch_prctl.2: SYNOPSIS: Remove unused includes

2021-03-16 Thread Alejandro Colomar (man-pages)


Hi Walter,

On 3/15/21 7:00 PM, Walter Harms wrote:

I have learned the other way around:
#include 
Is a general system header to use that may include
the asm/prctl.h, which should never be included by
userspace programs.



Are you sure that  includes ?

user@debian:/usr/include$ grep -rn '\bARCH_'
asm-generic/statfs.h:42:#ifndef ARCH_PACK_STATFS64
asm-generic/statfs.h:43:#define ARCH_PACK_STATFS64
asm-generic/statfs.h:59:} ARCH_PACK_STATFS64;
asm-generic/statfs.h:65:#ifndef ARCH_PACK_COMPAT_STATFS64
asm-generic/statfs.h:66:#define ARCH_PACK_COMPAT_STATFS64
asm-generic/statfs.h:82:} ARCH_PACK_COMPAT_STATFS64;
x86_64-linux-gnu/asm/statfs.h:10:#define ARCH_PACK_COMPAT_STATFS64 __attribute__((packed,aligned(4)))

x86_64-linux-gnu/asm/prctl.h:5:#define ARCH_SET_GS  0x1001
x86_64-linux-gnu/asm/prctl.h:6:#define ARCH_SET_FS  0x1002
x86_64-linux-gnu/asm/prctl.h:7:#define ARCH_GET_FS  0x1003
x86_64-linux-gnu/asm/prctl.h:8:#define ARCH_GET_GS  0x1004
x86_64-linux-gnu/asm/prctl.h:10:#define ARCH_GET_CPUID  0x1011
x86_64-linux-gnu/asm/prctl.h:11:#define ARCH_SET_CPUID  0x1012
x86_64-linux-gnu/asm/prctl.h:13:#define ARCH_MAP_VDSO_X32   0x2001
x86_64-linux-gnu/asm/prctl.h:14:#define ARCH_MAP_VDSO_32    0x2002
x86_64-linux-gnu/asm/prctl.h:15:#define ARCH_MAP_VDSO_64    0x2003
x86_64-linux-gnu/asm/auxvec.h:13:/* entries in ARCH_DLINFO: */
user@debian:/usr/include$ grep -rn 'asm/prctl.h'
user@debian:/usr/include$

At least on my system, no header seems to be including <asm/prctl.h>.

Thanks,

Alex



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

[RFC v3 17/17] ioprio_set.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/ioprio_set.2 | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/man2/ioprio_set.2 b/man2/ioprio_set.2
index 91ec03f3b..f0c914ab0 100644
--- a/man2/ioprio_set.2
+++ b/man2/ioprio_set.2
@@ -26,12 +26,13 @@
 ioprio_get, ioprio_set \- get/set I/O scheduling class and priority
 .SH SYNOPSIS
 .nf
-.BI "int ioprio_get(int " which ", int " who );
-.BI "int ioprio_set(int " which ", int " who ", int " ioprio );
-.fi
+.BR "#include " "/* Definition of " IOPRIO_* " constants */"
+.BR "#include " " /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.IR Note :
-There are no glibc wrappers for these system calls; see NOTES.
+.BI "int syscall(SYS_ioprio_get, int " which ", int " who );
+.BI "int syscall(SYS_ioprio_set, int " which ", int " who ", int " ioprio );
+.fi
 .SH DESCRIPTION
 The
 .BR ioprio_get ()
@@ -199,9 +200,6 @@ kernel 2.6.13.
 .SH CONFORMING TO
 These system calls are Linux-specific.
 .SH NOTES
-Glibc does not provide a wrapper for these system calls; call them using
-.BR syscall (2).
-.PP
 Two or more processes or threads can share an I/O context.
 This will be the case when
 .BR clone (2)
-- 
2.30.2
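
As a rough sketch of what the new SYNOPSIS form means for a caller, the code
below queries the I/O priority of the calling thread through syscall(2).
The MY_* macros and the helper names are made up for this example; the
numeric values mirror the kernel's ioprio definitions and are spelled out
locally only so that the sketch does not depend on any particular header
being installed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>   /* Definition of SYS_* constants */
#include <unistd.h>        /* syscall() */

/* Local copies of kernel values; hypothetical names for this sketch. */
#define MY_IOPRIO_WHO_PROCESS  1
#define MY_IOPRIO_CLASS_SHIFT  13
#define MY_IOPRIO_CLASS_MASK   0x7

/* Returns the ioprio value of the caller, or -1 with errno set. */
static long get_own_ioprio(void)
{
    /* who == 0 selects the calling thread. */
    return syscall(SYS_ioprio_get, MY_IOPRIO_WHO_PROCESS, 0);
}

/* Extract the scheduling class from a value returned by ioprio_get. */
static int ioprio_class_of(long ioprio)
{
    assert(ioprio >= 0);
    return (int) ((ioprio >> MY_IOPRIO_CLASS_SHIFT) & MY_IOPRIO_CLASS_MASK);
}

/* Extract the low (class-specific) bits of the value. */
static int ioprio_data_of(long ioprio)
{
    assert(ioprio >= 0);
    return (int) (ioprio & ((1L << MY_IOPRIO_CLASS_SHIFT) - 1));
}
```

The return value of get_own_ioprio() should be checked for -1 before it is
passed to the two decoding helpers.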

[RFC v3 16/17] init_module.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/init_module.2 | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/man2/init_module.2 b/man2/init_module.2
index 31229ea93..9bc2080a3 100644
--- a/man2/init_module.2
+++ b/man2/init_module.2
@@ -29,14 +29,22 @@
 init_module, finit_module \- load a kernel module
 .SH SYNOPSIS
 .nf
+.PP
 .BI "int init_module(void *" module_image ", unsigned long " len ,
-.BI "const char *" param_values );
-.BI "int finit_module(int " fd ", const char *" param_values ,
-.BI " int " flags );
+.BI "const char *" param_values );
+.PP
+.BR "#include " "/* Definition of " MODULE_* " constants */"
+.BR "#include " " /* Definition of " SYS_* " constants */"
+.B #include 
+.PP
+.BI "int syscall(SYS_finit_module, int " fd ", const char *" param_values ,
+.BI "int " flags );
 .fi
 .PP
 .IR Note :
-There are no glibc wrappers for these system calls; see NOTES.
+No declaration of
+.BR init_module ()
+is provided in glibc headers; see NOTES.
 .SH DESCRIPTION
 .BR init_module ()
 loads an ELF image into kernel space,
@@ -268,11 +276,6 @@ manually declare the interface in your code;
 alternatively, you can invoke the system call using
 .BR syscall (2).
 .PP
-Glibc does not provide a wrapper for
-.BR finit_module ();
-call it using
-.BR syscall (2).
-.PP
 Information about currently loaded modules can be found in
 .IR /proc/modules
 and in the file trees under the per-module subdirectories under
-- 
2.30.2

[RFC v3 14/17] get_robust_list.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Also remove unused includes.

Signed-off-by: Alejandro Colomar 
---
 man2/get_robust_list.2 | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/man2/get_robust_list.2 b/man2/get_robust_list.2
index b1ae42dbd..9c8f14443 100644
--- a/man2/get_robust_list.2
+++ b/man2/get_robust_list.2
@@ -32,17 +32,16 @@
 get_robust_list, set_robust_list \- get/set list of robust futexes
 .SH SYNOPSIS
 .nf
-.B #include 
-.B #include 
-.B #include 
+.BR "#include " \
+"/* Definition of " "struct robust_list_head" " */"
+.BR "#include " "/* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "long get_robust_list(int " pid ", struct robust_list_head **" head_ptr ,
-.BI " size_t *" len_ptr );
-.BI "long set_robust_list(struct robust_list_head *" head ", size_t " len );
+.BI "long syscall(SYS_get_robust_list, int " pid ,
+.BI " struct robust_list_head **" head_ptr ", size_t *" len_ptr );
+.BI "long syscall(SYS_set_robust_list,"
+.BI " struct robust_list_head *" head ", size_t " len );
 .fi
-.PP
-.IR Note :
-There are no glibc wrappers for these system calls; see NOTES.
 .SH DESCRIPTION
 These system calls deal with per-thread robust futex lists.
 These lists are managed in user space:
@@ -139,9 +138,6 @@ could be found.
 These system calls were added in Linux 2.6.17.
 .SH NOTES
 These system calls are not needed by normal applications.
-No support for them is provided in glibc.
-In the unlikely event that you want to call them directly, use
-.BR syscall (2).
 .PP
 A thread can have only one robust futex list;
 therefore applications that wish
-- 
2.30.2
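
A minimal sketch of the syscall(2) form shown in the new SYNOPSIS, assuming
<linux/futex.h> provides struct robust_list_head as the patch states.  The
helper name query_own_robust_list() is invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* Definition of struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>   /* Definition of SYS_* constants */
#include <unistd.h>

/*
 * Fetch the robust futex list head of the calling thread (pid == 0).
 * Returns 0 on success, or -1 with errno set.
 */
static long query_own_robust_list(struct robust_list_head **head,
                                  size_t *len)
{
    assert(head != NULL && len != NULL);
    return syscall(SYS_get_robust_list, 0, head, len);
}
```

As the page notes, normal applications do not need these calls; the sketch
only shows how the arguments line up once the system call number is passed
first.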

[RFC v3 13/17] getunwind.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/getunwind.2 | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/man2/getunwind.2 b/man2/getunwind.2
index 9a58f43e4..3490a0617 100644
--- a/man2/getunwind.2
+++ b/man2/getunwind.2
@@ -29,16 +29,14 @@
 getunwind \- copy the unwind data to caller's buffer
 .SH SYNOPSIS
 .nf
-.B #include 
 .B #include 
+.BR "#include " "  /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "long getunwind(void " *buf ", size_t " buf_size );
+.BI "long syscall(SYS_getunwind, void " *buf ", size_t " buf_size );
 .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
 .SH DESCRIPTION
-.I Note: this function is obsolete.
+.I Note: this system call is obsolete.
 .PP
 The
 IA-64-specific
@@ -102,9 +100,5 @@ and is available only on the IA-64 architecture.
 This system call has been deprecated.
 The modern way to obtain the kernel's unwind data is via the
 .BR vdso (7).
-.PP
-Glibc does not provide a wrapper for this system call;
-in the unlikely event that you want to call it, use
-.BR syscall (2).
 .SH SEE ALSO
 .BR getauxval (3)
-- 
2.30.2

[RFC v3 09/17] execveat.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/execveat.2 | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/man2/execveat.2 b/man2/execveat.2
index 499bf1b57..0d23cb39b 100644
--- a/man2/execveat.2
+++ b/man2/execveat.2
@@ -28,15 +28,13 @@
 execveat \- execute program relative to a directory file descriptor
 .SH SYNOPSIS
 .nf
+.BR "#include " "  /* Definition of " SYS_* " constants */"
 .B #include 
 .PP
-.BI "int execveat(int " dirfd ", const char *" pathname ,
-.BI " const char *const " argv "[], const char *const " envp [],
-.BI " int " flags );
+.BI "int syscall(SYS_execveat, int " dirfd ", const char *" pathname ,
+.BI "const char *const " argv "[], const char *const " envp [],
+.BI "int " flags );
 .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
 .\" FIXME . See https://sourceware.org/bugzilla/show_bug.cgi?id=27364
 .SH DESCRIPTION
 .\" commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
@@ -209,9 +207,6 @@ the natural idiom when using
 is to set the close-on-exec flag on
 .IR dirfd .
 (But see BUGS.)
-.PP
-Glibc does not provide a wrapper for this system call; call it using
-.BR syscall (2).
 .SH BUGS
 The
 .B ENOENT
-- 
2.30.2

[RFC v3 10/17] exit_group.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/exit_group.2 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/man2/exit_group.2 b/man2/exit_group.2
index d26ec8c70..5bf207bd3 100644
--- a/man2/exit_group.2
+++ b/man2/exit_group.2
@@ -27,9 +27,10 @@
 exit_group \- exit all threads in a process
 .SH SYNOPSIS
 .nf
-.B #include 
+.BR "#include " "   /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "void exit_group(int " status );
+.BI "void syscall(SYS_exit_group, int " status );
 .fi
 .SH DESCRIPTION
 This system call is equivalent to
-- 
2.30.2

[RFC v3 12/17] getdents.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/getdents.2 | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/man2/getdents.2 b/man2/getdents.2
index ba41e0be8..7d5b0e01c 100644
--- a/man2/getdents.2
+++ b/man2/getdents.2
@@ -33,7 +33,11 @@
 getdents, getdents64 \- get directory entries
 .SH SYNOPSIS
 .nf
-.BI "long getdents(unsigned int " fd ", struct linux_dirent *" dirp ,
+.BR "#include " "  /* Definition of " SYS_* " constants */"
+.B #include 
+.PP
+.BI "long syscall(SYS_getdents, unsigned int " fd \
+", struct linux_dirent *" dirp ,
 .BI " unsigned int " count );
 .PP
 .BR "#define _GNU_SOURCE" "/* See feature_test_macros(7) */"
@@ -43,9 +47,9 @@ getdents, getdents64 \- get directory entries
 .fi
 .PP
 .IR Note :
-There is no glibc wrapper for
-.BR getdents ();
-see NOTES.
+There is no definition of
+.B struct linux_dirent
+in glibc; see NOTES.
 .SH DESCRIPTION
 These are not the interfaces you are interested in.
 Look at
-- 
2.30.2

[RFC v3 11/17] futex.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

At the same time, document only headers that are required
for calling the function, or those that are specific to the
function:

 is required for the syscall() prototype.
 is required for the syscall name SYS_xxx.
 is specific to this syscall.

However, uint32_t is generic enough that it shouldn't be
documented here.  The system_data_types(7) page already documents
it, and is more precise about it.  The same goes for timespec.

As a general rule a man[23] page should document the header that
includes the prototype, and all of the headers that define macros
that should be used with the call.  However, the information about
types should be restricted to system_data_types(7) (and that page
should probably be improved by adding types), except for types
that are very specific to the call.  Otherwise, we're duplicating
info and it's then harder to maintain, and probably outdated in
the future.

Signed-off-by: Alejandro Colomar 
---
 man2/futex.2 | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/man2/futex.2 b/man2/futex.2
index e698178d2..a2486628b 100644
--- a/man2/futex.2
+++ b/man2/futex.2
@@ -25,18 +25,16 @@ futex \- fast user-space locking
 .SH SYNOPSIS
 .nf
 .PP
-.B #include 
-.B #include 
-.B #include 
+.BR "#include " "  /* Definition of " FUTEX_* " constants */"
+.BR "#include " "  /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "long futex(uint32_t *" uaddr ", int " futex_op ", uint32_t " val ,
-.BI "  const struct timespec *" timeout , \
+.BI "long syscall(SYS_futex, uint32_t *" uaddr ", int " futex_op \
+", uint32_t " val ,
+.BI " const struct timespec *" timeout , \
 " \fR  /* or: \fBuint32_t \fIval2\fP */"
-.BI "  uint32_t *" uaddr2 ", uint32_t " val3 );
+.BI " uint32_t *" uaddr2 ", uint32_t " val3 );
 .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
 .SH DESCRIPTION
 The
 .BR futex ()
@@ -1695,9 +1693,6 @@ and a sixth argument was added in Linux 2.6.7.
 .SH CONFORMING TO
 This system call is Linux-specific.
 .SH NOTES
-Glibc does not provide a wrapper for this system call; call it using
-.BR syscall (2).
-.PP
 Several higher-level programming abstractions are implemented via futexes,
 including POSIX semaphores and
 various POSIX threads synchronization mechanisms
-- 
2.30.2
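
To make the header split described in the commit message concrete, here is a
small sketch of a FUTEX_WAKE call through syscall(2).  It assumes the three
headers are <linux/futex.h> (FUTEX_* constants), <sys/syscall.h> (SYS_*
constants) and <unistd.h> (the syscall() prototype); <stdint.h> is included
only for uint32_t.  The wrapper name futex_wake() is hypothetical.

```c
#include <assert.h>
#include <linux/futex.h>   /* Definition of FUTEX_* constants */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>   /* Definition of SYS_* constants */
#include <unistd.h>        /* syscall() */

/*
 * Wake up to nr_wake waiters blocked on the futex word at uaddr.
 * Returns the number of waiters woken, or -1 with errno set.
 * The timeout, uaddr2 and val3 arguments are unused by FUTEX_WAKE.
 */
static long futex_wake(uint32_t *uaddr, int nr_wake)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
}
```

With nobody waiting on the word, the call is expected to return 0.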

[RFC v3 08/17] epoll_wait.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/epoll_wait.2 | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/man2/epoll_wait.2 b/man2/epoll_wait.2
index af4180df0..0f3cfe1d5 100644
--- a/man2/epoll_wait.2
+++ b/man2/epoll_wait.2
@@ -32,7 +32,13 @@ epoll_wait, epoll_pwait, epoll_pwait2 \- wait for an I/O event on an epoll file
 .BI "int epoll_pwait(int " epfd ", struct epoll_event *" events ,
 .BI "   int " maxevents ", int " timeout ,
 .BI "   const sigset_t *" sigmask );
-.BI "int epoll_pwait2(int " epfd ", struct epoll_event *" events ,
+.PP
+.BR "#include " \
+" /* Definition of " "struct epoll_event" " */"
+.BR "#include " " /* Definition of " SYS_* " constants */"
+.B #include 
+.PP
+.BI "int syscall(SYS_epoll_pwait2, int " epfd ", struct epoll_event *" events ,
 .BI "   int " maxevents ", const struct timespec *" timeout ,
 .BI "   const sigset_t *" sigmask );
 .\" FIXME: Check if glibc has added a wrapper for epoll_pwait2(),
-- 
2.30.2

[RFC v3 04/17] arch_prctl.2: SYNOPSIS: Remove unused includes

2021-03-13 Thread Alejandro Colomar

AFAICS, there's no reason to include that.
All of the macros that this function uses
are already defined in the other headers.

Cc: glibc 
Signed-off-by: Alejandro Colomar 
---
 man2/arch_prctl.2 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/man2/arch_prctl.2 b/man2/arch_prctl.2
index 8706cd1ec..d1b9e16f9 100644
--- a/man2/arch_prctl.2
+++ b/man2/arch_prctl.2
@@ -28,7 +28,6 @@ arch_prctl \- set architecture-specific thread state
 .SH SYNOPSIS
 .nf
 .BR "#include " "/* Definition of " ARCH_* " constants */"
-.B #include 
 .BR "#include " "  /* Definition of " SYS_* " constants */"
 .B #include 
 .PP
-- 
2.30.2

[RFC v3 06/17] clone.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/clone.2 | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/man2/clone.2 b/man2/clone.2
index 11eb6c622..2f5ecac42 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -56,13 +56,12 @@ clone, __clone2, clone3 \- create a child process
 .PP
 /* For the prototype of the raw clone() system call, see NOTES */
 .PP
-.BI "long clone3(struct clone_args *" cl_args ", size_t " size );
-.fi
+.BR "#include " "/* Definition of " "struct clone_args" " */"
+.BR "#include " "/* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.IR Note :
-There is no glibc wrapper for
-.BR clone3 ();
-see NOTES.
+.BI "long syscall(SYS_clone3, struct clone_args *" cl_args ", size_t " size );
+.fi
 .SH DESCRIPTION
 These system calls
 create a new ("child") process, in a manner similar to
@@ -1541,11 +1540,6 @@ One use of these systems calls
 is to implement threads: multiple flows of control in a program that
 run concurrently in a shared address space.
 .PP
-Glibc does not provide a wrapper for
-.BR clone3 ();
-call it using
-.BR syscall (2).
-.PP
 Note that the glibc
 .BR clone ()
 wrapper function makes some changes
-- 
2.30.2

[RFC v3 07/17] delete_module.2: wfix

2021-03-13 Thread Alejandro Colomar

Use the same wording as in delete_module(2) for this special case.

Signed-off-by: Alejandro Colomar 
---
 man2/delete_module.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/delete_module.2 b/man2/delete_module.2
index cb78cf484..50921d7ba 100644
--- a/man2/delete_module.2
+++ b/man2/delete_module.2
@@ -31,7 +31,7 @@ delete_module \- unload a kernel module
 .fi
 .PP
 .IR Note :
-There is no glibc wrapper for this system call; see NOTES.
+No declaration of this system call is provided in glibc headers; see NOTES.
 .SH DESCRIPTION
 The
 .BR delete_module ()
-- 
2.30.2

[RFC v3 01/17] access.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/access.2 | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/man2/access.2 b/man2/access.2
index 77ff2bd99..b662d07cf 100644
--- a/man2/access.2
+++ b/man2/access.2
@@ -49,15 +49,20 @@ access, faccessat, faccessat2 \- check user's permissions for a file
 .PP
 .BI "int access(const char *" pathname ", int " mode );
 .PP
-.BR "#include" "/* Definition of AT_* constants */"
+.BR "#include " "/* Definition of " AT_* " constants */"
 .B #include 
 .PP
 .BI "int faccessat(int " dirfd ", const char *" pathname ", int " \
 mode ", int " flags );
 /* But see C library/kernel differences, below */
 .PP
-.BI "int faccessat2(int " dirfd ", const char *" pathname ", int " \
-mode ", int " flags );
+.BR "#include " "/* Definition of " AT_* " constants */"
+.BR "#include " "  /* For " SYS_* " constants */"
+.B #include 
+.PP
+.BI "int syscall(SYS_faccessat2,"
+.BI "int " dirfd ", const char *" pathname ", int " mode \
+", int " flags );
 .fi
 .PP
 .RS -4
-- 
2.30.2

[RFC v3 05/17] capget.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/capget.2 | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/man2/capget.2 b/man2/capget.2
index ea504c28c..981645f90 100644
--- a/man2/capget.2
+++ b/man2/capget.2
@@ -18,14 +18,15 @@
 capget, capset \- set/get capabilities of thread(s)
 .SH SYNOPSIS
 .nf
-.B #include 
+.BR "#include " " /* Definition of types and constants */"
+.BR "#include " "  /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "int capget(cap_user_header_t " hdrp ", cap_user_data_t " datap );
-.BI "int capset(cap_user_header_t " hdrp ", const cap_user_data_t " datap );
+.BI "int syscall(SYS_capget, cap_user_header_t " hdrp ,
+.BI "cap_user_data_t " datap );
+.BI "int syscall(SYS_capset, cap_user_header_t " hdrp ,
+.BI "const cap_user_data_t " datap );
 .fi
-.PP
-.IR Note :
-There are no glibc wrappers for these system calls; see NOTES.
 .SH DESCRIPTION
 These two system calls are the raw kernel interface for getting and
 setting thread capabilities.
@@ -40,7 +41,7 @@ The portable interfaces are
 .BR cap_set_proc (3)
 and
 .BR cap_get_proc (3);
-if possible, you should use those interfaces in applications.
+if possible, you should use those interfaces in applications; see NOTES.
 .\"
 .SS Current details
 Now that you have been warned, some current kernel details.
@@ -239,9 +240,6 @@ No such thread.
 .SH CONFORMING TO
 These system calls are Linux-specific.
 .SH NOTES
-Glibc does not provide a wrapper for this system call; call it using
-.BR syscall (2).
-.PP
 The portable interface to the capability querying and setting
 functions is provided by the
 .I libcap
-- 
2.30.2
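
For readers who want to see what the new capget.2 SYNOPSIS means in practice, here is a minimal sketch of calling the raw system call through syscall(2). The helper name get_effective_caps() is hypothetical and not part of the patch, and the sketch assumes the version 3 capability ABI (_LINUX_CAPABILITY_VERSION_3, two 32-bit data words) from <linux/capability.h>. As the page itself says, applications should normally prefer the libcap interfaces.

```c
#include <linux/capability.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the effective capability set of the calling
 * thread as a 64-bit mask, using the raw capget system call.
 * Returns 0 on success, -1 on error (errno is set by syscall()).
 */
int get_effective_caps(unsigned long long *mask)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                /* 0 means the calling thread */

    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;

    /* data[0] holds capabilities 0..31, data[1] holds 32..63. */
    *mask = ((unsigned long long)data[1].effective << 32) | data[0].effective;
    return 0;
}
```

A caller would pass the address of an unsigned long long and test individual bits with (1ULL << CAP_...) from the same header.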

[RFC v3 02/17] alloc_hugepages.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/alloc_hugepages.2 | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/man2/alloc_hugepages.2 b/man2/alloc_hugepages.2
index 623eeab6e..a3a157725 100644
--- a/man2/alloc_hugepages.2
+++ b/man2/alloc_hugepages.2
@@ -27,11 +27,12 @@
 alloc_hugepages, free_hugepages \- allocate or free huge pages
 .SH SYNOPSIS
 .nf
-.BI "void *alloc_hugepages(int " key ", void *" addr ", size_t " len ,
-.BI "  int " prot ", int " flag );
+.BI "void *syscall(SYS_alloc_hugepages, int " key ", void *" addr \
+", size_t " len ,
+.BI "  int " prot ", int " flag );
 .\" asmlinkage unsigned long sys_alloc_hugepages(int key, unsigned long addr,
 .\" unsigned long len, int prot, int flag);
-.BI "int free_hugepages(void *" addr );
+.BI "int syscall(SYS_free_hugepages, void *" addr );
 .\" asmlinkage int sys_free_hugepages(unsigned long addr);
 .fi
 .SH DESCRIPTION
-- 
2.30.2

[RFC v3 03/17] arch_prctl.2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 man2/arch_prctl.2 | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/man2/arch_prctl.2 b/man2/arch_prctl.2
index f9a9dc39d..8706cd1ec 100644
--- a/man2/arch_prctl.2
+++ b/man2/arch_prctl.2
@@ -27,15 +27,14 @@
 arch_prctl \- set architecture-specific thread state
 .SH SYNOPSIS
 .nf
-.B #include 
+.BR "#include " "/* Definition of " ARCH_* " constants */"
 .B #include 
+.BR "#include " "  /* Definition of " SYS_* " constants */"
+.B #include 
 .PP
-.BI "int arch_prctl(int " code ", unsigned long " addr );
-.BI "int arch_prctl(int " code ", unsigned long *" addr );
+.BI "int syscall(SYS_arch_prctl, int " code ", unsigned long " addr );
+.BI "int syscall(SYS_arch_prctl, int " code ", unsigned long *" addr );
 .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
 .SH DESCRIPTION
 .BR arch_prctl ()
 sets architecture-specific process or thread state.
@@ -177,9 +176,6 @@ and
 in the same thread is dangerous, as they may overwrite each other's
 TLS entries.
 .PP
-Glibc does not provide a wrapper for this system call; call it using
-.BR syscall (2).
-.PP
 .I FS
 may be already used by the threading library.
 Programs that use
-- 
2.30.2

[RFC v3 00/17] man2: Use syscall(SYS_...); for system calls without a wrapper

2021-03-13 Thread Alejandro Colomar

Hi Michael,

This draft is more polished than my initial idea.

I changed only those functions without a wrapper.
If we decide later to do something different, we'll see.

Any thoughts?

Cheers,

Alex

P.S.: [RFC v3 15/17] doesn't exist :)

Alejandro Colomar (17):
  access.2: Use syscall(SYS_...); for system calls without a wrapper
  alloc_hugepages.2: Use syscall(SYS_...); for system calls without a
wrapper
  arch_prctl.2: Use syscall(SYS_...); for system calls without a wrapper
  arch_prctl.2: SYNOPSIS: Remove unused includes
  capget.2: Use syscall(SYS_...); for system calls without a wrapper
  clone.2: Use syscall(SYS_...); for system calls without a wrapper
  delete_module.2: wfix
  epoll_wait.2: Use syscall(SYS_...); for system calls without a wrapper
  execveat.2: Use syscall(SYS_...); for system calls without a wrapper
  exit_group.2: Use syscall(SYS_...); for system calls without a wrapper
  futex.2: Use syscall(SYS_...); for system calls without a wrapper
  getdents.2: Use syscall(SYS_...); for system calls without a wrapper
  getunwind.2: Use syscall(SYS_...); for system calls without a wrapper
  get_robust_list.2: Use syscall(SYS_...); for system calls without a
wrapper
  init_module.2: Use syscall(SYS_...); for system calls without a
wrapper
  ioprio_set.2: Use syscall(SYS_...); for system calls without a wrapper

 man2/access.2  | 11 ---
 man2/alloc_hugepages.2 |  7 ---
 man2/arch_prctl.2  | 15 +--
 man2/capget.2  | 18 --
 man2/clone.2   | 16 +---
 man2/delete_module.2   |  2 +-
 man2/epoll_wait.2  | 32 +++-
 man2/execveat.2| 13 -
 man2/exit_group.2  |  5 +++--
 man2/futex.2   | 19 +++
 man2/get_robust_list.2 | 20 
 man2/getdents.2| 12 
 man2/getunwind.2   | 14 --
 man2/init_module.2 | 21 -
 man2/ioprio_set.2  | 14 ++
 15 files changed, 102 insertions(+), 117 deletions(-)

-- 
2.30.2

Re: [PATCH v2 2/4] userfaultfd.2: Add write-protect mode

2021-03-10 Thread Alejandro Colomar (man-pages)

th
+.B UFFD_PAGEFAULT_FLAG_WRITE
+bit set too along with
+.BR UFFD_PAGEFAULT_FLAG_WP .
+.PP
+To resolve a write-protection page fault, the user should initiate another
+.B UFFDIO_WRITEPROTECT
+ioctl whose
+.I uffd_msg.pagefault.flags
+should have the flag
+.BR UFFDIO_WRITEPROTECT_MODE_WP


.B


+cleared upon the faulted page or range.
+.PP
+Currently, write-protect mode only supports private anonymous memory.
  .SS Reading from the userfaultfd structure
  Each
  .BR read (2)
@@ -364,8 +454,12 @@ flag (see
  .BR ioctl_userfaultfd (2))
  and this flag is set, this a write fault;
  otherwise it is a read fault.
-.\"
-.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
+.TP
+.B UFFD_PAGEFAULT_FLAG_WP
+If the address is in a range that was registered with the
+.B UFFDIO_REGISTER_MODE_WP
+flag, when this bit is set it means it's a write-protect fault.  Otherwise it's
+a page missing fault.
  .RE
  .TP
  .I pagefault.feat.pid



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
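
As a reading aid for the write-protect text quoted above, here is a minimal sketch of the UFFDIO_WRITEPROTECT ioctl. The helper name uffd_writeprotect_range() is hypothetical. The quoted patch text names uffd_msg.pagefault.flags, but in the kernel UAPI header <linux/userfaultfd.h> the UFFDIO_WRITEPROTECT_MODE_WP flag lives in the mode field of struct uffdio_writeprotect, and the sketch assumes that layout. It also assumes kernel headers recent enough to define the ioctl, and a userfaultfd descriptor that was already registered with UFFDIO_REGISTER_MODE_WP.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: set (protect != 0) or clear (protect == 0)
 * write protection on [start, start + len) of a registered range.
 * Clearing is how a write-protect fault is resolved.
 * Returns the ioctl() result: 0 on success, -1 on error.
 */
int uffd_writeprotect_range(int uffd, void *start, size_t len, int protect)
{
    struct uffdio_writeprotect wp;

    memset(&wp, 0, sizeof(wp));
    wp.range.start = (uint64_t)(uintptr_t)start;
    wp.range.len = len;
    /* MODE_WP set: write-protect the range; cleared: lift the protection. */
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

A fault-handling thread would call this with protect == 0 on the page reported in the message whose flags include UFFD_PAGEFAULT_FLAG_WP.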

Re: [RFC v4] copy_file_range.2: Update cross-filesystem support for 5.12

2021-03-04 Thread Alejandro Colomar (man-pages)


Hi Darrick,

On 3/4/21 6:13 PM, Darrick J. Wong wrote:

On Thu, Mar 04, 2021 at 10:38:07AM +0100, Alejandro Colomar wrote:

+However, on some virtual filesystems,
+the call failed to copy, while still reporting success.


...success, or merely a short copy?


Okay.



(The rest looks reasonable (at least by c_f_r standards) to me.)


I'm curious, what does "c_f_r standards" mean? :)

Cheers,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH 1/4] userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs

2021-03-04 Thread Alejandro Colomar (man-pages)


Hi Peter,

On 3/4/21 4:50 PM, Peter Xu wrote:

On Thu, Mar 04, 2021 at 10:22:18AM +0100, Alejandro Colomar (man-pages) wrote:

+.BR UFFD_FEATURE_THREAD_ID


This should use [.B] and not [.BR].
.BR is for alternate Bold and Roman.
.B is for bold.

(There are more appearances of this in the other patches.)


Yeah I got a bit confused when differentiating those two, since I also see
similar usage, e.g.:

.BR O_CLOEXEC


Yes, these are minor imperfections that got into the manual pages, and 
we don't remove them due to the churn that it would create (and 
possibility of introducing other bugs while doing such a big scripted 
change that couldn't be easily reviewed (thousands of lines)).  So as we 
still have those lines, they tend to confuse.




I'll fix all of those that appear in the current patchset.  Let me know if you also
want me to fix all the existing uses of ".BR" too where ".B" would suffice.
Otherwise I won't touch them since I can't say they're wrong either (I think
most of them should generate the same output with either ".BR" or ".B" if
there's only one word?).


Our current unwritten guidelines are:
We are fixing the existing ones as we modify code near them,
but leave untouched code that is far from what we are changing, even on 
the same page.


Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

[RFC v4] copy_file_range.2: Update cross-filesystem support for 5.12

2021-03-04 Thread Alejandro Colomar

Linux 5.12 fixes a regression.

Cross-filesystem (introduced in 5.3) copies were buggy.

Move the statements documenting cross-fs to BUGS.
Kernels 5.3..5.11 should be patched soon.

State version information for some errors related to this.

Reported-by: Luis Henriques 
Reported-by: Amir Goldstein 
Related: <https://lwn.net/Articles/846403/>
Cc: Greg KH 
Cc: Michael Kerrisk 
Cc: Anna Schumaker 
Cc: Jeff Layton 
Cc: Steve French 
Cc: Miklos Szeredi 
Cc: Trond Myklebust 
Cc: Alexander Viro 
Cc: "Darrick J. Wong" 
Cc: Dave Chinner 
Cc: Nicolas Boichat 
Cc: Ian Lance Taylor 
Cc: Luis Lozano 
Cc: Andreas Dilger 
Cc: Olga Kornievskaia 
Cc: Christoph Hellwig 
Cc: ceph-devel 
Cc: linux-kernel 
Cc: CIFS 
Cc: samba-technical 
Cc: linux-fsdevel 
Cc: Linux NFS Mailing List 
Cc: Walter Harms 
Signed-off-by: Alejandro Colomar 
---

v3:
- Don't remove some important text.
- Reword BUGS.
v4:
- Reword.
- Link to BUGS.

Thanks, Amir, for all the help and better wordings.

Cheers,

Alex

---
 man2/copy_file_range.2 | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 611a39b80..f58bfea8f 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -169,6 +169,9 @@ Out of memory.
 .B ENOSPC
 There is not enough space on the target filesystem to complete the copy.
 .TP
+.BR EOPNOTSUPP " (since Linux 5.12)"
+The filesystem does not support this operation.
+.TP
 .B EOVERFLOW
 The requested source or destination range is too large to represent in the
 specified data types.
@@ -184,10 +187,17 @@ or
 .I fd_out
 refers to an active swap file.
 .TP
-.B EXDEV
+.BR EXDEV " (before Linux 5.3)"
+The files referred to by
+.IR fd_in " and " fd_out
+are not on the same filesystem.
+.TP
+.BR EXDEV " (since Linux 5.12)"
 The files referred to by
 .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same filesystem,
+and the source and target filesystems are not of the same type,
+or do not support cross-filesystem copy.
 .SH VERSIONS
 The
 .BR copy_file_range ()
@@ -200,8 +210,11 @@ Areas of the API that weren't clearly defined were clarified and the API bounds
 are much more strictly checked than on earlier kernels.
 Applications should target the behaviour and requirements of 5.3 kernels.
 .PP
-First support for cross-filesystem copies was introduced in Linux 5.3.
-Older kernels will return -EXDEV when cross-filesystem copies are attempted.
+Since Linux 5.12,
+cross-filesystem copies can be achieved
+when both filesystems are of the same type,
+and that filesystem implements support for it.
+See BUGS for behavior prior to 5.12.
 .SH CONFORMING TO
 The
 .BR copy_file_range ()
@@ -226,6 +239,12 @@ gives filesystems an opportunity to implement "copy acceleration" techniques,
 such as the use of reflinks (i.e., two or more inodes that share
 pointers to the same copy-on-write disk blocks)
 or server-side-copy (in the case of NFS).
+.SH BUGS
+In Linux kernels 5.3 to 5.11,
+cross-filesystem copies were implemented by the kernel,
+if the operation was not supported by individual filesystems.
+However, on some virtual filesystems,
+the call failed to copy, while still reporting success.
 .SH EXAMPLES
 .EX
 #define _GNU_SOURCE
-- 
2.30.1.721.g45526154a5

Re: [PATCH 1/4] userfaultfd.2: Add UFFD_FEATURE_THREAD_ID docs

2021-03-04 Thread Alejandro Colomar (man-pages)


Hello Peter,

On 3/4/21 2:59 AM, Peter Xu wrote:

UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.

Signed-off-by: Peter Xu 
---
  man2/userfaultfd.2 | 12 
  1 file changed, 12 insertions(+)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..2d14effc6 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -77,6 +77,12 @@ When the last file descriptor referring to a userfaultfd object is closed,
  all memory ranges that were registered with the object are unregistered
  and unread events are flushed.
  .\"
+.PP
+Since Linux 4.14, userfaultfd page fault message can selectively embed fault
+thread ID information into the fault message.  One needs to enable this feature
+explicitly using the
+.BR UFFD_FEATURE_THREAD_ID


This should use [.B] and not [.BR].
.BR is for alternate Bold and Roman.
.B is for bold.

(There are more appearances of this in the other patches.)

Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] mount_setattr.2: New manual page documenting the mount_setattr() system call

2021-03-01 Thread Alejandro Colomar (man-pages)

;},
+{"block-exec",  no_argument,0,  'f'},
+{"no-access-time",  no_argument,0,  'g'},
+{ NULL, 0,  0,   0 },
+};
+
+#define exit_log(format, ...)                   \\
+    ({                                          \\
+        fprintf(stderr, format, ##__VA_ARGS__); \\
+        exit(EXIT_FAILURE);                     \\
+    })
+
+int main(int argc, char *argv[])
+{
+    int fd_userns = -EBADF, index = 0;
+    bool recursive = false;
+    struct mount_attr *attr = &(struct mount_attr){};
+    const char *source, *target;
+    int fd_tree, new_argc, ret;
+    char *const *new_argv;
+
+    while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
+        switch (ret) {
+        case 'a':
+            fd_userns = open(optarg, O_RDONLY | O_CLOEXEC);
+            if (fd_userns < 0)
+                exit_log("%m - Failed to open user namespace path %s\n", optarg);
+            break;
+        case 'b':
+            recursive = true;
+            break;
+        case 'c':
+            attr->attr_set |= MOUNT_ATTR_RDONLY;
+            break;
+        case 'd':
+            attr->attr_set |= MOUNT_ATTR_NOSUID;
+            break;
+        case 'e':
+            attr->attr_set |= MOUNT_ATTR_NODEV;
+            break;
+        case 'f':
+            attr->attr_set |= MOUNT_ATTR_NOEXEC;
+            break;
+        case 'g':
+            attr->attr_set |= MOUNT_ATTR_NOATIME;
+            attr->attr_clr |= MOUNT_ATTR__ATIME;
+            break;
+        default:
+            exit_log("Invalid argument specified");
+        }
+    }
+
+    new_argv = &argv[optind];
+    new_argc = argc - optind;
+    if (new_argc < 2)
+        exit_log("Missing source or target mountpoint\n");
+    source = new_argv[0];
+    target = new_argv[1];
+
+    fd_tree = open_tree(-EBADF, source,
+                        OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_EMPTY_PATH |
+                        (recursive ? AT_RECURSIVE : 0));
+    if (fd_tree < 0)
+        exit_log("%m - Failed to open %s\n", source);
+
+    if (fd_userns >= 0) {
+        attr->attr_set  |= MOUNT_ATTR_IDMAP;
+        attr->userns_fd = fd_userns;
+    }
+    ret = mount_setattr(fd_tree, "",
+                        AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0),
+                        attr, sizeof(struct mount_attr));
+    if (ret < 0)
+        exit_log("%m - Failed to change mount attributes\n");
+    close(fd_userns);
+
+    ret = move_mount(fd_tree, "", -EBADF, target, MOVE_MOUNT_F_EMPTY_PATH);
+    if (ret < 0)
+        exit_log("%m - Failed to attach mount to %s\n", target);
+    close(fd_tree);
+
+    exit(EXIT_SUCCESS);
+}
+.fi


.EE


+.SH SEE ALSO
+.BR capabilities (7),
+.BR clone (2),
+.BR clone3 (2),
+.BR ext4 (5),
+.BR mount (2),
+.BR mount_namespaces (7),
+.BR newuidmap (1),
+.BR newgidmap (1),
+.BR proc (5),
+.BR unshare (2),
+.BR user_namespaces (7),
+.BR xattr (7),
+.BR xfs (5)

base-commit: 64b8654d8bcac58cae635690f624e2b332736425



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

[RFC v3] copy_file_range.2: Update cross-filesystem support for 5.12

2021-03-01 Thread Alejandro Colomar

Linux 5.12 fixes a regression.

Cross-filesystem (introduced in 5.3) copies were buggy.

Move the statements documenting cross-fs to BUGS.
Kernels 5.3..5.11 should be patched soon.

State version information for some errors related to this.

Reported-by: Luis Henriques 
Reported-by: Amir Goldstein 
Related: <https://lwn.net/Articles/846403/>
Cc: Greg KH 
Cc: Michael Kerrisk 
Cc: Anna Schumaker 
Cc: Jeff Layton 
Cc: Steve French 
Cc: Miklos Szeredi 
Cc: Trond Myklebust 
Cc: Alexander Viro 
Cc: "Darrick J. Wong" 
Cc: Dave Chinner 
Cc: Nicolas Boichat 
Cc: Ian Lance Taylor 
Cc: Luis Lozano 
Cc: Andreas Dilger 
Cc: Olga Kornievskaia 
Cc: Christoph Hellwig 
Cc: ceph-devel 
Cc: linux-kernel 
Cc: CIFS 
Cc: samba-technical 
Cc: linux-fsdevel 
Cc: Linux NFS Mailing List 
Cc: Walter Harms 
Signed-off-by: Alejandro Colomar 
---

v3:
- Don't remove some important text.
- Reword BUGS.

---
Hi Amir,

I covered your comments.  I may need to add something else after your
discussion with Steve; please comment.

I tried to reword BUGS so that it's as specific and understandable as I can.
If you still find it not good enough, please comment :)

Thanks,

Alex

---
 man2/copy_file_range.2 | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 611a39b80..1c0df3f74 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -169,6 +169,9 @@ Out of memory.
 .B ENOSPC
 There is not enough space on the target filesystem to complete the copy.
 .TP
+.BR EOPNOTSUPP " (since Linux 5.12)"
+The filesystem does not support this operation.
+.TP
 .B EOVERFLOW
 The requested source or destination range is too large to represent in the
 specified data types.
@@ -184,10 +187,17 @@ or
 .I fd_out
 refers to an active swap file.
 .TP
-.B EXDEV
+.BR EXDEV " (before Linux 5.3)"
+The files referred to by
+.IR fd_in " and " fd_out
+are not on the same filesystem.
+.TP
+.BR EXDEV " (since Linux 5.12)"
 The files referred to by
 .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same filesystem,
+and the source and target filesystems are not of the same type,
+or do not support cross-filesystem copy.
 .SH VERSIONS
 The
 .BR copy_file_range ()
@@ -200,8 +210,10 @@ Areas of the API that weren't clearly defined were clarified and the API bounds
 are much more strictly checked than on earlier kernels.
 Applications should target the behaviour and requirements of 5.3 kernels.
 .PP
-First support for cross-filesystem copies was introduced in Linux 5.3.
-Older kernels will return -EXDEV when cross-filesystem copies are attempted.
+Since 5.12,
+cross-filesystem copies can be achieved
+when both filesystems are of the same type,
+and that filesystem implements support for it.
 .SH CONFORMING TO
 The
 .BR copy_file_range ()
@@ -226,6 +238,12 @@ gives filesystems an opportunity to implement "copy acceleration" techniques,
 such as the use of reflinks (i.e., two or more inodes that share
 pointers to the same copy-on-write disk blocks)
 or server-side-copy (in the case of NFS).
+.SH BUGS
+In Linux kernels 5.3 to 5.11,
+cross-filesystem copies were supported by the kernel,
+instead of being supported by individual filesystems.
+However, on some virtual filesystems,
+the call failed to copy, while still reporting success.
 .SH EXAMPLES
 .EX
 #define _GNU_SOURCE
-- 
2.30.1.721.g45526154a5

[RFC v2] copy_file_range.2: Update cross-filesystem support for 5.12

2021-02-27 Thread Alejandro Colomar

Linux 5.12 fixes a regression.

Cross-filesystem copies (introduced in 5.3) were buggy.

Move the statements documenting cross-fs to BUGS.
Kernels 5.3..5.11 should be patched soon.

State version information for some errors related to this.

Reported-by: Luis Henriques 
Reported-by: Amir Goldstein 
Related: <https://lwn.net/Articles/846403/>
Cc: Greg KH 
Cc: Michael Kerrisk 
Cc: Anna Schumaker 
Cc: Jeff Layton 
Cc: Steve French 
Cc: Miklos Szeredi 
Cc: Trond Myklebust 
Cc: Alexander Viro 
Cc: "Darrick J. Wong" 
Cc: Dave Chinner 
Cc: Nicolas Boichat 
Cc: Ian Lance Taylor 
Cc: Luis Lozano 
Cc: Andreas Dilger 
Cc: Olga Kornievskaia 
Cc: Christoph Hellwig 
Cc: ceph-devel 
Cc: linux-kernel 
Cc: CIFS 
Cc: samba-technical 
Cc: linux-fsdevel 
Cc: Linux NFS Mailing List 
Cc: Walter Harms 
Signed-off-by: Alejandro Colomar 
---

Hi all,

Please check that this is correct.
I wrote it as I understood copy_file_range() from the LWN article,
and the conversation on this thread,
but maybe someone with more experience on this syscall will find bugs in my patch.

When kernels 5.3..5.11 fix this, some info could be compacted a bit more,
and maybe the BUGS section could be removed.

Also, I'd like to know which filesystems support cross-fs, and since when.

Amir, you said that it was only cifs and nfs (since when? 5.3? 5.12?).

Also, I'm a bit surprised that <5.3 could fail with EOPNOTSUPP
and it wasn't documented.  Is that for sure, Amir?

Thanks,

Alex

---
 man2/copy_file_range.2 | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 611a39b80..93f54889d 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -169,6 +169,9 @@ Out of memory.
 .B ENOSPC
 There is not enough space on the target filesystem to complete the copy.
 .TP
+.BR EOPNOTSUPP " (before Linux 5.3; or since Linux 5.12)"
+The filesystem does not support this operation.
+.TP
 .B EOVERFLOW
 The requested source or destination range is too large to represent in the
 specified data types.
@@ -184,10 +187,17 @@ or
 .I fd_out
 refers to an active swap file.
 .TP
-.B EXDEV
+.BR EXDEV " (before Linux 5.3)"
 The files referred to by
 .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same filesystem.
+.TP
+.BR EXDEV " (or since Linux 5.12)"
+The files referred to by
+.IR fd_in " and " fd_out
+are not on the same filesystem,
+and the source and target filesystems are not of the same type,
+or do not support cross-filesystem copy.
 .SH VERSIONS
 The
 .BR copy_file_range ()
@@ -195,13 +205,10 @@ system call first appeared in Linux 4.5, but glibc 2.27 provides a user-space
 emulation when it is not available.
 .\" https://sourceware.org/git/?p=glibc.git;a=commit;f=posix/unistd.h;h=bad7a0c81f501fbbcc79af9eaa4b8254441c4a1f
 .PP
-A major rework of the kernel implementation occurred in 5.3.
-Areas of the API that weren't clearly defined were clarified and the API bounds
-are much more strictly checked than on earlier kernels.
-Applications should target the behaviour and requirements of 5.3 kernels.
-.PP
-First support for cross-filesystem copies was introduced in Linux 5.3.
-Older kernels will return -EXDEV when cross-filesystem copies are attempted.
+Since 5.12,
+cross-filesystem copies can be achieved
+when both filesystems are of the same type,
+and that filesystem implements support for it.
 .SH CONFORMING TO
 The
 .BR copy_file_range ()
@@ -226,6 +233,10 @@ gives filesystems an opportunity to implement "copy acceleration" techniques,
 such as the use of reflinks (i.e., two or more inodes that share
 pointers to the same copy-on-write disk blocks)
 or server-side-copy (in the case of NFS).
+.SH BUGS
+In Linux kernels 5.3 to 5.11, cross-filesystem copies were supported.
+However, on some virtual filesystems, the call failed to copy,
+even though it may have reported success.
 .SH EXAMPLES
 .EX
 #define _GNU_SOURCE
-- 
2.30.1.721.g45526154a5

Re: [PATCH] copy_file_range.2: Kernel v5.12 updates

2021-02-27 Thread Alejandro Colomar (man-pages)


Hi Amir,

On 2/27/21 6:41 AM, Amir Goldstein wrote:

On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)

On 2/24/21 5:10 PM, Amir Goldstein wrote:

On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques  wrote:

   .TP
+.B EOPNOTSUPP


I'll add the kernel version here:

.BR EOPNOTSUPP " (since Linux 5.12)"


Error could be returned prior to 5.3 and would be probably returned
by future stable kernels 5.3..5.12 too


OK, I think I'll state <5.3 and >=5.12 for the moment, and if Greg adds 
that to stable 5.3..5.11 kernels, please update me.



   .B EXDEV
   The files referred to by
   .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).


I'm not sure that 'mounted' adds any value here.  Would you remove the
word here?


See rename(2). 'mounted' in this context is explained there.
HOWEVER, it does not fit here.
copy_file_range() IS allowed between two mounts of the same filesystem instance.


Also allowed for <5.3 ?



To make things more complicated, it appears that cross mount clone is not
allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
also uses the 'mounted filesystem' terminology for EXDEV

As things stand now, because of the fallback to clone logic,
copy_file_range() provides a way for users to clone across different mounts
of the same filesystem instance, which they cannot do with the FICLONE ioctl.

Fun :)

BTW, I don't know if preventing cross mount clone was done intentionally,
but as I wrote in a comment in the code once:

 /*
  * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
  * the same mount. Practically, they only need to be on the same file
  * system.
  */


:)





It reads as if two separate devices with the same filesystem type would
still give this error.

Per the LWN.net article Amir shared, this is permitted ("When called
from user space, copy_file_range() will only try to copy a file across
filesystems if the two are of the same type").

This behavior was slightly different before 5.3 AFAICR (was it?) ("until
then, copy_file_range() refused to copy between files that were not
located on the same filesystem.").  If that's the case, I'd specify the
difference, or more probably split the error into two, one before 5.3,
and one since 5.12.



True.



I think you need to drop the (Linux range) altogether.


I'll keep the range.  Users of 5.3..5.11 might be surprised if the
filesystems are different and they don't get an error, I think.

I reworded it to follow other pages conventions:

.BR EXDEV " (before Linux 5.3; or since Linux 5.12)"

which renders as:

 EXDEV (before Linux 5.3; or since Linux 5.12)
The files referred to by fd_in and fd_out are not on
the same mounted filesystem.



drop 'mounted'


Yes






What's missing here is the NFS cross server copy use case.
Maybe:

...are not on the same mounted filesystem and the source and target filesystems
do not support cross-filesystem copy.


Yes.

Again, this wasn't true before 5.3, right?



Right.
Actually, v5.3 provides the vfs capabilities for filesystems to support
cross fs copy. I am not sure if NFS already implements cross fs copy in
v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
or dig in the release notes for server-side copy.


Okay

Thanks to LWN :)


:)

Thanks,

Alex


--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] copy_file_range.2: Kernel v5.12 updates

2021-02-26 Thread Alejandro Colomar (man-pages)

Hello Amir, Luis,

On 2/24/21 5:10 PM, Amir Goldstein wrote:

On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques  wrote:

Update man-page with recent changes to this syscall.

Signed-off-by: Luis Henriques 
---
Hi!

Here's a suggestion for fixing the manpage for copy_file_range().  Note that
I've assumed the fix will hit 5.12.

  man2/copy_file_range.2 | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 611a39b8026b..b0fd85e2631e 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -169,6 +169,9 @@ Out of memory.
  .B ENOSPC
  There is not enough space on the target filesystem to complete the copy.
  .TP
+.B EOPNOTSUPP

I'll add the kernel version here:

.BR EOPNOTSUPP " (since Linux 5.12)"

+The filesystem does not support this operation
+.TP
  .B EOVERFLOW
  The requested source or destination range is too large to represent in the
  specified data types.
@@ -187,7 +190,7 @@ refers to an active swap file.
  .B EXDEV
  The files referred to by
  .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).

I'm not sure that 'mounted' adds any value here.  Would you remove the 
word here?

It reads as if two separate devices with the same filesystem type would 
still give this error.

Per the LWN.net article Amir shared, this is permitted ("When called 
from user space, copy_file_range() will only try to copy a file across 
filesystems if the two are of the same type").

This behavior was slightly different before 5.3 AFAICR (was it?) ("until 
then, copy_file_range() refused to copy between files that were not 
located on the same filesystem.").  If that's the case, I'd specify the 
difference, or more probably split the error into two, one before 5.3, 
and one since 5.12.

I think you need to drop the (Linux range) altogether.

I'll keep the range.  Users of 5.3..5.11 might be surprised if the 
filesystems are different and they don't get an error, I think.

I reworded it to follow the conventions of other pages:

.BR EXDEV " (before Linux 5.3; or since Linux 5.12)"

which renders as:

   EXDEV (before Linux 5.3; or since Linux 5.12)
  The files referred to by fd_in and fd_out are not on
  the same mounted filesystem.

What's missing here is the NFS cross server copy use case.
Maybe:

...are not on the same mounted filesystem and the source and target filesystems
do not support cross-filesystem copy.

Yes.

Again, this wasn't true before 5.3, right?

You may refer the reader to VERSIONS section where it will say which
filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).

  .SH VERSIONS
  The
  .BR copy_file_range ()
@@ -202,6 +205,11 @@ Applications should target the behaviour and requirements 
of 5.3 kernels.
  .PP
  First support for cross-filesystem copies was introduced in Linux 5.3.
  Older kernels will return -EXDEV when cross-filesystem copies are attempted.
+.PP
+After Linux 5.12, support for copies between different filesystems was dropped.
+However, individual filesystems may still provide
+.BR copy_file_range ()
+implementations that allow copies across different devices.

Again, this is not likely to stay up to date for very long.
The stable kernels are expected to apply your patch (because it fixes
a regression)
so this should be phrased differently.
If it were me, I would provide all the details of the situation to
Michael and ask him
to write the best description for this section.

I'll look at this part in more detail in a later review.
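
To make the EXDEV and EOPNOTSUPP cases discussed in this review concrete, here is a minimal sketch of how an application might cope with them.  It assumes that falling back to a plain read(2)/write(2) loop is acceptable; copy_with_fallback() is a hypothetical helper name, not something from the patch or from glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch under review): copy up to
 * len bytes from fd_in to fd_out, starting at their current file offsets.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t
copy_with_fallback(int fd_in, int fd_out, size_t len)
{
    enum { BUFSZ = 65536 };
    size_t  done = 0;
    char    *buf;

    assert(fd_in >= 0 && fd_out >= 0);

    /* Fast path: let the kernel (or the filesystem) do the copy. */
    while (done < len) {
        ssize_t  n;

        n = copy_file_range(fd_in, NULL, fd_out, NULL, len - done, 0);
        if (n > 0) {
            done += (size_t) n;
            continue;
        }
        if (n == 0)
            return (ssize_t) done;      /* end of input */
        /* Only the "cannot do it this way" errors trigger the fallback. */
        if (errno != EXDEV && errno != EOPNOTSUPP && errno != ENOSYS)
            return -1;
        break;
    }
    if (done == len)
        return (ssize_t) done;

    /* Slow path: ordinary user-space copy loop. */
    buf = malloc(BUFSZ);
    if (buf == NULL)
        return -1;

    while (done < len) {
        size_t   want = (len - done < BUFSZ) ? len - done : BUFSZ;
        ssize_t  r = read(fd_in, buf, want);

        if (r == 0)
            break;                      /* end of input */
        if (r == -1) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1;
        }
        for (ssize_t off = 0; off < r; ) {
            ssize_t  w = write(fd_out, buf + off, (size_t) (r - off));

            if (w == -1) {
                if (errno == EINTR)
                    continue;
                free(buf);
                return -1;
            }
            off += w;
        }
        done += (size_t) r;
    }
    free(buf);
    return (ssize_t) done;
}
```

With something like this, a caller does not need to know which kernel version it runs on: whether the kernel reports EXDEV, EOPNOTSUPP, or (on very old kernels) ENOSYS, the copy still completes.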

On 2/26/21 11:34 AM, Amir Goldstein wrote:
> Is this detailed enough? ;-)
>
> https://lwn.net/Articles/846403/

Yes, it is!

Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] copy_file_range.2: Kernel v5.12 updates

2021-02-26 Thread Alejandro Colomar (man-pages)


Hello Jeff,

On 2/26/21 2:59 PM, Jeff Layton wrote:

Here's a link that should work. I'm probably breaking the rules a bit as
a subscriber, but hopefully Jon won't mind too much. FWIW, I've found it
to be worthwhile to subscribe to LWN if you're doing a lot of kernel
development:

 https://lwn.net/SubscriberLink/846403/0fd639403e629cab/


Thanks!  (I already received the link privately some minutes before from 
various people.)


It seems that he considers it fair use :)

[[
Where is it appropriate to post a subscriber link?

Almost anywhere. Private mail, messages to project mailing lists, and 
blog entries are all appropriate. As long as people do not use 
subscriber links as a way to defeat our attempts to gain subscribers, we 
are happy to see them shared.

]]
<https://lwn.net/op/FAQ.lwn#site>

Cheers,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [RFC v2] execve.2: SYNOPSIS: Document both glibc wrapper and kernel sycalls

2021-02-26 Thread Alejandro Colomar (man-pages)


Hi Michael,


Okay, after a few days of thinking, I'm not sure about what to do in 
some cases.


But I think we agree to use syscall(SYS_ ...) for syscalls with no 
wrapper (such as membarrier(2)).


Is that right?

I think it may be better to separate this into 2 sets of changes.

1)  Document syscalls without wrappers as syscall(SYS_ ...).
We could already start with this.
(Actually, after I finish fixing the prototypes in man3.)
This change will be fast, because there aren't many of these.

2)  Do the rest, I don't know yet how.  We'll see.
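
As a minimal sketch of what point 1) means for a caller, assuming the three-argument membarrier(2) prototype from the RFC: membarrier_raw() and membarrier_has_cmd() are hypothetical names chosen for illustration, since glibc provides no wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>   /* For MEMBARRIER_* constants */
#include <sys/syscall.h>        /* For SYS_* constants */
#include <unistd.h>

/* Hypothetical local wrapper; glibc does not provide one. */
static int
membarrier_raw(int cmd, unsigned int flags, int cpu_id)
{
    return (int) syscall(SYS_membarrier, cmd, flags, cpu_id);
}

/*
 * Returns 1 if the kernel supports cmd, 0 if it does not,
 * or -1 (with errno set) if the query itself failed.
 */
static int
membarrier_has_cmd(int cmd)
{
    int  mask;

    /* MEMBARRIER_CMD_QUERY is 0, so testing for it in the mask is meaningless. */
    assert(cmd != MEMBARRIER_CMD_QUERY);

    mask = membarrier_raw(MEMBARRIER_CMD_QUERY, 0, 0);
    if (mask == -1)
        return -1;
    return (mask & cmd) != 0;
}
```

A caller would typically check membarrier_has_cmd(MEMBARRIER_CMD_GLOBAL) once at startup and pick a different synchronization strategy if the result is not 1.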


Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] copy_file_range.2: Kernel v5.12 updates

2021-02-26 Thread Alejandro Colomar (man-pages)


Hello Amir,

On 2/26/21 11:34 AM, Amir Goldstein wrote:

Is this detailed enough? ;-)

https://lwn.net/Articles/846403/


I'm sorry I can't read it yet:

[
Subscription required
The page you have tried to view (How useful should copy_file_range() 
be?) is currently available to LWN subscribers only. Reader 
subscriptions are a necessary way to fund the continued existence of LWN 
and the quality of its content.

[...]
(Alternatively, this item will become freely available on March 4, 2021)
]

However, the 4th of March is close enough, I guess.

Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH] copy_file_range.2: Kernel v5.12 updates

2021-02-26 Thread Alejandro Colomar (man-pages)


Hello Luis,

On 2/25/21 11:21 AM, Luis Henriques wrote:

On Wed, Feb 24, 2021 at 06:10:45PM +0200, Amir Goldstein wrote:

If it were me, I would provide all the details of the situation to
Michael and ask him
to write the best description for this section.


Thanks Amir.

Yeah, it's tricky.  Support was added and then dropped.  Since stable
kernels will be picking this patch, maybe the best thing to do is to not
mention the generic cross-filesystem support at all...?  Or simply say
that 5.3 temporarily supported it but that support was later dropped.

Michael (or Alejandro), would you be OK handling this yourself as Amir
suggested?


Could you please provide a more detailed history of what is to be 
documented?


Thanks,

Alex

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [RFC v2] execve.2: SYNOPSIS: Document both glibc wrapper and kernel sycalls

2021-02-19 Thread Alejandro Colomar (man-pages)


Hi Michael,

On 2/19/21 1:39 PM, Michael Kerrisk (man-pages) wrote:

Hey Alex,

On 2/18/21 4:13 PM, Alejandro Colomar wrote:

Until now, the manual pages have (usually) documented only either
the glibc (or another library) wrapper for a syscall, or the
kernel syscall (this only when there's not a wrapper).

Let's document both prototypes, which many times are slightly
different.  This will solve a problem where documenting glibc
wrappers implied shadowing the documentation for the raw syscall.

Signed-off-by: Alejandro Colomar 


This patch also changes madvise.2, I suppose accidentally.


I forgot to change the commit msg.

I said in the previous email[1] that I'd add a syscall without wrapper 
to the RFC.


[1]: 
<https://lore.kernel.org/linux-man/938df2c0-04b5-f6a4-79c3-b8fe09973...@gmail.com/T/#mceefe007c2e4eb0419833583d893eb37dd02b235>




I'm still not sure whether I consider this change worthwhile
for cases like this where the differences between the libc
wrapper and the syscall are minor enough to probably
be irrelevant to user-space programmers. But, if we do
add something like this, I think a sentence or two
of English is desirable as well. Something like

The kernel system call differs slightly from the glibc
wrapper, in the addition of 'const' to two parameter
declarations:
 
 syscall(...)


But, before we go down this track, I'd like to get a sense
of how many cases there are like this where there are these
small differences between the glibc wrapper and the syscall
interface. I'm not meaning you should check every system call
now.  But maybe you can let me know something like: of the first
20 system calls I checked, there were X system calls that had
such differences.


Don't worry, I'm first fixing the prototypes of man3.  This is only a 
prototype, and I'm not yet sure about which way is better to go.  I'm 
only showing ideas.


In a few days, I'll compare the syscalls and their wrappers side by side 
to see that.  If you want to have a look yourself, you can run these two 
commands and compare their output:



 For reading the glibc wrappers:

 .../gnu/glibc$ man_lsfunc ../../linux/man-pages/man2 \
   |while read -r syscall; do
   echo "=  ${syscall}";
   grep_glibc_prototype ${syscall};
   done \
   |sed -e 's/\bextern //' -e 's/\b_*//g' \
   |less;

 For reading the kernel syscalls:

 .../linux/linux$ man_lsfunc ../man-pages/man2/ \
   |while read -r syscall; do
   echo "=  ${syscall}";
   grep_syscall ${syscall};
   done \
   |less;

Thanks,

Alex



Thanks,

Michael


---
  man2/execve.2 | 15 +--
  man2/membarrier.2 | 14 +-
  2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/man2/execve.2 b/man2/execve.2
index 027a0efd2..318c71c85 100644
--- a/man2/execve.2
+++ b/man2/execve.2
@@ -41,8 +41,8 @@ execve \- execute program
  .nf
  .B #include 
  .PP
-.BI "int execve(const char *" pathname ", char *const " argv [],
-.BI "   char *const " envp []);
+.BI "int execve(const char *" pathname ",
+.BI "   char *const " argv "[], char *const " envp []);
  .fi
  .SH DESCRIPTION
  .BR execve ()
@@ -772,6 +772,17 @@ Thus, this argument list was not directly usable in a 
further
  .BR exec ()
  call.
  Since UNIX\ V7, both are NULL.
+.SS C library/kernel differences
+.RS 4
+.nf
+/* Kernel system call: */
+.BR "#include " "/* For " SYS_* " constants */"
+.B #include 
+.PP
+.BI "int syscall(SYS_execve, const char *" pathname ,
+.BI "const char *const " argv "[], const char *const " envp []);
+.fi
+.RE
  .\"
  .\" .SH BUGS
  .\" Some Linux versions have failed to check permissions on ELF
diff --git a/man2/membarrier.2 b/man2/membarrier.2
index 173195484..25d6add77 100644
--- a/man2/membarrier.2
+++ b/man2/membarrier.2
@@ -28,13 +28,12 @@ membarrier \- issue memory barriers on a set of threads
  .SH SYNOPSIS
  .nf
  .PP
-.B #include 
+.BR "#include " "   /* For " MEMBARRIER_* " constants */"
+.BR "#include " "/* For " SYS_* " constants */"
+.B #include 
  .PP
-.BI "int membarrier(int " cmd ", unsigned int " flags ", int " cpu_id );
+.BI "int syscall(SYS_membarrier, int " cmd ", unsigned int " flags ", int " 
cpu_id );
  .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
  .SH DESCRIPTION
  The
  .BR membarrier ()
@@ -295,7 +294,7 @@ was:
  .PP
  .in +4n
  .EX
-.BI "int membarrier(int " cmd ", int " flags );
+.BI "int syscall(SYS_membarrier, int " cmd ", int " flags );
  .EE
  .in
  .SH CONFO

[RFC v2] execve.2: SYNOPSIS: Document both glibc wrapper and kernel sycalls

2021-02-18 Thread Alejandro Colomar

Until now, the manual pages have (usually) documented only either
the glibc (or another library) wrapper for a syscall, or the
kernel syscall (this only when there's not a wrapper).

Let's document both prototypes, which many times are slightly
different.  This will solve a problem where documenting glibc
wrappers implied shadowing the documentation for the raw syscall.

Signed-off-by: Alejandro Colomar 
---
 man2/execve.2 | 15 +--
 man2/membarrier.2 | 14 +-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/man2/execve.2 b/man2/execve.2
index 027a0efd2..318c71c85 100644
--- a/man2/execve.2
+++ b/man2/execve.2
@@ -41,8 +41,8 @@ execve \- execute program
 .nf
 .B #include 
 .PP
-.BI "int execve(const char *" pathname ", char *const " argv [],
-.BI "   char *const " envp []);
+.BI "int execve(const char *" pathname ",
+.BI "   char *const " argv "[], char *const " envp []);
 .fi
 .SH DESCRIPTION
 .BR execve ()
@@ -772,6 +772,17 @@ Thus, this argument list was not directly usable in a 
further
 .BR exec ()
 call.
 Since UNIX\ V7, both are NULL.
+.SS C library/kernel differences
+.RS 4
+.nf
+/* Kernel system call: */
+.BR "#include " "/* For " SYS_* " constants */"
+.B #include 
+.PP
+.BI "int syscall(SYS_execve, const char *" pathname ,
+.BI "const char *const " argv "[], const char *const " envp []);
+.fi
+.RE
 .\"
 .\" .SH BUGS
 .\" Some Linux versions have failed to check permissions on ELF
diff --git a/man2/membarrier.2 b/man2/membarrier.2
index 173195484..25d6add77 100644
--- a/man2/membarrier.2
+++ b/man2/membarrier.2
@@ -28,13 +28,12 @@ membarrier \- issue memory barriers on a set of threads
 .SH SYNOPSIS
 .nf
 .PP
-.B #include 
+.BR "#include " "   /* For " MEMBARRIER_* " constants */"
+.BR "#include " "/* For " SYS_* " constants */"
+.B #include 
 .PP
-.BI "int membarrier(int " cmd ", unsigned int " flags ", int " cpu_id );
+.BI "int syscall(SYS_membarrier, int " cmd ", unsigned int " flags ", int " 
cpu_id );
 .fi
-.PP
-.IR Note :
-There is no glibc wrapper for this system call; see NOTES.
 .SH DESCRIPTION
 The
 .BR membarrier ()
@@ -295,7 +294,7 @@ was:
 .PP
 .in +4n
 .EX
-.BI "int membarrier(int " cmd ", int " flags );
+.BI "int syscall(SYS_membarrier, int " cmd ", int " flags );
 .EE
 .in
 .SH CONFORMING TO
@@ -322,9 +321,6 @@ Examples where
 .BR membarrier ()
 can be useful include implementations
 of Read-Copy-Update libraries and garbage collectors.
-.PP
-Glibc does not provide a wrapper for this system call; call it using
-.BR syscall (2).
 .SH EXAMPLES
 Assuming a multithreaded application where "fast_path()" is executed
 very frequently, and where "slow_path()" is executed infrequently, the
-- 
2.30.1.721.g45526154a5

Re: [RFC] execve.2: SYNOPSIS: Document both glibc wrapper and kernel sycalls

2021-02-18 Thread Alejandro Colomar (man-pages)


Hi Michael,

On 2/18/21 1:27 PM, Michael Kerrisk (man-pages) wrote:

Hi Alex,

On 2/14/21 2:39 PM, Alejandro Colomar wrote:

Until now, the manual pages have (usually) documented only either
the glibc (or another library) wrapper for a syscall, or the raw
syscall (this only when there's not a wrapper).

Let's document both prototypes, which many times are slightly
different.  This will solve a problem where documenting glibc
wrappers implied shadowing the documentation for the raw syscall.

It will also be much clearer for the reader where the syscall
comes from (kernel? glibc? other?), by adding an explicit comment
at the beginning of the prototypes.  This removes the need of
scrolling down to NOTES to see that info.

Signed-off-by: Alejandro Colomar 
---

Hi all,

This is a prototype for doing some important changes to the SYNOPSIS
of the man-pages.

The commit message above explains the idea quite well.  A few details
that couldn't be shown on this commit are:

For cases where the wrapper is provided by a library other than glibc,
I'd simply change the comment.  For example, for move_pages(2),
it would say /* libnuma wrapper function: */.

I think this would make the small notes warning that there's no glibc
wrapper function deprecated (but we could keep them for some time and
decide that later).

While changing this, I'd also make sure that the headers are correct,
and clearly differentiate which headers are needed for the raw syscall
and for the wrapper function.

This change will probably take more than one release of the man-pages
to complete.

Any thoughts?


My first impression is that I'm not keen on this. We'll add extra
text to all Section 2 pages, and in many (most?) cases the info
will be redundant (i.e., the wrapper and the syscall() notation
will express the same info). In other cases, I suspect the info
will be largely irrelevant to the user. To take an example: to
whom will the difference that you document below for execve()
matter, how will it matter, and does it matter enough that we
headline the info in the pages? I'd want cogent answers to
those questions before considering a wide-ranging change.


It will matter to:

1) Users of old systems where the glibc wrapper is not yet present.

2) Users of some unicorn Linux distributions that use a C library 
different than glibc and may not have wrappers for some syscalls that 
glibc provides.


3) Library (libc) developers.

Those won't have the glibc wrapper available for them, and will have to 
use syscall(2).  The kernel syscall info would be highly valuable for 
them.  However, the sum of them is probably not a big number of people.





There are indeed cases where the wrapper API differs in
significant ways from the syscall API (and these differences
are usually captured in the " C library/kernel differences"
subsections, such as for pselect()/pselect6() in select(2)).
But I imagine that that is the case in only a smallish
minority of the pages.

And indeed there are a very few syscalls that have wrappers
provided in another library. But it's a very small percentage
I think, and best documented case by case in specific pages.
The default presumption is that the wrapper is in the C library.


Agree.



There are other cases where I think it may be worthwhile
considering the syscall() notation:

1. Where the system call has no wrapper. In that case, we might
use the syscall() notation in the SYNOPSIS as both
(a) a clear indication that there is no wrapper and
(b) instructions to the reader about how to call the
system call using syscall().


Yes.



2. In cases where there is a "significant" difference between
the wrapper and the system call. In this case, we might
also place the syscall() notation in the SYNOPSIS, or
(perhaps more likely) in the NOTES


Yes.

I think it would be equally good to have the kernel syscall prototype in 
"C library/kernel ABI differences" in those cases where there is a glibc 
wrapper (even if it's quite different).  It would be even better, as it 
would clearly mark the syscall(2) method as a second-class method, that 
should be avoided if possible.  And also wouldn't add lines to the SYNOPSIS.


However, we should probably have that subsection for all syscalls, 
including those where the prototype is very similar to the glibc one, to 
support those who need to use the kernel syscall, and provide them with 
the exact types that the kernel expects (except for those unsupported by 
libraries, of course, which would have the syscall(SYS_xxx) prototype in 
the SYNOPSIS).
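
To illustrate the "exact types" point, here is a minimal sketch that calls the kernel syscall directly with the const-qualified parameter types from the RFC.  run_raw_execve() is a hypothetical helper name.  Note that syscall(2) is variadic, so the compiler does not check these types; the qualifiers only document what the kernel expects.

```c
#include <assert.h>
#include <sys/syscall.h>    /* For SYS_* constants */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, then replace the child's image through the
 * raw system call instead of the glibc execve() wrapper.
 * Returns the child's exit status, or -1 on error.
 */
static int
run_raw_execve(const char *pathname, const char *const argv[],
               const char *const envp[])
{
    int    status;
    pid_t  pid;

    assert(pathname != NULL);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* argv and envp are passed as-is; no cast to 'char *const *'. */
        syscall(SYS_execve, pathname, argv, envp);
        _exit(127);     /* Only reached if the system call failed. */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the glibc wrapper, passing a 'const char *const []' array would need a cast, because execve() is declared with 'char *const argv[]'.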


I'll prepare a new RFC with this, with 2 pages:  one with wrapper and 
one without wrapper.


Thanks,

Alex


See also:
<https://lwn.net/Articles/534682/>
<https://www.kernel.org/doc/man-pages/todo.html#migrate_to_kernel_source>




Thanks,

Michael



Thanks,

Alex

---
  man2/execve.2 | 12 +

[RFC] execve.2: SYNOPSIS: Document both glibc wrapper and kernel sycalls

2021-02-14 Thread Alejandro Colomar

Until now, the manual pages have (usually) documented only either
the glibc (or another library) wrapper for a syscall, or the raw
syscall (this only when there's not a wrapper).

Let's document both prototypes, which many times are slightly
different.  This will solve a problem where documenting glibc
wrappers implied shadowing the documentation for the raw syscall.

It will also be much clearer for the reader where the syscall
comes from (kernel? glibc? other?), by adding an explicit comment
at the beginning of the prototypes.  This removes the need of
scrolling down to NOTES to see that info.

Signed-off-by: Alejandro Colomar 
---

Hi all,

This is a prototype for doing some important changes to the SYNOPSIS
of the man-pages.

The commit message above explains the idea quite well.  A few details
that couldn't be shown on this commit are:

For cases where the wrapper is provided by a library other than glibc,
I'd simply change the comment.  For example, for move_pages(2),
it would say /* libnuma wrapper function: */.

I think this would make the small notes warning that there's no glibc
wrapper function deprecated (but we could keep them for some time and
decide that later).

While changing this, I'd also make sure that the headers are correct,
and clearly differentiate which headers are needed for the raw syscall
and for the wrapper function.

This change will probably take more than one release of the man-pages
to complete.

Any thoughts?

Thanks,

Alex

---
 man2/execve.2 | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/man2/execve.2 b/man2/execve.2
index 639e3b4b9..87ff022ce 100644
--- a/man2/execve.2
+++ b/man2/execve.2
@@ -39,10 +39,18 @@
 execve \- execute program
 .SH SYNOPSIS
 .nf
+/* Glibc wrapper function: */
 .B #include 
 .PP
-.BI "int execve(const char *" pathname ", char *const " argv [],
-.BI "   char *const " envp []);
+.BI "int execve(const char *" pathname ",
+.BI "   char *const " argv "[], char *const " envp []);
+.PP
+ /* Raw system call: */
+.B #include 
+.B #include 
+.PP
+.BI "int syscall(SYS_execve, const char *" pathname ,
+.BI "   const char *const " argv "[], const char *const " envp []);
 .fi
 .SH DESCRIPTION
 .BR execve ()
-- 
2.30.0

[PATCH v2] ipc.2: Fix prototype parameter types

2021-02-07 Thread Alejandro Colomar

The types for some of the parameters are incorrect
(different than the kernel).  Fix them.
Below are shown the types that the kernel uses.

..

.../linux$ grep_syscall ipc
ipc/syscall.c:110:
SYSCALL_DEFINE6(ipc, unsigned int, call, int, first, unsigned long, second,
unsigned long, third, void __user *, ptr, long, fifth)
ipc/syscall.c:205:
COMPAT_SYSCALL_DEFINE6(ipc, u32, call, int, first, int, second,
u32, third, compat_uptr_t, ptr, u32, fifth)
include/linux/compat.h:874:
asmlinkage long compat_sys_ipc(u32, int, int, u32, compat_uptr_t, u32);
include/linux/syscalls.h:1221:
asmlinkage long sys_ipc(unsigned int call, int first, unsigned long second,
unsigned long third, void __user *ptr, long fifth);
.../linux$

function grep_syscall()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} ";
return ${EX_USAGE};
fi

find * -type f \
|grep '\.c$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^\w*SYSCALL_DEFINE.\(${1},.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';

find * -type f \
|grep '\.[ch]$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^asmlinkage\s+[\w\s]+\**sys_${1}\s*\(.*?\)" \
    |sed -E 's/^[^:]+:[0-9]+:/&\n/';
}

Signed-off-by: Alejandro Colomar 
---
 man2/ipc.2 | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/man2/ipc.2 b/man2/ipc.2
index 6589ffae6..a36e895a2 100644
--- a/man2/ipc.2
+++ b/man2/ipc.2
@@ -27,9 +27,8 @@
 ipc \- System V IPC system calls
 .SH SYNOPSIS
 .nf
-.BI "int ipc(unsigned int " call ", int " first ", int " second \
-", int " third ,
-.BI "void *" ptr ", long " fifth );
+.BI "int ipc(unsigned int " call ", int " first ", unsigned long " second ,
+.BI "unsigned long " third ", void *" ptr ", long " fifth );
 .fi
 .PP
 .IR Note :
-- 
2.30.0

Re: outb.2: What to do with prototypes?

2021-02-04 Thread Alejandro Colomar (man-pages)

On 2/4/21 1:59 PM, Alejandro Colomar (man-pages) wrote:
> Hi Michael,
> 
> What would you do with the prototypes in outb.2?
> They are different in glibc and the kernel.
> However, since these are functions to be called mostly within the
> kernel, the kernel prototype is more important.  Would you use the glibc
> one in SYNOPSIS, and then a C library / kernel differences with the
> kernel prototypes?
> 
> Thanks,
> 
> Alex
> 

BTW, the declarations of those functions in the kernel are a bit
different from the rest.  My grep_syscall function couldn't find them.
There's no sys_inb, nor does it use SYSCALL_DEFINE?().

There are a lot of different declarations like plain 'inb' (some static,
some extern).  Where is the actual syscall defined?

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

[PATCH] mmap2.2: Fix prototype parameter types

2021-02-04 Thread Alejandro Colomar

There are many slightly different prototypes for this syscall,
but none of them is like the documented one.
Of all the different prototypes,
let's document the asm-generic one.

..

.../linux$ grep_syscall mmap2
arch/csky/kernel/syscall.c:17:
SYSCALL_DEFINE6(mmap2,
unsigned long, addr,
unsigned long, len,
unsigned long, prot,
unsigned long, flags,
unsigned long, fd,
off_t, offset)
arch/microblaze/kernel/sys_microblaze.c:46:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/nds32/kernel/sys_nds32.c:12:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
   unsigned long, prot, unsigned long, flags,
   unsigned long, fd, unsigned long, pgoff)
arch/powerpc/kernel/syscalls.c:60:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
arch/riscv/kernel/sys_riscv.c:37:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, off_t, offset)
arch/s390/kernel/sys_s390.c:49:
SYSCALL_DEFINE1(mmap2, struct s390_mmap_arg_struct __user *, arg)
arch/sparc/kernel/sys_sparc_32.c:101:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/ia64/include/asm/unistd.h:30:
asmlinkage unsigned long sys_mmap2(
unsigned long addr, unsigned long len,
int prot, int flags,
int fd, long pgoff);
arch/ia64/kernel/sys_ia64.c:139:
asmlinkage unsigned long
sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, 
long pgoff)
arch/m68k/kernel/sys_m68k.c:40:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/parisc/kernel/sys_parisc.c:275:
asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags, unsigned long fd,
unsigned long pgoff)
arch/powerpc/include/asm/syscalls.h:15:
asmlinkage long sys_mmap2(unsigned long addr, size_t len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
arch/sh/include/asm/syscalls.h:8:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
  unsigned long prot, unsigned long flags,
  unsigned long fd, unsigned long pgoff);
arch/sh/kernel/sys_sh.c:41:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/sparc/kernel/systbls.h:23:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
  unsigned long prot, unsigned long flags,
  unsigned long fd, unsigned long pgoff);
include/asm-generic/syscalls.h:14:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
.../linux$

function grep_syscall()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} ";
return ${EX_USAGE};
fi

find * -type f \
|grep '\.c$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^\w*SYSCALL_DEFINE.\(${1},.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';

find * -type f \
|grep '\.[ch]$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^asmlinkage\s+[\w\s]+\**sys_${1}\s*\(.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
}

Signed-off-by: Alejandro Colomar 
---
 man2/mmap2.2 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/man2/mmap2.2 b/man2/mmap2.2
index 349ee45e5..f9f9e91cb 100644
--- a/man2/mmap2.2
+++ b/man2/mmap2.2
@@ -33,8 +33,9 @@ mmap2 \- map files or devices into memory
 .nf
 .B #include 
 .PP
-.BI "void *mmap2(void *" addr ", size_t " length ", int " prot ,
-.BI " int " flags ", int " fd ", off_t " pgoffset );
+.BI "void *mmap2(unsigned long " addr ", unsigned long " length ,
+.BI "unsigned long " prot ", unsigned long " flags ,
+.BI "unsigned long " fd ", unsigned long " pgoffset );
 .fi
 .SH DESCRIPTION
 This is probably not the system call that you are interested in; instead, see
-- 
2.30.0

[PATCH] ipc.2: Fix prototype parameter types

2021-02-04 Thread Alejandro Colomar

.../linux$ grep_syscall ipc
ipc/syscall.c:110:
SYSCALL_DEFINE6(ipc, unsigned int, call, int, first, unsigned long, second,
unsigned long, third, void __user *, ptr, long, fifth)
ipc/syscall.c:205:
COMPAT_SYSCALL_DEFINE6(ipc, u32, call, int, first, int, second,
u32, third, compat_uptr_t, ptr, u32, fifth)
include/linux/compat.h:874:
asmlinkage long compat_sys_ipc(u32, int, int, u32, compat_uptr_t, u32);
include/linux/syscalls.h:1221:
asmlinkage long sys_ipc(unsigned int call, int first, unsigned long second,
unsigned long third, void __user *ptr, long fifth);
.../linux$

function grep_syscall()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} ";
return ${EX_USAGE};
fi

find * -type f \
|grep '\.c$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^\w*SYSCALL_DEFINE.\(${1},.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';

find * -type f \
|grep '\.[ch]$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^asmlinkage\s+[\w\s]+\**sys_${1}\s*\(.*?\)" \
    |sed -E 's/^[^:]+:[0-9]+:/&\n/';
}

Signed-off-by: Alejandro Colomar 
---
 man2/ipc.2 | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/man2/ipc.2 b/man2/ipc.2
index 469185638..5cfa3df3e 100644
--- a/man2/ipc.2
+++ b/man2/ipc.2
@@ -27,9 +27,8 @@
 ipc \- System V IPC system calls
 .SH SYNOPSIS
 .nf
-.BI "int ipc(unsigned int " call ", int " first ", int " second \
-", int " third ,
-.BI "void *" ptr ", long " fifth );
+.BI "int ipc(unsigned int " call ", int " first ", unsigned long " second ,
+.BI "unsigned long " third ", void *" ptr ", long " fifth );
 .fi
 .SH DESCRIPTION
 .BR ipc ()
-- 
2.30.0

[tip: locking/core] futex: Change utime parameter to be 'const ... *'

2021-01-28 Thread tip-bot2 for Alejandro Colomar

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 1ce53e2c2ac069e7b3c400a427002a70deb4a916
Gitweb:
https://git.kernel.org/tip/1ce53e2c2ac069e7b3c400a427002a70deb4a916
Author:Alejandro Colomar 
AuthorDate:Sat, 28 Nov 2020 13:39:46 +01:00
Committer: Thomas Gleixner 
CommitterDate: Thu, 28 Jan 2021 13:20:18 +01:00

futex: Change utime parameter to be 'const ... *'

futex(2) says that 'utime' is a pointer to 'const'.  The implementation
doesn't use 'const'; however, it _never_ modifies the contents of utime.

- futex() either uses 'utime' as a pointer to struct or as a 'u32'.

- In case it's used as a 'u32', it makes a copy of it, and of course it is
  not dereferenced.

- In case it's used as a 'struct __kernel_timespec __user *', the pointer
  is not dereferenced inside the futex() definition, and it is only passed
  to a function: get_timespec64(), which accepts a 'const struct
  __kernel_timespec __user *'.
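
From the user-space side, the reasoning above can be checked with a small sketch.  It assumes a 64-bit target where 'struct timespec' matches the layout that SYS_futex expects; futex_wait() is a hypothetical helper name, since glibc provides no futex() wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>    /* For FUTEX_* constants */
#include <stdint.h>
#include <sys/syscall.h>    /* For SYS_* constants */
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on *uaddr while it still holds 'expected',
 * for at most the relative time in *timeout (NULL waits indefinitely).
 * The timeout is taken through a pointer to const: the kernel only
 * reads it.
 */
static long
futex_wait(uint32_t *uaddr, uint32_t expected, const struct timespec *timeout)
{
    assert(uaddr != NULL);

    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout, NULL, 0);
}
```

A caller can therefore keep its timeout in a 'const struct timespec' object and pass its address without a cast; the object is unchanged after the call.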

[ tglx: Make the same change to the compat syscall and fixup the prototypes. ]

Signed-off-by: Alejandro Colomar 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201128123945.4592-1-alx.manpa...@gmail.com

---
 include/linux/syscalls.h | 8 
 kernel/futex.c   | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f3929af..5cb74ed 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -583,11 +583,11 @@ asmlinkage long sys_unshare(unsigned long unshare_flags);
 
 /* kernel/futex.c */
 asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
-   struct __kernel_timespec __user *utime, u32 __user 
*uaddr2,
-   u32 val3);
+ const struct __kernel_timespec __user *utime,
+ u32 __user *uaddr2, u32 val3);
 asmlinkage long sys_futex_time32(u32 __user *uaddr, int op, u32 val,
-   struct old_timespec32 __user *utime, u32 __user *uaddr2,
-   u32 val3);
+const struct old_timespec32 __user *utime,
+u32 __user *uaddr2, u32 val3);
 asmlinkage long sys_get_robust_list(int pid,
struct robust_list_head __user * __user 
*head_ptr,
size_t __user *len_ptr);
diff --git a/kernel/futex.c b/kernel/futex.c
index c47d101..d0775aa 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3790,8 +3790,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t 
*timeout,
 
 
 SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
-   struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
-   u32, val3)
+   const struct __kernel_timespec __user *, utime,
+   u32 __user *, uaddr2, u32, val3)
 {
struct timespec64 ts;
ktime_t t, *tp = NULL;
@@ -3986,7 +3986,7 @@ err_unlock:
 
 #ifdef CONFIG_COMPAT_32BIT_TIME
 SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
-   struct old_timespec32 __user *, utime, u32 __user *, uaddr2,
+   const struct old_timespec32 __user *, utime, u32 __user *, 
uaddr2,
u32, val3)
 {
struct timespec64 ts;

[tip: locking/core] futex: Change utime parameter to be 'const ... *'

2021-01-27 Thread tip-bot2 for Alejandro Colomar

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3018a0840135536817507dd14c2a7c4ffa69
Gitweb:        https://git.kernel.org/tip/3018a0840135536817507dd14c2a7c4ffa69
Author:        Alejandro Colomar 
AuthorDate:    Sat, 28 Nov 2020 13:39:46 +01:00
Committer:     Thomas Gleixner 
CommitterDate: Wed, 27 Jan 2021 12:30:02 +01:00

futex: Change utime parameter to be 'const ... *'

futex(2) says that 'utime' is a pointer to 'const'.  The implementation
doesn't use 'const'; however, it _never_ modifies the contents of utime.

- futex() either uses 'utime' as a pointer to struct or as a 'u32'.

- In case it's used as a 'u32', it makes a copy of it, and of course it is
  not dereferenced.

- In case it's used as a 'struct __kernel_timespec __user *', the pointer
  is not dereferenced inside the futex() definition, and it is only passed
  to a function: get_timespec64(), which accepts a 'const struct
  __kernel_timespec __user *'.

[ tglx: Make the same change to the compat syscall ]

Signed-off-by: Alejandro Colomar 
Signed-off-by: Thomas Gleixner 
Link: https://lore.kernel.org/r/20201128123945.4592-1-alx.manpa...@gmail.com
---
 kernel/futex.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c47d101..d0775aa 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3790,8 +3790,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 
 
 SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
-   struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
-   u32, val3)
+   const struct __kernel_timespec __user *, utime,
+   u32 __user *, uaddr2, u32, val3)
 {
struct timespec64 ts;
ktime_t t, *tp = NULL;
@@ -3986,7 +3986,7 @@ err_unlock:
 
 #ifdef CONFIG_COMPAT_32BIT_TIME
 SYSCALL_DEFINE6(futex_time32, u32 __user *, uaddr, int, op, u32, val,
-   struct old_timespec32 __user *, utime, u32 __user *, uaddr2,
+   const struct old_timespec32 __user *, utime, u32 __user *, uaddr2,
u32, val3)
 {
struct timespec64 ts;

Re: [PATCH -V9 2/3] NOT kernel/man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2021-01-21 Thread Alejandro Colomar (man-pages)

Hi Huang Ying,

On 1/20/21 7:12 AM, Huang Ying wrote:
> Signed-off-by: "Huang, Ying" 
> Cc: "Alejandro Colomar" 

Sorry for the confusion.
I have a different email for reading lists.
I use alx.manpages@ for everything,
and alx.mailinglists@ just for reading lists, but sometimes,
when I answer emails not sent to me,
I forget to change the reply address,
and you see that address (which I intended to be readonly).

Please, use alx.manpa...@gmail.com,
or your mail might get lost between many list emails ;)

> ---
>  man2/set_mempolicy.2 | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> index 68011eecb..fa64a1820 100644
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,22 @@ A nonempty
>  .I nodemask
>  specifies node IDs that are relative to the set of
>  node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.12)"
> +When
> +.I mode
> +is
> +.BR MPOL_BIND ,
> +enable the kernel NUMA balancing for the task if it is supported by
> +the kernel.
> +If the flag isn't supported by the kernel, or is used with
> +.I mode
> +other than
> +.BR MPOL_BIND ,
> +return \-1 and
> +.I errno
> +is set to
> +.BR EINVAL .

The wording here is a bit weird:
[return // is set].  It would be better as
[return // set] or [returns // sets] or [is returned // is set].

The same page has:

[
RETURN VALUE
   On success, set_mempolicy() returns 0; on error, -1 is returned
   and errno is set to indicate the error.
]

so I'd use the latter for consistency.

>  .PP
>  .I nodemask
>  points to a bit mask of node IDs that contains up to
> @@ -293,6 +309,12 @@ argument specified both
>  .B MPOL_F_STATIC_NODES
>  and
>  .BR MPOL_F_RELATIVE_NODES .
> +Or, the
> +.B MPOL_F_NUMA_BALANCING
> +isn't supported by the kernel, or is used with
> +.I mode
> +other than
> +.BR MPOL_BIND .
>  .TP
>  .B ENOMEM
>  Insufficient kernel memory was available.
> 

Other than that, it's good for me.

Thanks,

Alex

Just a reminder for myself (please ignore it):
- Break EINVAL into multiple paragraphs.
- (Maybe) reorder lists to be in alphabetical order.

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Ping: [PATCH] futex: Change 'utime' parameter to be 'const ... *'

2021-01-17 Thread Alejandro Colomar (man-pages)

Ping!

On 12/10/20 6:36 PM, Alejandro Colomar (man-pages) wrote:
> Hi Thomas & Ingo,
> 
> I tested the changes. Everything's OK.
> 
> Cheers,
> 
> Alex
> 
> $ uname -a
> Linux debian 5.10.0-rc7+alx3+ #4 SMP Thu Dec 10 18:05:03 CET 2020 x86_64
> GNU/Linux
> 
> .../linux/tools/testing/selftests/futex$ sudo ./run.sh
> [sudo] password for user:
> 
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=0 owner=0 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=0 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=1 owner=0 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=1 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owner=0 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=0 owner=1 timeout=0ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=1 owner=0 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owner=0 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=1 owner=0 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owner=0 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=0 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=0 owner=0 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=0 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=0 owner=0 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=1 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owner=0 timeout=5000ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=0 owner=1 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owner=0 timeout=50ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=1 locked=1 owner=0 timeout=20ns
> ok 1 futex-requeue-pi
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> TAP version 13
> 1..1
> # futex_requeue_pi: Test requeue functionality
> # Arguments: broadcast=0 locked=1 owne

Re: [PATCH] getcpu.2: Document glibc wrapper instead of kernel syscall

2021-01-02 Thread Alejandro Colomar (man-pages)




On 1/2/21 9:41 AM, Michael Kerrisk (man-pages) wrote:
> Hi Alex,
> 
> On 12/30/20 10:41 PM, Alejandro Colomar wrote:
>> The glibc wrapper doesn't provide the third argument.
>> Simplify the info about the (unused) kernel parameter
>> to the minimum that is useful.
>>
>> Kernels <= 2.6.23 have been EOL for a long time.
>>
>> The old info is commented out instead of removed.
> 
> I tend to be rather conservative about preserving historical
> detail in the manual pages. Yes, 2.6.23 may be EOL from the
> kernel community's point of view, but even in quite recent
> times I've run into folk in the embedded world who have
> to at the very least support 2.6.* systems. So, as a general
> principle, I'm inclined to retain the kind of info that this
> patch removes. (I admit though that this is an extreme case:
> historical behavior in a system call that is not frequently
> used.)
> 
> There are exceptions. Occasionally I run into historical 
> info in manual pages that is clearly wrong, or incomplete.
> In such cases, I am sometimes inclined to trim the details,
> rather than invest the effort in working out all of the
> historical details.
> 
> Clearly though, some fix is needed, since we now have 
> a glibc wrapper that has just two arguments. I've applied
> the patch below.

Hi Michael,

Agreed :-)

Cheers,

Alex

> 
> Cheers,
> 
> Michael
> 
> diff --git a/man2/getcpu.2 b/man2/getcpu.2
> index a75123f97..59089bd74 100644
> --- a/man2/getcpu.2
> +++ b/man2/getcpu.2
> @@ -14,10 +14,10 @@
>  getcpu \- determine CPU and NUMA node on which the calling thread is running
>  .SH SYNOPSIS
>  .nf
> -.B #include 
> +.BR "#define _GNU_SOURCE" " /* See feature_test_macros(7) */"
> +.B #include 
>  .PP
> -.BI "int getcpu(unsigned int *" cpu ", unsigned int *" node \
> -", struct getcpu_cache *" tcache );
> +.BI "int getcpu(unsigned int *" cpu ", unsigned int *" node );
>  .fi
>  .SH DESCRIPTION
>  The
> @@ -37,10 +37,6 @@ or
>  .I node
>  is NULL nothing is written to the respective pointer.
>  .PP
> -The third argument to this system call is nowadays unused,
> -and should be specified as NULL
> -unless portability to Linux 2.6.23 or earlier is required (see NOTES).
> -.PP
>  The information placed in
>  .I cpu
>  is guaranteed to be current only at the time of the call:
> @@ -82,16 +78,31 @@ The intention of
>  .BR getcpu ()
>  is to allow programs to make optimizations with per-CPU data
>  or for NUMA optimization.
> +.\"
> +.SS C library/kernel differences
> +The kernel system call has a third argument:
> +.PP
> +.in +4n
> +.nf
> +.BI "int getcpu(unsigned int *" cpu ", unsigned int *" node ,
> +.BI "   struct getcpu_cache *" tcache );
> +.fi
> +.in
>  .PP
>  The
>  .I tcache
> -argument is unused since Linux 2.6.24.
> +argument is unused since Linux 2.6.24,
> +and (when invoking the system call directly)
> +should be specified as NULL,
> +unless portability to Linux 2.6.23 or earlier is required.
> +.PP
>  .\" commit 4307d1e5ada595c87f9a4d16db16ba5edb70dcb1
>  .\" Author: Ingo Molnar 
>  .\" Date:   Wed Nov 7 18:37:48 2007 +0100
>  .\" x86: ignore the sys_getcpu() tcache parameter
> -In earlier kernels,
> -if this argument was non-NULL,
> +In Linux 2.6.23 and earlier, if the
> +.I tcache
> +argument was non-NULL,
>  then it specified a pointer to a caller-allocated buffer in thread-local
>  storage that was used to provide a caching mechanism for
>  .BR getcpu ().
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

[PATCH] getcpu.2: Document glibc wrapper instead of kernel syscall

2020-12-30 Thread Alejandro Colomar

The glibc wrapper doesn't provide the third argument.
Simplify the info about the (unused) kernel parameter
to the minimum that is useful.

Kernels <= 2.6.23 have been EOL for a long time.

The old info is commented out instead of removed.

..

$ syscall='getcpu';
$ ret='int';
$ find glibc/ -type f -name '*.h' \
  |xargs pcregrep -Mn "(?s)^[\w\s]*${ret}\s*${syscall}\s*\(.*?;";
glibc/sysdeps/unix/sysv/linux/bits/sched.h:92:
extern int getcpu (unsigned int *, unsigned int *) __THROW;

Signed-off-by: Alejandro Colomar 
---
 man2/getcpu.2 | 55 ---
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/man2/getcpu.2 b/man2/getcpu.2
index 46e4d53ff..a9588119b 100644
--- a/man2/getcpu.2
+++ b/man2/getcpu.2
@@ -16,8 +16,7 @@ getcpu \- determine CPU and NUMA node on which the calling thread is running
 .nf
 .B #include 
 .PP
-.BI "int getcpu(unsigned int *" cpu ", unsigned int *" node ,
-.BI "   struct getcpu_cache *" tcache );
+.BI "int getcpu(unsigned int *" cpu ", unsigned int *" node );
 .fi
 .SH DESCRIPTION
 The
@@ -36,10 +35,10 @@ When either
 or
 .I node
 is NULL nothing is written to the respective pointer.
-.PP
-The third argument to this system call is nowadays unused,
-and should be specified as NULL
-unless portability to Linux 2.6.23 or earlier is required (see NOTES).
+.\" .PP
+.\" The third argument to this system call is nowadays unused,
+.\" and should be specified as NULL
+.\" unless portability to Linux 2.6.23 or earlier is required (see NOTES).
 .PP
 The information placed in
 .I cpu
@@ -71,6 +70,12 @@ Library support was added in glibc 2.29
 (Earlier glibc versions did not provide a wrapper for this system call,
 necessitating the use of
 .BR syscall (2).)
+.PP
+The Linux system call has a third argument
+.IR tcache ,
+which since kernel 2.6.24 is ignored.
+It should be specified as NULL.
+The glibc wrapper hides that parameter.
 .SH CONFORMING TO
 .BR getcpu ()
 is Linux-specific.
@@ -82,25 +87,25 @@ The intention of
 .BR getcpu ()
 is to allow programs to make optimizations with per-CPU data
 or for NUMA optimization.
-.PP
-The
-.I tcache
-argument is unused since Linux 2.6.24.
-.\" commit 4307d1e5ada595c87f9a4d16db16ba5edb70dcb1
-.\" Author: Ingo Molnar 
-.\" Date:   Wed Nov 7 18:37:48 2007 +0100
-.\" x86: ignore the sys_getcpu() tcache parameter
-In earlier kernels,
-if this argument was non-NULL,
-then it specified a pointer to a caller-allocated buffer in thread-local
-storage that was used to provide a caching mechanism for
-.BR getcpu ().
-Use of the cache could speed
-.BR getcpu ()
-calls, at the cost that there was a very small chance that
-the returned information would be out of date.
-The caching mechanism was considered to cause problems when
-migrating threads between CPUs, and so the argument is now ignored.
+.\" .PP
+.\" The
+.\" .I tcache
+.\" argument is unused since Linux 2.6.24.
+.\" .\" commit 4307d1e5ada595c87f9a4d16db16ba5edb70dcb1
+.\" .\" Author: Ingo Molnar 
+.\" .\" Date:   Wed Nov 7 18:37:48 2007 +0100
+.\" .\" x86: ignore the sys_getcpu() tcache parameter
+.\" In earlier kernels,
+.\" if this argument was non-NULL,
+.\" then it specified a pointer to a caller-allocated buffer in thread-local
+.\" storage that was used to provide a caching mechanism for
+.\" .BR getcpu ().
+.\" Use of the cache could speed
+.\" .BR getcpu ()
+.\" calls, at the cost that there was a very small chance that
+.\" the returned information would be out of date.
+.\" The caching mechanism was considered to cause problems when
+.\" migrating threads between CPUs, and so the argument is now ignored.
 .\"
 .\" = Before kernel 2.6.24: =
 .\" .I tcache
-- 
2.29.2

Re: [PATCH v3] close_range.2: new page documenting close_range(2)

2020-12-21 Thread Alejandro Colomar (man-pages)

Hi Stephen,

On 12/21/20 8:24 PM, Stephen Kitt wrote:
> Hi Alex,
>
> On Sat, 19 Dec 2020 15:00:00 +0100, "Alejandro Colomar (man-pages)"
>  wrote:
>> On 12/18/20 5:58 PM, Stephen Kitt wrote:
> [...]
>>> +This program executes the command given on its command-line after
>>> +opening the files listed after the command,
>>> +and then using
>>
>> s/using/uses/
>
> It’s the same form as “opening”: “after opening ... and then using”. The
> overall sequence is “open”, “close_range”, “execve”.
>
> Regards,
>
> Stephen
>


Ahhh.  Then I think the comma is misleading.
What about the following?:


On 12/18/20 5:58 PM, Stephen Kitt wrote:
> +.PP
> +This program executes the command given on its command-line after
> +opening the files listed after the command,
> +and then using
> +.B close_range
> +to close them:

This program executes the command given on its command line,
after opening the files listed after the command
and then using *close_range()* to close them:


Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH v3] close_range.2: new page documenting close_range(2)

2020-12-21 Thread Alejandro Colomar (man-pages)




On 12/20/20 11:00 PM, Stephen Kitt wrote:
> Hi Alex,
> 
> On Sat, 19 Dec 2020 15:00:00 +0100, "Alejandro Colomar (man-pages)"
>  wrote:
>> Please see some comments below.
>> It's looking good ;)
> 
> Thanks for your review and patience!
> 
>> On 12/18/20 5:58 PM, Stephen Kitt wrote:
>>> This documents close_range(2) based on information in
>>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de,
>>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
>>> 582f1fb6b721facf04848d2ca57f34468da1813e.
>>>
>>> Signed-off-by: Stephen Kitt 
>>> ---
>>> V3: fix synopsis overflow
>>> copy notes from membarrier.2 re the lack of wrapper
>>> semantic newlines
>>> drop non-standard "USE CASES" section heading
>>> add code example
>>>
>>> V2: unsigned int to match the kernel declarations
>>> groff and grammar tweaks
>>> CLOSE_RANGE_UNSHARE unshares *and* closes
>>> Explain that EMFILE and ENOMEM can occur with C_R_U
>>> "Conforming to" phrasing
>>> Detailed explanation of CLOSE_RANGE_UNSHARE
>>> Reading /proc isn't common
>>>
>>>  man2/close_range.2 | 266 +
>>>  1 file changed, 266 insertions(+)
>>>  create mode 100644 man2/close_range.2
>>>
>>> diff --git a/man2/close_range.2 b/man2/close_range.2
>>> new file mode 100644
>>> index 0..f8f2053ac
>>> --- /dev/null
>>> +++ b/man2/close_range.2
>>> @@ -0,0 +1,266 @@
>>> +.\" Copyright (c) 2020 Stephen Kitt 
>>> +.\"
>>> +.\" %%%LICENSE_START(VERBATIM)
>>> +.\" Permission is granted to make and distribute verbatim copies of this
>>> +.\" manual provided the copyright notice and this permission notice are
>>> +.\" preserved on all copies.
>>> +.\"
>>> +.\" Permission is granted to copy and distribute modified versions of
>>> +.\" Permission is granted to copy and distribute modified versions of this
>>> +.\" manual under the conditions for verbatim copying, provided that the
>>> +.\" entire resulting derived work is distributed under the terms of a
>>> +.\" permission notice identical to this one.
>>> +.\"
>>> +.\" Since the Linux kernel and libraries are constantly changing, this
>>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>>> +.\" responsibility for errors or omissions, or for damages resulting from
>>> +.\" the use of the information contained herein.  The author(s) may not
>>> +.\" have taken the same level of care in the production of this manual,
>>> +.\" which is licensed free of charge, as they might when working
>>> +.\" professionally.
>>> +.\"
>>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>>> +.\" the source, must acknowledge the copyright and authors of this work.
>>> +.\" %%%LICENSE_END
>>> +.\"
>>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +close_range \- close all file descriptors in a given range
>>> +.SH SYNOPSIS
>>> +.nf
>>> +.B #include 
>>> +.PP
>>> +.BI "int close_range(unsigned int " first ", unsigned int " last ,
>>> +.BI "unsigned int " flags );
>>> +.fi
>>> +.PP
>>> +.IR Note :
>>> +There is no glibc wrapper for this system call; see NOTES.
>>> +.SH DESCRIPTION
>>> +The
>>> +.BR close_range ()
>>> +system call closes all open file descriptors from
>>> +.I first
>>> +to
>>> +.I last
>>> +(included).
>>> +.PP
>>> +Errors closing a given file descriptor are currently ignored.
>>> +.PP
>>> +.I flags
>>> +can be 0 or set to one or both of the following:
>>> +.TP
>>> +.B CLOSE_RANGE_UNSHARE
>>> +unshares the range of file descriptors from any other processes,
>>> +before closing them,
>>> +avoiding races with other threads sharing the file descriptor table.
>>> +.TP
>>> +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"  
>>
>> |sort
>>
>> I prefer alphabetic order rather than adding new items at the bottom.
>> When lists grow, it becomes difficult to find what you're looking for.
>>
>> CLOEXEC should go before UNSHARE.
> 
> That makes sense.
>

Re: [PATCH v3] close_range.2: new page documenting close_range(2)

2020-12-19 Thread Alejandro Colomar (man-pages)

> +
> +int
> +main(int argc, char *argv[])
> +{
> +char *newargv[] = { NULL };
> +char *newenviron[] = { NULL };
> +int i;

dd

> +
> +if (argc < 3) {
> +fprintf(stderr, "Usage: %s  \n", argv[0]);

s/\\/\\e/

> +exit(EXIT_FAILURE);
> +}
> +
> +for (i = 2; i < argc; i++) {

for (int i = 2; i < argc; i++) {

> +if (open(argv[i], O_RDONLY) == -1) {
> +perror(argv[i]);
> +exit(EXIT_FAILURE);
> +}
> +}
> +
> +if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
> +perror("close_range");
> +exit(EXIT_FAILURE);
> +}
> +
> +execve(argv[1], newargv, newenviron);
> +perror("execve");
> +exit(EXIT_FAILURE);
> +}
> +.EE
> +.in
> +.PP
> +We can use the second program to exec the first as follows:
> +.PP
> +.in +4n
> +.EX
> +.RB "$" " make listopen close_range"
> +.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
> +FD 0 is open.
> +FD 1 is open.
> +FD 2 is open.
> +.EE
> +.in
> +.PP
> +Removing the call to
> +.B close_range

.BR close_range ()

> +will show different output, with the file descriptors for the named
> +files still open.

[
will show different output,
with the file descriptors for the named files still open.
]

> +.SH SEE ALSO
> +.BR close (2)
> 
> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [PATCH -V6 RESEND 2/3] NOT kernel/man-pages: man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-12-18 Thread Alejandro Colomar (mailing lists; readonly)

Hi Huang, Ying,

Sorry I forgot to answer.
See below.

BTW, Linux 5.10 has been released recently;
is this series already merged for 5.11?
If not yet, could you just write '5.??' and we'll fix it (and add a
commit number in a comment) when we know the definitive version?

Thanks,

Alex

On 12/8/20 9:13 AM, Huang, Ying wrote:
> Hi, Alex,
> 
> Sorry for the late reply; I just noticed this email today.
> 
> "Alejandro Colomar (mailing lists; readonly)"
>  writes:
> 
>> Hi Huang Ying,
>>
>> Please see a few fixes below.
>>
>> Michael, as always, some question for you too ;)
>>
>> Thanks,
>>
>> Alex
>>
>> On 12/2/20 9:42 AM, Huang Ying wrote:
>>> Signed-off-by: "Huang, Ying" 
>>> ---
>>>  man2/set_mempolicy.2 | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
>>> index 68011eecb..3754b3e12 100644
>>> --- a/man2/set_mempolicy.2
>>> +++ b/man2/set_mempolicy.2
>>> @@ -113,6 +113,12 @@ A nonempty
>>>  .I nodemask
>>>  specifies node IDs that are relative to the set of
>>>  node IDs allowed by the process's current cpuset.
>>> +.TP
>>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
>>
>> I'd prefer it to be in alphabetical order (rather than just adding at
>> the bottom).
> 
> That's OK for me.  But it's better to be done in another patch to
> distinguish contents from pure order change?

Yes, if you could do a series of 2 patches with a reordering first, it
would be great.

> 
>> That way, when lists grow, it's easier to find things.
>>
>>> +Enable the Linux kernel NUMA balancing for the task if it is supported
>>> +by kernel.
>>
>> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
>> Linux kernel to differentiate it from other kernels.  It only adds noise
>> (IMHO).  mtk?
> 
> Sure.  Will fix this and all following comments below.  Thanks a lot for
> your help!  I am new to man pages.

Thank you!

> 
> Best Regards,
> Huang, Ying
>

Ping: [patch] close_range.2: new page documenting close_range(2)

2020-12-18 Thread Alejandro Colomar (man-pages)

Hi Stephen,

Linux 5.10 has been recently released.
Do you have any updates for this patch?

Thanks,

Alex

On 12/12/20 6:58 PM, Alejandro Colomar (man-pages) wrote:
> Hi Christian,
> 
> Makes sense to me.
> 
> Thanks,
> 
> Alex
> 
> On 12/12/20 1:14 PM, Christian Brauner wrote:
>> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>>> Hi Christian,
>>
>> Hi Alex,
>>
>>>
>>> Thanks for confirming that behavior.  Seems reasonable.
>>>
>>> I was wondering...
>>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>>> shouldn't it fail for the same reasons those syscalls can fail?
>>>
>>> What about the following errors?:
>>>
>>> From unshare(2):
>>>
>>>EPERM  The calling process did not have the required privileges
>>>   for this operation.
>>
>> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
>> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
>> i.e.
>> CLONE_NEWNS
>> CLONE_NEWUTS
>> CLONE_NEWIPC
>> CLONE_NEWNET
>> CLONE_NEWPID
>> CLONE_NEWCGROUP
>> CLONE_NEWTIME
>> so the permissions are the same.
>>
>>>
>>> From close(2):
>>>EBADF  fd isn't a valid open file descriptor.
>>>
>>> OK, this one can't happen with the current code.
>>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>>> It's a no-op (although it will still unshare if the flag is set).
>>> But shouldn't it fail with EBADF?
>>
>> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
>> table independent of whether or not any file descriptors need to be
>> closed. That's also how we documented the flag:
>>
>> /* Unshare the file descriptor table before closing file descriptors. */
>> #define CLOSE_RANGE_UNSHARE  (1U << 1)
>>
>> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
>> or the proper close_range() syscall wants to make sure that all unwanted
>> file descriptors are closed (if any) and that no new file descriptors
>> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
>> there are no fds to be closed you open up a race window. It would also
>> be annoying for userspace if they _may_ have received a private file
>> descriptor table but only if any fds needed to be closed.
>>
>> If people really were extremely keen about skipping the unshare when no
>> fd needs to be closed then this could become a new flag. But I really
>> don't think that's necessary and also doesn't make a lot of sense, imho.
>>
>>>
>>>EINTR  The close() call was interrupted by a signal; see
>>>   signal(7).
>>>
>>>EIO    An I/O error occurred.
>>>
>>>ENOSPC, EDQUOT
>>>   On NFS, these errors are not normally reported against
>>>   the first write which exceeds  the  available  storage
>>>   space,  but  instead  against  a  subsequent write(2),
>>>   fsync(2), or close().
>>
>> None of these will be seen by userspace because close_range() currently
>> ignores all errors after it has begun closing files.
>>
>> Christian
>>

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Re: [Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-16 Thread Alejandro Colomar (man-pages)

[CC += Thomas, Ingo, Peter, Darren]

Hi Oleg,

On 12/16/20 3:33 AM, Jann Horn wrote:
> On Wed, Dec 16, 2020 at 3:21 AM Ted Estes  wrote:
>> On 12/15/2020 6:01 PM, Jann Horn wrote:
>>> On Wed, Dec 16, 2020 at 12:25 AM Alejandro Colomar (man-pages)
>>>  wrote:
>>>> On 12/16/20 12:23 AM, Alejandro Colomar (man-pages) wrote:
>>>>> On 12/16/20 12:07 AM, Jann Horn wrote:
>>>>>> As the comment explains, you can't actually *attach*
>>>>>> to another task in the same thread group; but that's
>>>>>> not because of the ptrace-style access check rules,
>>>>>> but because specifically *attaching* to another task
>>>>>> in the same thread group doesn't work.
>>> As I said, attaching indeed doesn't work. But that's not what "Ptrace
>>> access mode checking" means. As the first sentence of that section
>>> says:
>>>
>>> | Various parts of the kernel-user-space API (not just ptrace()
>>> | operations), require so-called "ptrace access mode" checks,
>>> | whose outcome determines whether an operation is
>>> | permitted (or, in a  few cases,  causes  a "read" operation
>>> | to return sanitized data).
>>>
>>> You can find these places by grepping for \bptrace_may_access\b -
>>> operations like e.g. the get_robust_list() syscall will always succeed
>>> when inspecting other tasks in the caller's thread group thanks to
>>> this rule.
>>
>> Ah, yes.  I missed that back reference while trying to digest that
>> rather meaty man page.  A grep on the man page source tree does show a
>> number of references to "ptrace access mode".
>>
>> That said, the ptrace(2) man page also directly references the ptrace
>> access mode check under both PTRACE_ATTACH and PTRACE_SEIZE:
>>
>> | Permission to perform a PTRACE_ATTACH is governed by a ptrace
>> | access mode PTRACE_MODE_ATTACH_REALCREDS check; see below.
>>
>> As confirmed, the "same thread group" rule does not apply to either of
>> those operations.  A re-wording of rule 1 similar to this might help
>> avoid confusion:
>>
>> 1. If the calling thread and the target thread are in the same thread
>>    group:
>>    a. For ptrace() called with PTRACE_ATTACH or PTRACE_SEIZE, access is
>>       NEVER allowed.
>>    b. For all other so-called "ptrace access mode checks", access is
>>       ALWAYS allowed.
>>
>> --Ted
> 
> Yeah, maybe. OTOH I'm not sure whether it really makes sense to
> explain this as being part of a security check, or whether it should
> be explained separately as a restriction on PTRACE_ATTACH and
> PTRACE_SEIZE (with a note like "(irrelevant for ptrace attachment)" on
> rule 1). But I don't feel strongly about it either way.
> 

As you are the maintainer for ptrace,
could you confirm the above from Jann?
And maybe suggest what you would do with the manual page.

I'd like to get confirmation that there are still other functions that
require "ptrace access mode" other than ptrace() itself, where it's
valid that the calling thread and the target thread are in the same group.

Jann noted get_robust_list() as an example, so I CCed futex maintainers.

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-15 Thread Alejandro Colomar (man-pages)

Hi Jann,

On 12/16/20 12:07 AM, Jann Horn wrote:
> On Tue, Dec 15, 2020 at 06:01:25PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi,
>>
>> There's a bug report: https://bugzilla.kernel.org/show_bug.cgi?id=210655
>>
>> [[
>> Under "Ptrace access mode checking", the documentation states:
>>   "1. If the calling thread and the target thread are in the same thread
>> group, access is always allowed."
>>
>> This is incorrect. A thread may never attach to another in the same group.
> 
> No, that is correct. ptrace-mode access checks do always short-circuit for
> tasks in the same thread group:
> 
> /* Returns 0 on success, -errno on denial. */
> static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> {
> [...]
> /* May we inspect the given task?
>  * This check is used both for attaching with ptrace
>  * and for allowing access to sensitive information in /proc.
>  *
>  * ptrace_attach denies several cases that /proc allows
>  * because setting up the necessary parent/child relationship
>  * or halting the specified task is impossible.
>  */
> 
> /* Don't let security modules deny introspection */
> if (same_thread_group(task, current))
> return 0;
> [...]
> }

AFAICS, that code always returns non-zero,
at least when called from ptrace_attach().

As you can see below,
__ptrace_may_access() is called some lines after
the code pointed to by the bug report.


static int ptrace_attach(struct task_struct *task, long request,
 unsigned long addr,
 unsigned long flags)
{
[...]
if (same_thread_group(task, current))
goto out;

/*
 * Protect exec's credential calculations against our interference;
 * SUID, SGID and LSM creds get determined differently
 * under ptrace.
 */
retval = -ERESTARTNOINTR;
if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
goto out;

task_lock(task);
retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
[...]
}
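
The ordering of the two checks can be sketched as a small userspace
model.  This is not kernel code: the function names are made up, and
returning -EPERM for the same-group case is my assumption about the
value of retval at that "goto out".

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of __ptrace_may_access(): tasks in the same thread group
 * short-circuit to "allowed" before any other check is consulted.
 */
static int model_may_access(bool same_group, int other_checks_result)
{
	if (same_group)
		return 0;
	return other_checks_result;
}

/*
 * Model of ptrace_attach(): the same-group test runs first and denies,
 * so the access check is never reached for a sibling thread.
 */
static int model_attach(bool same_group, int other_checks_result)
{
	if (same_group)
		return -EPERM;	/* assumed value of retval at "goto out" */
	return model_may_access(same_group, other_checks_result);
}
```

In this model both statements in the thread are true at once: the
access check always allows a sibling thread, and attaching to a
sibling thread always fails.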


Thanks,

Alex

> 
> As the comment explains, you can't actually *attach*
> to another task in the same thread group; but that's
> not because of the ptrace-style access check rules,
> but because specifically *attaching* to another task
> in the same thread group doesn't work.
>

Re: [Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-15 Thread Alejandro Colomar (man-pages)




On 12/16/20 12:23 AM, Alejandro Colomar (man-pages) wrote:
> Hi Jann,
> 
> On 12/16/20 12:07 AM, Jann Horn wrote:
>> Am Tue, Dec 15, 2020 at 06:01:25PM +0100 schrieb Alejandro Colomar 
>> (man-pages):
>>> Hi,
>>>
>>> There's a bug report: https://bugzilla.kernel.org/show_bug.cgi?id=210655
>>>
>>> [[
>>> Under "Ptrace access mode checking", the documentation states:
>>>   "1. If the calling thread and the target thread are in the same thread
>>> group, access is always allowed."
>>>
>>> This is incorrect. A thread may never attach to another in the same group.
>>
>> No, that is correct. ptrace-mode access checks do always short-circuit for
>> tasks in the same thread group:
>>
>> /* Returns 0 on success, -errno on denial. */
>> static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>> {
>> [...]
>> /* May we inspect the given task?
>>  * This check is used both for attaching with ptrace
>>  * and for allowing access to sensitive information in /proc.
>>  *
>>  * ptrace_attach denies several cases that /proc allows
>>  * because setting up the necessary parent/child relationship
>>  * or halting the specified task is impossible.
>>  */
>>
>> /* Don't let security modules deny introspection */
>> if (same_thread_group(task, current))
>> return 0;
>> [...]
>> }
> 
> AFAICS, that code always returns non-zero,

Sorry, I should have said "that code never returns 0".

> at least when called from ptrace_attach().
> 
> As you can see below,
> __ptrace_may_access() is called some lines after
> the code pointed to by the bug report.
> 
> 
> static int ptrace_attach(struct task_struct *task, long request,
>unsigned long addr,
>unsigned long flags)
> {
> [...]
>   if (same_thread_group(task, current))
>   goto out;
> 
>   /*
>* Protect exec's credential calculations against our interference;
>* SUID, SGID and LSM creds get determined differently
>* under ptrace.
>*/
>   retval = -ERESTARTNOINTR;
>   if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
>   goto out;
> 
>   task_lock(task);
>   retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
> [...]
> }
> 
> 
> Thanks,
> 
> Alex
> 
>>
>> As the comment explains, you can't actually *attach*
>> to another task in the same thread group; but that's
>> not because of the ptrace-style access check rules,
>> but because specifically *attaching* to another task
>> in the same thread group doesn't work.
>>

Re: [Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-15 Thread Alejandro Colomar (man-pages)

[CC += Andreas, Linus, Roland, Markus; fixed Oleg]

On 12/15/20 7:34 PM, Alejandro Colomar (man-pages) wrote:
> Hi Ted,
>
> On 12/15/20 7:31 PM, Ted Estes wrote:
>> Per my research on the topic, the error is in the manual page.  The
>> behavior of ptrace(2) was intentionally changed to prohibit attaching to
>> a thread in the same group.  Apparently, there were a number of
>> ill-behaved edge cases.
>>
>> I found this email thread on the subject:
>> https://lkml.org/lkml/2006/8/31/241

Okay, after reading the LKML thread,
the old behavior was removed because it was very buggy.

We have two options now:

1) Remove that paragraph, as if that behavior had never existed.

   If we do this, not much is lost:
   Only _very_ old kernels had that behavior,
   and it's not even advisable to make use of it on those, AFAICS.

2) Add a note to that paragraph, saying that since kernel 2.X.Y?
   the calling thread and the target thread can't be in the same group.

   Cons: That info is unlikely to be useful, and will only add
   a few more lines to a page that is already very long.

3) Suggestions?

I prefer option 1.

I'll add a larger excerpt of the manual page below,
so that readers don't need to open 'man 2 ptrace':

[[
...

   The algorithm employed for ptrace access mode  checking  deter‐
   mines  whether  the  calling  process is allowed to perform the
   corresponding action on the target process.  (In  the  case  of
   opening  /proc/[pid]  files,  the  "calling process" is the one
   opening the file, and the process with the corresponding PID is
   the "target process".)  The algorithm is as follows:

   1. If  the calling thread and the target thread are in the same
  thread group, access is always allowed.

   2. If the access mode specifies PTRACE_MODE_FSCREDS, then,  for
  the  check  in the next step, employ the caller's filesystem
  UID and GID.  (As noted in  credentials(7),  the  filesystem
  UID and GID almost always have the same values as the corre‐
  sponding effective IDs.)

  Otherwise, the access mode specifies  PTRACE_MODE_REALCREDS,
  so  use  the caller's real UID and GID for the checks in the
  next step.  (Most APIs that check the caller's UID  and  GID
  use   the   effective  IDs.   For  historical  reasons,  the
  PTRACE_MODE_REALCREDS check uses the real IDs instead.)

...
]]
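
Step 2 of the excerpt can be sketched as follows.  This is only an
illustration of which IDs get selected; the enum, the structs and
model_pick_ids() are hypothetical names, and the kernel's own flag
values and credential structures are not reproduced here.

```c
#include <sys/types.h>

/* Hypothetical stand-ins for PTRACE_MODE_FSCREDS / PTRACE_MODE_REALCREDS. */
enum model_cred_mode { MODEL_FSCREDS, MODEL_REALCREDS };

/* The caller's credentials that step 2 chooses between. */
struct model_creds {
	uid_t ruid, fsuid;
	gid_t rgid, fsgid;
};

/* The UID/GID pair handed to the checks in the next step. */
struct model_ids {
	uid_t uid;
	gid_t gid;
};

static struct model_ids
model_pick_ids(const struct model_creds *c, enum model_cred_mode mode)
{
	struct model_ids ids;

	if (mode == MODEL_FSCREDS) {
		ids.uid = c->fsuid;	/* filesystem IDs */
		ids.gid = c->fsgid;
	} else {
		ids.uid = c->ruid;	/* real IDs, for historical reasons */
		ids.gid = c->rgid;
	}
	return ids;
}
```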

Any thoughts before I write the patch?

Thanks,

Alex

>
> Thank you for all the details and links!
> I'll fix the page.
>
> Thanks,
>
> Alex
>
>>
>> Thank you.
>> --Ted Estes
>>
>> On 12/15/2020 11:01 AM, Alejandro Colomar (man-pages) wrote:
>>> Hi,
>>>
>>> There's a bug report: https://bugzilla.kernel.org/show_bug.cgi?id=210655
>>>
>>> [[
>>> Under "Ptrace access mode checking", the documentation states:
>>>    "1. If the calling thread and the target thread are in the same thread
>>> group, access is always allowed."
>>>
>>> This is incorrect. A thread may never attach to another in the same
>>> group.
>>>
>>> Reference, ptrace_attach()
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/ptrace.c?h=v5.9.14#n380
>>>
>>> ]]
>>>
>>> I just wanted to make sure that it is a bug in the manual page, and not
>>> in the implementation.
>>>
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>
>

Re: [Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-15 Thread Alejandro Colomar (man-pages)

Hi Ted,

On 12/15/20 7:31 PM, Ted Estes wrote:
> Per my research on the topic, the error is in the manual page.  The
> behavior of ptrace(2) was intentionally changed to prohibit attaching to
> a thread in the same group.  Apparently, there were a number of
> ill-behaved edge cases.
> 
> I found this email thread on the subject:
> https://lkml.org/lkml/2006/8/31/241

Thank you for all the details and links!
I'll fix the page.

Thanks,

Alex

> 
> Thank you.
> --Ted Estes
> 
> On 12/15/2020 11:01 AM, Alejandro Colomar (man-pages) wrote:
>> Hi,
>>
>> There's a bug report: https://bugzilla.kernel.org/show_bug.cgi?id=210655
>>
>> [[
>> Under "Ptrace access mode checking", the documentation states:
>>    "1. If the calling thread and the target thread are in the same thread
>> group, access is always allowed."
>>
>> This is incorrect. A thread may never attach to another in the same
>> group.
>>
>> Reference, ptrace_attach()
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/ptrace.c?h=v5.9.14#n380
>>
>> ]]
>>
>> I just wanted to make sure that it is a bug in the manual page, and not
>> in the implementation.
>>
>>
>> Thanks,
>>
>> Alex
>>
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

[Bug 210655] ptrace.2: documentation is incorrect about access checking threads in same thread group

2020-12-15 Thread Alejandro Colomar (man-pages)

Hi,

There's a bug report: https://bugzilla.kernel.org/show_bug.cgi?id=210655

[[
Under "Ptrace access mode checking", the documentation states:
  "1. If the calling thread and the target thread are in the same thread
group, access is always allowed."

This is incorrect. A thread may never attach to another in the same group.

Reference, ptrace_attach()
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/ptrace.c?h=v5.9.14#n380
]]

I just wanted to make sure that it is a bug in the manual page, and not
in the implementation.


Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [patch] close_range.2: new page documenting close_range(2)

2020-12-12 Thread Alejandro Colomar (man-pages)

Hi Christian,

Makes sense to me.

Thanks,

Alex

On 12/12/20 1:14 PM, Christian Brauner wrote:
> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi Christian,
> 
> Hi Alex,
> 
>>
>> Thanks for confirming that behavior.  Seems reasonable.
>>
>> I was wondering...
>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>> shouldn't it fail for the same reasons those syscalls can fail?
>>
>> What about the following errors?:
>>
>> From unshare(2):
>>
>>EPERM  The calling process did not have the  required  privi‐
>>   leges for this operation.
> 
> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
> i.e.
> CLONE_NEWNS
> CLONE_NEWUTS
> CLONE_NEWIPC
> CLONE_NEWNET
> CLONE_NEWPID
> CLONE_NEWCGROUP
> CLONE_NEWTIME
> so the permissions are the same.
> 
>>
>> From close(2):
>>EBADF  fd isn't a valid open file descriptor.
>>
>> OK, this one can't happen with the current code.
>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>> It's a no-op (although it will still unshare if the flag is set).
>> But shouldn't it fail with EBADF?
> 
> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
> table independent of whether or not any file descriptors need to be
> closed. That's also how we documented the flag:
> 
> /* Unshare the file descriptor table before closing file descriptors. */
> #define CLOSE_RANGE_UNSHARE   (1U << 1)
> 
> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
> or the proper close_range() syscall wants to make sure that all unwanted
> file descriptors are closed (if any) and that no new file descriptors
> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
> there are no fds to be closed you open up a race window. It would also
> be annoying for userspace if they _may_ have received a private file
> descriptor table but only if any fds needed to be closed.
> 
> If people really were extremely keen about skipping the unshare when no
> fd needs to be closed then this could become a new flag. But I really
> don't think that's necessary and also doesn't make a lot of sense, imho.
> 
>>
>>EINTR  The close() call was interrupted by a signal; see sig‐
>>   nal(7).
>>
>>EIO    An I/O error occurred.
>>
>>ENOSPC, EDQUOT
>>   On NFS, these errors are not normally reported against
>>   the first write which exceeds  the  available  storage
>>   space,  but  instead  against  a  subsequent write(2),
>>   fsync(2), or close().
> 
> None of these will be seen by userspace because close_range() currently
> ignores all errors after it has begun closing files.
> 
> Christian
>
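
The "errors are ignored" behavior discussed above can be sketched as a
userspace emulation of close_range(first, last, 0).  This is a model
under stated assumptions, not the kernel's implementation: the name
emulated_close_range() is mine, the upper bound comes from
sysconf(_SC_OPEN_MAX), and the 16384 fallback is only a guess.

```c
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/*
 * Close every descriptor in [first, last], ignoring per-descriptor
 * errors such as EBADF, EINTR or EIO.  A range that contains no open
 * descriptors is therefore a successful no-op.
 * Returns 0, or -1 with errno == EINVAL if first > last.
 */
static int emulated_close_range(unsigned int first, unsigned int last)
{
	long max = sysconf(_SC_OPEN_MAX);

	if (first > last) {
		errno = EINVAL;
		return -1;
	}
	if (max <= 0)
		max = 16384;		/* limit is indeterminate: guess */
	if ((unsigned long)last >= (unsigned long)max)
		last = (unsigned int)(max - 1);
	if (last > INT_MAX)
		last = INT_MAX;		/* descriptors are non-negative ints */

	/* If clamping pushed 'last' below 'first', the loop does not run. */
	for (unsigned int fd = first; fd <= last; fd++)
		(void)close((int)fd);	/* errors deliberately ignored */

	return 0;
}
```

Note that with a large RLIMIT_NOFILE a call such as
emulated_close_range(3, ~0U) loops over every possible descriptor,
which is the cost the real system call avoids.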

Re: [PATCH] futex: Change 'utime' parameter to be 'const ... *'

2020-12-10 Thread Alejandro Colomar (man-pages)

or:0

TAP version 13
1..1
# futex_wait_wouldblock: Test the unexpected futex value in FUTEX_WAIT
ok 1 futex-wait-wouldblock
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

TAP version 13
1..1
# futex_wait_uninitialized_heap: Test the uninitialized futex value in FUTEX_WAIT
ok 1 futex-wait-uninitialized-heap
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

TAP version 13
1..1
# futex_wait_private_mapped_file: Test the futex value of private file mappings in FUTEX_WAIT
ok 1 futex-wait-private-mapped-file
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0


On 11/28/20 1:39 PM, Alejandro Colomar wrote:
> futex(2) says that 'utime' is a pointer to 'const'.
> The implementation doesn't use 'const';
> however, it _never_ modifies the contents of utime.
> 
> - futex() either uses 'utime' as a pointer to struct or as a 'u32'.
> 
> - In case it's used as a 'u32', it makes a copy of it,
>   and of course it is not dereferenced.
> 
> - In case it's used as a 'struct __kernel_timespec __user *',
>   the pointer is not dereferenced inside the futex() definition,
>   and it is only passed to a function: get_timespec64(),
>   which accepts a 'const struct __kernel_timespec __user *'.
> 
> context:
> 
> 
> [[
> FUTEX(2)   Linux Programmer's Manual  FUTEX(2)
> 
> NAME
>futex - fast user-space locking
> 
> SYNOPSIS
>#include 
>#include 
>#include 
> 
>long futex(uint32_t *uaddr, int futex_op, uint32_t val,
>  const struct timespec *timeout,   /* or: uint32_t val2 */
>  uint32_t *uaddr2, uint32_t val3);
> 
>Note:  There  is  no  glibc  wrapper  for this system call; see
>NOTES.
> ]]
> 
> $ sed -n '/SYSCALL_DEFINE.(futex\>/,/^}/p' linux/kernel/futex.c
> SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>   struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
>   u32, val3)
> {
>   struct timespec64 ts;
>   ktime_t t, *tp = NULL;
>   u32 val2 = 0;
>   int cmd = op & FUTEX_CMD_MASK;
> 
>   if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
> cmd == FUTEX_WAIT_BITSET ||
> cmd == FUTEX_WAIT_REQUEUE_PI)) {
>   if (unlikely(should_fail_futex(!(op & FUTEX_PRIVATE_FLAG))))
>   return -EFAULT;
>   if (get_timespec64(&ts, utime))
>   return -EFAULT;
>   if (!timespec64_valid(&ts))
>   return -EINVAL;
> 
>   t = timespec64_to_ktime(ts);
>   if (cmd == FUTEX_WAIT)
>   t = ktime_add_safe(ktime_get(), t);
>   else if (!(op & FUTEX_CLOCK_REALTIME))
>   t = timens_ktime_to_host(CLOCK_MONOTONIC, t);
>   tp = &t;
>   }
>   /*
>* requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
>* number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
>*/
>   if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
>   cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
>   val2 = (u32) (unsigned long) utime;
> 
>   return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
> }
> 
> $ sed -n '/get_timespec64(/,/;/p' linux/include/linux/time.h
> int get_timespec64(struct timespec64 *ts,
>   const struct __kernel_timespec __user *uts);
> 
> ...
> 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Hello Thomas & Ingo,
> 
> I'm sorry I couldn't test the change in my computers,
> as there is a bug since Linux 5.7 where I can't boot
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974166).
> 
> Alex
> 
>  kernel/futex.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 00259c7e288e..28577c7d2805 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3792,8 +3792,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, 
> ktime_t *timeout,
>  
>  
>  SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
> - struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
> - u32, val3)
> + const struct __kernel_timespec __user *, utime,
> + u32 __user *, uaddr2, u32, val3)
>  {
>   struct timespec64 ts;
>   ktime_t t, *tp = NULL;
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [patch] close_range.2: new page documenting close_range(2)

2020-12-10 Thread Alejandro Colomar (man-pages)

Hi Christian,

Thanks for confirming that behavior.  Seems reasonable.

I was wondering...
If this call is equivalent to unshare(2)+{close(2) in a loop},
shouldn't it fail for the same reasons those syscalls can fail?

What about the following errors?:

From unshare(2):

   EPERM  The calling process did not have the  required  privi‐
  leges for this operation.

From close(2):
   EBADF  fd isn't a valid open file descriptor.

OK, this one can't happen with the current code.
Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
It's a no-op (although it will still unshare if the flag is set).
But shouldn't it fail with EBADF?

   EINTR  The close() call was interrupted by a signal; see sig‐
  nal(7).

   EIO    An I/O error occurred.

   ENOSPC, EDQUOT
  On NFS, these errors are not normally reported against
  the first write which exceeds  the  available  storage
  space,  but  instead  against  a  subsequent write(2),
  fsync(2), or close().

Thanks,

Alex

On 12/9/20 11:56 AM, Christian Brauner wrote:
> On Wed, Dec 09, 2020 at 11:44:22AM +0100, Alejandro Colomar (man-pages) wrote:
>> Hey Christian,
>>
>> I have a question for you below.
>>
>> Thanks,
> 
> Hey Alex,
> 
> Sure!

[...]

>>
>> AFAICS after reading the code, if the unsharing fails,
>> it will not close any file descriptors (please correct me if I'm wrong).
>>
>> Just wanted to be sure that it was the intended behavior with you,
>> and if so, it would be good to document it in the page.
> 
> Yes, this is intended because if the unshare fails we haven't yet
> actually started closing anything so we're before the point of no
> return where we ignore failures. So we can let userspace decide whether
> they want to retry without CLOSE_RANGE_UNSHARE.
> 
> Christian
> 
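
The retry that Christian describes could look roughly like the sketch
below.  Assumptions, all mine: the helper names, the fallback value
436 for __NR_close_range when the installed headers lack it, and the
choice to retry on any failure rather than only on ENOMEM or EMFILE.
The callback parameter exists only so that the fallback path can be
exercised with a stub instead of a real unshare failure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_close_range
#define __NR_close_range 436	/* assumed; check your architecture */
#endif
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif

typedef int (*close_range_fn)(unsigned int, unsigned int, unsigned int);

/* Thin wrapper around the raw system call (no libc wrapper assumed). */
int raw_close_range(unsigned int first, unsigned int last,
		    unsigned int flags)
{
	return (int)syscall(__NR_close_range, first, last, flags);
}

/*
 * Try with CLOSE_RANGE_UNSHARE first.  If that fails, nothing has been
 * closed yet, so it is safe to retry on the shared table.
 */
int close_range_retry(close_range_fn fn, unsigned int first,
		      unsigned int last)
{
	if (fn(first, last, CLOSE_RANGE_UNSHARE) == 0)
		return 0;
	return fn(first, last, 0);
}

/* Stub: behaves as if the unshare step always ran out of memory. */
int stub_unshare_fails(unsigned int first, unsigned int last,
		       unsigned int flags)
{
	(void)first;
	(void)last;
	if (flags & CLOSE_RANGE_UNSHARE) {
		errno = ENOMEM;
		return -1;
	}
	return 0;
}
```

A real caller would pass raw_close_range as the first argument, for
example close_range_retry(raw_close_range, 3, ~0U) before an execve().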

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [PATCH v2] close_range.2: new page documenting close_range(2)

2020-12-09 Thread Alejandro Colomar (man-pages)

sent on a few pages,
but the one in membarrier(2) is more extended.

Please, copy the notices from membarrier(2).
There's one in SYNOPSIS, and one in NOTES.

> +.PP
> +.B CLOSE_RANGE_UNSHARE
> +is conceptually equivalent to
> +.PP
> +.in +4n
> +.EX
> +unshare(CLONE_FILES);
> +close_range(first, last, 0);
> +.EE
> +.in
> +.PP
> +but can be more efficient: if the unshared range extends past the
> +current maximum number of file descriptors allocated in the caller's
> +file descriptor table (the common case when
> +.I last
> +is
> +.BR ~0U ),
> +the kernel will unshare a new file descriptor
> +table for the caller up to
> +.IR first .
> +This avoids subsequent close calls entirely; the whole operation is
> +complete once the table is unshared.

I think the above is more suitable for the DESCRIPTION, but what are
your thoughts, mtk?

> +.SH USE CASES

This section is unconventional.  Please move that text to one of the
traditional sections.  I think DESCRIPTION would be the best place for this.

For a list of the traditional sections,
see man-pages(7)::DESCRIPTION::Sections within a manual page

> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> +.SS Closing file descriptors before exec
> +File descriptors can be closed safely using
> +.PP
> +.in +4n
> +.EX
> +/* we don't want anything past stderr here */
> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> +execve();
> +.EE
> +.in
> +.SS Closing all open file descriptors
> +To avoid blindly closing file descriptors in the range of possible
> +file descriptors, this is sometimes implemented (on Linux) by listing
> +open file descriptors in
> +.I /proc/self/fd/
> +and calling
> +.BR close (2)
> +on each one.
> +.BR close_range ()
> +can take care of this without requiring
> +.I /proc
> +and with a single system call, which provides significant performance
> +benefits.
> +.SH SEE ALSO
> +.BR close (2)
> 
> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
> 
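
For comparison, the /proc-based approach mentioned in the patch can be
sketched like this.  It only counts the open descriptors instead of
closing them, so that it is harmless to run; count_open_fds() is a
name I made up, and the code assumes a mounted /proc.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Count the open file descriptors of the calling process by listing
 * /proc/self/fd.  The descriptor used for the listing itself is not
 * counted.  Returns -1 if /proc/self/fd cannot be opened.
 */
static int count_open_fds(void)
{
	DIR *dir = opendir("/proc/self/fd");
	struct dirent *ent;
	int count = 0;

	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		char *end;
		long fd = strtol(ent->d_name, &end, 10);

		if (end == ent->d_name || *end != '\0')
			continue;	/* skips "." and ".." */
		if (fd == dirfd(dir))
			continue;	/* the listing's own descriptor */
		count++;
	}
	closedir(dir);
	return count;
}
```

A closing variant would call close(2) on each parsed number, which
costs one system call per descriptor plus the directory reads.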

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [patch] close_range.2: new page documenting close_range(2)

2020-12-09 Thread Alejandro Colomar (man-pages)

Hey Christian,

I have a question for you below.

Thanks,

Alex

On 12/9/20 10:58 AM, Christian Brauner wrote:
> On Tue, Dec 08, 2020 at 10:51:33PM +0100, Stephen Kitt wrote:
>> This documents close_range(2) based on information in
>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
>>
>> Signed-off-by: Stephen Kitt 
>> ---
> 
> Hey Stephen,
> 
> Thanks for working on this that's an early Christmas present as it gets
> an item off my todo list!
> 
>>  man2/close_range.2 | 112 +
>>  1 file changed, 112 insertions(+)
>>  create mode 100644 man2/close_range.2
>>
>> diff --git a/man2/close_range.2 b/man2/close_range.2
>> new file mode 100644
>> index 0..62167d9b0
>> --- /dev/null
>> +++ b/man2/close_range.2
>> @@ -0,0 +1,112 @@
>> +.\" Copyright (c) 2020 Stephen Kitt 
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of this
>> +.\" manual under the conditions for verbatim copying, provided that the
>> +.\" entire resulting derived work is distributed under the terms of a
>> +.\" permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>> +.\" responsibility for errors or omissions, or for damages resulting from
>> +.\" the use of the information contained herein.  The author(s) may not
>> +.\" have taken the same level of care in the production of this manual,
>> +.\" which is licensed free of charge, as they might when working
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +close_range \- close all file descriptors in a given range
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include 
>> +.PP
>> +.BI "int close_range(int " first ", int " last ", unsigned int " flags );
> 
> Note, the kernel prototype uses unsigned int as the type for file
> descriptor arguments. As does the close() syscall itself. Only glibc
> wrappers expose file descriptor types (at least in close variants) as
> int.
> Since this is a manpage about the syscall not the wrapper it might make
> sense to note the correct types.
> 
>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR close_range ()
>> +system call closes all open file descriptors from
>> +.I first
>> +to
>> +.IR last
>> +(included).
>> +.PP
>> +Errors closing a given file descriptor are currently ignored.
>> +.PP
>> +.I flags
>> +can be set to
>> +.B CLOSE_RANGE_UNSHARE
>> +to unshare the range of file descriptors from any other processes,
>> +.I instead
>> +of closing them.
> 
> As Michael has noted, this needs to be reworded. A few things to note:
> - CLOSE_RANGE_UNSHARE will ensure that the calling process will have a
>   private file descriptor table. This ensures that other threads opening
>   files cannot inject new file descriptors into the caller's file
>   descriptor table to e.g. make the caller inherit unwanted file
>   descriptors.
> - CLOSE_RANGE_UNSHARE is conceptually equivalent to:
>   unshare(CLONE_FILES);
>   close_range(3, ~0U);

AFAICS after reading the code, if the unsharing fails,
it will not close any file descriptors (please correct me if I'm wrong).

Just wanted to be sure that it was the intended behavior with you,
and if so, it would be good to document it in the page.

> - Whenever the requested range @last is greater than the current maximum
>   number of file descriptors allocated in the caller's file descriptor
>   table the kernel will only unshare a new file descriptor table for the
>   caller up to @first, i.e. the new file descriptor table will be 0 up
>   to and including @first not 0 up to and including @last. Which means
>   that the kernel will not have to do any costly filp_close() calls at
>   all. In essence, the close_range() operation is finished after the
>   in-kernel unshare call in such cases.
> 
> Christian
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [patch] close_range.2: new page documenting close_range(2)

2020-12-09 Thread Alejandro Colomar (man-pages)

e_range().

fs/file.c:
If CLOSE_RANGE_UNSHARE, __close_range() calls unshare_fd()
with CLONE_FILES flag.

kernel/fork.c:
unshare_fd() calls dup_fd().

fs/file.c:
dup_fd() may fail with -ENOMEM or -EMFILE.

> 
>> +.TP
>> +.B ENOMEM
>> +Insufficient kernel memory was available.
>> +.SH VERSIONS
>> +.BR close_range ()
>> +first appeared in Linux 5.9.
>> +.SH CONFORMING TO
>> +.BR close_range ()
>> +is available on Linux and FreeBSD.
> 
> Here, I think it would be better to write:
> 
> close_range()
> is a nonstandard function that is also present on FreeBSD.
> 
>> +.SH NOTES
>> +Currently, there is no glibc wrapper for this system call; call it using
>> +.BR syscall (2).
>> +.SH USE CASES
>> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
>> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
>> +.SS Closing file descriptors before exec
>> +File descriptors can be closed safely using
>> +.PP
>> +.in +4n
>> +.EX
>> +/* we don't want anything past stderr here */
>> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
>> +execve();
>> +.EE
>> +.in
>> +.PP
> 
> .PP is not necessary before a new subsection (.SS).
> 
>> +.SS Closing all open file descriptors
>> +This is commonly implemented (on Linux) by listing open file
> 
> Is it really true that this is common? I suspect not. It's slow, and
> relies on /proc being present. I would have thought that more common
> is something like:
> 
> int maxfd = sysconf(_SC_OPEN_MAX);
> if (maxfd == -1)        /* Limit is indeterminate... */
>     maxfd = 16384;      /* so take a guess */
>
> for (fd = 0; fd < maxfd; fd++)
>     close(fd);
> 
> I think it's fine to mention the use of a /proc as an (inferior and)
> alternative way of doing this. I'm just not sure that "commonly" is
> correct.
> 
>> +descriptors in
>> +.B /proc/self/fd/

By reading proc.5, I think this should s/.B/.I/, right mtk?

>> +and calling
>> +.BR close (2)
>> +on each one.
>> +.BR close_range ()
>> +can take care of this without requiring
>> +.B /proc

By reading proc.5, I think this should s/.B/.I/, right mtk?

>> +and with a single system call, which provides significant performance
>> +benefits.
>> +.SH SEE ALSO
>> +.BR close (2)
>>
>> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
>> --
>> 2.20.1
> 
> Thanks,
> 
> Michael
> 
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; http://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

Re: [PATCH -V7 2/3] NOT kernel/man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-12-04 Thread Alejandro Colomar (man-pages)

Hi Huang Ying,

Please, see a few fixes below.

Thanks,

Alex

On 12/4/20 10:15 AM, Huang Ying wrote:
> Signed-off-by: "Huang, Ying" 
> ---
>  man2/set_mempolicy.2 | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> index 68011eecb..fb2e6fd96 100644
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,15 @@ A nonempty
>  .I nodemask
>  specifies node IDs that are relative to the set of
>  node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
> +When
> +.I mode
> +is MPOL_BIND, enable the Linux kernel NUMA balancing for the task if

.B MPOL_BIND

> +it is supported by kernel.
> +If the flag isn't supported by Linux kernel, or is used with
> +.I mode
> +other than MPOL_BIND, return -1 and errno is set to EINVAL.

.BR MPOL_BIND ,

A minus sign should be escaped:
\-1
(See man-pages(7)::STYLE GUIDE::Generating optimal glyphs)

.I errno
.BR EINVAL .

>  .PP
>  .I nodemask
>  points to a bit mask of node IDs that contains up to
> @@ -293,6 +302,11 @@ argument specified both
>  .B MPOL_F_STATIC_NODES
>  and
>  .BR MPOL_F_RELATIVE_NODES .
> +Or, the
> +.B MPOL_F_NUMA_BALANCING
> +isn't supported by the Linux kernel, or is used with
> +.I mode
> +other than MPOL_BIND.

.BR MPOL_BIND .

>  .TP
>  .B ENOMEM
>  Insufficient kernel memory was available.
>

Re: [PATCH -V6 RESEND 2/3] NOT kernel/man-pages: man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING

2020-12-02 Thread Alejandro Colomar (mailing lists; readonly)

Hi Huang Ying,

Please see a few fixes below.

Michael, as always, some questions for you too ;)

Thanks,

Alex

On 12/2/20 9:42 AM, Huang Ying wrote:
> Signed-off-by: "Huang, Ying" 
> ---
>  man2/set_mempolicy.2 | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> index 68011eecb..3754b3e12 100644
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,12 @@ A nonempty
>  .I nodemask
>  specifies node IDs that are relative to the set of
>  node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"

I'd prefer it to be in alphabetical order (rather than just adding at
the bottom).

That way, when lists grow, it's easier to find things.

> +Enable the Linux kernel NUMA balancing for the task if it is supported
> +by kernel.

I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
Linux kernel to differentiate it from other kernels.  It only adds noise
(IMHO).  mtk?

wfix:

... supported by _the_ kernel.

> +If the flag isn't supported by Linux kernel, return -1 and errno is

wfix:

If the flag isn't supported by _the_ kernel, ...

> +set to EINVAL.

errno and EINVAL should use .I and .B respectively

>  .PP
>  .I nodemask
>  points to a bit mask of node IDs that contains up to
> @@ -293,6 +299,9 @@ argument specified both
>  .B MPOL_F_STATIC_NODES
>  and
>  .BR MPOL_F_RELATIVE_NODES .
> +Or, the
> +.B MPOL_F_NUMA_BALANCING
> +isn't supported by the Linux kernel.
>  .TP
>  .B ENOMEM
>  Insufficient kernel memory was available.
>

[PATCH] futex: Change 'utime' parameter to be 'const ... *'

2020-11-28 Thread Alejandro Colomar

futex(2) says that 'utime' is a pointer to 'const'.
The implementation doesn't use 'const';
however, it _never_ modifies the contents of utime.

- futex() either uses 'utime' as a pointer to struct or as a 'u32'.

- In case it's used as a 'u32', it makes a copy of it,
  and of course it is not dereferenced.

- In case it's used as a 'struct __kernel_timespec __user *',
  the pointer is not dereferenced inside the futex() definition,
  and it is only passed to a function: get_timespec64(),
  which accepts a 'const struct __kernel_timespec __user *'.

context:


[[
FUTEX(2)   Linux Programmer's Manual  FUTEX(2)

NAME
   futex - fast user-space locking

SYNOPSIS
   #include 
   #include 
   #include 

   long futex(uint32_t *uaddr, int futex_op, uint32_t val,
 const struct timespec *timeout,   /* or: uint32_t val2 */
 uint32_t *uaddr2, uint32_t val3);

   Note:  There  is  no  glibc  wrapper  for this system call; see
   NOTES.
]]

$ sed -n '/SYSCALL_DEFINE.(futex\>/,/^}/p' linux/kernel/futex.c
SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
u32, val3)
{
struct timespec64 ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;

if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
  cmd == FUTEX_WAIT_BITSET ||
  cmd == FUTEX_WAIT_REQUEUE_PI)) {
if (unlikely(should_fail_futex(!(op & FUTEX_PRIVATE_FLAG))))
return -EFAULT;
if (get_timespec64(&ts, utime))
return -EFAULT;
if (!timespec64_valid(&ts))
return -EINVAL;

t = timespec64_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add_safe(ktime_get(), t);
else if (!(op & FUTEX_CLOCK_REALTIME))
t = timens_ktime_to_host(CLOCK_MONOTONIC, t);
tp = &t;
}
/*
 * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
 * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
 */
if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
val2 = (u32) (unsigned long) utime;

return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

$ sed -n '/get_timespec64(/,/;/p' linux/include/linux/time.h
int get_timespec64(struct timespec64 *ts,
const struct __kernel_timespec __user *uts);

...

Signed-off-by: Alejandro Colomar 
---

Hello Thomas & Ingo,

I'm sorry I couldn't test the change on my computers,
as there is a bug since Linux 5.7 where I can't boot
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974166).

Alex

 kernel/futex.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 00259c7e288e..28577c7d2805 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3792,8 +3792,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 
 
 SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
-   struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
-   u32, val3)
+   const struct __kernel_timespec __user *, utime,
+   u32 __user *, uaddr2, u32, val3)
 {
struct timespec64 ts;
ktime_t t, *tp = NULL;
-- 
2.29.2

[PATCH] subpage_prot.2: SYNOPSIS: Fix return type: s/long/int/

2020-11-27 Thread Alejandro Colomar

The Linux kernel uses 'int' instead of 'long' for the return type.
As glibc provides no wrapper, use the same type the kernel uses.

..

$ grep -n wrapper man-pages/man2/subpage_prot.2
40:There is no glibc wrapper for this system call; see NOTES.
99:Glibc does not provide a wrapper for this system call; call it using

$ grep -rn SYSCALL_DEFINE.*subpage_prot linux/;
linux/arch/powerpc/mm/book3s64/subpage_prot.c:190:
SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,

$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
  linux/arch/powerpc/mm/book3s64/subpage_prot.c \
  |grep return;
return -ENOENT;
return -EINVAL;
return -EINVAL;
return 0;
return -EFAULT;
return -EFAULT;
return err;

$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
  linux/arch/powerpc/mm/book3s64/subpage_prot.c \
  |grep '\<err\>';
int err;
err = -ENOMEM;
err = -ENOMEM;
err = 0;
    return err;

Signed-off-by: Alejandro Colomar 
---
 man2/subpage_prot.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/subpage_prot.2 b/man2/subpage_prot.2
index b38ba718f..d6f016665 100644
--- a/man2/subpage_prot.2
+++ b/man2/subpage_prot.2
@@ -32,7 +32,7 @@
 subpage_prot \- define a subpage protection for an address range
 .SH SYNOPSIS
 .nf
-.BI "long subpage_prot(unsigned long " addr ", unsigned long " len ,
+.BI "int subpage_prot(unsigned long " addr ", unsigned long " len ,
 .BI "  uint32_t *" map );
 .fi
 .PP
-- 
2.29.2

Re: [PATCH] spu_create.2: Clarify that one of the prototypes is the current one

2020-11-27 Thread Alejandro Colomar (man-pages)

Hi Michael,

On 11/27/20 11:43 AM, Michael Kerrisk (man-pages) wrote:
> Hi Alex,
> 
> On 11/26/20 7:32 PM, Alejandro Colomar wrote:
>> The current Linux kernel only provides a definition of 'spu_create()'.
>> It has 4 parameters, the last being 'int neighbor_fd'.
>>
>> Before Linux 2.6.23, there was an older prototype,
>> which didn't have this last parameter.
>>
>> Move that old prototype to VERSIONS,
>> and keep the current one in SYNOPSIS.
>>
>> ..
>>
>> $ grep -rn "SYSCALL_DEFINE.(spu_create"
>> arch/powerpc/platforms/cell/spu_syscalls.c:56:
>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
>>
>> $ sed -n 56,/^}/p arch/powerpc/platforms/cell/spu_syscalls.c
>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
>>  umode_t, mode, int, neighbor_fd)
>> {
>>  long ret;
>>  struct spufs_calls *calls;
>>
>>  calls = spufs_calls_get();
>>  if (!calls)
>>  return -ENOSYS;
>>
>>  if (flags & SPU_CREATE_AFFINITY_SPU) {
>>  struct fd neighbor = fdget(neighbor_fd);
>>  ret = -EBADF;
>>  if (neighbor.file) {
>>  ret = calls->create_thread(name, flags, mode, neighbor.file);
>>  fdput(neighbor);
>>  }
>>  } else
>>  ret = calls->create_thread(name, flags, mode, NULL);
>>
>>  spufs_calls_put(calls);
>>  return ret;
>> }
>>
>> $ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L 56,/\)/
>> 1bc94226d5c64 (Al Viro 2011-07-26 16:50:23 -0400 56)
>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
>> 1bc94226d5c64 (Al Viro 2011-07-26 16:50:23 -0400 57)
>>umode_t, mode, int, neighbor_fd)
>>
>> $ git checkout 1bc94226d5c64~1
>> $ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L /spu_create/,/\)/
>> 67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 68)
>> asmlinkage long sys_spu_create(const char __user *name,
>> 8e68e2f248332 (Arnd Bergmann 2007-07-20 21:39:47 +0200 69)
>>  unsigned int flags, mode_t mode, int neighbor_fd)
>>
>> $ git checkout 8e68e2f248332~1
>> $ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L /spu_create/,/\)/
>> 67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 36)
>> asmlinkage long sys_spu_create(const char __user *name,
>> 67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 37)
>>  unsigned int flags, mode_t mode)
>>
>> $ git describe --contains 8e68e2f248332
>> v2.6.23-rc1~195^2~7
>>
>> Signed-off-by: Alejandro Colomar 
>> ---
>>  man2/spu_create.2 | 16 +---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/man2/spu_create.2 b/man2/spu_create.2
>> index 4e6f5d730..3eeafee56 100644
>> --- a/man2/spu_create.2
>> +++ b/man2/spu_create.2
>> @@ -30,9 +30,8 @@ spu_create \- create a new spu context
>>  .B #include 
>>  .B #include 
>>  .PP
>> -.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ");"
>> -.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ","
>> -.BI "   int " neighbor_fd ");"
>> +.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ,
>> +.BI "   int " neighbor_fd );
>>  .fi
>>  .PP
>>  .IR Note :
>> @@ -247,6 +246,17 @@ By convention, it gets mounted in
>>  The
>>  .BR spu_create ()
>>  system call was added to Linux in kernel 2.6.16.
>> +.PP
>> +.\" commit 8e68e2f248332a9c3fd4f08258f488c209bd3e0c
>> +Before Linux 2.6.23, the prototype for
>> +.BR spu_create ()
>> +was:
>> +.PP
>> +.in +4n
>> +.EX
>> +.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode );
>> +.EE
>> +.in
>>  .SH CONFORMING TO
>>  This call is Linux-specific and implemented only on the PowerPC
>>  architecture.
> 
> Thanks for the detailed research.

You're welcome! :)

> The page was indeed a bit messy
> in explaining some details. I've instead opted for a different change;
> see below.

Looks good!

Cheers,

Alex

> 
> Thanks,
> 
> Michael
> 
> diff --git a/man2/spu_create.2

[PATCH] spu_create.2: Clarify that one of the prototypes is the current one

2020-11-26 Thread Alejandro Colomar

The current Linux kernel only provides a definition of 'spu_create()'.
It has 4 parameters, the last being 'int neighbor_fd'.

Before Linux 2.6.23, there was an older prototype,
which didn't have this last parameter.

Move that old prototype to VERSIONS,
and keep the current one in SYNOPSIS.

..

$ grep -rn "SYSCALL_DEFINE.(spu_create"
arch/powerpc/platforms/cell/spu_syscalls.c:56:
SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,

$ sed -n 56,/^}/p arch/powerpc/platforms/cell/spu_syscalls.c
SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
umode_t, mode, int, neighbor_fd)
{
long ret;
struct spufs_calls *calls;

calls = spufs_calls_get();
if (!calls)
return -ENOSYS;

if (flags & SPU_CREATE_AFFINITY_SPU) {
struct fd neighbor = fdget(neighbor_fd);
ret = -EBADF;
if (neighbor.file) {
ret = calls->create_thread(name, flags, mode, neighbor.file);
fdput(neighbor);
}
} else
ret = calls->create_thread(name, flags, mode, NULL);

spufs_calls_put(calls);
return ret;
}

$ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L 56,/\)/
1bc94226d5c64 (Al Viro 2011-07-26 16:50:23 -0400 56)
SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
1bc94226d5c64 (Al Viro 2011-07-26 16:50:23 -0400 57)
   umode_t, mode, int, neighbor_fd)

$ git checkout 1bc94226d5c64~1
$ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L /spu_create/,/\)/
67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 68)
asmlinkage long sys_spu_create(const char __user *name,
8e68e2f248332 (Arnd Bergmann 2007-07-20 21:39:47 +0200 69)
 unsigned int flags, mode_t mode, int neighbor_fd)

$ git checkout 8e68e2f248332~1
$ git blame arch/powerpc/platforms/cell/spu_syscalls.c -L /spu_create/,/\)/
67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 36)
asmlinkage long sys_spu_create(const char __user *name,
67207b9664a8d (Arnd Bergmann 2005-11-15 15:53:48 -0500 37)
 unsigned int flags, mode_t mode)

$ git describe --contains 8e68e2f248332
v2.6.23-rc1~195^2~7

Signed-off-by: Alejandro Colomar 
---
 man2/spu_create.2 | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/man2/spu_create.2 b/man2/spu_create.2
index 4e6f5d730..3eeafee56 100644
--- a/man2/spu_create.2
+++ b/man2/spu_create.2
@@ -30,9 +30,8 @@ spu_create \- create a new spu context
 .B #include 
 .B #include 
 .PP
-.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ");"
-.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ","
-.BI "   int " neighbor_fd ");"
+.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode ,
+.BI "   int " neighbor_fd );
 .fi
 .PP
 .IR Note :
@@ -247,6 +246,17 @@ By convention, it gets mounted in
 The
 .BR spu_create ()
 system call was added to Linux in kernel 2.6.16.
+.PP
+.\" commit 8e68e2f248332a9c3fd4f08258f488c209bd3e0c
+Before Linux 2.6.23, the prototype for
+.BR spu_create ()
+was:
+.PP
+.in +4n
+.EX
+.BI "int spu_create(const char *" pathname ", int " flags ", mode_t " mode );
+.EE
+.in
 .SH CONFORMING TO
 This call is Linux-specific and implemented only on the PowerPC
 architecture.
-- 
2.29.2

Re: set_thread_area.2: csky architecture undocumented

2020-11-26 Thread Alejandro Colomar (mailing lists; readonly)

Hi Guo,

Thanks for the details!
I'll try to add csky to the man page,
and if I have any doubts I'll ask you.
Anyway, I'll CC you in any change I propose.

Cheers,

Alex

On 11/24/20 1:07 PM, Guo Ren wrote:
> Thx Michael & Alejandro,
> 
> Yes, the man page doesn't cover csky.
> 
> C-SKY has two ABIs, abiv1 and abiv2.
> For abiv1: There is no register for saving the TLS pointer. We use trap 3
> to get it and use set_thread_area to init ti->tp_value.
> For abiv2: r31 is the TLS register. We can read r31 directly to
> get the TLS pointer and use set_thread_area to init the reg->tls value.
> 
> In glibc:
> # ifdef __CSKYABIV2__
> /* Define r31 as thread pointer register.  */
> #  define READ_THREAD_POINTER() \
> mov r0, r31;
> # else
> #  define READ_THREAD_POINTER() \
> trap 3;
> # endif
> 
> /* Code to initially initialize the thread pointer.  This might need
>special attention since 'errno' is not yet available and if the
>operation can cause a failure 'errno' must not be touched.  */
> # define TLS_INIT_TP(tcbp) \
>   ({ INTERNAL_SYSCALL_DECL (err);   \
>  long result_var;   \
>  result_var = INTERNAL_SYSCALL (set_thread_area, err, 1,\
> (char *) (tcbp) + TLS_TCB_OFFSET);  \
>  INTERNAL_SYSCALL_ERROR_P (result_var, err) \
>? "unknown error" : NULL; })
> 
> In kernel:
> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
> {
> struct thread_info *ti = task_thread_info(current);
> struct pt_regs *reg = current_pt_regs();
> 
> reg->tls = addr;
> ti->tp_value = addr;
> 
> return 0;
> }
> 
> Any comments are welcome :)
> 
> 
> On Tue, Nov 24, 2020 at 5:51 PM Michael Kerrisk (man-pages)
>  wrote:
>>
>> Hi Alex,
>>
>> On 11/23/20 10:31 PM, Alejandro Colomar (man-pages) wrote:
>>> Hi Michael,
>>>
>>> SYNOPSIS
>>>#include 
>>>
>>>#if defined __i386__ || defined __x86_64__
>>># include 
>>>
>>>int get_thread_area(struct user_desc *u_info);
>>>int set_thread_area(struct user_desc *u_info);
>>>
>>>#elif defined __m68k__
>>>
>>>int get_thread_area(void);
>>>int set_thread_area(unsigned long tp);
>>>
>>>#elif defined __mips__
>>>
>>>int set_thread_area(unsigned long addr);
>>>
>>>#endif
>>>
>>>Note: There are no glibc wrappers for these system  calls;  see
>>>NOTES.
>>>
>>>
>>> $ grep -rn 'SYSCALL_DEFINE.*et_thread_area'
>>> arch/csky/kernel/syscall.c:6:
>>> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
>>> arch/mips/kernel/syscall.c:86:
>>> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
>>> arch/x86/kernel/tls.c:191:
>>> SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, u_info)
>>> arch/x86/kernel/tls.c:243:
>>> SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, u_info)
>>> arch/x86/um/tls_32.c:277:
>>> SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, user_desc)
>>> arch/x86/um/tls_32.c:325:
>>> SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, user_desc)
>>>
>>>
>>> See kernel commit 4859bfca11c7d63d55175bcd85a75d6cee4b7184
>>>
>>>
>>> I'd change
>>> -  #elif defined __mips__
>>> +  #elif defined(__mips__ || __csky__)
>>>
>>> and then change the rest of the text to add csky when appropriate.
>>> Am I correct?
>>
>> AFAICT, you are correct. I think the reason that csky is missing is
>> that the architecture was added after this manual page was added.
>>
>> Thanks,
>>
>> Michael
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
> 
> 
> 
> --
> Best Regards
>  Guo Ren
> 
> ML: https://lore.kernel.org/linux-csky/
>

Re: set_thread_area.2: csky architecture undocumented

2020-11-26 Thread Alejandro Colomar (mailing lists; readonly)

Hi Michael,

On 11/24/20 10:51 AM, Michael Kerrisk (man-pages) wrote:
> Hi Alex,
> 
> On 11/23/20 10:31 PM, Alejandro Colomar (man-pages) wrote:
>> Hi Michael,
>>
>> SYNOPSIS
>>#include 
>>
>>#if defined __i386__ || defined __x86_64__
>># include 
>>
>>int get_thread_area(struct user_desc *u_info);
>>int set_thread_area(struct user_desc *u_info);
>>
>>#elif defined __m68k__
>>
>>int get_thread_area(void);
>>int set_thread_area(unsigned long tp);
>>
>>#elif defined __mips__
>>
>>int set_thread_area(unsigned long addr);
>>
>>#endif
>>
>>Note: There are no glibc wrappers for these system  calls;  see
>>NOTES.
>>
>>
>> $ grep -rn 'SYSCALL_DEFINE.*et_thread_area'
>> arch/csky/kernel/syscall.c:6:
>> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
>> arch/mips/kernel/syscall.c:86:
>> SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
>> arch/x86/kernel/tls.c:191:
>> SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, u_info)
>> arch/x86/kernel/tls.c:243:
>> SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, u_info)
>> arch/x86/um/tls_32.c:277:
>> SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, user_desc)
>> arch/x86/um/tls_32.c:325:
>> SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, user_desc)
>>
>>
>> See kernel commit 4859bfca11c7d63d55175bcd85a75d6cee4b7184
>>
>>
>> I'd change
>> -  #elif defined __mips__
>> +  #elif defined(__mips__ || __csky__)
>>
>> and then change the rest of the text to add csky when appropriate.
>> Am I correct?
> 
> AFAICT, you are correct. I think the reason that csky is missing is
> that the architecture was added after this manual page was added.

Yep, I guessed it was that :)

Thanks,

Alex

> 
> Thanks,
> 
> Michael
> 
>

[PATCH] set_tid_address.2: SYNOPSIS: Fix set_tid_address() return type

2020-11-23 Thread Alejandro Colomar

The Linux kernel uses 'pid_t' instead of 'long' for the return type.
As glibc provides no wrapper, use the same types the kernel uses.

$ sed -n 34,36p man-pages/man2/set_tid_address.2
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.

$ grep -rn 'SYSCALL_DEFINE.*set_tid_address' linux/
linux/kernel/fork.c:1632:
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)

$ sed -n 1632,1638p linux/kernel/fork.c
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
{
current->clear_child_tid = tidptr;

return task_pid_vnr(current);
}

$ grep -rn 'task_pid_vnr(struct' linux/
linux/include/linux/sched.h:1374:
static inline pid_t task_pid_vnr(struct task_struct *tsk)
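The return value can also be observed from user space.  The following is
a minimal sketch, not part of the patch: set_tid_address_sketch() is a
hypothetical helper that issues the raw system call through syscall(2)
and narrows the 'long' result to the 'pid_t' the kernel returns.  Note
that the call replaces whatever pointer the C library registered for the
calling thread, so treat it as a demonstration only.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Static storage, so the address stays valid for the life of the thread. */
static int tid_word;

/*
 * Hypothetical helper (glibc provides no wrapper): issues the raw
 * system call and converts syscall(2)'s long result to pid_t.
 */
static pid_t
set_tid_address_sketch(int *tidptr)
{
	return (pid_t) syscall(SYS_set_tid_address, tidptr);
}
```

Calling set_tid_address_sketch(&tid_word) should return the caller's
thread ID, the same value that the gettid system call reports.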

Signed-off-by: Alejandro Colomar 
---
 man2/set_tid_address.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/set_tid_address.2 b/man2/set_tid_address.2
index 380efcdd8..b18b8efef 100644
--- a/man2/set_tid_address.2
+++ b/man2/set_tid_address.2
@@ -29,7 +29,7 @@ set_tid_address \- set pointer to thread ID
 .nf
 .B #include 
 .PP
-.BI "long set_tid_address(int *" tidptr );
+.BI "pid_t set_tid_address(int *" tidptr );
 .fi
 .PP
 .IR Note :
-- 
2.29.2

set_thread_area.2: csky architecture undocumented

2020-11-23 Thread Alejandro Colomar (man-pages)

Hi Michael,

SYNOPSIS
   #include 

   #if defined __i386__ || defined __x86_64__
   # include 

   int get_thread_area(struct user_desc *u_info);
   int set_thread_area(struct user_desc *u_info);

   #elif defined __m68k__

   int get_thread_area(void);
   int set_thread_area(unsigned long tp);

   #elif defined __mips__

   int set_thread_area(unsigned long addr);

   #endif

   Note: There are no glibc wrappers for these system  calls;  see
   NOTES.


$ grep -rn 'SYSCALL_DEFINE.*et_thread_area'
arch/csky/kernel/syscall.c:6:
SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
arch/mips/kernel/syscall.c:86:
SYSCALL_DEFINE1(set_thread_area, unsigned long, addr)
arch/x86/kernel/tls.c:191:
SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, u_info)
arch/x86/kernel/tls.c:243:
SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, u_info)
arch/x86/um/tls_32.c:277:
SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, user_desc)
arch/x86/um/tls_32.c:325:
SYSCALL_DEFINE1(get_thread_area, struct user_desc __user *, user_desc)


See kernel commit 4859bfca11c7d63d55175bcd85a75d6cee4b7184


I'd change
-  #elif defined __mips__
+  #elif defined(__mips__ || __csky__)

and then change the rest of the text to add csky when appropriate.
Am I correct?
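
One syntax note on the proposed line: the 'defined' operator takes a
single macro name, so 'defined(__mips__ || __csky__)' would not
preprocess.  A minimal sketch of the intended condition, in the style
of the SYNOPSIS above, follows.  THREAD_AREA_ARG and
thread_area_arg_type() are hypothetical names, there only to make the
selected branch observable:

```c
/*
 * Each architecture macro needs its own 'defined' operator;
 * "defined(__mips__ || __csky__)" is not valid preprocessor syntax.
 */
#if defined __i386__ || defined __x86_64__
# define THREAD_AREA_ARG  "struct user_desc *u_info"
#elif defined __m68k__
# define THREAD_AREA_ARG  "unsigned long tp"
#elif defined __mips__ || defined __csky__
# define THREAD_AREA_ARG  "unsigned long addr"
#else
# define THREAD_AREA_ARG  "(not covered by this sketch)"
#endif

/* Hypothetical helper: reports which argument form was selected. */
static const char *
thread_area_arg_type(void)
{
	return THREAD_AREA_ARG;
}
```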

Thanks,

Alex

[PATCH] restart_syscall.2: SYNOPSIS: Fix restart_syscall() return type

2020-11-23 Thread Alejandro Colomar

The Linux kernel uses 'long' instead of 'int' for the return type.
As glibc provides no wrapper, use the same types the kernel uses.

$ grep -rn 'SYSCALL_DEFINE.*(restart_syscall'
kernel/signal.c:2891:SYSCALL_DEFINE0(restart_syscall)

$ sed -n 2891,2895p kernel/signal.c
SYSCALL_DEFINE0(restart_syscall)
{
struct restart_block *restart = &current->restart_block;
return restart->fn(restart);
}

$ grep -rn 'struct restart_block {'
include/linux/restart_block.h:25:struct restart_block {

$ sed -n 25,56p include/linux/restart_block.h
struct restart_block {
long (*fn)(struct restart_block *);
union {
/* For futex_wait and futex_wait_requeue_pi */
struct {
u32 __user *uaddr;
u32 val;
u32 flags;
u32 bitset;
u64 time;
u32 __user *uaddr2;
} futex;
/* For nanosleep */
struct {
clockid_t clockid;
enum timespec_type type;
union {
struct __kernel_timespec __user *rmtp;
struct old_timespec32 __user *compat_rmtp;
};
u64 expires;
} nanosleep;
/* For poll */
struct {
struct pollfd __user *ufds;
int nfds;
int has_timeout;
unsigned long tv_sec;
unsigned long tv_nsec;
} poll;
    };
};

Signed-off-by: Alejandro Colomar 
---
 man2/restart_syscall.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/restart_syscall.2 b/man2/restart_syscall.2
index e7d96bd4d..21cc2df1d 100644
--- a/man2/restart_syscall.2
+++ b/man2/restart_syscall.2
@@ -34,7 +34,7 @@
 .SH NAME
 restart_syscall \- restart a system call after interruption by a stop signal
 .SH SYNOPSIS
-.B int restart_syscall(void);
+.B long restart_syscall(void);
 .PP
 .IR Note :
 There is no glibc wrapper for this system call; see NOTES.
-- 
2.29.2

Re: [PATCH] lseek.2: SYNOPSIS: Use correct types

2020-11-22 Thread Alejandro Colomar (man-pages)

Hi Florian,

On 11/22/20 1:43 PM, Florian Weimer wrote:
> * Alejandro Colomar:
> 
>> The Linux kernel uses 'unsigned int' instead of 'int' for 'fd' and
>> 'whence'.  As glibc provides no wrapper, use the same types the
>> kernel uses.
> 
> lseek is a POSIX interface, and glibc provides it.  POSIX uses int for
> file descriptors (and the whence parameter in case of lseek).
> 
> The llseek system call is a different matter, that's indeed
> Linux-specific.
> 

Ahhh, true.  So many similar functions... :p
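
For reference, a minimal sketch of the POSIX interface as glibc exposes
it, with 'int' for both the file descriptor and 'whence'.
file_size_sketch() is a hypothetical helper, not an existing function:

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the size of the file behind fd by
 * seeking to its end, using the POSIX prototype
 * off_t lseek(int fd, off_t offset, int whence).
 * Returns (off_t) -1 on error, as lseek() does.
 */
static off_t
file_size_sketch(int fd)
{
	return lseek(fd, 0, SEEK_END);
}
```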

Thanks,

Alex

Re: [PATCH] lseek.2: SYNOPSIS: Use correct types

2020-11-21 Thread Alejandro Colomar (man-pages)

Hi Michael,

I'm a bit lost in all the *lseek* pages.
You had a good read some months ago, so you may know it better.
I don't know which of those functions come from the kernel,
and which come from glibc (if any).
In the kernel I only found the lseek, llseek, and 32_llseek
(as you can see in the patch).
So if any other prototype needs to be updated, please do so.
Especially, have a look at lseek64(3),
which I suspect needs the same changes I propose in that patch.

Thanks,

Alex

On 11/21/20 6:30 PM, Alejandro Colomar wrote:
> The Linux kernel uses 'unsigned int' instead of 'int'
> for 'fd' and 'whence'.
> As glibc provides no wrapper, use the same types the kernel uses.
> 
> src/linux$ grep -rn "SYSCALL_DEFINE.*lseek"
> fs/read_write.c:322:SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
> fs/read_write.c:328:COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence)
> fs/read_write.c:336:SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
> arch/mips/kernel/linux32.c:65:SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,
> 
> src/linux$ sed -n 322,325p fs/read_write.c
> SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
> {
>   return ksys_lseek(fd, offset, whence);
> }
> 
> Signed-off-by: Alejandro Colomar 
> ---
>  man2/lseek.2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man2/lseek.2 b/man2/lseek.2
> index e35e410a6..2ff878ffa 100644
> --- a/man2/lseek.2
> +++ b/man2/lseek.2
> @@ -51,7 +51,7 @@ lseek \- reposition read/write file offset
>  .br
>  .B #include 
>  .PP
> -.BI "off_t lseek(int " fd ", off_t " offset ", int " whence );
> +.BI "off_t lseek(unsigned int " fd ", off_t " offset ", unsigned int " whence );
>  .SH DESCRIPTION
>  .BR lseek ()
>  repositions the file offset of the open file description
>

[PATCH] lseek.2: SYNOPSIS: Use correct types

2020-11-21 Thread Alejandro Colomar

The Linux kernel uses 'unsigned int' instead of 'int'
for 'fd' and 'whence'.
As glibc provides no wrapper, use the same types the kernel uses.

src/linux$ grep -rn "SYSCALL_DEFINE.*lseek"
fs/read_write.c:322:SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
fs/read_write.c:328:COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence)
fs/read_write.c:336:SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
arch/mips/kernel/linux32.c:65:SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,

src/linux$ sed -n 322,325p fs/read_write.c
SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
{
return ksys_lseek(fd, offset, whence);
}

Signed-off-by: Alejandro Colomar 
---
 man2/lseek.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/lseek.2 b/man2/lseek.2
index e35e410a6..2ff878ffa 100644
--- a/man2/lseek.2
+++ b/man2/lseek.2
@@ -51,7 +51,7 @@ lseek \- reposition read/write file offset
 .br
 .B #include 
 .PP
-.BI "off_t lseek(int " fd ", off_t " offset ", int " whence );
+.BI "off_t lseek(unsigned int " fd ", off_t " offset ", unsigned int " whence );
 .SH DESCRIPTION
 .BR lseek ()
 repositions the file offset of the open file description
-- 
2.29.2

[PATCH 3/4] fs/attr.c, fs/bad_inode.c, fs/binfmt_aout.c, fs/binfmt_elf.c: Cosmetic

2020-11-21 Thread Alejandro Colomar

Slightly non-trivial changes:

- Move declarations to the top of function definitions.
- Split multiple assignments in a single line to
  multiple lines with a single assignment each.
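
The second kind of change can be sketched in isolation as follows.  This
is an illustration only: 'struct times_sketch' and both functions are
hypothetical stand-ins, not kernel code.  Both versions leave the
structure in the same final state:

```c
/* Hypothetical stand-in for the three inode timestamps. */
struct times_sketch {
	long atime;
	long mtime;
	long ctime;
};

/* Before: one statement assigns all three fields, right to left. */
static void
stamp_chained(struct times_sketch *t, long now)
{
	t->atime = t->mtime = t->ctime = now;
}

/* After: one assignment per line, same final state. */
static void
stamp_split(struct times_sketch *t, long now)
{
	t->ctime = now;
	t->mtime = t->ctime;
	t->atime = t->ctime;
}
```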

Signed-off-by: Alejandro Colomar 
---
 fs/attr.c|  5 ++---
 fs/bad_inode.c   |  5 +++--
 fs/binfmt_aout.c |  3 ++-
 fs/binfmt_elf.c  | 26 --
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index b32ad8c678a5..61f7a75ac330 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -62,13 +62,14 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 {
struct inode *inode = d_inode(dentry);
unsigned int ia_valid = attr->ia_valid;
+   int error;
 
/*
 * First check size constraints.  These can't be overridden using
 * ATTR_FORCE.
 */
if (ia_valid & ATTR_SIZE) {
-   int error = inode_newsize_ok(inode, attr->ia_size);
+   error = inode_newsize_ok(inode, attr->ia_size);
if (error)
return error;
}
@@ -105,8 +106,6 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 kill_priv:
/* User has permission for the change */
if (ia_valid & ATTR_KILL_PRIV) {
-   int error;
-
error = security_inode_killpriv(dentry);
if (error)
return error;
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index f0457b6c17dc..4c5e677ec423 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -200,8 +200,9 @@ void make_bad_inode(struct inode *inode)
remove_inode_hash(inode);
 
inode->i_mode = S_IFREG;
-   inode->i_atime = inode->i_mtime = inode->i_ctime =
-   current_time(inode);
+   inode->i_ctime = current_time(inode);
+   inode->i_mtime = inode->i_ctime;
+   inode->i_atime = inode->i_ctime;
inode->i_op = &bad_inode_ops;
inode->i_opflags &= ~IOP_XATTR;
inode->i_fop = &bad_file_ops;
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 92d6b70ddab0..976d5f1565e1 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -97,7 +97,8 @@ static unsigned long __user *create_aout_tables(char __user *p, struct linux_bin
} while (c);
}
put_user(NULL, argv);
-   current->mm->arg_end = current->mm->env_start = (unsigned long)p;
+   current->mm->env_start = (unsigned long)p;
+   current->mm->arg_end = (unsigned long)p;
while (envc-- > 0) {
char c;
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 955927ac2b80..b5e1e0a0917a 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1035,13 +1035,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
unsigned long k, vaddr;
unsigned long total_size = 0;
unsigned long alignment;
+   unsigned long nbyte;
 
if (elf_ppnt->p_type != PT_LOAD)
continue;
 
if (unlikely(elf_brk > elf_bss)) {
-   unsigned long nbyte;
-
/*
 * There was a PT_LOAD segment with p_memsz > p_filesz
 * before this one. Map anonymous pages, if needed,
@@ -1277,10 +1276,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
 */
if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
elf_ex->e_type == ET_DYN && !interpreter) {
-   mm->brk = mm->start_brk = ELF_ET_DYN_BASE;
+   mm->start_brk   = ELF_ET_DYN_BASE;
+   mm->brk = ELF_ET_DYN_BASE;
}
 
-   mm->brk = mm->start_brk = arch_randomize_brk(mm);
+   mm->start_brk = arch_randomize_brk(mm);
+   mm->brk = mm->start_brk;
 #ifdef compat_brk_randomized
current->brk_randomized = 1;
 #endif
@@ -1506,7 +1507,8 @@ static void fill_note(struct memelfnote *note, const char *name, int type,
 static void fill_prstatus(struct elf_prstatus *prstatus,
  struct task_struct *p, long signr)
 {
-   prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
+   prstatus->pr_cursig = signr;
+   prstatus->pr_info.si_signo = signr;
prstatus->pr_sigpend = p->pending.signal.sig[0];
prstatus->pr_sighold = p->blocked.sig[0];
rcu_read_lock();
@@ -1618,6 +1620,7 @@ static int fill_files_note(struct memelfnote *note)
user_long_t *data;
user_long_t *start_end_ofs;
char *name_base, *name_curpos;
+   unsigned int shift_bytes;
 
/* *Estimated* file count and total data size needed */
count = mm->map_count;
@@ -1639,7 +1642,8 @@ static int fill_fil

[PATCH 4/4] fs/binfmt_elf.c: Cosmetic

2020-11-21 Thread Alejandro Colomar

Non-trivial changes:

Invert 'if's to simplify logic.
Use 'goto' in conjunction with the above, when appropriate.
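
The shape of the transformation, reduced to a toy example, looks roughly
like this.  The enum values, both functions, and the constant 100 are
hypothetical and only mirror the structure of the change, not the real
load_bias logic:

```c
enum { KIND_EXEC_SKETCH, KIND_DYN_SKETCH, KIND_OTHER_SKETCH };

/* Before: the long branch lives inside an else-if. */
static int
bias_nested(int kind, int has_interp)
{
	int bias = -1;

	if (kind == KIND_EXEC_SKETCH) {
		bias = 0;
	} else if (kind == KIND_DYN_SKETCH) {
		if (has_interp)
			bias = 100;
		else
			bias = 0;
	}
	return bias;
}

/* After: conditions inverted, early exits via goto, one nesting level less. */
static int
bias_inverted(int kind, int has_interp)
{
	int bias = -1;

	if (kind == KIND_EXEC_SKETCH) {
		bias = 0;
		goto out;
	}
	if (kind != KIND_DYN_SKETCH)
		goto out;
	if (has_interp)
		bias = 100;
	else
		bias = 0;
out:
	return bias;
}
```

Both functions return the same value for every input; only the nesting
depth of the long branch differs.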

Signed-off-by: Alejandro Colomar 
---
 fs/binfmt_elf.c | 115 +---
 1 file changed, 59 insertions(+), 56 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index b5e1e0a0917a..dbd50b5bf238 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1079,65 +1079,68 @@ static int load_elf_binary(struct linux_binprm *bprm)
 */
if (elf_ex->e_type == ET_EXEC || load_addr_set) {
elf_flags |= MAP_FIXED;
-   } else if (elf_ex->e_type == ET_DYN) {
-   /*
-* This logic is run once for the first LOAD Program
-* Header for ET_DYN binaries to calculate the
-* randomization (load_bias) for all the LOAD
-* Program Headers, and to calculate the entire
-* size of the ELF mapping (total_size). (Note that
-* load_addr_set is set to true later once the
-* initial mapping is performed.)
-*
-* There are effectively two types of ET_DYN
-* binaries: programs (i.e. PIE: ET_DYN with INTERP)
-* and loaders (ET_DYN without INTERP, since they
-* _are_ the ELF interpreter). The loaders must
-* be loaded away from programs since the program
-* may otherwise collide with the loader (especially
-* for ET_EXEC which does not have a randomized
-* position). For example to handle invocations of
-* "./ld.so someprog" to test out a new version of
-* the loader, the subsequent program that the
-* loader loads must avoid the loader itself, so
-* they cannot share the same load range. Sufficient
-* room for the brk must be allocated with the
-* loader as well, since brk must be available with
-* the loader.
-*
-* Therefore, programs are loaded offset from
-* ELF_ET_DYN_BASE and loaders are loaded into the
-* independently randomized mmap region (0 load_bias
-* without MAP_FIXED).
-*/
-   if (interpreter) {
-   load_bias = ELF_ET_DYN_BASE;
-   if (current->flags & PF_RANDOMIZE)
-   load_bias += arch_mmap_rnd();
-   alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
-   if (alignment)
-   load_bias &= ~(alignment - 1);
-   elf_flags |= MAP_FIXED;
-   } else
-   load_bias = 0;
+   goto proceed_normally;
+   }
+   if (elf_ex->e_type != ET_DYN)
+   goto proceed_normally;
+   /*
+* This logic is run once for the first LOAD Program
+* Header for ET_DYN binaries to calculate the
+* randomization (load_bias) for all the LOAD
+* Program Headers, and to calculate the entire
+* size of the ELF mapping (total_size). (Note that
+* load_addr_set is set to true later once the
+* initial mapping is performed.)
+*
+* There are effectively two types of ET_DYN
+* binaries: programs (i.e. PIE: ET_DYN with INTERP)
+* and loaders (ET_DYN without INTERP, since they
+* _are_ the ELF interpreter). The loaders must
+* be loaded away from programs since the program
+* may otherwise collide with the loader (especially
+* for ET_EXEC which does not have a randomized
+* position). For example to handle invocations of
+* "./ld.so someprog" to test out a new version of
+* the loader, the subsequent program that the
+* loader loads must avoid the loader itself, so
+* they cannot share the same load range. Sufficient
+* room for the brk must be allocated with the
+* loader as well, since brk must be available with
+* the loader.
+*
+* Therefore, programs are loaded offset from
+* ELF_ET_DYN_BASE and loaders are loaded i
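
The load_bias rule described in the comment above can be sketched in user-space C. This is only an illustration, not the kernel code: SKETCH_ET_DYN_BASE is a made-up placeholder (the real ELF_ET_DYN_BASE is arch-specific), the rnd parameter stands in for the result of arch_mmap_rnd(), and both function names are hypothetical.

```c
#include <stdint.h>

/* Placeholder base address, NOT the real arch-specific ELF_ET_DYN_BASE. */
#define SKETCH_ET_DYN_BASE UINT64_C(0x10000000)

/*
 * Round load_bias down to a multiple of alignment.
 * alignment is assumed to be 0 (no constraint) or a power of two.
 */
static uint64_t sketch_align_load_bias(uint64_t load_bias, uint64_t alignment)
{
	if (alignment)
		load_bias &= ~(alignment - 1);
	return load_bias;
}

/*
 * Choose a load bias for an ET_DYN image.
 * has_interp != 0: a PIE program, placed at base + rnd, aligned down.
 * has_interp == 0: the loader itself, bias 0 (mmap picks the address).
 */
static uint64_t sketch_load_bias(int has_interp, uint64_t rnd,
				 uint64_t alignment)
{
	if (!has_interp)
		return 0;
	return sketch_align_load_bias(SKETCH_ET_DYN_BASE + rnd, alignment);
}
```

The mask only works for power-of-two alignments, which is why the sketch states that as an assumption.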

[PATCH 1/4] fs/anon_inodes.c: Use "%s" + func instead of hardcoding function name

2020-11-21 Thread Alejandro Colomar

Signed-off-by: Alejandro Colomar 
---
 fs/anon_inodes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..7609d208bb53 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -152,11 +152,11 @@ static int __init anon_inode_init(void)
 {
anon_inode_mnt = kern_mount(&anon_inode_fs_type);
if (IS_ERR(anon_inode_mnt))
-   panic("anon_inode_init() kernel mount failed (%ld)\n", PTR_ERR(anon_inode_mnt));
+   panic("%s() kernel mount failed (%ld)\n", __func__, PTR_ERR(anon_inode_mnt));
 
anon_inode_inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
if (IS_ERR(anon_inode_inode))
-   panic("anon_inode_init() inode allocation failed (%ld)\n", PTR_ERR(anon_inode_inode));
+   panic("%s() inode allocation failed (%ld)\n", __func__, PTR_ERR(anon_inode_inode));
 
return 0;
 }
-- 
2.28.0
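
For readers unfamiliar with the idiom in the patch above, here is a small user-space sketch, assuming a C99 compiler. The function name sketch_init_message is hypothetical, and snprintf() stands in for the kernel's panic().

```c
#include <stdio.h>

/*
 * Build an error message that names the enclosing function.
 * __func__ is a C99 predefined identifier; inside this function it
 * evaluates to the string "sketch_init_message".
 */
static int sketch_init_message(char *buf, size_t len, long err)
{
	return snprintf(buf, len, "%s() kernel mount failed (%ld)\n",
			__func__, err);
}
```

Because the name comes from the compiler, the message stays correct if the function is later renamed, which is the motivation stated in the patch subject.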

[PATCH 2/4] fs/anon_inodes.c, fs/attr.c, fs/bad_inode.c, fs/binfmt_aout.c, fs/binfmt_elf.c: Cosmetic

2020-11-21 Thread Alejandro Colomar

This patch contains only trivial changes:
Some of them found with checkpatch.pl in strict mode.

- Remove trailing whitespace.
- Remove spaces coming before tabs.
- Fix typos in comments.
- Convert spaces into tabs.
- Add a space around operators that should have them,
  and remove them when they shouldn't have them around,
  according to coding_style.rst.
- Use braces according to coding_style.rst.
- Align multi-line function prototypes, and other similar cases.
- Remove or add blank lines:
    * Add blank lines after declarations, and before code.
    * Remove blank lines after function definitions and before
      EXPORT_SYMBOL().
- Remove redundant parentheses, when they don't improve readability.
- Fix comments' style according to coding_style.rst.
- Simplify comparisons against NULL, using '!' (or nothing at all).
- Use C89 comments (/* */), instead of C99 (//).
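
A few of the conventions listed above, shown together in a small self-contained sketch. All identifiers (struct sketch_node, SKETCH_A, SKETCH_B, sketch_count) are hypothetical and do not come from the patched files.

```c
#include <stddef.h>

#define SKETCH_A 0x1u
#define SKETCH_B 0x2u

struct sketch_node {
	struct sketch_node *next;
	unsigned int flags;
};

/* Count the nodes that have both SKETCH_A and SKETCH_B set. */
static size_t sketch_count(const struct sketch_node *head)
{
	size_t n = 0;

	/* Blank line after declarations; '!' instead of "== NULL". */
	if (!head)
		return 0;

	/* A bare pointer as the loop condition instead of "!= NULL". */
	for (; head; head = head->next) {
		/* Spaces around the binary '|' operator. */
		if ((head->flags & (SKETCH_A | SKETCH_B)) ==
		    (SKETCH_A | SKETCH_B))
			n++;
	}

	return n;
}
```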

Signed-off-by: Alejandro Colomar 
---
 fs/anon_inodes.c |   1 +
 fs/attr.c        |   7 +--
 fs/bad_inode.c   |  50 +-
 fs/binfmt_aout.c |  94 +-
 fs/binfmt_elf.c  | 129 ---
 5 files changed, 143 insertions(+), 138 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 7609d208bb53..bef68dbcbb88 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -43,6 +43,7 @@ static const struct dentry_operations anon_inodefs_dentry_operations = {
 static int anon_inodefs_init_fs_context(struct fs_context *fc)
 {
struct pseudo_fs_context *ctx = init_pseudo(fc, ANON_INODE_FS_MAGIC);
+
if (!ctx)
return -ENOMEM;
ctx->dops = &anon_inodefs_dentry_operations;
diff --git a/fs/attr.c b/fs/attr.c
index b4bbdbd4c8ca..b32ad8c678a5 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -64,7 +64,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
unsigned int ia_valid = attr->ia_valid;
 
/*
-* First check size constraints.  These can't be overriden using
+* First check size constraints.  These can't be overridden using
 * ATTR_FORCE.
 */
if (ia_valid & ATTR_SIZE) {
@@ -220,7 +220,8 @@ EXPORT_SYMBOL(setattr_copy);
  * the file open for write, as there can be no conflicting delegation in
  * that case.
  */
-int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
+int notify_change(struct dentry *dentry, struct iattr *attr,
+ struct inode **delegated_inode)
 {
struct inode *inode = dentry->d_inode;
umode_t mode = inode->i_mode;
@@ -284,7 +285,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de
 * no function will ever call notify_change with both ATTR_MODE and
 * ATTR_KILL_S*ID set.
 */
-   if ((ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID)) &&
+   if ((ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID)) &&
(ia_valid & ATTR_MODE))
BUG();
 
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 54f0ce444272..f0457b6c17dc 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -22,25 +22,25 @@ static int bad_file_open(struct inode *inode, struct file *filp)
return -EIO;
 }
 
-static const struct file_operations bad_file_ops =
-{
+static const struct file_operations bad_file_ops = {
.open   = bad_file_open,
 };
 
-static int bad_inode_create (struct inode *dir, struct dentry *dentry,
-   umode_t mode, bool excl)
+static int bad_inode_create(struct inode *dir, struct dentry *dentry,
+   umode_t mode, bool excl)
 {
return -EIO;
 }
 
 static struct dentry *bad_inode_lookup(struct inode *dir,
-   struct dentry *dentry, unsigned int flags)
+  struct dentry *dentry,
+  unsigned int flags)
 {
return ERR_PTR(-EIO);
 }
 
-static int bad_inode_link (struct dentry *old_dentry, struct inode *dir,
-   struct dentry *dentry)
+static int bad_inode_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
 {
return -EIO;
 }
@@ -50,25 +50,25 @@ static int bad_inode_unlink(struct inode *dir, struct dentry *dentry)
return -EIO;
 }
 
-static int bad_inode_symlink (struct inode *dir, struct dentry *dentry,
-   const char *symname)
+static int bad_inode_symlink(struct inode *dir, struct dentry *dentry,
+const char *symname)
 {
return -EIO;
 }
 
 static int bad_inode_mkdir(struct inode *dir, struct dentry *dentry,
-   umode_t mode)
+  umode_t mode)
 {
return -EIO;
 }
 
-static int bad_inode_rmdir (struct inode *dir, struct dentry *dentry)
+static int bad_inode_rmdir(struct inode *dir, struct dentry *dentry)
 {
