Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-15 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted because UEFI/BIOS does not run with SME active. In order to access
> this data properly it needs to be mapped decrypted.
> 
> The early_memremap() support is updated to provide an arch specific

"Update early_memremap() to provide... "

> routine to modify the pagetable protection attributes before they are
> applied to the new mapping. This is used to remove the encryption mask
> for boot related data.
> 
> The memremap() support is updated to provide an arch specific routine

Ditto. Passive tone always reads harder than an active tone,
"doer"-sentence.

> to determine if RAM remapping is allowed.  RAM remapping will cause an
> encrypted mapping to be generated. By preventing RAM remapping,
> ioremap_cache() will be used instead, which will provide a decrypted
> mapping of the boot related data.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/io.h |4 +
>  arch/x86/mm/ioremap.c |  182 
> +
>  include/linux/io.h|2 
>  kernel/memremap.c |   20 -
>  mm/early_ioremap.c|   18 
>  5 files changed, 219 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index 7afb0e2..75f2858 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -381,4 +381,8 @@ extern int __must_check arch_phys_wc_add(unsigned long 
> base,
>  #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
>  #endif
>  
> +extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size,
> +unsigned long flags);
> +#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
> +
>  #endif /* _ASM_X86_IO_H */
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 9bfcb1f..bce0604 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -21,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "physaddr.h"
>  
> @@ -419,6 +421,186 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>   iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>  }
>  
> +/*
> + * Examine the physical address to determine if it is an area of memory
> + * that should be mapped decrypted.  If the memory is not part of the
> + * kernel usable area it was accessed and created decrypted, so these
> + * areas should be mapped decrypted.
> + */
> +static bool memremap_should_map_decrypted(resource_size_t phys_addr,
> +   unsigned long size)
> +{
> + /* Check if the address is outside kernel usable area */
> + switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
> + case E820_TYPE_RESERVED:
> + case E820_TYPE_ACPI:
> + case E820_TYPE_NVS:
> + case E820_TYPE_UNUSABLE:
> + return true;
> + default:
> + break;
> + }
> +
> + return false;
> +}
> +
> +/*
> + * Examine the physical address to determine if it is EFI data. Check
> + * it against the boot params structure and EFI tables and memory types.
> + */
> +static bool memremap_is_efi_data(resource_size_t phys_addr,
> +  unsigned long size)
> +{
> + u64 paddr;
> +
> + /* Check if the address is part of EFI boot/runtime data */
> + if (efi_enabled(EFI_BOOT)) {

Save indentation level:

if (!efi_enabled(EFI_BOOT))
return false;


> + paddr = boot_params.efi_info.efi_memmap_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_memmap;
> + if (phys_addr == paddr)
> + return true;
> +
> + paddr = boot_params.efi_info.efi_systab_hi;
> + paddr <<= 32;
> + paddr |= boot_params.efi_info.efi_systab;

So those two above look like they could be two global vars which are
initialized somewhere in the EFI init path:

efi_memmap_phys and efi_systab_phys or so.

Matt ?

And then you won't need to create that paddr each time on the fly. I
mean, it's not a lot of instructions but still...
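
A minimal user-space sketch of that idea, for illustration only. It assumes
the two globals are named efi_memmap_phys and efi_systab_phys as floated
above; struct efi_info_sketch, join_hi_lo(), efi_cache_phys_addrs() and
is_cached_efi_addr() are hypothetical names that exist neither in the patch
nor in the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant boot_params.efi_info fields. */
struct efi_info_sketch {
	uint32_t efi_memmap;
	uint32_t efi_memmap_hi;
	uint32_t efi_systab;
	uint32_t efi_systab_hi;
};

/* Names suggested in the review; assumed here, not existing kernel symbols. */
static uint64_t efi_memmap_phys;
static uint64_t efi_systab_phys;

/* Join the high and low 32-bit halves into one 64-bit physical address. */
static uint64_t join_hi_lo(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Would run once in the EFI init path instead of on every lookup. */
static void efi_cache_phys_addrs(const struct efi_info_sketch *info)
{
	efi_memmap_phys = join_hi_lo(info->efi_memmap_hi, info->efi_memmap);
	efi_systab_phys = join_hi_lo(info->efi_systab_hi, info->efi_systab);
}

/* The per-call check then reduces to two comparisons. */
static bool is_cached_efi_addr(uint64_t phys_addr)
{
	return phys_addr == efi_memmap_phys || phys_addr == efi_systab_phys;
}
```

With the addresses cached like this, the shift-and-or sequence runs once at
init time and the lookup path only compares.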

> + if (phys_addr == paddr)
> + return true;
> +
> + if (efi_table_address_match(phys_addr))
> + return true;
> +
> + switch (efi_mem_type(phys_addr)) {
> + case EFI_BOOT_SERVICES_DATA:
> + case EFI_RUNTIME_SERVICES_DATA:
> + return true;
> + default:
> + break;
> + }
> + }
> +
> + return false;
> +}
> +
> +/*
> + * Examine the physical address to determine if it is boot data by checking
> + * it against the boot params setup_data chain.
> + */
> +static bool memremap_is_setup_data(resou

Re: [PATCH v5 14/32] efi: Add an EFI table address match function

2017-05-15 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:18:48PM -0500, Tom Lendacky wrote:
> Add a function that will determine if a supplied physical address matches
> the address of an EFI table.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  drivers/firmware/efi/efi.c |   33 +
>  include/linux/efi.h|7 +++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index b372aad..8f606a3 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
>  };
>  EXPORT_SYMBOL(efi);
>  
> +static unsigned long *efi_tables[] = {
> + &efi.mps,
> + &efi.acpi,
> + &efi.acpi20,
> + &efi.smbios,
> + &efi.smbios3,
> + &efi.sal_systab,
> + &efi.boot_info,
> + &efi.hcdp,
> + &efi.uga,
> + &efi.uv_systab,
> + &efi.fw_vendor,
> + &efi.runtime,
> + &efi.config_table,
> + &efi.esrt,
> + &efi.properties_table,
> + &efi.mem_attr_table,
> +};
> +
>  static bool disable_runtime;
>  static int __init setup_noefi(char *arg)
>  {
> @@ -854,6 +873,20 @@ int efi_status_to_err(efi_status_t status)
>   return err;
>  }
>  
> +bool efi_table_address_match(unsigned long phys_addr)

efi_is_table_address() reads easier/better in the code.
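
For illustration, a user-space sketch of how such a predicate could walk an
array of pointers like efi_tables[] above. The function body is not visible
in the quoted hunk, so everything here is an assumption: struct efi_sketch,
efi_tables_sketch[] and SKETCH_INVALID_ADDR are hypothetical stand-ins, and
only the name efi_is_table_address() comes from the suggestion:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel for "this table was not provided by firmware". */
#define SKETCH_INVALID_ADDR (~0UL)

/* Hypothetical stand-in for a few address fields of the kernel's struct efi. */
struct efi_sketch {
	unsigned long acpi;
	unsigned long acpi20;
	unsigned long smbios;
};

static struct efi_sketch efi_sketch = {
	.acpi   = SKETCH_INVALID_ADDR,
	.acpi20 = SKETCH_INVALID_ADDR,
	.smbios = SKETCH_INVALID_ADDR,
};

/* Pointers to the fields, so addresses filled in later are still seen. */
static unsigned long *efi_tables_sketch[] = {
	&efi_sketch.acpi,
	&efi_sketch.acpi20,
	&efi_sketch.smbios,
};

/* Return true if phys_addr equals the address recorded for any table. */
static bool efi_is_table_address(unsigned long phys_addr)
{
	size_t i;
	size_t n = sizeof(efi_tables_sketch) / sizeof(efi_tables_sketch[0]);

	/* The sentinel marks unset entries, so it never counts as a match. */
	if (phys_addr == SKETCH_INVALID_ADDR)
		return false;

	for (i = 0; i < n; i++)
		if (*efi_tables_sketch[i] == phys_addr)
			return true;

	return false;
}
```

At a call site, "if (efi_is_table_address(phys_addr))" reads as a question,
which is the point of the rename.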

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/17] convert/reorganize Documentation/security/

2017-05-15 Thread Kees Cook
On Mon, May 15, 2017 at 10:26 AM, Jonathan Corbet  wrote:
> On Sat, 13 May 2017 04:51:36 -0700
> Kees Cook  wrote:
>
>> This ReSTifies everything under Documentation/security/, and reorganizes
>> some of it (mainly the LSMs) under /admin-guide/ per Jon's request. Since
>> /security/ is already being indexed under the kernel development portion
>> of the sphinx index, I didn't move it, keeping only things that were
>> directly related to internal kernel development (keys, creds, etc).
>>
>> I also updated some path references, and MAINTAINERS lines. Some of the
>> conversion could probably do with some tweaks, but I think this is a
>> good first step in the right direction.
>
> This all looks pretty good to me, though I'll confess I haven't actually
> built the resulting docs yet.  Assuming no issues turn up there, I'd be
> happy to just apply these and let any follow-on tweaks go from there.
> Thanks for doing this, and for humoring me on the organizational issues :)

My local tree builds the docs sanely from what I can see, so if it
looks good to you too, yeah, please take these as they are. I'm sure
we'll need tweaks going forward, but this seems like the bulk of the
organizational and basic ReST work.

BTW, something I noticed while doing this conversion is the
difference between the section headings and the left-side nav bar at
the top level:
https://www.kernel.org/doc/html/latest/index.html

The section names don't match, and each of the Kernel API
Documentation sections is at the top level in the nav bar. I think
this would be cleaner if everything matched up, but I didn't yet dig
into figuring out why they were different.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 00/17] convert/reorganize Documentation/security/

2017-05-15 Thread Jonathan Corbet
On Sat, 13 May 2017 04:51:36 -0700
Kees Cook  wrote:

> This ReSTifies everything under Documentation/security/, and reorganizes
> some of it (mainly the LSMs) under /admin-guide/ per Jon's request. Since
> /security/ is already being indexed under the kernel development portion
> of the sphinx index, I didn't move it, keeping only things that were
> directly related to internal kernel development (keys, creds, etc).
> 
> I also updated some path references, and MAINTAINERS lines. Some of the
> conversion could probably do with some tweaks, but I think this is a
> good first step in the right direction.

This all looks pretty good to me, though I'll confess I haven't actually
built the resulting docs yet.  Assuming no issues turn up there, I'd be
happy to just apply these and let any follow-on tweaks go from there.
Thanks for doing this, and for humoring me on the organizational issues :)

jon


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Mauro Carvalho Chehab
On Mon, 15 May 2017 13:49:19 +0200
Peter Zijlstra  wrote:

> On Mon, May 15, 2017 at 01:29:58PM +0300, Jani Nikula wrote:
> > On Mon, 15 May 2017, Peter Zijlstra  wrote:  
> > > The intention is to aid readability. Making comments worse so that some
> > > retarded script can generate better html or whatnot is just that,
> > > retarded.
> > >
> > > Code matters, generated documentation not so much. I'll take a comment
> > > that reads well over one that generates pretty html any day.  
> > 
> > The deal is that if you start your comments with "/**" they'll be
> > processed with the retarded script to produce pretty html.
> > 
> > For the most part the comments that generate pretty html also read well,
> > and we don't expect or want anyone to go overboard with markup. I don't
> > think it's unreasonable to make small concessions to improve generated
> > documentation for people who care about it even if you don't.  
> 
> No. Such a concession has pure negative value. It opens the door to more
> patches converting this or that comment to be prettier or whatnot. And
> before you know it there's a Markus like idiot spamming you with dozens
> of crap patches to prettify the generated crud.

I see your point. Nobody wants a pile of senseless random prettify patches
in their queue. Yet, on the other hand, nobody wants lots of warnings/errors
produced when building the Kernel or the documentation, as they can hide
important things that would require fixes. So, subsystem maintainers
need to find what works best for the subsystems they care for. That's
no different than accepting/rejecting a random patch.

That said, from my side, I don't like the way ReST handles indentation.
I would have preferred some markup dialect that would be less sensitive
to it. My personal preference would have been to use docutils or doxygen
(but I'm pretty sure they would have other limitations).

Yet, ReST is not a bad choice, as it allows extending its syntax by
writing Python scripts and adding them to our tree, which has been an
interesting way to extend it to our needs.

Yet, every parser/dialect has limitations. We have to deal with them
somehow.

Currently, kernel-doc avoids some indentation issues. For example,
in the code below:

/**
 *@v:foo
 *bar
 ...
 */

The position of '@v' output, in ReST, would mangle indentation,
depending on the way it is converted. Yet, kernel-doc handles it
well. So, at least in some cases, kernel-doc works fine with
indentation differences, making it transparent to the user.
The above produces the following ReST output:

**Parameters**

``v``
  foo
  bar

Both "Parameters" and "v" will be bold; "v" will use a monospaced
font[1].

So, at least for most parameter/description indentation, kernel-doc
does the right thing.
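
As a concrete illustration, here is a small, self-contained kernel-doc
comment in that style. The function clamp_to_limit() is a hypothetical
example, not taken from any patch in this series; the point is only the
shape of the parameter descriptions, including one that continues on a
second line:

```c
/**
 * clamp_to_limit - clamp a value to an upper limit
 * @v:     value to clamp; this description continues on a
 *         second line with its own indentation
 * @limit: largest value that may be returned
 *
 * Return: @v if it does not exceed @limit, otherwise @limit.
 */
static int clamp_to_limit(int v, int limit)
{
	return v > limit ? limit : v;
}
```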

[1] Btw, the *only* way I found in ReST notation to produce a bold
    monospaced font is to use:

``foo``
  bar

As doing **``foo``** or ``**foo**`` won't work - at least with
Sphinx up to version 1.4.

At least in media, some variables are enums, and we want to describe
the possible values of those enums, as in this kernel-doc comment
snippet:

 * Entities have flags that describe the entity capabilities and state:
 *
 * %MEDIA_ENT_FL_DEFAULT
 *indicates the default entity for a given type.
 *This can be used to report the default audio and video devices or the
 *default camera sensor.
 *

For this to work, kernel-doc must not mangle whitespace; it has to pass the
indentation through to Sphinx.

So, I fail to see a way to avoid fixing the few cases where the
indentation doesn't follow what's expected by ReST.

Yet, if you prefer a minimalist change, I can remove the ReST-specific
dialect, as in the enclosed patch.

> Not to mention that this would mean having to learn this rest crud in
> order to write these comments.
> 
> All things I'm not prepared to do.
> 
> 
> I'm all for useful comments, but I see no value _at_all_ in this
> generated nonsense. The only reason I sometimes use the docbook comment
> style is because its fairly uniform and the build bot gets you a warning
> when your function signature no longer matches with the comment. But
> if you make this painful I'll simply stop using them.

Thanks,
Mauro

[PATCH v2] mutex, futex: adjust kernel-doc markups to generate ReST

A few kernel-doc markups were causing trouble with kernel-doc
output in ReST format.
Fix them.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab index, file_inode(vma->vm_file),
  * offset_within_page).  For private mappings, it's (uaddr, current->mm).
@@ -1259,9 +1259,9 @@ static int lock_pi_update_atomic(u32 __user *uaddr, u32 
uval, u32 newval)
  * @set_waiters:   force setting the FUTEX_WAITERS bit (1) or not (0)
  *
  * Return:
- *  0 - ready to wait;
- *  1 - acquired the lock;
- * <0 - error
+ *  -  0 - ready to wait;
+

Re: [PATCH 06/17] doc: security: minor cleanups to build kernel-doc

2017-05-15 Thread Jonathan Corbet
On Mon, 15 May 2017 09:17:25 +1000 (AEST)
James Morris  wrote:

> On Sat, 13 May 2017, Kees Cook wrote:
> 
> > These fixes were needed to parse lsm_hooks.h kernel-doc. More work is
> > needed, but this is the first step.
> > 
> > Cc: Casey Schaufler 
> > Signed-off-by: Kees Cook   
> 
> Should these changes go in via the docs tree or mine?
> 
> In any case:
> Acked-by: James Morris 

These changes are entirely independent from the documentation stuff, so it
could really go either way.  I'm happy to carry them with the set, if that
works for you.

Thanks,

jon


Re: [PATCH 0/5] Convert more books to ReST

2017-05-15 Thread Jonathan Corbet
On Mon, 15 May 2017 14:09:12 +0200
Boris Brezillon  wrote:

> >   mtd: adjust kernel-docs to avoid Sphinx/kerneldoc warnings  
> 
> Not sure how you plan to merge these changes, but if it goes through
> a single tree I'll probably need an immutable topic branch, because I
> plan to change a few things in nand_base.c nand.h for the next release.

docs-next doesn't rebase, so there shouldn't be trouble there.  But we
could also just separate this patch into two pieces.  I suspect we could
live with a couple of warnings for a period during the 4.13 merge window
without too much pain...

jon


Re: [PATCH 01/36] docs-rst: convert kernel-hacking to ReST

2017-05-15 Thread Jonathan Corbet
On Fri, 12 May 2017 10:59:44 -0300
Mauro Carvalho Chehab  wrote:

>  Documentation/DocBook/Makefile|2 +-
>  Documentation/DocBook/kernel-hacking.tmpl | 1312 
> -
>  Documentation/conf.py |2 +
>  Documentation/index.rst   |1 +
>  Documentation/kernel-hacking/conf.py  |   10 +
>  Documentation/kernel-hacking/index.rst|  794 +

So I was looking at part 3, wondering why I was seeing this material
again.  If you redo the series, it might be nice to just land it in
hacking.rst to begin with and reduce the subsequent churn a little bit.

jon


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Darren Hart
On Mon, May 15, 2017 at 01:49:19PM +0200, Peter Zijlstra wrote:
> On Mon, May 15, 2017 at 01:29:58PM +0300, Jani Nikula wrote:
> > On Mon, 15 May 2017, Peter Zijlstra  wrote:
> > > The intention is to aid readability. Making comments worse so that some
> > > retarded script can generate better html or whatnot is just that,
> > > retarded.
> > >
> > > Code matters, generated documentation not so much. I'll take a comment
> > > that reads well over one that generates pretty html any day.
> > 
> > The deal is that if you start your comments with "/**" they'll be
> > processed with the retarded script to produce pretty html.
> > 
> > For the most part the comments that generate pretty html also read well,
> > and we don't expect or want anyone to go overboard with markup. I don't
> > think it's unreasonable to make small concessions to improve generated
> > documentation for people who care about it even if you don't.
> 
> No. Such a concession has pure negative value. It opens the door to more
> patches converting this or that comment to be prettier or whatnot. And
> before you know it there's a Markus like idiot spamming you with dozens
> of crap patches to prettify the generated crud.

Well that I can certainly understand.

> 
> Not to mention that this would mean having to learn this rest crud in
> order to write these comments.

I have complete confidence in you here Peter :-b

> 
> All things I'm not prepared to do.
> 
> 
> I'm all for useful comments, but I see no value _at_all_ in this
> generated nonsense. The only reason I sometimes use the docbook comment
> style is because its fairly uniform and the build bot gets you a warning
> when your function signature no longer matches with the comment. But
> if you make this painful I'll simply stop using them.
> 

Making documentation more accessible to people is a good thing. This type of
automated publication reduces the barrier to access. The lack of this kind of
tooling, honestly, also discourages participation among some groups of
capable contributors.

That said, I support the direction both Mauro and Peter have voiced to minimize
the impact to comment blocks. What does rest do with this formatting it doesn't
understand - does it fail gracefully? Falling back to  or something
like that?

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 21/36] fs: locks: Fix some troubles at kernel-doc comments

2017-05-15 Thread J. Bruce Fields
On Sat, May 13, 2017 at 06:14:06AM -0300, Mauro Carvalho Chehab wrote:
> Hi Jeff,
> 
> On Fri, 12 May 2017 10:02:56 -0400
> Jeff Layton  wrote:
> 
> > On Fri, 2017-05-12 at 11:00 -0300, Mauro Carvalho Chehab wrote:
> > > There are a few syntax violations that cause outputs of
> > > a few comments to not be properly parsed in ReST format.
> > > 
> > > No functional changes.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab 
> > > ---
> > >  fs/locks.c | 18 --
> > >  1 file changed, 8 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 26811321d39b..bdce708e4251 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -1858,8 +1858,8 @@ EXPORT_SYMBOL(generic_setlease);
> > >   *
> > >   * Call this to establish a lease on the file. The "lease" argument is 
> > > not
> > >   * used for F_UNLCK requests and may be NULL. For commands that set or 
> > > alter
> > > - * an existing lease, the (*lease)->fl_lmops->lm_break operation must be 
> > > set;
> > > - * if not, this function will return -ENOLCK (and generate a 
> > > scary-looking
> > > + * an existing lease, the ``(*lease)->fl_lmops->lm_break`` operation 
> > > must be
> > > + * set; if not, this function will return -ENOLCK (and generate a 
> > > scary-looking
> > >   * stack trace).
> > >   *
> > >   * The "priv" pointer is passed directly to the lm_setup function as-is. 
> > > It
> > > @@ -1972,15 +1972,13 @@ EXPORT_SYMBOL(locks_lock_inode_wait);
> > >   *   @cmd: the type of lock to apply.
> > >   *
> > >   *   Apply a %FL_FLOCK style lock to an open file descriptor.
> > > - *   The @cmd can be one of
> > > + *   The @cmd can be one of:
> > >   *
> > > - *   %LOCK_SH -- a shared lock.
> > > - *
> > > - *   %LOCK_EX -- an exclusive lock.
> > > - *
> > > - *   %LOCK_UN -- remove an existing lock.
> > > - *
> > > - *   %LOCK_MAND -- a `mandatory' flock.  This exists to emulate 
> > > Windows Share Modes.
> > > + *   - %LOCK_SH -- a shared lock.
> > > + *   - %LOCK_EX -- an exclusive lock.
> > > + *   - %LOCK_UN -- remove an existing lock.
> > > + *   - %LOCK_MAND -- a 'mandatory' flock.
> > > + * This exists to emulate Windows Share Modes.
> > >   *
> > >   *   %LOCK_MAND can be combined with %LOCK_READ or %LOCK_WRITE to 
> > > allow other
> > >   *   processes read and write access respectively.  
> > 
> > LGTM. Do you need me or Bruce to pick this one up?
> 
> Feel free to pick it, if it works best for you.
> 
> > Reviewed-by: Jeff Layton 

I'll take it for 4.13.  Thanks!

--b.


Re: [PATCH 06/17] doc: security: minor cleanups to build kernel-doc

2017-05-15 Thread Kees Cook
On Sun, May 14, 2017 at 5:00 PM, Casey Schaufler  wrote:
> On 5/13/2017 4:51 AM, Kees Cook wrote:
>> These fixes were needed to parse lsm_hooks.h kernel-doc. More work is
>> needed, but this is the first step.
>>
>> Cc: Casey Schaufler 
>> Signed-off-by: Kees Cook 
>
> Acked_by: Casey Schaufler 
>
> Tell me more about the additional work that's needed.

What I wanted to do was insert the kernel-doc from lsm_hooks.h into
the LSM kernel API documentation .rst file (via the special ReST
markup that includes structure documentation). There is, however,
free-form text in the existing union security_list_options kernel-doc
to announce related function groups which ReST just kind of skips over
and then collects all at the end in the HTML output. It also orders the
HTML doc output by the struct ordering, which makes things even
stranger to parse. And additionally, kernel-doc for fields that are
function pointers is especially hard to read.

So, while this patch fixes the kernel-doc to at least be parsed
without errors, it doesn't really fix the overall appearance, which
I'm not sure how to fix yet. It might be possible to do per-struct
"/** DOC:" markup, but I decided to leave that for another pass in the
future.

If there is a v2 of this series, I'll update the changelog to include
these details. :)
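
To make the function-pointer case concrete, here is a small sketch of
member-level kernel-doc on a hook structure. Everything in it is
hypothetical (struct sample_hooks, allow_open(), deny_open() and
call_file_open() are not from lsm_hooks.h); it only shows the
"@member:" form, with the trailing colon, that the patch below fixes up:

```c
#include <stddef.h>

/**
 * struct sample_hooks - hypothetical hook list documented per member
 * @file_open:  Check permission before a file is opened.
 *              Return 0 if permission is granted.
 * @file_close: Called when a file is closed. May be NULL.
 */
struct sample_hooks {
	int (*file_open)(const char *path);
	void (*file_close)(const char *path);
};

/* Example hook: grant every request. */
static int allow_open(const char *path)
{
	(void)path;
	return 0;
}

/* Example hook: refuse every request. */
static int deny_open(const char *path)
{
	(void)path;
	return -1;
}

/* Call the hook if one is installed; treat a missing hook as "granted". */
static int call_file_open(const struct sample_hooks *hooks, const char *path)
{
	return hooks->file_open ? hooks->file_open(path) : 0;
}
```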

-Kees

>> ---
>>  include/linux/lsm_hooks.h | 25 -
>>  1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index 080f34e66017..a1eeaf603d2f 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -29,6 +29,8 @@
>>  #include 
>>
>>  /**
>> + * union security_list_options - Linux Security Module hook function list
>> + *
>>   * Security hooks for program execution operations.
>>   *
>>   * @bprm_set_creds:
>> @@ -193,8 +195,8 @@
>>   *   @value will be set to the allocated attribute value.
>>   *   @len will be set to the length of the value.
>>   *   Returns 0 if @name and @value have been successfully set,
>> - *   -EOPNOTSUPP if no security attribute is needed, or
>> - *   -ENOMEM on memory allocation failure.
>> + *   -EOPNOTSUPP if no security attribute is needed, or
>> + *   -ENOMEM on memory allocation failure.
>>   * @inode_create:
>>   *   Check permission to create a regular file.
>>   *   @dir contains inode structure of the parent of the new file.
>> @@ -510,8 +512,7 @@
>>   *   process @tsk.  Note that this hook is sometimes called from interrupt.
>>   *   Note that the fown_struct, @fown, is never outside the context of a
>>   *   struct file, so the file structure (and associated security 
>> information)
>> - *   can always be obtained:
>> - *   container_of(fown, struct file, f_owner)
>> + *   can always be obtained: container_of(fown, struct file, f_owner)
>>   *   @tsk contains the structure of task receiving signal.
>>   *   @fown contains the file owner information.
>>   *   @sig is the signal that will be sent.  When 0, kernel sends SIGIO.
>> @@ -521,7 +522,7 @@
>>   *   to receive an open file descriptor via socket IPC.
>>   *   @file contains the file structure being received.
>>   *   Return 0 if permission is granted.
>> - * @file_open
>> + * @file_open:
>>   *   Save open-time permission checking state for later use upon
>>   *   file_permission, and recheck access if anything has changed
>>   *   since inode_permission.
>> @@ -1143,7 +1144,7 @@
>>   *   @sma contains the semaphore structure.  May be NULL.
>>   *   @cmd contains the operation to be performed.
>>   *   Return 0 if permission is granted.
>> - * @sem_semop
>> + * @sem_semop:
>>   *   Check permissions before performing operations on members of the
>>   *   semaphore set @sma.  If the @alter flag is nonzero, the semaphore set
>>   *   may be modified.
>> @@ -1153,20 +1154,20 @@
>>   *   @alter contains the flag indicating whether changes are to be made.
>>   *   Return 0 if permission is granted.
>>   *
>> - * @binder_set_context_mgr
>> + * @binder_set_context_mgr:
>>   *   Check whether @mgr is allowed to be the binder context manager.
>>   *   @mgr contains the task_struct for the task being registered.
>>   *   Return 0 if permission is granted.
>> - * @binder_transaction
>> + * @binder_transaction:
>>   *   Check whether @from is allowed to invoke a binder transaction call
>>   *   to @to.
>>   *   @from contains the task_struct for the sending task.
>>   *   @to contains the task_struct for the receiving task.
>> - * @binder_transfer_binder
>> + * @binder_transfer_binder:
>>   *   Check whether @from is allowed to transfer a binder reference to @to.
>>   *   @from contains the task_struct for the sending task.
>>   *   @to contains the task_struct for the receiving task.
>> - * @binder_transfer_file
>> + * @binder_transfer_file:
>>   *   Check whether @from is allowed to transfer @file to @to.
>>   *   @from contains the task_struct for the sending task.
>>   *   @file contains t

Re: [PATCH 13/17] doc: ReSTify Smack.txt

2017-05-15 Thread Casey Schaufler
On 5/13/2017 4:51 AM, Kees Cook wrote:
> Adjusts for ReST markup and moves under LSM admin guide.
>
> Cc: Casey Schaufler 
> Signed-off-by: Kees Cook 

Acked-by: Casey Schaufler 

Thank you.

> ---
>  .../Smack.txt => admin-guide/LSM/Smack.rst}| 273 
> ++---
>  Documentation/admin-guide/LSM/index.rst|   1 +
>  Documentation/security/00-INDEX|   2 -
>  MAINTAINERS|   2 +-
>  4 files changed, 191 insertions(+), 87 deletions(-)
>  rename Documentation/{security/Smack.txt => admin-guide/LSM/Smack.rst} (85%)
>
> diff --git a/Documentation/security/Smack.txt 
> b/Documentation/admin-guide/LSM/Smack.rst
> similarity index 85%
> rename from Documentation/security/Smack.txt
> rename to Documentation/admin-guide/LSM/Smack.rst
> index 945cc633d883..6a5826a13aea 100644
> --- a/Documentation/security/Smack.txt
> +++ b/Documentation/admin-guide/LSM/Smack.rst
> @@ -1,3 +1,6 @@
> +=
> +Smack
> +=
>  
>  
>  "Good for you, you've decided to clean the elevator!"
> @@ -14,6 +17,7 @@ available to determine which is best suited to the problem
>  at hand.
>  
>  Smack consists of three major components:
> +
>  - The kernel
>  - Basic utilities, which are helpful but not required
>  - Configuration data
> @@ -39,16 +43,24 @@ The current git repository for Smack user space is:
>  This should make and install on most modern distributions.
>  There are five commands included in smackutil:
>  
> -chsmack- display or set Smack extended attribute values
> -smackctl   - load the Smack access rules
> -smackaccess - report if a process with one label has access
> -  to an object with another
> +chsmack:
> + display or set Smack extended attribute values
> +
> +smackctl:
> + load the Smack access rules
> +
> +smackaccess:
> + report if a process with one label has access
> + to an object with another
>  
>  These two commands are obsolete with the introduction of
>  the smackfs/load2 and smackfs/cipso2 interfaces.
>  
> -smackload  - properly formats data for writing to smackfs/load
> -smackcipso - properly formats data for writing to smackfs/cipso
> +smackload:
> + properly formats data for writing to smackfs/load
> +
> +smackcipso:
> + properly formats data for writing to smackfs/cipso
>  
>  In keeping with the intent of Smack, configuration data is
>  minimal and not strictly required. The most important
> @@ -56,15 +68,15 @@ configuration step is mounting the smackfs pseudo 
> filesystem.
>  If smackutil is installed the startup script will take care
>  of this, but it can be manually as well.
>  
> -Add this line to /etc/fstab:
> +Add this line to ``/etc/fstab``::
>  
>  smackfs /sys/fs/smackfs smackfs defaults 0 0
>  
> -The /sys/fs/smackfs directory is created by the kernel.
> +The ``/sys/fs/smackfs`` directory is created by the kernel.
>  
>  Smack uses extended attributes (xattrs) to store labels on filesystem
>  objects. The attributes are stored in the extended attribute security
> -name space. A process must have CAP_MAC_ADMIN to change any of these
> +name space. A process must have ``CAP_MAC_ADMIN`` to change any of these
>  attributes.
>  
>  The extended attributes that Smack uses are:
> @@ -73,14 +85,17 @@ SMACK64
>   Used to make access control decisions. In almost all cases
>   the label given to a new filesystem object will be the label
>   of the process that created it.
> +
>  SMACK64EXEC
>   The Smack label of a process that execs a program file with
>   this attribute set will run with this attribute's value.
> +
>  SMACK64MMAP
>   Don't allow the file to be mmapped by a process whose Smack
>   label does not allow all of the access permitted to a process
>   with the label contained in this attribute. This is a very
>   specific use case for shared libraries.
> +
>  SMACK64TRANSMUTE
>   Can only have the value "TRUE". If this attribute is present
>   on a directory when an object is created in the directory and
> @@ -89,27 +104,29 @@ SMACK64TRANSMUTE
>   gets the label of the directory instead of the label of the
>   creating process. If the object being created is a directory
>   the SMACK64TRANSMUTE attribute is set as well.
> +
>  SMACK64IPIN
>   This attribute is only available on file descriptors for sockets.
>   Use the Smack label in this attribute for access control
>   decisions on packets being delivered to this socket.
> +
>  SMACK64IPOUT
>   This attribute is only available on file descriptors for sockets.
>   Use the Smack label in this attribute for access control
>   decisions on packets coming from this socket.
>  
> -There are multiple ways to set a Smack label on a file:
> +There are multiple ways to set a Smack label on a file::
>  
>  # attr -S -s SMACK64 -V "value" path
>  # chsmack -a value path
>  
>  A process can see the Smack label it is ru

Re: [PATCH] mm, docs: update memory.stat description with workingset* entries

2017-05-15 Thread Johannes Weiner
On Thu, May 11, 2017 at 08:18:13PM +0100, Roman Gushchin wrote:
> Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in
> cache workingset transition") introduced three new entries in memory
> stat file:
>  - workingset_refault,
>  - workingset_activate,
>  - workingset_nodereclaim.
> 
> This commit adds a corresponding description to the cgroup v2 docs.
> 
> Signed-off-by: Roman Gushchin 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Cc: Tejun Heo 
> Cc: Li Zefan 
> Cc: cgro...@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org

Acked-by: Johannes Weiner 

Thanks Roman!
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] Documentation: howto: Remove outdated info about bugzilla mailing lists

2017-05-15 Thread Jonathan Neuschäfer
The mailing list archives[1,2] show no activity since September 2011.

[1]: https://lists.linuxfoundation.org/pipermail/bugme-new/
[2]: https://lists.linuxfoundation.org/pipermail/bugme-janitors/

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/process/howto.rst | 8 
 1 file changed, 8 deletions(-)

diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
index 340fa18ff341..b696a51a832c 100644
--- a/Documentation/process/howto.rst
+++ b/Documentation/process/howto.rst
@@ -380,14 +380,6 @@ bugs is one of the best ways to get merits among other developers, because
 not many people like wasting time fixing other people's bugs.
 
 To work in the already reported bug reports, go to https://bugzilla.kernel.org.
-If you want to be advised of the future bug reports, you can subscribe to the
-bugme-new mailing list (only new bug reports are mailed here) or to the
-bugme-janitor mailing list (every change in the bugzilla is mailed here)
-
-   https://lists.linux-foundation.org/mailman/listinfo/bugme-new
-
-   https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors
-
 
 
 Mailing lists
-- 
2.11.0



[PATCH 4/6] Documentation: Remove outdated info about -git patches

2017-05-15 Thread Jonathan Neuschäfer
Since the 3.2 cycle, there have been no -git patches/tarballs on kernel.org.

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/process/applying-patches.rst | 40 +-
 Documentation/process/howto.rst|  9 ---
 2 files changed, 1 insertion(+), 48 deletions(-)

diff --git a/Documentation/process/applying-patches.rst b/Documentation/process/applying-patches.rst
index a0d058cc6d25..eaf3e0296d6d 100644
--- a/Documentation/process/applying-patches.rst
+++ b/Documentation/process/applying-patches.rst
@@ -344,7 +344,7 @@ possible.
 
 This is a good branch to run for people who want to help out testing
 development kernels but do not want to run some of the really experimental
-stuff (such people should see the sections about -git and -mm kernels below).
+stuff (such people should see the sections about -mm kernels below).
 
 The -rc patches are not incremental, they apply to a base 4.x kernel, just
 like the 4.x.y patches described above. The kernel version before the -rcN
@@ -380,44 +380,6 @@ Here are 3 examples of how to apply these patches::
$ mv linux-4.7.3 linux-4.8-rc5  # rename the kernel source dir
 
 
-The -git kernels
-
-
-These are daily snapshots of Linus' kernel tree (managed in a git
-repository, hence the name).
-
-These patches are usually released daily and represent the current state of
-Linus's tree. They are more experimental than -rc kernels since they are
-generated automatically without even a cursory glance to see if they are
-sane.
-
--git patches are not incremental and apply either to a base 4.x kernel or
-a base 4.x-rc kernel -- you can see which from their name.
-A patch named 4.7-git1 applies to the 4.7 kernel source and a patch
-named 4.8-rc3-git2 applies to the source of the 4.8-rc3 kernel.
-
-Here are some examples of how to apply these patches::
-
-   # moving from 4.7 to 4.7-git1
-
-   $ cd ~/linux-4.7                    # change to the kernel source dir
-   $ patch -p1 < ../patch-4.7-git1 # apply the 4.7-git1 patch
-   $ cd ..
-   $ mv linux-4.7 linux-4.7-git1   # rename the kernel source dir
-
-   # moving from 4.7-git1 to 4.8-rc2-git3
-
-   $ cd ~/linux-4.7-git1               # change to the kernel source dir
-   $ patch -p1 -R < ../patch-4.7-git1  # revert the 4.7-git1 patch
-   # we now have a 4.7 kernel
-   $ patch -p1 < ../patch-4.8-rc2  # apply the 4.8-rc2 patch
-   # the kernel is now 4.8-rc2
-   $ patch -p1 < ../patch-4.8-rc2-git3 # apply the 4.8-rc2-git3 patch
-   # the kernel is now 4.8-rc2-git3
-   $ cd ..
-   $ mv linux-4.7-git1 linux-4.8-rc2-git3  # rename source dir
-
-
 The -mm patches and the linux-next tree
 ===
 
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
index 1260f60d4cb9..340fa18ff341 100644
--- a/Documentation/process/howto.rst
+++ b/Documentation/process/howto.rst
@@ -314,15 +314,6 @@ The file Documentation/process/stable-kernel-rules.rst in the kernel tree
 documents what kinds of changes are acceptable for the -stable tree, and
 how the release process works.
 
-4.x -git patches
-
-
-These are daily snapshots of Linus' kernel tree which are managed in a
-git repository (hence the name.) These patches are usually released
-daily and represent the current state of Linus' tree.  They are more
-experimental than -rc kernels since they are generated automatically
-without even a cursory glance to see if they are sane.
-
 Subsystem Specific kernel trees and patches
 ~~~
 
-- 
2.11.0



[PATCH 3/6] Documentation: kernel-docs: Remove "Here is its" at the end of lines

2017-05-15 Thread Jonathan Neuschäfer
Before commit 9e03ea7f683e ("Documentation/kernel-docs.txt: convert it to
ReST markup"), it read:

   Description: Linux Journal Kernel Korner article. Here is its
   abstract: "..."

In Sphinx' HTML formatting, however, the "Here is its" doesn't make
sense anymore, because the "Abstract:" is clearly separated.

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/process/kernel-docs.rst | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst
index 6ff8291194ee..a9aad381c3a4 100644
--- a/Documentation/process/kernel-docs.rst
+++ b/Documentation/process/kernel-docs.rst
@@ -356,7 +356,7 @@ On-line docs
   :URL: http://www.linuxjournal.com/article.php?sid=2391
   :Date: 1997
   :Keywords: RAID, MD driver.
-  :Description: Linux Journal Kernel Korner article. Here is its
+  :Description: Linux Journal Kernel Korner article.
   :Abstract: *A description of the implementation of the RAID-1,
 RAID-4 and RAID-5 personalities of the MD device driver in the
 Linux kernel, providing users with high performance and reliable,
@@ -381,7 +381,7 @@ On-line docs
   :Date: 1996
   :Keywords: device driver, module, loading/unloading modules,
 allocating resources.
-  :Description: Linux Journal Kernel Korner article. Here is its
+  :Description: Linux Journal Kernel Korner article.
   :Abstract: *This is the first of a series of four articles
 co-authored by Alessandro Rubini and Georg Zezchwitz which present
 a practical approach to writing Linux device drivers as kernel
@@ -397,7 +397,7 @@ On-line docs
   :Keywords: character driver, init_module, clean_up module,
 autodetection, mayor number, minor number, file operations,
 open(), close().
-  :Description: Linux Journal Kernel Korner article. Here is its
+  :Description: Linux Journal Kernel Korner article.
   :Abstract: *This article, the second of four, introduces part of
 the actual code to create custom module implementing a character
 device driver. It describes the code for module initialization and
@@ -410,7 +410,7 @@ On-line docs
   :Date: 1996
   :Keywords: read(), write(), select(), ioctl(), blocking/non
 blocking mode, interrupt handler.
-  :Description: Linux Journal Kernel Korner article. Here is its
+  :Description: Linux Journal Kernel Korner article.
   :Abstract: *This article, the third of four on writing character
 device drivers, introduces concepts of reading, writing, and using
 ioctl-calls*.
@@ -421,7 +421,7 @@ On-line docs
   :URL: http://www.linuxjournal.com/article.php?sid=1222
   :Date: 1996
   :Keywords: interrupts, irqs, DMA, bottom halves, task queues.
-  :Description: Linux Journal Kernel Korner article. Here is its
+  :Description: Linux Journal Kernel Korner article.
   :Abstract: *This is the fourth in a series of articles about
 writing character device drivers as loadable kernel modules. This
 month, we further investigate the field of interrupt handling.
-- 
2.11.0



[PATCH 2/6] Documentation: kernel-docs: Move vfs.txt under "Docs at the Linux Kernel tree"

2017-05-15 Thread Jonathan Neuschäfer
It's unnecessary to point to an external mirror of the Documentation
directory. Also, drop the date field, because in-kernel documentation is
continually updated.

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/process/kernel-docs.rst | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst
index 05a7857a4a83..6ff8291194ee 100644
--- a/Documentation/process/kernel-docs.rst
+++ b/Documentation/process/kernel-docs.rst
@@ -84,6 +84,17 @@ The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``.
 different". Freely redistributable under the conditions of the GNU
 General Public License.
 
+* Title: **Overview of the Virtual File System**
+
+  :Author: Richard Gooch.
+  :Location: Documentation/filesystems/vfs.txt
+  :Keywords: VFS, File System, mounting filesystems, opening files,
+dentries, dcache.
+  :Description: Brief introduction to the Linux Virtual File System.
+What is it, how it works, operations taken when opening a file or
+mounting a file system and description of important data
+structures explaining the purpose of each of their entries.
+
 On-line docs
 
 
@@ -127,18 +138,6 @@ On-line docs
 [...]. This paper examines some common problems for
 submitting larger changes and some strategies to avoid problems.
 
-* Title: **Overview of the Virtual File System**
-
-  :Author: Richard Gooch.
-  :URL: http://www.mjmwired.net/kernel/Documentation/filesystems/vfs.txt
-  :Date: 2007
-  :Keywords: VFS, File System, mounting filesystems, opening files,
-dentries, dcache.
-  :Description: Brief introduction to the Linux Virtual File System.
-What is it, how it works, operations taken when opening a file or
-mounting a file system and description of important data
-structures explaining the purpose of each of their entries.
-
 * Title: **Linux Device Drivers, Third Edition**
 
   :Author: Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman
-- 
2.11.0



[PATCH 6/6] Documentation: mono: Update links and s/CVS/Git/

2017-05-15 Thread Jonathan Neuschäfer
The old URLs redirect to the new ones, so just update them in mono.rst.

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/admin-guide/mono.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mono.rst b/Documentation/admin-guide/mono.rst
index cdddc099af64..77045ce548a0 100644
--- a/Documentation/admin-guide/mono.rst
+++ b/Documentation/admin-guide/mono.rst
@@ -9,14 +9,14 @@ This will allow you to execute Mono-based .NET binaries just like any
 other program after you have done the following:
 
 1) You MUST FIRST install the Mono CLR support, either by downloading
-   a binary package, a source tarball or by installing from CVS. Binary
+   a binary package, a source tarball or by installing from Git. Binary
packages for several distributions can be found at:
 
-   http://go-mono.com/download.html
+   http://www.mono-project.com/download/
 
Instructions for compiling Mono can be found at:
 
-   http://www.go-mono.com/compiling.html
+   http://www.mono-project.com/docs/compiling-mono/
 
Once the Mono CLR support has been installed, just check that
``/usr/bin/mono`` (which could be located elsewhere, for example
-- 
2.11.0



[PATCH 1/6] Documentation: coding-style: Escape \n\t to fix HTML rendering

2017-05-15 Thread Jonathan Neuschäfer
Without this patch, Sphinx renders the sentence as follows, thus hiding
the backslashes:

[...] end each string except the last with nt to
properly indent the next instruction [...]

Signed-off-by: Jonathan Neuschäfer 
---
 Documentation/process/coding-style.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index d20d52a4d812..7710c7e0240c 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -980,7 +980,7 @@ do so, though, and doing so unnecessarily can limit optimization.
 
 When writing a single inline assembly statement containing multiple
 instructions, put each instruction on a separate line in a separate quoted
-string, and end each string except the last with \n\t to properly indent the
+string, and end each string except the last with ``\n\t`` to properly indent the
 next instruction in the assembly output:
 
 .. code-block:: c
-- 
2.11.0



[RFC PATCH v2 02/17] cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS

2017-05-15 Thread Waiman Long
From: Tejun Heo 

css_task_iter currently always walks all tasks.  With the scheduled
cgroup v2 thread support, the iterator would need to handle multiple
types of iteration.  As a preparation, add @flags to
css_task_iter_start() and implement CSS_TASK_ITER_PROCS.  If the flag
is not specified, it walks all tasks as before.  When asserted, the
iterator only walks the group leaders.

For now, the only user of the flag is cgroup v2 "cgroup.procs" file
which no longer needs to skip non-leader tasks in cgroup_procs_next().
Note that cgroup v1 "cgroup.procs" can't use the group leader walk as
v1 "cgroup.procs" doesn't mean "list all thread group leaders in the
cgroup" but "list all thread group id's with any threads in the
cgroup".

While at it, update cgroup_procs_show() to use task_pid_vnr() instead
of task_tgid_vnr().  As the iteration guarantees that the function
only sees group leaders, this doesn't change the output and will allow
sharing the function for thread iteration.
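
As a rough illustration of the behaviour described above, here is a
userspace sketch of a flag-filtered iterator.  It is not the kernel
code: the toy_* names, the array-backed task list and the
pid == tgid leader test are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CSS_TASK_ITER_PROCS; not the kernel macro. */
#define TOY_ITER_PROCS	(1U << 0)

/* Toy task: a group leader is modelled as a task whose pid equals its tgid. */
struct toy_task {
	int pid;
	int tgid;
};

/* Toy iterator over a flat array instead of the kernel's css_set lists. */
struct toy_iter {
	const struct toy_task *tasks;
	size_t count;
	size_t pos;
	unsigned int flags;
};

static bool toy_is_leader(const struct toy_task *t)
{
	return t->pid == t->tgid;
}

void toy_iter_start(struct toy_iter *it, const struct toy_task *tasks,
		    size_t count, unsigned int flags)
{
	it->tasks = tasks;
	it->count = count;
	it->pos = 0;
	it->flags = flags;
}

/* Return the next task, or NULL once the walk is finished. */
const struct toy_task *toy_iter_next(struct toy_iter *it)
{
	while (it->pos < it->count) {
		const struct toy_task *t = &it->tasks[it->pos++];

		/* With the PROCS flag set, skip tasks that aren't group leaders. */
		if ((it->flags & TOY_ITER_PROCS) && !toy_is_leader(t))
			continue;
		return t;
	}
	return NULL;
}

/* Count how many tasks a walk with the given flags would visit. */
size_t toy_count(const struct toy_task *tasks, size_t count,
		 unsigned int flags)
{
	struct toy_iter it;
	size_t n = 0;

	toy_iter_start(&it, tasks, count, flags);
	while (toy_iter_next(&it))
		n++;
	return n;
}
```

With flags == 0 every task is visited as before; with the flag set only
leaders are returned.  Since a leader's pid and tgid are equal in this
model, printing either gives the same output, which is the same
reasoning the patch uses for the task_pid_vnr() switch.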

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup.h   |  6 +-
 kernel/cgroup/cgroup-v1.c|  6 +++---
 kernel/cgroup/cgroup.c   | 24 ++--
 kernel/cgroup/cpuset.c   |  6 +++---
 kernel/cgroup/freezer.c  |  6 +++---
 mm/memcontrol.c  |  2 +-
 net/core/netclassid_cgroup.c |  2 +-
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed2573e..3568aa1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -36,9 +36,13 @@
 #define CGROUP_WEIGHT_DFL  100
 #define CGROUP_WEIGHT_MAX  10000
 
+/* walk only threadgroup leaders */
+#define CSS_TASK_ITER_PROCS    (1U << 0)
+
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
    struct cgroup_subsys    *ss;
+   unsigned int            flags;
 
    struct list_head        *cset_pos;
    struct list_head        *cset_head;
@@ -129,7 +133,7 @@ struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset,
 struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset,
struct cgroup_subsys_state **dst_cssp);
 
-void css_task_iter_start(struct cgroup_subsys_state *css,
+void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 struct css_task_iter *it);
 struct task_struct *css_task_iter_next(struct css_task_iter *it);
 void css_task_iter_end(struct css_task_iter *it);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index f13ccab..c212856 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -121,7 +121,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 * ->can_attach() fails.
 */
do {
-   css_task_iter_start(&from->self, &it);
+   css_task_iter_start(&from->self, 0, &it);
task = css_task_iter_next(&it);
if (task)
get_task_struct(task);
@@ -377,7 +377,7 @@ static int pidlist_array_load(struct cgroup *cgrp, enum cgroup_filetype type,
if (!array)
return -ENOMEM;
/* now, populate the array */
-   css_task_iter_start(&cgrp->self, &it);
+   css_task_iter_start(&cgrp->self, 0, &it);
while ((tsk = css_task_iter_next(&it))) {
if (unlikely(n == length))
break;
@@ -753,7 +753,7 @@ int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry)
}
rcu_read_unlock();
 
-   css_task_iter_start(&cgrp->self, &it);
+   css_task_iter_start(&cgrp->self, 0, &it);
while ((tsk = css_task_iter_next(&it))) {
switch (tsk->state) {
case TASK_RUNNING:
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1cf2409..8e3a5c8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3595,6 +3595,7 @@ static void css_task_iter_advance(struct css_task_iter *it)
lockdep_assert_held(&css_set_lock);
WARN_ON_ONCE(!l);
 
+repeat:
/*
 * Advance iterator to find next entry.  cset->tasks is consumed
 * first and then ->mg_tasks.  After ->mg_tasks, we move onto the
@@ -3609,11 +3610,18 @@ static void css_task_iter_advance(struct css_task_iter *it)
css_task_iter_advance_css_set(it);
else
it->task_pos = l;
+
+   /* if PROCS, skip over tasks which aren't group leaders */
+   if ((it->flags & CSS_TASK_ITER_PROCS) && it->task_pos &&
+   !thread_group_leader(list_entry(it->task_pos, struct task_struct,
+   cg_list)))
+   goto repeat;
 }
 
 /**
  * css_task_iter_start - initiate task iteration
  * @css: the css to walk tasks of
+ * @flags: CSS_TASK_ITER_* flags
  * @it: the task iterator to use
  *
  * Initiate iteratio

[RFC PATCH v2 04/17] cgroup: implement CSS_TASK_ITER_THREADED

2017-05-15 Thread Waiman Long
From: Tejun Heo 

cgroup v2 is in the process of growing thread granularity support.
Once thread mode is enabled, the root cgroup of the subtree serves as
the proc_cgrp to which the processes of the subtree conceptually
belong and domain-level resource consumptions not tied to any specific
task are charged.  In the subtree, threads won't be subject to process
granularity or no-internal-task constraint and can be distributed
arbitrarily across the subtree.

This patch implements a new task iterator flag CSS_TASK_ITER_THREADED,
which, when used on a proc_cgrp, makes the iteration include the tasks
on all the associated threaded css_sets.  "cgroup.procs" read path is
updated to use it so that reading the file on a proc_cgrp lists all
processes.  This will also be used by controller implementations which
need to walk processes or tasks at the resource domain level.

Task iteration is implemented nested in css_set iteration.  If
CSS_TASK_ITER_THREADED is specified, after walking tasks of each
!threaded css_set, all the associated threaded css_sets are visited
before moving onto the next !threaded css_set.
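
The visiting order described above can be sketched in userspace as
follows.  This is only a model under stated assumptions: the toy_*
names are hypothetical, csets live in a flat array, and the associated
threaded csets are found by scanning that array rather than by the
per-proc_cset list the patch adds.

```c
#include <stddef.h>

/*
 * Toy css_set: 'proc' is the array index of its proc cset.  A cset that
 * is not threaded points at itself, mirroring "proc_cset points to self
 * if !threaded".
 */
struct toy_cset {
	int id;
	size_t proc;
};

/*
 * Record the order in which csets would be visited.  With 'threaded'
 * set, each !threaded cset is followed by all csets that name it as
 * their proc cset, and threaded csets are skipped in the outer walk.
 * Returns the number of ids written to 'out' (at most 'cap').
 */
size_t toy_walk(const struct toy_cset *csets, size_t n, int threaded,
		int *out, size_t cap)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		/* Threaded csets are walked together with their proc cset. */
		if (threaded && csets[i].proc != i)
			continue;

		if (used < cap)
			out[used++] = csets[i].id;

		if (!threaded)
			continue;

		/* Visit the associated threaded csets before moving on. */
		for (size_t j = 0; j < n; j++) {
			if (j != i && csets[j].proc == i && used < cap)
				out[used++] = csets[j].id;
		}
	}
	return used;
}
```

Without the flag the walk is simply list order; with it, every threaded
cset is reported right after the !threaded cset it belongs to.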

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup.h |  6 
 kernel/cgroup/cgroup.c | 81 +-
 2 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3568aa1..e2c0b23 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -38,6 +38,8 @@
 
 /* walk only threadgroup leaders */
 #define CSS_TASK_ITER_PROCS    (1U << 0)
+/* walk threaded css_sets as part of their proc_csets */
+#define CSS_TASK_ITER_THREADED (1U << 1)
 
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
@@ -47,11 +49,15 @@ struct css_task_iter {
    struct list_head    *cset_pos;
    struct list_head    *cset_head;
 
+   struct list_head    *tcset_pos;
+   struct list_head    *tcset_head;
+
    struct list_head    *task_pos;
    struct list_head    *tasks_head;
    struct list_head    *mg_tasks_head;
 
    struct css_set      *cur_cset;
+   struct css_set      *cur_pcset;
    struct task_struct  *cur_task;
    struct list_head    iters_node; /* css_set->task_iters */
 };
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a9c3d640a..7efb5da 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3597,27 +3597,36 @@ bool css_has_online_children(struct cgroup_subsys_state *css)
return ret;
 }
 
-/**
- * css_task_iter_advance_css_set - advance a task itererator to the next css_set
- * @it: the iterator to advance
- *
- * Advance @it to the next css_set to walk.
- */
-static void css_task_iter_advance_css_set(struct css_task_iter *it)
+static struct css_set *css_task_iter_next_css_set(struct css_task_iter *it)
 {
-   struct list_head *l = it->cset_pos;
+   bool threaded = it->flags & CSS_TASK_ITER_THREADED;
+   struct list_head *l;
struct cgrp_cset_link *link;
struct css_set *cset;
 
lockdep_assert_held(&css_set_lock);
 
-   /* Advance to the next non-empty css_set */
+   /* find the next threaded cset */
+   if (it->tcset_pos) {
+   l = it->tcset_pos->next;
+
+   if (l != it->tcset_head) {
+   it->tcset_pos = l;
+   return container_of(l, struct css_set,
+   threaded_csets_node);
+   }
+
+   it->tcset_pos = NULL;
+   }
+
+   /* find the next cset */
+   l = it->cset_pos;
+
do {
l = l->next;
if (l == it->cset_head) {
it->cset_pos = NULL;
-   it->task_pos = NULL;
-   return;
+   return NULL;
}
 
if (it->ss) {
@@ -3627,10 +3636,50 @@ static void css_task_iter_advance_css_set(struct css_task_iter *it)
link = list_entry(l, struct cgrp_cset_link, cset_link);
cset = link->cset;
}
-   } while (!css_set_populated(cset));
+
+   /*
+* For threaded iterations, threaded csets are walked
+* together with their proc_csets.  Skip here.
+*/
+   } while (threaded && css_set_threaded(cset));
 
it->cset_pos = l;
 
+   /* initialize threaded cset walking */
+   if (threaded) {
+   if (it->cur_pcset)
+   put_css_set_locked(it->cur_pcset);
+   it->cur_pcset = cset;
+   get_css_set(cset);
+
+   it->tcset_head = &cset->threaded_csets;
+   it->tcset_pos = &cset->threaded_csets;
+   }
+
+   return cset;
+}
+
+/**
+ * 

[RFC PATCH v2 01/17] cgroup: reorganize cgroup.procs / task write path

2017-05-15 Thread Waiman Long
From: Tejun Heo 

Currently, writes to "cgroup.procs" and "cgroup.tasks" files are all
handled by __cgroup_procs_write() on both v1 and v2.  This patch
reorganizes the write path so that there are common helper functions
that different write paths use.

While this somewhat increases LOC, the different paths are no longer
intertwined and each path has more flexibility to implement different
behaviors which will be necessary for the planned v2 thread support.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup/cgroup-internal.h |   8 +-
 kernel/cgroup/cgroup-v1.c   |  58 --
 kernel/cgroup/cgroup.c  | 163 +---
 3 files changed, 142 insertions(+), 87 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 00f4d6b..f0a0dba 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -180,10 +180,10 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup,
 
 int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
   bool threadgroup);
-ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
-size_t nbytes, loff_t off, bool threadgroup);
-ssize_t cgroup_procs_write(struct kernfs_open_file *of, char *buf, size_t nbytes,
-  loff_t off);
+struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
+   __acquires(&cgroup_threadgroup_rwsem);
+void cgroup_procs_write_finish(void)
+   __releases(&cgroup_threadgroup_rwsem);
 
 void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
 
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 85d7515..f13ccab 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -514,10 +514,58 @@ static int cgroup_pidlist_show(struct seq_file *s, void *v)
return 0;
 }
 
-static ssize_t cgroup_tasks_write(struct kernfs_open_file *of,
- char *buf, size_t nbytes, loff_t off)
+static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of,
+char *buf, size_t nbytes, loff_t off,
+bool threadgroup)
 {
-   return __cgroup_procs_write(of, buf, nbytes, off, false);
+   struct cgroup *cgrp;
+   struct task_struct *task;
+   const struct cred *cred, *tcred;
+   ssize_t ret;
+
+   cgrp = cgroup_kn_lock_live(of->kn, false);
+   if (!cgrp)
+   return -ENODEV;
+
+   task = cgroup_procs_write_start(buf, threadgroup);
+   ret = PTR_ERR_OR_ZERO(task);
+   if (ret)
+   goto out_unlock;
+
+   /*
+* Even if we're attaching all tasks in the thread group, we only
+* need to check permissions on one of them.
+*/
+   cred = current_cred();
+   tcred = get_task_cred(task);
+   if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+   !uid_eq(cred->euid, tcred->uid) &&
+   !uid_eq(cred->euid, tcred->suid))
+   ret = -EACCES;
+   put_cred(tcred);
+   if (ret)
+   goto out_finish;
+
+   ret = cgroup_attach_task(cgrp, task, threadgroup);
+
+out_finish:
+   cgroup_procs_write_finish();
+out_unlock:
+   cgroup_kn_unlock(of->kn);
+
+   return ret ?: nbytes;
+}
+
+static ssize_t cgroup1_procs_write(struct kernfs_open_file *of,
+  char *buf, size_t nbytes, loff_t off)
+{
+   return __cgroup1_procs_write(of, buf, nbytes, off, true);
+}
+
+static ssize_t cgroup1_tasks_write(struct kernfs_open_file *of,
+  char *buf, size_t nbytes, loff_t off)
+{
+   return __cgroup1_procs_write(of, buf, nbytes, off, false);
 }
 
 static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
@@ -596,7 +644,7 @@ struct cftype cgroup1_base_files[] = {
.seq_stop = cgroup_pidlist_stop,
.seq_show = cgroup_pidlist_show,
.private = CGROUP_FILE_PROCS,
-   .write = cgroup_procs_write,
+   .write = cgroup1_procs_write,
},
{
.name = "cgroup.clone_children",
@@ -615,7 +663,7 @@ struct cftype cgroup1_base_files[] = {
.seq_stop = cgroup_pidlist_stop,
.seq_show = cgroup_pidlist_show,
.private = CGROUP_FILE_TASKS,
-   .write = cgroup_tasks_write,
+   .write = cgroup1_tasks_write,
},
{
.name = "notify_on_release",
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c3c9a0e..1cf2409 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1919,6 +1919,23 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+static struct cgroup *cgroup_migrate_common_ancestor(struct task_struct *task,
+  

[RFC PATCH v2 03/17] cgroup: introduce cgroup->proc_cgrp and threaded css_set handling

2017-05-15 Thread Waiman Long
From: Tejun Heo 

cgroup v2 is in the process of growing thread granularity support.
Once thread mode is enabled, the root cgroup of the subtree serves as
the proc_cgrp to which the processes of the subtree conceptually
belong and domain-level resource consumptions not tied to any specific
task are charged.  In the subtree, threads won't be subject to process
granularity or no-internal-task constraint and can be distributed
arbitrarily across the subtree.

This patch introduces cgroup->proc_cgrp along with threaded css_set
handling.

* cgroup->proc_cgrp is NULL if !threaded.  If threaded, points to the
  proc_cgrp (root of the threaded subtree).

* css_set->proc_cset points to self if !threaded.  If threaded, points
  to the css_set which belongs to the cgrp->proc_cgrp.  The proc_cgrp
  serves as the resource domain and needs the matching csses readily
  available.  The proc_cset holds those csses and makes them easily
  accessible.

* All threaded csets are linked on their proc_csets to enable
  iteration of all threaded tasks.

This patch adds the above but doesn't actually use them yet.  The
following patches will build on top.

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup-defs.h | 22 
 kernel/cgroup/cgroup.c  | 87 +
 2 files changed, 103 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 2174594..3f3cfdd 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -162,6 +162,15 @@ struct css_set {
/* reference count */
refcount_t refcount;
 
+   /*
+* If not threaded, the following points to self.  If threaded, to
+* a cset which belongs to the top cgroup of the threaded subtree.
+* The proc_cset provides access to the process cgroup and its
+* csses to which domain level resource consumptions should be
+* charged.
+*/
+   struct css_set __rcu *proc_cset;
+
/* the default cgroup associated with this css_set */
struct cgroup *dfl_cgrp;
 
@@ -187,6 +196,10 @@ struct css_set {
 */
struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];
 
+   /* all csets whose ->proc_cset points to this cset */
+   struct list_head threaded_csets;
+   struct list_head threaded_csets_node;
+
/*
 * List running through all cgroup groups in the same hash
 * slot. Protected by css_set_lock
@@ -293,6 +306,15 @@ struct cgroup {
struct list_head e_csets[CGROUP_SUBSYS_COUNT];
 
/*
+* If !threaded, NULL.  If threaded, it points to the top cgroup of
+* the threaded subtree, on which it points to self.  Threaded
+* subtree is exempt from process granularity and no-internal-task
+* constraint.  Domain level resource consumptions which aren't
+* tied to a specific task should be charged to the proc_cgrp.
+*/
+   struct cgroup *proc_cgrp;
+
+   /*
 * list of pidlists, up to two for each namespace (one for procs, one
 * for tasks); created on demand.
 */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8e3a5c8..a9c3d640a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -560,9 +560,11 @@ struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
  */
 struct css_set init_css_set = {
.refcount   = REFCOUNT_INIT(1),
+   .proc_cset  = RCU_INITIALIZER(&init_css_set),
.tasks  = LIST_HEAD_INIT(init_css_set.tasks),
.mg_tasks   = LIST_HEAD_INIT(init_css_set.mg_tasks),
.task_iters = LIST_HEAD_INIT(init_css_set.task_iters),
+   .threaded_csets = LIST_HEAD_INIT(init_css_set.threaded_csets),
.cgrp_links = LIST_HEAD_INIT(init_css_set.cgrp_links),
.mg_preload_node= LIST_HEAD_INIT(init_css_set.mg_preload_node),
.mg_node= LIST_HEAD_INIT(init_css_set.mg_node),
@@ -581,6 +583,17 @@ static bool css_set_populated(struct css_set *cset)
return !list_empty(&cset->tasks) || !list_empty(&cset->mg_tasks);
 }
 
+static struct css_set *proc_css_set(struct css_set *cset)
+{
+   return rcu_dereference_protected(cset->proc_cset,
+lockdep_is_held(&css_set_lock));
+}
+
+static bool css_set_threaded(struct css_set *cset)
+{
+   return proc_css_set(cset) != cset;
+}
+
 /**
  * cgroup_update_populated - updated populated count of a cgroup
  * @cgrp: the target cgroup
@@ -732,6 +745,8 @@ void put_css_set_locked(struct css_set *cset)
if (!refcount_dec_and_test(&cset->refcount))
return;
 
+   WARN_ON_ONCE(!list_empty(&cset->threaded_csets));
+
/* This css_set is dead. unlink it and release cgroup and css refs */
for_each_subsys(ss, ssid) {
list_del(&cset->e_cset_node[ssid]);
@@ -748,6 

[RFC PATCH v2 09/17] cgroup: Keep accurate count of tasks in each css_set

2017-05-15 Thread Waiman Long
The reference count in the css_set data structure was used as a
proxy for the number of tasks attached to that css_set. However, that
count is not an accurate measure, especially with thread mode
support. So a new variable, task_count, is added to the css_set to keep
track of the actual task count. This new variable is protected by
css_set_lock. Functions that require the actual task count are
updated to use the new variable.
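
To illustrate why the reference count overstates the number of tasks,
here is a minimal user-space sketch. It is not kernel code: struct
cset_model and the model_*() helpers are hypothetical names, and the
locking that css_set_lock provides is left out. Each attached task
holds one reference, but other holders can take references too, so
only a separate counter gives the real task count.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's struct css_set. */
struct cset_model {
	int refcount;   /* all references: tasks plus any other holders */
	int task_count; /* attached tasks only, as the patch proposes */
};

/* Attach a task: it takes one reference and counts as one task. */
static void model_attach_task(struct cset_model *cs)
{
	cs->refcount++;
	cs->task_count++;
}

/* Detach a task: drop its reference and its task count together. */
static void model_detach_task(struct cset_model *cs)
{
	assert(cs->task_count > 0);
	cs->refcount--;
	cs->task_count--;
}

/* A non-task holder (for example a temporary user) takes only a reference. */
static void model_get(struct cset_model *cs)
{
	cs->refcount++;
}

static void model_put(struct cset_model *cs)
{
	assert(cs->refcount > 0);
	cs->refcount--;
}

/* References that are not explained by attached tasks. */
static int model_extra_refs(const struct cset_model *cs)
{
	return cs->refcount - cs->task_count;
}
```

With two attached tasks and one temporary holder, refcount is 3 while
task_count is 2, and model_extra_refs() reports the one surplus
reference.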

Signed-off-by: Waiman Long 
---
 include/linux/cgroup-defs.h | 3 +++
 kernel/cgroup/cgroup-v1.c   | 6 +-
 kernel/cgroup/cgroup.c  | 5 +
 kernel/cgroup/debug.c   | 6 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index b123afc..104be73 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -163,6 +163,9 @@ struct css_set {
/* reference count */
refcount_t refcount;
 
+   /* internal task count, protected by css_set_lock */
+   int task_count;
+
/*
 * If not threaded, the following points to self.  If threaded, to
 * a cset which belongs to the top cgroup of the threaded subtree.
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 7ad6b17..302b3b8 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -334,10 +334,6 @@ static struct cgroup_pidlist 
*cgroup_pidlist_find_create(struct cgroup *cgrp,
 /**
  * cgroup_task_count - count the number of tasks in a cgroup.
  * @cgrp: the cgroup in question
- *
- * Return the number of tasks in the cgroup.  The returned number can be
- * higher than the actual number of tasks due to css_set references from
- * namespace roots and temporary usages.
  */
 static int cgroup_task_count(const struct cgroup *cgrp)
 {
@@ -346,7 +342,7 @@ static int cgroup_task_count(const struct cgroup *cgrp)
 
spin_lock_irq(&css_set_lock);
list_for_each_entry(link, &cgrp->cset_links, cset_link)
-   count += refcount_read(&link->cset->refcount);
+   count += link->cset->task_count;
spin_unlock_irq(&css_set_lock);
return count;
 }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 7b085d5..7e3ddfb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1676,6 +1676,7 @@ static void cgroup_enable_task_cg_lists(void)
css_set_update_populated(cset, true);
list_add_tail(&p->cg_list, &cset->tasks);
get_css_set(cset);
+   cset->task_count++;
}
spin_unlock(&p->sighand->siglock);
} while_each_thread(g, p);
@@ -2159,8 +2160,10 @@ static int cgroup_migrate_execute(struct cgroup_mgctx 
*mgctx)
struct css_set *to_cset = cset->mg_dst_cset;
 
get_css_set(to_cset);
+   to_cset->task_count++;
css_set_move_task(task, from_cset, to_cset, true);
put_css_set_locked(from_cset);
+   from_cset->task_count--;
}
}
spin_unlock_irq(&css_set_lock);
@@ -5160,6 +5163,7 @@ void cgroup_post_fork(struct task_struct *child)
cset = task_css_set(current);
if (list_empty(&child->cg_list)) {
get_css_set(cset);
+   cset->task_count++;
css_set_move_task(child, NULL, cset, false);
}
spin_unlock_irq(&css_set_lock);
@@ -5209,6 +5213,7 @@ void cgroup_exit(struct task_struct *tsk)
if (!list_empty(&tsk->cg_list)) {
spin_lock_irq(&css_set_lock);
css_set_move_task(tsk, cset, NULL, false);
+   cset->task_count--;
spin_unlock_irq(&css_set_lock);
} else {
get_css_set(cset);
diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index 56e60a2..ada53e6 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -23,10 +23,6 @@ static void debug_css_free(struct cgroup_subsys_state *css)
 /*
  * debug_taskcount_read - return the number of tasks in a cgroup.
  * @cgrp: the cgroup in question
- *
- * Return the number of tasks in the cgroup.  The returned number can be
- * higher than the actual number of tasks due to css_set references from
- * namespace roots and temporary usages.
  */
 static u64 debug_taskcount_read(struct cgroup_subsys_state *css,
struct cftype *cft)
@@ -37,7 +33,7 @@ static u64 debug_taskcount_read(struct cgroup_subsys_state 
*css,
 
spin_lock_irq(&css_set_lock);
list_for_each_entry(link, &cgrp->cset_links, cset_link)
-   count += refcount_read(&link->cset->refcount);
+   count += link->cset->task_count;
spin_unlock_irq(&css_set_lock);
return count;
 }
-- 
1.8.3.1

--
To unsubscribe from

[RFC PATCH v2 05/17] cgroup: implement cgroup v2 thread support

2017-05-15 Thread Waiman Long
From: Tejun Heo 

This patch implements cgroup v2 thread support.  The goal of the
thread mode is supporting hierarchical accounting and control at
thread granularity while staying inside the resource domain model
which allows coordination across different resource controllers and
handling of anonymous resource consumptions.

Once thread mode is enabled on a cgroup, the threads of the processes
which are in its subtree can be placed inside the subtree without
being restricted by process granularity or no-internal-process
constraint.  Note that the threads aren't allowed to escape to a
different threaded subtree.  To be used inside a threaded subtree, a
controller should explicitly support threaded mode and be able to
handle internal competition in the way which is appropriate for the
resource.

The root of a threaded subtree, where thread mode is enabled in the
first place, is called the thread root and serves as the resource
domain for the whole subtree.  This is the last cgroup where
non-threaded controllers are operational and where all the
domain-level resource consumptions in the subtree are accounted.  This
allows threaded controllers to operate at thread granularity when
requested while staying inside the scope of system-level resource
distribution.

Internally, in a threaded subtree, each css_set has its ->proc_cset
pointing to a matching css_set which belongs to the thread root.  This
ensures that thread root level cgroup_subsys_state for all threaded
controllers are readily accessible for domain-level operations.
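
The ->proc_cset convention can be pictured with a small user-space
sketch. This is an illustration only, with RCU and locking omitted;
struct toy_cset and the toy_cset_*() helpers are hypothetical names.
A css_set that is not threaded points to itself, and a threaded one
points to the matching css_set of the thread root, so the "threaded"
test is a pointer comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct css_set; only the pointer matters here. */
struct toy_cset {
	struct toy_cset *proc_cset; /* self if not threaded, else the root's cset */
};

/* A freshly initialized css_set is not threaded: it points to itself. */
static void toy_cset_init(struct toy_cset *cs)
{
	assert(cs != NULL);
	cs->proc_cset = cs;
}

/* Link a css_set in a threaded subtree to the thread root's css_set. */
static void toy_cset_make_threaded(struct toy_cset *cs, struct toy_cset *root_cs)
{
	assert(cs != NULL && root_cs != NULL);
	cs->proc_cset = root_cs;
}

/* Mirrors the idea of css_set_threaded(): threaded means "not my own proc_cset". */
static int toy_cset_threaded(const struct toy_cset *cs)
{
	return cs->proc_cset != cs;
}
```

Domain-level operations can then follow proc_cset from any css_set in
the subtree and always land on the thread root's state.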

This patch enables threaded mode for the pids and perf_events
controllers.  Neither has to worry about domain-level resource
consumptions and it's enough to simply set the flag.

For more details on the interface and behavior of the thread mode,
please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
by this patch.  Note that the documentation update is not complete as
the rest of the documentation needs to be updated accordingly.
Rolling those updates into this patch would be confusing, so they will
follow in separate patches.

Signed-off-by: Tejun Heo 
---
 Documentation/cgroup-v2.txt |  75 +-
 include/linux/cgroup-defs.h |  16 +++
 kernel/cgroup/cgroup.c  | 240 +++-
 kernel/cgroup/pids.c|   1 +
 kernel/events/core.c|   1 +
 5 files changed, 326 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc5e2dc..1c6f5a9 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -16,7 +16,9 @@ CONTENTS
   1-2. What is cgroup?
 2. Basic Operations
   2-1. Mounting
-  2-2. Organizing Processes
+  2-2. Organizing Processes and Threads
+2-2-1. Processes
+2-2-2. Threads
   2-3. [Un]populated Notification
   2-4. Controlling Controllers
 2-4-1. Enabling and Disabling
@@ -150,7 +152,9 @@ and experimenting easier, the kernel parameter 
cgroup_no_v1= allows
 disabling controllers in v1 and make them always available in v2.
 
 
-2-2. Organizing Processes
+2-2. Organizing Processes and Threads
+
+2-2-1. Processes
 
 Initially, only the root cgroup exists to which all processes belong.
 A child cgroup can be created by creating a sub-directory.
@@ -201,6 +205,73 @@ is removed subsequently, " (deleted)" is appended to the 
path.
   0::/test-cgroup/test-cgroup-nested (deleted)
 
 
+2-2-2. Threads
+
+cgroup v2 supports thread granularity for a subset of controllers to
+support use cases requiring hierarchical resource distribution across
+the threads of a group of processes.  By default, all threads of a
+process belong to the same cgroup, which also serves as the resource
+domain to host resource consumptions which are not specific to a
+process or thread.  The thread mode allows threads to be spread across
+a subtree while still maintaining the common resource domain for them.
+
+Enabling thread mode on a subtree makes it threaded.  The root of a
+threaded subtree is called thread root and serves as the resource
+domain for the entire subtree.  In a threaded subtree, threads of a
+process can be put in different cgroups and are not subject to the no
+internal process constraint - threaded controllers can be enabled on
+non-leaf cgroups whether they have threads in them or not.
+
+To enable the thread mode, the following conditions must be met.
+
+- The thread root doesn't have any child cgroups.
+
+- The thread root doesn't have any controllers enabled.
+
+Thread mode can be enabled by writing "enable" to "cgroup.threads"
+file.
+
+  # echo enable > cgroup.threads
+
+Inside a threaded subtree, "cgroup.threads" can be read and contains
+the list of the thread IDs of all threads in the cgroup.  Except that
+the operations are per-thread instead of per-process, "cgroup.threads"
+has the same format and behaves the same way as "cgroup.procs".
+
+The thread root serves as the resource domain for the whole subtree,
+and, while the threads can be scat

[RFC PATCH v2 06/17] cgroup: Fix reference counting bug in cgroup_procs_write()

2017-05-15 Thread Waiman Long
cgroup_procs_write_start() takes a reference to the task structure,
but its callers, such as cgroup_procs_write(), never release it.
Add a put_task_struct() call to cgroup_procs_write_finish() to
match the get_task_struct() in cgroup_procs_write_start() and fix
this reference counting error.
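
The pairing the fix restores can be sketched in user space as follows.
This is only a model under stated assumptions: struct toy_task and the
toy_*() functions are hypothetical, and a plain integer stands in for
the task's usage count. The point is that every "start" takes one
reference and every "finish" gives it back, so repeated writes leave
the count unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; usage models its reference count. */
struct toy_task {
	int usage;
};

/* Models cgroup_procs_write_start(): look up the task and take a reference. */
static struct toy_task *toy_write_start(struct toy_task *t)
{
	assert(t != NULL);
	t->usage++;
	return t;
}

/* Models the fixed cgroup_procs_write_finish(): release that reference. */
static void toy_write_finish(struct toy_task *task)
{
	assert(task->usage > 0);
	task->usage--;
}

/* Models one write to the procs file: start, attach (elided), finish. */
static int toy_procs_write(struct toy_task *t)
{
	struct toy_task *task = toy_write_start(t);
	int ret = 0; /* the attach step would set this in real code */

	toy_write_finish(task);
	return ret;
}
```

Without the release in the finish step, each call to toy_procs_write()
would leave usage one higher than before, which is the leak the patch
describes.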

Signed-off-by: Waiman Long 
---
 kernel/cgroup/cgroup-internal.h |  2 +-
 kernel/cgroup/cgroup-v1.c   |  2 +-
 kernel/cgroup/cgroup.c  | 10 ++
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index f0a0dba..2c8e3a9 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -182,7 +182,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct 
task_struct *leader,
   bool threadgroup);
 struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
__acquires(&cgroup_threadgroup_rwsem);
-void cgroup_procs_write_finish(void)
+void cgroup_procs_write_finish(struct task_struct *task)
__releases(&cgroup_threadgroup_rwsem);
 
 void cgroup_lock_and_drain_offline(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index c212856..1e101b9 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -549,7 +549,7 @@ static ssize_t __cgroup1_procs_write(struct 
kernfs_open_file *of,
ret = cgroup_attach_task(cgrp, task, threadgroup);
 
 out_finish:
-   cgroup_procs_write_finish();
+   cgroup_procs_write_finish(task);
 out_unlock:
cgroup_kn_unlock(of->kn);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index d7bab5e..f14deca 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2492,12 +2492,15 @@ struct task_struct *cgroup_procs_write_start(char *buf, 
bool threadgroup)
return tsk;
 }
 
-void cgroup_procs_write_finish(void)
+void cgroup_procs_write_finish(struct task_struct *task)
__releases(&cgroup_threadgroup_rwsem)
 {
struct cgroup_subsys *ss;
int ssid;
 
+   /* release reference from cgroup_procs_write_start() */
+   put_task_struct(task);
+
percpu_up_write(&cgroup_threadgroup_rwsem);
for_each_subsys(ss, ssid)
if (ss->post_attach)
@@ -3300,7 +3303,6 @@ static int cgroup_addrm_files(struct cgroup_subsys_state 
*css,
 
 static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
 {
-   LIST_HEAD(pending);
struct cgroup_subsys *ss = cfts[0].ss;
struct cgroup *root = &ss->root->cgrp;
struct cgroup_subsys_state *css;
@@ -4065,7 +4067,7 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file 
*of,
ret = cgroup_attach_task(cgrp, task, true);
 
 out_finish:
-   cgroup_procs_write_finish();
+   cgroup_procs_write_finish(task);
 out_unlock:
cgroup_kn_unlock(of->kn);
 
@@ -4135,7 +4137,7 @@ static ssize_t cgroup_threads_write(struct 
kernfs_open_file *of,
ret = cgroup_attach_task(cgrp, task, false);
 
 out_finish:
-   cgroup_procs_write_finish();
+   cgroup_procs_write_finish(task);
 out_unlock:
cgroup_kn_unlock(of->kn);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v2 08/17] cgroup: Move debug cgroup to its own file

2017-05-15 Thread Waiman Long
The debug cgroup currently resides within cgroup-v1.c and is enabled
only for v1 cgroup. To enable the debug cgroup also for v2, it
makes sense to put the code into its own file as it will no longer
be v1 specific. The only change in this patch is the expansion of
cgroup_task_count() within the debug_taskcount_read() function.

Signed-off-by: Waiman Long 
---
 kernel/cgroup/Makefile|   1 +
 kernel/cgroup/cgroup-v1.c | 147 -
 kernel/cgroup/debug.c | 165 ++
 3 files changed, 166 insertions(+), 147 deletions(-)
 create mode 100644 kernel/cgroup/debug.c

diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 387348a..ce693cc 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 1e101b9..7ad6b17 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1311,150 +1311,3 @@ static int __init cgroup_no_v1(char *str)
return 1;
 }
 __setup("cgroup_no_v1=", cgroup_no_v1);
-
-
-#ifdef CONFIG_CGROUP_DEBUG
-static struct cgroup_subsys_state *
-debug_css_alloc(struct cgroup_subsys_state *parent_css)
-{
-   struct cgroup_subsys_state *css = kzalloc(sizeof(*css), GFP_KERNEL);
-
-   if (!css)
-   return ERR_PTR(-ENOMEM);
-
-   return css;
-}
-
-static void debug_css_free(struct cgroup_subsys_state *css)
-{
-   kfree(css);
-}
-
-static u64 debug_taskcount_read(struct cgroup_subsys_state *css,
-   struct cftype *cft)
-{
-   return cgroup_task_count(css->cgroup);
-}
-
-static u64 current_css_set_read(struct cgroup_subsys_state *css,
-   struct cftype *cft)
-{
-   return (u64)(unsigned long)current->cgroups;
-}
-
-static u64 current_css_set_refcount_read(struct cgroup_subsys_state *css,
-struct cftype *cft)
-{
-   u64 count;
-
-   rcu_read_lock();
-   count = refcount_read(&task_css_set(current)->refcount);
-   rcu_read_unlock();
-   return count;
-}
-
-static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
-{
-   struct cgrp_cset_link *link;
-   struct css_set *cset;
-   char *name_buf;
-
-   name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
-   if (!name_buf)
-   return -ENOMEM;
-
-   spin_lock_irq(&css_set_lock);
-   rcu_read_lock();
-   cset = rcu_dereference(current->cgroups);
-   list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
-   struct cgroup *c = link->cgrp;
-
-   cgroup_name(c, name_buf, NAME_MAX + 1);
-   seq_printf(seq, "Root %d group %s\n",
-  c->root->hierarchy_id, name_buf);
-   }
-   rcu_read_unlock();
-   spin_unlock_irq(&css_set_lock);
-   kfree(name_buf);
-   return 0;
-}
-
-#define MAX_TASKS_SHOWN_PER_CSS 25
-static int cgroup_css_links_read(struct seq_file *seq, void *v)
-{
-   struct cgroup_subsys_state *css = seq_css(seq);
-   struct cgrp_cset_link *link;
-
-   spin_lock_irq(&css_set_lock);
-   list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
-   struct css_set *cset = link->cset;
-   struct task_struct *task;
-   int count = 0;
-
-   seq_printf(seq, "css_set %pK\n", cset);
-
-   list_for_each_entry(task, &cset->tasks, cg_list) {
-   if (count++ > MAX_TASKS_SHOWN_PER_CSS)
-   goto overflow;
-   seq_printf(seq, "  task %d\n", task_pid_vnr(task));
-   }
-
-   list_for_each_entry(task, &cset->mg_tasks, cg_list) {
-   if (count++ > MAX_TASKS_SHOWN_PER_CSS)
-   goto overflow;
-   seq_printf(seq, "  task %d\n", task_pid_vnr(task));
-   }
-   continue;
-   overflow:
-   seq_puts(seq, "  ...\n");
-   }
-   spin_unlock_irq(&css_set_lock);
-   return 0;
-}
-
-static u64 releasable_read(struct cgroup_subsys_state *css, struct cftype *cft)
-{
-   return (!cgroup_is_populated(css->cgroup) &&
-   !css_has_online_children(&css->cgroup->self));
-}
-
-static struct cftype debug_files[] =  {
-   {
-   .name = "taskcount",
-   .read_u64 = debug_taskcount_read,
-   },
-
-   {
-   .name = "current_css_set",
-   .read_u64 = current_css_set_read,
-   },
-
-   {
-   .name = "current_css_set_refcount",
-   .read_u64 = current_css_set_refcount_read,
-   },
-
-   {
-   .name = "current_css_set_cg_lin

[RFC PATCH v2 10/17] cgroup: Make debug cgroup support v2 and thread mode

2017-05-15 Thread Waiman Long
Besides supporting cgroup v2 and thread mode, the following changes
are also made:
 1) current_* cgroup files now reside only at the root as we don't
need duplicated files of the same function all over the cgroup
hierarchy.
 2) The cgroup_css_links_read() function is modified to report
the number of tasks that are skipped because of overflow.
 3) The relationship between proc_cset and threaded_csets is displayed.
 4) The number of extra unaccounted references is displayed.
 5) The status of being a thread root or threaded cgroup is displayed.
 6) The current_css_set_read() function now prints out the addresses of
the css'es associated with the current css_set.
 7) A new cgroup_subsys_states file is added to display the css objects
associated with a cgroup.
 8) A new cgroup_masks file is added to display the various controller
bit masks in the cgroup.

Signed-off-by: Waiman Long 
---
 kernel/cgroup/debug.c | 196 +-
 1 file changed, 179 insertions(+), 17 deletions(-)

diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index ada53e6..3121811 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -38,10 +38,37 @@ static u64 debug_taskcount_read(struct cgroup_subsys_state 
*css,
return count;
 }
 
-static u64 current_css_set_read(struct cgroup_subsys_state *css,
-   struct cftype *cft)
+static int current_css_set_read(struct seq_file *seq, void *v)
 {
-   return (u64)(unsigned long)current->cgroups;
+   struct css_set *cset;
+   struct cgroup_subsys *ss;
+   struct cgroup_subsys_state *css;
+   int i, refcnt;
+
+   mutex_lock(&cgroup_mutex);
+   spin_lock_irq(&css_set_lock);
+   rcu_read_lock();
+   cset = rcu_dereference(current->cgroups);
+   refcnt = refcount_read(&cset->refcount);
+   seq_printf(seq, "css_set %pK %d", cset, refcnt);
+   if (refcnt > cset->task_count)
+   seq_printf(seq, " +%d", refcnt - cset->task_count);
+   seq_puts(seq, "\n");
+
+   /*
+* Print the css'es stored in the current css_set.
+*/
+   for_each_subsys(ss, i) {
+   css = cset->subsys[ss->id];
+   if (!css)
+   continue;
+   seq_printf(seq, "%2d: %-4s\t- %lx[%d]\n", ss->id, ss->name,
+ (unsigned long)css, css->id);
+   }
+   rcu_read_unlock();
+   spin_unlock_irq(&css_set_lock);
+   mutex_unlock(&cgroup_mutex);
+   return 0;
 }
 
 static u64 current_css_set_refcount_read(struct cgroup_subsys_state *css,
@@ -86,31 +113,151 @@ static int cgroup_css_links_read(struct seq_file *seq, 
void *v)
 {
struct cgroup_subsys_state *css = seq_css(seq);
struct cgrp_cset_link *link;
+   int dead_cnt = 0, extra_refs = 0, threaded_csets = 0;
 
spin_lock_irq(&css_set_lock);
+   if (css->cgroup->proc_cgrp)
+   seq_puts(seq, (css->cgroup->proc_cgrp == css->cgroup)
+ ? "[thread root]\n" : "[threaded]\n");
+
list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
struct css_set *cset = link->cset;
struct task_struct *task;
int count = 0;
+   int refcnt = refcount_read(&cset->refcount);
+
+   /*
+* Print out the proc_cset and threaded_cset relationship
+* and highlight difference between refcount and task_count.
+*/
+   seq_printf(seq, "css_set %pK", cset);
+   if (rcu_dereference_protected(cset->proc_cset, 1) != cset) {
+   threaded_csets++;
+   seq_printf(seq, "=>%pK", cset->proc_cset);
+   }
+   if (!list_empty(&cset->threaded_csets)) {
+   struct css_set *tcset;
+   int idx = 0;
 
-   seq_printf(seq, "css_set %pK\n", cset);
+   list_for_each_entry(tcset, &cset->threaded_csets,
+   threaded_csets_node) {
+   seq_puts(seq, idx ? "," : "<=");
+   seq_printf(seq, "%pK", tcset);
+   idx++;
+   }
+   } else {
+   seq_printf(seq, " %d", refcnt);
+   if (refcnt - cset->task_count > 0) {
+   int extra = refcnt - cset->task_count;
+
+   seq_printf(seq, " +%d", extra);
+   /*
+* Take out the one additional reference in
+* init_css_set.
+*/
+   if (cset == &init_css_set)
+   extra--;
+   extra_refs += extra;
+ 

[RFC PATCH v2 14/17] cgroup: Enable printing of v2 controllers' cgroup hierarchy

2017-05-15 Thread Waiman Long
This patch adds a new debug control file in the cgroup v2 root directory
to print out the cgroup hierarchy for each of the v2 controllers.

Signed-off-by: Waiman Long 
---
 kernel/cgroup/debug.c | 141 ++
 1 file changed, 141 insertions(+)

diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index a2dbf77..3adb26a 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -268,6 +268,141 @@ static int cgroup_masks_read(struct seq_file *seq, void 
*v)
return 0;
 }
 
+/*
+ * Print out all the child cgroup names that doesn't have a css for the
+ * corresponding cgroup_subsys. If a child cgroup has a css, put that into
+ * the given cglist to be processed in the next iteration.
+ */
+#define CGLIST_MAX 16
+static void print_hierarchy(struct seq_file *seq,
+   struct cgroup *cgrp,
+   struct cgroup_subsys *ss,
+   struct cgroup_subsys_state *css,
+   struct cgroup **cglist,
+   int *cgcnt)
+{
+   struct cgroup *child;
+   struct cgroup_subsys_state *child_css;
+   char cgname[64];
+
+   cgname[sizeof(cgname) - 1] = '\0';
+   /*
+* Iterate all live children of the given cgroup.
+*/
+   list_for_each_entry(child, &cgrp->self.children, self.sibling) {
+   if (cgroup_is_dead(child))
+   continue;
+
+   child_css = rcu_dereference_check(child->subsys[ss->id], true);
+   if (child_css) {
+   WARN_ON(child_css->parent != css);
+   if (*cgcnt < CGLIST_MAX) {
+   cglist[*cgcnt] = child;
+   (*cgcnt)++;
+   }
+   continue;
+   }
+
+   /*
+* Skip resource domain cgroup
+*/
+   if (test_bit(CGRP_RESOURCE_DOMAIN, &child->flags))
+   continue;
+
+   cgroup_name(child, cgname, sizeof(cgname)-1);
+   seq_putc(seq, ',');
+   seq_puts(seq, cgname);
+   print_hierarchy(seq, child, ss, css, cglist, cgcnt);
+   }
+}
+
+/*
+ * Print the hierachies with respect to each controller for the default
+ * hierarchy.
+ *
+ * Each child level is printed on a separate line. Set of cgroups that
+ * have the same css will be grouped together and separated by comma.
+ * Process in those cgroups will be in the same node (css) in the
+ * controller's hierarchy. There is an exception that for resource
+ * domain cgroup, the processes associated with its parent and its
+ * affiliates will be mapped to the css of that resource domain cgroup
+ * instead.
+ *
+ * If there are more than CGLIST_MAX sets of cgroups in each level,
+ * the extra ones will be skipped.
+ */
+static int controller_hierachies_read(struct seq_file *seq, void *v)
+{
+   struct cgroup *root = seq_css(seq)->cgroup;
+   struct cgroup_subsys *ss;
+   struct cgroup_subsys_state *css;
+   struct cgroup *cgrp;
+   struct cgroup *cglist[CGLIST_MAX];
+   struct cgroup *cg2list[CGLIST_MAX];
+   int i, idx, cgnum, cg2num;
+   char cgname[64];
+
+   cgname[sizeof(cgname) - 1] = '\0';
+   mutex_lock(&cgroup_mutex);
+   for_each_subsys(ss, i) {
+   if (!(root->root->subsys_mask & (1 << ss->id)))
+   continue;
+   seq_puts(seq, ss->name);
+   seq_puts(seq, ":\n");
+
+   cgnum = 1;
+   cg2num = 0;
+   cglist[0] = root;
+   idx = 0;
+   while (cgnum) {
+   if (idx)
+   seq_putc(seq, ' ');
+   cgrp = cglist[idx];
+   if (test_bit(CGRP_RESOURCE_DOMAIN, &cgrp->flags)) {
+   struct cgroup *parent;
+
+   parent = container_of(cgrp->self.parent,
+ struct cgroup, self);
+   cgroup_name(parent, cgname, sizeof(cgname)-1);
+   seq_printf(seq, "%s.rd", cgname);
+   } else {
+   cgroup_name(cgrp, cgname, sizeof(cgname)-1);
+   seq_puts(seq, cgname);
+   }
+   css = rcu_dereference_check(cgrp->subsys[ss->id], true);
+   WARN_ON(!css);
+
+   if (cgrp == root)
+   seq_printf(seq, "[%d]", css->id);
+   else
+   seq_printf(seq, "[%d:P=%d]", css->id,
+  css->parent->id);
+
+   /*
+* List all the cgroups that use the current
+  

[RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

2017-05-15 Thread Waiman Long
For cgroup v1, different controllers can be bound to different cgroup
hierarchies optimized for their own use cases. That is not currently
the case for cgroup v2, where combining all these controllers into
the same hierarchy will probably require more levels than are needed
by each individual controller.

By not enabling a controller in a cgroup and its descendants, we can
effectively trim the hierarchy as seen by a controller from the leaves
up. However, there is currently no way to compress the hierarchy in
the intermediate levels.

This patch implements a fine-grained mechanism to allow a controller to
skip some intermediate levels in a hierarchy and effectively flatten
the hierarchy as seen by that controller.

Controllers can now be directly enabled or disabled in a cgroup
by writing to the "cgroup.controllers" file.  The special prefix
'#' with the controller name is used to set that controller in
pass-through mode.  In that mode, the controller is disabled for that
cgroup but it allows its children to have that controller enabled or
in pass-through mode again.
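
As a rough sketch of how such prefixed tokens could be classified,
consider the user-space function below. It is not the parser from this
patch: enum ctrl_op and classify_ctrl_token() are hypothetical names,
and treating '+' as enable and '-' as disable for this file is an
assumption carried over from the "cgroup.subtree_control" syntax. Only
the '#' prefix for pass-through mode is taken directly from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes; the names are not from the patch. */
enum ctrl_op {
	CTRL_OP_INVALID,
	CTRL_OP_ENABLE,      /* '+' prefix (assumed) */
	CTRL_OP_DISABLE,     /* '-' prefix (assumed) */
	CTRL_OP_PASSTHROUGH, /* '#' prefix, per the description */
};

/*
 * Classify one token such as "+cpu" or "#memory".
 * On success *name points at the controller name inside tok;
 * on failure *name is set to NULL.
 */
static enum ctrl_op classify_ctrl_token(const char *tok, const char **name)
{
	enum ctrl_op op;

	assert(name != NULL);
	*name = NULL;
	if (tok == NULL)
		return CTRL_OP_INVALID;

	switch (tok[0]) {
	case '+':
		op = CTRL_OP_ENABLE;
		break;
	case '-':
		op = CTRL_OP_DISABLE;
		break;
	case '#':
		op = CTRL_OP_PASSTHROUGH;
		break;
	default:
		return CTRL_OP_INVALID;
	}

	/* A bare prefix with no controller name is rejected. */
	if (tok[1] == '\0')
		return CTRL_OP_INVALID;

	*name = tok + 1;
	return op;
}
```

A real implementation would also have to look the name up among the
available controllers and apply all operations atomically, which this
sketch does not attempt.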

With this change, each controller can now have a unique view of its
virtual process hierarchy that can be quite different from those of other
controllers.  We now have the freedom and flexibility to create the
right hierarchy for each controller to suit its own needs without
performance loss when compared with cgroup v1.

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt | 125 ++---
 include/linux/cgroup-defs.h |  11 ++
 kernel/cgroup/cgroup.c  | 263 ++--
 kernel/cgroup/debug.c   |   8 +-
 4 files changed, 359 insertions(+), 48 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 0f41282..bb27491 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -308,25 +308,28 @@ both cgroups.
 2-4-1. Enabling and Disabling
 
 Each cgroup has a "cgroup.controllers" file which lists all
-controllers available for the cgroup to enable.
+controllers available for the cgroup to enable for its children.
 
   # cat cgroup.controllers
   cpu io memory
 
-No controller is enabled by default.  Controllers can be enabled and
-disabled by writing to the "cgroup.subtree_control" file.
+No controller is enabled by default.  Controllers can be
+enabled and disabled on the child cgroups by writing to the
+"cgroup.subtree_control" file. A '+' prefix enables the controller,
+and a '-' prefix disables it.
 
   # echo "+cpu +memory -io" > cgroup.subtree_control
 
-Only controllers which are listed in "cgroup.controllers" can be
-enabled.  When multiple operations are specified as above, either they
-all succeed or fail.  If multiple operations on the same controller
-are specified, the last one is effective.
+Only controllers which are listed in "cgroup.controllers" can
+be enabled in the "cgroup.subtree_control" file.  When multiple
+operations are specified as above, either they all succeed or fail.
+If multiple operations on the same controller are specified, the last
+one is effective.
 
 Enabling a controller in a cgroup indicates that the distribution of
 the target resource across its immediate children will be controlled.
-Consider the following sub-hierarchy.  The enabled controllers are
-listed in parentheses.
+Consider the following sub-hierarchy.  The enabled controllers in the
+"cgroup.subtree_control" file are listed in parentheses.
 
   A(cpu,memory) - B(memory) - C()
 \ D()
@@ -336,6 +339,17 @@ of CPU cycles and memory to its children, in this case, B. 
 As B has
 "memory" enabled but not "CPU", C and D will compete freely on CPU
 cycles but their division of memory available to B will be controlled.
 
+By not enabling a controller in a cgroup and its descendants, we can
+effectively trim the hierarchy as seen by a controller from the leafs
+up. From the perspective of the cpu controller, the hierarchy is:
+
+  A - B|C|D
+
+From the perspective of the memory controller, the hierarchy becomes:
+
+  A - B - C
+\ D
+
 As a controller regulates the distribution of the target resource to
 the cgroup's children, enabling it creates the controller's interface
 files in the child cgroups.  In the above example, enabling "cpu" on B
@@ -343,7 +357,81 @@ would create the "cpu." prefixed controller interface 
files in C and
 D.  Likewise, disabling "memory" from B would remove the "memory."
 prefixed controller interface files from C and D.  This means that the
 controller interface files - anything which doesn't start with
-"cgroup." are owned by the parent rather than the cgroup itself.
+"cgroup." can be considered to be owned by the parent under this
+control scheme.
+
+Enabling controllers via the "cgroup.subtree_control" file is
+relatively coarse-grained.  Fine-grained control of the controllers in
+a non-root cgroup can be done by writing to its "cgroup.controllers"
+file directly. A '+' prefix enab

[RFC PATCH v2 07/17] cgroup: Prevent kill_css() from being called more than once

2017-05-15 Thread Waiman Long
The kill_css() function may be called more than once when a css has
been killed but not yet physically removed and the cgroup hosting
that css is then removed. This patch prevents any harm from being
done when that happens.
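
The guard amounts to making the kill operation idempotent with a
"dying" flag. A minimal user-space sketch of that pattern follows;
struct toy_css, TOY_CSS_DYING and toy_kill_css() are hypothetical
names, the kill_count field exists only to make the effect observable,
and the cgroup_mutex that serializes callers in the kernel is assumed
rather than modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit, chosen to mirror CSS_DYING = (1 << 4). */
enum {
	TOY_CSS_DYING = (1 << 4),
};

/* Hypothetical stand-in for cgroup_subsys_state. */
struct toy_css {
	unsigned int flags;
	int kill_count; /* how many times the teardown body actually ran */
};

/* Run the teardown once; later calls return early because the flag is set. */
static void toy_kill_css(struct toy_css *css)
{
	assert(css != NULL);

	if (css->flags & TOY_CSS_DYING)
		return;

	css->flags |= TOY_CSS_DYING;

	/* The real teardown work would go here. */
	css->kill_count++;
}
```

Calling toy_kill_css() twice on the same object runs the teardown body
only once.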

Signed-off-by: Waiman Long 
---
 include/linux/cgroup-defs.h | 1 +
 kernel/cgroup/cgroup.c  | 5 +
 2 files changed, 6 insertions(+)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index e8d0cfc..b123afc 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -48,6 +48,7 @@ enum {
CSS_ONLINE  = (1 << 1), /* between ->css_online() and 
->css_offline() */
CSS_RELEASED= (1 << 2), /* refcnt reached zero, released */
CSS_VISIBLE = (1 << 3), /* css is visible to userland */
+   CSS_DYING   = (1 << 4), /* css is dying */
 };
 
 /* bits in struct cgroup flags field */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f14deca..7b085d5 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4630,6 +4630,11 @@ static void kill_css(struct cgroup_subsys_state *css)
 {
lockdep_assert_held(&cgroup_mutex);
 
+   if (css->flags & CSS_DYING)
+   return;
+
+   css->flags |= CSS_DYING;
+
/*
 * This must happen before css is disassociated with its cgroup.
 * See seq_css() for details.
-- 
1.8.3.1


[RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-15 Thread Waiman Long
The current thread mode semantics aren't sufficient to fully support
threaded controllers like cpu. The main problem is that when thread
mode is enabled at root (mainly for performance reason), all the
non-threaded controllers cannot be supported at all.

To alleviate this problem, the roles of thread root and threaded
cgroups are now further separated. Now thread mode can only be enabled
on a non-root leaf cgroup whose parent will then become the thread
root. All the descendants of a threaded cgroup will still need to be
threaded. All the non-threaded resources will be accounted for in the
thread root. Unlike the previous thread mode, however, a thread root
can have non-threaded children where system resources like memory
can be further split down the hierarchy.

Now we could have something like

R -- A -- B
 \
  T1 -- T2

where R is the thread root, A and B are non-threaded cgroups, T1 and
T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
where all the non-threaded resources are accounted for in R.  The no
internal process constraint does not apply in the threaded subtree.
Non-threaded controllers need to properly handle the competition
between internal processes and child cgroups at the thread root.

This model will be flexible enough to support the needs of the threaded
controllers.
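
A toy user-space model of the enabling rule described above may help. It
is a sketch under my reading of this commit message, not the kernel
implementation: struct demo_cgroup, its fields and the -1 error value
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified cgroup node. */
struct demo_cgroup {
	struct demo_cgroup *parent;	/* NULL for the root cgroup */
	int nr_children;
	int threaded;			/* cgroup is in thread mode */
	int thread_root;		/* cgroup is the root of a threaded subtree */
};

/*
 * Enable thread mode on cg. Only a non-root leaf qualifies. The parent
 * becomes the thread root unless it is itself threaded, in which case
 * cg simply joins the existing threaded subtree.
 */
static int demo_enable_threaded(struct demo_cgroup *cg)
{
	assert(cg != NULL);

	if (!cg->parent)
		return -1;		/* the root cgroup has no parent */
	if (cg->nr_children)
		return -1;		/* must be a leaf */

	cg->threaded = 1;
	if (!cg->parent->threaded)
		cg->parent->thread_root = 1;
	return 0;
}
```

With R as root and A and T1 as its children, enabling thread mode on T1
makes R the thread root while A stays non-threaded, matching the
R/A/T1 picture in the message.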

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt |  51 +++
 kernel/cgroup/cgroup-internal.h |  10 +++
 kernel/cgroup/cgroup.c  | 186 +++-
 3 files changed, 209 insertions(+), 38 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 1c6f5a9..3ae7e9c 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -222,21 +222,32 @@ process can be put in different cgroups and are not 
subject to the no
 internal process constraint - threaded controllers can be enabled on
 non-leaf cgroups whether they have threads in them or not.
 
-To enable the thread mode, the following conditions must be met.
+To enable the thread mode on a cgroup, the following conditions must
+be met.
 
-- The thread root doesn't have any child cgroups.
+- The cgroup doesn't have any child cgroups.
 
-- The thread root doesn't have any controllers enabled.
+- The cgroup doesn't have any non-threaded controllers enabled.
+
+- The cgroup doesn't have any processes attached to it.
 
 Thread mode can be enabled by writing "enable" to "cgroup.threads"
 file.
 
   # echo enable > cgroup.threads
 
-Inside a threaded subtree, "cgroup.threads" can be read and contains
-the list of the thread IDs of all threads in the cgroup.  Except that
-the operations are per-thread instead of per-process, "cgroup.threads"
-has the same format and behaves the same way as "cgroup.procs".
+The parent of the threaded cgroup will become the thread root, if
+it hasn't been a thread root yet. In other word, thread mode cannot
+be enabled on the root cgroup as it doesn't have a parent cgroup. A
+thread root can have child cgroups and controllers enabled before
+becoming one.
+
+A threaded subtree includes the thread root and all the threaded child
+cgroups as well as their descendants which are all threaded cgroups.
+"cgroup.threads" can be read and contains the list of the thread
+IDs of all threads in the cgroup.  Except that the operations are
+per-thread instead of per-process, "cgroup.threads" has the same
+format and behaves the same way as "cgroup.procs".
 
 The thread root serves as the resource domain for the whole subtree,
 and, while the threads can be scattered across the subtree, all the
@@ -246,25 +257,30 @@ not readable in the subtree proper.  However, 
"cgroup.procs" can be
 written to from anywhere in the subtree to migrate all threads of the
 matching process to the cgroup.
 
-Only threaded controllers can be enabled in a threaded subtree.  When
-a threaded controller is enabled inside a threaded subtree, it only
-accounts for and controls resource consumptions associated with the
-threads in the cgroup and its descendants.  All consumptions which
-aren't tied to a specific thread belong to the thread root.
+Only threaded controllers can be enabled in a non-root threaded cgroup.
+When a threaded controller is enabled inside a threaded subtree,
+it only accounts for and controls resource consumptions associated
+with the threads in the cgroup and its descendants.  All consumptions
+which aren't tied to a specific thread belong to the thread root.
 
 Because a threaded subtree is exempt from no internal process
 constraint, a threaded controller must be able to handle competition
 between threads in a non-leaf cgroup and its child cgroups.  Each
 threaded controller defines how such competitions are handled.
 
+A new child cgroup created under a thread root will not be threaded.
+Thread mode has to be explicitly enabled on each of the thread root's
+children.  Descendants of a threaded cgroup, however, will a

[RFC PATCH v2 17/17] sched: Make cpu/cpuacct threaded controllers

2017-05-15 Thread Waiman Long
Make cpu and cpuacct cgroup controllers usable within a threaded cgroup.

Signed-off-by: Waiman Long 
---
 kernel/sched/core.c| 1 +
 kernel/sched/cpuacct.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b041081..479f69e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7453,6 +7453,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
.legacy_cftypes = cpu_legacy_files,
.dfl_cftypes= cpu_files,
.early_init = true,
+   .threaded   = true,
 #ifdef CONFIG_CGROUP_CPUACCT
/*
 * cpuacct is enabled together with cpu on the unified hierarchy
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index fc1cf13..853d18a 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -414,4 +414,5 @@ struct cgroup_subsys cpuacct_cgrp_subsys = {
.css_free   = cpuacct_css_free,
.legacy_cftypes = files,
.early_init = true,
+   .threaded   = true,
 };
-- 
1.8.3.1



[RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

2017-05-15 Thread Waiman Long
The rationale behind the cgroup v2 no internal process constraint is
to avoid resource competition between internal processes and child
cgroups. However, not all controllers have a problem with internal
process competition. Enforcing this rule may lead to an unnatural
process hierarchy and unneeded levels for those controllers.

This patch removes the no internal process constraint by enabling those
controllers that don't like internal process competition to have a
separate set of control knobs just for internal processes in a cgroup.

A new control file "cgroup.resource_control" is added. Enabling a
controller with a "+" prefix will create a separate set of control
knobs for that controller in the special "cgroup.resource_domain"
sub-directory for all the internal processes. The existing control
knobs in the cgroup will then be used to manage resource distribution
between internal processes as a group and other child cgroups.

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt |  76 ++-
 include/linux/cgroup-defs.h |  15 +++
 kernel/cgroup/cgroup-internal.h |   1 -
 kernel/cgroup/cgroup-v1.c   |   3 -
 kernel/cgroup/cgroup.c  | 275 
 kernel/cgroup/debug.c   |   7 +-
 6 files changed, 260 insertions(+), 117 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 3ae7e9c..0f41282 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -23,7 +23,7 @@ CONTENTS
   2-4. Controlling Controllers
 2-4-1. Enabling and Disabling
 2-4-2. Top-down Constraint
-2-4-3. No Internal Process Constraint
+2-4-3. Managing Internal Process Competition
   2-5. Delegation
 2-5-1. Model of Delegation
 2-5-2. Delegation Containment
@@ -218,9 +218,7 @@ a subtree while still maintaining the common resource 
domain for them.
 Enabling thread mode on a subtree makes it threaded.  The root of a
 threaded subtree is called thread root and serves as the resource
 domain for the entire subtree.  In a threaded subtree, threads of a
-process can be put in different cgroups and are not subject to the no
-internal process constraint - threaded controllers can be enabled on
-non-leaf cgroups whether they have threads in them or not.
+process can be put in different cgroups.
 
 To enable the thread mode on a cgroup, the following conditions must
 be met.
@@ -263,11 +261,6 @@ it only accounts for and controls resource consumptions 
associated
 with the threads in the cgroup and its descendants.  All consumptions
 which aren't tied to a specific thread belong to the thread root.
 
-Because a threaded subtree is exempt from no internal process
-constraint, a threaded controller must be able to handle competition
-between threads in a non-leaf cgroup and its child cgroups.  Each
-threaded controller defines how such competitions are handled.
-
 A new child cgroup created under a thread root will not be threaded.
 Thread mode has to be explicitly enabled on each of the thread root's
 children.  Descendants of a threaded cgroup, however, will always be
@@ -364,35 +357,38 @@ the parent has the controller enabled and a controller 
can't be
 disabled if one or more children have it enabled.
 
 
-2-4-3. No Internal Process Constraint
+2-4-3. Managing Internal Process Competition
 
-Non-root cgroups can only distribute resources to their children when
-they don't have any processes of their own.  In other words, only
-cgroups which don't contain any processes can have controllers enabled
-in their "cgroup.subtree_control" files.
+There are resources managed by some controllers that don't work well
+if the internal processes in a non-leaf cgroup have to compete against
+the resource requirement of the other child cgroups. Other controllers
+work perfectly fine with internal process competition.
 
-This guarantees that, when a controller is looking at the part of the
-hierarchy which has it enabled, processes are always only on the
-leaves.  This rules out situations where child cgroups compete against
-internal processes of the parent.
+Internal processes are allowed in a non-leaf cgroup. Controllers
+that don't like internal process competition can use
+the "cgroup.resource_control" file to create a special
+"cgroup.resource_domain" child cgroup that hold the control knobs
+for all the internal processes in the cgroup.
 
-The root cgroup is exempt from this restriction.  Root contains
-processes and anonymous resource consumption which can't be associated
-with any other cgroups and requires special treatment from most
-controllers.  How resource consumption in the root cgroup is governed
-is up to each controller.
+  # echo "+memory -pids" > cgroup.resource_control
 
-The threaded cgroups and the thread roots are also exempt from this
-restriction.
+Here, the control files for the memory controller are activated in the
+"cgroup.resource_domain" directory while that of the pids controller
+are remo

[RFC PATCH v2 15/17] sched: Misc preps for cgroup unified hierarchy interface

2017-05-15 Thread Waiman Long
From: Tejun Heo 

Make the following changes in preparation for the cpu controller
interface implementation for the unified hierarchy.  This patch
doesn't cause any functional differences.

* s/cpu_stats_show()/cpu_cfs_stats_show()/

* s/cpu_files/cpu_legacy_files/

* Separate out cpuacct_stats_read() from cpuacct_stats_show().  While
  at it, make the @val array u64 for consistency.
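
The pointer-to-array parameter used for the split-out reader can be
sketched in user space like this. The demo_* names and the five-slot
sample layout are hypothetical; the point is only that passing
`u64 (*val)[N]` keeps the array length in the type, so `sizeof(*val)`
covers the whole array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_STAT_USER, DEMO_STAT_SYSTEM, DEMO_STAT_NSTATS };

/* Hypothetical per-CPU samples: { user, nice, system, irq, softirq }. */
struct demo_cpustat {
	uint64_t t[5];
};

/*
 * Taking a pointer to the whole array keeps its length in the type, so
 * sizeof(*val) is the size of the array, not the size of a pointer.
 */
static void demo_stats_read(const struct demo_cpustat *cpus, int ncpus,
			    uint64_t (*val)[DEMO_STAT_NSTATS])
{
	int cpu;

	assert(cpus != NULL && val != NULL);
	memset(val, 0, sizeof(*val));

	for (cpu = 0; cpu < ncpus; cpu++) {
		(*val)[DEMO_STAT_USER]   += cpus[cpu].t[0] + cpus[cpu].t[1];
		(*val)[DEMO_STAT_SYSTEM] += cpus[cpu].t[2] + cpus[cpu].t[3] +
					    cpus[cpu].t[4];
	}
}
```

A caller declares `uint64_t val[DEMO_STAT_NSTATS]` and passes `&val`, in
the same way the reworked cpuacct_stats_show() passes `&val` in the diff.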

Signed-off-by: Tejun Heo 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Li Zefan 
Cc: Johannes Weiner 
---
 kernel/sched/core.c|  8 
 kernel/sched/cpuacct.c | 29 ++---
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c888bd3..be2527b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7230,7 +7230,7 @@ static int __cfs_schedulable(struct task_group *tg, u64 
period, u64 quota)
return ret;
 }
 
-static int cpu_stats_show(struct seq_file *sf, void *v)
+static int cpu_cfs_stats_show(struct seq_file *sf, void *v)
 {
struct task_group *tg = css_tg(seq_css(sf));
struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
@@ -7270,7 +7270,7 @@ static u64 cpu_rt_period_read_uint(struct 
cgroup_subsys_state *css,
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
-static struct cftype cpu_files[] = {
+static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
{
.name = "shares",
@@ -7291,7 +7291,7 @@ static u64 cpu_rt_period_read_uint(struct 
cgroup_subsys_state *css,
},
{
.name = "stat",
-   .seq_show = cpu_stats_show,
+   .seq_show = cpu_cfs_stats_show,
},
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
@@ -7317,7 +7317,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
.fork   = cpu_cgroup_fork,
.can_attach = cpu_cgroup_can_attach,
.attach = cpu_cgroup_attach,
-   .legacy_cftypes = cpu_files,
+   .legacy_cftypes = cpu_legacy_files,
.early_init = true,
 };
 
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f95ab29..6151c23 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -276,26 +276,33 @@ static int cpuacct_all_seq_show(struct seq_file *m, void 
*V)
return 0;
 }
 
-static int cpuacct_stats_show(struct seq_file *sf, void *v)
+static void cpuacct_stats_read(struct cpuacct *ca,
+  u64 (*val)[CPUACCT_STAT_NSTATS])
 {
-   struct cpuacct *ca = css_ca(seq_css(sf));
-   s64 val[CPUACCT_STAT_NSTATS];
int cpu;
-   int stat;
 
-   memset(val, 0, sizeof(val));
+   memset(val, 0, sizeof(*val));
+
for_each_possible_cpu(cpu) {
u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
 
-   val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
-   val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
-   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
-   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
-   val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
+   (*val)[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
+   (*val)[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
+   (*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
+   (*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
+   (*val)[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
}
+}
+
+static int cpuacct_stats_show(struct seq_file *sf, void *v)
+{
+   u64 val[CPUACCT_STAT_NSTATS];
+   int stat;
+
+   cpuacct_stats_read(css_ca(seq_css(sf)), &val);
 
for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
-   seq_printf(sf, "%s %lld\n",
+   seq_printf(sf, "%s %llu\n",
   cpuacct_stat_desc[stat],
   (long long)nsec_to_clock_t(val[stat]));
}
-- 
1.8.3.1



[RFC PATCH v2 16/17] sched: Implement interface for cgroup unified hierarchy

2017-05-15 Thread Waiman Long
From: Tejun Heo 

While the cpu controller doesn't have any functional problems, there
are a couple of interface issues which can be addressed in the v2
interface.

* cpuacct being a separate controller.  This separation is artificial
  and rather pointless as demonstrated by most use cases co-mounting
  the two controllers.  It also forces certain information to be
  accounted twice.

* Use of different time units.  Writable control knobs use
  microseconds, some stat fields use nanoseconds while other cpuacct
  stat fields use centiseconds.

* Control knobs which can't be used in the root cgroup still show up
  in the root.

* Control knob names and semantics aren't consistent with other
  controllers.

This patchset implements cpu controller's interface on the unified
hierarchy which adheres to the controller file conventions described in
Documentation/cgroup-v2.txt.  Overall, the following changes are made.

* cpuacct is implicitly enabled and disabled by cpu and its information
  is reported through "cpu.stat" which now uses microseconds for all
  time durations.  All time duration fields now have "_usec" appended
  to them for clarity.  While this doesn't solve the double accounting
  immediately, once the majority of users switch to v2, cpu can directly
  account and report the relevant stats and cpuacct can be disabled on
  the unified hierarchy.

  Note that cpuacct.usage_percpu is currently not included in
  "cpu.stat".  If this information is actually called for, it can be
  added later.

* "cpu.shares" is replaced with "cpu.weight" and operates on the
  standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
  The weight is scaled to scheduler weight so that 100 maps to 1024
  and the ratio relationship is preserved - if weight is W and its
  scaled value is S, W / 100 == S / 1024.  While the mapped range is a
  bit smaller than the original scheduler weight range, the dead zones
  on both sides are relatively small and it covers a wider range than
  the nice value mappings.  This file doesn't make sense in the root
  cgroup and isn't created on root.

* "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max"
  which contains both quota and period.

* "cpu.rt_runtime_us" and "cpu.rt_period_us" are replaced by
  "cpu.rt.max" which contains both runtime and period.
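
The "cpu.weight" scaling in the list above can be checked with a small
user-space sketch. The demo_* helpers are hypothetical; they mimic
round-to-nearest unsigned division in the spirit of
DIV_ROUND_CLOSEST_ULL and assume a default weight of 100 mapping to
1024 shares, as stated above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WEIGHT_DFL	100ULL	/* default cgroup weight */
#define DEMO_SHARES_DFL	1024ULL	/* scheduler shares for the default weight */

/* Unsigned division rounded to nearest. */
static uint64_t demo_div_round_closest(uint64_t x, uint64_t d)
{
	assert(d != 0);
	return (x + d / 2) / d;
}

/* cpu.weight -> scheduler shares, keeping W / 100 == S / 1024. */
static uint64_t demo_weight_to_shares(uint64_t weight)
{
	return demo_div_round_closest(weight * DEMO_SHARES_DFL, DEMO_WEIGHT_DFL);
}

/* scheduler shares -> cpu.weight, the inverse mapping. */
static uint64_t demo_shares_to_weight(uint64_t shares)
{
	return demo_div_round_closest(shares * DEMO_WEIGHT_DFL, DEMO_SHARES_DFL);
}
```

Because one weight step is about 10.24 shares, the rounding error
introduced on the way to shares is too small to move the value back to a
different weight, so the round trip returns the original weight for
every weight from 1 to 10000.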

v2: cpu_stats_show() was incorrectly using CONFIG_FAIR_GROUP_SCHED for
CFS bandwidth stats and also using raw division for u64.  Use
CONFIG_CFS_BANDWIDTH and do_div() instead.

The semantics of "cpu.rt.max" is not fully decided yet.  Dropped
for now.

Signed-off-by: Tejun Heo 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Li Zefan 
Cc: Johannes Weiner 
---
 kernel/sched/core.c| 141 +
 kernel/sched/cpuacct.c |  25 +
 kernel/sched/cpuacct.h |   5 ++
 3 files changed, 171 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be2527b..b041081 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7309,6 +7309,139 @@ static u64 cpu_rt_period_read_uint(struct 
cgroup_subsys_state *css,
{ } /* Terminate */
 };
 
+static int cpu_stats_show(struct seq_file *sf, void *v)
+{
+   cpuacct_cpu_stats_show(sf);
+
+#ifdef CONFIG_CFS_BANDWIDTH
+   {
+   struct task_group *tg = css_tg(seq_css(sf));
+   struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+   u64 throttled_usec;
+
+   throttled_usec = cfs_b->throttled_time;
+   do_div(throttled_usec, NSEC_PER_USEC);
+
+   seq_printf(sf, "nr_periods %d\n"
+  "nr_throttled %d\n"
+  "throttled_usec %llu\n",
+  cfs_b->nr_periods, cfs_b->nr_throttled,
+  throttled_usec);
+   }
+#endif
+   return 0;
+}
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+static u64 cpu_weight_read_u64(struct cgroup_subsys_state *css,
+  struct cftype *cft)
+{
+   struct task_group *tg = css_tg(css);
+   u64 weight = scale_load_down(tg->shares);
+
+   return DIV_ROUND_CLOSEST_ULL(weight * CGROUP_WEIGHT_DFL, 1024);
+}
+
+static int cpu_weight_write_u64(struct cgroup_subsys_state *css,
+   struct cftype *cftype, u64 weight)
+{
+   /*
+* cgroup weight knobs should use the common MIN, DFL and MAX
+* values which are 1, 100 and 10000 respectively.  While it loses
+* a bit of range on both ends, it maps pretty well onto the shares
+* value used by scheduler and the round-trip conversions preserve
+* the original value over the entire range.
+*/
+   if (weight < CGROUP_WEIGHT_MIN || weight > CGROUP_WEIGHT_MAX)
+   return -ERANGE;
+
+   weight = DIV_ROUND_CLOSEST_ULL(weight * 1024, CGROUP_WEIGHT_DFL);
+
+   return sched_group_set_shares(css_tg(css), scale_load(weight));
+}
+#endif
+
+static void __maybe_unused c

[RFC PATCH v2 00/17] cgroup: Major changes to cgroup v2 core

2017-05-15 Thread Waiman Long
 v1->v2:
  - Add a new pass-through mode to allow each controller its own
unique virtual hierarchy.
  - Add a new control file "cgroup.resource_control" to enable
the user creation of separate control knobs for internal process
anywhere in the v2 hierarchy instead of doing that automatically
in the thread root only.
  - More functionality in the debug controller to dump out more
internal states.
  - Ported to the 4.12 kernel.
  - Other miscellaneous bug fixes.

 v1: https://lwn.net/Articles/720651/

The existing cgroup v2 core has quite a number of limitations and
constraints that make it hard to migrate controllers from v1 to v2
without losing performance and usability.

This patchset makes some major changes to the cgroup v2 core to
give more freedom and flexibility to controllers so that they can
have their own unique views of the virtual process hierarchies that
best suit their own use cases without suffering unneeded
performance problems. So "Live Free or Die".

On the other hand, the existing controller activation mechanism via
the cgroup.subtree_control file remains unchanged. So existing code 
that relies on the current cgroup v2 semantics should not be impacted.

The major changes are:
 1) Getting rid of the no internal process constraint by allowing
controllers that don't like internal process competition to have
separate sets of control knobs for internal processes as if they
are in a child cgroup of their own.
 2) A thread mode for threaded controllers (e.g. cpu) that can
have unthreaded child cgroups under a thread root.
 3) A pass-through mode for controllers that disable them for a cgroup
effectively collapsing the cgroup's processes to its parent
from the perspective of those controllers while allowing child
cgroups to have the controllers enabled again. This allows each
controller a unique virtual hierarchy that can be quite different
from other controllers.

This patchset incorporates the following 2 patchsets from Tejun Heo:

 1) cgroup v2 thread mode patchset (Patches 1-5)
https://lkml.org/lkml/2017/2/2/592
 2) CPU Controller on Control Group v2 (Patches 15 & 16)
https://lkml.org/lkml/2016/8/5/368

Patch 6 fixes a task_struct reference counting bug introduced in
patch 1.

Patch 7 fixes a problem that kill_css() may be called more than once.

Patch 8 moves the debug cgroup out from cgroup_v1.c into its own
file.

Patch 9 keeps more accurate counts of the number of tasks associated
with each css_set.

Patch 10 enhances the debug controller to provide more information
relevant to the cgroup v2 thread mode to ease debugging effort.

Patch 11 implements the enhanced cgroup v2 thread mode with the
following enhancements:

 1) Thread roots are treated differently from threaded cgroups.
 2) Thread root can now have non-threaded controllers enabled as well
as non-threaded children.

Patch 12 gets rid of the no internal process constraint.

Patch 13 enables fine grained control of controllers including a new
pass-through mode.

Patch 14 enhances the debug controller to print out the virtual
hierarchies for each controller in cgroup v2.

Patch 17 makes both cpu and cpuacct controllers threaded.

Tejun Heo (7):
  cgroup: reorganize cgroup.procs / task write path
  cgroup: add @flags to css_task_iter_start() and implement
CSS_TASK_ITER_PROCS
  cgroup: introduce cgroup->proc_cgrp and threaded css_set handling
  cgroup: implement CSS_TASK_ITER_THREADED
  cgroup: implement cgroup v2 thread support
  sched: Misc preps for cgroup unified hierarchy interface
  sched: Implement interface for cgroup unified hierarchy

Waiman Long (10):
  cgroup: Fix reference counting bug in cgroup_procs_write()
  cgroup: Prevent kill_css() from being called more than once
  cgroup: Move debug cgroup to its own file
  cgroup: Keep accurate count of tasks in each css_set
  cgroup: Make debug cgroup support v2 and thread mode
  cgroup: Implement new thread mode semantics
  cgroup: Remove cgroup v2 no internal process constraint
  cgroup: Allow fine-grained controllers control in cgroup v2
  cgroup: Enable printing of v2 controllers' cgroup hierarchy
  sched: Make cpu/cpuacct threaded controllers

 Documentation/cgroup-v2.txt |  287 +++--
 include/linux/cgroup-defs.h |   68 ++
 include/linux/cgroup.h  |   12 +-
 kernel/cgroup/Makefile  |1 +
 kernel/cgroup/cgroup-internal.h |   19 +-
 kernel/cgroup/cgroup-v1.c   |  220 ++-
 kernel/cgroup/cgroup.c  | 1317 ---
 kernel/cgroup/cpuset.c  |6 +-
 kernel/cgroup/debug.c   |  471 ++
 kernel/cgroup/freezer.c |6 +-
 kernel/cgroup/pids.c|1 +
 kernel/events/core.c|1 +
 kernel/sched/core.c |  150 -
 kernel/sched/cpuacct.c  |   55 +-
 kernel/sched/cpuacct.h  |5 +
 mm/memcontrol.c |2 +-
 net/core/n

Re: [PATCH v4 2/3] hwmon: (adt7475) temperature smoothing

2017-05-15 Thread Guenter Roeck

On 05/14/2017 06:30 PM, Chris Packham wrote:

When enabled, temperature smoothing allows ramping the fan speed over a
configurable period of time instead of jumping to the new speed
instantaneously.
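
As a rough user-space illustration of how a requested ramp time could be
turned into the 4-bit register field, here is a sketch that reuses the
time table from the patch. The demo_* helpers are hypothetical, the
linear scan is a stand-in for the kernel's find_closest_descending(),
and how exact ties are resolved is not meant to match the kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Ramp times in ms, copied from ad7475_st_map[] in the patch (CONFIG6[SLOW] = 0). */
static const long demo_st_map[] = {
	37500, 18800, 12500, 7500, 4700, 3100, 1600, 800,
};
#define DEMO_ST_COUNT (sizeof(demo_st_map) / sizeof(demo_st_map[0]))

/* Index of the table entry nearest to ms, found with a plain linear scan. */
static size_t demo_closest(long ms)
{
	size_t i, best = 0;
	long best_diff = -1;

	for (i = 0; i < DEMO_ST_COUNT; i++) {
		long diff = ms > demo_st_map[i] ? ms - demo_st_map[i]
						: demo_st_map[i] - ms;

		if (best_diff < 0 || diff < best_diff) {
			best = i;
			best_diff = diff;
		}
	}
	return best;
}

/* 0 disables smoothing; otherwise bit 3 enables it and bits 0-2 pick the time. */
static unsigned int demo_encode(long ms)
{
	assert(ms >= 0);
	return ms > 0 ? (0x8u | (unsigned int)demo_closest(ms)) : 0u;
}

/* Inverse of demo_encode(): the ramp time in ms, or 0 when smoothing is off. */
static long demo_decode(unsigned int nibble)
{
	return (nibble & 0x8u) ? demo_st_map[nibble & 0x7u] : 0;
}
```

Writing 5000 would select the 4700 ms entry, so reading the attribute
back reports the nearest supported value rather than the value written.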

Signed-off-by: Chris Packham 


Applied to -next.


---
Changes in v2:
- use a single tempN_smoothing attribute
Changes in v3:
- change enh_acou to enh_acoustics
- simplify show_temp_st()
Changes in v4:
- removed dead code.
- Make the order of the smoothing attributes match the other temperature
  attributes.

Guenter,

We'd previously discussed making the smoothing values set CONFIG6[SLOW] to
expose the other set of potential values. I wasn't sure where you wanted to go
on that one.

Personally I was on the fence since the difference would only be noticeable for
the higher values. If we do want to add support for the other values it could
be done as a subsequent patch (or a v5 if you want it).


It can be added later if anyone cares.

Thanks,
Guenter


 Documentation/hwmon/adt7475 |  4 ++
 drivers/hwmon/adt7475.c | 91 +
 2 files changed, 95 insertions(+)

diff --git a/Documentation/hwmon/adt7475 b/Documentation/hwmon/adt7475
index dc0b55794c47..09d73a10644c 100644
--- a/Documentation/hwmon/adt7475
+++ b/Documentation/hwmon/adt7475
@@ -114,6 +114,10 @@ minimum (i.e. auto_point1_pwm). This behaviour can be 
configured using the
 pwm[1-*]_stall_disable sysfs attribute. A value of 0 means the fans will shut
 off. A value of 1 means the fans will run at auto_point1_pwm.

+The responsiveness of the ADT747x to temperature changes can be configured.
+This allows smoothing of the fan speed transition. To set the transition time
+set the value in ms in the temp[1-*]_smoothing sysfs attribute.
+
 Notes
 -

diff --git a/drivers/hwmon/adt7475.c b/drivers/hwmon/adt7475.c
index 3eb8c5c2f8af..3056076fae27 100644
--- a/drivers/hwmon/adt7475.c
+++ b/drivers/hwmon/adt7475.c
@@ -526,6 +526,88 @@ static ssize_t set_temp(struct device *dev, struct 
device_attribute *attr,
return count;
 }

+/* Assuming CONFIG6[SLOW] is 0 */
+static const int ad7475_st_map[] = {
+   37500, 18800, 12500, 7500, 4700, 3100, 1600, 800,
+};
+
+static ssize_t show_temp_st(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+   struct sensor_device_attribute_2 *sattr = to_sensor_dev_attr_2(attr);
+   struct i2c_client *client = to_i2c_client(dev);
+   struct adt7475_data *data = i2c_get_clientdata(client);
+   long val;
+
+   switch (sattr->index) {
+   case 0:
+   val = data->enh_acoustics[0] & 0xf;
+   break;
+   case 1:
+   val = (data->enh_acoustics[1] >> 4) & 0xf;
+   break;
+   case 2:
+   default:
+   val = data->enh_acoustics[1] & 0xf;
+   break;
+   }
+
+   if (val & 0x8)
+   return sprintf(buf, "%d\n", ad7475_st_map[val & 0x7]);
+   else
+   return sprintf(buf, "0\n");
+}
+
+static ssize_t set_temp_st(struct device *dev, struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   struct sensor_device_attribute_2 *sattr = to_sensor_dev_attr_2(attr);
+   struct i2c_client *client = to_i2c_client(dev);
+   struct adt7475_data *data = i2c_get_clientdata(client);
+   unsigned char reg;
+   int shift, idx;
+   ulong val;
+
+   if (kstrtoul(buf, 10, &val))
+   return -EINVAL;
+
+   switch (sattr->index) {
+   case 0:
+   reg = REG_ENHANCE_ACOUSTICS1;
+   shift = 0;
+   idx = 0;
+   break;
+   case 1:
+   reg = REG_ENHANCE_ACOUSTICS2;
+   shift = 0;
+   idx = 1;
+   break;
+   case 2:
+   default:
+   reg = REG_ENHANCE_ACOUSTICS2;
+   shift = 4;
+   idx = 1;
+   break;
+   }
+
+   if (val > 0) {
+   val = find_closest_descending(val, ad7475_st_map,
+ ARRAY_SIZE(ad7475_st_map));
+   val |= 0x8;
+   }
+
+   mutex_lock(&data->lock);
+
+   data->enh_acoustics[idx] &= ~(0xf << shift);
+   data->enh_acoustics[idx] |= (val << shift);
+
+   i2c_smbus_write_byte_data(client, reg, data->enh_acoustics[idx]);
+
+   mutex_unlock(&data->lock);
+
+   return count;
+}
+
 /*
  * Table of autorange values - the user will write the value in millidegrees,
  * and we'll convert it
@@ -1008,6 +1090,8 @@ static SENSOR_DEVICE_ATTR_2(temp1_crit, S_IRUGO | 
S_IWUSR, show_temp, set_temp,
THERM, 0);
 static SENSOR_DEVICE_ATTR_2(temp1_crit_hyst, S_IRUGO | S_IWUSR, show_temp,
set_temp, HYSTERSIS, 0);
+static SENSOR_DEVICE_ATTR_2(temp1_smoothing, S_IRUGO | S_IWUSR, show_temp_st,
+   set_temp_st, 0, 0);
 static S

Re: [PATCH v4 1/3] hwmon: (adt7475) fan stall prevention

2017-05-15 Thread Guenter Roeck

On 05/14/2017 06:30 PM, Chris Packham wrote:

By default adt7475 will stop the fans (pwm duty cycle 0%) when the
temperature drops past Tmin - hysteresis. Some systems want to keep the
fans moving even when the temperature drops, so add new sysfs attributes
that configure the enhanced acoustics min 1-3 settings, which allow the
fans to run at the minimum configured pwm duty cycle.
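
The per-PWM enable bit can be sketched in user space as below. The bit
position (bit 5 plus the PWM index) follows the BIT(5 + sattr->index)
mask in the patch; the demo_* function names are hypothetical and the
register is modelled as a plain byte rather than an I2C access.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for PWM channel index (0..2), mirroring BIT(5 + index) in the patch. */
static uint8_t demo_stall_mask(unsigned int index)
{
	assert(index < 3);
	return (uint8_t)(1u << (5 + index));
}

/* Return reg with the stall-disable bit for index set (on != 0) or cleared. */
static uint8_t demo_set_stall_disable(uint8_t reg, unsigned int index, int on)
{
	uint8_t mask = demo_stall_mask(index);

	reg &= (uint8_t)~mask;
	if (on)
		reg |= mask;
	return reg;
}

/* 1 if the stall-disable bit for index is set, else 0. */
static int demo_get_stall_disable(uint8_t reg, unsigned int index)
{
	return !!(reg & demo_stall_mask(index));
}
```

Only the selected channel's bit changes; the low nibble of the register
and the other channels' bits are left as they were.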

Signed-off-by: Chris Packham 


Applied to hwmon-next.

Thanks,
Guenter


---
Changes in v2:
- use pwmN_stall_dis as the attribute name. I think this describes the purpose
  pretty well. I went with a new attribute instead of overloading
  pwmN_auto_point1_pwm so this doesn't affect existing users.
Changes in v3:
- Fix grammar.
- change enh_acou to enh_acoustics
Changes in v4:
- Change sysfs attribute to pwmN_stall_disable

 Documentation/hwmon/adt7475 |  5 +
 drivers/hwmon/adt7475.c | 50 +
 2 files changed, 55 insertions(+)

diff --git a/Documentation/hwmon/adt7475 b/Documentation/hwmon/adt7475
index 0502f2b464e1..dc0b55794c47 100644
--- a/Documentation/hwmon/adt7475
+++ b/Documentation/hwmon/adt7475
@@ -109,6 +109,11 @@ fan speed) is applied. PWM values range from 0 (off) to 
255 (full speed).
 Fan speed may be set to maximum when the temperature sensor associated with
 the PWM control exceeds temp#_max.

+At Tmin - hysteresis the PWM output can either be off (0% duty cycle) or at the
+minimum (i.e. auto_point1_pwm). This behaviour can be configured using the
+pwm[1-*]_stall_disable sysfs attribute. A value of 0 means the fans will shut
+off. A value of 1 means the fans will run at auto_point1_pwm.
+
 Notes
 -

diff --git a/drivers/hwmon/adt7475.c b/drivers/hwmon/adt7475.c
index ec0c43fbcdce..3eb8c5c2f8af 100644
--- a/drivers/hwmon/adt7475.c
+++ b/drivers/hwmon/adt7475.c
@@ -79,6 +79,9 @@

 #define REG_TEMP_TRANGE_BASE   0x5F

+#define REG_ENHANCE_ACOUSTICS1 0x62
+#define REG_ENHANCE_ACOUSTICS2 0x63
+
 #define REG_PWM_MIN_BASE   0x64

 #define REG_TEMP_TMIN_BASE 0x67
@@ -209,6 +212,7 @@ struct adt7475_data {
u8 range[3];
u8 pwmctl[3];
u8 pwmchan[3];
+   u8 enh_acoustics[2];

u8 vid;
u8 vrm;
@@ -700,6 +704,43 @@ static ssize_t set_pwm(struct device *dev, struct 
device_attribute *attr,
data->pwm[sattr->nr][sattr->index] = clamp_val(val, 0, 0xFF);
i2c_smbus_write_byte_data(client, reg,
  data->pwm[sattr->nr][sattr->index]);
+   mutex_unlock(&data->lock);
+
+   return count;
+}
+
+static ssize_t show_stall_disable(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct sensor_device_attribute_2 *sattr = to_sensor_dev_attr_2(attr);
+   struct i2c_client *client = to_i2c_client(dev);
+   struct adt7475_data *data = i2c_get_clientdata(client);
+   u8 mask = BIT(5 + sattr->index);
+
+   return sprintf(buf, "%d\n", !!(data->enh_acoustics[0] & mask));
+}
+
+static ssize_t set_stall_disable(struct device *dev,
+struct device_attribute *attr, const char *buf,
+size_t count)
+{
+   struct sensor_device_attribute_2 *sattr = to_sensor_dev_attr_2(attr);
+   struct i2c_client *client = to_i2c_client(dev);
+   struct adt7475_data *data = i2c_get_clientdata(client);
+   long val;
+   u8 mask = BIT(5 + sattr->index);
+
+   if (kstrtol(buf, 10, &val))
+   return -EINVAL;
+
+   mutex_lock(&data->lock);
+
+   data->enh_acoustics[0] &= ~mask;
+   if (val)
+   data->enh_acoustics[0] |= mask;
+
+   i2c_smbus_write_byte_data(client, REG_ENHANCE_ACOUSTICS1,
+ data->enh_acoustics[0]);

mutex_unlock(&data->lock);

@@ -1028,6 +1069,8 @@ static SENSOR_DEVICE_ATTR_2(pwm1_auto_point1_pwm, S_IRUGO | S_IWUSR, show_pwm,
set_pwm, MIN, 0);
 static SENSOR_DEVICE_ATTR_2(pwm1_auto_point2_pwm, S_IRUGO | S_IWUSR, show_pwm,
set_pwm, MAX, 0);
+static SENSOR_DEVICE_ATTR_2(pwm1_stall_disable, S_IRUGO | S_IWUSR,
+   show_stall_disable, set_stall_disable, 0, 0);
 static SENSOR_DEVICE_ATTR_2(pwm2, S_IRUGO | S_IWUSR, show_pwm, set_pwm, INPUT,
1);
 static SENSOR_DEVICE_ATTR_2(pwm2_freq, S_IRUGO | S_IWUSR, show_pwmfreq,
@@ -1040,6 +1083,8 @@ static SENSOR_DEVICE_ATTR_2(pwm2_auto_point1_pwm, S_IRUGO | S_IWUSR, show_pwm,
set_pwm, MIN, 1);
 static SENSOR_DEVICE_ATTR_2(pwm2_auto_point2_pwm, S_IRUGO | S_IWUSR, show_pwm,
set_pwm, MAX, 1);
+static SENSOR_DEVICE_ATTR_2(pwm2_stall_disable, S_IRUGO | S_IWUSR,
+   show_stall_disable, set_stall_disable, 0, 1);
 static SENSOR_DEVICE_ATTR_2(pwm3, S_IRUGO | S_IWUSR, show_pwm, set_pwm, INPUT,
2);
 static SENSOR_DEVICE_ATTR_2(pwm3_freq, S_IRUG
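
As a rough illustration of the bit handling in show_stall_disable() and
set_stall_disable() above, here is a minimal userspace sketch. It only
models the cached register byte: the helper names (stall_mask,
stall_disable_store, stall_disable_show) are made up for this example and
are not part of the driver, and the I2C write and locking are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask for the stall-disable bit of PWM channel `index` (0..2),
 * mirroring BIT(5 + sattr->index) in the patch. Hypothetical helper.
 */
static uint8_t stall_mask(unsigned int index)
{
	assert(index < 3);	/* bits 5, 6 and 7 of an 8-bit register */
	return (uint8_t)(1u << (5 + index));
}

/* Clear the channel's bit, then set it again if `val` is non-zero. */
static uint8_t stall_disable_store(uint8_t reg, unsigned int index, long val)
{
	uint8_t mask = stall_mask(index);

	reg &= (uint8_t)~mask;
	if (val)
		reg |= mask;
	return reg;
}

/* Report the channel's bit as 0 or 1, as the sysfs show routine does. */
static int stall_disable_show(uint8_t reg, unsigned int index)
{
	return !!(reg & stall_mask(index));
}
```

Any non-zero value written sets the bit, so only 0 and 1 are meaningful
inputs; the other bits of the register byte are left untouched.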

Re: [PATCH 24/36] fs: jbd2: escape a string with special chars on a kernel-doc

2017-05-15 Thread Jan Kara
On Fri 12-05-17 11:00:07, Mauro Carvalho Chehab wrote:
> kernel-doc will try to interpret a foo() string, except if
> properly escaped.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/jbd2/transaction.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index fe7f4a373436..38e1dcabbaca 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1066,10 +1066,10 @@ static bool jbd2_write_access_granted(handle_t *handle, struct buffer_head *bh,
>   * @handle: transaction to add buffer modifications to
>   * @bh: bh to be used for metadata writes
>   *
> - * Returns an error code or 0 on success.
> + * Returns: error code or 0 on success.
>   *
>   * In full data journalling mode the buffer may be of type BJ_AsyncData,
> - * because we're write()ing a buffer which is also part of a shared mapping.
> + * because we're ``write()ing`` a buffer which is also part of a shared mapping.
>   */
>  
>  int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh)
> -- 
> 2.9.3
> 
> 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/36] fs: jbd2: make jbd2_journal_start() kernel-doc parseable

2017-05-15 Thread Jan Kara
On Fri 12-05-17 11:00:02, Mauro Carvalho Chehab wrote:
> The kernel-doc script expects function documentation to
> be just before the function; otherwise it will be ignored.
> 
> So, move the kernel-doc markup to the right place.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/jbd2/transaction.c | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 9ee4832b6f8b..fe7f4a373436 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -409,25 +409,6 @@ static handle_t *new_handle(int nblocks)
>   return handle;
>  }
>  
> -/**
> - * handle_t *jbd2_journal_start() - Obtain a new handle.
> - * @journal: Journal to start transaction on.
> - * @nblocks: number of block buffer we might modify
> - *
> - * We make sure that the transaction can guarantee at least nblocks of
> - * modified buffers in the log.  We block until the log can guarantee
> - * that much space. Additionally, if rsv_blocks > 0, we also create another
> - * handle with rsv_blocks reserved blocks in the journal. This handle is
> - * is stored in h_rsv_handle. It is not attached to any particular transaction
> - * and thus doesn't block transaction commit. If the caller uses this reserved
> - * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
> - * on the parent handle will dispose the reserved one. Reserved handle has to
> - * be converted to a normal handle using jbd2_journal_start_reserved() before
> - * it can be used.
> - *
> - * Return a pointer to a newly allocated handle, or an ERR_PTR() value
> - * on failure.
> - */
>  handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
> gfp_t gfp_mask, unsigned int type,
> unsigned int line_no)
> @@ -478,6 +459,25 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
>  EXPORT_SYMBOL(jbd2__journal_start);
>  
>  
> +/**
> + * handle_t *jbd2_journal_start() - Obtain a new handle.
> + * @journal: Journal to start transaction on.
> + * @nblocks: number of block buffer we might modify
> + *
> + * We make sure that the transaction can guarantee at least nblocks of
> + * modified buffers in the log.  We block until the log can guarantee
> + * that much space. Additionally, if rsv_blocks > 0, we also create another
> + * handle with rsv_blocks reserved blocks in the journal. This handle is
> + * is stored in h_rsv_handle. It is not attached to any particular transaction
> + * and thus doesn't block transaction commit. If the caller uses this reserved
> + * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
> + * on the parent handle will dispose the reserved one. Reserved handle has to
> + * be converted to a normal handle using jbd2_journal_start_reserved() before
> + * it can be used.
> + *
> + * Return a pointer to a newly allocated handle, or an ERR_PTR() value
> + * on failure.
> + */
>  handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
>  {
>   return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0);
> -- 
> 2.9.3
> 
> 
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/5] Convert more books to ReST

2017-05-15 Thread Boris Brezillon
On Sat, 13 May 2017 08:10:53 -0300
Mauro Carvalho Chehab  wrote:

> This patch series converts the following books to ReST:
>   - librs
>   - mtdnand
>   - sh
> 
> And it is based on my previous series of conversion patches.
> 
> After this series, there will be just one DocBook pending conversion:
>   - lsm (Linux Security Modules)
> 
> This book is very outdated: no changes since the Kernel moved 
> to git, in 2005 (except for a minor editorial fix in 2008).
> 
> I took a look at the described API: it doesn't seem to describe
> the current security implementation.
> 
> It would be best if someone who works with LSM converted it to
> ReST with:
>   $ Documentation/sphinx/tmplcvt Documentation/DocBook/lsm.tmpl lsm.rst
> 
> And fix the document to produce something that reflects the current
> implementation. If nobody is interested, then maybe we could just
> drop it.
> 
> -
> 
> This patch series is based on my past 00/36 patch series, applied on
> the top of docs tree (next branch).
> 
> The full patch series is in this tree:
> 
>https://git.linuxtv.org//mchehab/experimental.git/log/?h=docbook
> 
> And the HTML output at:
> 
>   http://www.infradead.org/~mchehab/kernel_docs/
>   https://mchehab.fedorapeople.org/kernel_docs/ 
> 
> Mauro Carvalho Chehab (5):
>   docs-rst: convert librs book to ReST
>   docs-rst: convert mtdnand book to ReST
>   mtdnand.rst: Fix some typos and group the "::" with previous line

MTD maintainers did not receive the above patch. Can you Cc us on the
whole series next time?

BTW, I had a look at your branch, and it seems the typo you're fixing is
actually not a typo. Flags are *OR-ed* (with the | operator) to form a
valid combination of flags.
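
To illustrate what OR-ing flags means, a minimal sketch follows. The
DEMO_* names and their values are hypothetical and are not the actual NAND
option flags; the point is only that each flag owns one bit and a
combination is built with the | operator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits: each flag occupies its own bit. */
#define DEMO_OPT_WIDE_BUS	(1u << 0)
#define DEMO_OPT_NO_SUBPAGE	(1u << 1)
#define DEMO_OPT_SCAN_QUIET	(1u << 2)

/* Build a combination by OR-ing individual flags together. */
static unsigned int demo_default_options(void)
{
	return DEMO_OPT_WIDE_BUS | DEMO_OPT_SCAN_QUIET;
}

/* Test whether every bit of `flag` is set in `options`. */
static bool demo_has_option(unsigned int options, unsigned int flag)
{
	assert(flag != 0);	/* an empty flag would match trivially */
	return (options & flag) == flag;
}
```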

>   mtd: adjust kernel-docs to avoid Sphinx/kerneldoc warnings

Not sure how you plan to merge these changes, but if they go through
a single tree I'll probably need an immutable topic branch, because I
plan to change a few things in nand_base.c and nand.h for the next release.

>   docs-rst: convert sh book to ReST
> 
>  Documentation/DocBook/Makefile   |5 +-
>  Documentation/DocBook/librs.tmpl |  289 
>  Documentation/DocBook/mtdnand.tmpl   | 1291 --
>  Documentation/DocBook/sh.tmpl|  105 ---
>  Documentation/conf.py|2 +
>  Documentation/core-api/index.rst |1 +
>  Documentation/core-api/librs.rst |  212 ++
>  Documentation/driver-api/index.rst   |1 +
>  Documentation/driver-api/mtdnand.rst | 1007 ++
>  Documentation/index.rst  |   11 +
>  Documentation/sh/index.rst   |   59 ++
>  drivers/mtd/nand/nand_base.c |7 +-
>  include/linux/mtd/nand.h |2 +-
>  13 files changed, 1300 insertions(+), 1692 deletions(-)
>  delete mode 100644 Documentation/DocBook/librs.tmpl
>  delete mode 100644 Documentation/DocBook/mtdnand.tmpl
>  delete mode 100644 Documentation/DocBook/sh.tmpl
>  create mode 100644 Documentation/core-api/librs.rst
>  create mode 100644 Documentation/driver-api/mtdnand.rst
>  create mode 100644 Documentation/sh/index.rst
> 



Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Jani Nikula
On Mon, 15 May 2017, Peter Zijlstra  wrote:
> On Mon, May 15, 2017 at 01:29:58PM +0300, Jani Nikula wrote:
>> On Mon, 15 May 2017, Peter Zijlstra  wrote:
>> > The intention is to aid readability. Making comments worse so that some
>> > retarded script can generate better html or whatnot is just that,
>> > retarded.
>> >
>> > Code matters, generated documentation not so much. I'll take a comment
>> > that reads well over one that generates pretty html any day.
>> 
>> The deal is that if you start your comments with "/**" they'll be
>> processed with the retarded script to produce pretty html.
>> 
>> For the most part the comments that generate pretty html also read well,
>> and we don't expect or want anyone to go overboard with markup. I don't
>> think it's unreasonable to make small concessions to improve generated
>> documentation for people who care about it even if you don't.
>
> No. Such a concession has pure negative value. It opens the door to more
> patches converting this or that comment to be prettier or whatnot. And
> before you know it there's a Markus like idiot spamming you with dozens
> of crap patches to prettify the generated crud.
>
> Not to mention that this would mean having to learn this rest crud in
> order to write these comments.
>
> All things I'm not prepared to do.
>
> I'm all for useful comments, but I see no value _at_all_ in this
> generated nonsense. The only reason I sometimes use the docbook comment
> style is because it's fairly uniform and the build bot gets you a warning
> when your function signature no longer matches with the comment. But
> if you make this painful I'll simply stop using them.

I see plenty of value in the generated documentation, but I see zero
return on investment in spending any time trying to convince you about
any of it.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Peter Zijlstra
On Mon, May 15, 2017 at 01:29:58PM +0300, Jani Nikula wrote:
> On Mon, 15 May 2017, Peter Zijlstra  wrote:
> > The intention is to aid readability. Making comments worse so that some
> > retarded script can generate better html or whatnot is just that,
> > retarded.
> >
> > Code matters, generated documentation not so much. I'll take a comment
> > that reads well over one that generates pretty html any day.
> 
> The deal is that if you start your comments with "/**" they'll be
> processed with the retarded script to produce pretty html.
> 
> For the most part the comments that generate pretty html also read well,
> and we don't expect or want anyone to go overboard with markup. I don't
> think it's unreasonable to make small concessions to improve generated
> documentation for people who care about it even if you don't.

No. Such a concession has pure negative value. It opens the door to more
patches converting this or that comment to be prettier or whatnot. And
before you know it there's a Markus like idiot spamming you with dozens
of crap patches to prettify the generated crud.

Not to mention that this would mean having to learn this rest crud in
order to write these comments.

All things I'm not prepared to do.


I'm all for useful comments, but I see no value _at_all_ in this
generated nonsense. The only reason I sometimes use the docbook comment
style is because it's fairly uniform and the build bot gets you a warning
when your function signature no longer matches with the comment. But
if you make this painful I'll simply stop using them.


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Jani Nikula
On Mon, 15 May 2017, Peter Zijlstra  wrote:
> The intention is to aid readability. Making comments worse so that some
> retarded script can generate better html or whatnot is just that,
> retarded.
>
> Code matters, generated documentation not so much. I'll take a comment
> that reads well over one that generates pretty html any day.

The deal is that if you start your comments with "/**" they'll be
processed with the retarded script to produce pretty html.

For the most part the comments that generate pretty html also read well,
and we don't expect or want anyone to go overboard with markup. I don't
think it's unreasonable to make small concessions to improve generated
documentation for people who care about it even if you don't.
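
For reference, a minimal sketch of a comment that kernel-doc picks up
because it opens with "/**". The function demo_sat_add() is hypothetical
and exists only for this example; the layout (a "name() - summary" line,
one "@param:" line per argument, a "Return:" section) follows the usual
kernel-doc conventions and contains no extra markup.

```c
#include <assert.h>
#include <limits.h>

/**
 * demo_sat_add() - Add two non-negative ints, saturating at INT_MAX.
 * @a: first addend, must be >= 0
 * @b: second addend, must be >= 0
 *
 * The description is ordinary prose. It reads as plain text in the C
 * file and is also extracted into the generated documentation.
 *
 * Return: the sum of @a and @b, or INT_MAX if the sum would overflow.
 */
static int demo_sat_add(int a, int b)
{
	assert(a >= 0 && b >= 0);
	if (a > INT_MAX - b)
		return INT_MAX;
	return a + b;
}
```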

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Peter Zijlstra
On Mon, May 15, 2017 at 06:00:46AM -0300, Mauro Carvalho Chehab wrote:

> > Well, I don't mind the '-' thing before return values too much, but the
> > below chunk is just pure drivel. It makes a perfectly good comment
> > worse.
> > 
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -227,9 +227,11 @@ static void __sched __mutex_lock_slowpath(struct mutex *lock);
> >   * (or statically defined) before it can be locked. memset()-ing
> >   * the mutex to 0 is not allowed.
> >   *
> > - * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
> > + * .. note::
> > + *
> > + *   The CONFIG_DEBUG_MUTEXES .config option turns on debugging
> >   *   checks that will enforce the restrictions and will also do
> > - *   deadlock debugging. )
> > + *   deadlock debugging.
> >   *
> >   * This function is similar to (but not equivalent to) down().
> >   */
> 
> What caused problems with the original markup is that Sphinx is
> highly sensitive to indentation: mixing indentation levels
> causes trouble. A minimal change for it to be parsed as
> expected would be to remove the extra spaces that caused Sphinx
> to misinterpret the paragraph, e.g.:
> 
>  * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
>  * checks that will enforce the restrictions and will also do
>  * deadlock debugging. )

That's ugly and doesn't read right either. Also C isn't whitespace
sensitive, so I don't feel we should add such brain damaged constraints
to our comments.

> But if the intention of those spaces was to highlight the content
> inside the parentheses (which is what I assumed), then the
> .. note markup will do the job.

The intention is to aid readability. Making comments worse so that some
retarded script can generate better html or whatnot is just that,
retarded.

Code matters, generated documentation not so much. I'll take a comment
that reads well over one that generates pretty html any day.

> That said, I guess it shouldn't be hard to add something to the
> kernel-doc script to convert some specially-crafted tag (like "Note:")
> to avoid having ReST notation for this specific case, e.g.:
> 
>  * Note:
>  *
>  * The CONFIG_DEBUG_MUTEXES .config option turns on debugging
>  * checks that will enforce the restrictions and will also do
>  * deadlock debugging.
> 
> Yet, IMHO, we should take some care to avoid adding too many
> translations to it, as otherwise we'll end up having two
> markup languages instead of just one.

I'm all for _no_ markup language.


Re: [PATCH 10/13] sound: fix the comments that refers to kernel-doc

2017-05-15 Thread Takashi Iwai
On Sun, 14 May 2017 17:38:44 +0200,
Mauro Carvalho Chehab wrote:
> 
> The markup inside the #if 0 comment actually refers to
> kernel-doc markup. As we're getting rid of DocBook, update it.
> 
> Signed-off-by: Mauro Carvalho Chehab 

I guess you prefer taking it from your tree?  Feel free to take my
ack:
  Reviewed-by: Takashi Iwai 


thanks,

Takashi

> ---
>  include/sound/pcm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/sound/pcm.h b/include/sound/pcm.h
> index 361749e60799..bbf97d4c4c17 100644
> --- a/include/sound/pcm.h
> +++ b/include/sound/pcm.h
> @@ -1054,7 +1054,7 @@ int snd_pcm_format_unsigned(snd_pcm_format_t format);
>  int snd_pcm_format_linear(snd_pcm_format_t format);
>  int snd_pcm_format_little_endian(snd_pcm_format_t format);
>  int snd_pcm_format_big_endian(snd_pcm_format_t format);
> -#if 0 /* just for DocBook */
> +#if 0 /* just for kernel-doc */
>  /**
>   * snd_pcm_format_cpu_endian - Check the PCM format is CPU-endian
>   * @format: the format to check
> -- 
> 2.9.3
> 
> 


Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Mauro Carvalho Chehab
Em Mon, 15 May 2017 09:03:48 +0200
Peter Zijlstra  escreveu:

> On Fri, May 12, 2017 at 03:19:17PM -0700, Darren Hart wrote:
> > On Sat, May 13, 2017 at 12:11:09AM +0200, Peter Zijlstra wrote:  
> 
> > > And I really _really_ hate to see that rest crap spread here. Can't we
> > > just delete all that nonsense and go back to 80 column 7bit ASCII ?
> > >   
> > 
> > Depending on the source this could be a genuine appeal or satire :-D  
> 
> A bit of both of course ;-)
> 
> > In this case, I don't think the ReST changes (with -) make the comment block
> > any less readable in the C files.
> >   
> > > It is an incentive not to use kerneldoc..
> > >   
> > 
> > I like the kerneldoc if for no other reason than that it helps keep formatting
> > consistent. I would object if I started seeing XML or some other horrible
> > formatting style showing up in the code, but this honestly seems like a fairly
> > minimal imposition... but that's me.
> 
> Well, I don't mind the '-' thing before return values too much, but the
> below chunk is just pure drivel. It makes a perfectly good comment
> worse.
> 
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -227,9 +227,11 @@ static void __sched __mutex_lock_slowpath(struct mutex *lock);
>   * (or statically defined) before it can be locked. memset()-ing
>   * the mutex to 0 is not allowed.
>   *
> - * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
> + * .. note::
> + *
> + *   The CONFIG_DEBUG_MUTEXES .config option turns on debugging
>   *   checks that will enforce the restrictions and will also do
> - *   deadlock debugging. )
> + *   deadlock debugging.
>   *
>   * This function is similar to (but not equivalent to) down().
>   */

What caused problems with the original markup is that Sphinx is
highly sensitive to indentation: mixing indentation levels
causes trouble. A minimal change for it to be parsed as
expected would be to remove the extra spaces that caused Sphinx
to misinterpret the paragraph, e.g.:

 * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
 * checks that will enforce the restrictions and will also do
 * deadlock debugging. )

But if the intention of those spaces was to highlight the content
inside the parentheses (which is what I assumed), then the
.. note markup will do the job.

That said, I guess it shouldn't be hard to add something to the
kernel-doc script to convert some specially-crafted tag (like "Note:")
to avoid having ReST notation for this specific case, e.g.:

 * Note:
 *
 * The CONFIG_DEBUG_MUTEXES .config option turns on debugging
 * checks that will enforce the restrictions and will also do
 * deadlock debugging.

Yet, IMHO, we should take some care to avoid adding too many
translations to it, as otherwise we'll end up having two
markup languages instead of just one.

Thanks,
Mauro


Re: [PATCH 28/36] docs-rst: convert s390-drivers DocBook to ReST

2017-05-15 Thread Cornelia Huck
On Fri, 12 May 2017 11:00:11 -0300
Mauro Carvalho Chehab  wrote:

> Use pandoc to convert documentation to ReST by calling
> Documentation/sphinx/tmplcvt script.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/DocBook/Makefile|   2 +-
>  Documentation/DocBook/s390-drivers.tmpl   | 161 --
>  Documentation/driver-api/index.rst|   1 +
>  Documentation/driver-api/s390-drivers.rst | 111 
>  4 files changed, 113 insertions(+), 162 deletions(-)
>  delete mode 100644 Documentation/DocBook/s390-drivers.tmpl
>  create mode 100644 Documentation/driver-api/s390-drivers.rst

Acked-by: Cornelia Huck 

...but I wonder how good the information in there still is, given that
I haven't touched it in ages...



Re: [PATCH 04/36] mutex, futex: adjust kernel-doc markups to generate ReST

2017-05-15 Thread Peter Zijlstra
On Fri, May 12, 2017 at 03:19:17PM -0700, Darren Hart wrote:
> On Sat, May 13, 2017 at 12:11:09AM +0200, Peter Zijlstra wrote:

> > And I really _really_ hate to see that rest crap spread here. Can't we
> > just delete all that nonsense and go back to 80 column 7bit ASCII ?
> > 
> 
> Depending on the source this could be a genuine appeal or satire :-D

A bit of both of course ;-)

> In this case, I don't think the ReST changes (with -) make the comment block
> any less readable in the C files.
> 
> > It is an incentive not to use kerneldoc..
> > 
> 
> I like the kerneldoc if for no other reason than that it helps keep formatting
> consistent. I would object if I started seeing XML or some other horrible
> formatting style showing up in the code, but this honestly seems like a fairly
> minimal imposition... but that's me.

Well, I don't mind the '-' thing before return values too much, but the
below chunk is just pure drivel. It makes a perfectly good comment
worse.

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -227,9 +227,11 @@ static void __sched __mutex_lock_slowpath(struct mutex *lock);
  * (or statically defined) before it can be locked. memset()-ing
  * the mutex to 0 is not allowed.
  *
- * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
+ * .. note::
+ *
+ *   The CONFIG_DEBUG_MUTEXES .config option turns on debugging
  *   checks that will enforce the restrictions and will also do
- *   deadlock debugging. )
+ *   deadlock debugging.
  *
  * This function is similar to (but not equivalent to) down().
  */