Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 2:58 PM Roman Gushchin  wrote:
>
> On Mon, Jun 22, 2020 at 02:28:54PM -0700, Shakeel Butt wrote:
> > On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin  wrote:
> > >
> > > On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> > > > On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> > > > >
> > > > > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > > > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > > > > >
> > > > > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > > > > non-accounted allocations and the second one shared by all accounted
> > > > > > > allocations, we can use just one.
> > > > > > >
> > > > > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > > > > on demand and filled only for accounted allocations.
> > > > > > >
> > > > > > > It allows us to remove a bunch of code which is required to handle
> > > > > > > kmem_cache clones for accounted allocations. There is no more need
> > > > > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > > > > It's quite a significant simplification.
> > > > > > >
> > > > > > > Also, because the total number of slab_caches is almost halved
> > > > > > > (not all kmem_caches have a memcg clone), some additional memory
> > > > > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > > > > of slab memory.
> > > > > > >
> > > > > > > Suggested-by: Johannes Weiner 
> > > > > > > Signed-off-by: Roman Gushchin 
> > > > > > > Reviewed-by: Vlastimil Babka 
> > > > > > > ---
> > > > > > [snip]
> > > > > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > > > > >   struct obj_cgroup *objcg,
> > > > > > > - size_t size, void **p)
> > > > > > > + gfp_t flags, size_t size,
> > > > > > > + void **p)
> > > > > > >  {
> > > > > > > struct page *page;
> > > > > > > unsigned long off;
> > > > > > > size_t i;
> > > > > > >
> > > > > > > +   if (!objcg)
> > > > > > > +   return;
> > > > > > > +
> > > > > > > +   flags &= ~__GFP_ACCOUNT;
> > > > > > > for (i = 0; i < size; i++) {
> > > > > > > if (likely(p[i])) {
> > > > > > > page = virt_to_head_page(p[i]);
> > > > > > > +
> > > > > > > +   if (!page_has_obj_cgroups(page) &&
> > > > > >
> > > > > > The page is already linked into the kmem_cache; don't you need
> > > > > > synchronization for memcg_alloc_page_obj_cgroups()?
> > > > >
> > > > > Hm, yes, in theory we need it. I guess the reason why I've never seen
> > > > > any issues here is the SLUB percpu partial list.
> > > > >
> > > > > So in theory we need something like:
> > > > >
> > > > > diff --git a/mm/slab.h b/mm/slab.h
> > > > > index 0a31600a0f5c..44bf57815816 100644
> > > > > --- a/mm/slab.h
> > > > > +++ b/mm/slab.h
> > > > > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > > > > if (!vec)
> > > > > return -ENOMEM;
> > > > >
> > > > > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > > > > +   if (cmpxchg(&page->obj_cgroups, 0,
> > > > > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > > > > +   kfree(vec);
> > > > > +
> > > > > return 0;
> > > > >  }
> > > > >
> > > > >
> > > > > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > > > > Or any other ideas on how to make it less expensive?
> > > > >
> > > > > > What's the reason to remove this from charge_slab_page()?
> > > > >
> > > > > Because at charge_slab_page() we don't know if we'll ever need
> > > > > page->obj_cgroups. Some caches might have only a few or even zero
> > > > > accounted objects.
> > > > >
> > > >
> > > > If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> > > > need page->obj_cgroups. charge_slab_page() happens between
> > > > slab_pre_alloc_hook() & slab_post_alloc_hook(), so we should be able
> > > > to tell if page->obj_cgroups is needed.
> > >
> > > Yes, but the opposite is not always true: we can reuse an existing page
> > > without an allocated page->obj_cgroups vector. In this case
> > > charge_slab_page() is not involved at all.
> > >
> >
> > Hmm yeah, you are right. I missed that.
> >
> > >
> > > Or do you mean that we can minimize the amount of required synchronization
> > > by allocating some obj_cgroups vectors from charge_slab_page()?
> >
> > One optimization would be to always pre-allocate page->obj_cgroups for
> > kmem_caches with SLAB_ACCOUNT.
>
> Even this is not completely memory overhead-free, because processes belonging
> to the root cgroup and kthreads might allocate from such a cache.

Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Roman Gushchin
On Mon, Jun 22, 2020 at 02:28:54PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin  wrote:
> >
> > On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> > > On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> > > >
> > > > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > > > >
> > > > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > > > non-accounted allocations and the second one shared by all accounted
> > > > > > allocations, we can use just one.
> > > > > >
> > > > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > > > on demand and filled only for accounted allocations.
> > > > > >
> > > > > > It allows us to remove a bunch of code which is required to handle
> > > > > > kmem_cache clones for accounted allocations. There is no more need
> > > > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > > > It's quite a significant simplification.
> > > > > >
> > > > > > Also, because the total number of slab_caches is almost halved
> > > > > > (not all kmem_caches have a memcg clone), some additional memory
> > > > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > > > of slab memory.
> > > > > >
> > > > > > Suggested-by: Johannes Weiner 
> > > > > > Signed-off-by: Roman Gushchin 
> > > > > > Reviewed-by: Vlastimil Babka 
> > > > > > ---
> > > > > [snip]
> > > > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > > > >   struct obj_cgroup *objcg,
> > > > > > - size_t size, void **p)
> > > > > > + gfp_t flags, size_t size,
> > > > > > + void **p)
> > > > > >  {
> > > > > > struct page *page;
> > > > > > unsigned long off;
> > > > > > size_t i;
> > > > > >
> > > > > > +   if (!objcg)
> > > > > > +   return;
> > > > > > +
> > > > > > +   flags &= ~__GFP_ACCOUNT;
> > > > > > for (i = 0; i < size; i++) {
> > > > > > if (likely(p[i])) {
> > > > > > page = virt_to_head_page(p[i]);
> > > > > > +
> > > > > > +   if (!page_has_obj_cgroups(page) &&
> > > > >
> > > > > The page is already linked into the kmem_cache; don't you need
> > > > > synchronization for memcg_alloc_page_obj_cgroups()?
> > > >
> > > > Hm, yes, in theory we need it. I guess the reason why I've never seen
> > > > any issues here is the SLUB percpu partial list.
> > > >
> > > > So in theory we need something like:
> > > >
> > > > diff --git a/mm/slab.h b/mm/slab.h
> > > > index 0a31600a0f5c..44bf57815816 100644
> > > > --- a/mm/slab.h
> > > > +++ b/mm/slab.h
> > > > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > > > if (!vec)
> > > > return -ENOMEM;
> > > >
> > > > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > > > +   if (cmpxchg(&page->obj_cgroups, 0,
> > > > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > > > +   kfree(vec);
> > > > +
> > > > return 0;
> > > >  }
> > > >
> > > >
> > > > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > > > Or any other ideas on how to make it less expensive?
> > > >
> > > > > What's the reason to remove this from charge_slab_page()?
> > > >
> > > > Because at charge_slab_page() we don't know if we'll ever need
> > > > page->obj_cgroups. Some caches might have only a few or even zero
> > > > accounted objects.
> > > >
> > >
> > > If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> > > need page->obj_cgroups. charge_slab_page() happens between
> > > slab_pre_alloc_hook() & slab_post_alloc_hook(), so we should be able
> > > to tell if page->obj_cgroups is needed.
> >
> > Yes, but the opposite is not always true: we can reuse an existing page
> > without an allocated page->obj_cgroups vector. In this case
> > charge_slab_page() is not involved at all.
> >
> 
> Hmm yeah, you are right. I missed that.
> 
> >
> > Or do you mean that we can minimize the amount of required synchronization
> > by allocating some obj_cgroups vectors from charge_slab_page()?
> 
> One optimization would be to always pre-allocate page->obj_cgroups for
> kmem_caches with SLAB_ACCOUNT.

Even this is not completely memory overhead-free, because processes belonging
to the root cgroup and kthreads might allocate from such a cache.

Anyway, I think I'll go with cmpxchg() for now and will think about possible
optimizations later, because the allocation happens only once in the lifetime
of a slab page and is very unlikely to race with a concurrent
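
For reference, a self-contained sketch of the cmpxchg()-based version discussed
above. It is reconstructed around the diff earlier in the thread; the
surrounding lines (objs_per_slab_page(), the kcalloc() call) are assumptions
based on the patch series, not verbatim kernel code:

static inline int memcg_alloc_page_obj_cgroups(struct page *page,
					       struct kmem_cache *s, gfp_t gfp)
{
	unsigned int objects = objs_per_slab_page(s, page);
	void *vec;

	vec = kcalloc(objects, sizeof(struct obj_cgroup *), gfp);
	if (!vec)
		return -ENOMEM;

	/*
	 * Two tasks allocating from the same slab page can both observe
	 * page_has_obj_cgroups() == false and race to install a vector;
	 * cmpxchg() lets only one installation win, and the loser frees
	 * its copy. The low bit tags the pointer as an obj_cgroups vector.
	 */
	if (cmpxchg(&page->obj_cgroups, NULL,
		    (struct obj_cgroup **)((unsigned long)vec | 0x1UL)))
		kfree(vec);

	return 0;
}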

Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin  wrote:
>
> On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> > On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> > >
> > > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > > >
> > > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > > non-accounted allocations and the second one shared by all accounted
> > > > > allocations, we can use just one.
> > > > >
> > > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > > on demand and filled only for accounted allocations.
> > > > >
> > > > > It allows us to remove a bunch of code which is required to handle
> > > > > kmem_cache clones for accounted allocations. There is no more need
> > > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > > It's quite a significant simplification.
> > > > >
> > > > > Also, because the total number of slab_caches is almost halved
> > > > > (not all kmem_caches have a memcg clone), some additional memory
> > > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > > of slab memory.
> > > > >
> > > > > Suggested-by: Johannes Weiner 
> > > > > Signed-off-by: Roman Gushchin 
> > > > > Reviewed-by: Vlastimil Babka 
> > > > > ---
> > > > [snip]
> > > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > > >   struct obj_cgroup *objcg,
> > > > > - size_t size, void **p)
> > > > > + gfp_t flags, size_t size,
> > > > > + void **p)
> > > > >  {
> > > > > struct page *page;
> > > > > unsigned long off;
> > > > > size_t i;
> > > > >
> > > > > +   if (!objcg)
> > > > > +   return;
> > > > > +
> > > > > +   flags &= ~__GFP_ACCOUNT;
> > > > > for (i = 0; i < size; i++) {
> > > > > if (likely(p[i])) {
> > > > > page = virt_to_head_page(p[i]);
> > > > > +
> > > > > +   if (!page_has_obj_cgroups(page) &&
> > > >
> > > > The page is already linked into the kmem_cache; don't you need
> > > > synchronization for memcg_alloc_page_obj_cgroups()?
> > >
> > > Hm, yes, in theory we need it. I guess the reason why I've never seen
> > > any issues here is the SLUB percpu partial list.
> > >
> > > So in theory we need something like:
> > >
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 0a31600a0f5c..44bf57815816 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > > if (!vec)
> > > return -ENOMEM;
> > >
> > > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > > +   if (cmpxchg(&page->obj_cgroups, 0,
> > > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > > +   kfree(vec);
> > > +
> > > return 0;
> > >  }
> > >
> > >
> > > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > > Or any other ideas on how to make it less expensive?
> > >
> > > > What's the reason to remove this from charge_slab_page()?
> > >
> > > Because at charge_slab_page() we don't know if we'll ever need
> > > page->obj_cgroups. Some caches might have only a few or even zero
> > > accounted objects.
> > >
> >
> > If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> > need page->obj_cgroups. charge_slab_page() happens between
> > slab_pre_alloc_hook() & slab_post_alloc_hook(), so we should be able
> > to tell if page->obj_cgroups is needed.
>
> Yes, but the opposite is not always true: we can reuse an existing page
> without an allocated page->obj_cgroups vector. In this case
> charge_slab_page() is not involved at all.
>

Hmm yeah, you are right. I missed that.

>
> Or do you mean that we can minimize the amount of required synchronization
> by allocating some obj_cgroups vectors from charge_slab_page()?

One optimization would be to always pre-allocate page->obj_cgroups for
kmem_caches with SLAB_ACCOUNT.
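
A rough sketch of that idea, assuming the pre-allocation would be done from
charge_slab_page() when a new slab page is charged (hypothetical; the vmstat
line follows the series, while the SLAB_ACCOUNT branch is only the suggestion
above):

static __always_inline int charge_slab_page(struct page *page, gfp_t gfp,
					    int order, struct kmem_cache *s)
{
	/*
	 * Caches created with SLAB_ACCOUNT hold mostly accounted objects,
	 * so their obj_cgroups vector is almost certainly needed. Allocate
	 * it eagerly here, before the page is visible to other CPUs, so
	 * these caches need no synchronization in the post-alloc hook.
	 */
	if (memcg_kmem_enabled() && (s->flags & SLAB_ACCOUNT) &&
	    memcg_alloc_page_obj_cgroups(page, s, gfp))
		return -ENOMEM;

	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
			    PAGE_SIZE << order);
	return 0;
}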


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Roman Gushchin
On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> >
> > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > >
> > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > non-accounted allocations and the second one shared by all accounted
> > > > allocations, we can use just one.
> > > >
> > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > on demand and filled only for accounted allocations.
> > > >
> > > > It allows us to remove a bunch of code which is required to handle
> > > > kmem_cache clones for accounted allocations. There is no more need
> > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > It's quite a significant simplification.
> > > >
> > > > Also, because the total number of slab_caches is almost halved
> > > > (not all kmem_caches have a memcg clone), some additional memory
> > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > of slab memory.
> > > >
> > > > Suggested-by: Johannes Weiner 
> > > > Signed-off-by: Roman Gushchin 
> > > > Reviewed-by: Vlastimil Babka 
> > > > ---
> > > [snip]
> > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > >   struct obj_cgroup *objcg,
> > > > - size_t size, void **p)
> > > > + gfp_t flags, size_t size,
> > > > + void **p)
> > > >  {
> > > > struct page *page;
> > > > unsigned long off;
> > > > size_t i;
> > > >
> > > > +   if (!objcg)
> > > > +   return;
> > > > +
> > > > +   flags &= ~__GFP_ACCOUNT;
> > > > for (i = 0; i < size; i++) {
> > > > if (likely(p[i])) {
> > > > page = virt_to_head_page(p[i]);
> > > > +
> > > > +   if (!page_has_obj_cgroups(page) &&
> > >
> > > The page is already linked into the kmem_cache; don't you need
> > > synchronization for memcg_alloc_page_obj_cgroups()?
> >
> > Hm, yes, in theory we need it. I guess the reason why I've never seen
> > any issues here is the SLUB percpu partial list.
> >
> > So in theory we need something like:
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 0a31600a0f5c..44bf57815816 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > if (!vec)
> > return -ENOMEM;
> >
> > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > +   if (cmpxchg(&page->obj_cgroups, 0,
> > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > +   kfree(vec);
> > +
> > return 0;
> >  }
> >
> >
> > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > Or any other ideas on how to make it less expensive?
> >
> > > What's the reason to remove this from charge_slab_page()?
> >
> > Because at charge_slab_page() we don't know if we'll ever need
> > page->obj_cgroups. Some caches might have only a few or even zero
> > accounted objects.
> >
> 
> If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> need page->obj_cgroups. charge_slab_page() happens between
> slab_pre_alloc_hook() & slab_post_alloc_hook(), so we should be able
> to tell if page->obj_cgroups is needed.

Yes, but the opposite is not always true: we can reuse an existing page
without an allocated page->obj_cgroups vector. In this case charge_slab_page()
is not involved at all.

Or do you mean that we can minimize the amount of required synchronization
by allocating some obj_cgroups vectors from charge_slab_page()?


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
>
> On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > >
> > > Instead of having two sets of kmem_caches: one for system-wide and
> > > non-accounted allocations and the second one shared by all accounted
> > > allocations, we can use just one.
> > >
> > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > on demand and filled only for accounted allocations.
> > >
> > > It allows us to remove a bunch of code which is required to handle
> > > kmem_cache clones for accounted allocations. There is no more need
> > > to create them, accumulate statistics, propagate attributes, etc.
> > > It's quite a significant simplification.
> > >
> > > Also, because the total number of slab_caches is almost halved
> > > (not all kmem_caches have a memcg clone), some additional memory
> > > savings are expected. On my devvm it additionally saves about 3.5%
> > > of slab memory.
> > >
> > > Suggested-by: Johannes Weiner 
> > > Signed-off-by: Roman Gushchin 
> > > Reviewed-by: Vlastimil Babka 
> > > ---
> > [snip]
> > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > >   struct obj_cgroup *objcg,
> > > - size_t size, void **p)
> > > + gfp_t flags, size_t size,
> > > + void **p)
> > >  {
> > > struct page *page;
> > > unsigned long off;
> > > size_t i;
> > >
> > > +   if (!objcg)
> > > +   return;
> > > +
> > > +   flags &= ~__GFP_ACCOUNT;
> > > for (i = 0; i < size; i++) {
> > > if (likely(p[i])) {
> > > page = virt_to_head_page(p[i]);
> > > +
> > > +   if (!page_has_obj_cgroups(page) &&
> >
> > The page is already linked into the kmem_cache; don't you need
> > synchronization for memcg_alloc_page_obj_cgroups()?
>
> Hm, yes, in theory we need it. I guess the reason why I've never seen
> any issues here is the SLUB percpu partial list.
>
> So in theory we need something like:
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 0a31600a0f5c..44bf57815816 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> if (!vec)
> return -ENOMEM;
>
> -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> +   if (cmpxchg(&page->obj_cgroups, 0,
> +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> +   kfree(vec);
> +
> return 0;
>  }
>
>
> But I wonder if we might put it under #ifdef CONFIG_SLAB?
> Or any other ideas on how to make it less expensive?
>
> > What's the reason to remove this from charge_slab_page()?
>
> Because at charge_slab_page() we don't know if we'll ever need
> page->obj_cgroups. Some caches might have only a few or even zero
> accounted objects.
>

If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
need page->obj_cgroups. charge_slab_page() happens between
slab_pre_alloc_hook() & slab_post_alloc_hook(), so we should be able
to tell if page->obj_cgroups is needed.
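
The ordering referred to here, heavily abridged from the SLUB allocation path
(names follow the series; the signatures and error handling are simplified
guesses, not verbatim kernel code):

static __always_inline void *slab_alloc_sketch(struct kmem_cache *s,
					       gfp_t gfpflags)
{
	struct obj_cgroup *objcg = NULL;
	void *obj;

	/* Returns a non-NULL objcg iff this allocation is accounted. */
	s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
	if (!s)
		return NULL;

	/*
	 * The actual allocation: if no existing slab has a free object,
	 * a new page is allocated and charge_slab_page() runs on it.
	 */
	obj = __slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);

	/* Installs objcg into page->obj_cgroups[] for the new object. */
	slab_post_alloc_hook(s, objcg, gfpflags, 1, &obj);
	return obj;
}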


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Roman Gushchin
On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> >
> > Instead of having two sets of kmem_caches: one for system-wide and
> > non-accounted allocations and the second one shared by all accounted
> > allocations, we can use just one.
> >
> > The idea is simple: space for obj_cgroup metadata can be allocated
> > on demand and filled only for accounted allocations.
> >
> > It allows us to remove a bunch of code which is required to handle
> > kmem_cache clones for accounted allocations. There is no more need
> > to create them, accumulate statistics, propagate attributes, etc.
> > It's quite a significant simplification.
> >
> > Also, because the total number of slab_caches is almost halved
> > (not all kmem_caches have a memcg clone), some additional memory
> > savings are expected. On my devvm it additionally saves about 3.5%
> > of slab memory.
> >
> > Suggested-by: Johannes Weiner 
> > Signed-off-by: Roman Gushchin 
> > Reviewed-by: Vlastimil Babka 
> > ---
> [snip]
> >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> >   struct obj_cgroup *objcg,
> > - size_t size, void **p)
> > + gfp_t flags, size_t size,
> > + void **p)
> >  {
> > struct page *page;
> > unsigned long off;
> > size_t i;
> >
> > +   if (!objcg)
> > +   return;
> > +
> > +   flags &= ~__GFP_ACCOUNT;
> > for (i = 0; i < size; i++) {
> > if (likely(p[i])) {
> > page = virt_to_head_page(p[i]);
> > +
> > +   if (!page_has_obj_cgroups(page) &&
> 
> The page is already linked into the kmem_cache; don't you need
> synchronization for memcg_alloc_page_obj_cgroups()?

Hm, yes, in theory we need it. I guess the reason why I've never seen
any issues here is the SLUB percpu partial list.

So in theory we need something like:

diff --git a/mm/slab.h b/mm/slab.h
index 0a31600a0f5c..44bf57815816 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
if (!vec)
return -ENOMEM;
 
-   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
+   if (cmpxchg(&page->obj_cgroups, 0,
+   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
+   kfree(vec);
+
return 0;
 }


But I wonder if we might put it under #ifdef CONFIG_SLAB?
Or any other ideas on how to make it less expensive?

> What's the reason to remove this from charge_slab_page()?

Because at charge_slab_page() we don't know if we'll ever need
page->obj_cgroups. Some caches might have only a few or even zero
accounted objects.

> 
> > +   memcg_alloc_page_obj_cgroups(page, s, flags)) {
> > +   obj_cgroup_uncharge(objcg, obj_full_size(s));
> > +   continue;
> > +   }
> > +
> > off = obj_to_index(s, page, p[i]);
> > obj_cgroup_get(objcg);
> > page_obj_cgroups(page)[off] = objcg;


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
>
> Instead of having two sets of kmem_caches: one for system-wide and
> non-accounted allocations and the second one shared by all accounted
> allocations, we can use just one.
>
> The idea is simple: space for obj_cgroup metadata can be allocated
> on demand and filled only for accounted allocations.
>
> It allows us to remove a bunch of code which is required to handle
> kmem_cache clones for accounted allocations. There is no more need
> to create them, accumulate statistics, propagate attributes, etc.
> It's quite a significant simplification.
>
> Also, because the total number of slab_caches is almost halved
> (not all kmem_caches have a memcg clone), some additional memory
> savings are expected. On my devvm it additionally saves about 3.5%
> of slab memory.
>
> Suggested-by: Johannes Weiner 
> Signed-off-by: Roman Gushchin 
> Reviewed-by: Vlastimil Babka 
> ---
[snip]
>  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
>   struct obj_cgroup *objcg,
> - size_t size, void **p)
> + gfp_t flags, size_t size,
> + void **p)
>  {
> struct page *page;
> unsigned long off;
> size_t i;
>
> +   if (!objcg)
> +   return;
> +
> +   flags &= ~__GFP_ACCOUNT;
> for (i = 0; i < size; i++) {
> if (likely(p[i])) {
> page = virt_to_head_page(p[i]);
> +
> +   if (!page_has_obj_cgroups(page) &&

The page is already linked into the kmem_cache; don't you need
synchronization for memcg_alloc_page_obj_cgroups()? What's the reason
to remove this from charge_slab_page()?

> +   memcg_alloc_page_obj_cgroups(page, s, flags)) {
> +   obj_cgroup_uncharge(objcg, obj_full_size(s));
> +   continue;
> +   }
> +
> off = obj_to_index(s, page, p[i]);
> obj_cgroup_get(objcg);
> page_obj_cgroups(page)[off] = objcg;
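
For readers following the hunk: obj_to_index() maps an object pointer to its
slot number within the slab page, which is then used to index the obj_cgroups
vector. Conceptually it computes the following (a sketch; the kernel replaces
the division with a precomputed reciprocal):

static inline unsigned int obj_to_index_sketch(const struct kmem_cache *s,
					       const struct page *page,
					       void *obj)
{
	/* Slot number = byte offset within the slab page / object size. */
	return (obj - page_address(page)) / s->size;
}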


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-18 Thread Roman Gushchin
On Thu, Jun 18, 2020 at 09:33:08AM +0200, Vlastimil Babka wrote:
> On 6/18/20 2:35 AM, Roman Gushchin wrote:
> > On Wed, Jun 17, 2020 at 04:35:28PM -0700, Andrew Morton wrote:
> >> On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin  wrote:
> >> 
> >> > Instead of having two sets of kmem_caches: one for system-wide and
> >> > non-accounted allocations and the second one shared by all accounted
> >> > allocations, we can use just one.
> >> > 
> >> > The idea is simple: space for obj_cgroup metadata can be allocated
> >> > on demand and filled only for accounted allocations.
> >> > 
> >> > It allows us to remove a bunch of code which is required to handle
> >> > kmem_cache clones for accounted allocations. There is no more need
> >> > to create them, accumulate statistics, propagate attributes, etc.
> >> > It's quite a significant simplification.
> >> > 
> >> > Also, because the total number of slab_caches is almost halved
> >> > (not all kmem_caches have a memcg clone), some additional memory
> >> > savings are expected. On my devvm it additionally saves about 3.5%
> >> > of slab memory.
> >> > 
> >> 
> >> This ran afoul of Vlastimil's "mm, slab/slub: move and improve
> >> cache_from_obj()"
> >> (http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz).  I
> >> resolved things as below.  Not too sure about slab.c's
> >> cache_from_obj()...
> > 
> > It can actually be as simple as:
> > static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> > {
> > return s;
> > }
> > 
> > But I wonder if we need it at all, or maybe we wanna rename it to
> > something like obj_check_kmem_cache(void *obj, struct kmem_cache *s),
> > because it now serves only debugging purposes.
> > 
> > Let me and Vlastimil figure it out and send a follow-up patch.
> > Your version is definitely correct.
> 
> Well, Kees wants to restore the common version of cache_from_obj() [1] for
> SLAB hardening.
> 
> To prevent all that back-and-forth churn from entering git history, I think
> the best option is for me to send a -fix to my patch that is functionally the
> same while keeping the common function; then your patch should only have a
> minor conflict, and Kees can rebase his patches on top to become much smaller?

Sounds good to me!

Thanks!


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-18 Thread Vlastimil Babka
On 6/18/20 2:35 AM, Roman Gushchin wrote:
> On Wed, Jun 17, 2020 at 04:35:28PM -0700, Andrew Morton wrote:
>> On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin  wrote:
>> 
>> > Instead of having two sets of kmem_caches: one for system-wide and
>> > non-accounted allocations and the second one shared by all accounted
>> > allocations, we can use just one.
>> > 
>> > The idea is simple: space for obj_cgroup metadata can be allocated
>> > on demand and filled only for accounted allocations.
>> > 
>> > It allows us to remove a bunch of code which is required to handle
>> > kmem_cache clones for accounted allocations. There is no more need
>> > to create them, accumulate statistics, propagate attributes, etc.
>> > It's quite a significant simplification.
>> > 
>> > Also, because the total number of slab_caches is almost halved
>> > (not all kmem_caches have a memcg clone), some additional memory
>> > savings are expected. On my devvm it additionally saves about 3.5%
>> > of slab memory.
>> > 
>> 
>> This ran afoul of Vlastimil's "mm, slab/slub: move and improve
>> cache_from_obj()"
>> (http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz).  I
>> resolved things as below.  Not too sure about slab.c's
>> cache_from_obj()...
> 
> It can actually be as simple as:
> static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> {
>   return s;
> }
> 
> But I wonder if we need it at all, or maybe we wanna rename it to
> something like obj_check_kmem_cache(void *obj, struct kmem_cache *s),
> because it now serves only debugging purposes.
> 
> Let me and Vlastimil figure it out and send a follow-up patch.
> Your version is definitely correct.

Well, Kees wants to restore the common version of cache_from_obj() [1] for SLAB
hardening.

To prevent all that back-and-forth churn from entering git history, I think the
best option is for me to send a -fix to my patch that is functionally the same
while keeping the common function; then your patch should only have a minor
conflict, and Kees can rebase his patches on top to become much smaller?

[1] https://lore.kernel.org/linux-mm/20200617195349.3471794-1-keesc...@chromium.org/

> Thanks!
> 



Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-17 Thread Roman Gushchin
On Wed, Jun 17, 2020 at 04:35:28PM -0700, Andrew Morton wrote:
> On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin  wrote:
> 
> > Instead of having two sets of kmem_caches: one for system-wide and
> > non-accounted allocations and the second one shared by all accounted
> > allocations, we can use just one.
> > 
> > The idea is simple: space for obj_cgroup metadata can be allocated
> > on demand and filled only for accounted allocations.
> > 
> > It allows us to remove a bunch of code which is required to handle
> > kmem_cache clones for accounted allocations. There is no more need
> > to create them, accumulate statistics, propagate attributes, etc.
> > It's quite a significant simplification.
> > 
> > Also, because the total number of slab_caches is almost halved
> > (not all kmem_caches have a memcg clone), some additional memory
> > savings are expected. On my devvm it additionally saves about 3.5%
> > of slab memory.
> > 
> 
> This ran afoul of Vlastimil's "mm, slab/slub: move and improve
> cache_from_obj()"
> (http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz).  I
> resolved things as below.  Not too sure about slab.c's
> cache_from_obj()...

It can actually be as simple as:
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
return s;
}

But I wonder if we need it at all, or maybe we wanna rename it to
something like obj_check_kmem_cache(void *obj, struct kmem_cache *s),
because it now serves only debugging purposes.

Let me and Vlastimil figure it out and send a follow-up patch.
Your version is definitely correct.

Thanks!
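
For context, a sketch of what such a debug-only check might look like, assuming
a virt_to_cache() helper that resolves an object back to its slab cache (a
guess at the shape of the follow-up, not the final patch):

static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
	struct kmem_cache *cachep;

	/*
	 * With a single set of kmem_caches there is no memcg clone to look
	 * up anymore; only debugging/hardening configs still want to verify
	 * that the object really belongs to the passed cache.
	 */
	if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
	    !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
		return s;

	cachep = virt_to_cache(x);
	WARN_ONCE(cachep && cachep != s,
		  "%s: Wrong slab cache. %s but object is from %s\n",
		  __func__, s->name, cachep->name);
	return cachep;
}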


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-17 Thread Andrew Morton
On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin  wrote:

> Instead of having two sets of kmem_caches: one for system-wide and
> non-accounted allocations and the second one shared by all accounted
> allocations, we can use just one.
> 
> The idea is simple: space for obj_cgroup metadata can be allocated
> on demand and filled only for accounted allocations.
> 
> It allows us to remove a bunch of code which is required to handle
> kmem_cache clones for accounted allocations. There is no more need
> to create them, accumulate statistics, propagate attributes, etc.
> It's quite a significant simplification.
> 
> Also, because the total number of slab_caches is almost halved
> (not all kmem_caches have a memcg clone), some additional memory
> savings are expected. On my devvm it additionally saves about 3.5%
> of slab memory.
> 

This ran afoul of Vlastimil's "mm, slab/slub: move and improve
cache_from_obj()"
(http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz).  I
resolved things as below.  Not too sure about slab.c's
cache_from_obj()...


From: Roman Gushchin 
Subject: mm: memcg/slab: use a single set of kmem_caches for all allocations

Instead of having two sets of kmem_caches: one for system-wide and
non-accounted allocations and the second one shared by all accounted
allocations, we can use just one.

The idea is simple: space for obj_cgroup metadata can be allocated
on demand and filled only for accounted allocations.

It allows us to remove a bunch of code which is required to handle
kmem_cache clones for accounted allocations. There is no more need
to create them, accumulate statistics, propagate attributes, etc.
It's quite a significant simplification.

Also, because the total number of slab_caches is almost halved
(not all kmem_caches have a memcg clone), some additional memory
savings are expected. On my devvm it additionally saves about 3.5%
of slab memory.

Link: http://lkml.kernel.org/r/20200608230654.828134-18-g...@fb.com
Suggested-by: Johannes Weiner 
Signed-off-by: Roman Gushchin 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Andrew Morton 
---

 include/linux/slab.h |2 
 include/linux/slab_def.h |3 
 include/linux/slub_def.h |   10 -
 mm/memcontrol.c  |5 
 mm/slab.c|   46 ---
 mm/slab.h|  176 ++--
 mm/slab_common.c |  230 -
 mm/slub.c|  166 --
 8 files changed, 58 insertions(+), 580 deletions(-)

--- a/include/linux/slab_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slab_def.h
@@ -72,9 +72,6 @@ struct kmem_cache {
int obj_offset;
 #endif /* CONFIG_DEBUG_SLAB */
 
-#ifdef CONFIG_MEMCG
-   struct memcg_cache_params memcg_params;
-#endif
 #ifdef CONFIG_KASAN
struct kasan_cache kasan_info;
 #endif
--- a/include/linux/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slab.h
@@ -155,8 +155,6 @@ struct kmem_cache *kmem_cache_create_use
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
-void memcg_create_kmem_cache(struct kmem_cache *cachep);
-
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.
--- a/include/linux/slub_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slub_def.h
@@ -108,17 +108,7 @@ struct kmem_cache {
struct list_head list;  /* List of slab caches */
 #ifdef CONFIG_SYSFS
struct kobject kobj;/* For sysfs */
-   struct work_struct kobj_remove_work;
 #endif
-#ifdef CONFIG_MEMCG
-   struct memcg_cache_params memcg_params;
-   /* For propagation, maximum size of a stored attr */
-   unsigned int max_attr_size;
-#ifdef CONFIG_SYSFS
-   struct kset *memcg_kset;
-#endif
-#endif
-
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
unsigned long random;
 #endif
--- a/mm/memcontrol.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/mm/memcontrol.c
@@ -2826,7 +2826,10 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
off = obj_to_index(page->slab_cache, page, p);
objcg = page_obj_cgroups(page)[off];
-   return obj_cgroup_memcg(objcg);
+   if (objcg)
+   return obj_cgroup_memcg(objcg);
+
+   return NULL;
}
 
/* All other pages use page->mem_cgroup */
--- a/mm/slab.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/mm/slab.c
@@ -1369,11 +1369,7 @@ static struct page *kmem_getpages(struct
return NULL;
}
 
-   if (charge_slab_page(page, flags, cachep->gfporder, cachep)) {
-   __free_pages(page, cachep->gfporder);
-   return NULL;
-   }
-
+   charge_slab_page(page, flags, cachep->gfporder, cachep);
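
The dropped error handling follows from the rest of the series: with
obj_cgroups now allocated on demand from the post-alloc hook,
charge_slab_page() reduces to plain vmstat accounting and can no longer fail.
Per the series it ends up roughly as (a sketch; the exact form may differ):

static __always_inline void charge_slab_page(struct page *page,
					     gfp_t gfp, int order,
					     struct kmem_cache *s)
{
	/*
	 * Only the per-node slab counters remain here; memcg charging is
	 * now done per object in memcg_slab_post_alloc_hook().
	 */
	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
			    PAGE_SIZE << order);
}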