Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 22, 2020 at 02:28:54PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin wrote:
[snip]
> > Or do you mean that we can minimize the amount of required synchronization
> > by allocating some obj_cgroups vectors from charge_slab_page()?
>
> One optimization would be to always pre-allocate page->obj_cgroups for
> kmem_caches with SLAB_ACCOUNT.

Even this is not completely memory overhead-free, because processes
belonging to the root cgroup and kthreads might allocate from such a
cache.

Anyway, I think I'll go with cmpxchg() for now and will think about
possible optimizations later. Because the allocation happens only once
per the lifetime of a slab page, and is very unlikely racing with a
concurrent
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin wrote:
> On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
[snip]
> > If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> > need page->obj_cgroups. The charge_slab_page() happens between
> > slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
> > to tell if page->obj_cgroups is needed.
>
> Yes, but the opposite is not always true: we can reuse the existing page
> without allocated page->obj_cgroups. In this case charge_slab_page() is
> not involved at all.

Hmm yeah, you are right. I missed that.

> Or do you mean that we can minimize the amount of required synchronization
> by allocating some obj_cgroups vectors from charge_slab_page()?

One optimization would be to always pre-allocate page->obj_cgroups for
kmem_caches with SLAB_ACCOUNT.
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin wrote:
[snip]
> > Because at charge_slab_page() we don't know if we'll ever need
> > page->obj_cgroups. Some caches might have only few or even zero
> > accounted objects.
>
> If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> need page->obj_cgroups. The charge_slab_page() happens between
> slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
> to tell if page->obj_cgroups is needed.

Yes, but the opposite is not always true: we can reuse the existing page
without allocated page->obj_cgroups. In this case charge_slab_page() is
not involved at all.

Or do you mean that we can minimize the amount of required synchronization
by allocating some obj_cgroups vectors from charge_slab_page()?
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin wrote:
> On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
[snip]
> > What's the reason to remove this from charge_slab_page()?
>
> Because at charge_slab_page() we don't know if we'll ever need
> page->obj_cgroups. Some caches might have only few or even zero
> accounted objects.

If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
need page->obj_cgroups. The charge_slab_page() happens between
slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
to tell if page->obj_cgroups is needed.
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin wrote:
[snip]
> >                         page = virt_to_head_page(p[i]);
> > +
> > +                       if (!page_has_obj_cgroups(page) &&
>
> The page is already linked into the kmem_cache, don't you need
> synchronization for memcg_alloc_page_obj_cgroups()?

Hm, yes, in theory we need it. I guess the reason why I've never seen
any issues here is the SLUB percpu partial list.

So in theory we need something like:

diff --git a/mm/slab.h b/mm/slab.h
index 0a31600a0f5c..44bf57815816 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
 	if (!vec)
 		return -ENOMEM;
 
-	page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
+	if (cmpxchg(&page->obj_cgroups, 0,
+		    (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
+		kfree(vec);
+
 	return 0;
 }

But I wonder if we might put it under #ifdef CONFIG_SLAB?
Or any other ideas how to make it less expensive?

> What's the reason to remove this from charge_slab_page()?

Because at charge_slab_page() we don't know if we'll ever need
page->obj_cgroups. Some caches might have only few or even zero
accounted objects.
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin wrote:
>
> Instead of having two sets of kmem_caches: one for system-wide and
> non-accounted allocations and the second one shared by all accounted
> allocations, we can use just one.
>
> The idea is simple: space for obj_cgroup metadata can be allocated
> on demand and filled only for accounted allocations.
>
> It allows to remove a bunch of code which is required to handle
> kmem_cache clones for accounted allocations. There is no more need
> to create them, accumulate statistics, propagate attributes, etc.
> It's a quite significant simplification.
>
> Also, because the total number of slab_caches is reduced almost twice
> (not all kmem_caches have a memcg clone), some additional memory
> savings are expected. On my devvm it additionally saves about 3.5%
> of slab memory.
>
> Suggested-by: Johannes Weiner
> Signed-off-by: Roman Gushchin
> Reviewed-by: Vlastimil Babka
> ---
[snip]
>  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
>                                               struct obj_cgroup *objcg,
> -                                             size_t size, void **p)
> +                                             gfp_t flags, size_t size,
> +                                             void **p)
>  {
>         struct page *page;
>         unsigned long off;
>         size_t i;
> 
> +       if (!objcg)
> +               return;
> +
> +       flags &= ~__GFP_ACCOUNT;
>         for (i = 0; i < size; i++) {
>                 if (likely(p[i])) {
>                         page = virt_to_head_page(p[i]);
> +
> +                       if (!page_has_obj_cgroups(page) &&

The page is already linked into the kmem_cache, don't you need
synchronization for memcg_alloc_page_obj_cgroups()?

What's the reason to remove this from charge_slab_page()?

> +                           memcg_alloc_page_obj_cgroups(page, s, flags)) {
> +                               obj_cgroup_uncharge(objcg, obj_full_size(s));
> +                               continue;
> +                       }
> +
>                         off = obj_to_index(s, page, p[i]);
>                         obj_cgroup_get(objcg);
>                         page_obj_cgroups(page)[off] = objcg;
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Thu, Jun 18, 2020 at 09:33:08AM +0200, Vlastimil Babka wrote:
> On 6/18/20 2:35 AM, Roman Gushchin wrote:
[snip]
> > Let me and Vlastimil figure it out and send a follow-up patch.
> > Your version is definitely correct.
>
> Well, Kees wants to restore the common version of cache_from_obj() [1]
> for SLAB hardening.
>
> To prevent all that back and forth churn entering git history, I think
> the best is for me to send a -fix to my patch that is functionally same
> while keeping the common function, and then this your patch should only
> have a minor conflict and Kees can rebase his patches on top to become
> much smaller?

Sounds good to me! Thanks!
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On 6/18/20 2:35 AM, Roman Gushchin wrote:
> On Wed, Jun 17, 2020 at 04:35:28PM -0700, Andrew Morton wrote:
[snip]
> It can actually be as simple as:
>
> static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> {
> 	return s;
> }
>
> But I wonder if we need it at all, or maybe we wanna rename it to
> something like obj_check_kmem_cache(void *obj, struct kmem_cache *s),
> because it has now only debug purposes.
>
> Let me and Vlastimil figure it out and send a follow-up patch.
> Your version is definitely correct.

Well, Kees wants to restore the common version of cache_from_obj() [1] for
SLAB hardening.

To prevent all that back and forth churn entering git history, I think the
best is for me to send a -fix to my patch that is functionally same while
keeping the common function, and then this your patch should only have a
minor conflict and Kees can rebase his patches on top to become much
smaller?

[1] https://lore.kernel.org/linux-mm/20200617195349.3471794-1-keesc...@chromium.org/
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Wed, Jun 17, 2020 at 04:35:28PM -0700, Andrew Morton wrote:
> On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin wrote:
[snip]
> This ran afoul of Vlastimil's "mm, slab/slub: move and improve
> cache_from_obj()"
> (http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz). I
> resolved things as below. Not too sure about slab.c's
> cache_from_obj()...

It can actually be as simple as:

static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
	return s;
}

But I wonder if we need it at all, or maybe we want to rename it to
something like obj_check_kmem_cache(void *obj, struct kmem_cache *s),
because it now serves only debug purposes.

Let me and Vlastimil figure it out and send a follow-up patch.
Your version is definitely correct.

Thanks!
Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations
On Mon, 8 Jun 2020 16:06:52 -0700 Roman Gushchin wrote:

> Instead of having two sets of kmem_caches: one for system-wide and
> non-accounted allocations and the second one shared by all accounted
> allocations, we can use just one.
>
> The idea is simple: space for obj_cgroup metadata can be allocated
> on demand and filled only for accounted allocations.
>
> It allows to remove a bunch of code which is required to handle
> kmem_cache clones for accounted allocations. There is no more need
> to create them, accumulate statistics, propagate attributes, etc.
> It's a quite significant simplification.
>
> Also, because the total number of slab_caches is reduced almost twice
> (not all kmem_caches have a memcg clone), some additional memory
> savings are expected. On my devvm it additionally saves about 3.5%
> of slab memory.

This ran afoul of Vlastimil's "mm, slab/slub: move and improve
cache_from_obj()"
(http://lkml.kernel.org/r/20200610163135.17364-10-vba...@suse.cz). I
resolved things as below. Not too sure about slab.c's
cache_from_obj()...

From: Roman Gushchin
Subject: mm: memcg/slab: use a single set of kmem_caches for all allocations

Instead of having two sets of kmem_caches: one for system-wide and
non-accounted allocations and the second one shared by all accounted
allocations, we can use just one.

The idea is simple: space for obj_cgroup metadata can be allocated
on demand and filled only for accounted allocations.

It allows removing a bunch of code which is required to handle
kmem_cache clones for accounted allocations. There is no more need
to create them, accumulate statistics, propagate attributes, etc.
It's quite a significant simplification.

Also, because the total number of slab_caches is reduced almost by half
(not all kmem_caches have a memcg clone), some additional memory
savings are expected. On my devvm it additionally saves about 3.5%
of slab memory.

Link: http://lkml.kernel.org/r/20200608230654.828134-18-g...@fb.com
Suggested-by: Johannes Weiner
Signed-off-by: Roman Gushchin
Reviewed-by: Vlastimil Babka
Signed-off-by: Andrew Morton
---

 include/linux/slab.h     |    2 
 include/linux/slab_def.h |    3 
 include/linux/slub_def.h |   10 -
 mm/memcontrol.c          |    5 
 mm/slab.c                |   46 ---
 mm/slab.h                |  176 ++--
 mm/slab_common.c         |  230 -
 mm/slub.c                |  166 --
 8 files changed, 58 insertions(+), 580 deletions(-)

--- a/include/linux/slab_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slab_def.h
@@ -72,9 +72,6 @@ struct kmem_cache {
 	int obj_offset;
 #endif /* CONFIG_DEBUG_SLAB */
 
-#ifdef CONFIG_MEMCG
-	struct memcg_cache_params memcg_params;
-#endif
 #ifdef CONFIG_KASAN
 	struct kasan_cache kasan_info;
 #endif

--- a/include/linux/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slab.h
@@ -155,8 +155,6 @@ struct kmem_cache *kmem_cache_create_use
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
-void memcg_create_kmem_cache(struct kmem_cache *cachep);
-
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.

--- a/include/linux/slub_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/include/linux/slub_def.h
@@ -108,17 +108,7 @@ struct kmem_cache {
 	struct list_head list;	/* List of slab caches */
 #ifdef CONFIG_SYSFS
 	struct kobject kobj;	/* For sysfs */
-	struct work_struct kobj_remove_work;
 #endif
-#ifdef CONFIG_MEMCG
-	struct memcg_cache_params memcg_params;
-	/* For propagation, maximum size of a stored attr */
-	unsigned int max_attr_size;
-#ifdef CONFIG_SYSFS
-	struct kset *memcg_kset;
-#endif
-#endif
-
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	unsigned long random;
 #endif

--- a/mm/memcontrol.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/mm/memcontrol.c
@@ -2826,7 +2826,10 @@ struct mem_cgroup *mem_cgroup_from_obj(v
 		off = obj_to_index(page->slab_cache, page, p);
 		objcg = page_obj_cgroups(page)[off];
-		return obj_cgroup_memcg(objcg);
+		if (objcg)
+			return obj_cgroup_memcg(objcg);
+
+		return NULL;
 	}
 
 	/* All other pages use page->mem_cgroup */

--- a/mm/slab.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations
+++ a/mm/slab.c
@@ -1369,11 +1369,7 @@ static struct page *kmem_getpages(struct
 		return NULL;
 	}
 
-	if (charge_slab_page(page, flags, cachep->gfporder, cachep)) {
-		__free_pages(page, cachep->gfporder);
-		return NULL;
-	}
-
+	charge_slab_page(page, flags, cachep->gfporder, ca