Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
On Fri, Mar 12, 2021 at 11:59 PM Johannes Weiner wrote:
>
> On Fri, Mar 12, 2021 at 05:22:55PM +0800, Muchun Song wrote:
> > On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner wrote:
> > > > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> > > >
> > > >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > > >  {
> > > > -        unsigned long nr_pages;
> > > > +        unsigned long nr_pages, nr_kmem;
> > > >          struct mem_cgroup *memcg;
> > > >
> > > >          VM_BUG_ON_PAGE(PageLRU(page), page);
> > > > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > > >          if (!page_memcg_charged(page))
> > > >                  return;
> > > >
> > > > +        nr_pages = compound_nr(page);
> > > >          /*
> > > >           * Nobody should be changing or seriously looking at
> > > > -         * page memcg at this point, we have fully exclusive
> > > > -         * access to the page.
> > > > +         * page memcg or objcg at this point, we have fully
> > > > +         * exclusive access to the page.
> > > >           */
> > > > -        memcg = page_memcg_check(page);
> > > > +        if (PageMemcgKmem(page)) {
> > > > +                struct obj_cgroup *objcg;
> > > > +
> > > > +                objcg = page_objcg(page);
> > > > +                memcg = obj_cgroup_memcg_get(objcg);
> > > > +
> > > > +                page->memcg_data = 0;
> > > > +                obj_cgroup_put(objcg);
> > > > +                nr_kmem = nr_pages;
> > > > +        } else {
> > > > +                memcg = page_memcg(page);
> > > > +                page->memcg_data = 0;
> > > > +                nr_kmem = 0;
> > > > +        }
> > >
> > > Why is all this moved above the uncharge_batch() call?
> >
> > Before calling obj_cgroup_put(), we need to set page->memcg_data
> > to zero. So I moved "page->memcg_data = 0" up here.
>
> Yeah, it makes sense to keep those together, but we can move them both
> down to after the uncharge, right?

Right. I am doing this.

> > > It separates the pointer manipulations from the refcounting, which
> > > makes the code very difficult to follow.
> > > >
> > > > +
> > > >          if (ug->memcg != memcg) {
> > > >                  if (ug->memcg) {
> > > >                          uncharge_batch(ug);
> > > >                          uncharge_gather_clear(ug);
> > > >                  }
> > > >                  ug->memcg = memcg;
> > > > +                ug->dummy_page = page;
> > >
> > > Why this change?
> >
> > Just like ug->memcg, we do not need to set
> > ug->dummy_page in every loop.
>
> Ah, okay. That's a reasonable change, it's just confusing because I
> thought this was a requirement for the new code to work. But I didn't
> see how it relied on that, and it made me think I'm not understanding
> your code ;) It's better to split that into a separate patch.

Sorry for confusing you. I will split that into a separate patch.
Thanks.

> > I will rework the code in the next version.
>
> Thanks!
Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
On Fri, Mar 12, 2021 at 05:22:55PM +0800, Muchun Song wrote:
> On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner wrote:
> > > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> > >
> > >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > >  {
> > > -        unsigned long nr_pages;
> > > +        unsigned long nr_pages, nr_kmem;
> > >          struct mem_cgroup *memcg;
> > >
> > >          VM_BUG_ON_PAGE(PageLRU(page), page);
> > > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > >          if (!page_memcg_charged(page))
> > >                  return;
> > >
> > > +        nr_pages = compound_nr(page);
> > >          /*
> > >           * Nobody should be changing or seriously looking at
> > > -         * page memcg at this point, we have fully exclusive
> > > -         * access to the page.
> > > +         * page memcg or objcg at this point, we have fully
> > > +         * exclusive access to the page.
> > >           */
> > > -        memcg = page_memcg_check(page);
> > > +        if (PageMemcgKmem(page)) {
> > > +                struct obj_cgroup *objcg;
> > > +
> > > +                objcg = page_objcg(page);
> > > +                memcg = obj_cgroup_memcg_get(objcg);
> > > +
> > > +                page->memcg_data = 0;
> > > +                obj_cgroup_put(objcg);
> > > +                nr_kmem = nr_pages;
> > > +        } else {
> > > +                memcg = page_memcg(page);
> > > +                page->memcg_data = 0;
> > > +                nr_kmem = 0;
> > > +        }
> >
> > Why is all this moved above the uncharge_batch() call?
>
> Before calling obj_cgroup_put(), we need to set page->memcg_data
> to zero. So I moved "page->memcg_data = 0" up here.

Yeah, it makes sense to keep those together, but we can move them both
down to after the uncharge, right?

> > It separates the pointer manipulations from the refcounting, which
> > makes the code very difficult to follow.
> >
> > > +
> > >          if (ug->memcg != memcg) {
> > >                  if (ug->memcg) {
> > >                          uncharge_batch(ug);
> > >                          uncharge_gather_clear(ug);
> > >                  }
> > >                  ug->memcg = memcg;
> > > +                ug->dummy_page = page;
> >
> > Why this change?
> Just like ug->memcg, we do not need to set
> ug->dummy_page in every loop.

Ah, okay. That's a reasonable change, it's just confusing because I
thought this was a requirement for the new code to work. But I didn't
see how it relied on that, and it made me think I'm not understanding
your code ;) It's better to split that into a separate patch.

> I will rework the code in the next version.

Thanks!
Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner wrote:
>
> Hello Muchun,
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
> >  static void uncharge_batch(const struct uncharge_gather *ug)
> >  {
> >          unsigned long flags;
> > +        unsigned long nr_pages;
> >
> > -        if (!mem_cgroup_is_root(ug->memcg)) {
> > -                page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> > +        /*
> > +         * The kmem pages can be reparented to the root memcg, in
> > +         * order to prevent the memory counter of root memcg from
> > +         * increasing indefinitely. We should decrease the memory
> > +         * counter when uncharging.
> > +         */
> > +        if (mem_cgroup_is_root(ug->memcg))
> > +                nr_pages = ug->nr_kmem;
> > +        else
> > +                nr_pages = ug->nr_pages;
>
> Correct or not, I find this unreadable. We're uncharging nr_kmem on
> the root, and nr_pages against leaf groups?
>
> It implies several things that might not be immediately obvious to the
> reader of this function. Namely, that nr_kmem is a subset of nr_pages.
> Or that we don't *want* to account LRU pages for the root cgroup.
>
> The old code followed a very simple pattern: the root memcg's page
> counters aren't touched.
>
> This is no longer true: we modify them depending on very specific
> circumstances. But that's too clever for the stupid uncharge_batch()
> which is only supposed to flush a number of accumulators into their
> corresponding page counters.
>
> This distinction really needs to be moved down to uncharge_page() now.

OK. I will rework the code here to make it readable.
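[Editorial aside: one way to read the suggestion above is to let the per-page path decide what should count against the memory counter, so that the batch flush stays a dumb accumulator dump. The sketch below is a userspace analogue of that idea, not the actual patch; `struct gather`, `account_page()`, and their fields are illustrative names.]

```c
#include <assert.h>

/* Toy accumulator in the spirit of struct uncharge_gather. */
struct gather {
    unsigned long nr_memory; /* pages to uncharge from the memory counter */
    unsigned long nr_kmem;   /* kmem subset, flushed to the kmem counter */
};

/*
 * Per-page accounting decision: LRU pages of the root memcg were never
 * charged, so they must not be uncharged; kmem pages may have been
 * reparented to the root memcg, so they always were charged.  With the
 * decision made here, the flush side never needs to special-case root.
 */
static void account_page(struct gather *g, unsigned long nr_pages,
                         int is_kmem, int is_root_memcg)
{
    if (!is_root_memcg || is_kmem)
        g->nr_memory += nr_pages;
    if (is_kmem)
        g->nr_kmem += nr_pages;
}
```

With this shape, the flush step simply subtracts `nr_memory` from the memory counter and `nr_kmem` from the kmem counter, with no `mem_cgroup_is_root()` branch at flush time.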
> > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> > -        unsigned long nr_pages;
> > +        unsigned long nr_pages, nr_kmem;
> >          struct mem_cgroup *memcg;
> >
> >          VM_BUG_ON_PAGE(PageLRU(page), page);
> > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >          if (!page_memcg_charged(page))
> >                  return;
> >
> > +        nr_pages = compound_nr(page);
> >          /*
> >           * Nobody should be changing or seriously looking at
> > -         * page memcg at this point, we have fully exclusive
> > -         * access to the page.
> > +         * page memcg or objcg at this point, we have fully
> > +         * exclusive access to the page.
> >           */
> > -        memcg = page_memcg_check(page);
> > +        if (PageMemcgKmem(page)) {
> > +                struct obj_cgroup *objcg;
> > +
> > +                objcg = page_objcg(page);
> > +                memcg = obj_cgroup_memcg_get(objcg);
> > +
> > +                page->memcg_data = 0;
> > +                obj_cgroup_put(objcg);
> > +                nr_kmem = nr_pages;
> > +        } else {
> > +                memcg = page_memcg(page);
> > +                page->memcg_data = 0;
> > +                nr_kmem = 0;
> > +        }
>
> Why is all this moved above the uncharge_batch() call?

Before calling obj_cgroup_put(), we need to set page->memcg_data
to zero. So I moved "page->memcg_data = 0" up here.

> It separates the pointer manipulations from the refcounting, which
> makes the code very difficult to follow.
>
> > +
> >          if (ug->memcg != memcg) {
> >                  if (ug->memcg) {
> >                          uncharge_batch(ug);
> >                          uncharge_gather_clear(ug);
> >                  }
> >                  ug->memcg = memcg;
> > +                ug->dummy_page = page;
>
> Why this change?

Just like ug->memcg, we do not need to set
ug->dummy_page in every loop.

> >                  /* pairs with css_put in uncharge_batch */
> >                  css_get(&ug->memcg->css);
> >          }
> >
> > -        nr_pages = compound_nr(page);
> >          ug->nr_pages += nr_pages;
> > +        ug->nr_kmem += nr_kmem;
> > +        ug->pgpgout += !nr_kmem;
>
> Oof.
> Yes, this pgpgout line is an equivalent transformation for counting
> LRU compound pages. But unless you already know that, it's completely
> impossible to understand what the intent here is.
>
> Please avoid clever tricks like this. If you need to check whether the
> page is kmem, test PageMemcgKmem() instead of abusing the counters as
> boolean flags. This is supposed to be read by human beings, too.

Got it.

> > -        if (PageMemcgKmem(page))
> > -                ug->nr_kmem += nr_pages;
> > -        else
> > -                ug->pgpgout++;
> > -
> > -        ug->dummy_page = page;
> > -        page->memcg_data = 0;
> > -        css_put(&ug->memcg->css);
> > +        css_put(&memcg->css);
>
> Sorry, these two functions are no longer readable after your changes.
>
> Please retain the following sequence as discrete steps:
>
> 1. look up memcg from the page
> 2. flush existing batch if memcg changed
> 3. add page's various counts to the current batch
> 4. clear page->memcg and decrease the reference
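[Editorial aside: the gather/flush batching pattern this review asks to preserve can be sketched in userspace. This is a toy model of the control flow only (look up the owner, flush on owner change, accumulate); the names `gather`, `flush_batch`, and `uncharge_one` are illustrative, and step 4, clearing page->memcg and dropping the reference, has no analogue in the toy.]

```c
#include <assert.h>

enum { MAX_MEMCG = 8 };

struct gather {
    int memcg;                        /* owner of the current batch, 0 = none */
    unsigned long nr_pages;           /* pages accumulated for that owner */
    unsigned long flushed[MAX_MEMCG]; /* stand-in for page_counter updates */
};

/* The flush: one "page counter" update per run of same-owner pages. */
static void flush_batch(struct gather *g)
{
    g->flushed[g->memcg] += g->nr_pages;
    g->memcg = 0;
    g->nr_pages = 0;
}

/*
 * Steps kept discrete, mirroring the sequence requested in review:
 *   1. take the owner of the page (passed in here)
 *   2. flush the existing batch if the owner changed
 *   3. add this page's count to the current batch
 */
static void uncharge_one(struct gather *g, int memcg, unsigned long nr_pages)
{
    if (g->memcg != memcg) {
        if (g->memcg)
            flush_batch(g);
        g->memcg = memcg;
    }
    g->nr_pages += nr_pages;
}
```

A caller finishes with one explicit final flush, much as mem_cgroup_uncharge_list() ends with a last uncharge_batch() call when a batch is still pending.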
Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
On Thu, Mar 11, 2021 at 3:53 AM Roman Gushchin wrote:
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > Since Roman's series "The new cgroup slab memory controller" was
> > applied, all slab objects are charged via the new obj_cgroup APIs.
> > The new APIs introduce a struct obj_cgroup to charge slab objects.
> > It prevents long-living objects from pinning the original memory
> > cgroup in memory. But there are still some corner objects (e.g.
> > allocations larger than order-1 page on SLUB) which are not charged
> > via the new APIs. Those objects (including the pages which are
> > allocated from the buddy allocator directly) are charged as kmem
> > pages which still hold a reference to the memory cgroup.
> >
> > This patch aims to charge the kmem pages by using the new APIs of
> > obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> > an object cgroup. We can use page_objcg() to get the object
> > cgroup associated with a kmem page. Or we can use page_memcg_check()
> > to get the memory cgroup associated with a kmem page, but the caller
> > must ensure that the returned memcg won't be released (e.g. acquire
> > the rcu_read_lock or css_set_lock).
> >
> > Signed-off-by: Muchun Song
> > ---
> >  include/linux/memcontrol.h |  63 ++--
> >  mm/memcontrol.c            | 119 ++---
> >  2 files changed, 128 insertions(+), 54 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 83cbcdcfcc92..07c449af9c0f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
> >  }
> >
> >  /*
> > + * After the initialization objcg->memcg is always pointing at
> > + * a valid memcg, but can be atomically swapped to the parent memcg.
> > + *
> > + * The caller must ensure that the returned memcg won't be released:
> > + * e.g. acquire the rcu_read_lock or css_set_lock.
> > + */
> > +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > +{
> > +        return READ_ONCE(objcg->memcg);
> > +}
> > +
> > +/*
> >   * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> > @@ -422,15 +434,19 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> > - * or NULL. This function unlike page_memcg() can take any page
> > + * or NULL. This function unlike page_memcg() can take any page
> >   * as an argument. It has to be used in cases when it's not known if a page
> > - * has an associated memory cgroup pointer or an object cgroups vector.
> > + * has an associated memory cgroup pointer or an object cgroups vector or
> > + * an object cgroup.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> >   * - LRU isolation
> >   * - lock_page_memcg()
> >   * - exclusive reference
> > + *
> > + * Should be called under rcu lock which can protect memcg associated with a
> > + * kmem page from being released.
>
> How about this:
>
> For a non-kmem page any of the following ensures page and memcg binding
> stability:
>   - the page lock
>   - LRU isolation
>   - lock_page_memcg()
>   - exclusive reference
>
> For a kmem page a caller should hold an rcu read lock to protect memcg
> associated with a kmem page from being released.

OK. I will use this. Thanks Roman.
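[Editorial aside: the property the comment describes, that objcg->memcg always points at a valid memcg but can be atomically swapped to the parent, can be modelled with C11 atomics. This is a hedged userspace analogue only; the kernel uses READ_ONCE()/WRITE_ONCE() plus RCU, not stdatomic, and the `objcg_memcg`/`reparent` names below are illustrative.]

```c
#include <assert.h>
#include <stdatomic.h>

struct memcg { int id; };

/* An obj_cgroup-like handle whose owning memcg can change at any time. */
struct objcg { _Atomic(struct memcg *) memcg; };

/*
 * obj_cgroup_memcg()-style read: a single atomic load.  The snapshot is
 * only safe to dereference while the caller keeps the memcg alive,
 * which in the kernel means holding rcu_read_lock() or css_set_lock.
 */
static struct memcg *objcg_memcg(struct objcg *o)
{
    return atomic_load_explicit(&o->memcg, memory_order_acquire);
}

/* Reparenting swaps the pointer to the parent memcg in one store. */
static void reparent(struct objcg *o, struct memcg *parent)
{
    atomic_store_explicit(&o->memcg, parent, memory_order_release);
}
```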
> >   */
> >  static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >  {
> > @@ -443,6 +459,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >          if (memcg_data & MEMCG_DATA_OBJCGS)
> >                  return NULL;
> >
> > +        if (memcg_data & MEMCG_DATA_KMEM) {
> > +                struct obj_cgroup *objcg;
> > +
> > +                objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > +                return obj_cgroup_memcg(objcg);
> > +        }
> > +
> >          return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > @@ -501,6 +524,25 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> >          return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > +/*
> > + * page_objcg - get the object cgroup associated with a kmem page
> > + * @page: a pointer to the page struct
> > + *
> > + * Returns a pointer to the object cgroup associated with the kmem page,
> > + * or NULL. This function assumes that the page is known to have an
> > + * associated object cgroup. It's only safe to call this function
> > + * against kmem pages (PageMemcgKmem() returns true).
> > + */
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > +        unsigned long memcg_data = page->memcg_data;
> > +
> > +
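[Editorial aside: page_memcg_check() works by multiplexing flag bits into the low bits of page->memcg_data and masking them off to recover the stored pointer. The userspace sketch below shows that decode; the flag values mirror the kernel's enum at the time of this patch, but treat the exact values and the `classify`/`decode_ptr` names as illustrative.]

```c
#include <assert.h>
#include <stdint.h>

/* Low-bit flags multiplexed into memcg_data (illustrative values). */
#define MEMCG_DATA_OBJCGS     (1UL << 0)  /* slab page: objcg vector */
#define MEMCG_DATA_KMEM       (1UL << 1)  /* kmem page: single objcg */
#define MEMCG_DATA_FLAGS_MASK (MEMCG_DATA_OBJCGS | MEMCG_DATA_KMEM)

/* What kind of pointer does this memcg_data word carry? */
enum owner_kind { OWNER_MEMCG, OWNER_OBJCG_VECTOR, OWNER_OBJCG };

static enum owner_kind classify(unsigned long memcg_data)
{
    if (memcg_data & MEMCG_DATA_OBJCGS)
        return OWNER_OBJCG_VECTOR;  /* page_memcg_check() returns NULL here */
    if (memcg_data & MEMCG_DATA_KMEM)
        return OWNER_OBJCG;         /* resolve via obj_cgroup_memcg() */
    return OWNER_MEMCG;             /* plain memcg pointer */
}

/* Strip the flag bits to recover the stored pointer value; this works
 * because the pointed-to objects are word-aligned, leaving the low
 * bits of the address free. */
static uintptr_t decode_ptr(unsigned long memcg_data)
{
    return (uintptr_t)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
```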