Also, this one is needed: 8df4a44cc46b "mm: check shrinker is memcg-aware in register_shrinker_prepared()"
On 27.03.2020 19:45, Andrey Ryabinin wrote: > From: Kirill Tkhai <ktk...@virtuozzo.com> > > The patch introduces a special value SHRINKER_REGISTERING to use instead > of list_empty() to differ a registering shrinker from unregistered > shrinker. Why we need that at all? > > Shrinker registration is split in two parts. The first one is > prealloc_shrinker(), which allocates shrinker memory and reserves ID in > shrinker_idr. This function can fail. The second is > register_shrinker_prepared(), and it finalizes the registration. This > function actually makes shrinker available to be used from > shrink_slab(), and it can't fail. > > One shrinker may be based on more then one LRU lists. So, we never > clear the bit in memcg shrinker maps, when (one of) corresponding LRU > list becomes empty, since other LRU lists may be not empty. See > superblock shrinker for example: it is based on two LRU lists: > s_inode_lru and s_dentry_lru. We do not want to clear shrinker bit, > when there are no inodes in s_inode_lru, as s_dentry_lru may contain > dentries. > > Instead of that, we use special algorithm to detect shrinkers having no > elements at all its LRU lists, and this is made in shrink_slab_memcg(). > See the comment in this function for the details. > > Also, in shrink_slab_memcg() we clear shrinker bit in the map, when we > meet unregistered shrinker (bit is set, while there is no a shrinker in > IDR). Otherwise, we would have done that at the moment of shrinker > unregistration for all memcgs (and this looks worse, since iteration > over all memcg may take much time). Also this would have imposed > restrictions on shrinker unregistration order for its users: they would > have had to guarantee, there are no new elements after > unregister_shrinker() (otherwise, a new added element would have set a > bit). > > So, if we meet a set bit in map and no shrinker in IDR when we're > iterating over the map in shrink_slab_memcg(), this means the > corresponding shrinker is unregistered, and we must clear the bit. > > Another case is shrinker registration. We want two things there: > > 1) do_shrink_slab() can be called only for completely registered > shrinkers; > > 2) shrinker internal lists may be populated in any order with > register_shrinker_prepared() (let's talk on the example with sb). Both > of: > > a)list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu0] > memcg_set_shrinker_bit(); [cpu0] > ... > register_shrinker_prepared(); [cpu1] > > and > > b)register_shrinker_prepared(); [cpu0] > ... > list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu1] > memcg_set_shrinker_bit(); [cpu1] > > are legitimate. We don't want to impose restriction here and to > force people to use only (b) variant. We don't want to force people to > care, there is no elements in LRU lists before the shrinker is > completely registered. Internal users of LRU lists and shrinker code > are two different subsystems, and they have to be closed in themselves > each other. > > In (a) case we have the bit set before shrinker is completely > registered. We don't want do_shrink_slab() is called at this moment, so > we have to detect such the registering shrinkers. > > Before this patch list_empty() (shrinker is not linked to the list) > check was used for that. So, in (a) there could be a bit set, but we > don't call do_shrink_slab() unless shrinker is linked to the list. It's > just an indicator, I just overloaded linking to the list. > > This was not the best solution, since it's better not to touch the > shrinker memory from shrink_slab_memcg() before it's completely > registered (this also will be useful in the future to make shrink_slab() > completely lockless). > > So, this patch introduces better way to detect registering shrinker, > which allows not to dereference shrinker memory. It's just a ~0UL > value, which we insert into the IDR during ID allocation. After > shrinker is ready to be used, we insert actual shrinker pointer in the > IDR, and it becomes available to shrink_slab_memcg(). > > We can't use NULL instead of this new value for this purpose as: > shrink_slab_memcg() already uses NULL to detect unregistered shrinkers, > and we don't want the function sees NULL and clears the bit, otherwise > (a) won't work. > > This is the only thing the patch makes: the better way to detect > registering shrinker. Nothing else this patch makes. > > Also this gives a better assembler, but it's minor side of the patch: > > Before: > callq <idr_find> > mov %rax,%r15 > test %rax,%rax > je <shrink_slab_memcg+0x1d5> > mov 0x20(%rax),%rax > lea 0x20(%r15),%rdx > cmp %rax,%rdx > je <shrink_slab_memcg+0xbd> > mov 0x8(%rsp),%edx > mov %r15,%rsi > lea 0x10(%rsp),%rdi > callq <do_shrink_slab> > > After: > callq <idr_find> > mov %rax,%r15 > lea -0x1(%rax),%rax > cmp $0xfffffffffffffffd,%rax > ja <shrink_slab_memcg+0x1cd> > mov 0x8(%rsp),%edx > mov %r15,%rsi > lea 0x10(%rsp),%rdi > callq ffffffff810cefd0 <do_shrink_slab> > > [ktk...@virtuozzo.com: add #ifdef CONFIG_MEMCG_KMEM around idr_replace()] > Link: > http://lkml.kernel.org/r/758b8fec-7573-47eb-b26a-7b2847ae7...@virtuozzo.com > Link: > http://lkml.kernel.org/r/153355467546.11522.4518015068123480218.stgit@localhost.localdomain > Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com> > Reviewed-by: Andrew Morton <a...@linux-foundation.org> > Cc: Vladimir Davydov <vdavydov....@gmail.com> > Cc: Michal Hocko <mho...@suse.com> > Cc: Andrey Ryabinin <aryabi...@virtuozzo.com> > Cc: "Huang, Ying" <ying.hu...@intel.com> > Cc: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > Cc: Matthew Wilcox <wi...@infradead.org> > Cc: Shakeel Butt <shake...@google.com> > Cc: Josef Bacik <jba...@fb.com> > Signed-off-by: Andrew Morton <a...@linux-foundation.org> > Signed-off-by: Linus Torvalds <torva...@linux-foundation.org> > (cherry picked from commit 7e010df53c80197b23119e7d7b95892aa13629df) > Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com> > --- > mm/vmscan.c | 43 +++++++++++++++++++++---------------------- > 1 file changed, 21 insertions(+), 22 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 18e4f74e1ab3..580c8c6843b9 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -183,6 +183,20 @@ static LIST_HEAD(shrinker_list); > static DECLARE_RWSEM(shrinker_rwsem); > > #ifdef CONFIG_MEMCG_KMEM > + > +/* > + * We allow subsystems to populate their shrinker-related > + * LRU lists before register_shrinker_prepared() is called > + * for the shrinker, since we don't want to impose > + * restrictions on their internal registration order. > + * In this case shrink_slab_memcg() may find corresponding > + * bit is set in the shrinkers map. > + * > + * This value is used by the function to detect registering > + * shrinkers and to skip do_shrink_slab() calls for them. > + */ > +#define SHRINKER_REGISTERING ((struct shrinker *)~0UL) > + > static DEFINE_IDR(shrinker_idr); > static int shrinker_nr_max; > > @@ -192,7 +206,7 @@ static int prealloc_memcg_shrinker(struct shrinker > *shrinker) > > down_write(&shrinker_rwsem); > /* This may call shrinker, so it must use down_read_trylock() */ > - id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL); > + id = idr_alloc(&shrinker_idr, SHRINKER_REGISTERING, 0, 0, GFP_KERNEL); > if (id < 0) > goto unlock; > > @@ -332,21 +346,6 @@ int prealloc_shrinker(struct shrinker *shrinker) > if (!shrinker->nr_deferred) > return -ENOMEM; > > - /* > - * There is a window between prealloc_shrinker() > - * and register_shrinker_prepared(). We don't want > - * to clear bit of a shrinker in such the state > - * in shrink_slab_memcg(), since this will impose > - * restrictions on a code registering a shrinker > - * (they would have to guarantee, their LRU lists > - * are empty till shrinker is completely registered). > - * So, we differ the situation, when 1)a shrinker > - * is semi-registered (id is assigned, but it has > - * not yet linked to shrinker_list) and 2)shrinker > - * is not registered (id is not assigned). > - */ > - INIT_LIST_HEAD(&shrinker->list); > - > if (shrinker->flags & SHRINKER_MEMCG_AWARE) { > if (prealloc_memcg_shrinker(shrinker)) > goto free_deferred; > @@ -376,6 +375,9 @@ void register_shrinker_prepared(struct shrinker *shrinker) > { > down_write(&shrinker_rwsem); > list_add_tail(&shrinker->list, &shrinker_list); > +#ifdef CONFIG_MEMCG_KMEM > + idr_replace(&shrinker_idr, shrinker, shrinker->id); > +#endif > up_write(&shrinker_rwsem); > } > > @@ -557,15 +559,12 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, > int nid, > struct shrinker *shrinker; > > shrinker = idr_find(&shrinker_idr, i); > - if (unlikely(!shrinker)) { > - clear_bit(i, map->map); > + if (unlikely(!shrinker || shrinker == SHRINKER_REGISTERING)) { > + if (!shrinker) > + clear_bit(i, map->map); > continue; > } > > - /* See comment in prealloc_shrinker() */ > - if (unlikely(list_empty(&shrinker->list))) > - continue; > - > ret = do_shrink_slab(&sc, shrinker, priority); > if (ret == SHRINK_EMPTY) { > clear_bit(i, map->map); > _______________________________________________ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel