Also, this one is needed: 8df4a44cc46b "mm: check shrinker is memcg-aware in register_shrinker_prepared()"

On 27.03.2020 19:45, Andrey Ryabinin wrote:
> From: Kirill Tkhai <ktk...@virtuozzo.com>
> 
> The patch introduces a special value SHRINKER_REGISTERING to use instead
> of list_empty() to distinguish a registering shrinker from an
> unregistered shrinker.  Why do we need that at all?
> 
> Shrinker registration is split into two parts.  The first one is
> prealloc_shrinker(), which allocates shrinker memory and reserves ID in
> shrinker_idr.  This function can fail.  The second is
> register_shrinker_prepared(), and it finalizes the registration.  This
> function actually makes shrinker available to be used from
> shrink_slab(), and it can't fail.
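To make the two-phase contract concrete, here is a minimal userspace sketch of the same pattern. This is not the kernel code: the `slots` array, sizes, and error handling are illustrative stand-ins (upstream uses shrinker_idr, and the patch below publishes a sentinel rather than the real pointer in phase one):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the two-phase registration: the first phase
 * allocates and reserves an ID (and may fail); the second phase
 * only publishes state and cannot fail. */
struct shrinker {
	long *nr_deferred;	/* allocated in phase 1 */
	int id;			/* slot reserved in phase 1 */
	int registered;		/* set in phase 2 */
};

#define MAX_SHRINKERS 8
static struct shrinker *slots[MAX_SHRINKERS];	/* stand-in for shrinker_idr */

/* Phase 1: allocate memory and reserve an ID; this can fail. */
static int prealloc_shrinker(struct shrinker *s)
{
	s->nr_deferred = calloc(4, sizeof(long));
	if (!s->nr_deferred)
		return -1;
	for (int i = 0; i < MAX_SHRINKERS; i++) {
		if (!slots[i]) {
			s->id = i;
			slots[i] = s;
			return 0;
		}
	}
	free(s->nr_deferred);
	return -1;
}

/* Phase 2: make the shrinker usable; nothing here can fail. */
static void register_shrinker_prepared(struct shrinker *s)
{
	s->registered = 1;
}
```

The point of the split is that callers can bail out cleanly if phase one fails, while phase two can be deferred arbitrarily and never needs an error path.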
> 
> One shrinker may be based on more than one LRU list.  So, we never
> clear the bit in the memcg shrinker maps when (one of) the corresponding
> LRU lists becomes empty, since other LRU lists may not be empty.  See
> the superblock shrinker for example: it is based on two LRU lists,
> s_inode_lru and s_dentry_lru.  We do not want to clear the shrinker bit
> when there are no inodes in s_inode_lru, as s_dentry_lru may contain
> dentries.
> 
> Instead of that, we use a special algorithm to detect shrinkers that
> have no elements on any of their LRU lists, and this is done in
> shrink_slab_memcg().  See the comment in this function for the details.
> 
> Also, in shrink_slab_memcg() we clear the shrinker bit in the map when
> we encounter an unregistered shrinker (the bit is set, but there is no
> shrinker in the IDR).  Otherwise, we would have had to do that at the
> moment of shrinker unregistration for all memcgs (and this looks worse,
> since iterating over all memcgs may take a long time).  This would also
> have imposed restrictions on shrinker unregistration order for its
> users: they would have had to guarantee that there are no new elements
> after unregister_shrinker() (otherwise, a newly added element would
> have set the bit).
> 
> So, if we meet a set bit in the map and no shrinker in the IDR while
> iterating over the map in shrink_slab_memcg(), this means the
> corresponding shrinker is unregistered, and we must clear the bit.
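The per-bit decision described above can be sketched as a small pure function. This is hedged userspace C: `may_shrink()` and its out-parameter are illustrative names, not kernel API; upstream inlines this logic directly in shrink_slab_memcg():

```c
#include <assert.h>
#include <stddef.h>

struct shrinker { int dummy; };

/* The sentinel published while a shrinker is being registered. */
#define SHRINKER_REGISTERING ((struct shrinker *)~0UL)

/* Decide what to do for a set bit whose IDR slot holds `entry`.
 * Returns 1 when do_shrink_slab() may be called, 0 when the entry
 * must be skipped; *clear_bit_out reports whether the stale bit
 * should be cleared. */
static int may_shrink(struct shrinker *entry, int *clear_bit_out)
{
	if (entry == NULL) {			/* unregistered: bit is stale */
		*clear_bit_out = 1;
		return 0;
	}
	if (entry == SHRINKER_REGISTERING) {	/* mid-registration: skip, keep bit */
		*clear_bit_out = 0;
		return 0;
	}
	*clear_bit_out = 0;			/* fully registered: safe to call */
	return 1;
}
```

Note the three distinct states: NULL (clear the bit), sentinel (keep the bit, skip), real pointer (shrink), which is exactly why NULL alone could not encode both "unregistered" and "registering".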
> 
> Another case is shrinker registration.  We want two things there:
> 
> 1) do_shrink_slab() can be called only for completely registered
>    shrinkers;
> 
> 2) shrinker internal lists may be populated in any order relative to
>    register_shrinker_prepared() (let's use the superblock example).  Both
>    of:
> 
>   a)list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu0]
>     memcg_set_shrinker_bit();                               [cpu0]
>     ...
>     register_shrinker_prepared();                           [cpu1]
> 
>   and
> 
>   b)register_shrinker_prepared();                           [cpu0]
>     ...
>     list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru); [cpu1]
>     memcg_set_shrinker_bit();                               [cpu1]
> 
>    are legitimate.  We don't want to impose a restriction here and force
>    people to use only variant (b).  We don't want to force people to
>    ensure there are no elements in the LRU lists before the shrinker is
>    completely registered.  Internal users of LRU lists and the shrinker
>    code are two different subsystems, and each should be self-contained.
> 
> In case (a) we have the bit set before the shrinker is completely
> registered.  We don't want do_shrink_slab() to be called at this
> moment, so we have to detect such registering shrinkers.
> 
> Before this patch, a list_empty() check (shrinker not linked to the
> list) was used for that.  So, in (a) the bit could be set, but we
> didn't call do_shrink_slab() unless the shrinker was linked to the
> list.  Linking to the list was simply overloaded as an indicator.
> 
> This was not the best solution, since it's better not to touch the
> shrinker memory from shrink_slab_memcg() before it's completely
> registered (this will also be useful in the future to make
> shrink_slab() completely lockless).
> 
> So, this patch introduces a better way to detect a registering
> shrinker, one that avoids dereferencing shrinker memory.  It's just a
> ~0UL value, which we insert into the IDR during ID allocation.  After
> the shrinker is ready to be used, we insert the actual shrinker pointer
> into the IDR, and it becomes available to shrink_slab_memcg().
> 
> We can't use NULL instead of this new value for this purpose:
> shrink_slab_memcg() already uses NULL to detect unregistered shrinkers,
> and we don't want the function to see NULL and clear the bit, otherwise
> (a) won't work.
> 
> This is the only thing the patch does: it provides a better way to
> detect a registering shrinker.  Nothing else.
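The whole lifecycle of an IDR slot under this scheme — sentinel on allocation, real pointer on finalization, NULL after unregistration — can be modeled in a few lines of userspace C. The array and the `slot_*` helpers are hypothetical stand-ins for shrinker_idr and idr_alloc()/idr_replace()/idr_remove():

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4
#define REGISTERING ((void *)~0UL)	/* stand-in for SHRINKER_REGISTERING */

static void *idr[NSLOTS];		/* stand-in for shrinker_idr */

/* Reserve a slot, publishing the sentinel instead of the object,
 * so readers never see a half-initialized pointer. */
static int slot_alloc(void)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!idr[i]) {
			idr[i] = REGISTERING;
			return i;
		}
	}
	return -1;
}

/* Finalize: swap the sentinel for the real pointer (cf. idr_replace()). */
static void slot_publish(int id, void *obj)
{
	idr[id] = obj;
}

/* Unregister: back to NULL, the "no shrinker" state readers clear bits for. */
static void slot_remove(int id)
{
	idr[id] = NULL;
}
```

A reader iterating the slots only ever observes one of the three states, and never needs to dereference an object that is still being set up.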
> 
> This also yields better assembly, though that is a minor side of the
> patch:
> 
> Before:
>   callq  <idr_find>
>   mov    %rax,%r15
>   test   %rax,%rax
>   je     <shrink_slab_memcg+0x1d5>
>   mov    0x20(%rax),%rax
>   lea    0x20(%r15),%rdx
>   cmp    %rax,%rdx
>   je     <shrink_slab_memcg+0xbd>
>   mov    0x8(%rsp),%edx
>   mov    %r15,%rsi
>   lea    0x10(%rsp),%rdi
>   callq  <do_shrink_slab>
> 
> After:
>   callq  <idr_find>
>   mov    %rax,%r15
>   lea    -0x1(%rax),%rax
>   cmp    $0xfffffffffffffffd,%rax
>   ja     <shrink_slab_memcg+0x1cd>
>   mov    0x8(%rsp),%edx
>   mov    %r15,%rsi
>   lea    0x10(%rsp),%rdi
>   callq  ffffffff810cefd0 <do_shrink_slab>
> 
> [ktk...@virtuozzo.com: add #ifdef CONFIG_MEMCG_KMEM around idr_replace()]
> Link: http://lkml.kernel.org/r/758b8fec-7573-47eb-b26a-7b2847ae7...@virtuozzo.com
> Link: http://lkml.kernel.org/r/153355467546.11522.4518015068123480218.stgit@localhost.localdomain
> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> Reviewed-by: Andrew Morton <a...@linux-foundation.org>
> Cc: Vladimir Davydov <vdavydov....@gmail.com>
> Cc: Michal Hocko <mho...@suse.com>
> Cc: Andrey Ryabinin <aryabi...@virtuozzo.com>
> Cc: "Huang, Ying" <ying.hu...@intel.com>
> Cc: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp>
> Cc: Matthew Wilcox <wi...@infradead.org>
> Cc: Shakeel Butt <shake...@google.com>
> Cc: Josef Bacik <jba...@fb.com>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
> (cherry picked from commit 7e010df53c80197b23119e7d7b95892aa13629df)
> Signed-off-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
> ---
>  mm/vmscan.c | 43 +++++++++++++++++++++----------------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 18e4f74e1ab3..580c8c6843b9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -183,6 +183,20 @@ static LIST_HEAD(shrinker_list);
>  static DECLARE_RWSEM(shrinker_rwsem);
>  
>  #ifdef CONFIG_MEMCG_KMEM
> +
> +/*
> + * We allow subsystems to populate their shrinker-related
> + * LRU lists before register_shrinker_prepared() is called
> + * for the shrinker, since we don't want to impose
> + * restrictions on their internal registration order.
> + * In this case shrink_slab_memcg() may find corresponding
> + * bit is set in the shrinkers map.
> + *
> + * This value is used by the function to detect registering
> + * shrinkers and to skip do_shrink_slab() calls for them.
> + */
> +#define SHRINKER_REGISTERING ((struct shrinker *)~0UL)
> +
>  static DEFINE_IDR(shrinker_idr);
>  static int shrinker_nr_max;
>  
> @@ -192,7 +206,7 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
>  
>       down_write(&shrinker_rwsem);
>       /* This may call shrinker, so it must use down_read_trylock() */
> -     id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
> +     id = idr_alloc(&shrinker_idr, SHRINKER_REGISTERING, 0, 0, GFP_KERNEL);
>       if (id < 0)
>               goto unlock;
>  
> @@ -332,21 +346,6 @@ int prealloc_shrinker(struct shrinker *shrinker)
>       if (!shrinker->nr_deferred)
>               return -ENOMEM;
>  
> -     /*
> -      * There is a window between prealloc_shrinker()
> -      * and register_shrinker_prepared(). We don't want
> -      * to clear bit of a shrinker in such the state
> -      * in shrink_slab_memcg(), since this will impose
> -      * restrictions on a code registering a shrinker
> -      * (they would have to guarantee, their LRU lists
> -      * are empty till shrinker is completely registered).
> -      * So, we differ the situation, when 1)a shrinker
> -      * is semi-registered (id is assigned, but it has
> -      * not yet linked to shrinker_list) and 2)shrinker
> -      * is not registered (id is not assigned).
> -      */
> -     INIT_LIST_HEAD(&shrinker->list);
> -
>       if (shrinker->flags & SHRINKER_MEMCG_AWARE) {
>               if (prealloc_memcg_shrinker(shrinker))
>                       goto free_deferred;
> @@ -376,6 +375,9 @@ void register_shrinker_prepared(struct shrinker *shrinker)
>  {
>       down_write(&shrinker_rwsem);
>       list_add_tail(&shrinker->list, &shrinker_list);
> +#ifdef CONFIG_MEMCG_KMEM
> +     idr_replace(&shrinker_idr, shrinker, shrinker->id);
> +#endif
>       up_write(&shrinker_rwsem);
>  }
>  
> @@ -557,15 +559,12 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>               struct shrinker *shrinker;
>  
>               shrinker = idr_find(&shrinker_idr, i);
> -             if (unlikely(!shrinker)) {
> -                     clear_bit(i, map->map);
> +             if (unlikely(!shrinker || shrinker == SHRINKER_REGISTERING)) {
> +                     if (!shrinker)
> +                             clear_bit(i, map->map);
>                       continue;
>               }
>  
> -             /* See comment in prealloc_shrinker() */
> -             if (unlikely(list_empty(&shrinker->list)))
> -                     continue;
> -
>               ret = do_shrink_slab(&sc, shrinker, priority);
>               if (ret == SHRINK_EMPTY) {
>                       clear_bit(i, map->map);
> 

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
