On Fri, Feb 20, 2015 at 07:51:16PM -0800, Hugh Dickins wrote:
> Though debug kernels have a VM_BUG_ON to help protect from misaccounting
> lru_size, non-debug kernels are liable to wrap it around: and then the
> vast unsigned long size draws page reclaim into a loop of repeatedly
> doing nothing on an empty list, without even a cond_resched().
> 
> That soft lockup looks confusingly like an over-busy reclaim scenario,
> with lots of contention on the lruvec lock in shrink_inactive_list():
> yet has a totally different origin.
> 
> Help differentiate with a custom warning in mem_cgroup_update_lru_size(),
> even in non-debug kernels; and reset the size to avoid the lockup.  But
> the particular bug which suggested this change was mine alone, and since
> fixed.
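
For anyone who hasn't hit this: what the changelog describes is plain
unsigned wraparound, so reclaim keeps being told there is plenty to scan
on a list that is in fact empty.  A userspace toy (not kernel code, just
illustrating the wrap):

	#include <stdio.h>

	int main(void)
	{
		unsigned long lru_size = 0;

		/* misaccounted removal from an already-empty list */
		lru_size -= 1;

		/* prints ULONG_MAX: the "vast" size the changelog mentions */
		printf("lru_size = %lu\n", lru_size);
		return 0;
	}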

Do we need this kind of check for !MEMCG kernels?
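
For !MEMCG builds the hook is, if I remember the header right, just an
empty inline in include/linux/memcontrol.h, roughly:

	/* !CONFIG_MEMCG stub, quoted from memory -- check the tree */
	static inline void mem_cgroup_update_lru_size(struct lruvec *lruvec,
						      enum lru_list lru,
						      int increment)
	{
	}

so an equivalent check for !MEMCG kernels would have to live in the
generic lru helpers rather than in memcontrol.c.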

> Signed-off-by: Hugh Dickins <hu...@google.com>
> ---
>  include/linux/mm_inline.h |    2 +-
>  mm/memcontrol.c           |   24 ++++++++++++++++++++----
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> --- thpfs.orig/include/linux/mm_inline.h      2013-11-03 15:41:51.000000000 -0800
> +++ thpfs/include/linux/mm_inline.h   2015-02-20 19:33:25.928096883 -0800
> @@ -35,8 +35,8 @@ static __always_inline void del_page_fro
>                               struct lruvec *lruvec, enum lru_list lru)
>  {
>       int nr_pages = hpage_nr_pages(page);
> -     mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>       list_del(&page->lru);
> +     mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>       __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages);
>  }
>  
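
The reorder in del_page_from_lru_list() makes sense to me: the add side
already does the update on the other side of the list op, so lru_size 0
matches list_empty() at the moment the hook runs.  From memory (so worth
double-checking against your tree), the counterpart is roughly:

	static __always_inline void add_page_to_lru_list(struct page *page,
					struct lruvec *lruvec, enum lru_list lru)
	{
		int nr_pages = hpage_nr_pages(page);

		/* size bumped before the page is on the list, so lru_size == 0
		 * still implies list_empty() when the hook runs */
		mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
		list_add(&page->lru, &lruvec->lists[lru]);
		__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru,
				      nr_pages);
	}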
> --- thpfs.orig/mm/memcontrol.c        2015-02-08 18:54:22.000000000 -0800
> +++ thpfs/mm/memcontrol.c     2015-02-20 19:33:25.928096883 -0800
> @@ -1296,22 +1296,38 @@ out:
>   * @lru: index of lru list the page is sitting on
>   * @nr_pages: positive when adding or negative when removing
>   *
> - * This function must be called when a page is added to or removed from an
> - * lru list.
> + * This function must be called under lruvec lock, just before a page is added
> + * to or just after a page is removed from an lru list (that ordering being so
> + * as to allow it to check that lru_size 0 is consistent with list_empty).
>   */
>  void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>                               int nr_pages)
>  {
>       struct mem_cgroup_per_zone *mz;
>       unsigned long *lru_size;
> +     long size;
> +     bool empty;
>  
>       if (mem_cgroup_disabled())
>               return;
>  
>       mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
>       lru_size = mz->lru_size + lru;
> -     *lru_size += nr_pages;
> -     VM_BUG_ON((long)(*lru_size) < 0);
> +     empty = list_empty(lruvec->lists + lru);
> +
> +     if (nr_pages < 0)
> +             *lru_size += nr_pages;
> +
> +     size = *lru_size;
> +     if (WARN(size < 0 || empty != !size,
> +     "mem_cgroup_update_lru_size(%p, %d, %d): lru_size %ld but %sempty\n",
> +                     lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

Formatting can be unscrewed this way:

        if (WARN(size < 0 || empty != !size,
                "%s(%p, %d, %d): lru_size %ld but %sempty\n",
                __func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

> +             VM_BUG_ON(1);
> +             *lru_size = 0;
> +     }
> +
> +     if (nr_pages > 0)
> +             *lru_size += nr_pages;
>  }
>  
>  bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, struct mem_cgroup *root)
> --
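
FWIW, with that formatting fixed up, the whole helper would read roughly
as below -- just the hunks above reassembled, untested:

	void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
					int nr_pages)
	{
		struct mem_cgroup_per_zone *mz;
		unsigned long *lru_size;
		long size;
		bool empty;

		if (mem_cgroup_disabled())
			return;

		mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
		lru_size = mz->lru_size + lru;
		empty = list_empty(lruvec->lists + lru);

		/* apply a removal before the check, an addition after it, so
		 * that lru_size and list_empty() are compared consistently */
		if (nr_pages < 0)
			*lru_size += nr_pages;

		size = *lru_size;
		if (WARN(size < 0 || empty != !size,
			"%s(%p, %d, %d): lru_size %ld but %sempty\n",
			__func__, lruvec, lru, nr_pages, size,
			empty ? "" : "not ")) {
			VM_BUG_ON(1);
			*lru_size = 0;	/* reset so reclaim cannot loop on garbage */
		}

		if (nr_pages > 0)
			*lru_size += nr_pages;
	}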

-- 
 Kirill A. Shutemov