On 01/16/2015 06:47 AM, Andrew Morton wrote:
On Wed, 14 Jan 2015 17:06:59 +0530 Vinayak Menon <vinme...@codeaurora.org> wrote:

It is observed that multiple tasks sometimes get blocked for a long
time in the congestion_wait loop below, in shrink_inactive_list. This
happens because the per-CPU vm_stat_diff deltas have not yet been
synced into the zone's vm_stat values.

(__schedule) from [<c0a03328>]
(schedule_timeout) from [<c0a04940>]
(io_schedule_timeout) from [<c01d585c>]
(congestion_wait) from [<c01cc9d8>]
(shrink_inactive_list) from [<c01cd034>]
(shrink_zone) from [<c01cdd08>]
(try_to_free_pages) from [<c01c442c>]
(__alloc_pages_nodemask) from [<c01f1884>]
(new_slab) from [<c09fcf60>]
(__slab_alloc) from [<c01f1a6c>]

In one such instance, zone_page_state(zone, NR_ISOLATED_FILE)
returned 14, zone_page_state(zone, NR_INACTIVE_FILE) returned 92,
and GFP_IOFS was set, so too_many_isolated returned true. But one
CPU's pageset vm_stat_diff held "-14" for NR_ISOLATED_FILE, so the
actual isolated count was zero. As there were no further updates to
NR_ISOLATED_FILE and the deferred vmstat_update work had not been
scheduled yet, 7 tasks spun in the congestion_wait loop for around
4 seconds in the direct reclaim path.
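
The arithmetic behind that false positive can be checked with a small
userspace sketch. It assumes, from the mm/vmscan.c of that era and not
from the text above, that __too_many_isolated() shifts the inactive
count right by 3 when GFP_IOFS is set and then tests
"isolated > inactive". The helper name isolated_exceeds_limit is
hypothetical and exists only for this illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the final comparison in __too_many_isolated().
 * Assumption: with GFP_IOFS set the inactive count is shifted right by 3
 * before the comparison, as in the mm/vmscan.c of that period.
 */
static bool isolated_exceeds_limit(unsigned long inactive,
				   unsigned long isolated, bool gfp_iofs)
{
	if (gfp_iofs)
		inactive >>= 3;		/* 92 >> 3 == 11 */

	return isolated > inactive;
}
```

With the reported values, isolated_exceeds_limit(92, 14, true) is true
because 14 > 11, while the real isolated count of zero gives
isolated_exceeds_limit(92, 0, true) == false.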

This patch uses zone_page_state_snapshot instead, but restricts
its usage to avoid performance penalty.

Seems reasonable.


...

@@ -1516,15 +1531,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
        unsigned long nr_immediate = 0;
        isolate_mode_t isolate_mode = 0;
        int file = is_file_lru(lru);
+       int safe = 0;
        struct zone *zone = lruvec_zone(lruvec);
        struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;

-       while (unlikely(too_many_isolated(zone, file, sc))) {
+       while (unlikely(too_many_isolated(zone, file, sc, safe))) {
                congestion_wait(BLK_RW_ASYNC, HZ/10);

                /* We are about to die and free our memory. Return now. */
                if (fatal_signal_pending(current))
                        return SWAP_CLUSTER_MAX;
+
+               safe = 1;
        }

But here and under the circumstances you describe, we'll call
congestion_wait() a single time.  That shouldn't have occurred.

So how about we put the fallback logic into too_many_isolated() itself?



congestion_wait was allowed to run once as an optimization, on the
assumption that __too_many_isolated (unsafe but faster) is correct most
of the time when it returns true, so the safe version is avoided in most
cases. But I agree that we should not call congestion_wait unnecessarily
even in those rare cases, so this looks correct to me.
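
The trade-off between the fast and the safe read can be modelled in
userspace roughly as follows. This is only a sketch: struct zone_sim,
NR_CPUS_SIM and the *_sim/page_state_* helpers are hypothetical names,
the GFP_IOFS ">> 3" scaling is assumed from the mm/vmscan.c of that era,
and the real counters are per-item arrays rather than a single value.

```c
#include <stdbool.h>

#define NR_CPUS_SIM 4	/* hypothetical CPU count for this model */

struct zone_sim {
	long global;			/* models the zone's vm_stat[item] */
	signed char diff[NR_CPUS_SIM];	/* models each CPU's vm_stat_diff[item] */
};

/* Fast read: global counter only, like zone_page_state(). */
static unsigned long page_state_fast(const struct zone_sim *z)
{
	long x = z->global;

	return x < 0 ? 0 : (unsigned long)x;
}

/* Accurate read: also folds in the pending per-CPU deltas,
 * like zone_page_state_snapshot(). */
static unsigned long page_state_snapshot(const struct zone_sim *z)
{
	long x = z->global;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		x += z->diff[cpu];

	return x < 0 ? 0 : (unsigned long)x;
}

/* Two-stage check: pay for the snapshot only when the fast read says "true".
 * Assumption: the GFP_IOFS case, so inactive is scaled down by >> 3. */
static bool too_many_isolated_sim(const struct zone_sim *isolated,
				  const struct zone_sim *inactive)
{
	if (page_state_fast(isolated) <= (page_state_fast(inactive) >> 3))
		return false;

	return page_state_snapshot(isolated) >
	       (page_state_snapshot(inactive) >> 3);
}
```

With a global isolated count of 14, a pending delta of -14 on one CPU
and 92 inactive pages, the fast read alone would report too many
isolated pages, but the two-stage check returns false without any
caller having to wait first.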



From: Andrew Morton <a...@linux-foundation.org>
Subject: mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix

Move the zone_page_state_snapshot() fallback logic into
too_many_isolated(), so shrink_inactive_list() doesn't incorrectly call
congestion_wait().

Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Mel Gorman <mgor...@suse.de>
Cc: Michal Hocko <mho...@suse.cz>
Cc: Minchan Kim <minc...@kernel.org>
Cc: Vinayak Menon <vinme...@codeaurora.org>
Cc: Vladimir Davydov <vdavy...@parallels.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

  mm/vmscan.c |   23 +++++++++++------------
  1 file changed, 11 insertions(+), 12 deletions(-)

diff -puN mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated-fix
+++ a/mm/vmscan.c
@@ -1402,7 +1402,7 @@ int isolate_lru_page(struct page *page)
  }

  static int __too_many_isolated(struct zone *zone, int file,
-       struct scan_control *sc, int safe)
+                              struct scan_control *sc, int safe)
  {
        unsigned long inactive, isolated;

@@ -1435,7 +1435,7 @@ static int __too_many_isolated(struct zo
   * unnecessary swapping, thrashing and OOM.
   */
  static int too_many_isolated(struct zone *zone, int file,
-               struct scan_control *sc, int safe)
+                            struct scan_control *sc)
  {
        if (current_is_kswapd())
                return 0;
@@ -1443,12 +1443,14 @@ static int too_many_isolated(struct zone
        if (!global_reclaim(sc))
                return 0;

-       if (unlikely(__too_many_isolated(zone, file, sc, 0))) {
-               if (safe)
-                       return __too_many_isolated(zone, file, sc, safe);
-               else
-                       return 1;
-       }
+       /*
+        * __too_many_isolated(safe=0) is fast but inaccurate, because it
+        * doesn't account for the vm_stat_diff[] counters.  So if it looks
+        * like too_many_isolated() is about to return true, fall back to the
+        * slower, more accurate zone_page_state_snapshot().
+        */
+       if (unlikely(__too_many_isolated(zone, file, sc, 0)))
+               return __too_many_isolated(zone, file, sc, 1);

        return 0;
  }
@@ -1540,18 +1542,15 @@ shrink_inactive_list(unsigned long nr_to
        unsigned long nr_immediate = 0;
        isolate_mode_t isolate_mode = 0;
        int file = is_file_lru(lru);
-       int safe = 0;
        struct zone *zone = lruvec_zone(lruvec);
        struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;

-       while (unlikely(too_many_isolated(zone, file, sc, safe))) {
+       while (unlikely(too_many_isolated(zone, file, sc))) {
                congestion_wait(BLK_RW_ASYNC, HZ/10);

                /* We are about to die and free our memory. Return now. */
                if (fatal_signal_pending(current))
                        return SWAP_CLUSTER_MAX;
-
-               safe = 1;
        }

        lru_add_drain();
_



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation