Re: [Qemu-devel] [PATCH 5/6] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-20 Thread Rik van Riel

On 09/20/2012 10:04 AM, Mel Gorman wrote:

When compaction was implemented it was known that scanning could potentially
be excessive. The ideal was that a counter be maintained for each pageblock
but maintaining this information would incur a severe penalty due to a
shared writable cache line. It has reached the point where the scanning
costs are an serious problem, particularly on long-lived systems where a
large process starts and allocates a large number of THPs at the same time.

Instead of using a shared counter, this patch adds another bit to the
pageblock flags called PG_migrate_skip. If a pageblock is scanned by
either migrate or free scanner and 0 pages were isolated, the pageblock
is marked to be skipped in the future. When scanning, this bit is checked
before any scanning takes place and the block skipped if set.

The main difficulty with a patch like this is "when to ignore the cached
information?" If it's ignored too often, the scanning rates will still
be excessive. If the information is too stale then allocations will fail
that might have otherwise succeeded. In this patch


Big hammer, but I guess it is effective...

Acked-by: Rik van Riel 




[Qemu-devel] [PATCH 5/6] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-20 Thread Mel Gorman
When compaction was implemented it was known that scanning could potentially
be excessive. The ideal was that a counter be maintained for each pageblock
but maintaining this information would incur a severe penalty due to a
shared writable cache line. It has reached the point where the scanning
costs are an serious problem, particularly on long-lived systems where a
large process starts and allocates a large number of THPs at the same time.

Instead of using a shared counter, this patch adds another bit to the
pageblock flags called PG_migrate_skip. If a pageblock is scanned by
either migrate or free scanner and 0 pages were isolated, the pageblock
is marked to be skipped in the future. When scanning, this bit is checked
before any scanning takes place and the block skipped if set.

The main difficulty with a patch like this is "when to ignore the cached
information?" If it's ignored too often, the scanning rates will still
be excessive. If the information is too stale then allocations will fail
that might have otherwise succeeded. In this patch

o CMA always ignores the information
o If the migrate and free scanner meet then the cached information will
  be discarded if it's at least 5 seconds since the last time the cache
  was discarded
o If there are a large number of allocation failures, discard the cache.

The time-based heuristic is very clumsy but there are few choices for a
better event. Depending solely on multiple allocation failures still allows
excessive scanning when THP allocations are failing in quick succession
due to memory pressure. Waiting until memory pressure is relieved would
cause compaction to continually fail instead of using reclaim/compaction
to try allocate the page. The time-based mechanism is clumsy but a better
option is not obvious.

Signed-off-by: Mel Gorman 
---
 include/linux/mmzone.h  |3 ++
 include/linux/pageblock-flags.h |   19 +++-
 mm/compaction.c |   93 +--
 mm/internal.h   |1 +
 mm/page_alloc.c |1 +
 5 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 603d0b5..a456361 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -368,6 +368,9 @@ struct zone {
 */
spinlock_t  lock;
int all_unreclaimable; /* All pages pinned */
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+   unsigned long   compact_blockskip_expire;
+#endif
 #ifdef CONFIG_MEMORY_HOTPLUG
/* see spanned/present_pages for more description */
seqlock_t   span_seqlock;
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 19ef95d..eed27f4 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -30,6 +30,9 @@ enum pageblock_bits {
PB_migrate,
PB_migrate_end = PB_migrate + 3 - 1,
/* 3 bits required for migrate types */
+#ifdef CONFIG_COMPACTION
+   PB_migrate_skip,/* If set the block is skipped by compaction */
+#endif /* CONFIG_COMPACTION */
NR_PAGEBLOCK_BITS
 };
 
@@ -65,10 +68,22 @@ unsigned long get_pageblock_flags_group(struct page *page,
 void set_pageblock_flags_group(struct page *page, unsigned long flags,
int start_bitidx, int end_bitidx);
 
+#ifdef CONFIG_COMPACTION
+#define get_pageblock_skip(page) \
+   get_pageblock_flags_group(page, PB_migrate_skip, \
+   PB_migrate_skip + 1)
+#define clear_pageblock_skip(page) \
+   set_pageblock_flags_group(page, 0, PB_migrate_skip,  \
+   PB_migrate_skip + 1)
+#define set_pageblock_skip(page) \
+   set_pageblock_flags_group(page, 1, PB_migrate_skip,  \
+   PB_migrate_skip + 1)
+#endif /* CONFIG_COMPACTION */
+
 #define get_pageblock_flags(page) \
-   get_pageblock_flags_group(page, 0, NR_PAGEBLOCK_BITS-1)
+   get_pageblock_flags_group(page, 0, PB_migrate_end)
 #define set_pageblock_flags(page, flags) \
set_pageblock_flags_group(page, flags,  \
- 0, NR_PAGEBLOCK_BITS-1)
+ 0, PB_migrate_end)
 
 #endif /* PAGEBLOCK_FLAGS_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index 6058822..fae0011 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,64 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }
 
+/* Returns true if the pageblock should be scanned for pages to isolate. */
+static inline bool isolation_suitable(struct compact_control *cc,
+