On Wed, May 25, 2016 at 02:06:17PM +0800, Ye Xiaolong wrote:
>On Mon, May 23, 2016 at 04:46:05PM -0400, Johannes Weiner wrote:
>>Hi,
>>
>>thanks for your report.
>>
>>On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
>>> FYI, we noticed a vm-scalability.throughput -23.8% regression due to
>>> commit:
>>>
>>> commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset:
>>> per-cgroup cache thrash detection")
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: vm-scalability
>>> on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
>>> with the following conditions:
>>> cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice
>>
>>That test hammers the LRU activation path, to which this patch added
>>the cgroup lookup and pinning code. Does the following patch help?
>>
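Hi, Johannes

For context, my understanding of the hot path in question -- an abbreviated
sketch of the 4.6-era mm/swap.c code, not the exact source -- is below.
lru-file-readtwice drives every re-read file page through this function on
all 56 threads, so any per-call overhead in workingset_activation() gets
multiplied heavily:

	/* Abbreviated sketch of mm/swap.c::mark_page_accessed() */
	void mark_page_accessed(struct page *page)
	{
		if (!PageActive(page) && !PageUnevictable(page) &&
		    PageReferenced(page)) {
			/* Second access: promote the page to the active list. */
			if (PageLRU(page))
				activate_page(page);
			else
				__lru_cache_activate_page(page);
			ClearPageReferenced(page);
			if (page_is_file_cache(page))
				/* 23047a96d7cf added memcg lookup+pin in here */
				workingset_activation(page);
		} else if (!PageReferenced(page)) {
			/* First access: just mark the page referenced. */
			SetPageReferenced(page);
		}
	}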
FYI, I have done more tests with your fix patch.

1) Applied on top of the latest kernel (head commit: 478a1469 ("Merge tag
'dax-locking-for-4.7' of
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm")).

The following is the comparison among the first bad commit's parent, the
first bad commit, the head commit of Linus' master branch, and your fix
commit (a7abed95):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit:
  612e44939c3c77245ac80843c0c7876c8cf97282
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  478a1469a7d27fe6b2f85fc801ecdeb8afc836e6
  a7abed950afdc1186d4eaf442b7eb296ff04c947

612e44939c3c7724 23047a96d7cfcfca1a6d026eca 478a1469a7d27fe6b2f85fc801 a7abed950afdc1186d4eaf442b
---------------- -------------------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  28384711 ±  0%     -23.8%   21621405 ±  0%     -12.4%   24865101 ±  4%      -8.1%   26076417 ±  3%  vm-scalability.throughput
   1854112 ±  0%      -7.7%    1711141 ±  0%      +6.4%    1973257 ±  4%      +9.2%    2025214 ±  3%  vm-scalability.time.involuntary_context_switches
      5279 ±  0%      -0.7%       5243 ±  0%      -2.6%       5143 ±  0%      -2.4%       5153 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
     16267 ±  0%      -0.6%      16173 ±  0%      -2.0%      15934 ±  0%      -1.8%      15978 ±  0%  vm-scalability.time.system_time
    176.03 ±  0%     -22.2%     136.95 ±  1%     -10.4%     157.66 ±  1%     -11.2%     156.32 ±  0%  vm-scalability.time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%      +5.8%     320618 ± 47%     -36.0%     193991 ± 22%  vm-scalability.time.voluntary_context_switches
      0.92 ±  2%     +51.0%       1.38 ±  2%     +96.5%       1.80 ±  0%     +97.3%       1.81 ±  0%  perf-profile.cycles-pp.kswapd
      2585 ±  1%      -1.9%       2536 ±  0%      +9.6%       2834 ±  1%     +10.7%       2862 ±  1%  uptime.idle
    754212 ±  1%     -29.2%     533832 ±  2%     -34.8%     491397 ±  2%     -27.5%     546666 ±  8%  softirqs.RCU
    151918 ±  8%      +5.7%     160522 ±  5%     -17.4%     125419 ± 18%     -22.7%     117409 ±  7%  softirqs.SCHED
    176.03 ±  0%     -22.2%     136.95 ±  1%     -10.4%     157.66 ±  1%     -11.2%     156.32 ±  0%  time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%      +5.8%     320618 ± 47%     -36.0%     193991 ± 22%  time.voluntary_context_switches

2) Applied on top of v4.6 (head commit: 2dcd0af5 ("Linux 4.6")).

The following is the comparison among the first bad commit's parent, the
first bad commit, v4.6, and your fix commit (c05f8814):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit:
  612e44939c3c77245ac80843c0c7876c8cf97282
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  v4.6
  c05f8814641ceabbc628cd4edc7f64ff58498d5a

612e44939c3c7724 23047a96d7cfcfca1a6d026eca            v4.6            c05f8814641ceabbc628cd4edc
---------------- -------------------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  28384711 ±  0%     -23.8%   21621405 ±  0%     -18.9%   23013011 ±  0%     -19.2%   22937943 ±  0%  vm-scalability.throughput
   1854112 ±  0%      -7.7%    1711141 ±  0%      -5.2%    1757124 ±  0%      -4.9%    1762398 ±  0%  vm-scalability.time.involuntary_context_switches
     66021 ±  0%      -0.4%      65745 ±  0%      +1.8%      67231 ±  1%      +3.4%      68291 ±  0%  vm-scalability.time.minor_page_faults
      5279 ±  0%      -0.7%       5243 ±  0%      -1.6%       5197 ±  0%      -1.5%       5198 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
     16267 ±  0%      -0.6%      16173 ±  0%      -1.5%      16030 ±  0%      -1.4%      16032 ±  0%  vm-scalability.time.system_time
    176.03 ±  0%     -22.2%     136.95 ±  1%     -17.8%     144.66 ±  1%     -17.0%     146.15 ±  1%  vm-scalability.time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%     -19.4%     244167 ±  1%     -19.3%     244584 ±  1%  vm-scalability.time.voluntary_context_switches
     23999 ±  2%      -5.9%      22575 ±  2%      -7.1%      22291 ±  3%      -7.0%      22319 ±  1%  meminfo.Mapped
      0.92 ±  2%     +51.0%       1.38 ±  2%     +96.8%       1.81 ±  0%     +95.5%       1.79 ±  0%  perf-profile.cycles-pp.kswapd
    754212 ±  1%     -29.2%     533832 ±  2%     -41.1%     444019 ±  4%     -42.8%     431345 ±  0%  softirqs.RCU
     20518 ±  2%      -8.1%      18866 ±  2%      -2.6%      19980 ±  3%      -2.9%      19926 ±  7%  vmstat.system.cs
     10574 ± 19%     +29.9%      13737 ±  8%      +1.6%      10740 ± 33%     +16.9%      12359 ± 17%  numa-meminfo.node0.Mapped
     13490 ± 13%     -36.6%       8549 ± 17%     -13.3%      11689 ± 29%     -26.5%       9912 ± 22%  numa-meminfo.node1.Mapped
    176.03 ±  0%     -22.2%     136.95 ±  1%     -17.8%     144.66 ±  1%     -17.0%     146.15 ±  1%  time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%     -19.4%     244167 ±  1%     -19.3%     244584 ±  1%  time.voluntary_context_switches

Thanks,
Xiaolong

>
>Hi,
>
>Here is the comparison of the original first bad commit (23047a96d) and
>your new patch (063f6715e): vm-scalability.throughput improved by 11.3%.
>
>
>compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
>  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability
>
>commit:
>  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
>  063f6715e77a7be5770d6081fe6d7ca2437ac9f2
>
>23047a96d7cfcfca 063f6715e77a7be5770d6081fe
>---------------- --------------------------
>         %stddev     %change         %stddev
>             \          |                \
>  21621405 ±  0%     +11.3%   24069657 ±  2%  vm-scalability.throughput
>   1711141 ±  0%     +40.9%    2411083 ±  2%  vm-scalability.time.involuntary_context_switches
>      2747 ±  0%      +2.4%       2812 ±  0%  vm-scalability.time.maximum_resident_set_size
>      5243 ±  0%      -1.2%       5180 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
>    136.95 ±  1%     +13.6%     155.55 ±  0%  vm-scalability.time.user_time
>    208386 ±  0%     -71.5%      59394 ± 16%  vm-scalability.time.voluntary_context_switches
>      1.38 ±  2%     +21.7%       1.69 ±  2%  perf-profile.cycles-pp.kswapd
>    160522 ±  5%     -30.0%     112342 ±  2%  softirqs.SCHED
>      2536 ±  0%      +7.3%       2722 ±  2%  uptime.idle
>   1711141 ±  0%     +40.9%    2411083 ±  2%  time.involuntary_context_switches
>    136.95 ±  1%     +13.6%     155.55 ±  0%  time.user_time
>    208386 ±  0%     -71.5%      59394 ± 16%  time.voluntary_context_switches
>      1052 ± 13%   +1453.8%      16346 ± 39%  cpuidle.C1-HSW.usage
>      1045 ± 12%     -54.3%     477.50 ± 25%  cpuidle.C3-HSW.usage
> 5.719e+08 ±  1%     +17.9%  6.743e+08 ±  0%  cpuidle.C6-HSW.time
>  40424411 ±  2%     -97.3%    1076732 ± 99%  cpuidle.POLL.time
>      7179 ±  5%     -99.9%       6.50 ± 53%  cpuidle.POLL.usage
>      0.51 ±  8%     -40.6%       0.30 ± 13%  turbostat.CPU%c1
>      2.83 ±  2%     +30.5%       3.70 ±  0%  turbostat.CPU%c6
>      0.23 ± 79%    +493.4%       1.35 ±  2%  turbostat.Pkg%pc2
>    255.52 ±  0%      +3.3%     263.95 ±  0%  turbostat.PkgWatt
>     53.26 ±  0%     +14.9%      61.22 ±  0%  turbostat.RAMWatt
>   1836104 ±  0%     +13.3%    2079934 ±  4%  vmstat.memory.free
>      5.00 ±  0%     -70.0%       1.50 ± 33%  vmstat.procs.b
>    107.00 ±  0%      +8.4%     116.00 ±  2%  vmstat.procs.r
>     18866 ±  2%     +40.1%      26436 ± 13%  vmstat.system.cs
>     69056 ±  0%     +11.8%      77219 ±  1%  vmstat.system.in
>  31628132 ±  0%     +80.9%   57224963 ±  0%  meminfo.Active
>  31294504 ±  0%     +81.7%   56876042 ±  0%  meminfo.Active(file)
>    142271 ±  6%     +11.2%     158138 ±  5%  meminfo.DirectMap4k
>  30612825 ±  0%     -87.2%    3915695 ±  0%  meminfo.Inactive
>  30562772 ±  0%     -87.4%    3862631 ±  0%  meminfo.Inactive(file)
>     15635 ±  1%     +38.0%      21572 ±  8%  meminfo.KernelStack
>     22575 ±  2%      +7.7%      24316 ±  4%  meminfo.Mapped
>   1762372 ±  3%     +12.2%    1976873 ±  3%  meminfo.MemFree
>    847557 ±  0%    +105.5%    1741958 ±  8%  meminfo.SReclaimable
>    946378 ±  0%     +95.1%    1846370 ±  8%  meminfo.Slab
>
>Thanks,
>Xiaolong
>
>>From b535c630fd8954865b7536c915c3916beb3b4830 Mon Sep 17 00:00:00 2001
>>From: Johannes Weiner <han...@cmpxchg.org>
>>Date: Mon, 23 May 2016 16:14:24 -0400
>>Subject: [PATCH] mm: fix vm-scalability regression in workingset_activation()
>>
>>23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection")
>>added cgroup lookup and pinning overhead to the LRU activation path,
>>which the vm-scalability benchmark is particularly sensitive to.
>>
>>Inline the lookup functions to eliminate calls. Furthermore, since
>>activations are not moved when pages are moved between memcgs, we
>>don't need the full page->mem_cgroup locking; holding the RCU lock is
>>enough to prevent the memcg from being freed.
>>
>>Signed-off-by: Johannes Weiner <han...@cmpxchg.org>
>>---
>> include/linux/memcontrol.h | 43 ++++++++++++++++++++++++++++++++++++++++++-
>> include/linux/mm.h         |  8 ++++++++
>> mm/memcontrol.c            | 42 ------------------------------------------
>> mm/workingset.c            | 10 ++++++----
>> 4 files changed, 56 insertions(+), 47 deletions(-)
>>
>>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>index a805474df4ab..0bb36cf89bf6 100644
>>--- a/include/linux/memcontrol.h
>>+++ b/include/linux/memcontrol.h
>>@@ -306,7 +306,48 @@ void mem_cgroup_uncharge_list(struct list_head *page_list);
>> 
>> void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
>> 
>>-struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
>>+static inline struct mem_cgroup_per_zone *
>>+mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
>>+{
>>+	int nid = zone_to_nid(zone);
>>+	int zid = zone_idx(zone);
>>+
>>+	return &memcg->nodeinfo[nid]->zoneinfo[zid];
>>+}
>>+
>>+/**
>>+ * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
>>+ * @zone: zone of the wanted lruvec
>>+ * @memcg: memcg of the wanted lruvec
>>+ *
>>+ * Returns the lru list vector holding pages for the given @zone and
>>+ * @mem. This can be the global zone lruvec, if the memory controller
>>+ * is disabled.
>>+ */
>>+static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>>+						    struct mem_cgroup *memcg)
>>+{
>>+	struct mem_cgroup_per_zone *mz;
>>+	struct lruvec *lruvec;
>>+
>>+	if (mem_cgroup_disabled()) {
>>+		lruvec = &zone->lruvec;
>>+		goto out;
>>+	}
>>+
>>+	mz = mem_cgroup_zone_zoneinfo(memcg, zone);
>>+	lruvec = &mz->lruvec;
>>+out:
>>+	/*
>>+	 * Since a node can be onlined after the mem_cgroup was created,
>>+	 * we have to be prepared to initialize lruvec->zone here;
>>+	 * and if offlined then reonlined, we need to reinitialize it.
>>+	 */
>>+	if (unlikely(lruvec->zone != zone))
>>+		lruvec->zone = zone;
>>+	return lruvec;
>>+}
>>+
>> struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
>> 
>> bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
>>diff --git a/include/linux/mm.h b/include/linux/mm.h
>>index b530c99e8e81..a9dd54e196a7 100644
>>--- a/include/linux/mm.h
>>+++ b/include/linux/mm.h
>>@@ -943,11 +943,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
>> {
>> 	return page->mem_cgroup;
>> }
>>+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>>+{
>>+	return READ_ONCE(page->mem_cgroup);
>>+}
>> #else
>> static inline struct mem_cgroup *page_memcg(struct page *page)
>> {
>> 	return NULL;
>> }
>>+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>>+{
>>+	return NULL;
>>+}
>> #endif
>> 
>> /*
>>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>index b3f16ab4b431..f65e5e527864 100644
>>--- a/mm/memcontrol.c
>>+++ b/mm/memcontrol.c
>>@@ -323,15 +323,6 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key);
>> 
>> #endif /* !CONFIG_SLOB */
>> 
>>-static struct mem_cgroup_per_zone *
>>-mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
>>-{
>>-	int nid = zone_to_nid(zone);
>>-	int zid = zone_idx(zone);
>>-
>>-	return &memcg->nodeinfo[nid]->zoneinfo[zid];
>>-}
>>-
>> /**
>>  * mem_cgroup_css_from_page - css of the memcg associated with a page
>>  * @page: page of interest
>>@@ -944,39 +935,6 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
>> 	 iter = mem_cgroup_iter(NULL, iter, NULL))
>> 
>> /**
>>- * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
>>- * @zone: zone of the wanted lruvec
>>- * @memcg: memcg of the wanted lruvec
>>- *
>>- * Returns the lru list vector holding pages for the given @zone and
>>- * @mem. This can be the global zone lruvec, if the memory controller
>>- * is disabled.
>>- */
>>-struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>>-				      struct mem_cgroup *memcg)
>>-{
>>-	struct mem_cgroup_per_zone *mz;
>>-	struct lruvec *lruvec;
>>-
>>-	if (mem_cgroup_disabled()) {
>>-		lruvec = &zone->lruvec;
>>-		goto out;
>>-	}
>>-
>>-	mz = mem_cgroup_zone_zoneinfo(memcg, zone);
>>-	lruvec = &mz->lruvec;
>>-out:
>>-	/*
>>-	 * Since a node can be onlined after the mem_cgroup was created,
>>-	 * we have to be prepared to initialize lruvec->zone here;
>>-	 * and if offlined then reonlined, we need to reinitialize it.
>>-	 */
>>-	if (unlikely(lruvec->zone != zone))
>>-		lruvec->zone = zone;
>>-	return lruvec;
>>-}
>>-
>>-/**
>>  * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page
>>  * @page: the page
>>  * @zone: zone of the page
>>diff --git a/mm/workingset.c b/mm/workingset.c
>>index 8a75f8d2916a..8252de4566e9 100644
>>--- a/mm/workingset.c
>>+++ b/mm/workingset.c
>>@@ -305,9 +305,10 @@ bool workingset_refault(void *shadow)
>>  */
>> void workingset_activation(struct page *page)
>> {
>>+	struct mem_cgroup *memcg;
>> 	struct lruvec *lruvec;
>> 
>>-	lock_page_memcg(page);
>>+	rcu_read_lock();
>> 	/*
>> 	 * Filter non-memcg pages here, e.g. unmap can call
>> 	 * mark_page_accessed() on VDSO pages.
>>@@ -315,12 +316,13 @@ void workingset_activation(struct page *page)
>> 	 * XXX: See workingset_refault() - this should return
>> 	 * root_mem_cgroup even for !CONFIG_MEMCG.
>> 	 */
>>-	if (!mem_cgroup_disabled() && !page_memcg(page))
>>+	memcg = page_memcg_rcu(page);
>>+	if (!mem_cgroup_disabled() && !memcg)
>> 		goto out;
>>-	lruvec = mem_cgroup_zone_lruvec(page_zone(page), page_memcg(page));
>>+	lruvec = mem_cgroup_zone_lruvec(page_zone(page), memcg);
>> 	atomic_long_inc(&lruvec->inactive_age);
>> out:
>>-	unlock_page_memcg(page);
>>+	rcu_read_unlock();
>> }
>> 
>> /*
>>-- 
>>2.8.2
>>
>_______________________________________________
>LKP mailing list
>l...@lists.01.org
>https://lists.01.org/mailman/listinfo/lkp
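For readers following the locking change in the patch above: memcgs are only
freed after an RCU grace period, so a reader that merely needs the memcg to
stay allocated -- and that can tolerate the page concurrently moving to
another memcg -- can use a plain RCU read-side section instead of
lock_page_memcg(). A minimal illustrative sketch of that pattern (the helper
page_inactive_age() is hypothetical and not part of the patch):

	/* Hypothetical reader: sample the inactive_age of the page's
	 * lruvec.  rcu_read_lock() keeps the memcg from being freed;
	 * a racing charge move may hand us a stale memcg, which this
	 * kind of heuristic consumer tolerates by design. */
	static unsigned long page_inactive_age(struct page *page)
	{
		struct mem_cgroup *memcg;
		unsigned long age = 0;

		rcu_read_lock();
		memcg = page_memcg_rcu(page);	/* READ_ONCE(page->mem_cgroup) */
		if (memcg) {
			struct lruvec *lruvec;

			lruvec = mem_cgroup_zone_lruvec(page_zone(page), memcg);
			age = atomic_long_read(&lruvec->inactive_age);
		}
		rcu_read_unlock();
		return age;
	}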