RE: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression
Hi, Waiman. Did you post that patch? Let's see if it helps.

-----Original Message-----
From: LKP [mailto:lkp-boun...@lists.01.org] On Behalf Of Waiman Long
Sent: Tuesday, November 6, 2018 6:40 AM
To: Linus Torvalds; vba...@suse.cz; Davidlohr Bueso
Cc: yang@linux.alibaba.com; Linux Kernel Mailing List; Matthew Wilcox; mho...@kernel.org; Colin King; Andrew Morton; lduf...@linux.vnet.ibm.com; l...@01.org; kirill.shute...@linux.intel.com
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 11/05/2018 05:14 PM, Linus Torvalds wrote:
> On Mon, Nov 5, 2018 at 12:12 PM Vlastimil Babka wrote:
>> I didn't spot an obvious mistake in the patch itself, so it looks
>> like some bad interaction between the scheduler and the mmap downgrade?
>
> I'm thinking it's RWSEM_SPIN_ON_OWNER that ends up being confused by
> the downgrade.
>
> It looks like the benchmark used to be basically CPU-bound, at about
> 800% CPU, and now it's somewhere in the 200% CPU region:
>
> will-it-scale.time.percent_of_cpu_this_job_got
>
> [ASCII plot: roughly 800 for every sample before the commit, dropping to the 100-200 range for every sample after it]
>
> which sounds like the downgrade really messes with the "spin waiting
> for lock" logic.
>
> I'm thinking it's the "wake up waiter" logic that has some bad
> interaction with spinning, and breaks that whole optimization.
>
> Adding Waiman and Davidlohr to the participants, because they seem to
> be the obvious experts in this area.
>
> Linus

Optimistic spinning on an rwsem is done only by writers spinning on a writer-owned rwsem. If a write lock is downgraded to a read lock, all the spinning waiters quit. That may explain the drop in CPU utilization. I do have an old patch that enables a certain amount of reader spinning, which may help the situation.
I can rebase that and send it out for review if people are interested.

Cheers,
Longman
RE: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression
Hi, SeongJae. Any update, or any info you need from my side?

-----Original Message-----
From: SeongJae Park [mailto:sj38.p...@gmail.com]
Sent: Wednesday, July 11, 2018 12:53 AM
To: Wang, Kemi
Cc: Ye, Xiaolong; ax...@kernel.dk; ax...@fb.com; l...@01.org; linux-kernel@vger.kernel.org
Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Oops, I only found this mail now. I will look into it, though it will take some time because I will not be in the office this week.

Thanks,
SeongJae Park

On Tue, Jul 10, 2018 at 1:30 AM kemi wrote:
>
> Hi, SeongJae
> Do you have any input for this regression? Thanks.
>
> On 2018-06-04 13:52, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
> >
> > commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> > https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
> >
> > in testcase: aim7
> > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > with following parameters:
> >
> >   disk: 1BRD_48G
> >   fs: btrfs
> >   test: disk_rw
> >   load: 1500
> >   cpufreq_governor: performance
> >
> > test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of a multiuser system.
> > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> >
> > Details are as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> >   gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
> >
> > commit:
> >   522a777566 ("block: consolidate struct request timestamp fields")
> >   316ba5736c ("brd: Mark as non-rotational")
> >
> >   522a777566f56696  316ba5736c9caa5dbcd8408598
> >   ----------------  --------------------------
> >       %stddev      %change       %stddev
> >      28321             -11.2%     25147         aim7.jobs-per-min
> >        318.19          +12.6%       358.23      aim7.time.elapsed_time
> >        318.19          +12.6%       358.23      aim7.time.elapsed_time.max
> >    1437526 ± 2%        +14.6%   1646849 ± 2%    aim7.time.involuntary_context_switches
> >      11986             +14.2%     13691         aim7.time.system_time
> >         73.06 ± 2%      -3.6%        70.43      aim7.time.user_time
> >    2449470 ± 2%        -25.0%   1837521 ± 4%    aim7.time.voluntary_context_switches
> >         20.25 ± 58%  +1681.5%       360.75 ±109%  numa-meminfo.node1.Mlocked
> >     456062             -16.3%    381859         softirqs.SCHED
> >       9015 ± 7%        -21.3%      7098 ± 22%   meminfo.CmaFree
> >         47.50 ± 58%  +1355.8%       691.50 ± 92%  meminfo.Mlocked
> >          5.24 ± 3%      -1.2          3.99 ± 2% mpstat.cpu.idle%
> >          0.61 ± 2%      -0.1          0.52 ± 2% mpstat.cpu.usr%
> >      16627             +12.8%     18762 ± 4%    slabinfo.Acpi-State.active_objs
> >      16627             +12.9%     18775 ± 4%    slabinfo.Acpi-State.num_objs
> >         57.00 ± 2%     +17.5%        67.00      vmstat.procs.r
> >      20936             -24.8%     15752 ± 2%    vmstat.system.cs
> >      45474              -1.7%     44681         vmstat.system.in
> >          6.50 ± 59%  +1157.7%        81.75 ± 75%  numa-vmstat.node0.nr_mlock
> >     242870 ± 3%        +13.2%    274913 ± 7%    numa-vmstat.node0.nr_written
> >       2278 ± 7%        -22.6%      1763 ± 21%   numa-vmstat.node1.nr_free_cma
> >          4.75 ± 58%  +1789.5%        89.75 ±109%  numa-vmstat.node1.nr_mlock
> >   88018135 ± 3%        -48.9%  44980457 ± 7%    cpuidle.C1.time
> >    1398288 ± 3%        -51.1%    683493 ± 9%    cpuidle.C1.usage
> >    3499814 ± 2%        -38.5%   2153158 ± 5%    cpuidle.C1E.time
> >      52722 ± 4%        -45.6%     28692 ± 6%    cpuidle.C1E.usage
> >    9865857 ± 3%        -40.1%   5905155 ± 5%    cpuidle.C3.time
> >      69656 ± 2%        -42.6%     39990 ± 5%    cpuidle.C3.usage
> >     590856 ± 2%        -12.3%    517910         cpuidle.C6.usage
> >      46160 ± 7%        -53.7%     21372 ± 11%   cpuidle.POLL
RE: [PATCH v11 00/26] Speculative page faults
A full run would take one or two weeks depending on the resources available. Could you pick some up, e.g. those with a performance regression?

-----Original Message-----
From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of Laurent Dufour
Sent: Monday, May 28, 2018 4:55 PM
To: Song, HaiyanX
Cc: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi; linux-kernel@vger.kernel.org; linux...@kvack.org; ha...@linux.vnet.ibm.com; npig...@gmail.com; bsinghar...@gmail.com; paul...@linux.vnet.ibm.com; Tim Chen; linuxppc-...@lists.ozlabs.org; x...@kernel.org
Subject: Re: [PATCH v11 00/26] Speculative page faults

On 28/05/2018 10:22, Haiyan Song wrote:
> Hi Laurent,
>
> Yes, these tests are done on V9 patch.

Do you plan to give this v11 a run?

> Best regards,
> Haiyan Song
>
> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>
>>> Some regressions and improvements are found by LKP-tools (Linux kernel
>>> performance) on the V9 patch series tested on an Intel 4S Skylake platform.
>>
>> Hi,
>>
>> Thanks for reporting these benchmark results, but you mentioned the
>> "V9 patch series" while responding to the v11 header series...
>> Were these tests done on v9 or v11?
>>
>> Cheers,
>> Laurent.
>>
>>>
>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
>>> Commit id:
>>>   base commit: d55f34411b1b126429a823d06c3124c16283231f
>>>   head commit: 0355322b3577eeab7669066df42c550a56801110
>>> Benchmark suite: will-it-scale
>>> Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
>>> Metrics:
>>>   will-it-scale.per_process_ops = processes / nr_cpu
>>>   will-it-scale.per_thread_ops = threads / nr_cpu
>>> Test box: lkp-skl-4sp1 (nr_cpu=192, memory=768G)
>>> THP: enable / disable
>>> nr_task: 100%
>>>
>>> 1. Regressions:
>>> a) THP enabled:
>>> testcase                      base      change    head      metric
>>> page_fault3/ enable THP       10092     -17.5%    8323      will-it-scale.per_thread_ops
>>> page_fault2/ enable THP       8300      -17.2%    6869      will-it-scale.per_thread_ops
>>> brk1/ enable THP              957.67    -7.6%     885       will-it-scale.per_thread_ops
>>> page_fault3/ enable THP       172821    -5.3%     163692    will-it-scale.per_process_ops
>>> signal1/ enable THP           9125      -3.2%     8834      will-it-scale.per_process_ops
>>>
>>> b) THP disabled:
>>> testcase                      base      change    head      metric
>>> page_fault3/ disable THP      10107     -19.1%    8180      will-it-scale.per_thread_ops
>>> page_fault2/ disable THP      8432      -17.8%    6931      will-it-scale.per_thread_ops
>>> context_switch1/ disable THP  215389    -6.8%     200776    will-it-scale.per_thread_ops
>>> brk1/ disable THP             939.67    -6.6%     877.33    will-it-scale.per_thread_ops
>>> page_fault3/ disable THP      173145    -4.7%     165064    will-it-scale.per_process_ops
>>> signal1/ disable THP          9162      -3.9%     8802      will-it-scale.per_process_ops
>>>
>>> 2. Improvements:
>>> a) THP enabled:
>>> testcase                      base      change    head      metric
>>> malloc1/ enable THP           66.33     +469.8%   383.67    will-it-scale.per_thread_ops
>>> writeseek3/ enable THP        2531      +4.5%     2646      will-it-scale.per_thread_ops
>>> si
RE: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement
Of course, we should do that as far as possible. Thanks for your comments :)

-----Original Message-----
From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of Michal Hocko
Sent: Thursday, November 30, 2017 5:45 PM
To: Wang, Kemi <kemi.w...@intel.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>; Andrew Morton <a...@linux-foundation.org>; Vlastimil Babka <vba...@suse.cz>; Mel Gorman <mgor...@techsingularity.net>; Johannes Weiner <han...@cmpxchg.org>; Christopher Lameter <c...@linux.com>; YASUAKI ISHIMATSU <yasu.isim...@gmail.com>; Andrey Ryabinin <aryabi...@virtuozzo.com>; Nikolay Borisov <nbori...@suse.com>; Pavel Tatashin <pasha.tatas...@oracle.com>; David Rientjes <rient...@google.com>; Sebastian Andrzej Siewior <bige...@linutronix.de>; Dave <dave.han...@linux.intel.com>; Kleen, Andi <andi.kl...@intel.com>; Chen, Tim C <tim.c.c...@intel.com>; Jesper Dangaard Brouer <bro...@redhat.com>; Huang, Ying <ying.hu...@intel.com>; Lu, Aaron <aaron...@intel.com>; Li, Aubrey <aubrey...@intel.com>; Linux MM <linux...@kvack.org>; Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

On Thu 30-11-17 17:32:08, kemi wrote:
[...]
> Your patch saves more code than mine because the node stats framework
> is reused for numa stats. But it has a performance regression because
> of the limitation of threshold size (125 at most, see
> calculate_normal_threshold() in vmstat.c) in inc_node_state().

But this "regression" would be visible only on those workloads which really need to squeeze every single cycle out of the allocation hot path, and those are supposed to disable the accounting altogether. Or is this visible on a wider variety of workloads?

Do not get me wrong. If we want to make per-node stats more optimal, then by all means let's do that. But having 3 sets of counters is just way too much.
--
Michal Hocko
SUSE Labs
RE: [PATCH 1/3] mm, sysctl: make VM stats configurable
-----Original Message-----
From: Michal Hocko [mailto:mho...@kernel.org]
Sent: Friday, September 15, 2017 7:50 PM
To: Wang, Kemi <kemi.w...@intel.com>
Cc: Luis R. Rodriguez <mcg...@kernel.org>; Kees Cook <keesc...@chromium.org>; Andrew Morton <a...@linux-foundation.org>; Jonathan Corbet <cor...@lwn.net>; Mel Gorman <mgor...@techsingularity.net>; Johannes Weiner <han...@cmpxchg.org>; Christopher Lameter <c...@linux.com>; Sebastian Andrzej Siewior <bige...@linutronix.de>; Vlastimil Babka <vba...@suse.cz>; Hillf Danton <hillf...@alibaba-inc.com>; Dave <dave.han...@linux.intel.com>; Chen, Tim C <tim.c.c...@intel.com>; Kleen, Andi <andi.kl...@intel.com>; Jesper Dangaard Brouer <bro...@redhat.com>; Huang, Ying <ying.hu...@intel.com>; Lu, Aaron <aaron...@intel.com>; Proc sysctl <linux-fsde...@vger.kernel.org>; Linux MM <linux...@kvack.org>; Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] mm, sysctl: make VM stats configurable

On Fri 15-09-17 17:23:24, Kemi Wang wrote:
> This patch adds a tunable interface that makes VM stats configurable, as
> suggested by Dave Hansen and Ying Huang.
>
> When performance becomes a bottleneck and you can tolerate some possible
> tool breakage and some decreased counter precision (e.g. numa counters), you
> can do:
>     echo [C|c]oarse > /proc/sys/vm/vmstat_mode
>
> When performance is not a bottleneck and you want all tooling to work, you
> can do:
>     echo [S|s]trict > /proc/sys/vm/vmstat_mode
>
> We recommend automatic detection of virtual memory statistics by the system;
> this is also the system default configuration. You can do:
>     echo [A|a]uto > /proc/sys/vm/vmstat_mode
>
> The next patch handles NUMA statistics distinctively based on the VM
> stats mode.

I would just merge this with the second patch so that it is clear how those modes are implemented. I am also wondering why we cannot have a much simpler interface and implementation to enable/disable numa stats (btw. sysctl_vm_numa_stats would be more descriptive IMHO).

The motivation is that we propose a general tunable interface for VM stats. This would be more scalable, since we don't have to add an individual interface for each type of counter that can be made configurable. In the second patch, NUMA stats, as an example, benefit from that. If you still hold your view, I don't mind merging them together.
--
Michal Hocko
SUSE Labs