Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/29/2021 2:08 AM, Paul E. McKenney wrote: On Thu, Jan 28, 2021 at 05:09:05PM +0800, Hillf Danton wrote: On Thu, 28 Jan 2021 15:52:40 +0800 Xing Zhengjun wrote: [ . . . ]

I tested the patch 4 times; no warning appears in the kernel log.

Thank you so much Zhengjun! And the overall brain dump so far is

1/ before and after d5bff968ea, changing the allowed ptr at online time is the key to quiescing the warning in process_one_work().
2/ marking pcpu before changing the allowed ptr in rebind_workers() is mandatory with regard to cutting the risk of triggering such a warning.
3/ we cannot maintain such an order without quiescing the 508 warning for kworkers. And we have a couple of excuses to do so: a) the number of allowed CPUs is no longer checked in is_per_cpu_kthread(); PF_NO_SETAFFINITY is checked instead, b) there is always a followup act that changes the allowed ptr in order to fix the number of allowed CPUs.
4/ the same order is also maintained at rescue time.

Just out of curiosity, does this test still fail on current mainline? Thanx, Paul

I tested mainline v5.11-rc5; it has no issue. The issue only occurs with d5bff968ea, which is in https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.01.11b. -- Zhengjun Xing
Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/27/2021 5:21 PM, Hillf Danton wrote: On Wed, 27 Jan 2021 16:04:25 +0800 Xing Zhengjun wrote: On 1/26/2021 3:39 PM, Hillf Danton wrote: On 26 Jan 2021 10:45:21 +0800 Xing Zhengjun wrote: On 1/25/2021 5:29 PM, Hillf Danton wrote: On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote: On 1/22/2021 3:59 PM, Hillf Danton wrote: On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote: On 1/21/2021 12:00 PM, Hillf Danton wrote: On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote: On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote: Thu, 14 Jan 2021 15:45:11 +0800

FYI, we noticed the following commit (built with gcc-9):

commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity initiatively")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.01.11b

[...]

[ 73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 process_one_work

Thanks for your report. We can also break CPU affinity by checking POOL_DISASSOCIATED at attach time without extra cost paid; that way we have the same behavior as at unbind time. What is more, the change that makes kworkers pcpu is cut, because they are not going to help either hotplug or the stop machine mechanism.

Hi, after applying the below patch, the issue still happened.

Thanks for your report.

[ 4.574467] pci :00:00.0: Limiting direct PCI/PCI transfers
[ 4.575651] pci :00:01.0: Activating ISA DMA hang workarounds
[ 4.576900] pci :00:02.0: Video device with shadowed ROM at [mem 0x000c-0x000d]
[ 4.578648] PCI: CLS 0 bytes, default 64
[ 4.579685] Unpacking initramfs...
[ 8.878031] ---[ cut here ]---
[ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 process_one_work+0x92/0x9e0
[ 8.880688] Modules linked in:
[ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 5.11.0-rc3-gc213503139bb #2

The kworker bound to CPU1 runs on CPU0 and triggers the warning, which shows that the scheduler breaks CPU affinity after 06249738a41a ("workqueue: Manually break affinity on hotplug"), though quite likely by kworker/1:0 for the initial workers.

[ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 8.887539] Workqueue: 0x0 (events)
[ 8.887838] EIP: process_one_work+0x92/0x9e0
[ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[ 8.887838] EAX: 42f51d08 EBX: ECX: EDX: 0001
[ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
[ 8.887838] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00010002
[ 8.887838] CR0: 80050033 CR2: CR3: 034e3000 CR4: 000406d0
[ 8.887838] Call Trace:
[ 8.887838] ? worker_thread+0x98/0x6a0
[ 8.887838] ? worker_thread+0x2dd/0x6a0
[ 8.887838] ? kthread+0x1ba/0x1e0
[ 8.887838] ? create_worker+0x1e0/0x1e0
[ 8.887838] ? kzalloc+0x20/0x20
[ 8.887838] ? ret_from_fork+0x1c/0x28
[ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
[ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with crng_init=0
[ 8.887838] --[ end trace ac461b4d54c37cfa ]--

Instead of creating the initial workers only on the active CPUs, rebind them (labeled pcpu) and jump to the right CPU at bootup time.

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2385,6 +2385,16 @@ woke_up:
 		return 0;
 	}
 
+	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() != pool->cpu) {
+		/* scheduler breaks CPU affinity for us, rebind it */
+		raw_spin_unlock_irq(&pool->lock);
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		/* and jump to the right seat */
+		schedule_timeout_interruptible(1);
+		goto woke_up;
+	}
+
 	worker_leave_idle(worker);
 recheck:
 	/* no more worker necessary? */
--

I tested the patch; the warning still appears in the kernel log.

Thanks for your report.

[ 230.356503] smpboot: CPU 1 is now offline
[ 230.544652] x86: Booting SMP configuration:
[ 230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[ 230.545675] masked ExtINT on CPU#1
[ 230.593829] [ cut here ]
[ 230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0
[ 230.594990] Modules linked in: rcutorture torture mousedev input_leds led_class pcspkr psmouse evbug tiny_power_button button
[ 230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 5.11.0-rc3-gdcba55d9080f #2

Like what was reported, the kworker bound to CPU1 runs on CPU0 and triggers the warning, due to the scheduler breaking CPU affinity for us. What is new, the affinity was broken at offline time instead of at bootup.
Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/26/2021 3:39 PM, Hillf Danton wrote: On 26 Jan 2021 10:45:21 +0800 Xing Zhengjun wrote: [...]

Like what was reported, kworker bond to CPU1 runs on CPU0 and triggers warning, due to scheduler breaking CPU affinity for us. What is new, the affinity was broken at offline time instead of bootup.

[ 230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Re: Test report for kernel direct mapping performance
On 1/26/2021 11:00 PM, Michal Hocko wrote: On Fri 15-01-21 15:23:07, Xing Zhengjun wrote: Hi, There is currently a bit of a debate about the kernel direct map. Does using 2M/1G pages aggressively for the kernel direct map help performance? Or is it an old optimization which is not as helpful on modern CPUs as it was in the old days? What is the penalty of a kernel feature that heavily demotes this mapping from larger to smaller pages? We did a set of runs with 1G and 2M pages enabled/disabled and saw the changes.

[Conclusions] Assuming that this was a good representative set of workloads and that the data are good, for server usage we conclude that the existing aggressive use of 1G mappings is a good choice, since it represents the best in a plurality of the workloads. However, in a *majority* of cases another mapping size (2M or 4k) potentially offers a performance improvement. This leads us to conclude that although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice, or that folks deriving benefits (like hardening) from smaller mapping sizes should avoid them.

Thanks for conducting these tests! This is definitely useful, and quite honestly I would have expected much more noticeable differences. Please note that I am not really deep into benchmarking, but one thing that popped into my mind was whether these (micro)benchmarks are really representative workloads. Some of them tend to be rather narrow in executed code paths or data structures used, AFAIU. Is it possible they simply didn't generate sufficient TLB pressure?

The test was done on 4 server platforms with 11 benchmarks which 0day runs daily. Each of the 11 benchmarks has a lot of subcases, so there was a total of 259 test cases. The test memory size for the 4 server platforms ranges from 128GB to 512GB. Yes, some of the benchmarks tend to be narrow in executed code paths or data structures. So we ran a total of 259 cases covering memory, CPU scheduling, network, IO, and database, trying to cover most of the code paths. Some of the 11 benchmarks may not generate sufficient TLB pressure, but I think the cases in vm-scalability and will-it-scale do. I have provided the test results for the different benchmarks; if you are interested, you can see the details in the test report: https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf

Have you tried to look closer at the profiles of the respective configurations to see where the overhead comes from?

The test cases were selected from the 0day daily run cases, just with different kernel settings:

- Enable both 2M and 1G huge pages (up to 1G, so named "1G" in the test report): no extra kernel command line needed
- Disable 1G pages (up to 2M, so named "2M" in the test report): add kernel command line "nogbpages"
- Disable both 2M and 1G huge pages (up to 4k, so named "4K" in the test report): add kernel command line "nohugepages_mapping" (by debug patch)

User space adds the THP enabled setting for all three kernels (1G/2M/4K): transparent_hugepage: thp_enabled: always thp_defrag: always

During the test we enabled some monitors, but their overhead should not be too big; most of the overhead should come from the test cases themselves. I will study some test cases to find the hotspots the overhead comes from and provide that later if someone is interested. -- Zhengjun Xing
Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/25/2021 5:29 PM, Hillf Danton wrote: On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote: [...]

[ 230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 230.597322] Workqueue: 0x0 (rcu_gp)
Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/22/2021 3:59 PM, Hillf Danton wrote: On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote: [...]

[ 230.597322] Workqueue: 0x0 (rcu_gp)
[ 230.597636] EIP: process_one_work+0x92/0x9e0
[ 230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24
Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
On 1/21/2021 12:00 PM, Hillf Danton wrote: On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote: On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote: [...]

Instead of creating the initial workers only on the active CPUs, rebind them (labeled pcpu) and jump to the right CPU at bootup time. [...]

I test the patch, the warning still appears in the kernel log.
[ 230.356503] smpboot: CPU 1 is now offline [ 230.544652] x86: Booting SMP configuration: [ 230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock [ 230.545675] masked ExtINT on CPU#1 [ 230.593829] [ cut here ] [ 230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0 [ 230.594990] Modules linked in: rcutorture torture mousedev input_leds led_class pcspkr psmouse evbug tiny_power_button button [ 230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 5.11.0-rc3-gdcba55d9080f #2 [ 230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 230.597322] Workqueue: 0x0 (rcu_gp) [ 230.597636] EIP: process_one_work+0x92/0x9e0 [ 230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 f4 85 13 00 ff 05 cc 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 [ 230.599569] EAX: 42f51d08 EBX: ECX: EDX: 0001 [ 230.600100] ESI: 43d94240 EDI: df4040f4 EBP: de7f23c0 ESP: bf5f1f08 [ 230.600629] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00010002 [ 2
Re: [LKP] Re: [percpu_ref] 2b0d3d3e4f: reaim.jobs_per_min -18.4% regression
On 1/11/2021 5:58 PM, Ming Lei wrote: On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote: Greeting, FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:

commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: reaim
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	runtime: 300s
	nr_task: 100%
	test: short
	cpufreq_governor: performance
	ucode: 0x5002f01

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/

In addition to that, the commit also has significant impact on the following tests:

| testcase: change  | vm-scalability: vm-scalability.throughput -2.8% regression |
| test machine      | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters   | cpufreq_governor=performance, runtime=300s, test=lru-file-mmap-read-rand, ucode=0x5003003 |

| testcase: change  | will-it-scale: will-it-scale.per_process_ops 14.5% improvement |
| test machine      | 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory |
| test parameters   | cpufreq_governor=performance, mode=process, nr_task=50%, test=page_fault2, ucode=0x16 |

| testcase: change  | will-it-scale: will-it-scale.per_process_ops -13.0% regression |
| test machine      | 104 threads Skylake with 192G memory |
| test parameters   | cpufreq_governor=performance, mode=process, nr_task=50%, test=malloc1, ucode=0x2006906 |

| testcase: change  | vm-scalability: vm-scalability.throughput -2.3% regression |
| test machine      | 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory |
| test parameters   | cpufreq_governor=performance, runtime=300s, test=lru-file-mmap-read-rand, ucode=0x5002f01 |

| testcase: change  | fio-basic: fio.read_iops -4.8% regression |
| test machine      | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters   | bs=4k, cpufreq_governor=performance, disk=2pmem, fs=xfs, ioengine=libaio, nr_task=50%, runtime=200s, rw=randread, test_size=200G |
Test report for kernel direct mapping performance
Hi, There is currently a bit of a debate about the kernel direct map. Does using 2M/1G pages aggressively for the kernel direct map help performance? Or, is it an old optimization which is not as helpful on modern CPUs as it was in the old days? What is the penalty of a kernel feature that heavily demotes this mapping from larger to smaller pages? We did a set of runs with 1G and 2M pages enabled /disabled and saw the changes. [Conclusions] Assuming that this was a good representative set of workloads and that the data are good, for server usage, we conclude that the existing aggressive use of 1G mappings is a good choice since it represents the best in a plurality of the workloads. However, in a *majority* of cases, another mapping size (2M or 4k) potentially offers a performance improvement. This leads us to conclude that although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice, or that folks deriving benefits (like hardening) from smaller mapping sizes should avoid the smaller mapping sizes. [Summary of results] 1. The test was done on server platforms with 11 benchmarks. For the 4 different server platforms tested, each with three different maximums kernel mapping sizes: 4k, 2M, and 1G. Each system has enough memory to effectively deploy 1G mappings. For the 11 different benchmarks were used, not every benchmark was run on every system, there was a total of 259 tests. 2. For each benchmark/system combination, the 1G mapping had the highest performance for 45% of the tests, 2M for ~30%, and 4k for~20%. 3. From the average delta, among 1G/2M/4K, 4K gets the lowest performance in all the 4 test machines, while 1G gets the best performance on 2 test machines and 2M gets the best performance on the other 2 machines. 4. By testing with machine memory from 256G to 512G, we observed that the larger memory will lead to the performance better for 1G page size. 
With large memory, will-it-scale/vm-scalability/unixbench/reaim/hackbench show that 1G has the best performance, while kbuild/memtier/netperf show that 4K has the best performance. For more details please see the following web link: https://01.org/sites/default/files/documentation/test_report_for_kernel_direct_mapping_performance_0.pdf

-- Zhengjun Xing
Re: [LKP] Re: [btrfs] e076ab2a2c: fio.write_iops -18.3% regression
On 1/12/2021 11:45 PM, David Sterba wrote:
On Tue, Jan 12, 2021 at 11:36:14PM +0800, kernel test robot wrote:
Greeting, FYI, we noticed a -18.3% regression of fio.write_iops due to commit:
commit: e076ab2a2ca70a0270232067cd49f76cd92efe64 ("btrfs: shrink delalloc pages instead of full inodes")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 192 threads Intel(R) Xeon(R) CPU @ 2.20GHz with 192G memory
with following parameters:

  disk: 1SSD
  fs: btrfs
  runtime: 300s
  nr_task: 8
  rw: randwrite
  bs: 4k
  ioengine: sync
  test_size: 256g

Though I do a similar test (emulating a bittorrent workload), it's a bit extreme as it's 4k synchronous I/O on a huge file. It always takes a lot of time but can point out some concurrency issues, namely on faster devices. There are 8 threads possibly competing for the same inode lock or other locks related to it. The mentioned commit fixed another perf regression on a much more common workload (untarring files), so at this point a drop in this fio workload is inevitable.

Do you have a plan to fix it? Thanks.

___ LKP mailing list -- l...@lists.01.org To unsubscribe send an email to lkp-le...@lists.01.org

-- Zhengjun Xing
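For anyone trying to reproduce this outside of lkp, the parameters in the report translate roughly into a fio job file like the following sketch. The directory is an assumption (wherever the btrfs-formatted SSD is mounted), and lkp's actual generated job may differ in detail:

```
[global]
directory=/mnt/btrfs   ; assumption: mount point of the btrfs-formatted SSD
bs=4k
ioengine=sync
rw=randwrite
runtime=300
time_based
size=256g

[randwrite]
numjobs=8
group_reporting
```

The combination of synchronous 4k random writes and 8 jobs against one filesystem is what concentrates the contention on the per-inode locks mentioned above.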
Re: [LKP] [locking/rwsem] 617f3ef951: unixbench.score -21.2% regression
Hi Waiman,

Do you have time to look at this? Thanks. As you describe in commit 617f3ef95177840c77f59c2aec1029d27d5547d6 ("locking/rwsem: Remove reader optimistic spinning"), the patch that disables reader optimistic spinning shows reduced performance in lightly loaded cases, so is this regression expected?

On 12/17/2020 9:33 AM, kernel test robot wrote:
Greeting, FYI, we noticed a -21.2% regression of unixbench.score due to commit:
commit: 617f3ef95177840c77f59c2aec1029d27d5547d6 ("locking/rwsem: Remove reader optimistic spinning")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: unixbench
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G memory
with following parameters:

  runtime: 300s
  nr_task: 30%
  test: shell8
  cpufreq_governor: performance
  ucode: 0xde

test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below: -->

To reproduce:

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml # job file is attached in this email
  bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-cfl-e1/shell8/unixbench/0xde

commit:
  1a728dff85 ("locking/rwsem: Enable reader optimistic lock stealing")
  617f3ef951 ("locking/rwsem: Remove reader optimistic spinning")

1a728dff855a318b  617f3ef95177840c77f59c2aec1
----------------  ---------------------------
       fail:runs  %reproduction  fail:runs
            39:4          -992%         :4   perf-profile.calltrace.cycles-pp.error_entry
            25:4          -635%         :4   perf-profile.children.cycles-pp.error_entry

       %stddev        %change      %stddev
         21807 ± 3%    -21.2%        17186        unixbench.score
       1287072 ± 3%    -38.7%       788414        unixbench.time.involuntary_context_switches
         37161 ± 4%    +31.3%        48798        unixbench.time.major_page_faults
     1.047e+08 ± 3%    -21.1%     82610985        unixbench.time.minor_page_faults
          1341         -27.1%        978.00       unixbench.time.percent_of_cpu_this_job_got
        370.87         -33.3%        247.55       unixbench.time.system_time
        490.05         -23.3%        376.03       unixbench.time.user_time
       3083520 ± 3%    +59.7%      4924900        unixbench.time.voluntary_context_switches
        824314 ± 3%    -21.2%       649654        unixbench.workload
          0.03 ± 27%   -51.9%         0.02 ± 59%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
        385.15 ± 2%    +62.5%       625.72        uptime.idle
         17.03          -1.8%        16.73        boot-time.boot
         11.01          -1.6%        10.83        boot-time.dhcp
        214.12 ± 3%     -3.1%       207.49        boot-time.idle
         13.72 ± 4%    +23.5         37.24        mpstat.cpu.all.idle%
          1.06           -0.1         0.94        mpstat.cpu.all.irq%
         49.32 ± 2%    -11.8         37.53        mpstat.cpu.all.sys%
         35.24 ± 2%    -11.6         23.68        mpstat.cpu.all.usr%
         15.50 ± 3%   +145.2%        38.00        vmstat.cpu.id
         49.00 ± 2%    -22.4%        38.00        vmstat.cpu.sy
         33.75 ± 2%    -33.3%        22.50 ± 2%   vmstat.cpu.us
         21.75 ± 3%    -33.3%        14.50 ± 3%   vmstat.procs.r
         97370 ± 3%    +56.4%       152258        vmstat.system.cs
         37589          -2.1%        36804        vmstat.system.in
         11861 ± 9%    -18.0%         9730        slabinfo.filp.active_objs
         13242 ± 8%    -15.5%        11184        slabinfo.filp.num_objs
         14731 ± 7%     -9.5%        13325 ± 5%   slabinfo.kmalloc-8.active_objs
         14731 ± 7%     -9.5%        13325 ± 5%   slabinfo.kmalloc-8.num_objs
          5545 ± 2%    -13.8%         4780 ± 4%   slabinfo.pid.active_objs
          5563 ± 2%    -13.8%         4793 ± 4%   slabinfo.pid.num_objs
          5822 ± 14%   -40.4%         3468 ± 5%   slabinfo.task_delay_info.active_objs
          5825 ± 14%   -40.5%         3468 ± 5%   slabinfo.task_delay_info.num_objs
      32104492 ± 3%   +303.3%    1.295e+08 ± 11%  cpuidle.C1.time
        882330 ± 5%   +131.5%      2042656 ± 10%  cpuidle.C1.usage
      21965263 ± 3%   +340.5%     96762398 ± 14%  cpuidle.C1E.time
        442911 ± 2%   +211.3%      1378866 ± 14%  cpuidle.C1E.usage
       6511399 ± 4%   +606.6%     46010023 ± 1
Re: [LKP] Re: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression
On 12/11/2020 12:14 AM, Peter Zijlstra wrote:
On Thu, Dec 10, 2020 at 04:18:59PM +0800, kernel test robot wrote:
FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:
commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")

Mooo, weird but whatever. Does the below help at all?

I tested the patch; the regression is reduced to -0.6%.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:
  lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/thread/sched_yield/performance/0x71e

commit:
  565790d28b1e33ee2f77bad5348b99f6dfc366fd
  2558aacff8586699bcd248b406febb28b0a25de2
  4b26139b8db627a55043183614a32b0aba799d27 (this test patch)

565790d28b1e33ee  2558aacff8586699bcd248b406f  4b26139b8db627a55043183614a
----------------  ---------------------------  ---------------------------
  %stddev      %change    %stddev    %change    %stddev
  4.011e+08    -1.6%      3.945e+08  -0.6%      3.989e+08   will-it-scale.144.threads
  2785455      -1.6%      2739520    -0.6%      2769967     will-it-scale.per_thread_ops
  4.011e+08    -1.6%      3.945e+08  -0.6%      3.989e+08   will-it-scale.workload

---
 kernel/sched/core.c  | 40 +++-
 kernel/sched/sched.h | 13 +
 2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7af80c3fce12..f80245c7f903 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3985,15 +3985,20 @@ static void do_balance_callbacks(struct rq *rq, struct callback_head *head)
 	}
 }
 
+static void balance_push(struct rq *rq);
+
+struct callback_head balance_push_callback = {
+	.next = NULL,
+	.func = (void (*)(struct callback_head *))balance_push,
+};
+
 static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
 {
 	struct callback_head *head = rq->balance_callback;
 
 	lockdep_assert_held(&rq->lock);
-	if (head) {
+	if (head)
 		rq->balance_callback = NULL;
-		rq->balance_flags &= ~BALANCE_WORK;
-	}
 
 	return head;
 }
@@ -4014,21 +4019,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
 	}
 }
 
-static void balance_push(struct rq *rq);
-
-static inline void balance_switch(struct rq *rq)
-{
-	if (likely(!rq->balance_flags))
-		return;
-
-	if (rq->balance_flags & BALANCE_PUSH) {
-		balance_push(rq);
-		return;
-	}
-
-	__balance_callbacks(rq);
-}
-
 #else
 
 static inline void __balance_callbacks(struct rq *rq)
@@ -4044,10 +4034,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
 {
 }
 
-static inline void balance_switch(struct rq *rq)
-{
-}
-
 #endif
 
 static inline void
@@ -4075,7 +4061,7 @@ static inline void finish_lock_switch(struct rq *rq)
 	 * prev into current:
 	 */
 	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
-	balance_switch(rq);
+	__balance_callbacks(rq);
 
 	raw_spin_unlock_irq(&rq->lock);
 }
@@ -7256,6 +7242,10 @@ static void balance_push(struct rq *rq)
 	lockdep_assert_held(&rq->lock);
 	SCHED_WARN_ON(rq->cpu != smp_processor_id());
 
+	/*
+	 * Ensure the thing is persistent until balance_push_set(, on = false);
+	 */
+	rq->balance_callback = &balance_push_callback;
 	/*
 	 * Both the cpu-hotplug and stop task are in this case and are
@@ -7305,9 +7295,9 @@ static void balance_push_set(int cpu, bool on)
 
 	rq_lock_irqsave(rq, &rf);
 	if (on)
-		rq->balance_flags |= BALANCE_PUSH;
+		rq->balance_callback = &balance_push_callback;
 	else
-		rq->balance_flags &= ~BALANCE_PUSH;
+		rq->balance_callback = NULL;
 	rq_unlock_irqrestore(rq, &rf);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f5acb6c5ce49..12ada79d40f3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -975,7 +975,6 @@ struct rq {
 	unsigned long		cpu_capacity_orig;
 
 	struct callback_head	*balance_callback;
-	unsigned char		balance_flags;
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;
@@ -1226,6 +1225,8 @@ struct rq_flags {
 #endif
 };
 
+extern struct callback_head balance_push_callback;
+
 /*
  * Lockdep annotation that avoids accidental unlocks; it's like a
  * sticky/continuous lockdep_assert_held().
@@ -1243,9 +1244,9 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 #ifdef CONFIG_SCHED_DEBUG
Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression
On 11/27/2020 5:34 AM, Chris Wilson wrote:
Quoting Xing Zhengjun (2020-11-26 01:44:55)
On 11/25/2020 4:47 AM, Chris Wilson wrote:
Quoting Oliver Sang (2020-11-19 07:20:18)
On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:
Hi, Could you add intel-...@lists.freedesktop.org into reports going forward.
Quoting kernel test robot (2020-11-11 17:58:11)
Greeting, FYI, we noticed a -54.0% regression of phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second due to commit:

How many runs are there on the bad version to ensure the bisect is repeatable?

We test 4 times.

zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$ grep -r "operations_per_second" */stats.json
0/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4133.487932,
1/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4120.421503,
2/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4188.414835,
3/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4068.549514,

a: w/o revert (drm-tip)
b: w/ revert

[ASCII distribution plot of the two 120-sample sets omitted; summary below]

     N    Min        Max        Median     Avg        Stddev
a  120  3621.8761  7356.4442  4606.7895  4607.9132  156.17693
b  120  2664.0563  6359.9686  4519.5036  4534.4463   95.471121

The patch is not expected to have any impact on the machine you are testing on.
-Chris

What's your code base? For my side:
1) sync the code to the head of Linux mainline
2) git reset --hard 59dd13ad31
3) git revert 59dd13ad3107

We compare the test results of commit 59dd13ad3107 (step 2) and 2052847b06f8 (step 3, revert of 59dd13ad3107), so the regression should be related to 59dd13ad3107. We run each test case 5 times.

a: 59dd13ad31
b: revert

[ASCII distribution plot of the two 120-sample sets omitted; summary below]

     N    Min        Max        Median     Avg        Stddev
a  120  3658.3435  6363.7812  4527.4406  4536.612    86.095459
b  120  3928.9643  6375.829   4576.0482  4585.4224  157.284

Could you share with me your test commands and the hardware info, so I can reproduce it on my side? Thanks.

-- Zhengjun Xing
Re: [Intel-gfx] [drm/i915/gem] 59dd13ad31: phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second -54.0% regression
On 11/25/2020 4:47 AM, Chris Wilson wrote:
Quoting Oliver Sang (2020-11-19 07:20:18)
On Fri, Nov 13, 2020 at 04:27:13PM +0200, Joonas Lahtinen wrote:
Hi, Could you add intel-...@lists.freedesktop.org into reports going forward.
Quoting kernel test robot (2020-11-11 17:58:11)
Greeting, FYI, we noticed a -54.0% regression of phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second due to commit:

How many runs are there on the bad version to ensure the bisect is repeatable?

We test 4 times.

zxing@inn:/result/phoronix-test-suite/performance-true-Radial_Gradient_Paint-1024x1024-jxrendermark-1.2.4-ucode=0xd6-monitor=da39a3ee/lkp-cfl-d1/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/59dd13ad310793757e34afa489dd6fc8544fc3da$ grep -r "operations_per_second" */stats.json
0/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4133.487932,
1/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4120.421503,
2/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4188.414835,
3/stats.json: "phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second": 4068.549514,

a: w/o revert (drm-tip)
b: w/ revert

[ASCII distribution plot of the two 120-sample sets omitted; summary below]

     N    Min        Max        Median     Avg        Stddev
a  120  3621.8761  7356.4442  4606.7895  4607.9132  156.17693
b  120  2664.0563  6359.9686  4519.5036  4534.4463   95.471121

The patch is not expected to have any impact on the machine you are testing on.
-Chris

What's your code base? For my side:
1) sync the code to the head of Linux mainline
2) git reset --hard 59dd13ad31
3) git revert 59dd13ad3107

We compare the test results of commit 59dd13ad3107 (step 2) and 2052847b06f8 (step 3, revert of 59dd13ad3107), so the regression should be related to 59dd13ad3107. We run each test case 5 times.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/need_x/test/option_a/option_b/cpufreq_governor/ucode/debug-setup:
  lkp-cfl-d1/phoronix-test-suite/debian-x86_64-phoronix/x86_64-rhel-8.3/gcc-9/true/jxrendermark-1.2.4/Radial Gradient Paint/1024x1024/performance/0xde/regression_test

commit:
  0dccdba51e852271a3dbc9358375f4c882b863f2
  59dd13ad310793757e34afa489dd6fc8544fc3da
  2052847b06f863a028f7f3bbc62401e043b34301 (revert 59dd13ad3107)

0dccdba51e852271  59dd13ad310793757e34afa489d  2052847b06f863a028f7f3bbc62
----------------  ---------------------------  ---------------------------
  %stddev     %change    %stddev    %change    %stddev
  8145 ± 2%   -53.1%     3817 ± 3%  -1.8%      7995       phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second

-- Zhengjun Xing
Re: [drm/fb] 6a1b34c0a3: WARNING:at_drivers/gpu/drm/drm_fb_helper.c:#drm_fb_helper_damage_work
On 11/23/2020 4:04 PM, Thomas Zimmermann wrote: Hi Am 22.11.20 um 15:18 schrieb kernel test robot: Greeting, FYI, we noticed the following commit (built with gcc-9): commit: 6a1b34c0a339fdc75d7932ad5702f2177c9d7a1c ("drm/fb-helper: Move damage blit code and its setup into separate routine") url: https://github.com/0day-ci/linux/commits/Thomas-Zimmermann/drm-fb-helper-Various-fixes-and-cleanups/20201120-182750 in testcase: trinity version: trinity-static-i386-x86_64-f93256fb_2019-08-28 with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): That dmesg is full of messages like [ 696.323556] alloc_vmap_area: 24 callbacks suppressed [ 696.323562] vmap allocation for size 3149824 failed: use vmalloc= to increase size I think the test system needs to be reconfigured first. We have tried "vmalloc=256M" and "vmalloc=512M", the same warning still happened. 
Best regards
Thomas

+------------------------------------------------------------------------+------------+------------+
|                                                                        | 154f2d1afd | 6a1b34c0a3 |
+------------------------------------------------------------------------+------------+------------+
| WARNING:at_drivers/gpu/drm/drm_fb_helper.c:#drm_fb_helper_damage_work  | 0          | 36         |
| EIP:drm_fb_helper_damage_work                                          | 0          | 36         |
+------------------------------------------------------------------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

[ 106.616652] WARNING: CPU: 1 PID: 173 at drivers/gpu/drm/drm_fb_helper.c:434 drm_fb_helper_damage_work+0x371/0x390
[ 106.627732] Modules linked in:
[ 106.632419] CPU: 1 PID: 173 Comm: kworker/1:2 Not tainted 5.10.0-rc4-next-20201120-7-g6a1b34c0a339 #3
[ 106.637806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 106.642853] Workqueue: events drm_fb_helper_damage_work
[ 106.647664] EIP: drm_fb_helper_damage_work+0x371/0x390
[ 106.652305] Code: b1 17 c7 01 68 bd 5b 2d c5 53 50 68 55 21 2d c5 83 15 44 b1 17 c7 00 e8 ae bc b1 01 83 05 48 b1 17 c7 01 83 15 4c b1 17 c7 00 <0f> 0b 83 05 50 b1 17 c7 01 83 15 54 b1 17 c7 00 83 c4 10 e9 78 fd
[ 106.663517] EAX: 002d EBX: c8730520 ECX: 0847 EDX:
[ 106.668423] ESI: ca987000 EDI: cab274d8 EBP: f62f5f20 ESP: f62f5ee8
[ 106.673214] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00010246
[ 106.678295] CR0: 80050033 CR2: CR3: 063a7000 CR4: 000406d0
[ 106.683160] DR0: DR1: DR2: DR3:
[ 106.687967] DR6: fffe0ff0 DR7: 0400
[ 106.690763] Call Trace:
[ 106.693394] process_one_work+0x3ea/0xaa0
[ 106.693501] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver
[ 106.695300] worker_thread+0x330/0x900
[ 106.697406] ixgbevf: Copyright (c) 2009 - 2018 Intel Corporation.
[ 106.702963] kthread+0x190/0x210
[ 106.705709] ? rescuer_thread+0x650/0x650
[ 106.708379] ?
kthread_insert_work_sanity_check+0x120/0x120
[ 106.711271] ret_from_fork+0x1c/0x30
[ 106.713973] ---[ end trace dd528799d3369ac1 ]---

To reproduce:

  # build kernel
  cd linux
  cp config-5.10.0-rc4-next-20201120-7-g6a1b34c0a339 .config
  make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp qemu -k job-script # job-script is attached in this email

Thanks, Oliver Sang

-- Zhengjun Xing
Re: [LKP] Re: [mm] be5d0a74c6: will-it-scale.per_thread_ops -9.1% regression
On 11/17/2020 12:19 AM, Johannes Weiner wrote:
On Sun, Nov 15, 2020 at 05:55:44PM +0800, kernel test robot wrote:
Greeting, FYI, we noticed a -9.1% regression of will-it-scale.per_thread_ops due to commit:
commit: be5d0a74c62d8da43f9526a5b08cdd18e2bbc37a ("mm: memcontrol: switch to native NR_ANON_MAPPED counter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

  nr_task: 50%
  mode: thread
  test: page_fault2
  cpufreq_governor: performance
  ucode: 0x5002f01

I suspect it's the lock_page_memcg() in page_remove_rmap(). We already needed it for shared mappings, and this patch added it to the private path as well, which this test exercises. The slowpath for this lock is extremely cold - most of the time it's just an rcu_read_lock(). But we're still doing the function call. Could you try if this patch helps, please?

I applied the patch to Linux mainline v5.10-rc4, linux-next next-20201117, and "be5d0a74c6"; it failed to apply on all of them. What's your codebase for the patch? I would appreciate it if you could rebase the patch onto "be5d0a74c6". From "be5d0a74c6" to v5.10-rc4 or next-20201117 there are a lot of commits, and they will affect the test result. Thanks.
From f6e8e56b369109d1362de2c27ea6601d5c411b2e Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Mon, 16 Nov 2020 10:48:06 -0500
Subject: [PATCH] lockpagememcg

---
 include/linux/memcontrol.h | 61 ++--
 mm/memcontrol.c            | 82 +++---
 2 files changed, 73 insertions(+), 70 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20108e426f84..b4b73e375948 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -842,9 +842,64 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 extern bool cgroup_memory_noswap;
 #endif
 
-struct mem_cgroup *lock_page_memcg(struct page *page);
-void __unlock_page_memcg(struct mem_cgroup *memcg);
-void unlock_page_memcg(struct page *page);
+struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
+					    struct mem_cgroup *memcg);
+void unlock_page_memcg_slowpath(struct mem_cgroup *memcg);
+
+/**
+ * lock_page_memcg - lock a page and memcg binding
+ * @page: the page
+ *
+ * This function protects unlocked LRU pages from being moved to
+ * another cgroup.
+ *
+ * It ensures lifetime of the memcg -- the caller is responsible for
+ * the lifetime of the page; __unlock_page_memcg() is available when
+ * @page might get freed inside the locked section.
+ */
+static inline struct mem_cgroup *lock_page_memcg(struct page *page)
+{
+	struct page *head = compound_head(page); /* rmap on tail pages */
+	struct mem_cgroup *memcg;
+
+	/*
+	 * The RCU lock is held throughout the transaction. The fast
+	 * path can get away without acquiring the memcg->move_lock
+	 * because page moving starts with an RCU grace period.
+	 *
+	 * The RCU lock also protects the memcg from being freed when
+	 * the page state that is going to change is the only thing
+	 * preventing the page itself from being freed. E.g. writeback
+	 * doesn't hold a page reference and relies on PG_writeback to
+	 * keep off truncation, migration and so forth.
+	 */
+	rcu_read_lock();
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	memcg = page_memcg(head);
+	if (unlikely(!memcg))
+		return NULL;
+
+	if (likely(!atomic_read(&memcg->moving_account)))
+		return memcg;
+
+	return lock_page_memcg_slowpath(head, memcg);
+}
+
+static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
+{
+	if (unlikely(memcg && memcg->move_lock_task == current))
+		unlock_page_memcg_slowpath(memcg);
+
+	rcu_read_unlock();
+}
+
+static inline void unlock_page_memcg(struct page *page)
+{
+	__unlock_page_memcg(page_memcg(compound_head(page)));
+}
 
 /*
  * idx can be of type enum memcg_stat_item or node_stat_item.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69a2893a6455..9acc42388b86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2084,49 +2084,19 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 	pr_cont(" are going to be killed due to memory.oom.group set\n");
 }
 
-/**
- * lock_page_memcg - lock a page and memcg binding
- * @page: the page
- *
- * This function protects unlocked LRU pages from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the returned memcg. Caller is responsible
- * for the lifetime of the page; __unlock_page_memcg() is available
- * when @page might get freed inside the locked section.
- */
-struct mem_cgroup *lock_page_memcg(struct page *page)
+struct me
Re: [LKP] Re: [mm] e6e88712e4: stress-ng.tmpfs.ops_per_sec -69.7% regression
On 11/7/2020 4:55 AM, Matthew Wilcox wrote:
On Mon, Nov 02, 2020 at 01:21:39PM +0800, Rong Chen wrote:
we compared the tmpfs.ops_per_sec: (363 / 103.02) between this commit and parent commit.

Thanks! I see about a 50% hit on my system, and this patch restores the performance. Can you verify this works for you?

diff --git a/mm/madvise.c b/mm/madvise.c
index 9b065d412e5f..e602333f8c0d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -225,7 +225,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 		struct address_space *mapping)
 {
 	XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
-	pgoff_t end_index = end / PAGE_SIZE;
+	pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
 	struct page *page;
 
 	rcu_read_lock();

I tested the patch; the regression disappeared.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:
  lkp-csl-2sp3/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/100s/memory/performance/0x42c

commit:
  f5df8635c5a3c912919c91be64aa198554b0f9ed
  e6e88712e43b7942df451508aafc2f083266f56b
  6bc25f0c5e0d55145f7ef087adea2693802a80f3 (this test patch)

f5df8635c5a3c912  e6e88712e43b7942df451508aaf  6bc25f0c5e0d55145f7ef087ade
----------------  ---------------------------  ---------------------------
  %stddev      %change    %stddev    %change    %stddev
  1198 ± 4%    -69.7%     362.67     +3.3%      1238 ± 3%    stress-ng.tmpfs.ops
  11.62 ± 4%   -69.7%     3.52       +3.4%      12.02 ± 3%   stress-ng.tmpfs.ops_per_sec

-- Zhengjun Xing
Re: [LKP] Re: [mm/gup] a308c71bf1: stress-ng.vm-splice.ops_per_sec -95.6% regression
On 11/6/2020 2:37 AM, Linus Torvalds wrote:
On Thu, Nov 5, 2020 at 12:29 AM Xing Zhengjun wrote:
Rong - mind testing this? I don't think the zero-page _should_ be something that real loads care about, but hey, maybe people do want to do things like splice zeroes very efficiently..

I tested the patch; the regression still exists. Thanks.

So Jann's suspicion seems interesting but apparently not the reason for this particular case. For being such a _huge_ difference (20x improvement followed by a 20x regression), it's surprising how little the numbers give a clue. The big changes are things like "interrupts.CPU19.CAL:Function_call_interrupts", but while those change by hundreds of percent, most of the changes seem to just be about them moving to different CPU's. IOW, we have things like

  5652 ± 59%   +387.9%   27579 ± 96%   interrupts.CPU13.CAL:Function_call_interrupts
  28249 ± 32%  -69.3%    8675 ± 50%    interrupts.CPU28.CAL:Function_call_interrupts

which isn't really much of a change at all despite the changes looking very big - it's just the stats jumping from one CPU to another. Maybe there's some actual change in there, but it's very well hidden if so. Yes, some of the numbers get worse:

  868396 ± 3%  +20.9%    1050234 ± 14%  interrupts.RES:Rescheduling_interrupts

so that's a 20% increase in rescheduling interrupts, but it's a 20% increase, not a 500% one. So the fact that performance changes by 20x is still very unclear to me. We do have a lot of those numa-meminfo changes, but they could just come from allocation patterns.
That said - another difference between the fast-gup code and the regular gup code is that the fast-gup code does

	if (pte_protnone(pte))
		goto pte_unmap;

and the regular slow case does

	if ((flags & FOLL_NUMA) && pte_protnone(pte))
		goto no_page;

now, FOLL_NUMA is always set in the slow case if we don't have FOLL_FORCE set, so this difference isn't "real", but it's one of those cases where the zero-page might be marked for NUMA faulting, and doing the forced COW might then cause it to be accessible. Just out of curiosity, do the numbers change enormously if you just remove that

	if (pte_protnone(pte))
		goto pte_unmap;

test from the fast-gup case (top of the loop in gup_pte_range()) - effectively making fast-gup basically act like FOLL_FORCE wrt numa placement..

Based on the last debug patch, I removed those two lines of code at the top of the loop in gup_pte_range() as you mentioned; the regression still existed.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:
  lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/30s/pipe/performance/0x5002f01

commit:
  1a0cf26323c80e2f1c58fc04f15686de61bfab0c
  a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a
  da5ba9980aa2211c1e2a89fc814abab2fea6f69d (last debug patch)
  8803d304738b52f66f6b683be38c4f8b9cf4bff5 (to debug the odd performance numbers)

1a0cf26323c80e2f  a308c71bf1e6e19cc2e4ced3185  da5ba9980aa2211c1e2a89fc814  8803d304738b52f66f6b683be38
----------------  ---------------------------  ---------------------------  ---------------------------
  %stddev    %change   %stddev    %change   %stddev    %change   %stddev
  3.406e+09  -95.6%    1.49e+08   -96.4%    1.213e+08  -96.5%    1.201e+08   stress-ng.vm-splice.ops
  1.135e+08  -95.6%    4965911    -96.4%    4041777    -96.5%    4002572     stress-ng.vm-splice.ops_per_sec

I'm not convinced that's a valid change in general, so this is just a "to debug the odd performance numbers" issue. Also out of curiosity: is the performance profile limited to just the load, or is it a system profile (i.e. do you have "-a" on the perf record line or not)?
In our test, "-a" is enabled on the perf record line.

Linus

-- Zhengjun Xing
Re: [LKP] Re: [mm/gup] a308c71bf1: stress-ng.vm-splice.ops_per_sec -95.6% regression
On 11/5/2020 2:29 AM, Linus Torvalds wrote:
On Mon, Nov 2, 2020 at 1:15 AM kernel test robot wrote:
Greeting, FYI, we noticed a -95.6% regression of stress-ng.vm-splice.ops_per_sec due to commit:
commit: a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a ("mm/gup: Remove enfornced COW mechanism")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Note that this is just the reverse of the previous 2000% improvement reported by the test robot here:
https://lore.kernel.org/lkml/20200611040453.GK12456@shao2-debian/
and the explanation seems to remain the same:
https://lore.kernel.org/lkml/cag48ez1v1b4x5lgfya6nvi33-twwqna_dc5jgfvosqqhdn_...@mail.gmail.com/

IOW, this is testing a special case (zero page lookup) that the "force COW" patches happened to turn into a regular case (COW creating a regular page from the zero page). The question is whether we should care about the zero page for gup_fast lookup. If we do care, then the proper fix is likely simply to allow the zero page in fast-gup, the same way we already do in slow-gup. ENTIRELY UNTESTED PATCH ATTACHED. Rong - mind testing this? I don't think the zero-page _should_ be something that real loads care about, but hey, maybe people do want to do things like splice zeroes very efficiently..

I tested the patch; the regression still exists.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/class/cpufreq_governor/ucode:
  lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/1HDD/30s/pipe/performance/0x5002f01

commit:
  1a0cf26323c80e2f1c58fc04f15686de61bfab0c
  a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a
  da5ba9980aa2211c1e2a89fc814abab2fea6f69d (debug patch)

1a0cf26323c80e2f  a308c71bf1e6e19cc2e4ced3185  da5ba9980aa2211c1e2a89fc814
----------------  ---------------------------  ---------------------------
  %stddev    %change   %stddev    %change   %stddev
  3.406e+09  -95.6%    1.49e+08   -96.4%    1.213e+08   stress-ng.vm-splice.ops
  1.135e+08  -95.6%    4965911    -96.4%    4041777     stress-ng.vm-splice.ops_per_sec

And note the "untested" part of the patch.
It _looks_ fairly obvious, but maybe I'm missing something.

Linus

-- Zhengjun Xing
Re: [LKP] Re: [mm/memcg] bd0b230fe1: will-it-scale.per_process_ops -22.7% regression
On 11/2/2020 6:02 PM, Michal Hocko wrote:
On Mon 02-11-20 17:53:14, Rong Chen wrote:
On 11/2/20 5:27 PM, Michal Hocko wrote:
On Mon 02-11-20 17:15:43, kernel test robot wrote:
Greeting, FYI, we noticed a -22.7% regression of will-it-scale.per_process_ops due to commit:
commit: bd0b230fe14554bfffbae54e19038716f96f5a41 ("mm/memcg: unify swap and memsw page counters")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

I really fail to see how this can be anything else than a data structure layout change. There is one counter less. Btw. are cgroups configured at all? What would be the configuration?

Hi Michal, we used the default configuration of cgroups; not sure what configuration you want, could you give me more details? Here is the cgroup info of the will-it-scale process:

$ cat /proc/3042/cgroup
12:hugetlb:/
11:memory:/system.slice/lkp-bootstrap.service

OK, this means that the memory controller is enabled and in use. Btw. do you get the original performance if you add one phony page_counter after the union?

I added one phony page_counter after the union and re-tested; the regression is reduced to -1.2%. It looks like the regression is caused by the data structure layout change.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode/debug-setup:
  lkp-hsw-4ex1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/50%/process/page_fault2/performance/0x16/test1

commit:
  8d387a5f172f26ff8c76096d5876b881dec6b7ce
  bd0b230fe14554bfffbae54e19038716f96f5a41
  b3233916ab0a883e1117397e28b723bd0e4ac1eb (debug patch: add one phony page_counter after the union)

8d387a5f172f26ff  bd0b230fe14554bfffbae54e190  b3233916ab0a883e1117397e28b
----------------  ---------------------------  ---------------------------
  %stddev    %change   %stddev    %change   %stddev
  187632     -22.8%    144931     -1.2%     185391      will-it-scale.per_process_ops
  13509525   -22.8%    10435073   -1.2%     13348181    will-it-scale.workload

-- Zhengjun Xing
Re: [LKP] Re: [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression
Hi Josef, I re-tested it in v5.10-rc2, and the regression still exists. Do you have time to take a look at this? Thanks. On 10/13/2020 2:30 PM, Xing Zhengjun wrote: Hi Josef, I re-tested in v5.9, and the regression still exists. Do you have time to take a look at this? Thanks. On 6/15/2020 11:21 AM, Xing Zhengjun wrote: Hi Josef, Do you have time to take a look at this? Thanks. On 6/12/2020 2:11 PM, kernel test robot wrote: Greeting, FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit: commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the subvol_srcu") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: aim7 on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory with following parameters: disk: 4BRD_12G md: RAID0 fs: btrfs test: disk_wrt load: 1500 cpufreq_governor: performance ucode: 0x52c test-description: AIM7 is a traditional UNIX system-level benchmark suite which is used to test and measure the performance of a multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:

To reproduce:
  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml # job file is attached in this email
  bin/lkp run job.yaml

=
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c

commit:
  efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree lock")
  c75e839414 ("btrfs: kill the subvol_srcu")

  efc3453494af7818   c75e839414d3610e6487ae31451
  ----------------   ---------------------------
       fail:runs         %reproduction   fail:runs
             3:9                  -33%          :8    dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x

         %stddev       %change   %stddev
       29509 ± 2%        -9.1%     26837 ± 2%   aim7.jobs-per-min
      305.28 ± 2%       +10.0%    335.72 ± 2%   aim7.time.elapsed_time
      305.28 ± 2%       +10.0%    335.72 ± 2%   aim7.time.elapsed_time.max
    4883135 ± 10%       +37.9%   6735464 ± 7%   aim7.time.involuntary_context_switches
       56288 ± 2%       +10.5%     62202 ± 2%   aim7.time.system_time
     2344783             +6.5%   2497364 ± 2%   aim7.time.voluntary_context_switches
    62337721 ± 2%        +9.8%  68456490 ± 2%   turbostat.IRQ
      431.56 ± 6%       +22.3%    527.88 ± 4%   vmstat.procs.r
       27340 ± 2%       +11.2%     30397 ± 2%   vmstat.system.cs
      226804 ± 6%       +21.7%    276057 ± 4%   meminfo.Active(file)
      221309 ± 6%       +22.3%    270668 ± 4%   meminfo.Dirty
      720.89 ±111%      +49.3%      1076 ± 73%  meminfo.Mlocked
       14278 ± 2%        -8.3%     13094 ± 2%   meminfo.max_used_kB
       57228 ± 6%       +22.7%     70195 ± 5%   numa-meminfo.node0.Active(file)
       55433 ± 6%       +21.6%     67431 ± 4%   numa-meminfo.node0.Dirty
       56152 ± 6%       +21.4%     68180 ± 5%   numa-meminfo.node1.Active(file)
       55001 ± 6%       +22.5%     67397 ± 4%   numa-meminfo.node1.Dirty
       56373 ± 6%       +21.7%     68594 ± 4%   numa-meminfo.node2.Active(file)
       55222 ± 7%       +22.6%     67726 ± 4%   numa-meminfo.node2.Dirty
       56671 ± 6%       +20.5%     68317 ± 3%   numa-meminfo.node3.Active(file)
       55285 ± 6%       +21.8%     67355 ± 4%   numa-meminfo.node3.Dirty
       56694 ± 6%       +21.7%     69019 ± 4%   proc-vmstat.nr_active_file
       55342 ± 6%       +22.3%     67662 ± 4%   proc-vmstat.nr_dirty
      402316             +2.1%    410951        proc-vmstat.nr_file_pages
      180.22 ±111%      +49.4%    269.25 ± 73%  proc-vmstat.nr_mlock
       56694 ± 6%       +21.7%     69019 ± 4%   proc-vmstat.nr_zone_active_file
       54680 ± 6%       +22.8%     67168 ± 4%   proc-vmstat.nr_zone_write_pending
     3144381 ± 2%        +6.1%   3335275        proc-vmstat.pgactivate
     1387558 ± 2%        +7.9%   1496754 ± 2%   proc-vmstat.pgfault
      983.33 ± 4%        +5.4%      1036        proc-vmstat.unevictable_pgs_culled
       14331 ± 6%       +22.6%     17566 ± 5%   numa-vmstat.node0.nr_active_file
       13884 ± 6%       +21.6%     16884 ± 4%   numa-vmstat.node0.nr_dirty
       14330 ± 6%       +22.6%     17566 ± 5%   numa-vmstat.node0.nr_zone_active_file
       13714 ± 6%       +22.2%     16755 ± 4%   numa-vmstat.node0.nr_zone_write_pending
       14047 ± 6%       +21.3%     17043 ± 4%
Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression
On 10/22/2020 9:19 PM, Mathieu Desnoyers wrote: - On Oct 21, 2020, at 9:54 PM, Xing Zhengjun zhengjun.x...@linux.intel.com wrote: [...] In fact, 0-day just copies the will-it-scale benchmark from GitHub; if you think the will-it-scale benchmark has some issues, you can contribute your ideas and help improve it, and later we will update the will-it-scale benchmark to the new version. This is why I CC'd the maintainer of the will-it-scale github project, Anton Blanchard. My main intent is to report this issue to him, but I have not heard back from him yet. Is this project maintained? Let me try to add his ozlabs.org address in CC. For this test case, if we bind the workload to a specific CPU, then it will hide the scheduler balance issue. In the real world, we seldom bind the CPU... When you say that you bind the workload to a specific CPU, is that done outside of the will-it-scale testsuite, thus limiting the entire testsuite to a single CPU, or do you expect that internally the will-it-scale context-switch1 test gets affined to a single specific CPU/core/hardware thread through use of hwloc? The latter one. Thanks, Mathieu -- Zhengjun Xing
Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression
On 10/20/2020 9:14 PM, Mathieu Desnoyers wrote: - On Oct 19, 2020, at 11:24 PM, Xing Zhengjun zhengjun.x...@linux.intel.com wrote: On 10/7/2020 10:50 PM, Mathieu Desnoyers wrote: - On Oct 2, 2020, at 4:33 AM, Rong Chen rong.a.c...@intel.com wrote: Greeting, FYI, we noticed a -37.0% regression of will-it-scale.per_thread_ops due to commit: commit: bdfcae11403e5099769a7c8dc3262e3c4193edef ("[RFC PATCH 2/3] sched: membarrier: cover kthread_use_mm (v3)") url: https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/Membarrier-updates/20200925-012549 base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 848785df48835eefebe0c4eb5da7690690b0a8b7 in testcase: will-it-scale on test machine: 104 threads Skylake with 192G memory with following parameters: nr_task: 50% mode: thread test: context_switch1 cpufreq_governor: performance ucode: 0x2006906 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. test-url: https://github.com/antonblanchard/will-it-scale Hi, I would like to report what I suspect is a random thread placement issue in the context_switch1 test used by the 0day bot when running on a machine with hyperthread enabled. AFAIU the test code uses hwloc for thread placement which should theoretically ensure that each thread is placed on same processing unit, core and numa node between runs. We can find the test code here: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c And the main file containing thread setup is here: https://github.com/antonblanchard/will-it-scale/blob/master/main.c AFAIU, the test is started without the "-m" switch, which therefore affinitizes tasks on cores rather than on processing units (SMT threads). 
When testcase() creates the child thread with new_task(), it basically issues: pthread_create(&threads[nr_threads++], NULL, func, arg); passing a NULL pthread_attr_t, and not executing any pre_trampoline on the child. The pre_trampoline would have issued hwloc_set_thread_cpubind if it were executed on the child, but it's not. Therefore, we expect the cpu affinity mask of the parent to be copied on clone and used by the child. A quick test on a machine with hyperthreading enabled shows that the cpu affinity mask for the parent and child has two bits set: taskset -p 1868607 pid 1868607's current affinity mask: 10001 taskset -p 1868606 pid 1868606's current affinity mask: 10001 So AFAIU the placement of the parent and child will be random on either the same processing unit, or on separate processing units within the same core. I suspect this randomness can significantly affect the performance number between runs, and trigger unwarranted performance regression warnings. Thanks, Mathieu Yes, the randomness may happen in some special cases. But in 0-day, we test multiple times (>=3) and report the average number. For this case, we tested 4 times; the result is stable, with a variation of ±2%. So I don't think the -37.0% regression is caused by the randomness. 0/stats.json: "will-it-scale.per_thread_ops": 105228, 1/stats.json: "will-it-scale.per_thread_ops": 100443, 2/stats.json: "will-it-scale.per_thread_ops": 98786, 3/stats.json: "will-it-scale.per_thread_ops": 102821,

  c2daff748f0ea954   bdfcae11403e5099769a7c8dc32
  ----------------   ---------------------------
         %stddev       %change   %stddev
      161714 ± 2%       -37.0%    101819 ± 2%   will-it-scale.per_thread_ops

Arguing whether this specific instance of the test is indeed a performance regression or not is not relevant to this discussion. What I am pointing out here is that the test needs fixing because it generates noise due to a random thread placement configuration. This issue is about whether we can trust the results of those tests as kernel maintainers.
So on one hand, you can fix the test. This is simple to do: make sure the thread affinity does not allow for this randomness on SMT. But you seem to argue that the test does not need to be fixed, because the 0day infrastructure in which it runs will cover for this randomness. I really doubt this. If you indeed choose to argue that the test does not need fixing, then here is the statistical analysis I am looking for: - With the 4 runs, what are the odds that the average result for one class significantly differs from the other class due to this randomness. It may be small, but it is certainly not zero. If 4 runs are not enough, how many runs do you think would be enough? In fact, I have re-tested it more than 10 times, and the result is almost the same. ===
Re: [LKP] Re: Unreliable will-it-scale context_switch1 test on 0day bot
On 10/19/2020 11:24 PM, Philip Li wrote: On Mon, Oct 19, 2020 at 09:27:32AM -0400, Mathieu Desnoyers wrote: Hi, I pointed out an issue with the will-it-scale context_switch1 test run by the 0day bot on October 7, 2020, and got no reply. Thanks Mathieu for the feedback, we had added it to the TODO list but sorry for not reply in time. Zhengjun, can you help follow up this mail thread? I have replied in the origin mail. Until this issue is solved, the results of those tests are basically pure noise when run on SMT hardware: https://lore.kernel.org/lkml/1183082664.11002.1602082242482.javamail.zim...@efficios.com/ Who is maintaining those tests and the 0day bot ? will-it-scale itself is from community at https://github.com/antonblanchard/will-it-scale and we will look for the support if we don't have quick solution. 0day bot basically wraps the test and analyze the result to find which commit leads to change. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- Zhengjun Xing
Re: [LKP] Re: [sched] bdfcae1140: will-it-scale.per_thread_ops -37.0% regression
On 10/7/2020 10:50 PM, Mathieu Desnoyers wrote: - On Oct 2, 2020, at 4:33 AM, Rong Chen rong.a.c...@intel.com wrote: Greeting, FYI, we noticed a -37.0% regression of will-it-scale.per_thread_ops due to commit: commit: bdfcae11403e5099769a7c8dc3262e3c4193edef ("[RFC PATCH 2/3] sched: membarrier: cover kthread_use_mm (v3)") url: https://github.com/0day-ci/linux/commits/Mathieu-Desnoyers/Membarrier-updates/20200925-012549 base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 848785df48835eefebe0c4eb5da7690690b0a8b7 in testcase: will-it-scale on test machine: 104 threads Skylake with 192G memory with following parameters: nr_task: 50% mode: thread test: context_switch1 cpufreq_governor: performance ucode: 0x2006906 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. test-url: https://github.com/antonblanchard/will-it-scale Hi, I would like to report what I suspect is a random thread placement issue in the context_switch1 test used by the 0day bot when running on a machine with hyperthread enabled. AFAIU the test code uses hwloc for thread placement which should theoretically ensure that each thread is placed on same processing unit, core and numa node between runs. We can find the test code here: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c And the main file containing thread setup is here: https://github.com/antonblanchard/will-it-scale/blob/master/main.c AFAIU, the test is started without the "-m" switch, which therefore affinitizes tasks on cores rather than on processing units (SMT threads). When testcase() creates the child thread with new_task(), it basically issues: pthread_create(&threads[nr_threads++], NULL, func, arg); passing a NULL pthread_attr_t, and not executing any pre_trampoline on the child. 
The pre_trampoline would have issued hwloc_set_thread_cpubind if it were executed on the child, but it's not. Therefore, we expect the cpu affinity mask of the parent to be copied on clone and used by the child. A quick test on a machine with hyperthreading enabled shows that the cpu affinity mask for the parent and child has two bits set: taskset -p 1868607 pid 1868607's current affinity mask: 10001 taskset -p 1868606 pid 1868606's current affinity mask: 10001 So AFAIU the placement of the parent and child will be random on either the same processing unit, or on separate processing units within the same core. I suspect this randomness can significantly affect the performance number between runs, and trigger unwarranted performance regression warnings. Thanks, Mathieu Yes, the randomness may happen in some special cases. But in 0-day, we test multiple times (>=3) and report the average number. For this case, we tested 4 times; the result is stable, with a variation of ±2%. So I don't think the -37.0% regression is caused by the randomness. 0/stats.json: "will-it-scale.per_thread_ops": 105228, 1/stats.json: "will-it-scale.per_thread_ops": 100443, 2/stats.json: "will-it-scale.per_thread_ops": 98786, 3/stats.json: "will-it-scale.per_thread_ops": 102821,

  c2daff748f0ea954   bdfcae11403e5099769a7c8dc32
  ----------------   ---------------------------
         %stddev       %change   %stddev
      161714 ± 2%       -37.0%    101819 ± 2%   will-it-scale.per_thread_ops

-- Zhengjun Xing
Re: [LKP] Re: [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression
Hi Josef, I re-tested in v5.9, and the regression still exists. Do you have time to take a look at this? Thanks. On 6/15/2020 11:21 AM, Xing Zhengjun wrote: Hi Josef, Do you have time to take a look at this? Thanks. On 6/12/2020 2:11 PM, kernel test robot wrote: Greeting, FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit: commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the subvol_srcu") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: aim7 on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory with following parameters: disk: 4BRD_12G md: RAID0 fs: btrfs test: disk_wrt load: 1500 cpufreq_governor: performance ucode: 0x52c test-description: AIM7 is a traditional UNIX system-level benchmark suite which is used to test and measure the performance of a multiuser system. test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:

To reproduce:
  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml # job file is attached in this email
  bin/lkp run job.yaml

=
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c

commit:
  efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree lock")
  c75e839414 ("btrfs: kill the subvol_srcu")

  efc3453494af7818   c75e839414d3610e6487ae31451
  ----------------   ---------------------------
       fail:runs         %reproduction   fail:runs
             3:9                  -33%          :8    dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x

         %stddev       %change   %stddev
       29509 ± 2%        -9.1%     26837 ± 2%   aim7.jobs-per-min
      305.28 ± 2%       +10.0%    335.72 ± 2%   aim7.time.elapsed_time
      305.28 ± 2%       +10.0%    335.72 ± 2%   aim7.time.elapsed_time.max
    4883135 ± 10%       +37.9%   6735464 ± 7%   aim7.time.involuntary_context_switches
       56288 ± 2%       +10.5%     62202 ± 2%   aim7.time.system_time
     2344783             +6.5%   2497364 ± 2%   aim7.time.voluntary_context_switches
    62337721 ± 2%        +9.8%  68456490 ± 2%   turbostat.IRQ
      431.56 ± 6%       +22.3%    527.88 ± 4%   vmstat.procs.r
       27340 ± 2%       +11.2%     30397 ± 2%   vmstat.system.cs
      226804 ± 6%       +21.7%    276057 ± 4%   meminfo.Active(file)
      221309 ± 6%       +22.3%    270668 ± 4%   meminfo.Dirty
      720.89 ±111%      +49.3%      1076 ± 73%  meminfo.Mlocked
       14278 ± 2%        -8.3%     13094 ± 2%   meminfo.max_used_kB
       57228 ± 6%       +22.7%     70195 ± 5%   numa-meminfo.node0.Active(file)
       55433 ± 6%       +21.6%     67431 ± 4%   numa-meminfo.node0.Dirty
       56152 ± 6%       +21.4%     68180 ± 5%   numa-meminfo.node1.Active(file)
       55001 ± 6%       +22.5%     67397 ± 4%   numa-meminfo.node1.Dirty
       56373 ± 6%       +21.7%     68594 ± 4%   numa-meminfo.node2.Active(file)
       55222 ± 7%       +22.6%     67726 ± 4%   numa-meminfo.node2.Dirty
       56671 ± 6%       +20.5%     68317 ± 3%   numa-meminfo.node3.Active(file)
       55285 ± 6%       +21.8%     67355 ± 4%   numa-meminfo.node3.Dirty
       56694 ± 6%       +21.7%     69019 ± 4%   proc-vmstat.nr_active_file
       55342 ± 6%       +22.3%     67662 ± 4%   proc-vmstat.nr_dirty
      402316             +2.1%    410951        proc-vmstat.nr_file_pages
      180.22 ±111%      +49.4%    269.25 ± 73%  proc-vmstat.nr_mlock
       56694 ± 6%       +21.7%     69019 ± 4%   proc-vmstat.nr_zone_active_file
       54680 ± 6%       +22.8%     67168 ± 4%   proc-vmstat.nr_zone_write_pending
     3144381 ± 2%        +6.1%   3335275        proc-vmstat.pgactivate
     1387558 ± 2%        +7.9%   1496754 ± 2%   proc-vmstat.pgfault
      983.33 ± 4%        +5.4%      1036        proc-vmstat.unevictable_pgs_culled
       14331 ± 6%       +22.6%     17566 ± 5%   numa-vmstat.node0.nr_active_file
       13884 ± 6%       +21.6%     16884 ± 4%   numa-vmstat.node0.nr_dirty
       14330 ± 6%       +22.6%     17566 ± 5%   numa-vmstat.node0.nr_zone_active_file
       13714 ± 6%       +22.2%     16755 ± 4%   numa-vmstat.node0.nr_zone_write_pending
       14047 ± 6%       +21.3%     17043 ± 4%   numa-vmstat.node1.nr_active_file
       13763 ± 6%       +22.3%     16838 ± 4%   numa-vmstat.node1.nr_dirty
       14047 ± 6%       +21.3%
Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression
On 10/13/2020 11:01 AM, Mike Kravetz wrote: On 10/12/20 6:59 PM, Xing Zhengjun wrote: On 10/13/2020 1:40 AM, Mike Kravetz wrote: On 10/11/20 10:29 PM, Xing Zhengjun wrote: Hi Mike, I re-tested it in v5.9-rc8, and the regression still exists. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks. Thank you for testing. Just curious, did you apply the series in this thread or just test v5.9-rc8? If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851, so results being the same are expected. I just tested v5.9-rc8. Where can I find the series patches you mentioned here? Or should I just wait for the next mainline release? My apologies. I missed that you were not cc'ed on this thread: https://lore.kernel.org/linux-mm/20200706202615.32111-1-mike.krav...@oracle.com/ As mentioned, there will likely be another revision to the way locking is handled. The new scheme will try to consider performance as is done in the above link. I suggest you wait for the next revision. If you do not mind, I will cc you when the new code is posted. OK. I will wait for the next revision. -- Zhengjun Xing
Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression
On 10/13/2020 1:40 AM, Mike Kravetz wrote: On 10/11/20 10:29 PM, Xing Zhengjun wrote: Hi Mike, I re-tested it in v5.9-rc8, and the regression still exists. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks. Thank you for testing. Just curious, did you apply the series in this thread or just test v5.9-rc8? If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851, so results being the same are expected. I just tested v5.9-rc8. Where can I find the series patches you mentioned here? Or should I just wait for the next mainline release? There are some functional issues with this new hugetlb locking model that are currently being worked. It is likely to result in significantly different code. The performance issues discovered here will be taken into account with the new code. However, as previously mentioned, additional synchronization is required for functional correctness. As a result, there will be some regression in this code. -- Zhengjun Xing
Re: [LKP] [fs] b6509f6a8c: will-it-scale.per_thread_ops -12.6% regression
On 10/12/2020 4:18 PM, Mel Gorman wrote: On Mon, Oct 12, 2020 at 02:20:26PM +0800, Xing Zhengjun wrote: Hi Mel, It is a revert commit that caused the regression; do you have a plan to fix it? Thanks. I re-tested it in v5.9-rc8, and the regression still exists. The revert caused a *performance* regression but the original performance gain caused a functional failure. The overall performance should be unchanged. I have not revisited the topic since. Thanks for the explanation. We will stop tracking it. -- Zhengjun Xing
Re: [LKP] [fs] b6509f6a8c: will-it-scale.per_thread_ops -12.6% regression
Hi Mel, It is a revert commit that caused the regression; do you have a plan to fix it? Thanks. I re-tested it in v5.9-rc8, and the regression still exists.
=
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:
  lkp-csl-2ap4/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/100%/thread/eventfd1/performance/0x5002f01

commit:
  v5.8-rc3
  b6509f6a8c4313c068c69785c001451415969e44
  v5.8
  v5.9-rc1
  v5.9-rc8

  v5.8-rc3   b6509f6a8c4313c068c69785c00   v5.8   v5.9-rc1   v5.9-rc8
  --------   ---------------------------   ----   --------   --------
    %stddev      %change %stddev           %change        %change        %change
    1652352      -12.6%  1444002 ± 2%      -13.3% 1431865   -9.9% 1489323   -9.1% 1502580   will-it-scale.per_thread_ops
  3.173e+08      -12.6%  2.772e+08 ± 2%    -13.3% 2.749e+08 -9.9% 2.86e+08  -9.1% 2.885e+08 will-it-scale.workload

On 7/6/2020 9:20 AM, kernel test robot wrote: Greeting, FYI, we noticed a -12.6% regression of will-it-scale.per_thread_ops due to commit: commit: b6509f6a8c4313c068c69785c001451415969e44 ("Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes"") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: will-it-scale on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory with following parameters: nr_task: 100% mode: thread test: eventfd1 cpufreq_governor: performance ucode: 0x5002f01 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -6.4% regression              |
| test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory  |
| test parameters  | cpufreq_governor=performance                                               |
|                  | mode=process                                                               |
|                  | nr_task=100%                                                               |
|                  | test=unix1                                                                 |
|                  | ucode=0x5002f01                                                            |
+------------------+----------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops -2.3% regression               |
| test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory  |
| test parameters  | cpufreq_governor=performance                                               |
|                  | mode=thread                                                                |
|                  | nr_task=100%                                                               |
|                  | test=pipe1                                                                 |
|                  | ucode=0x5002f01                                                            |
+------------------+----------------------------------------------------------------------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:

To reproduce:
  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml # job file is attached in this email
  bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-csl-2ap4/eventfd1/will-it-scale/0x5002f01

commit:
  v5.8-rc3
  b6509f6a8c ("Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes"")

  v5.8-rc3   b6509f6a8c4313c068c69785c00
  --------   ---------------------------
    %stddev      %change %stddev
    1652
Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression
Hi Mike, I re-tested it in v5.9-rc8, and the regression still exists. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks.
=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:
  lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
  49aef7175cc6eb703a9280a7b830e675fe8f2704
  c0d0381ade79885c04a04c303284b040616b116e
  v5.8
  34ae204f18519f0920bd50a644abd6fefc8dbfcf
  v5.9-rc1
  v5.9-rc8

  49aef7175cc6eb70  c0d0381ade79885c04a04c30328  v5.8  34ae204f18519f0920bd50a644a  v5.9-rc1  v5.9-rc8
  ----------------  ---------------------------  ----  ---------------------------  --------  --------
    %stddev           %change %stddev            %change %stddev     %change %stddev     %change %stddev    %change %stddev
      38043 ± 3%      -30.2%  26560 ± 4%         -29.5%  26815 ± 6%   -7.4%  35209 ± 2%   -7.4%  35244      -8.8%  34704       vm-scalability.median
       7.86 ± 19%       +9.7  17.54 ± 21%         +10.4  18.23 ± 34%   -3.1   4.75 ± 7%    -4.5   3.36 ± 7%  -4.0   3.82 ± 15% vm-scalability.median_stddev%
   12822071 ± 3%      -34.1%  8450822 ± 4%       -33.6%  8517252 ± 6% -10.7%  11453675 ± 2% -10.2% 11513595 ± 2% -11.6% 11331657  vm-scalability.throughput
  2.523e+09 ± 3%      -20.7%  2.001e+09 ± 5%     -19.9%  2.021e+09 ± 7% +6.8% 2.694e+09 ± 2% +7.3% 2.707e+09 ± 2% +5.4% 2.661e+09 vm-scalability.workload

On 8/22/2020 7:36 AM, Mike Kravetz wrote: On 8/21/20 2:02 PM, Mike Kravetz wrote: Would you be willing to test this series on top of 34ae204f1851? I will need to rebase the series to take the changes made by 34ae204f1851 into account. Actually, the series in this thread will apply/run cleanly on top of 34ae204f1851. No need to rebase or port. If we decide to move forward more work is required. See a few FIXME's in the patches. -- Zhengjun Xing
Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression
On 6/26/2020 5:33 AM, Mike Kravetz wrote: On 6/22/20 3:01 PM, Mike Kravetz wrote: On 6/21/20 5:55 PM, kernel test robot wrote: Greeting, FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit: commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: vm-scalability on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory with following parameters: runtime: 300s size: 8T test: anon-cow-seq-hugetlb cpufreq_governor: performance ucode: 0x11 Some performance regression is not surprising as the change includes acquiring and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4% seems a bit high. But, the test is primarily exercising the hugetlb page fault path and little else. The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from invalidating the pmd we are operating on. This specific test case is operating on anonymous private mappings. So, PMD sharing is not possible and we can eliminate acquiring the mutex in this case. In fact, we should check all mappings (even sharable) for the possibility of PMD sharing and only take the mutex if necessary. It will make the code a bit uglier, but will take care of some of these regressions. We still need to take the mutex in the case of PMD sharing. I'm afraid a regression is unavoidable in that case. I'll put together a patch. Not acquiring the mutex on faults when sharing is not possible is quite straightforward. We can even use the existing routine vma_shareable() to easily check. However, the next patch in the series 87bf91d39bb5 "hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends on always acquiring the mutex. If we break this assumption, then the code to back out hugetlb reservations needs to be written.
A high level view of what needs to be done is in the commit message for 87bf91d39bb5. I'm working on the code to back out reservations. I found that 34ae204f18519f0920bd50a644abd6fefc8dbfcf ("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem") fixes this regression; when I test with that patch, the regression is reduced to 10.1%. Do you have a plan to continue to improve it? Thanks.
=
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:
  lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
  49aef7175cc6eb703a9280a7b830e675fe8f2704
  c0d0381ade79885c04a04c303284b040616b116e
  v5.8
  34ae204f18519f0920bd50a644abd6fefc8dbfcf
  v5.9-rc1

  49aef7175cc6eb70  c0d0381ade79885c04a04c30328  v5.8  34ae204f18519f0920bd50a644a  v5.9-rc1
  ----------------  ---------------------------  ----  ---------------------------  --------
    %stddev           %change %stddev            %change %stddev     %change %stddev    %change %stddev
      38084           -31.1%  26231 ± 2%         -26.6%  27944 ± 5%   -7.0%  35405      -7.5%  35244       vm-scalability.median
       9.92 ± 9%      +12.0   21.95 ± 4%          +3.9   13.87 ± 30%  -5.3    4.66 ± 9% -6.6    3.36 ± 7%  vm-scalability.median_stddev%
   12827311          -35.0%   8340256 ± 2%       -30.9%  8865669 ± 5% -10.1%  11532087  -10.2%  11513595 ± 2%  vm-scalability.throughput
  2.507e+09          -22.7%   1.938e+09          -15.3%  2.122e+09 ± 6% +8.0% 2.707e+09  +8.0%  2.707e+09 ± 2% vm-scalability.workload

-- Zhengjun Xing
Re: [LKP] Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression
On 7/22/2020 2:17 PM, Xing Zhengjun wrote: On 7/15/2020 7:04 PM, Ritesh Harjani wrote: Hello Xing, On 4/7/20 1:30 PM, kernel test robot wrote: Greeting, FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit: commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: stress-ng on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory with following parameters: nr_threads: 10% disk: 1HDD testtime: 1s class: os cpufreq_governor: performance ucode: 0x52c fs: ext4 I started looking into this issue. But with my unit testing, I didn't find any perf issue with the fiemap ioctl call. I haven't yet explored how stress-ng takes fiemap performance numbers; it could be doing something differently. But in my testing I just made sure to create a file with a large number of extents and used the xfs_io -c "fiemap -v" cmd to check how much time it takes to read all the entries in the 1st and subsequent iterations. Setup comprised of a qemu machine on x86_64 with the latest linux branch. 1. Created a file of 10G using fallocate. (This allocated unwritten extents for this file.) 2. Then I punched a hole in every alternate block of the file. This step took a long time, and after sufficiently long time, I had to cancel it. for i in $(seq 1 2 x); do echo $i; fallocate -p -o $(($i*4096)) -l 4096 bigfile; done 3. Then issued the fiemap call via xfs_io and took the time measurement. time xfs_io -c "fiemap -v" bigfile > /dev/null Perf numbers on latest default kernel build for above cmd. 1st iteration == real 0m31.684s user 0m1.593s sys 0m24.174s 2nd and subsequent iteration real 0m3.379s user 0m1.300s sys 0m2.080s 4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested: 1st iteration == real 0m31.591s user 0m1.400s sys 0m24.243s 2nd and subsequent iteration (had to cancel it since it was taking more time than 15m) ^C^C real 15m49.884s user 0m0.032s sys 15m49.722s I guess the reason the 2nd iteration with the older implementation takes so much time is that the previous implementation never cached extent entries in the extent_status tree. Also, in the 1st iteration the page cache may get filled with a lot of buffer_head entries, so page reclaims may be taking more time. With the latest implementation using iomap_fiemap(), the call to query extent blocks is done using ext4_map_blocks(). ext4_map_blocks() by default will also cache the extent entries in the extent_status tree. Hence during the 2nd iteration, we will directly read the entries from the extent_status tree and will not do any disk I/O. -ritesh I re-tested it on v5.9-rc1, and the regression still exists. Have you tried the stress-ng test cases? Could you try the stress-ng (https://kernel.ubuntu.com/~cking/stress-ng/) test cases? The tarballs can be downloaded from https://kernel.ubuntu.com/~cking/tarballs/stress-ng/. For this case you can try the command "stress-ng --timeout 1 --times --verify --metrics-brief --sequential 9 --class os --minimize --exclude spawn,exec,swap". I re-tested it on v5.8-rc6, and the regression still exists.
= tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode: lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01 commit: b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a d3b6f23f71670007817a5d59f3fbafab2b794e8c v5.8-rc6 b2c5764262edded1 d3b6f23f71670007817a5d59f3f v5.8-rc6 --- --- %stddev %change %stddev %change %stddev \ | \ | \ 20419 ± 3% -4.9% 19423 ± 4% +27.1% 25959 stress-ng.af-alg.ops 19655 ± 3% -5.7% 18537 ± 4% +27.8% 25111 stress-ng.af-alg.ops_per_sec 64.67 ± 5% -17.0% 53.67 ± 38% +22.2% 79.00 ± 9% stress-ng.chdir.ops 55.34 ± 3% -13.3% 47.99 ± 38% +26.4% 69.96 ± 10% stress-ng.chdir.ops_per_sec 64652 ± 7% -14.1% 55545 ± 11% -13.6% 55842 ± 6% stress-ng.chown.ops 64683 ± 7% -14.1% 55565 ± 11% -13.6% 55858 ± 6% stress-ng.chown.ops_per_sec 2805 ± 2% +0.6% 2820 ± 2% +130.0% 6452 stress-ng.clone.ops 2802 ± 2% +0.6% 2818 ± 2% +129.9% 6443 st
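Ritesh's manual timing experiment quoted above (fallocate a file with unwritten extents, punch every alternate block, then time the fiemap walk) can be sketched as a script. This is a scaled-down sketch, not the original commands: the file name, the block count, and the punch_offsets helper are made up here, and the fallocate/xfs_io steps are guarded so the sketch is a no-op where those tools are absent.

```shell
FILE=bigfile
BLK=4096
NBLKS=64   # the original test used a 10G file; scaled down here

# byte offsets of every alternate block, as in the quoted fallocate loop
punch_offsets() {
    for i in $(seq 1 2 $(($1 - 1))); do
        echo $((i * BLK))
    done
}

if command -v fallocate >/dev/null 2>&1 && command -v xfs_io >/dev/null 2>&1; then
    fallocate -l $((NBLKS * BLK)) "$FILE"           # step 1: unwritten extents
    for off in $(punch_offsets "$NBLKS"); do        # step 2: punch alternate blocks
        fallocate -p -o "$off" -l "$BLK" "$FILE"
    done
    time xfs_io -c "fiemap -v" "$FILE" > /dev/null  # step 3: time the fiemap walk
    rm -f "$FILE"
fi
```

Running it twice back to back is the point of the exercise: with the iomap-based ext4_fiemap the second walk is served from the extent_status tree, with the older implementation it is not.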
Re: [LKP] [rcu] 276c410448: will-it-scale.per_thread_ops -12.3% regression
On 6/17/2020 12:28 AM, Paul E. McKenney wrote: On Tue, Jun 16, 2020 at 10:02:24AM +0800, Xing Zhengjun wrote: Hi Paul, Do you have time to take a look at this? Thanks. I do not see how this change could affect anything that isn't directly using RCU Tasks Trace. Yes, there is some addition to process creation, but that isn't what is showing the increased overhead. I see that the instruction count increased. Is it possible that this is due to changes in offsets within the task_struct structure? Thanx, Paul How about this regression? I tested the latest v5.9-rc1; the regression still exists. = tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_task/mode/test/cpufreq_governor/ucode: lkp-knm01/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/test2/100%/thread/page_fault3/performance/0x11 commit: b0afa0f056676ffe0a7213818f09d2460adbcc16 276c410448dbca357a2bc3539acfe04862e5f172 v5.9-rc1 b0afa0f056676ffe 276c410448dbca357a2bc3539ac v5.9-rc1 --- --- %stddev %change %stddev %change %stddev \ | \ | \ 1417 -13.2% 1230 ± 2% -16.6% 1182 will-it-scale.per_thread_ops 408456 -13.2% 354391 ± 2% -16.6% 340519 will-it-scale.workload On 6/15/2020 4:57 PM, kernel test robot wrote: Greeting, FYI, we noticed a -12.3% regression of will-it-scale.per_thread_ops due to commit: commit: 276c410448dbca357a2bc3539acfe04862e5f172 ("rcu-tasks: Split ->trc_reader_need_end") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: will-it-scale on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory with following parameters: nr_task: 100% mode: thread test: page_fault3 cpufreq_governor: performance ucode: 0x11 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and a threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale If you fix the issue, kindly add following tag Reported-by: kernel test robot Details are as below: --> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml = compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode: gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/page_fault3/will-it-scale/0x11 commit: b0afa0f056 ("rcu-tasks: Provide boot parameter to delay IPIs until late in grace period") 276c410448 ("rcu-tasks: Split ->trc_reader_need_end") b0afa0f056676ffe 276c410448dbca357a2bc3539ac --- fail:runs %reproductionfail:runs | | | 2:4 -50%:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x :4 28% 1:4 perf-profile.calltrace.cycles-pp.error_entry 0:40% 0:4 perf-profile.children.cycles-pp.error_exit 1:47% 2:4 perf-profile.children.cycles-pp.error_entry 0:44% 1:4 perf-profile.self.cycles-pp.error_entry %stddev %change %stddev \ |\ 1414 -12.3% 1241 ± 2% will-it-scale.per_thread_ops 463.32+1.7% 470.99will-it-scale.time.elapsed_time 463.32+1.7% 470.99 will-it-scale.time.elapsed_time.max 407566 -12.3% 357573 ± 2% will-it-scale.workload 48.51-1.5% 47.77boot-time.boot 7.203e+10 +20.0% 8.64e+10 ± 2% cpuidle.C1.time 2.162e+08 ± 2% +27.7% 2.761e+08 ± 2% cpuidle.C1.usage 60.50 +12.2 72.74 ± 2% mpstat.cpu.all.idle% 39.17 -12.2 26.97 ± 6% mpstat.cpu.all.sys% 2334 ± 12% +18.8% 2772 ± 5% slabinfo.khugepaged_mm_slot.active_objs 2334 ± 12% +18.8% 2772 ± 5% slabinfo.khugepaged_mm_slot.num_objs 60.25 +20.3% 72.50 ± 2% vmstat.cpu.id 92.75
Re: [LKP] Re: [fsnotify] c738fbabb0: will-it-scale.per_process_ops -9.5% regression
On 7/24/2020 10:44 AM, Rong Chen wrote: On 7/21/20 11:59 PM, Amir Goldstein wrote: On Tue, Jul 21, 2020 at 3:15 AM kernel test robot wrote: Greeting, FYI, we noticed a -9.5% regression of will-it-scale.per_process_ops due to commit: commit: c738fbabb0ff62d0f9a9572e56e65d05a1b34c6a ("fsnotify: fold fsnotify() call into fsnotify_parent()") Strange, that's a pretty dumb patch moving some inlined code from one function to another (assuming there are no fsnotify marks in this test). Unless I am missing something, the only thing that changes slightly is an extra d_inode(file->f_path.dentry) dereference. I can get rid of it. Is it possible to ask for a re-test with the fix patch (attached)? I applied the fix patch; the regression still exists. = tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode: lkp-csl-2ap2/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/16/process/open1/performance/0x5002f01 commit: 71d734103edfa2b4c6657578a3082ee0e51d767e c738fbabb0ff62d0f9a9572e56e65d05a1b34c6a 5c32fe90f2a57e7c4da06be51f705aec6affceb6 (the commit the fix patch is applied on) 7f66797f773621d0ef6718df0ef2cf849814d114 (the fix patch) 71d734103edfa2b4 c738fbabb0ff62d0f9a9572e56e 5c32fe90f2a57e7c4da06be51f7 7f66797f773621d0ef6718df0ef --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 229940 -9.8% 207333 -13.0% 16 -11.7% 202927 will-it-scale.per_process_ops 3679048 -9.8% 3317347 -13.0% 3199942 -11.7% 3246851 will-it-scale.workload Hi Amir, We failed to apply this patch; could you tell us the base commit or the base branch? Best Regards, Rong Chen -- Zhengjun Xing
Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression
On 7/9/2020 8:43 PM, Giovanni Gherdovich wrote: On Tue, 2020-07-07 at 10:58 +0800, Xing Zhengjun wrote: On 6/12/2020 4:11 PM, Xing Zhengjun wrote: Hi Giovanni, I test the regression, it still existed in v5.7. Do you have time to take a look at this? Thanks. Ping... Hello, I haven't sat down to reproduce this yet but I've read the benchmark code and configuration, and this regression seems likely to be more of a benchmarking artifact than an actual performance bug. Likely a benchmarking artifact: First off, the test used the "performance" governor from the "intel_pstate" cpufreq driver, but points at the patch introducing the "frequency invariance on x86" feature as the culprit. This is suspicious because "frequency invariance on x86" influences frequency selection when the "schedutil" governor is in use (not your case). It may also affect the scheduler load balancing but here you have $NUM_CPUS processes so there isn't a lot of room for creativity there, each CPU gets a process. Some notes on this benchmark for my future reference: The test in question is "anon-cow-seq" from "vm-scalability", which is based on the "usemem" program originally written by Andrew Morton and exercises the memory management subsystem. The invocation is: usemem --nproc $NUM_CPUS \ --prealloc \ --prefault \ $SIZE What this does is to create an anonymous mmap()-ing of $SIZE bytes in the main process, fork $NUM_CPUS distinct child processes and have all of them scan the mapping sequentially from byte 0 to byte N, writing 0, 1, 2, ..., N on the region as they scan it, all together at the same time. So we have the "anon" part (the mapping isn't file-backed), the "cow" part (the parent process allocates the region, then each children copy-on-write's to it) and the "seq" part (memory accesses happen sequentially from low to high address). 
The test measures how quickly this happens; I believe the regression happens in the median time it takes a process to finish (or the median throughput, but $SIZE is fixed so it's equivalent). The $SIZE parameter is selected so that there is enough space for everybody: each child plus the parent needs a copy of the mapped region, so that makes $NUM_CPUS+1 instances. The formula for $SIZE adds a factor 2 for good measure: SIZE = $MEM_SIZE / ($NUM_CPUS + 1) / 2 So we have a benchmark dominated by page allocation and copying, run with the "performance" cpufreq governor, and your bisection points to a commit such as 1567c3e3467cddeb019a7b53ec632f834b6a9239 ("x86, sched: Add support for frequency invariance") which: * changes how frequency is selected by a governor you're not using * doesn't touch the memory management subsystem or related functions I'm not entirely dismissing your finding, just explaining why this analysis hasn't been in my top priorities lately (plus, I've just returned from a 3 weeks vacation :). I'm curious too about what causes the test to go red, but I'm not overly worried given the above context. Thanks, Giovanni Gherdovich This regression only happened on the testbox "lkp-hsw-4ex1"; the machine hardware info: model: Haswell-EX nr_node: 4 nr_cpu: 144 memory: 512G brand: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz We reproduced it many times in the past, but we recently upgraded both the software and the hardware on that machine, and now we can no longer reproduce the regression. We also tried reverting the upgrade; it still cannot be reproduced. We will continue to run the test case and will let you know once the regression reproduces. -- Zhengjun Xing
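Giovanni's description of the invocation and the $SIZE formula above can be sketched as a shell fragment. The numbers below (144 CPUs, 512G of memory) are the lkp-hsw-4ex1 figures from this thread; the guarded usemem call assumes the vm-scalability usemem binary is on PATH, which it usually is not outside an lkp setup.

```shell
# Sketch of the vm-scalability "anon-cow-seq" invocation described above.
MEM_SIZE=$((512 * 1024 * 1024 * 1024))   # 512G testbox from this report
NUM_CPUS=144
# parent + children each need a copy of the region, with a factor 2 of slack
SIZE=$((MEM_SIZE / (NUM_CPUS + 1) / 2))
echo "each of the $NUM_CPUS children maps $SIZE bytes"
if command -v usemem >/dev/null 2>&1; then
    usemem --nproc "$NUM_CPUS" --prealloc --prefault "$SIZE"
fi
```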
Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression
On 7/7/2020 2:30 AM, Darrick J. Wong wrote: On Wed, Jul 01, 2020 at 03:49:52PM +0800, Xing Zhengjun wrote: On 6/10/2020 11:07 AM, Xing Zhengjun wrote: Hi Darrick, Do you have time to take a look at this? Thanks. Ping... Yes, that decrease is the expected end result of making the write path take a longer route to avoid a file corruption vector. --D Thanks for the explanation. We will stop tracking it. On 6/6/2020 11:48 PM, kernel test robot wrote: Greeting, FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit: commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes to delalloc regions to unwritten") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: aim7 on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory with following parameters: disk: 1BRD_48G fs: xfs test: sync_disk_rw load: 600 cpufreq_governor: performance ucode: 0x42e test-description: AIM7 is a traditional UNIX system-level benchmark suite which is used to test and measure the performance of a multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/ If you fix the issue, kindly add following tag Reported-by: kernel test robot Details are as below: --> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml = compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode: gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e commit: 590b16516e ("xfs: refactor xfs_iomap_prealloc_size") a5949d3fae ("xfs: force writes to delalloc regions to unwritten") 590b16516ef38e2e a5949d3faedf492fa7863b914da --- fail:runs %reproduction fail:runs | | | :4 50% 2:4 dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x %stddev %change %stddev \ | \ 35272 -33.6% 23430 aim7.jobs-per-min 102.13 +50.5% 153.75 aim7.time.elapsed_time 102.13 +50.5% 153.75 aim7.time.elapsed_time.max 1388038 +40.2% 1945838 aim7.time.involuntary_context_switches 43420 ± 2% +13.4% 49255 ± 2% aim7.time.minor_page_faults 3123 +44.2% 4504 ± 2% aim7.time.system_time 59.31 +6.5% 63.18 aim7.time.user_time 48595108 +58.6% 77064959 aim7.time.voluntary_context_switches 1.44 -28.8% 1.02 iostat.cpu.user 0.07 ± 6% +0.4 0.44 ± 7% mpstat.cpu.all.iowait% 1.44 -0.4 1.02 mpstat.cpu.all.usr% 8632 ± 50% +75.6% 15156 ± 34% numa-meminfo.node0.KernelStack 6583 ±136% +106.0% 13562 ± 82% numa-meminfo.node0.PageTables 63325 ± 11% +14.3% 72352 ± 12% numa-meminfo.node0.SUnreclaim 8647 ± 50% +75.3% 15156 ± 34% numa-vmstat.node0.nr_kernel_stack 1656 ±136% +104.6% 3389 ± 82% numa-vmstat.node0.nr_page_table_pages 15831 ± 11% +14.3% 18087 ± 12% numa-vmstat.node0.nr_slab_unreclaimable 93640 ± 3% +41.2% 132211 ± 2% meminfo.AnonHugePages 21641 +39.9% 30271 ± 4% meminfo.KernelStack 129269 +12.3% 145114 meminfo.SUnreclaim 28000 -31.2% 19275 meminfo.max_used_kB 1269307 -26.9% 927657 vmstat.io.bo 149.75 ± 3% -17.4% 123.75 ± 4% 
vmstat.procs.r 718992 +13.3% 814567 vmstat.system.cs 231397 -9.3% 209881 ± 2% vmstat.system.in 6.774e+08 +70.0% 1.152e+09 cpuidle.C1.time 18203372 +60.4% 29198744 cpuidle.C1.usage 2.569e+08 ± 18% +81.8% 4.672e+08 ± 5% cpuidle.C1E.time 2691402 ± 13% +98.7% 5346901 ± 3% cpuidle.C1E.usage 990350 +95.0% 1931226 ± 2% cpuidle.POLL.time 520061 +97.7% 1028004 ± 2% cpuidle.POLL.usage 77231 +1.8% 78602 proc-vmstat.nr_active_anon 19868 +3.8% 20615 proc-vmstat.nr_dirty 381302 +1.0% 384969 proc-vmstat.nr_file_pages 4388 -2.7% 4270 proc-vmstat.nr_inactive_anon 69865
Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression
On 7/15/2020 7:04 PM, Ritesh Harjani wrote: Hello Xing, On 4/7/20 1:30 PM, kernel test robot wrote: Greeting, FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit: commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: stress-ng on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory with following parameters: nr_threads: 10% disk: 1HDD testtime: 1s class: os cpufreq_governor: performance ucode: 0x52c fs: ext4
I started looking into this issue, but with my unit testing I didn't find any perf issue with the fiemap ioctl call. I haven't yet explored how stress-ng takes fiemap performance numbers; it could be doing something differently. In my testing I just made sure to create a file with a large number of extents and used the xfs_io -c "fiemap -v" cmd to check how much time it takes to read all the entries in the 1st and subsequent iterations. The setup comprised a qemu machine on x86_64 with the latest linux branch.
1. Created a 10G file using fallocate (this allocated unwritten extents for the file).
2. Then I punched a hole in every alternate block of the file. This step took a long time, and after a sufficiently long time I had to cancel it.
for i in $(seq 1 2 x); do echo $i; fallocate -p -o $(($i*4096)) -l 4096; done
3. Then I issued the fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null
Perf numbers on the latest default kernel build for the above cmd:
1st iteration == real 0m31.684s user 0m1.593s sys 0m24.174s
2nd and subsequent iterations: real 0m3.379s user 0m1.300s sys 0m2.080s
4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:
1st iteration == real 0m31.591s user 0m1.400s sys 0m24.243s
2nd and subsequent iterations (had to cancel it since it was taking more than 15m): ^C^C real 15m49.884s user 0m0.032s sys 15m49.722s
I guess the reason the 2nd iteration with the older implementation takes so much time is that with the previous implementation we never cached extent entries in the extent_status tree. Also, in the 1st iteration the page cache may get filled with a lot of buffer_head entries, so page reclaims may be taking more time. With the latest implementation using iomap_fiemap(), the call to query extent blocks is done using ext4_map_blocks(). ext4_map_blocks() by default will also cache the extent entries into the extent_status tree. Hence during the 2nd iteration we will read the entries directly from the extent_status tree and will not do any disk I/O. -ritesh
Could you try the stress-ng (https://kernel.ubuntu.com/~cking/stress-ng/) test cases? The tarballs can be downloaded from https://kernel.ubuntu.com/~cking/tarballs/stress-ng/. For this case you can try the command "stress-ng --timeout 1 --times --verify --metrics-brief --sequential 9 --class os --minimize --exclude spawn,exec,swap". I re-tested it on v5.8-rc6; the regression still exists.
= tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode: lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01 commit: b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a d3b6f23f71670007817a5d59f3fbafab2b794e8c v5.8-rc6 b2c5764262edded1 d3b6f23f71670007817a5d59f3fv5.8-rc6 --- --- %stddev %change %stddev %change %stddev \ |\ |\ 20419 ± 3% -4.9% 19423 ± 4% +27.1% 25959 stress-ng.af-alg.ops 19655 ± 3% -5.7% 18537 ± 4% +27.8% 25111 stress-ng.af-alg.ops_per_sec 64.67 ± 5% -17.0% 53.67 ± 38% +22.2% 79.00 ± 9% stress-ng.chdir.ops 55.34 ± 3% -13.3% 47.99 ± 38% +26.4% 69.96 ± 10% stress-ng.chdir.ops_per_sec 64652 ± 7% -14.1% 55545 ± 11% -13.6% 55842 ± 6% stress-ng.chown.ops 64683 ± 7% -14.1% 55565 ± 11% -13.6% 55858 ± 6% stress-ng.chown.ops_per_sec 2805 ± 2% +0.6% 2820 ± 2%+130.0% 6452 stress-ng.clone.ops 2802 ± 2% +0.6% 2818 ± 2%+129.9% 6443 stress-ng.clone.ops_per_sec 34.67+1.9% 35.33 ± 3% -9.6% 31.33 ± 3% stress-ng.copy-file.ops 22297 ± 23% +26.7% 28258 ± 2% +38.1% 30783 ± 14% stress-ng.dir.ops_per_sec 47
Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression
On 6/12/2020 4:11 PM, Xing Zhengjun wrote: Hi Giovanni, I tested the regression; it still exists in v5.7. Do you have time to take a look at this? Thanks. Ping... = tbox_group/testcase/rootfs/kconfig/compiler/runtime/debug-setup/size/test/cpufreq_governor/ucode: lkp-hsw-4ex1/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/test/8T/anon-cow-seq/performance/0x16 commit: 2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d 1567c3e3467cddeb019a7b53ec632f834b6a9239 v5.7-rc1 v5.7 2a4b03ffc69f2ded 1567c3e3467cddeb019a7b53ec6 v5.7-rc1 v5.7 --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 211462 -16.0% 177702 -15.0% 179809 -15.1% 179510 vm-scalability.median 5.34 ± 9% -3.1 2.23 ± 11% -2.9 2.49 ± 5% -2.7 2.61 ± 11% vm-scalability.median_stddev% 30430671 -16.3% 25461360 -15.5% 25707029 -15.5% 25701713 vm-scalability.throughput 7.967e+09 -11.1% 7.082e+09 -11.1% 7.082e+09 -11.1% 7.082e+09 vm-scalability.workload On 4/16/2020 2:20 PM, Giovanni Gherdovich wrote: On Thu, 2020-04-16 at 14:10 +0800, Xing Zhengjun wrote: Hi Giovanni, 1567c3e346 ("x86, sched: Add support for frequency invariance") has been merged into Linux mainline v5.7-rc1 now. Do you have time to take a look at this? Thanks. Apologies, this slipped under my radar. I'm on it, thanks. Giovanni Gherdovich -- Zhengjun Xing
Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression
On 6/10/2020 11:07 AM, Xing Zhengjun wrote: Hi Darrick, Do you have time to take a look at this? Thanks. Ping... On 6/6/2020 11:48 PM, kernel test robot wrote: Greeting, FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit: commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes to delalloc regions to unwritten") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: aim7 on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory with following parameters: disk: 1BRD_48G fs: xfs test: sync_disk_rw load: 600 cpufreq_governor: performance ucode: 0x42e test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system. test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/ If you fix the issue, kindly add following tag Reported-by: kernel test robot Details are as below: --> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml = compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode: gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e commit: 590b16516e ("xfs: refactor xfs_iomap_prealloc_size") a5949d3fae ("xfs: force writes to delalloc regions to unwritten") 590b16516ef38e2e a5949d3faedf492fa7863b914da --- fail:runs %reproduction fail:runs | | | :4 50% 2:4 dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x %stddev %change %stddev \ | \ 35272 -33.6% 23430 aim7.jobs-per-min 102.13 +50.5% 153.75 aim7.time.elapsed_time 102.13 +50.5% 153.75 aim7.time.elapsed_time.max 1388038 +40.2% 1945838 aim7.time.involuntary_context_switches 43420 ± 2% +13.4% 49255 ± 2% aim7.time.minor_page_faults 3123 +44.2% 4504 ± 2% aim7.time.system_time 59.31 +6.5% 63.18 aim7.time.user_time 48595108 +58.6% 
77064959 aim7.time.voluntary_context_switches 1.44 -28.8% 1.02 iostat.cpu.user 0.07 ± 6% +0.4 0.44 ± 7% mpstat.cpu.all.iowait% 1.44 -0.4 1.02 mpstat.cpu.all.usr% 8632 ± 50% +75.6% 15156 ± 34% numa-meminfo.node0.KernelStack 6583 ±136% +106.0% 13562 ± 82% numa-meminfo.node0.PageTables 63325 ± 11% +14.3% 72352 ± 12% numa-meminfo.node0.SUnreclaim 8647 ± 50% +75.3% 15156 ± 34% numa-vmstat.node0.nr_kernel_stack 1656 ±136% +104.6% 3389 ± 82% numa-vmstat.node0.nr_page_table_pages 15831 ± 11% +14.3% 18087 ± 12% numa-vmstat.node0.nr_slab_unreclaimable 93640 ± 3% +41.2% 132211 ± 2% meminfo.AnonHugePages 21641 +39.9% 30271 ± 4% meminfo.KernelStack 129269 +12.3% 145114 meminfo.SUnreclaim 28000 -31.2% 19275 meminfo.max_used_kB 1269307 -26.9% 927657 vmstat.io.bo 149.75 ± 3% -17.4% 123.75 ± 4% vmstat.procs.r 718992 +13.3% 814567 vmstat.system.cs 231397 -9.3% 209881 ± 2% vmstat.system.in 6.774e+08 +70.0% 1.152e+09 cpuidle.C1.time 18203372 +60.4% 29198744 cpuidle.C1.usage 2.569e+08 ± 18% +81.8% 4.672e+08 ± 5% cpuidle.C1E.time 2691402 ± 13% +98.7% 5346901 ± 3% cpuidle.C1E.usage 990350 +95.0% 1931226 ± 2% cpuidle.POLL.time 520061 +97.7% 1028004 ± 2% cpuidle.POLL.usage 77231 +1.8% 78602 proc-vmstat.nr_active_anon 19868 +3.8% 20615 proc-vmstat.nr_dirty 381302 +1.0% 384969 proc-vmstat.nr_file_pages 4388 -2.7% 4270 proc-vmstat.nr_inactive_anon 69865 +4.7% 73155 proc-vmstat.nr_inactive_file 21615 +40.0% 30251 ± 4% proc-vmstat.nr_kernel_stack 7363 -3.2% 7127 proc-vmstat.nr_mapped 12595 ± 3% +5.2% 13255 ± 4% proc-vmstat.nr_shmem 19619
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/18/2020 4:24 PM, Hillf Danton wrote: On Thu, 18 Jun 2020 10:45:01 +0800 Xing Zhengjun wrote: On 6/18/2020 12:25 AM, Vincent Guittot wrote: On Wednesday, 17 June 2020 at 16:57:25 (+0200), Vincent Guittot wrote: On Wednesday, 17 June 2020 at 08:30:21 (+0800), Xing Zhengjun wrote: On 6/16/2020 2:54 PM, Vincent Guittot wrote: Hi Xing, On Tuesday, 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote: On 6/15/2020 4:10 PM, Vincent Guittot wrote: Hi Xing, On Monday, 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote: On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... ... I applied the patch on top of v5.7; the test results are as follows: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2 v5.7 63a5d0fbb5ec62f5148c251c01e --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 0.69 -10.3% 0.62 -9.1% 0.62 +1.0% 0.69 reaim.child_systime 0.62 -1.0% 0.61 +0.5% 0.62 -0.1% 0.62 reaim.child_utime 66870 -10.0% 60187 -7.6% 61787 +1.1% 67636 reaim.jobs_per_min 16717 -10.0% 15046 -7.6% 15446 +1.1% 16909 reaim.jobs_per_min_child OK. So the regression disappears when the conditions on runnable_avg are removed. In the meantime, I have been able to understand more deeply what was happening with this bench and how it is impacted by commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group"). This bench forks a new thread for each and every new step, but newly forked threads start with a load_avg and a runnable_avg set to max whereas the threads run only shortly before exiting.
This makes the CPU be marked overloaded in some cases when it isn't. Could you try the patch below? It fixes the problem on my setup (I have finally been able to reproduce the problem).
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
 		}
 	}
-	sa->runnable_avg = cpu_scale;
+	sa->runnable_avg = sa->util_avg;
 	if (p->sched_class != &fair_sched_class) {
 		/*
--
2.17.1

The patch above tries to move the group back into the same classification as before, but this could harm other benchmarks. There is another way to fix this, by easing the migration of tasks in the case of a migrate_util imbalance. Could you also try the patch below instead of the one above?
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
 		case migrate_util:
 			util = task_util_est(p);
-			if (util > env->imbalance)
+			if (util/2 > env->imbalance &&
+			    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
 				goto next;

Hmm... this sheds a shaft of light on computing imbalance for migrate_util, see below.

 			env->imbalance -= util;
--
2.17.1

I applied the patch on top of v5.7; the test results are as follows: Thanks. = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 69c81543653bf5f2c7105086502889fa019c15cb (the test patch) 9f68395333ad7f5b 070f5e860ee2bf
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/18/2020 8:35 PM, Vincent Guittot wrote: On Thu, 18 Jun 2020 at 04:45, Xing Zhengjun wrote: This bench forks a new thread for each and every new step, but newly forked threads start with a load_avg and a runnable_avg set to max whereas the threads run only shortly before exiting. This makes the CPU be marked overloaded in some cases when it isn't. Could you try the patch below? It fixes the problem on my setup (I have finally been able to reproduce the problem).
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..b33a4a9e1491 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p)
 		}
 	}
-	sa->runnable_avg = cpu_scale;
+	sa->runnable_avg = sa->util_avg;
 	if (p->sched_class != &fair_sched_class) {
 		/*
--
2.17.1

The patch above tries to move the group back into the same classification as before, but this could harm other benchmarks. There is another way to fix this, by easing the migration of tasks in the case of a migrate_util imbalance. Could you also try the patch below instead of the one above?
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ae62807..fcaf66c4d086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env)
 		case migrate_util:
 			util = task_util_est(p);
-			if (util > env->imbalance)
+			if (util/2 > env->imbalance &&
+			    env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
 				goto next;
 			env->imbalance -= util;
--
2.17.1

I applied the patch on top of v5.7; the test results are as follows: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 69c81543653bf5f2c7105086502889fa019c15cb (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2 v5.7 69c81543653bf5f2c7105086502 --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 0.69 -10.3% 0.62 -9.1% 0.62 -7.6% 0.63 reaim.child_systime 0.62 -1.0% 0.61 +0.5% 0.62 +1.9% 0.63 reaim.child_utime 66870 -10.0% 60187 -7.6% 61787 -5.9% 62947 reaim.jobs_per_min There is an improvement but not at the same level as on my setup. I'm not sure which patch you tested here. Is it the last one that modifies detach_tasks() or the previous one that modifies post_init_entity_util_avg()? It is the last one that modifies detach_tasks(). Could you also try the other one? Both patches were improving results on my setup but the behavior doesn't seem to be the same on your setup. The test result for the other one has been sent in another mail.
16717 -10.0% 15046-7.6% 15446 -5.9% 15736reaim.jobs_per_min_child 97.84-1.1% 96.75-0.4% 97.43 -0.4% 97.47reaim.jti 72000 -10.8% 64216-8.3% 66000 -5.7% 67885reaim.max_jobs_per_min 0.36 +10.6% 0.40+7.8% 0.39 +6.0% 0.38reaim.parent_time 1.58 ± 2% +71.0% 2.70 ± 2% +26.9% 2.01 ± 2% +23.6% 1.95 ± 3% reaim.std_dev_percent 0.00 ± 5%+110.4% 0.01 ± 3% +48.8% 0.01 ± 7% +43.2% 0.01 ± 5% reaim.std_dev_time 50800-2.4% 49600-1.6% 5 -0.8% 50400 reaim.workload ... -- Zhengjun Xing
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/17/2020 10:57 PM, Vincent Guittot wrote: On Wednesday, 17 June 2020 at 08:30:21 (+0800), Xing Zhengjun wrote: On 6/16/2020 2:54 PM, Vincent Guittot wrote: Hi Xing, On Tuesday, 16 June 2020 at 11:17:16 (+0800), Xing Zhengjun wrote: On 6/15/2020 4:10 PM, Vincent Guittot wrote: Hi Xing, On Monday, 15 June 2020 at 15:26:59 (+0800), Xing Zhengjun wrote: On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... I applied the patch on top of v5.7; the test results are as follows: TBH, I didn't expect that the results would still be bad, so I wonder if the thresholds are the root problem. Could you run tests with the patch below that removes the conditions on runnable_avg? I just want to make sure that those 2 conditions are the root cause.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da3e5b54715b..f5774d0af059 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8210,10 +8210,6 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
-	if ((sgs->group_capacity * imbalance_pct) <
-	    (sgs->group_runnable * 100))
-		return false;
-
 	if ((sgs->group_capacity * 100) >
 	    (sgs->group_util * imbalance_pct))
 		return true;
@@ -8239,10 +8235,6 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 	    (sgs->group_util * imbalance_pct))
 		return true;
-	if ((sgs->group_capacity * imbalance_pct) <
-	    (sgs->group_runnable * 100))
-		return true;
-
 	return false;
 }

Thanks.
Vincent

I apply the patch based on v5.7, the test result is as the following:

= tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch)

9f68395333ad7f5b  070f5e860ee2bf58  v5.7  63a5d0fbb5ec62f5
      %stddev %change  %stddev %change  %stddev %change  %stddev
0.69    -10.3%   0.62   -9.1%   0.62   +1.0%   0.69  reaim.child_systime
0.62     -1.0%   0.61   +0.5%   0.62   -0.1%   0.62  reaim.child_utime
66870   -10.0%  60187   -7.6%  61787   +1.1%  67636  reaim.jobs_per_min
16717   -10.0%  15046   -7.6%  15446   +1.1%  16909  reaim.jobs_per_min_child

OK. So the regression disappears when the conditions on runnable_avg are removed. In the meantime, I have been able to understand more deeply what was happening for this bench and how it is impacted by commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group"). This bench forks a new thread for each and every step, but newly forked threads start with load_avg and runnable_avg set to the maximum even though they run only briefly before exiting. This makes the CPU be classified as overloaded in some cases when it isn't. Could you try the patch below?
It fixes the problem on my setup (I have finally been able to reproduce the problem) --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0ae62807..b33a4a9e1491 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p) } } - sa->runnable_avg = cpu_scale; + sa->runnable_avg = sa->util_avg; if (p->sched_class != &fair_sched_class) { /* I apply the patch above based on v5.7, the test result is as the following: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 cbb4d668e7431479a7978fa79d64c2271adefab0 ( the test patch which modify post_init_entity_util_avg()
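Vincent's diagnosis above can be illustrated with a small model (Python rather than kernel C; every number here, including the CPU count, the imbalance_pct value, and the assumed initial util_avg of 512, is illustrative and not taken from the kernel). Initializing a freshly forked thread's runnable_avg to the full CPU scale lets a burst of short-lived forks push the group's runnable sum over the overload threshold even though utilization stays low; initializing it to util_avg, as the fix does, keeps it under.

```python
# Model of how the initial runnable_avg chosen in post_init_entity_util_avg()
# feeds the group classification added by 070f5e860ee2. Simplified and
# illustrative only; the real logic lives in kernel/sched/fair.c.
CPU_SCALE = 1024        # SCHED_CAPACITY_SCALE, the maximum of a PELT average
IMBALANCE_PCT = 117     # a typical sd->imbalance_pct

def group_runnable(n_new_tasks, init_runnable):
    """Runnable sum contributed by n freshly forked tasks."""
    return n_new_tasks * init_runnable

def runnable_marks_overloaded(capacity, runnable, pct=IMBALANCE_PCT):
    """The runnable_avg condition 070f5e860ee2 added to group_is_overloaded()."""
    return capacity * pct < runnable * 100

capacity = 4 * CPU_SCALE    # a 4-CPU group: 4096 capacity units

# Before the fix: a new thread starts with runnable_avg = cpu_scale.
before = group_runnable(5, CPU_SCALE)   # 5 short-lived forks -> 5120
# After the fix: it starts with runnable_avg = util_avg (assumed 512 here).
after = group_runnable(5, 512)          # -> 2560

print(runnable_marks_overloaded(capacity, before))  # True: spuriously overloaded
print(runnable_marks_overloaded(capacity, after))   # False
```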
Re: [LKP] Re: [mm] 1431d4d11a: vm-scalability.throughput -11.5% regression
On 6/16/2020 10:45 PM, Johannes Weiner wrote: On Tue, Jun 16, 2020 at 03:57:50PM +0800, kernel test robot wrote: Greeting, FYI, we noticed a -11.5% regression of vm-scalability.throughput due to commit: commit: 1431d4d11abb265e79cd44bed2f5ea93f1bcc57b ("mm: base LRU balancing on an explicit cost model") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master That's a really curious result. An bit of an increase in system time is expected in the series as a whole: 1. When thrashing is detected in the file cache, it intentionally makes more effort to find reclaimable anon pages than before - in an effort to converge on a stable state that has *neither* page reclaim nor refault IO. 2. There are a couple of XXX about unrealized lock batching opportunities. Those weren't/aren't expected to have too much of a practical impact, and require a bit more infrastructure work that would have interferred with other ongoing work in the area. However, this patch in particular doesn't add any locked sections (it adds a function call to an existing one, I guess?), and the workload is doing streaming mmapped IO and shouldn't experience any thrashing. In addition, we shouldn't even scan anon pages - from below: swap_partitions: rootfs_partition: "/dev/disk/by-id/wwn-0x5000c50067b47753-part1" Does that mean that no swap space (not even a file) is configured? In this case, the swap is disabled (if enabled, you should find the "swap:" in the job file), "swap_patitions:" is just the description of the hardware. in testcase: vm-scalability on test machine: 160 threads Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz with 256G memory with following parameters: runtime: 300s test: lru-file-mmap-read cpufreq_governor: performance ucode: 0xb38 test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us. 
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot

Details are as below: -->

To reproduce:
  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

= compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/lkp-bdw-ex2/lru-file-mmap-read/vm-scalability/0xb38

commit:
  a4fe1631f3 ("mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count()")
  1431d4d11a ("mm: base LRU balancing on an explicit cost model")

a4fe1631f313f75c  1431d4d11abb265e79cd44bed2f
       %stddev  %change  %stddev
0.23 ± 2%  +11.7%  0.26  vm-scalability.free_time

What's free_time?

The average of the time to free memory (unit: second); you can find it as "xxx usecs to free memory" in the log of the vm-scalability benchmark (https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/).

103991        -11.6%  91935       vm-scalability.median
20717269      -11.5%  18336098    vm-scalability.throughput
376.47         +8.3%  407.78      vm-scalability.time.elapsed_time
376.47         +8.3%  407.78      vm-scalability.time.elapsed_time.max
392226         +7.2%  420612      vm-scalability.time.involuntary_context_switches
11731          +4.4%  12247       vm-scalability.time.percent_of_cpu_this_job_got
41005         +14.5%  46936       vm-scalability.time.system_time
3156           -4.8%  3005        vm-scalability.time.user_time
52662860 ± 5% -14.0%  45266760 ± 5%   meminfo.DirectMap2M
4.43            -0.5  3.90 ± 2%       mpstat.cpu.all.usr%
1442 ± 5%     -14.9%  1227 ± 10%      slabinfo.kmalloc-rcl-96.active_objs
1442 ± 5%     -14.9%  1227 ± 10%      slabinfo.kmalloc-rcl-96.num_objs
37.50 ± 2%     -7.3%  34.75           vmstat.cpu.id
57.25          +5.2%  60.25           vmstat.cpu.sy
54428 ± 60%   -96.5%  1895 ±173%      numa-meminfo.node1.AnonHugePages
116516 ± 48%  -88.2%  13709 ± 26%     numa-meminfo.node1.AnonPages
132303 ± 84%  -88.9%  14731 ± 61%     numa-meminfo.node3.Inactive(anon)

These counters capture present state, not
history. Are these averages or snapshots? If snapshots, when are they taken during the test? These are averages. 311.75 ± 8% +16.0% 361.75 ± 2% numa-vmstat.node0.nr_isolated_file 29136 ± 48% -
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/18/2020 12:25 AM, Vincent Guittot wrote: Le mercredi 17 juin 2020 à 16:57:25 (+0200), Vincent Guittot a écrit : Le mercredi 17 juin 2020 à 08:30:21 (+0800), Xing Zhengjun a écrit : On 6/16/2020 2:54 PM, Vincent Guittot wrote: Hi Xing, Le mardi 16 juin 2020 à 11:17:16 (+0800), Xing Zhengjun a écrit : On 6/15/2020 4:10 PM, Vincent Guittot wrote: Hi Xing, Le lundi 15 juin 2020 à 15:26:59 (+0800), Xing Zhengjun a écrit : On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... ... I apply the patch based on v5.7, the test result is as the following: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 63a5d0fbb5ec62f5148c251c01e --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ |\ |\ |\ 0.69 -10.3% 0.62-9.1% 0.62 +1.0% 0.69reaim.child_systime 0.62-1.0% 0.61+0.5% 0.62 -0.1% 0.62reaim.child_utime 66870 -10.0% 60187-7.6% 61787 +1.1% 67636reaim.jobs_per_min 16717 -10.0% 15046-7.6% 15446 +1.1% 16909reaim.jobs_per_min_child OK. So the regression disappears when the conditions on runnable_avg are removed. In the meantime, I have been able to understand more deeply what was happeningi for this bench and how it is impacted by commit: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group") This bench forks a new thread for each and every new step. But a newly forked threads start with a load_avg and a runnable_avg set to max whereas the threads are running shortly before exiting. This makes the CPU to be set overloaded in some case whereas it isn't. Could you try the patch below ? 
It fixes the problem on my setup (I have finally been able to reproduce the problem) --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0ae62807..b33a4a9e1491 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -807,7 +807,7 @@ void post_init_entity_util_avg(struct task_struct *p) } } - sa->runnable_avg = cpu_scale; + sa->runnable_avg = sa->util_avg; if (p->sched_class != &fair_sched_class) { /* -- 2.17.1 The patch above tries to move back to the group in the same classification as before but this could harm other benchmarks. There is another way to fix this by easing the migration of task in the case of migrate_util imbalance. Could you also try the patch below instead of the one above ? --- kernel/sched/fair.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0ae62807..fcaf66c4d086 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7753,7 +7753,8 @@ static int detach_tasks(struct lb_env *env) case migrate_util: util = task_util_est(p); - if (util > env->imbalance) + if (util/2 > env->imbalance && + env->sd->nr_balance_failed <= env->sd->cache_nice_tries) goto next; env->imbalance -= util; -- 2.17.1 I apply the patch based on v5.7, the test result is as the following: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 69c81543653bf5f2c7105086502889fa019c15cb (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 69c81543653bf5f2c7105086502 --- --- --- %stddev %ch
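The behavioural change the second patch makes in detach_tasks() can be sketched as follows (a simplified Python model, not the kernel code; task_util_est() and the lb_env fields are reduced to plain parameters, and the numbers are made up):

```python
def can_detach(util, imbalance, nr_balance_failed=0, cache_nice_tries=1,
               patched=False):
    """Model of the migrate_util check in detach_tasks().

    Unpatched: any task whose estimated utilization exceeds the remaining
    imbalance is skipped.  Patched: a task is only skipped when it is much
    bigger than the imbalance (util/2 > imbalance) AND load balancing has
    not already been failing repeatedly.
    """
    if not patched:
        return util <= imbalance
    if util // 2 > imbalance and nr_balance_failed <= cache_nice_tries:
        return False
    return True

# A 600-unit task against a 400-unit imbalance:
print(can_detach(600, 400))                 # False: skipped today
print(can_detach(600, 400, patched=True))   # True: 300 <= 400, so it migrates

# Even a much larger task migrates once balancing keeps failing:
print(can_detach(1200, 400, nr_balance_failed=3, patched=True))  # True
```

The point of the second condition is that migration of a big task is only vetoed while the balancer still has "nice tries" left; after repeated failures the task moves anyway.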
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/16/2020 2:54 PM, Vincent Guittot wrote: Hi Xing, Le mardi 16 juin 2020 à 11:17:16 (+0800), Xing Zhengjun a écrit : On 6/15/2020 4:10 PM, Vincent Guittot wrote: Hi Xing, Le lundi 15 juin 2020 à 15:26:59 (+0800), Xing Zhengjun a écrit : On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... I apply the patch based on v5.7, the test result is as the following: TBH, I didn't expect that the results would still be bad, so i wonder if the threshold are the root problem. Could you run tests with the patch below that removes condition with runnable_avg ? I just want to make sure that those 2 conditions are the root cause. diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index da3e5b54715b..f5774d0af059 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8210,10 +8210,6 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) if (sgs->sum_nr_running < sgs->group_weight) return true; - if ((sgs->group_capacity * imbalance_pct) < - (sgs->group_runnable * 100)) - return false; - if ((sgs->group_capacity * 100) > (sgs->group_util * imbalance_pct)) return true; @@ -8239,10 +8235,6 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) (sgs->group_util * imbalance_pct)) return true; - if ((sgs->group_capacity * imbalance_pct) < - (sgs->group_runnable * 100)) - return true; - return false; } Thanks. 
Vincent I apply the patch based on v5.7, the test result is as the following: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 63a5d0fbb5ec62f5148c251c01e709b8358cd0ee (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 63a5d0fbb5ec62f5148c251c01e --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ |\ |\ |\ 0.69 -10.3% 0.62-9.1% 0.62 +1.0% 0.69reaim.child_systime 0.62-1.0% 0.61+0.5% 0.62 -0.1% 0.62reaim.child_utime 66870 -10.0% 60187-7.6% 61787 +1.1% 67636reaim.jobs_per_min 16717 -10.0% 15046-7.6% 15446 +1.1% 16909reaim.jobs_per_min_child 97.84-1.1% 96.75-0.4% 97.43 +0.3% 98.09reaim.jti 72000 -10.8% 64216-8.3% 66000 +0.0% 72000reaim.max_jobs_per_min 0.36 +10.6% 0.40+7.8% 0.39 -1.1% 0.36reaim.parent_time 1.58 ± 2% +71.0% 2.70 ± 2% +26.9% 2.01 ± 2% -11.9% 1.39 ± 4% reaim.std_dev_percent 0.00 ± 5%+110.4% 0.01 ± 3% +48.8% 0.01 ± 7% -27.3% 0.00 ± 15% reaim.std_dev_time 50800-2.4% 49600-1.6% 5 +0.0% 50800reaim.workload = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 3e1643da53f3fc7414cfa3ad2a16ab2a164b7f4d (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 3e1643da53f3fc7414cfa3ad2a1 --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ |\ |\ |\ 0.69 -10.3% 0.62-9.1% 0.62 -7.1% 0.64reaim.child_systime 0.62-1.0% 0.61+0.5% 0.62 +1.3% 0.63reaim.child_utime 66870 -10.0% 60187-7.6% 61787 -6.1% 62807re
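A minimal model of the two conditions the test patch removes may make the mechanism concrete (Python rather than kernel C; the capacity/util/runnable numbers below are invented to show the effect, not measured):

```python
IMBALANCE_PCT = 117   # a typical sd->imbalance_pct

def has_capacity(capacity, util, runnable, nr_running, weight,
                 check_runnable=True, pct=IMBALANCE_PCT):
    """Model of group_has_capacity(); check_runnable=False corresponds to
    the test patch above that drops the runnable_avg condition."""
    if nr_running < weight:
        return True
    if check_runnable and capacity * pct < runnable * 100:
        return False          # high runnable_avg -> group declared full
    return capacity * 100 > util * pct

# One CPU (capacity 1024) with low utilization but two freshly forked
# tasks whose runnable_avg is still near the maximum:
cap, util, runnable = 1024, 300, 1400
print(has_capacity(cap, util, runnable, 2, 1))                        # False
print(has_capacity(cap, util, runnable, 2, 1, check_runnable=False))  # True
```

With the runnable_avg condition in place this CPU is declared full purely because of the inflated runnable sum; without it, the low util lets the group keep its spare capacity, which is why removing the condition makes the regression disappear.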
Re: [LKP] [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression
Hi Ritesh,

I retested: the regression still exists in v5.8-rc1. Do you have time to take a look at it? Thanks.

On 4/14/2020 1:49 PM, Xing Zhengjun wrote:

Thanks for your quick response; if you need any more test information about the regression, please let me know.

On 4/13/2020 6:56 PM, Ritesh Harjani wrote:

On 4/13/20 2:07 PM, Xing Zhengjun wrote:

Hi Harjani, Do you have time to take a look at this? Thanks.

Hello Xing, I do want to look into this, but as of now I am stuck with another mballoc failure issue. I will get back to this once I have some handle on that one. BTW, are you planning to take a look at this? -ritesh

On 4/7/2020 4:00 PM, kernel test robot wrote:

Greeting, FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit:

commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

  nr_threads: 10%
  disk: 1HDD
  testtime: 1s
  class: os
  cpufreq_governor: performance
  ucode: 0x52c
  fs: ext4

Details are as below: -->

To reproduce:
  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

= class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
  os/gcc-7/performance/1HDD/ext4/x86_64-rhel-7.6/10%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/1s/0x52c

commit:
  b2c5764262 ("ext4: make ext4_ind_map_blocks work with fiemap")
  d3b6f23f71 ("ext4: move ext4_fiemap to use iomap framework")

b2c5764262edded1  d3b6f23f71670007817a5d59f3f
fail:runs  %reproduction  fail:runs
  :4       25%            1:4      dmesg.WARNING:at#for_ip_interrupt_entry/0x
 2:4        5%            2:4      perf-profile.calltrace.cycles-pp.sync_regs.error_entry
 2:4        6%            3:4      perf-profile.calltrace.cycles-pp.error_entry
 3:4        9%            3:4
perf-profile.children.cycles-pp.error_entry 0:4 1% 0:4 perf-profile.self.cycles-pp.error_entry %stddev %change %stddev \ | \ 28623 +28.2% 36703 ± 12% stress-ng.daemon.ops 28632 +28.2% 36704 ± 12% stress-ng.daemon.ops_per_sec 566.00 ± 22% -53.2% 265.00 ± 53% stress-ng.dev.ops 278.81 ± 22% -53.0% 131.00 ± 54% stress-ng.dev.ops_per_sec 73160 -60.6% 28849 ± 3% stress-ng.fiemap.ops 72471 -60.5% 28612 ± 3% stress-ng.fiemap.ops_per_sec 23421 ± 12% +21.2% 28388 ± 6% stress-ng.filename.ops 22638 ± 12% +20.3% 27241 ± 6% stress-ng.filename.ops_per_sec 21.25 ± 7% -10.6% 19.00 ± 3% stress-ng.iomix.ops 38.75 ± 49% -47.7% 20.25 ± 96% stress-ng.memhotplug.ops 34.45 ± 52% -51.8% 16.62 ±106% stress-ng.memhotplug.ops_per_sec 1734 ± 10% +31.4% 2278 ± 10% stress-ng.resources.ops 807.56 ± 5% +35.2% 1091 ± 8% stress-ng.resources.ops_per_sec 1007356 ± 3% -16.5% 840642 ± 9% stress-ng.revio.ops 1007692 ± 3% -16.6% 840711 ± 9% stress-ng.revio.ops_per_sec 21812 ± 3% +16.0% 25294 ± 5% stress-ng.sysbadaddr.ops 21821 ± 3% +15.9% 25294 ± 5% stress-ng.sysbadaddr.ops_per_sec 440.75 ± 4% +21.9% 537.25 ± 9% stress-ng.sysfs.ops 440.53 ± 4% +21.9% 536.86 ± 9% stress-ng.sysfs.ops_per_sec 13286582 -11.1% 11805520 ± 6% stress-ng.time.file_system_outputs 68253896 +2.4% 69860122 stress-ng.time.minor_page_faults 197.00 ± 4% -15.9% 165.75 ± 12% stress-ng.xattr.ops 192.45 ± 5% -16.1% 161.46 ± 11% stress-ng.xattr.ops_per_sec 15310 +62.5% 24875 ± 22% stress-ng.zombie.ops 15310 +62.5% 24874 ± 22% stress-ng.zombie.ops_per_sec 203.50 ± 12% -47.3% 107.25 ± 49% vmstat.io.bi 861318 ± 18% -29.7% 605884 ± 5% meminfo.AnonHugePages 1062742 ± 14% -20.2% 847853 ± 3% meminfo.AnonPages 31093 ± 6% +9.6% 34090 ± 3% meminfo.KernelStack 7151 ± 34% +55.8% 11145 ± 9% meminfo.Mlocked 1.082e+08 ± 5% -40.2% 64705429 ± 31% numa-numa
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/15/2020 11:10 PM, Hillf Danton wrote: On Mon, 15 Jun 2020 10:10:41 +0200 Vincent Guittot wrote: Le lundi 15 juin 2020 15:26:59 (+0800), Xing Zhengjun a crit : On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... I apply the patch based on v5.7, the regression still existed. Thanks for the test. Thanks. I don't know if it's relevant or not but the results seem a bit better with the patch and I'd like to check that it's only a matter of threshold to fix the problem. Could you try the patch below which is quite aggressive but will help to confirm this ? diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 28be1c984a42..3c51d557547b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8322,10 +8322,13 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { + unsigned long imb; + if (sgs->sum_nr_running < sgs->group_weight) return true; - if ((sgs->group_capacity * imbalance_pct) < + imb = sgs->sum_nr_running * 100; + if ((sgs->group_capacity * imb) < (sgs->group_runnable * 100)) return false; @@ -8347,6 +8350,8 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { + unsigned long imb; + if (sgs->sum_nr_running <= sgs->group_weight) return false; @@ -8354,7 +8359,8 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) (sgs->group_util * imbalance_pct)) return true; - if ((sgs->group_capacity * imbalance_pct) < + imb = sgs->sum_nr_running * 100; + if ((sgs->group_capacity * imb) < (sgs->group_runnable * 100)) return true; = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 
9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  6b33257768b8dd3982054885ea310871be2cfe0b (Hillf's patch)

9f68395333ad7f5b  070f5e860ee2bf58  v5.7  6b33257768b8dd39
      %stddev %change  %stddev %change  %stddev %change  %stddev
0.69    -10.3%   0.62   -9.1%   0.62  -10.1%   0.62  reaim.child_systime
0.62     -1.0%   0.61   +0.5%   0.62   +0.3%   0.62  reaim.child_utime
66870   -10.0%  60187   -7.6%  61787   -8.3%  61305  reaim.jobs_per_min
16717   -10.0%  15046   -7.6%  15446   -8.3%  15326  reaim.jobs_per_min_child
97.84    -1.1%  96.75   -0.4%  97.43   -0.5%  97.37  reaim.jti
72000   -10.8%  64216   -8.3%  66000   -8.3%  66000  reaim.max_jobs_per_min
0.36    +10.6%   0.40   +7.8%   0.39   +9.4%   0.39  reaim.parent_time
1.58 ± 2%   +71.0%  2.70 ± 2%  +26.9%  2.01 ± 2%  +33.2%  2.11       reaim.std_dev_percent
0.00 ± 5%  +110.4%  0.01 ± 3%  +48.8%  0.01 ± 7%  +65.3%  0.01 ± 3%  reaim.std_dev_time
50800    -2.4%  49600   -1.6%  5       -1.8%  49866  reaim.workload

Following the introduction of runnable_avg there came a gap between it and util, and it can supposedly be filled up by determining the pivot point using the imb percent. The upside is that no heuristic is added.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8215,15 +8215,8 @@ group_has_capacity(unsigned int imbalanc
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
-	    (sgs->group_runnable * 100))
-		return false;
-
-	if ((sgs->group_capacity * 100) >
-	    (sgs->group_util * imbalance_pct))
-		return true;
-
-	return false;
+	return sgs->group_capacity * imbalance_pct >
+	       (sgs->group_util + sgs->group_runnable) * 50;
 }
 
 /*
@@ -8240,15 +8233,8 @@ group_is_overloaded(unsigned int imbala
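Hillf's version collapses the two separate util and runnable checks into a single pivot on their mean; a hedged sketch of that one condition (a Python model with invented numbers, not the kernel code):

```python
def has_capacity_combined(capacity, util, runnable, pct=117):
    """Model of the single condition in Hillf's patch.  Note that
    (util + runnable) * 50 equals ((util + runnable) / 2) * 100: the
    group is said to have capacity while its scaled capacity exceeds
    the mean of its util and runnable sums."""
    return capacity * pct > (util + runnable) * 50

# Low util but high runnable: the stand-alone runnable check would have
# declared this group full; the averaged condition does not.
print(has_capacity_combined(1024, 300, 1400))   # True: 119808 > 85000

# Both util and runnable high: the group really is full.
print(has_capacity_combined(1024, 1100, 1400))  # False: 119808 <= 125000
```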
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/15/2020 4:10 PM, Vincent Guittot wrote: Hi Xing, Le lundi 15 juin 2020 à 15:26:59 (+0800), Xing Zhengjun a écrit : On 6/12/2020 7:06 PM, Hillf Danton wrote: On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote: ... --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8215,12 +8215,8 @@ group_has_capacity(unsigned int imbalanc if (sgs->sum_nr_running < sgs->group_weight) return true; - if ((sgs->group_capacity * imbalance_pct) < - (sgs->group_runnable * 100)) - return false; - - if ((sgs->group_capacity * 100) > - (sgs->group_util * imbalance_pct)) + if ((sgs->group_capacity * 100) > (sgs->group_util * imbalance_pct) && + (sgs->group_capacity * 100) > (sgs->group_runnable * imbalance_pct)) return true; return false; @@ -8240,12 +8236,8 @@ group_is_overloaded(unsigned int imbalan if (sgs->sum_nr_running <= sgs->group_weight) return false; - if ((sgs->group_capacity * 100) < - (sgs->group_util * imbalance_pct)) - return true; - - if ((sgs->group_capacity * imbalance_pct) < - (sgs->group_runnable * 100)) + if ((sgs->group_capacity * 100) < (sgs->group_util * imbalance_pct) || + (sgs->group_capacity * 100) < (sgs->group_runnable * imbalance_pct)) return true; return false; I apply the patch based on v5.7, the regression still existed. Thanks for the test. I don't know if it's relevant or not but the results seem a bit better with the patch and I'd like to check that it's only a matter of threshold to fix the problem. Could you try the patch below which is quite aggressive but will help to confirm this ? 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 28be1c984a42..3c51d557547b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8322,10 +8322,13 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { + unsigned long imb; + if (sgs->sum_nr_running < sgs->group_weight) return true; - if ((sgs->group_capacity * imbalance_pct) < + imb = sgs->sum_nr_running * 100; + if ((sgs->group_capacity * imb) < (sgs->group_runnable * 100)) return false; @@ -8347,6 +8350,8 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { + unsigned long imb; + if (sgs->sum_nr_running <= sgs->group_weight) return false; @@ -8354,7 +8359,8 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) (sgs->group_util * imbalance_pct)) return true; - if ((sgs->group_capacity * imbalance_pct) < + imb = sgs->sum_nr_running * 100; + if ((sgs->group_capacity * imb) < (sgs->group_runnable * 100)) return true; I apply the patch based on v5.7, the test result is as the following: = tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode: lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21 commit: 9f68395333ad7f5bfe2f83473fed363d4229f11c 070f5e860ee2bf588c99ef7b4c202451faa48236 v5.7 3e1643da53f3fc7414cfa3ad2a16ab2a164b7f4d (the test patch) 9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2v5.7 3e1643da53f3fc7414cfa3ad2a1 --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ |\ |\ |\ 0.69 -10.3% 0.62-9.1% 0.62 -7.1% 0.64reaim.child_systime 0.62-1.0% 0.61+0.5% 0.62 +1.3% 0.63reaim.child_utime 66870 -10.0% 60187-7.6% 61787 -6.1% 62807reaim.jobs_per_min 16717 -10.0% 15046-7.6% 15446 -6.1% 15701reaim.jobs_per_min_child 97.84-1.1% 96.75-0.4% 97.43 -0.5% 
97.34reaim.jti 72000 -10.8% 64216-8.3% 66000 -5.7% 678
Re: [LKP] [rcu] 276c410448: will-it-scale.per_thread_ops -12.3% regression
Hi Paul, Do you have time to take a look at this? Thanks. On 6/15/2020 4:57 PM, kernel test robot wrote: Greeting, FYI, we noticed a -12.3% regression of will-it-scale.per_thread_ops due to commit: commit: 276c410448dbca357a2bc3539acfe04862e5f172 ("rcu-tasks: Split ->trc_reader_need_end") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: will-it-scale on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory with following parameters: nr_task: 100% mode: thread test: page_fault3 cpufreq_governor: performance ucode: 0x11 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. test-url: https://github.com/antonblanchard/will-it-scale If you fix the issue, kindly add following tag Reported-by: kernel test robot Details are as below: --> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml = compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode: gcc-9/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/page_fault3/will-it-scale/0x11 commit: b0afa0f056 ("rcu-tasks: Provide boot parameter to delay IPIs until late in grace period") 276c410448 ("rcu-tasks: Split ->trc_reader_need_end") b0afa0f056676ffe 276c410448dbca357a2bc3539ac --- fail:runs %reproductionfail:runs | | | 2:4 -50%:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x :4 28% 1:4 perf-profile.calltrace.cycles-pp.error_entry 0:40% 0:4 perf-profile.children.cycles-pp.error_exit 1:47% 2:4 perf-profile.children.cycles-pp.error_entry 0:44% 1:4 perf-profile.self.cycles-pp.error_entry %stddev %change %stddev \ |\ 1414 -12.3% 1241 ± 2% will-it-scale.per_thread_ops 463.32+1.7% 470.99will-it-scale.time.elapsed_time 463.32+1.7% 470.99 
will-it-scale.time.elapsed_time.max 407566 -12.3% 357573 ± 2% will-it-scale.workload 48.51-1.5% 47.77boot-time.boot 7.203e+10 +20.0% 8.64e+10 ± 2% cpuidle.C1.time 2.162e+08 ± 2% +27.7% 2.761e+08 ± 2% cpuidle.C1.usage 60.50 +12.2 72.74 ± 2% mpstat.cpu.all.idle% 39.17 -12.2 26.97 ± 6% mpstat.cpu.all.sys% 2334 ± 12% +18.8% 2772 ± 5% slabinfo.khugepaged_mm_slot.active_objs 2334 ± 12% +18.8% 2772 ± 5% slabinfo.khugepaged_mm_slot.num_objs 60.25 +20.3% 72.50 ± 2% vmstat.cpu.id 92.75 ± 3% -21.6% 72.75 ± 5% vmstat.procs.r 223709 +41.8% 317250 ± 3% vmstat.system.cs 641687 ± 3% +8.0% 693245 ± 2% proc-vmstat.nr_inactive_anon 641688 ± 3% +8.0% 693245 ± 2% proc-vmstat.nr_zone_inactive_anon 166782-3.7% 160632proc-vmstat.numa_hint_faults 166782-3.7% 160632 proc-vmstat.numa_hint_faults_local 984.25 -14.2% 844.75 ± 2% proc-vmstat.numa_huge_pte_updates 710979 -11.2% 631134proc-vmstat.numa_pte_updates 1.967e+08 -10.9% 1.752e+08proc-vmstat.pgfault 58.18+3.4% 60.17perf-stat.i.MPKI 1.173e+09 +10.5% 1.296e+09perf-stat.i.branch-instructions 6.74-0.16.68perf-stat.i.branch-miss-rate% 72495831 +10.7% 80219684perf-stat.i.branch-misses 14.68-0.6 14.06perf-stat.i.cache-miss-rate% 43014696 +10.5% 47551690perf-stat.i.cache-misses 2.936e+08 +15.6% 3.393e+08perf-stat.i.cache-references 227441 +42.0% 323034 ± 3% perf-stat.i.context-switches 37.22 -29.0% 26.44 ± 5% perf-stat.i.cpi 1.828e+11 -22.3% 1.421e+11 ± 3% perf-stat.i.cpu-cycles 513.71 +13.6% 583.63perf-stat.i.cpu-migrations 4303 -27.8% 3107 ± 5% perf-stat.i.cycles-between-cache-misses 1.78-0.01.74perf-stat.
Re: [LKP] [sched/fair] 6c8116c914: stress-ng.mmapfork.ops_per_sec -38.0% regression
On 6/15/2020 1:18 PM, Tao Zhou wrote:

Hi,

On Fri, Jun 12, 2020 at 03:59:31PM +0800, Xing Zhengjun wrote:

Hi, I tested the regression; it still exists in v5.7. If you have any fix for it, please send it to me and I can verify it. Thanks.

When the busiest group is group_fully_busy and the local group is <= group_fully_busy, the metric used is:

local group        busiest group      metric used
group_fully_busy   group_fully_busy   avg load
group_has_spare    group_fully_busy   idle cpu/task num

In find_busiest_group(), consider this condition: 'if (busiest->group_type != group_overloaded) {'. In this case the busiest type is group_fully_busy and the local type is <= group_fully_busy. In this branch the code checks idle CPUs and task numbers and can go to out_balanced; that is to say, it ignores group_fully_busy as opposed to group_has_spare (that case is handled in calculate_imbalance()). When the local group and the busiest group are both group_fully_busy, avg load needs to be used as the metric (in calculate_imbalance()). So I give the change below:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cbcb2f71599b..0afbea39dd5a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9219,24 +9219,26 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		 */
 		goto out_balanced;
 
-		if (busiest->group_weight > 1 &&
-		    local->idle_cpus <= (busiest->idle_cpus + 1))
-			/*
-			 * If the busiest group is not overloaded
-			 * and there is no imbalance between this and busiest
-			 * group wrt idle CPUs, it is balanced. The imbalance
-			 * becomes significant if the diff is greater than 1
-			 * otherwise we might end up to just move the imbalance
-			 * on another group. Of course this applies only if
-			 * there is more than 1 CPU per group.
-*/ - goto out_balanced; + if (local->group_type == group_has_spare) { + if (busiest->group_weight > 1 && + local->idle_cpus <= (busiest->idle_cpus + 1)) + /* +* If the busiest group is not overloaded +* and there is no imbalance between this and busiest +* group wrt idle CPUs, it is balanced. The imbalance +* becomes significant if the diff is greater than 1 +* otherwise we might end up to just move the imbalance +* on another group. Of course this applies only if +* there is more than 1 CPU per group. +*/ + goto out_balanced; - if (busiest->sum_h_nr_running == 1) - /* -* busiest doesn't have any tasks waiting to run -*/ - goto out_balanced; + if (busiest->sum_h_nr_running == 1) + /* +* busiest doesn't have any tasks waiting to run +*/ + goto out_balanced; + } } force_balance: In fact, I don't know this change can help or not, can be right or not. No test, no compile. If it is wrong, just ignore. Thanks I apply the patch based on v5.7, the regression still existed. = tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/sc_pid_max/testtime/class/cpufreq_governor/ucode: lkp-bdw-ep6/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/100%/1HDD/4194304/1s/scheduler/performance/0xb38 commit: e94f80f6c49020008e6fa0f3d4b806b8595d17d8 6c8116c914b65be5e4d6f66d69c8142eb0648c22 v5.7 c7e6d37f60da32f808140b1b7dabcc3cde73c4cc (Tao's patch) e94f80f6c4902000 6c8116c914b65be5e4d6f66d69cv5.7 c7e6d37f60da32f808140b1b7da --- --- --- %stddev %change %stddev %change %stddev %change %stddev \ |\ |\ |\ 819250 ± 5% -10.1% 736616 ± 8% +41.2%1156877 ± 3% +43.6%1176246 ± 5% stress-ng.futex.ops 818985 ± 5% -10.1% 736460 ± 8% +41.2%1156215 ± 3%
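The effect of scoping those early-outs to group_has_spare can be sketched like this (a reduced Python model of the checks in find_busiest_group(), not the kernel code; the struct fields are flattened into plain parameters):

```python
# Group types, in increasing order of load (a reduced subset).
HAS_SPARE, FULLY_BUSY, OVERLOADED = range(3)

def goes_out_balanced(local_type, busiest_weight, local_idle, busiest_idle,
                      busiest_nr_running, patched=False):
    """Model of the early 'goto out_balanced' checks that
    find_busiest_group() applies when busiest is not overloaded."""
    if patched and local_type != HAS_SPARE:
        # Tao's change: fall through so calculate_imbalance() can
        # compare average loads of two fully-busy groups.
        return False
    if busiest_weight > 1 and local_idle <= busiest_idle + 1:
        return True    # no meaningful idle-CPU imbalance
    if busiest_nr_running == 1:
        return True    # busiest has no task waiting to run
    return False

# Both groups fully busy, no idle CPUs anywhere:
print(goes_out_balanced(FULLY_BUSY, 4, 0, 0, 4))                # True: balance skipped
print(goes_out_balanced(FULLY_BUSY, 4, 0, 0, 4, patched=True))  # False: avg-load path
```

In the unpatched model the fully-busy/fully-busy case is declared balanced by the idle-CPU check; the patched model lets it reach the average-load comparison, which is the behavior Tao argues for above.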
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/12/2020 11:19 PM, Vincent Guittot wrote:

On Friday, 12 June 2020 at 14:36:49 (+0800), Xing Zhengjun wrote:

Hi Vincent,

We tested and the regression still exists in v5.7. Do you have time to look at it? Thanks.

Commit 070f5e860ee2 moves some cases from the state "group has spare capacity" to the state "group is overloaded" — typically when util_avg decreases significantly after a migration but the group is in fact still overloaded. The current rule uses a fixed threshold, but it has the disadvantage of possibly including some cases with spare capacity but a high runnable_avg (for example because of tasks running simultaneously).

It looks like this benchmark is impacted by moving such cases from has_spare_capacity to is_overloaded. I have a patch in my backlog that tries to fix the problem, but I never sent it because I failed to find a benchmark that would benefit from it. This patch moves some cases back from the overloaded state to the "has spare capacity" state.

Could you give it a try?

---
 kernel/sched/fair.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ed04d2a8959..c24f85969591 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8213,10 +8213,14 @@ static inline int sg_imbalanced(struct sched_group *group)
 static inline bool
 group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 {
+	unsigned long imb;
+
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
+	imb = imbalance_pct - 100;
+	imb = sgs->sum_nr_running * imb + 100;
+	if ((sgs->group_capacity * imb) <
 			(sgs->group_runnable * 100))
 		return false;
 
@@ -8238,6 +8242,8 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 static inline bool
 group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 {
+	unsigned long imb;
+
 	if (sgs->sum_nr_running <= sgs->group_weight)
 		return false;
 
@@ -8245,7 +8251,9 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 			(sgs->group_util * imbalance_pct))
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
+	imb = imbalance_pct - 100;
+	imb = sgs->sum_nr_running * imb + 100;
+	if ((sgs->group_capacity * imb) <
 			(sgs->group_runnable * 100))
 		return true;
-- 
2.17.1

I applied the patch based on v5.7; the regression still exists.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  068638639cdfa15dbff137a0e3ef4a4cc6730ff4 (Vincent's patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2            v5.7 068638639cdfa15dbff137a0e3e
---------------- --------------------------- --------------- ---------------------------
     %stddev      %change %stddev      %change %stddev      %change %stddev
    0.69            -10.3%     0.62       -9.1%     0.62       -8.9%     0.63        reaim.child_systime
    0.62             -1.0%     0.61       +0.5%     0.62       +0.6%     0.62        reaim.child_utime
   66870            -10.0%    60187       -7.6%    61787       -7.7%    61714        reaim.jobs_per_min
   16717            -10.0%    15046       -7.6%    15446       -7.7%    15428        reaim.jobs_per_min_child
   97.84             -1.1%    96.75       -0.4%    97.43       -0.6%    97.25        reaim.jti
   72000            -10.8%    64216       -8.3%    66000       -8.3%    66000        reaim.max_jobs_per_min
    0.36            +10.6%     0.40       +7.8%     0.39       +8.2%     0.39        reaim.parent_time
    1.58 ± 2%       +71.0%     2.70 ± 2%  +26.9%     2.01 ± 2% +38.4%     2.19 ± 6%  reaim.std_dev_percent
    0.00 ± 5%      +110.4%     0.01 ± 3%  +48.8%     0.01 ± 7% +67.1%     0.01 ± 9%  reaim.std_dev_time
   50800             -2.4%    49600       -1.6%        5       -1.6%        5        reaim.workload

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/rea
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
On 6/12/2020 7:06 PM, Hillf Danton wrote:

On Fri, 12 Jun 2020 14:36:49 +0800 Xing Zhengjun wrote:

Hi Vincent,

We tested and the regression still exists in v5.7. Do you have time to look at it? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2            v5.7
---------------- --------------------------- ---------------
     %stddev      %change %stddev      %change %stddev
    0.69            -10.3%     0.62       -9.1%     0.62        reaim.child_systime
    0.62             -1.0%     0.61       +0.5%     0.62        reaim.child_utime
   66870            -10.0%    60187       -7.6%    61787        reaim.jobs_per_min
   16717            -10.0%    15046       -7.6%    15446        reaim.jobs_per_min_child
   97.84             -1.1%    96.75       -0.4%    97.43        reaim.jti
   72000            -10.8%    64216       -8.3%    66000        reaim.max_jobs_per_min
    0.36            +10.6%     0.40       +7.8%     0.39        reaim.parent_time
    1.58 ± 2%       +71.0%     2.70 ± 2%  +26.9%     2.01 ± 2%  reaim.std_dev_percent
    0.00 ± 5%      +110.4%     0.01 ± 3%  +48.8%     0.01 ± 7%  reaim.std_dev_time
   50800             -2.4%    49600       -1.6%        5        reaim.workload

On 3/19/2020 10:38 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -10.5% regression of reaim.jobs_per_min due to commit:

commit: 070f5e860ee2bf588c99ef7b4c202451faa48236 ("sched/fair: Take into account runnable_avg to classify group")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: reaim
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
with following parameters:

	runtime: 300s
	nr_task: 100%
	test: five_sec
	cpufreq_governor: performance
	ucode: 0x21

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/

Hi Xing,

After 070f5e860ee2, let's treat runnable the same way as util when comparing capacity, in the assumption that (125 + 110 + 117) / 3 = 117 accounts for 105 within the margin of error, before any other proposal with some more reasons.
thanks
Hillf

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8215,12 +8215,8 @@ group_has_capacity(unsigned int imbalanc
 	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
-	if ((sgs->group_capacity * imbalance_pct) <
-			(sgs->group_runnable * 100))
-		return false;
-
-	if ((sgs->group_capacity * 100) >
-			(sgs->group_util * imbalance_pct))
+	if ((sgs->group_capacity * 100) > (sgs->group_util * imbalance_pct) &&
+	    (sgs->group_capacity * 100) > (sgs->group_runnable * imbalance_pct))
 		return true;
 
 	return false;
@@ -8240,12 +8236,8 @@ group_is_overloaded(unsigned int imbalan
 	if (sgs->sum_nr_running <= sgs->group_weight)
 		return false;
 
-	if ((sgs->group_capacity * 100) <
-			(sgs->group_util * imbalance_pct))
-		return true;
-
-	if ((sgs->group_capacity * imbalance_pct) <
-			(sgs->group_runnable * 100))
+	if ((sgs->group_capacity * 100) < (sgs->group_util * imbalance_pct) ||
+	    (sgs->group_capacity * 100) < (sgs->group_runnable * imbalance_pct))
 		return true;
 
 	return false;

I applied the patch based on v5.7; the regression still exists.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7
  6b33257768b8dd3982054885ea310871be2cfe0b (Hillf's patch)

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2            v5.7 6b33257768b8dd3982054885ea3
---------------- --------------------------- --------------- ---------------------------
     %stddev      %change %stddev      %change %stddev      %change
Re: [LKP] [btrfs] c75e839414: aim7.jobs-per-min -9.1% regression
Hi Josef,

Do you have time to take a look at this? Thanks.

On 6/12/2020 2:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -9.1% regression of aim7.jobs-per-min due to commit:

commit: c75e839414d3610e6487ae3145199c500d55f7f7 ("btrfs: kill the subvol_srcu")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	disk: 4BRD_12G
	md: RAID0
	fs: btrfs
	test: disk_wrt
	load: 1500
	cpufreq_governor: performance
	ucode: 0x52c

test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/4BRD_12G/btrfs/x86_64-rhel-7.6/1500/RAID0/debian-x86_64-20191114.cgz/lkp-csl-2ap2/disk_wrt/aim7/0x52c

commit:
  efc3453494 ("btrfs: make btrfs_cleanup_fs_roots use the radix tree lock")
  c75e839414 ("btrfs: kill the subvol_srcu")

efc3453494af7818 c75e839414d3610e6487ae31451
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           3:9          -33%           :8     dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
     %stddev     %change     %stddev
   29509 ± 2%        -9.1%    26837 ± 2%   aim7.jobs-per-min
  305.28 ± 2%       +10.0%   335.72 ± 2%   aim7.time.elapsed_time
  305.28 ± 2%       +10.0%   335.72 ± 2%   aim7.time.elapsed_time.max
 4883135 ± 10%      +37.9%  6735464 ± 7%   aim7.time.involuntary_context_switches
   56288 ± 2%       +10.5%    62202 ± 2%   aim7.time.system_time
 2344783            +6.5%   2497364 ± 2%   aim7.time.voluntary_context_switches
62337721 ± 2%        +9.8%  68456490 ± 2%  turbostat.IRQ
  431.56 ± 6%       +22.3%   527.88 ± 4%   vmstat.procs.r
   27340 ± 2%       +11.2%    30397 ± 2%   vmstat.system.cs
  226804 ± 6%       +21.7%   276057 ± 4%   meminfo.Active(file)
  221309 ± 6%       +22.3%   270668 ± 4%   meminfo.Dirty
  720.89 ± 111%     +49.3%     1076 ± 73%  meminfo.Mlocked
   14278 ± 2%        -8.3%    13094 ± 2%   meminfo.max_used_kB
   57228 ± 6%       +22.7%    70195 ± 5%   numa-meminfo.node0.Active(file)
   55433 ± 6%       +21.6%    67431 ± 4%   numa-meminfo.node0.Dirty
   56152 ± 6%       +21.4%    68180 ± 5%   numa-meminfo.node1.Active(file)
   55001 ± 6%       +22.5%    67397 ± 4%   numa-meminfo.node1.Dirty
   56373 ± 6%       +21.7%    68594 ± 4%   numa-meminfo.node2.Active(file)
   55222 ± 7%       +22.6%    67726 ± 4%   numa-meminfo.node2.Dirty
   56671 ± 6%       +20.5%    68317 ± 3%   numa-meminfo.node3.Active(file)
   55285 ± 6%       +21.8%    67355 ± 4%   numa-meminfo.node3.Dirty
   56694 ± 6%       +21.7%    69019 ± 4%   proc-vmstat.nr_active_file
   55342 ± 6%       +22.3%    67662 ± 4%   proc-vmstat.nr_dirty
  402316            +2.1%    410951        proc-vmstat.nr_file_pages
  180.22 ± 111%     +49.4%   269.25 ± 73%  proc-vmstat.nr_mlock
   56694 ± 6%       +21.7%    69019 ± 4%   proc-vmstat.nr_zone_active_file
   54680 ± 6%       +22.8%    67168 ± 4%   proc-vmstat.nr_zone_write_pending
 3144381 ± 2%        +6.1%  3335275        proc-vmstat.pgactivate
 1387558 ± 2%        +7.9%  1496754 ± 2%   proc-vmstat.pgfault
  983.33 ± 4%        +5.4%     1036        proc-vmstat.unevictable_pgs_culled
   14331 ± 6%       +22.6%    17566 ± 5%   numa-vmstat.node0.nr_active_file
   13884 ± 6%       +21.6%    16884 ± 4%   numa-vmstat.node0.nr_dirty
   14330 ± 6%       +22.6%    17566 ± 5%   numa-vmstat.node0.nr_zone_active_file
   13714 ± 6%       +22.2%    16755 ± 4%   numa-vmstat.node0.nr_zone_write_pending
   14047 ± 6%       +21.3%    17043 ± 4%   numa-vmstat.node1.nr_active_file
   13763 ± 6%       +22.3%    16838 ± 4%   numa-vmstat.node1.nr_dirty
   14047 ± 6%       +21.3%    17043 ± 4%   numa-vmstat.node1.nr_zone_active_file
   13599 ± 6%       +23.0%    16726 ± 4%   numa-vmstat.node1.nr_zone_write_pending
   14074 ± 5%       +21.7%    17130 ± 4%   numa-vmstat.node2.nr_active_
Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression
Hi Giovanni,

I tested the regression; it still exists in v5.7. Do you have time to take a look at this? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/debug-setup/size/test/cpufreq_governor/ucode:
  lkp-hsw-4ex1/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/test/8T/anon-cow-seq/performance/0x16

commit:
  2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d
  1567c3e3467cddeb019a7b53ec632f834b6a9239
  v5.7-rc1
  v5.7

2a4b03ffc69f2ded 1567c3e3467cddeb019a7b53ec6        v5.7-rc1            v5.7
---------------- --------------------------- --------------- ---------------
     %stddev      %change %stddev      %change %stddev      %change %stddev
  211462           -16.0%    177702      -15.0%    179809     -15.1%    179510     vm-scalability.median
    5.34 ± 9%       -3.1       2.23 ± 11%  -2.9       2.49 ± 5% -2.7       2.61 ± 11%  vm-scalability.median_stddev%
30430671           -16.3%  25461360      -15.5%  25707029     -15.5%  25701713     vm-scalability.throughput
7.967e+09          -11.1%  7.082e+09     -11.1%  7.082e+09    -11.1%  7.082e+09    vm-scalability.workload

On 4/16/2020 2:20 PM, Giovanni Gherdovich wrote:

On Thu, 2020-04-16 at 14:10 +0800, Xing Zhengjun wrote:

Hi Giovanni,

1567c3e346 ("x86, sched: Add support for frequency invariance") has been merged into Linux mainline v5.7-rc1 now. Do you have time to take a look at this? Thanks.

Apologies, this slipped under my radar. I'm on it, thanks.

Giovanni Gherdovich

-- 
Zhengjun Xing
Re: [LKP] [sched/fair] 6c8116c914: stress-ng.mmapfork.ops_per_sec -38.0% regression
Hi,

I tested the regression; it still exists in v5.7. If you have any fix for it, please send it to me and I can verify it. Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/sc_pid_max/testtime/class/cpufreq_governor/ucode:
  lkp-bdw-ep6/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/100%/1HDD/4194304/1s/scheduler/performance/0xb38

commit:
  e94f80f6c49020008e6fa0f3d4b806b8595d17d8
  6c8116c914b65be5e4d6f66d69c8142eb0648c22
  v5.7-rc3
  v5.7

e94f80f6c4902000 6c8116c914b65be5e4d6f66d69c        v5.7-rc3            v5.7
---------------- --------------------------- --------------- ---------------
     %stddev      %change %stddev      %change %stddev      %change %stddev
   21398 ± 7%       +6.5%    22781 ± 2%  -14.5%    18287 ± 4%   -5.5%    20231 ± 14%  stress-ng.clone.ops
  819250 ± 5%      -10.1%   736616 ± 8%  +34.2%  1099410 ± 5%  +41.2%  1156877 ± 3%   stress-ng.futex.ops
  818985 ± 5%      -10.1%   736460 ± 8%  +34.2%  1099487 ± 5%  +41.2%  1156215 ± 3%   stress-ng.futex.ops_per_sec
    1551 ± 3%       -3.4%     1498 ± 5%   -9.5%     1404 ± 2%   -4.6%     1480 ± 5%   stress-ng.inotify.ops
    1547 ± 3%       -3.5%     1492 ± 5%   -9.5%     1400 ± 2%   -4.8%     1472 ± 5%   stress-ng.inotify.ops_per_sec
   11292 ± 8%       -2.8%    10974 ± 8%   +1.9%    11505 ± 13%  -9.4%    10225 ± 6%   stress-ng.kill.ops
   28.20 ± 4%      -35.4%    18.22       -33.5%    18.75       -33.4%    18.77        stress-ng.mmapfork.ops_per_sec
 1932318            +1.5%  1961688 ± 2%  -22.8%  1492231 ± 2%   +4.0%  2010509 ± 3%   stress-ng.softlockup.ops
 1931679 ± 2%       +1.5%  1961143 ± 2%  -22.8%  1491939 ± 2%   +4.0%  2009585 ± 3%   stress-ng.softlockup.ops_per_sec
18607406 ± 6%      -12.9%  16210450 ± 21% -12.7%  16238693 ± 14% -8.0%  17120880 ± 13% stress-ng.switch.ops
18604406 ± 6%      -12.9%  16208270 ± 21% -12.7%  16237956 ± 14% -8.0%  17115273 ± 13% stress-ng.switch.ops_per_sec
 2999012 ± 21%     -10.1%  2696954 ± 22%  -9.1%  2725653 ± 21% -88.5%       37 ± 11%  stress-ng.tee.ops_per_sec
    7882 ± 3%       -5.4%     7458 ± 4%   -4.0%     7566 ± 4%   -2.0%     7724 ± 3%   stress-ng.vforkmany.ops
    7804 ± 3%       -5.2%     7400 ± 4%   -3.8%     7504 ± 4%   -2.0%     7647 ± 3%   stress-ng.vforkmany.ops_per_sec
46745421 ± 3%       -8.1%  42938569 ± 3%  -7.8%  43078233 ± 3%  -5.2%  44312072 ± 4%  stress-ng.yield.ops
46734472 ± 3%       -8.1%  42926316 ± 3%  -7.8%  43067447 ± 3%  -5.2%  44290338 ± 4%  stress-ng.yield.ops_per_sec

On 4/27/2020 8:46 PM, Vincent Guittot wrote:

On Mon, 27 Apr 2020 at 13:35, Hillf Danton wrote:

On Mon, 27 Apr 2020 11:03:58 +0200 Vincent Guittot wrote:

On Sun, 26 Apr 2020 at 14:42, Hillf Danton wrote:

On 4/21/2020 8:47 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a 56.4% improvement of stress-ng.fifo.ops_per_sec due to commit:

commit: 6c8116c914b65be5e4d6f66d69c8142eb0648c22 ("sched/fair: Fix condition of avg_load calculation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 1s
	class: scheduler
	cpufreq_governor: performance
	ucode: 0xb38
	sc_pid_max: 4194304

We need to handle group_fully_busy in a different way from group_overloaded, as pushing tasks does not help the load balance in the former case.

Have you tested this patch for the UC above? Do you have figures?

No, I am looking for a box with 88 threads. Likely to get access to one in as early as three weeks.

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8744,30 +8744,20 @@ find_idlest_group(struct sched_domain *s
 	switch (local_sgs.group_type) {
 	case group_overloaded:
-	case group_fully_busy:
-		/*
-		 * When comparing groups across NUMA domains, it's possible for
-		 * the local domain to be very lightly loaded relative to the
-		 * remote domains but "imbalance" skews the comparison making
-		 * remote CPUs look much more favourable. When considering
-		 * cross-domain, add imbalance to the load on the remote node
-		 * and consider staying local.
-		 */
-
-		if ((sd->flags & SD_NUMA) &&
-		    ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
+		i
Re: [LKP] [sched/fair] 070f5e860e: reaim.jobs_per_min -10.5% regression
Hi Vincent,

We tested and the regression still exists in v5.7. Do you have time to look at it? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/nr_task/debug-setup/test/cpufreq_governor/ucode:
  lkp-ivb-d04/reaim/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/100%/test/five_sec/performance/0x21

commit:
  9f68395333ad7f5bfe2f83473fed363d4229f11c
  070f5e860ee2bf588c99ef7b4c202451faa48236
  v5.7

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2            v5.7
---------------- --------------------------- ---------------
     %stddev      %change %stddev      %change %stddev
    0.69            -10.3%     0.62       -9.1%     0.62        reaim.child_systime
    0.62             -1.0%     0.61       +0.5%     0.62        reaim.child_utime
   66870            -10.0%    60187       -7.6%    61787        reaim.jobs_per_min
   16717            -10.0%    15046       -7.6%    15446        reaim.jobs_per_min_child
   97.84             -1.1%    96.75       -0.4%    97.43        reaim.jti
   72000            -10.8%    64216       -8.3%    66000        reaim.max_jobs_per_min
    0.36            +10.6%     0.40       +7.8%     0.39        reaim.parent_time
    1.58 ± 2%       +71.0%     2.70 ± 2%  +26.9%     2.01 ± 2%  reaim.std_dev_percent
    0.00 ± 5%      +110.4%     0.01 ± 3%  +48.8%     0.01 ± 7%  reaim.std_dev_time
   50800             -2.4%    49600       -1.6%        5        reaim.workload

On 3/19/2020 10:38 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -10.5% regression of reaim.jobs_per_min due to commit:

commit: 070f5e860ee2bf588c99ef7b4c202451faa48236 ("sched/fair: Take into account runnable_avg to classify group")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: reaim
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
with following parameters:

	runtime: 300s
	nr_task: 100%
	test: five_sec
	cpufreq_governor: performance
	ucode: 0x21

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/300s/lkp-ivb-d04/five_sec/reaim/0x21

commit:
  9f68395333 ("sched/pelt: Add a new runnable average signal")
  070f5e860e ("sched/fair: Take into account runnable_avg to classify group")

9f68395333ad7f5b 070f5e860ee2bf588c99ef7b4c2
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           4:4          -18%          3:4     perf-profile.children.cycles-pp.error_entry
           3:4          -12%          3:4     perf-profile.self.cycles-pp.error_entry
     %stddev     %change     %stddev
    0.68           -10.4%      0.61        reaim.child_systime
   67235           -10.5%     60195        reaim.jobs_per_min
   16808           -10.5%     15048        reaim.jobs_per_min_child
   97.90            -1.2%     96.70        reaim.jti
   72000           -10.8%     64216        reaim.max_jobs_per_min
    0.36           +11.3%      0.40        reaim.parent_time
    1.56 ± 3%      +79.1%      2.80 ± 6%   reaim.std_dev_percent
    0.00 ± 7%     +145.9%      0.01 ± 9%   reaim.std_dev_time
  104276           -16.0%     87616        reaim.time.involuntary_context_switches
15511157            -2.4%  15144312        reaim.time.minor_page_faults
   55.00            -7.3%     51.00        reaim.time.percent_of_cpu_this_job_got
   88.01           -12.4%     77.12        reaim.time.system_time
   79.97            -3.2%     77.38        reaim.time.user_time
  216380            -3.4%    208924        reaim.time.voluntary_context_switches
   50800            -2.4%     49600        reaim.workload
   30.40 ± 2%       -4.7%     28.97 ± 2%   boot-time.boot
    9.38            -0.7       8.66 ± 3%   mpstat.cpu.all.sys%
    7452            +7.5%      8014        vmstat.system.cs
 1457802 ± 16%     +49.3%
Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression
On 6/11/2020 6:53 PM, Mimi Zohar wrote:

On Thu, 2020-06-11 at 15:10 +0800, Xing Zhengjun wrote:

On 6/10/2020 9:53 PM, Mimi Zohar wrote:

	ucode: 0x52c

Does the following change resolve it?

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index c44414a7f82e..78e1dfc8a3f2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -426,7 +426,8 @@ int ima_file_mprotect(struct vm_area_struct *vma, unsigned long prot)
 	int pcr;
 
 	/* Is mprotect making an mmap'ed file executable? */
-	if (!vma->vm_file || !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
+	if (!(ima_policy_flag & IMA_APPRAISE) || !vma->vm_file ||
+	    !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
 		return 0;
 
 	security_task_getsecid(current, &secid);

Thanks. I tested the change; it resolves the regression.

Thanks! Can I get your "Tested-by" tag?

Mimi

Sure.

-- 
Zhengjun Xing
Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression
On 6/10/2020 9:53 PM, Mimi Zohar wrote:

Hi Xing,

On Wed, 2020-06-10 at 11:21 +0800, Xing Zhengjun wrote:

Hi Mimi,

Do you have time to take a look at this? We noticed a 3.7% regression of boot-time.dhcp and an 84.2% regression of stress-ng.icache.ops_per_sec. Thanks.

On 6/3/2020 5:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 3.7% regression of boot-time.dhcp due to commit:

commit: 8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd ("ima: verify mprotect change is consistent with mmap policy")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 30s
	class: cpu-cache
	cpufreq_governor: performance
	ucode: 0x52c

Does the following change resolve it?

diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index c44414a7f82e..78e1dfc8a3f2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -426,7 +426,8 @@ int ima_file_mprotect(struct vm_area_struct *vma, unsigned long prot)
 	int pcr;
 
 	/* Is mprotect making an mmap'ed file executable? */
-	if (!vma->vm_file || !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
+	if (!(ima_policy_flag & IMA_APPRAISE) || !vma->vm_file ||
+	    !(prot & PROT_EXEC) || (vma->vm_flags & VM_EXEC))
 		return 0;
 
 	security_task_getsecid(current, &secid);

Thanks. I tested the change; it resolves the regression.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/class/cpufreq_governor/ucode:
  lkp-csl-2sp5/stress-ng/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-9/test/100%/1HDD/30s/cpu-cache/performance/0x52c

commit:
  0c4395fb2aa77341269ea619c5419ea48171883f
  8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd
  8745d6eb3a493b1d324eeb9edefec5d23c16cba9 (fix for the regression)

0c4395fb2aa77341 8eb613c0b8f19627ba1846dcf78 8745d6eb3a493b1d324eeb9edef
---------------- --------------------------- ---------------------------
     %stddev      %change %stddev      %change %stddev
  884.33 ± 4%       +4.6%   924.67      +45.1%     1283 ± 3%   stress-ng.cache.ops
   29.47 ± 4%       +4.6%    30.82      +45.1%    42.76 ± 3%   stress-ng.cache.ops_per_sec
 1245720           -84.3%   195648       -0.8%  1235416        stress-ng.icache.ops
   41522           -84.3%     6520       -0.8%    41179        stress-ng.icache.ops_per_sec

-- 
Zhengjun Xing
Re: [LKP] [xfs] a5949d3fae: aim7.jobs-per-min -33.6% regression
Hi Darrick,

Do you have time to take a look at this? Thanks.

On 6/6/2020 11:48 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a -33.6% regression of aim7.jobs-per-min due to commit:

commit: a5949d3faedf492fa7863b914da408047ab46eb0 ("xfs: force writes to delalloc regions to unwritten")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: aim7
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with following parameters:

	disk: 1BRD_48G
	fs: xfs
	test: sync_disk_rw
	load: 600
	cpufreq_governor: performance
	ucode: 0x42e

test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-7.6/600/debian-x86_64-20191114.cgz/lkp-ivb-2ep1/sync_disk_rw/aim7/0x42e

commit:
  590b16516e ("xfs: refactor xfs_iomap_prealloc_size")
  a5949d3fae ("xfs: force writes to delalloc regions to unwritten")

590b16516ef38e2e a5949d3faedf492fa7863b914da
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
            :4           50%          2:4     dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
     %stddev     %change     %stddev
   35272           -33.6%     23430        aim7.jobs-per-min
  102.13           +50.5%    153.75        aim7.time.elapsed_time
  102.13           +50.5%    153.75        aim7.time.elapsed_time.max
 1388038           +40.2%   1945838        aim7.time.involuntary_context_switches
   43420 ± 2%      +13.4%     49255 ± 2%   aim7.time.minor_page_faults
    3123           +44.2%      4504 ± 2%   aim7.time.system_time
   59.31            +6.5%     63.18        aim7.time.user_time
48595108           +58.6%  77064959        aim7.time.voluntary_context_switches
    1.44           -28.8%      1.02        iostat.cpu.user
    0.07 ± 6%       +0.4       0.44 ± 7%   mpstat.cpu.all.iowait%
    1.44            -0.4       1.02        mpstat.cpu.all.usr%
    8632 ± 50%     +75.6%     15156 ± 34%  numa-meminfo.node0.KernelStack
    6583 ± 136%   +106.0%     13562 ± 82%  numa-meminfo.node0.PageTables
   63325 ± 11%     +14.3%     72352 ± 12%  numa-meminfo.node0.SUnreclaim
    8647 ± 50%     +75.3%     15156 ± 34%  numa-vmstat.node0.nr_kernel_stack
    1656 ± 136%   +104.6%      3389 ± 82%  numa-vmstat.node0.nr_page_table_pages
   15831 ± 11%     +14.3%     18087 ± 12%  numa-vmstat.node0.nr_slab_unreclaimable
   93640 ± 3%      +41.2%    132211 ± 2%   meminfo.AnonHugePages
   21641           +39.9%     30271 ± 4%   meminfo.KernelStack
  129269           +12.3%    145114        meminfo.SUnreclaim
   28000           -31.2%     19275        meminfo.max_used_kB
 1269307           -26.9%    927657        vmstat.io.bo
  149.75 ± 3%      -17.4%    123.75 ± 4%   vmstat.procs.r
  718992           +13.3%    814567        vmstat.system.cs
  231397            -9.3%    209881 ± 2%   vmstat.system.in
6.774e+08          +70.0%  1.152e+09       cpuidle.C1.time
18203372           +60.4%  29198744        cpuidle.C1.usage
2.569e+08 ± 18%    +81.8%  4.672e+08 ± 5%  cpuidle.C1E.time
 2691402 ± 13%     +98.7%   5346901 ± 3%   cpuidle.C1E.usage
  990350           +95.0%   1931226 ± 2%   cpuidle.POLL.time
  520061           +97.7%   1028004 ± 2%   cpuidle.POLL.usage
   77231            +1.8%     78602        proc-vmstat.nr_active_anon
   19868            +3.8%     20615        proc-vmstat.nr_dirty
  381302            +1.0%    384969        proc-vmstat.nr_file_pages
    4388            -2.7%      4270        proc-vmstat.nr_inactive_anon
   69865            +4.7%     73155        proc-vmstat.nr_inactive_file
   21615           +40.0%     30251 ± 4%   proc-vmstat.nr_kernel_stack
    7363            -3.2%      7127        proc-vmstat.nr_mapped
   12595 ± 3%       +5.2%     13255 ± 4%   proc-vmstat.nr_shmem
   19619            +3.2%     20247        proc-vmstat.nr_slab_reclaimable
   32316           +12.3%     36280        proc-vmstat.n
Re: [LKP] [ima] 8eb613c0b8: stress-ng.icache.ops_per_sec -84.2% regression
Hi Mimi,

Do you have time to take a look at this? We noticed a 3.7% regression of boot-time.dhcp and an 84.2% regression of stress-ng.icache.ops_per_sec. Thanks.

On 6/3/2020 5:11 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 3.7% regression of boot-time.dhcp due to commit:

commit: 8eb613c0b8f19627ba1846dcf78bb2c85edbe8dd ("ima: verify mprotect change is consistent with mmap policy")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 30s
	class: cpu-cache
	cpufreq_governor: performance
	ucode: 0x52c

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
  cpu-cache/gcc-9/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/30s/0x52c

commit:
  0c4395fb2a ("evm: Fix possible memory leak in evm_calc_hmac_or_hash()")
  8eb613c0b8 ("ima: verify mprotect change is consistent with mmap policy")

0c4395fb2aa77341 8eb613c0b8f19627ba1846dcf78
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
            :4           25%          1:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
           0:4            3%          0:4     perf-profile.children.cycles-pp.error_entry
     %stddev     %change     %stddev
 1245570           -84.2%    197151        stress-ng.icache.ops
   41517           -84.2%      6570        stress-ng.icache.ops_per_sec
1.306e+09          -82.1%  2.338e+08       stress-ng.time.minor_page_faults
    2985           +13.5%      3387        stress-ng.time.system_time
    4.28           +13.1%      4.85        iostat.cpu.system
    4.18            +0.6       4.73        mpstat.cpu.all.sys%
   10121            +9.6%     11096 ± 3%   softirqs.CPU67.SCHED
  203299            -4.2%    194854 ± 5%   vmstat.system.in
   26.91            +2.8%     27.67 ± 3%   boot-time.boot
   16.34            +3.7%     16.94 ± 2%   boot-time.dhcp
    2183 ± 3%       +3.7%      2263        boot-time.idle
 1042938 ± 80%   +8208.2%  86649242 ± 156% cpuidle.C1.time
   48428 ± 114%  +1842.4%    940677 ± 151% cpuidle.C1.usage
   15748 ± 28%    +301.0%     63144 ± 79%  cpuidle.POLL.usage
   61300 ± 4%      +82.8%    112033 ± 11%  numa-vmstat.node1.nr_active_anon
   47060 ± 3%     +106.8%     97323 ± 12%  numa-vmstat.node1.nr_anon_pages
   42.67 ± 2%     +217.0%    135.25 ± 14%  numa-vmstat.node1.nr_anon_transparent_hugepages
   61301 ± 4%      +82.8%    112032 ± 11%  numa-vmstat.node1.nr_zone_active_anon
    3816 ± 2%       +3.0%      3931        proc-vmstat.nr_page_table_pages
35216541            +2.9%  36244047        proc-vmstat.pgalloc_normal
1.308e+09          -82.0%  2.356e+08       proc-vmstat.pgfault
35173363            +2.8%  36173843        proc-vmstat.pgfree
  248171 ± 5%      +82.5%    452893 ± 11%  numa-meminfo.node1.Active
  244812 ± 4%      +83.5%    449116 ± 11%  numa-meminfo.node1.Active(anon)
   88290 ± 3%     +214.4%    277591 ± 15%  numa-meminfo.node1.AnonHugePages
  187940 ± 3%     +107.8%    390486 ± 12%  numa-meminfo.node1.AnonPages
 1366813 ± 3%      +12.0%   1530428 ± 6%   numa-meminfo.node1.MemUsed
  571.00 ± 8%      +10.4%    630.50 ± 8%   slabinfo.UDP.active_objs
  571.00 ± 8%      +10.4%    630.50 ± 8%   slabinfo.UDP.num_objs
  300.00 ± 5%      +20.0%    360.00 ± 10%  slabinfo.kmem_cache.active_objs
  300.00 ± 5%      +20.0%    360.00 ± 10%  slabinfo.kmem_cache.num_objs
  606.33 ± 4%      +17.6%    713.00 ± 8%   slabinfo.kmem_cache_node.active_objs
  661.33 ± 4%      +16.1%    768.00 ± 8%   slabinfo.kmem_cache_node.num_objs
  114561 ± 23%     -34.3%     75239 ± 7%   sched_debug.cfs_rq:/.load.max
   14869 ± 22%     -36.6%      9424 ± 8%   sched_debug.cfs_rq:/.load.stddev
 4040842 ± 5%      +18.0%   4767515 ± 13%  sched_debug.cpu.avg_idle.max
 2019061 ± 8%      +25.5%   2534134 ± 14%  sched_debug.cpu.max_idle_balance_cost.max
  378044 ± 3%      +22.5%    463135 ± 8%   sched_debug.cpu.max_idle_balance_cost.stddev
   41605           +12.6%     46852 ± 2%   sched_debug.c
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
On 8/30/2019 8:43 AM, Xing Zhengjun wrote: On 8/7/2019 3:56 PM, Xing Zhengjun wrote: On 7/24/2019 1:17 PM, Xing Zhengjun wrote: On 7/12/2019 2:42 PM, Xing Zhengjun wrote: Hi Trond, I attached the perf-profile part of the big changes; hope it is useful for analyzing the issue. Ping... ping... ping... ping...

In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
with following parameters: iterations: 20x  nr_threads: 64t  disk: 1BRD_48G  fs: xfs  fs2: nfsv4  filesize: 4M  test_size: 80G  sync_method: fsyncBeforeClose  cpufreq_governor: performance
test-description: The fsmark is a file system benchmark to test synchronous write workloads, for example, mail server workloads.
test-url: https://sourceforge.net/projects/fsmark/

commit:
  e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
  0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e  0472e476604998c127f3c80d291
----------------  ---------------------------
         %stddev      %change        %stddev
    527.29           -22.6%     407.96       fsmark.files_per_sec
      1.97 ± 11%      +0.9        2.88 ± 4%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
      0.00            +0.9        0.93 ± 4%  perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages
      2.11 ± 10%      +0.9        3.05 ± 4%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
      5.29 ±  2%      +1.2        6.46 ± 7%  perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
      9.61 ±  5%      +3.1       12.70 ± 2%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
      9.27 ±  5%      +3.1       12.40 ± 2%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
     34.52 ±  4%      +3.3       37.78 ± 2%  perf-profile.calltrace.cycles-pp.ret_from_fork
     34.52 ±  4%      +3.3       37.78 ± 2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
      0.00            +3.4        3.41 ± 4%  perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg
      0.00            +3.4        3.44 ± 4%  perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
      0.00            +3.5        3.54 ± 4%  perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages
      2.30 ±  5%      +3.7        6.02 ± 3%  perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread
      2.30 ±  5%      +3.7        6.02 ± 3%  perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork
      1.81 ±  4%      +3.8        5.59 ± 4%  perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread
      1.80 ±  3%      +3.8        5.59 ± 3%  perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work
      1.73 ±  4%      +3.8        5.54 ± 4%  perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule
      1.72 ±  4%      +3.8        5.54 ± 4%  perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute
      0.00            +5.4        5.42 ± 4%  perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request
      0.00            +5.5        5.52 ± 4%  perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit
      0.00            +5.5        5.53 ± 4%  perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit
      9.61 ±  5%      +3.1       12.70 ± 2%  perf-profile.children.cycles-pp.worker_thread
      9.27 ±  5%      +3.1       12.40 ± 2%  perf-profile.children.cycles-pp.process_one_work
      6.19            +3.2        9.40 ± 4%  perf-profile.children.cycles-pp.memcpy_erms
     34.53 ±  4%      +3.3       37.78 ± 2%  perf-profile.children.cycles-pp.ret_from_fork
     34.52 ±  4%      +3.3       37.78 ± 2%  perf-profile.children.cycles-pp.kthread
      0.00            +3.5        3.46 ± 4%  perf-profile.children.cycles-pp.memcpy_from_page
      0.00            +3.6        3.56 ± 4%  perf-profile.children.cycles-pp._copy_from_iter_full
      2.47 ±  4%      +3.7        6.18 ± 3%  perf-profile.children.cycles-pp.__rpc_execute
Re: [PATCH v3] trace:Add "gfp_t" support in synthetic_events
Hi Steve, On 8/13/2019 11:04 AM, Steven Rostedt wrote: On Tue, 13 Aug 2019 09:04:28 +0800 Xing Zhengjun wrote: Hi Steve, Could you help to review? Thanks. Thanks for the ping. Yes, I'll take a look at it. I'll be pulling in a lot of patches that have queued up. -- Steve Could you help to review? Thanks. On 7/13/2019 12:05 AM, Tom Zanussi wrote: Hi Zhengjun, On Fri, 2019-07-12 at 09:53 +0800, Zhengjun Xing wrote: Add "gfp_t" support in synthetic_events, then the "gfp_t" type parameter in some functions can be traced. Prints the gfp flags as hex in addition to the human-readable flag string. Example output: whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO) rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC) rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC) Signed-off-by: Tom Zanussi Signed-off-by: Zhengjun Xing Looks good to me, thanks! Tom --- kernel/trace/trace_events_hist.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index ca6b0dff60c5..30f0f32aca62 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -13,6 +13,10 @@ #include #include +/* for gfp flag names */ +#include +#include + #include "tracing_map.h" #include "trace.h" #include "trace_dynevent.h" @@ -752,6 +756,8 @@ static int synth_field_size(char *type) size = sizeof(unsigned long); else if (strcmp(type, "pid_t") == 0) size = sizeof(pid_t); + else if (strcmp(type, "gfp_t") == 0) + size = sizeof(gfp_t); else if (synth_field_is_string(type)) size = synth_field_string_size(type); @@ -792,6 +798,8 @@ static const char *synth_field_fmt(char *type) fmt = "%lu"; else if (strcmp(type, "pid_t") == 0) fmt = "%d"; + else if (strcmp(type, "gfp_t") == 0) + fmt = "%x"; else if (synth_field_is_string(type)) fmt = "%s"; @@ -834,9 +842,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, i == se->n_fields - 1 ? "" : " "); n_u64 += STR_VAR_LEN_MAX / sizeof(u64); } else { + struct trace_print_flags __flags[] = { + __def_gfpflag_names, {-1, NULL} }; + trace_seq_printf(s, print_fmt, se->fields[i]->name, entry->fields[n_u64], i == se->n_fields - 1 ? "" : " "); + + if (strcmp(se->fields[i]->type, "gfp_t") == 0) { + trace_seq_puts(s, " ("); + trace_print_flags_seq(s, "|", + entry->fields[n_u64], + __flags); + trace_seq_putc(s, ')'); + } n_u64++; } } -- Zhengjun Xing
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
On 8/7/2019 3:56 PM, Xing Zhengjun wrote: On 7/24/2019 1:17 PM, Xing Zhengjun wrote: On 7/12/2019 2:42 PM, Xing Zhengjun wrote: Hi Trond, I attached perf-profile part big changes, hope it is useful for analyzing the issue. Ping... ping... ping... [ . . . testcase description and perf-profile comparison identical to the quote above . . . ]
Re: [PATCH v3] trace:Add "gfp_t" support in synthetic_events
Hi Steve, Could you help to review? Thanks. On 7/13/2019 12:05 AM, Tom Zanussi wrote: Hi Zhengjun, On Fri, 2019-07-12 at 09:53 +0800, Zhengjun Xing wrote: [ . . . commit message and patch identical to the quote above . . . ] Looks good to me, thanks! Tom -- Zhengjun Xing
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
On 7/24/2019 1:17 PM, Xing Zhengjun wrote: On 7/12/2019 2:42 PM, Xing Zhengjun wrote: Hi Trond, I attached perf-profile part big changes, hope it is useful for analyzing the issue. Ping... ping... [ . . . testcase description and perf-profile comparison identical to the quote above . . . ]
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
On 7/12/2019 2:42 PM, Xing Zhengjun wrote: Hi Trond, I attached perf-profile part big changes, hope it is useful for analyzing the issue. Ping... [ . . . testcase description and perf-profile comparison identical to the quote above . . . ]
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
[ . . . ]perf-profile.children.cycles-pp.xprt_transmit
      1.82 ±  4%      +3.8        5.62 ± 3%  perf-profile.children.cycles-pp.xs_tcp_send_request
      1.81 ±  4%      +3.8        5.62 ± 3%  perf-profile.children.cycles-pp.xs_sendpages
      0.21 ± 17%      +5.3        5.48 ± 4%  perf-profile.children.cycles-pp.tcp_sendmsg_locked
      0.25 ± 18%      +5.3        5.59 ± 3%  perf-profile.children.cycles-pp.tcp_sendmsg
      0.26 ± 16%      +5.3        5.60 ± 3%  perf-profile.children.cycles-pp.sock_sendmsg
      1.19 ±  5%      +0.5        1.68 ± 3%  perf-profile.self.cycles-pp.get_page_from_freelist
      6.10            +3.2        9.27 ± 4%  perf-profile.self.cycles-pp.memcpy_erms

On 7/9/2019 10:39 AM, Xing Zhengjun wrote: Hi Trond, On 7/8/2019 7:44 PM, Trond Myklebust wrote: I've asked several times now about how to interpret your results. As far as I can tell from your numbers, the overhead appears to be entirely contained in the NUMA section of your results. IOW: it would appear to be a scheduling overhead due to NUMA. I've been asking whether or not that is a correct interpretation of the numbers you published. Thanks for your feedback. I used the same hardware and the same test parameters to test the two commits: e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()") 0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()") If it is caused by NUMA, why does only commit 0472e47660 show the throughput decrease? The filesystem we test is NFS, and commit 0472e47660 is related to the network; could you help to check whether there are any other clues for the regression? Thanks. -- Zhengjun Xing
Re: [PATCH v2] tracing: Add verbose gfp_flag printing to synthetic events
Hi Tom, On 7/11/2019 11:42 PM, Tom Zanussi wrote: Hi Zhengjun, The patch itself looks fine to me, but could you please create a v3 with a couple changes to the commit message? I noticed you dropped your original commit message - please add it back and combine with part of mine, as below. Also, please keep your original Subject line ('[PATCH] trace:add "gfp_t" support in synthetic_events') (but the first word after trace:, 'add', should be capitalized.) Thanks. I will send v3 version patch soon. On Thu, 2019-07-11 at 16:46 +0800, Zhengjun Xing wrote: Add on top of 'trace:add "gfp_t" support in synthetic_events'. Please remove this part but keep the part below. Prints the gfp flags as hex in addition to the human-readable flag string. Example output: whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO) rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC) rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC) So basically, something like this: [PATCH] trace: Add "gfp_t" support in synthetic_events Add "gfp_t" support in synthetic_events, then the "gfp_t" type parameter in some functions can be traced. Print the gfp flags as hex in addition to the human-readable flag string. 
Example output: whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO) rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC) rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC) Signed-off-by: Tom Zanussi Signed-off-by: Zhengjun Xing Thanks, Tom --- kernel/trace/trace_events_hist.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index ca6b0dff60c5..938ef3f54c5c 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -13,6 +13,10 @@ #include #include +/* for gfp flag names */ +#include +#include + #include "tracing_map.h" #include "trace.h" #include "trace_dynevent.h" @@ -752,6 +756,8 @@ static int synth_field_size(char *type) size = sizeof(unsigned long); else if (strcmp(type, "pid_t") == 0) size = sizeof(pid_t); + else if (strcmp(type, "gfp_t") == 0) + size = sizeof(gfp_t); else if (synth_field_is_string(type)) size = synth_field_string_size(type); @@ -792,6 +798,8 @@ static const char *synth_field_fmt(char *type) fmt = "%lu"; else if (strcmp(type, "pid_t") == 0) fmt = "%d"; + else if (strcmp(type, "gfp_t") == 0) + fmt = "%x"; else if (synth_field_is_string(type)) fmt = "%s"; @@ -834,9 +838,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, i == se->n_fields - 1 ? "" : " "); n_u64 += STR_VAR_LEN_MAX / sizeof(u64); } else { + struct trace_print_flags __flags[] = { + __def_gfpflag_names, {-1, NULL} }; + trace_seq_printf(s, print_fmt, se->fields[i]->name, entry->fields[n_u64], i == se->n_fields - 1 ? "" : " "); + + if (strcmp(se->fields[i]->type, "gfp_t") == 0) { + trace_seq_puts(s, " ("); + trace_print_flags_seq(s, "|", + entry->fields[n_u64], + __flags); + trace_seq_putc(s, ')'); + } n_u64++; } } -- Zhengjun Xing
Re: [PATCH] trace:add "gfp_t" support in synthetic_events
Hi Tom, On 7/11/2019 3:51 AM, Tom Zanussi wrote: Hi Zhengjun, On Thu, 2019-07-04 at 10:55 +0800, Zhengjun Xing wrote: Add "gfp_t" support in synthetic_events, then the "gfp_t" type parameter in some functions can be traced. Signed-off-by: Zhengjun Xing --- kernel/trace/trace_events_hist.c | 4 1 file changed, 4 insertions(+) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index ca6b0dff60c5..0d3ab01b7cb5 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -752,6 +752,8 @@ static int synth_field_size(char *type) size = sizeof(unsigned long); else if (strcmp(type, "pid_t") == 0) size = sizeof(pid_t); + else if (strcmp(type, "gfp_t") == 0) + size = sizeof(gfp_t); else if (synth_field_is_string(type)) size = synth_field_string_size(type); @@ -792,6 +794,8 @@ static const char *synth_field_fmt(char *type) fmt = "%lu"; else if (strcmp(type, "pid_t") == 0) fmt = "%d"; + else if (strcmp(type, "gfp_t") == 0) + fmt = "%u"; else if (synth_field_is_string(type)) fmt = "%s"; This will work, but I think it would be better to display as hex, and also show the flags in human-readable form. How about adding something like this on top of your patch?: Thanks, I will add it to the v2 version patch. [PATCH] tracing: Add verbose gfp_flag printing to synthetic events Add on top of 'trace:add "gfp_t" support in synthetic_events'. Prints the gfp flags as hex in addition to the human-readable flag string. 
Example output: whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO) rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC) rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC) Signed-off-by: Tom Zanussi --- kernel/trace/trace_events_hist.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 0d3ab01..aeb4449 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -13,6 +13,10 @@ #include #include +/* for gfp flag names */ +#include +#include + #include "tracing_map.h" #include "trace.h" #include "trace_dynevent.h" @@ -795,7 +799,7 @@ static const char *synth_field_fmt(char *type) else if (strcmp(type, "pid_t") == 0) fmt = "%d"; else if (strcmp(type, "gfp_t") == 0) - fmt = "%u"; + fmt = "%x"; else if (synth_field_is_string(type)) fmt = "%s"; @@ -838,9 +842,20 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter, i == se->n_fields - 1 ? "" : " "); n_u64 += STR_VAR_LEN_MAX / sizeof(u64); } else { + struct trace_print_flags __flags[] = + { __def_gfpflag_names, { -1, NULL }}; + trace_seq_printf(s, print_fmt, se->fields[i]->name, entry->fields[n_u64], i == se->n_fields - 1 ? "" : " "); + + if (strcmp(se->fields[i]->type, "gfp_t") == 0) { + trace_seq_puts(s, " ("); + trace_print_flags_seq(s, "|", + entry->fields[n_u64], + __flags); + trace_seq_putc(s, ')'); + } n_u64++; } } -- Zhengjun Xing
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
Hi Trond, On 7/8/2019 7:44 PM, Trond Myklebust wrote: I've asked several times now about how to interpret your results. As far as I can tell from your numbers, the overhead appears to be entirely contained in the NUMA section of your results. IOW: it would appear to be a scheduling overhead due to NUMA. I've been asking whether or not that is a correct interpretation of the numbers you published. Thanks for your feedback. I used the same hardware and the same test parameters to test the two commits: e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()") 0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()") If it is caused by NUMA, why does only commit 0472e47660 show the throughput decrease? The filesystem we test is NFS, and commit 0472e47660 is related to the network; could you help to check whether there are any other clues for the regression? Thanks. -- Zhengjun Xing
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
Hi Trond, I retested, and it can still be reproduced. I tested with the following parameters, changing only "nr_threads"; the results are below. The more threads in the test, the larger the regression. Could you help to check? Thanks.

In testcase: fsmark on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory with following parameters: iterations: 20x nr_threads: 1t disk: 1BRD_48G fs: xfs fs2: nfsv4 filesize: 4M test_size: 80G sync_method: fsyncBeforeClose cpufreq_governor: performance test-description: The fsmark is a file system benchmark to test synchronous write workloads, for example, mail server workloads. test-url: https://sourceforge.net/projects/fsmark/

commit:
  e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
  0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e  0472e476604998c127f3c80d291
----------------  ---------------------------
     59.74    -0.7%    59.32   fsmark.files_per_sec (nr_threads=1)
    114.06    -8.1%   104.83   fsmark.files_per_sec (nr_threads=2)
    184.53   -13.1%   160.29   fsmark.files_per_sec (nr_threads=4)
    257.05   -15.5%   217.22   fsmark.files_per_sec (nr_threads=8)
    306.08   -15.5%   258.68   fsmark.files_per_sec (nr_threads=16)
    498.34   -22.7%   385.33   fsmark.files_per_sec (nr_threads=32)
    527.29   -22.6%   407.96   fsmark.files_per_sec (nr_threads=64)

On 5/31/2019 11:27 AM, Xing Zhengjun wrote: On 5/31/2019 3:10 AM, Trond Myklebust wrote: [ . . . earlier exchange quoted in full below . . . ]
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
On 5/31/2019 3:10 AM, Trond Myklebust wrote: On Thu, 2019-05-30 at 15:20 +0800, Xing Zhengjun wrote: On 5/30/2019 10:00 AM, Trond Myklebust wrote: Hi Xing, On Thu, 2019-05-30 at 09:35 +0800, Xing Zhengjun wrote: Hi Trond, On 5/20/2019 1:54 PM, kernel test robot wrote: Greeting, FYI, we noticed a 16.0% improvement of fsmark.app_overhead due to commit: commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert socket page send code to use iov_iter()") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: fsmark on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory with following parameters: iterations: 1x nr_threads: 64t disk: 1BRD_48G fs: xfs fs2: nfsv4 filesize: 4M test_size: 40G sync_method: fsyncBeforeClose cpufreq_governor: performance test-description: The fsmark is a file system benchmark to test synchronous write workloads, for example, mail server workloads. test-url: https://sourceforge.net/projects/fsmark/ Details are as below: To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml

compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase: gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
  e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
  0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e  0472e476604998c127f3c80d291
----------------  ---------------------------
  fail:runs    %reproduction    fail:runs
       :4          50%             2:4      dmesg.WARNING:at#for_ip_interrupt_entry/0x
    %stddev       %change
  15118573 ± 2%    +16.0%     17538083     fsmark.app_overhead
    510.93         -22.7%       395.12     fsmark.files_per_sec
     24.90         +22.8%        30.57     fsmark.time.elapsed_time
     24.90         +22.8%        30.57     fsmark.time.elapsed_time.max
    288.00 ± 2%    -27.8%       208.00     fsmark.time.percent_of_cpu_this_job_got
     70.03 ± 2%    -11.3%        62.14     fsmark.time.system_time

Do you have time to take a look at this regression? From your stats, it looks to me as if the problem is increased NUMA overhead. Pretty much everything else appears to be the same or actually performing better than previously. Am I interpreting that correctly? The real regression is that the throughput (fsmark.files_per_sec) is decreased by 22.7%. Understood, but I'm trying to make sense of why. I'm not able to reproduce this, so I have to rely on your performance stats to understand where the 22.7% regression is coming from. As far as I can see, the only numbers in the stats you published that are showing a performance regression (other than the fsmark number itself) are the NUMA numbers. Is that a correct interpretation? We re-tested the case yesterday, and the result is almost the same. We will do more testing and also check the test case itself; if you need more information, please let me know, thanks. If my interpretation above is correct, then I'm not seeing where this patch would be introducing new NUMA regressions. It is just converting from using one method of doing socket I/O to another. Could it perhaps be a memory artefact due to your running the NFS client and server on the same machine? Apologies for pushing back a little, but I just don't have the hardware available to test NUMA configurations, so I'm relying on external testing for the above kind of scenario. Thanks for looking at this. If you need more information, please let me know. Thanks Trond -- Zhengjun Xing
Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
Hi Trond,

On 5/20/2019 1:54 PM, kernel test robot wrote:

Greeting,

FYI, we noticed a 16.0% improvement of fsmark.app_overhead due to commit:

commit: 0472e476604998c127f3c80d291113e77c5676ac ("SUNRPC: Convert socket page send code to use iov_iter()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
with following parameters:

  iterations: 1x
  nr_threads: 64t
  disk: 1BRD_48G
  fs: xfs
  fs2: nfsv4
  filesize: 4M
  test_size: 40G
  sync_method: fsyncBeforeClose
  cpufreq_governor: performance

test-description: The fsmark is a file system benchmark to test synchronous write workloads, for example, mail servers workload.
test-url: https://sourceforge.net/projects/fsmark/

Details are as below:

To reproduce:

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-7/performance/1BRD_48G/4M/nfsv4/xfs/1x/x86_64-rhel-7.6/64t/debian-x86_64-2018-04-03.cgz/fsyncBeforeClose/lkp-ivb-ep01/40G/fsmark

commit:
  e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
  0472e47660 ("SUNRPC: Convert socket page send code to use iov_iter()")

e791f8e9380d945e  0472e476604998c127f3c80d291
----------------  ---------------------------
       fail:runs  %reproduction  fail:runs
              :4            50%        2:4    dmesg.WARNING:at#for_ip_interrupt_entry/0x
        %stddev        %change     %stddev
  15118573 ± 2%         +16.0%    17538083        fsmark.app_overhead
    510.93              -22.7%      395.12        fsmark.files_per_sec
     24.90              +22.8%       30.57        fsmark.time.elapsed_time
     24.90              +22.8%       30.57        fsmark.time.elapsed_time.max
    288.00 ± 2%         -27.8%      208.00        fsmark.time.percent_of_cpu_this_job_got
     70.03 ± 2%         -11.3%       62.14        fsmark.time.system_time
   4391964              -16.7%     3658341        meminfo.max_used_kB
      6.10 ± 4%           +1.9        7.97 ± 3%   mpstat.cpu.all.iowait%
      0.27                -0.0        0.24 ± 3%   mpstat.cpu.all.soft%
  13668070 ± 40%       +118.0%    29801846 ± 19%  numa-numastat.node0.local_node
      1364 ± 40%       +117.9%    29810258 ± 19%  numa-numastat.node0.numa_hit
      5.70 ± 3%         +32.1%        7.53 ± 3%   iostat.cpu.iowait
     16.42 ± 2%          -5.8%       15.47        iostat.cpu.system
      2.57               -4.1%        2.46        iostat.cpu.user
   1406781 ± 2%         -15.5%     1188498        vmstat.io.bo
    251792 ± 3%         -16.6%      209928        vmstat.system.cs
     84841               -1.9%       83239        vmstat.system.in
  97374502 ± 20%        +66.1%   1.617e+08 ± 17%  cpuidle.C1E.time
    573934 ± 19%        +44.6%      829662 ± 26%  cpuidle.C1E.usage
 5.892e+08 ± 8%         +15.3%   6.796e+08 ± 2%   cpuidle.C6.time
   1968016 ± 3%         -15.1%     1670867 ± 3%   cpuidle.POLL.time
    106420 ± 47%        +86.2%      198108 ± 35%  numa-meminfo.node0.Active
    106037 ± 48%        +86.2%      197395 ± 35%  numa-meminfo.node0.Active(anon)
    105052 ± 48%        +86.6%      196037 ± 35%  numa-meminfo.node0.AnonPages
    212876 ± 24%        -41.5%      124572 ± 56%  numa-meminfo.node1.Active
    211801 ± 24%        -41.5%      123822 ± 56%  numa-meminfo.node1.Active(anon)
    208559 ± 24%        -42.2%      120547 ± 57%  numa-meminfo.node1.AnonPages
      9955               +1.6%       10116        proc-vmstat.nr_kernel_stack
    452.25 ± 59%       +280.9%        1722 ±100%  proc-vmstat.numa_hint_faults_local
  33817303              +55.0%    52421773 ± 5%   proc-vmstat.numa_hit
  33804286              +55.0%    52408807 ± 5%   proc-vmstat.numa_local
  33923002              +81.8%    61663426 ± 5%   proc-vmstat.pgalloc_normal
    184765               +9.3%      201985        proc-vmstat.pgfault
  12840986             +216.0%    40581327 ± 7%   proc-vmstat.pgfree
     31447 ± 11%        -26.1%       23253 ± 13%  sched_debug.cfs_rq:/.min_vruntime.max
      4241 ± 3%         -12.2%        3724 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
     20631 ± 11%        -36.7%       13069 ± 29%  sched_debug.cfs_rq:/.spread0.max
      4238 ± 4%         -12.1%        3724 ± 11%  sched_debug.cfs_rq:/.spread0.stddev
    497105 ± 19%        -16.0%          41 ± 4%   sched_debug.cpu.avg_idle.avg
     21199 ± 10%        -12.0%       18650 ± 3%   sched_debug.cpu.nr_load_updates.max
      2229 ± 10%        -15.0%
Re: [PATCH] USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
On 3/22/2018 8:03 PM, Greg KH wrote:
On Wed, Mar 21, 2018 at 01:29:42PM +0800, Zhengjun Xing wrote:

USB3 hubs don't support global suspend. Per USB3 specification section 10.10, Enhanced SuperSpeed hubs only support selective suspend and resume; they do not support global suspend/resume, where the hub downstream-facing port states are not affected.

When the system enters hibernation, it first enters the freeze process, where only the root hub enters suspend; usb_port_suspend() is not called for other devices, and their suspend status flags are not set. Other devices are expected to suspend globally. Some external USB3 hubs will suspend the downstream-facing port at global suspend. These devices won't be resumed at thaw because the suspend status flag is not set.

A USB3 removable hard disk connected through such a USB3 hub will not resume at thaw: it fails to synchronize the SCSI cache, returns a "cmd cmplt err -71" error, and needs a 60-second timeout, hanging the system for 60 seconds before the USB host resets the port so the USB3 removable hard disk can recover.

Fix this by always calling usb_port_suspend() during freeze for USB3 devices.

This should go to the stable trees as well, right?

greg k-h

Yes. It should go to the stable trees.
RE: [PATCH 1/2] tracing: Handle NULL formats in hold_module_trace_bprintk_format()
I agree with you. You can also add me to the "Signed-off-by".

Best Regards,
Zhengjun

-----Original Message-----
From: Steven Rostedt [mailto:rost...@goodmis.org]
Sent: Monday, June 20, 2016 9:53 PM
To: linux-kernel@vger.kernel.org
Cc: Linus Torvalds; Ingo Molnar; Andrew Morton; Xing, Zhengjun; Namhyung Kim; sta...@vger.kernel.org
Subject: [PATCH 1/2] tracing: Handle NULL formats in hold_module_trace_bprintk_format()

From: "Steven Rostedt (Red Hat)"

If a task uses a non-constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section.

The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saving them to the list if new). But this function calls lookup_format(), which does a strcmp() against the value (which is now NULL) and can cause a kernel oops.

This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()"), which added "__used" to the trace_printk_fmt variable; before that, the kernel simply optimized it out (no NULL value was saved).

The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL.
Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.x...@intel.com
Reported-by: xingzhen
Acked-by: Namhyung Kim
Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()")
Cc: sta...@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt
---
 kernel/trace/trace_printk.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index f96f0383f6c6..ad1d6164e946 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -36,6 +36,10 @@ struct trace_bprintk_fmt {
 static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
 {
 	struct trace_bprintk_fmt *pos;
+
+	if (!fmt)
+		return ERR_PTR(-EINVAL);
+
 	list_for_each_entry(pos, &trace_bprintk_fmt_list, list) {
 		if (!strcmp(pos->fmt, fmt))
 			return pos;
@@ -57,7 +61,8 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
 	for (iter = start; iter < end; iter++) {
 		struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
 		if (tb_fmt) {
-			*iter = tb_fmt->fmt;
+			if (!IS_ERR(tb_fmt))
+				*iter = tb_fmt->fmt;
 			continue;
 		}
--
2.8.0.rc3