Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression
On Wed, Nov 18, 2020 at 10:17:27AM -0800, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 5:51 AM Jan Kara wrote:
> >
> > On Mon 16-11-20 19:35:31, John Hubbard wrote:
> > >
> > > On 11/16/20 6:48 PM, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -45.0% regression of
> > > > phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> > > >
> > >
> > > That's a huge slowdown...
> > >
> > > >
> > > > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup:
> > > > page->hpage_pinned_refcount: exact pin counts for huge pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > ...but that commit happened in April, 2020. Surely if this were a serious
> > > issue we would have some other indication...is this worth following up
> > > on?? I'm inclined to ignore it, honestly.
> >
> > Why this was detected so late is a fair question although it doesn't quite
> > invalidate the report...
>
> I don't know what specifically happened in this case, perhaps someone
> from the lkp team can comment?

- some extra phoronix test suites are enabled/fixed gradually so we will
  have better coverage
- we scan kernel releases within the year to baseline the performance, it
  may trigger bisection if one release has regressed and not recovered.

With this continuous effort, 0-day ci can detect the changes on mainline.

> However, the myth / contention that
> "surely someone else would have noticed by now" is why the lkp project
> was launched. Kernels regressed without much complaint and it wasn't
> until much later in the process, around the time enterprise distros
> rebased to new kernels, did end users start filing performance loss
> regression reports. Given -stable kernel releases, 6-7 months is still
> faster than many end user upgrade cycles to new kernel baselines.
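To make the "baseline, then bisect" idea above a little more concrete, here is a minimal sketch of a performance bisection. It is not the actual lkp / 0-day implementation: the function name `first_regressed`, the 10% tolerance, and the `older` / `newer` placeholder commits are all assumptions made for illustration. The only values taken from this thread are the two commit IDs and their reported mean scores.

```python
from typing import Callable, Sequence


def first_regressed(
    commits: Sequence[str],
    measure: Callable[[str], float],
    baseline: float,
    tolerance: float = 0.10,  # assumed threshold, not an lkp setting
) -> str:
    """Return the first commit scoring more than `tolerance` below `baseline`.

    Assumes commits[0] is good, commits[-1] is bad, and that the regression
    persists once introduced ("regressed and not recovered").
    """
    def is_bad(commit: str) -> bool:
        return measure(commit) < baseline * (1.0 - tolerance)

    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Illustrative scores only: the two real commits use the means from the report;
# "older" and "newer" are hypothetical placeholders that reuse those values.
scores = {"older": 4585.0, "3faa52c03f": 4585.0, "47e29d32af": 2522.0, "newer": 2522.0}
history = ["older", "3faa52c03f", "47e29d32af", "newer"]
culprit = first_regressed(history, scores.__getitem__, baseline=4585.0)
print(culprit)
```

A real harness would run the benchmark several times per commit and compare means against the run-to-run noise, rather than relying on a single lookup as this sketch does.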
Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression
On 11/18/20 10:17 AM, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 5:51 AM Jan Kara wrote:
> >
> > On Mon 16-11-20 19:35:31, John Hubbard wrote:
> > >
> > > On 11/16/20 6:48 PM, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -45.0% regression of
> > > > phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> > > >
> > >
> > > That's a huge slowdown...
> > >
> > > >
> > > > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup:
> > > > page->hpage_pinned_refcount: exact pin counts for huge pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > ...but that commit happened in April, 2020. Surely if this were a serious
> > > issue we would have some other indication...is this worth following up
> > > on?? I'm inclined to ignore it, honestly.
> >
> > Why this was detected so late is a fair question although it doesn't quite
> > invalidate the report...
>
> I don't know what specifically happened in this case, perhaps someone
> from the lkp team can comment? However, the myth / contention that
> "surely someone else would have noticed by now" is why the lkp project
> was launched. Kernels regressed without much complaint and it wasn't
> until much later in the process, around the time enterprise distros
> rebased to new kernels, did end users start filing performance loss
> regression reports. Given -stable kernel releases, 6-7 months is still
> faster than many end user upgrade cycles to new kernel baselines.

I see, thanks for explaining. I'll take a peek, then.

thanks,
-- 
John Hubbard
NVIDIA
Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression
On Wed, Nov 18, 2020 at 5:51 AM Jan Kara wrote:
>
> On Mon 16-11-20 19:35:31, John Hubbard wrote:
> >
> > On 11/16/20 6:48 PM, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed a -45.0% regression of
> > > phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> > >
> >
> > That's a huge slowdown...
> >
> > >
> > > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup:
> > > page->hpage_pinned_refcount: exact pin counts for huge pages")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > ...but that commit happened in April, 2020. Surely if this were a serious
> > issue we would have some other indication...is this worth following up
> > on?? I'm inclined to ignore it, honestly.
>
> Why this was detected so late is a fair question although it doesn't quite
> invalidate the report...

I don't know what specifically happened in this case, perhaps someone
from the lkp team can comment? However, the myth / contention that
"surely someone else would have noticed by now" is why the lkp project
was launched. Kernels regressed without much complaint and it wasn't
until much later in the process, around the time enterprise distros
rebased to new kernels, did end users start filing performance loss
regression reports. Given -stable kernel releases, 6-7 months is still
faster than many end user upgrade cycles to new kernel baselines.
Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression
On Mon 16-11-20 19:35:31, John Hubbard wrote:
>
> On 11/16/20 6:48 PM, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -45.0% regression of
> > phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
> >
>
> That's a huge slowdown...
>
> >
> > commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup:
> > page->hpage_pinned_refcount: exact pin counts for huge pages")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> ...but that commit happened in April, 2020. Surely if this were a serious
> issue we would have some other indication...is this worth following up
> on?? I'm inclined to ignore it, honestly.

Why this was detected so late is a fair question although it doesn't quite
invalidate the report...

The NPB benchmark appears to be a supercomputing benchmark so conceivably it
could be heavily using THPs. The question is why it would be a heavy user of
pinning as well but even that is imaginable considering that MPI is in use
etc.

So maybe it is worth trying to reproduce this because heavy THP + pinning
users might indeed be rare and only those would show regressions in THP
pinning performance...

								Honza
-- 
Jan Kara
SUSE Labs, CR
Re: [mm/gup] 47e29d32af: phoronix-test-suite.npb.FT.A.total_mop_s -45.0% regression
On 11/16/20 6:48 PM, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -45.0% regression of
> phoronix-test-suite.npb.FT.A.total_mop_s due to commit:
>

That's a huge slowdown...

>
> commit: 47e29d32afba11b13efb51f03154a8cf22fb4360 ("mm/gup:
> page->hpage_pinned_refcount: exact pin counts for huge pages")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

...but that commit happened in April, 2020. Surely if this were a serious
issue we would have some other indication...is this worth following up
on?? I'm inclined to ignore it, honestly.

thanks,
-- 
John Hubbard
NVIDIA

> in testcase: phoronix-test-suite
> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
> with following parameters:
>
>     test: npb-1.3.1
>     option_a: FT.A
>     cpufreq_governor: performance
>     ucode: 0x5002f01
>
> test-description: The Phoronix Test Suite is the most comprehensive testing
> and benchmarking platform available that provides an extensible framework
> for which new tests can be easily added.
> test-url: http://www.phoronix-test-suite.com/
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot
>
> Details are as below:
> -->
>
> To reproduce:
>
>     git clone https://github.com/intel/lkp-tests.git
>     cd lkp-tests
>     bin/lkp install job.yaml  # job file is attached in this email
>     bin/lkp run     job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase/ucode:
>   gcc-9/performance/x86_64-rhel-8.3/FT.A/debian-x86_64-phoronix/lkp-csl-2sp8/npb-1.3.1/phoronix-test-suite/0x5002f01
>
> commit:
>   3faa52c03f ("mm/gup: track FOLL_PIN pages")
>   47e29d32af ("mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages")
>
> 3faa52c03f440d1b 47e29d32afba11b13efb51f0315
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>           1:4          -25%            :4     kmsg.Spurious_LAPIC_timer_interrupt_on_cpu
>          %stddev     %change         %stddev
>              \          |                \
>       4585 ±  2%     -45.0%       2522        phoronix-test-suite.npb.FT.A.total_mop_s
>       1223 ±  4%     +40.2%       1714        phoronix-test-suite.time.percent_of_cpu_this_job_got
>
> [ASCII plot: phoronix-test-suite.npb.FT.A.total_mop_s per sample, y-axis
>  from 2000 to 6500]
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
> Thanks,
> Oliver Sang
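As a sanity check on the %change column in the comparison table, the short sketch below recomputes the relative change from the rounded means printed in the report. The helper name `percent_change` is made up for this example. The report presumably computes its figures from unrounded per-run data (an assumption), so the second value comes out slightly lower than the +40.2% shown.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# Rounded means copied from the comparison table in the report
# (3faa52c03f is the base commit, 47e29d32af is the regressed one).
mop_s = percent_change(4585, 2522)
cpu = percent_change(1223, 1714)

print(f"total_mop_s:                 {mop_s:+.1f}%")
print(f"percent_of_cpu_this_job_got: {cpu:+.1f}%")
```

Read together, the two rows say that the benchmark delivered roughly 45% less throughput while the job consumed roughly 40% more CPU.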