Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
Alexey Gladkov writes: > On Wed, Feb 24, 2021 at 12:50:21PM -0600, Eric W. Biederman wrote: >> Alexey Gladkov writes: >> >> > On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote: >> >> kernel test robot writes: >> >> >> >> > Greeting, >> >> > >> >> > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec >> >> > due to commit: >> >> > >> >> > >> >> > commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] >> >> > Reimplement RLIMIT_SIGPENDING on top of ucounts") >> >> > url: >> >> > https://github.com/0day-ci/linux/commits/Alexey-Gladkov/Count-rlimits-in-each-user-namespace/20210222-175836 >> >> > base: >> >> > https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git >> >> > next >> >> > >> >> > in testcase: stress-ng >> >> > on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz >> >> > with 112G memory >> >> > with following parameters: >> >> > >> >> > nr_threads: 100% >> >> > disk: 1HDD >> >> > testtime: 60s >> >> > class: interrupt >> >> > test: sigsegv >> >> > cpufreq_governor: performance >> >> > ucode: 0x42e >> >> > >> >> > >> >> > In addition to that, the commit also has significant impact on the >> >> > following tests: >> >> >> >> Thank you. Now we have a sense of where we need to test the performance >> >> of these changes carefully. >> > >> > One of the reasons for this is that I rolled back the patch that changed >> > the ucounts.count type to atomic_t. Now get_ucounts() is forced to use a >> > spin_lock to increase the reference count. >> >> Which given the hickups with getting a working version seems justified. >> >> Now we can add incremental patches on top to improve the performance. > > I'm not sure that get_ucounts() should be used in __sigqueue_alloc() [1]. > I tried removing it and running KASAN tests that were failing before. So > far, I have not found any problems. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/legion/linux.git/tree/kernel/signal.c?h=patchset/per-userspace-rlimit/v7.1&id=2d4a2e2be7db42c95acb98abfc2a9b370ddd0604#n428 Hmm. The code you posted still seems to include the get_ucounts. I like the idea of not needing to increment and decrement the ucount reference count every time a signal is sent, unfortunately there is a problem. The way we have implemented setresuid allows different threads in a threaded application to have different cred->user values. That is actually an extension of what posix supports and pthreads will keep the creds of a process in sync. Still I recall looking into this a few years ago and there were a few applications that take advantage of the linux behavior. In principle I think it is possible to hold a ucount reference in somewhere such as task->signal. In practice there are enough complicating factors I don't immediately see how to implement that. If the creds were stored in signal_struct instead of in task_struct we could simply move the sigpending counts in set_user, when the uid of a process changed. With the current state I don't know how to pick which is the real user. Eric
Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
On Wed, Feb 24, 2021 at 12:50:21PM -0600, Eric W. Biederman wrote: > Alexey Gladkov writes: > > > On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote: > >> kernel test robot writes: > >> > >> > Greeting, > >> > > >> > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec due > >> > to commit: > >> > > >> > > >> > commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] > >> > Reimplement RLIMIT_SIGPENDING on top of ucounts") > >> > url: > >> > https://github.com/0day-ci/linux/commits/Alexey-Gladkov/Count-rlimits-in-each-user-namespace/20210222-175836 > >> > base: > >> > https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git > >> > next > >> > > >> > in testcase: stress-ng > >> > on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz > >> > with 112G memory > >> > with following parameters: > >> > > >> > nr_threads: 100% > >> > disk: 1HDD > >> > testtime: 60s > >> > class: interrupt > >> > test: sigsegv > >> > cpufreq_governor: performance > >> > ucode: 0x42e > >> > > >> > > >> > In addition to that, the commit also has significant impact on the > >> > following tests: > >> > >> Thank you. Now we have a sense of where we need to test the performance > >> of these changes carefully. > > > > One of the reasons for this is that I rolled back the patch that changed > > the ucounts.count type to atomic_t. Now get_ucounts() is forced to use a > > spin_lock to increase the reference count. > > Which given the hickups with getting a working version seems justified. > > Now we can add incremental patches on top to improve the performance. I'm not sure that get_ucounts() should be used in __sigqueue_alloc() [1]. I tried removing it and running KASAN tests that were failing before. So far, I have not found any problems. [1] https://git.kernel.org/pub/scm/linux/kernel/git/legion/linux.git/tree/kernel/signal.c?h=patchset/per-userspace-rlimit/v7.1&id=2d4a2e2be7db42c95acb98abfc2a9b370ddd0604#n428 -- Rgrds, legion
Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
On Wed, Feb 24, 2021 at 10:38 AM Alexey Gladkov wrote: > > One of the reasons for this is that I rolled back the patch that changed > the ucounts.count type to atomic_t. Now get_ucounts() is forced to use a > spin_lock to increase the reference count. Yeah, that definitely should be an atomic type, since the extended use of ucounts clearly puts way too much pressure on that ucount lock. I remember complaining about one version of that patch, but my complaint wasabout it changing semantics of the saturation logic (and I think it was also wrong because it still kept the spinlock for get_ucounts(), so it didn't even take advantage of the atomic refcount). Side note: I don't think a refcount_t" is necessarily the right thing to do, since the ucount reference counter does its own saturation logic, and the refcount_t version is imho not great. So it probably just needs to use an atomic_t, and do the saturation thing manually. Side note: look at try_get_page(). That one actually does refcounting with overflow protection better than refcount_t, in my opinion. But I am obviously biased, since I wrote it ;) See commits 88b1a17dfc3e mm: add 'try_get_page()' helper function f958d7b528b1 mm: make page ref count overflow check tighter and more explicit with that "page->_recount" being just a regular atomic_t. Linus
Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
Alexey Gladkov writes: > On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote: >> kernel test robot writes: >> >> > Greeting, >> > >> > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec due >> > to commit: >> > >> > >> > commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] >> > Reimplement RLIMIT_SIGPENDING on top of ucounts") >> > url: >> > https://github.com/0day-ci/linux/commits/Alexey-Gladkov/Count-rlimits-in-each-user-namespace/20210222-175836 >> > base: >> > https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next >> > >> > in testcase: stress-ng >> > on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with >> > 112G memory >> > with following parameters: >> > >> >nr_threads: 100% >> >disk: 1HDD >> >testtime: 60s >> >class: interrupt >> >test: sigsegv >> >cpufreq_governor: performance >> >ucode: 0x42e >> > >> > >> > In addition to that, the commit also has significant impact on the >> > following tests: >> >> Thank you. Now we have a sense of where we need to test the performance >> of these changes carefully. > > One of the reasons for this is that I rolled back the patch that changed > the ucounts.count type to atomic_t. Now get_ucounts() is forced to use a > spin_lock to increase the reference count. Which given the hickups with getting a working version seems justified. Now we can add incremental patches on top to improve the performance. Eric
Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote: > kernel test robot writes: > > > Greeting, > > > > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec due to > > commit: > > > > > > commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] > > Reimplement RLIMIT_SIGPENDING on top of ucounts") > > url: > > https://github.com/0day-ci/linux/commits/Alexey-Gladkov/Count-rlimits-in-each-user-namespace/20210222-175836 > > base: > > https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next > > > > in testcase: stress-ng > > on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with > > 112G memory > > with following parameters: > > > > nr_threads: 100% > > disk: 1HDD > > testtime: 60s > > class: interrupt > > test: sigsegv > > cpufreq_governor: performance > > ucode: 0x42e > > > > > > In addition to that, the commit also has significant impact on the > > following tests: > > Thank you. Now we have a sense of where we need to test the performance > of these changes carefully. One of the reasons for this is that I rolled back the patch that changed the ucounts.count type to atomic_t. Now get_ucounts() is forced to use a spin_lock to increase the reference count. -- Rgrds, legion
Re: d28296d248: stress-ng.sigsegv.ops_per_sec -82.7% regression
kernel test robot writes: > Greeting, > > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec due to > commit: > > > commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] Reimplement > RLIMIT_SIGPENDING on top of ucounts") > url: > https://github.com/0day-ci/linux/commits/Alexey-Gladkov/Count-rlimits-in-each-user-namespace/20210222-175836 > base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git > next > > in testcase: stress-ng > on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with > 112G memory > with following parameters: > > nr_threads: 100% > disk: 1HDD > testtime: 60s > class: interrupt > test: sigsegv > cpufreq_governor: performance > ucode: 0x42e > > > In addition to that, the commit also has significant impact on the > following tests: Thank you. Now we have a sense of where we need to test the performance of these changes carefully. Eric > +--+---+ > | testcase: change | stress-ng: stress-ng.sigq.ops_per_sec -56.1% regression > | > | test machine | 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz > with 112G memory | > | test parameters | class=interrupt > | > | | cpufreq_governor=performance > | > | | disk=1HDD > | > | | nr_threads=100% > | > | | test=sigq > | > | | testtime=60s > | > | | ucode=0x42e > | > +--+---+ > > > If you fix the issue, kindly add following tag > Reported-by: kernel test robot > > > Details are as below: > --> > > > To reproduce: > > git clone https://github.com/intel/lkp-tests.git > cd lkp-tests > bin/lkp installjob.yaml # job file is attached in > this email > bin/lkp split-job --compatible job.yaml > bin/lkp runcompatible-job.yaml > > = > class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode: > > interrupt/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/sigsegv/stress-ng/60s/0x42e > > commit: > 4660d663b4 ("Reimplement RLIMIT_MSGQUEUE on top of ucounts") > d28296d248 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") > > 4660d663b4207ce6 d28296d2484fa11e94dff65e93e > --- >fail:runs %reproductionfail:runs >| | | > 14:6 -217% 1:6 > perf-profile.children.cycles-pp.error_entry > 12:6 -179% 1:6 > perf-profile.self.cycles-pp.error_entry > %stddev %change %stddev > \ |\ > 4.766e+08 -82.7% 82358308stress-ng.sigsegv.ops >7942807 -82.7%1372639stress-ng.sigsegv.ops_per_sec > 29408 -77.5% 6621 ± 61% > stress-ng.time.file_system_inputs > 1566 +69.4% 2653stress-ng.time.system_time > 1274 -84.8% 193.22 ± 8% stress-ng.time.user_time > 12458 ± 5% +37.2% 17097 ± 5% numa-meminfo.node1.Active(anon) > 51.41 +66.5% 85.59iostat.cpu.system > 41.17 -84.2% 6.50 ± 7% iostat.cpu.user > 3040 ± 4% +37.9% 4193 ± 4% numa-vmstat.node1.nr_active_anon > 3040 ± 4% +37.9% 4193 ± 4% > numa-vmstat.node1.nr_zone_active_anon > 50.83 +67.2% 85.00vmstat.cpu.sy > 40.50 -85.6% 5.83 ± 11% vmstat.cpu.us > 225.33 -77.7% 50.33 ± 62% vmstat.io.bi > 7.00 -100.0% 0.00vmstat.memory.buff > 20735 ± 2% -14.1% 17812 ± 5% meminfo.Active > 13506 ± 3% +31.9% 17812 ± 5% meminfo.Active(anon) > 7228 -100.0% 0.00meminfo.Active(file) > 29308 +18.4% 34687 ± 2% meminfo.Shmem > 202067-9.5% 182899meminfo.VmallocUsed > 0.01 ± 17% -0.00.00 ± 10% mpstat.cpu.all.iowait% > 1.04-0.10.92mpst