Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Fri 29-04-16 20:54:13, Aaron Lu wrote:
> On Fri, Apr 29, 2016 at 11:29:36AM +0200, Michal Hocko wrote:
> > On Fri 29-04-16 16:59:37, Aaron Lu wrote:
> > > On Thu, Apr 28, 2016 at 01:21:35PM +0200, Michal Hocko wrote:
> > > > All of them are order-2 and this was a known problem for the "mm,
> > > > oom: rework oom detection" commit, and later patches should make
> > > > it much more resistant to failures for higher (!costly) orders.
> > > > So I would definitely encourage you to retest with the current
> > > > _complete_ mmotm tree.
> > >
> > > OK, will run the test on this branch:
> > >   https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.5
> > > with head commit:
> > >
> > > commit 81cc2e6f1e8bd81ebc7564a3cd3797844ee1712e
> > > Author: Michal Hocko
> > > Date:   Thu Apr 28 12:03:24 2016 +0200
> > >
> > >     drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
> > >
> > > Please let me know if this isn't right.
> >
> > Yes, that should contain all the oom related patches in the mmotm tree.
>
> The test shows commit 81cc2e6f1e doesn't OOM anymore and its throughput
> is 43609, the same level as 43802, so everything is fine :-)

Thanks a lot for double checking! This is highly appreciated!
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Fri, Apr 29, 2016 at 11:29:36AM +0200, Michal Hocko wrote:
> On Fri 29-04-16 16:59:37, Aaron Lu wrote:
> > On Thu, Apr 28, 2016 at 01:21:35PM +0200, Michal Hocko wrote:
> > > All of them are order-2 and this was a known problem for the "mm,
> > > oom: rework oom detection" commit, and later patches should make it
> > > much more resistant to failures for higher (!costly) orders. So I
> > > would definitely encourage you to retest with the current
> > > _complete_ mmotm tree.
> >
> > OK, will run the test on this branch:
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.5
> > with head commit:
> >
> > commit 81cc2e6f1e8bd81ebc7564a3cd3797844ee1712e
> > Author: Michal Hocko
> > Date:   Thu Apr 28 12:03:24 2016 +0200
> >
> >     drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
> >
> > Please let me know if this isn't right.
>
> Yes, that should contain all the oom related patches in the mmotm tree.

The test shows commit 81cc2e6f1e doesn't OOM anymore and its throughput
is 43609, the same level as 43802, so everything is fine :-)
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Fri 29-04-16 16:59:37, Aaron Lu wrote:
> On Thu, Apr 28, 2016 at 01:21:35PM +0200, Michal Hocko wrote:
> > All of them are order-2 and this was a known problem for the "mm,
> > oom: rework oom detection" commit, and later patches should make it
> > much more resistant to failures for higher (!costly) orders. So I
> > would definitely encourage you to retest with the current _complete_
> > mmotm tree.
>
> OK, will run the test on this branch:
>   https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.5
> with head commit:
>
> commit 81cc2e6f1e8bd81ebc7564a3cd3797844ee1712e
> Author: Michal Hocko
> Date:   Thu Apr 28 12:03:24 2016 +0200
>
>     drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
>
> Please let me know if this isn't right.

Yes, that should contain all the oom related patches in the mmotm tree.

Thanks!
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Thu, Apr 28, 2016 at 01:21:35PM +0200, Michal Hocko wrote:
> All of them are order-2 and this was a known problem for the "mm, oom:
> rework oom detection" commit, and later patches should make it much
> more resistant to failures for higher (!costly) orders. So I would
> definitely encourage you to retest with the current _complete_ mmotm
> tree.

OK, will run the test on this branch:
  https://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git since-4.5
with head commit:

commit 81cc2e6f1e8bd81ebc7564a3cd3797844ee1712e
Author: Michal Hocko
Date:   Thu Apr 28 12:03:24 2016 +0200

    drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable

Please let me know if this isn't right.
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Thu 28-04-16 17:45:23, Aaron Lu wrote:
> On 04/28/2016 04:57 PM, Michal Hocko wrote:
> > On Thu 28-04-16 13:17:08, Aaron Lu wrote:
[...]
> >> I have the same doubt too, but the results look really stable (only
> >> for commit 0da9597ac9c0, see below for more explanation).
> >
> > I cannot seem to find this sha1. Where does it come from? linux-next?
>
> Neither can I...
> The commit should come from the 0day Kbuild service I suppose, which is
> a robot to do automatic fetch/building etc.
> Could it be that the commit appeared in linux-next some day and then
> gone?

This wouldn't be unusual because the mmotm part of linux-next is
constantly rebased.

[...]
> > OK, so we have 96G for consumers with 32G RAM and 96G of swap space,
> > right? That would suggest they should fit in although the swapout
> > could be large (2/3 of the faulted memory) and the random pattern can
> > cause some thrashing. Does the system behave the same way with the
> > stream anon load? Anyway I think we should be able to handle such a
> > load, although it
>
> By stream anon load, do you mean continuous write, without read?

Yes.

> > is quite untypical from my experience because it can be a pain with a
> > slow swap, but ramdisk swap should be as fast as it can get so the
> > swap in/out should be basically a noop.
> >
> >> So I guess the question here is, after the OOM rework, is the OOM
> >> expected for such a case? If so, then we can ignore this report.
> >
> > Could you post the OOM reports please? I will try to emulate a
> > similar load here as well.
>
> I attached the dmesg from one of the runs.
[...]
> [   77.434044] slabinfo invoked oom-killer:
> gfp_mask=0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), order=2,
> oom_score_adj=0
[...]
> [  138.090480] kthreadd invoked oom-killer:
> gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2,
> oom_score_adj=0
[...]
> [  141.823925] lkp-setup-rootf invoked oom-killer:
> gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2,
> oom_score_adj=0

All of them are order-2 and this was a known problem for the "mm, oom:
rework oom detection" commit, and later patches should make it much more
resistant to failures for higher (!costly) orders. So I would definitely
encourage you to retest with the current _complete_ mmotm tree.
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Thu 28-04-16 13:17:08, Aaron Lu wrote:
> On Wed, Apr 27, 2016 at 11:17:19AM +0200, Michal Hocko wrote:
> > On Wed 27-04-16 16:44:31, Huang, Ying wrote:
> > > Michal Hocko writes:
> > >
> > > > On Wed 27-04-16 16:20:43, Huang, Ying wrote:
> > > >> Michal Hocko writes:
> > > >>
> > > >> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
> > > >> >> FYI, we noticed vm-scalability.throughput -11.8% regression
> > > >> >> with the following commit:
> > > >> >
> > > >> > Could you be more specific what the test does please?
> > > >>
> > > >> The sub-testcase of vm-scalability is swap-w-rand. A RAM emulated
> > > >> pmem device is used as a swap device, and a test program will
> > > >> allocate/write anonymous memory randomly to exercise the page
> > > >> allocation, reclaim, and swapping code paths.
> > > >
> > > > Can I download the test with the setup to play with this?
> > >
> > > There are reproduce steps in the original report email.
> > >
> > > To reproduce:
> > >
> > >   git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > >   cd lkp-tests
> > >   bin/lkp install job.yaml  # job file is attached in this email
> > >   bin/lkp run job.yaml
> > >
> > > The job.yaml and kconfig file are attached in the original report
> > > email.
> >
> > Thanks for the instructions. My bad, I have overlooked that in the
> > initial email. I have checked the configuration file and it seems
> > rather hardcoded for a particular HW. It expects a machine with 128G
> > and reserves 96G!4G, which might lead to a different amount of memory
> > in the end depending on the particular memory layout.
>
> Indeed, the job file needs manual change.
> The attached job file is the one we used on the test machine.
>
> > Before I go and try to recreate a similar setup, how stable are the
> > results from this test? A random access pattern sounds rather too
> > volatile to be considered for a throughput test. Or is there any
> > other side effect I am missing and something fails which didn't use
> > to previously?
>
> I have the same doubt too, but the results look really stable (only
> for commit 0da9597ac9c0, see below for more explanation).

I cannot seem to find this sha1. Where does it come from? linux-next?

> We did 8 runs for this report and the standard deviation (represented
> by the %stddev shown in the original report) is used to show exactly
> this.
>
> I just checked the results again and found that the 8 runs for your
> commit faad2185f482 all OOMed; only 1 of them was able to finish the
> test before the OOM occurred and got a throughput value of 38653.

If you are talking about "mm, oom: rework oom detection" then this
wouldn't be that surprising. There are follow-up patches which fortify
the oom detection. Does the same happen with the whole series applied?
Also, does the test ever OOM before the oom rework?

> The source code for this test is here:
> https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/tree/usemem.c

Thanks for the pointer.

> And it's started as:
>   ./usemem --runtime 300 -n 16 --random 6368538624
> which means to fork 16 processes, each dealing with around 6GiB of
> data. By dealing here, I mean each process will mmap an anonymous
> region of 6GiB size and then write data to that area at random places,
> thus triggering swapouts and swapins after the memory is used up
> (since the system has 128GiB memory and 96GiB is used by the pmem
> driver as swap space, the memory will be used up after a little while).

OK, so we have 96G for consumers with 32G RAM and 96G of swap space,
right? That would suggest they should fit in although the swapout could
be large (2/3 of the faulted memory) and the random pattern can cause
some thrashing. Does the system behave the same way with the stream anon
load? Anyway I think we should be able to handle such a load, although
it is quite untypical from my experience because it can be a pain with a
slow swap, but ramdisk swap should be as fast as it can get so the swap
in/out should be basically a noop.

> So I guess the question here is, after the OOM rework, is the OOM
> expected for such a case? If so, then we can ignore this report.

Could you post the OOM reports please? I will try to emulate a similar
load here as well.

Thanks!
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Wed, Apr 27, 2016 at 11:17:19AM +0200, Michal Hocko wrote:
> On Wed 27-04-16 16:44:31, Huang, Ying wrote:
> > Michal Hocko writes:
> >
> > > On Wed 27-04-16 16:20:43, Huang, Ying wrote:
> > >> Michal Hocko writes:
> > >>
> > >> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
> > >> >> FYI, we noticed vm-scalability.throughput -11.8% regression
> > >> >> with the following commit:
> > >> >
> > >> > Could you be more specific what the test does please?
> > >>
> > >> The sub-testcase of vm-scalability is swap-w-rand. A RAM emulated
> > >> pmem device is used as a swap device, and a test program will
> > >> allocate/write anonymous memory randomly to exercise the page
> > >> allocation, reclaim, and swapping code paths.
> > >
> > > Can I download the test with the setup to play with this?
> >
> > There are reproduce steps in the original report email.
> >
> > To reproduce:
> >
> >   git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> >   cd lkp-tests
> >   bin/lkp install job.yaml  # job file is attached in this email
> >   bin/lkp run job.yaml
> >
> > The job.yaml and kconfig file are attached in the original report
> > email.
>
> Thanks for the instructions. My bad, I have overlooked that in the
> initial email. I have checked the configuration file and it seems
> rather hardcoded for a particular HW. It expects a machine with 128G
> and reserves 96G!4G, which might lead to a different amount of memory
> in the end depending on the particular memory layout.

Indeed, the job file needs manual change.
The attached job file is the one we used on the test machine.

> Before I go and try to recreate a similar setup, how stable are the
> results from this test? A random access pattern sounds rather too
> volatile to be considered for a throughput test. Or is there any other
> side effect I am missing and something fails which didn't use to
> previously?

I have the same doubt too, but the results look really stable (only for
commit 0da9597ac9c0, see below for more explanation).
We did 8 runs for this report and the standard deviation (represented by
the %stddev shown in the original report) is used to show exactly this.

I just checked the results again and found that the 8 runs for your
commit faad2185f482 all OOMed; only 1 of them was able to finish the
test before the OOM occurred and got a throughput value of 38653.

The source code for this test is here:
https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/tree/usemem.c

And it's started as:
  ./usemem --runtime 300 -n 16 --random 6368538624
which means to fork 16 processes, each dealing with around 6GiB of data.
By dealing here, I mean each process will mmap an anonymous region of
6GiB size and then write data to that area at random places, thus
triggering swapouts and swapins after the memory is used up (since the
system has 128GiB memory and 96GiB is used by the pmem driver as swap
space, the memory will be used up after a little while).

So I guess the question here is, after the OOM rework, is the OOM
expected for such a case? If so, then we can ignore this report.
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Wed 27-04-16 16:44:31, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Wed 27-04-16 16:20:43, Huang, Ying wrote:
> >> Michal Hocko writes:
> >>
> >> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
> >> >> FYI, we noticed vm-scalability.throughput -11.8% regression with
> >> >> the following commit:
> >> >
> >> > Could you be more specific what the test does please?
> >>
> >> The sub-testcase of vm-scalability is swap-w-rand. A RAM emulated
> >> pmem device is used as a swap device, and a test program will
> >> allocate/write anonymous memory randomly to exercise the page
> >> allocation, reclaim, and swapping code paths.
> >
> > Can I download the test with the setup to play with this?
>
> There are reproduce steps in the original report email.
>
> To reproduce:
>
>   git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>   cd lkp-tests
>   bin/lkp install job.yaml  # job file is attached in this email
>   bin/lkp run job.yaml
>
> The job.yaml and kconfig file are attached in the original report
> email.

Thanks for the instructions. My bad, I have overlooked that in the
initial email. I have checked the configuration file and it seems rather
hardcoded for a particular HW. It expects a machine with 128G and
reserves 96G!4G, which might lead to a different amount of memory in the
end depending on the particular memory layout.

Before I go and try to recreate a similar setup, how stable are the
results from this test? A random access pattern sounds rather too
volatile to be considered for a throughput test. Or is there any other
side effect I am missing and something fails which didn't use to
previously?
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
Michal Hocko writes:

> On Wed 27-04-16 16:20:43, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
>> >> FYI, we noticed vm-scalability.throughput -11.8% regression with
>> >> the following commit:
>> >
>> > Could you be more specific what the test does please?
>>
>> The sub-testcase of vm-scalability is swap-w-rand. A RAM emulated
>> pmem device is used as a swap device, and a test program will
>> allocate/write anonymous memory randomly to exercise the page
>> allocation, reclaim, and swapping code paths.
>
> Can I download the test with the setup to play with this?

There are reproduce steps in the original report email.

To reproduce:

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
  cd lkp-tests
  bin/lkp install job.yaml  # job file is attached in this email
  bin/lkp run job.yaml

The job.yaml and kconfig file are attached in the original report email.

Best Regards,
Huang, Ying
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
On Wed 27-04-16 16:20:43, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Wed 27-04-16 11:15:56, kernel test robot wrote:
> >> FYI, we noticed vm-scalability.throughput -11.8% regression with the
> >> following commit:
> >
> > Could you be more specific what the test does please?
>
> The sub-testcase of vm-scalability is swap-w-rand. A RAM-emulated pmem
> device is used as a swap device, and a test program will allocate/write
> anonymous memory randomly to exercise the page allocation, reclaim, and
> swap-in code paths.

Can I download the test with the setup to play with this?
--
Michal Hocko
SUSE Labs
Re: [LKP] [lkp] [mm, oom] faad2185f4: vm-scalability.throughput -11.8% regression
Michal Hocko writes:

> On Wed 27-04-16 11:15:56, kernel test robot wrote:
>> FYI, we noticed vm-scalability.throughput -11.8% regression with the
>> following commit:
>
> Could you be more specific what the test does please?

The sub-testcase of vm-scalability is swap-w-rand. A RAM-emulated pmem device is used as a swap device, and a test program will allocate/write anonymous memory randomly to exercise the page allocation, reclaim, and swap-in code paths.

Best Regards,
Huang, Ying