Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, Jul 02, 2020 at 07:13:02PM +0200, Michal Hocko wrote:
> On Thu 02-07-20 09:37:38, Roman Gushchin wrote:
> > On Thu, Jul 02, 2020 at 06:22:02PM +0200, Michal Hocko wrote:
> > > On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> > > [...]
> > > > From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> > > > From: Roman Gushchin
> > > > Date: Wed, 1 Jul 2020 11:05:32 -0700
> > > > Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> > > >
> > > > Historically the kernel memory accounting was an opt-in feature, which
> > > > could be enabled for individual cgroups. But now it's not true, and
> > > > it's on by default both on cgroup v1 and cgroup v2. And as long as a
> > > > user has at least one non-root memory cgroup, the kernel memory
> > > > accounting is on. So in most setups it's either always on (if memory
> > > > cgroups are in use and kmem accounting is not disabled), either always
> > > > off (otherwise).
> > > >
> > > > memcg_kmem_enabled() is used in many places to guard the kernel memory
> > > > accounting code. If memcg_kmem_enabled() can reverse from returning
> > > > true to returning false (as now), we can't rely on it on release paths
> > > > and have to check if it was on before.
> > > >
> > > > If we'll make memcg_kmem_enabled() irreversible (always returning true
> > > > after returning it for the first time), it'll make the general logic
> > > > more simple and robust. It also will allow to guard some checks which
> > > > otherwise would stay unguarded.
> > > >
> > > > Signed-off-by: Roman Gushchin
> > > > ---
> > > >  mm/memcontrol.c | 6 ++
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 50ae77f3985e..2d018a51c941 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
> > > >  	objcg->memcg = memcg;
> > > >  	rcu_assign_pointer(memcg->objcg, objcg);
> > > >
> > > > -	static_branch_inc(&memcg_kmem_enabled_key);
> > > > +	if (!memcg_kmem_enabled())
> > > > +		static_branch_inc(&memcg_kmem_enabled_key);
> > >
> > > Wouldn't be static_branch_enable() more readable?
> >
> > Agree, will change, add reported-by and tested-by tags and resend.
> > Thanks!
>
> Feel free to add
> Acked-by: Michal Hocko

Thank you!

> > Btw, don't we wanna to change memcg_kmem_enabled() definition
> > from static_branch_unlikely() to static_branch_likely()?
>
> Honestly, I do not know what would be the impact but considering kmem
> is enabled unless explicitly disabled these days then likely sounds more
> logical from reading POV. I do not think that early allocations until
> the first memcg is created is the case to optimize for.
> Worth a separate patch I guess.

Yeah, I doubt there will be any measurable difference, it just strained my eyes.
I prepare a small set of cleanups/cosmetic fixes, will add it to them.

Thanks!
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu 02-07-20 09:37:38, Roman Gushchin wrote:
> On Thu, Jul 02, 2020 at 06:22:02PM +0200, Michal Hocko wrote:
> > On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> > [...]
> > > From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> > > From: Roman Gushchin
> > > Date: Wed, 1 Jul 2020 11:05:32 -0700
> > > Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> > >
> > > Historically the kernel memory accounting was an opt-in feature, which
> > > could be enabled for individual cgroups. But now it's not true, and
> > > it's on by default both on cgroup v1 and cgroup v2. And as long as a
> > > user has at least one non-root memory cgroup, the kernel memory
> > > accounting is on. So in most setups it's either always on (if memory
> > > cgroups are in use and kmem accounting is not disabled), either always
> > > off (otherwise).
> > >
> > > memcg_kmem_enabled() is used in many places to guard the kernel memory
> > > accounting code. If memcg_kmem_enabled() can reverse from returning
> > > true to returning false (as now), we can't rely on it on release paths
> > > and have to check if it was on before.
> > >
> > > If we'll make memcg_kmem_enabled() irreversible (always returning true
> > > after returning it for the first time), it'll make the general logic
> > > more simple and robust. It also will allow to guard some checks which
> > > otherwise would stay unguarded.
> > >
> > > Signed-off-by: Roman Gushchin
> > > ---
> > >  mm/memcontrol.c | 6 ++
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 50ae77f3985e..2d018a51c941 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
> > >  	objcg->memcg = memcg;
> > >  	rcu_assign_pointer(memcg->objcg, objcg);
> > >
> > > -	static_branch_inc(&memcg_kmem_enabled_key);
> > > +	if (!memcg_kmem_enabled())
> > > +		static_branch_inc(&memcg_kmem_enabled_key);
> >
> > Wouldn't be static_branch_enable() more readable?
>
> Agree, will change, add reported-by and tested-by tags and resend.
> Thanks!

Feel free to add
Acked-by: Michal Hocko

> Btw, don't we wanna to change memcg_kmem_enabled() definition
> from static_branch_unlikely() to static_branch_likely()?

Honestly, I do not know what would be the impact but considering kmem
is enabled unless explicitly disabled these days then likely sounds more
logical from reading POV. I do not think that early allocations until
the first memcg is created is the case to optimize for.
Worth a separate patch I guess.

-- 
Michal Hocko
SUSE Labs
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu 02-07-20 18:35:31, Vlastimil Babka wrote:
> On 7/2/20 6:22 PM, Michal Hocko wrote:
> > On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> > [...]
> >> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> >> From: Roman Gushchin
> >> Date: Wed, 1 Jul 2020 11:05:32 -0700
> >> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> >>
> >> Historically the kernel memory accounting was an opt-in feature, which
> >> could be enabled for individual cgroups. But now it's not true, and
> >> it's on by default both on cgroup v1 and cgroup v2. And as long as a
> >> user has at least one non-root memory cgroup, the kernel memory
> >> accounting is on. So in most setups it's either always on (if memory
> >> cgroups are in use and kmem accounting is not disabled), either always
> >> off (otherwise).
> >>
> >> memcg_kmem_enabled() is used in many places to guard the kernel memory
> >> accounting code. If memcg_kmem_enabled() can reverse from returning
> >> true to returning false (as now), we can't rely on it on release paths
> >> and have to check if it was on before.
> >>
> >> If we'll make memcg_kmem_enabled() irreversible (always returning true
> >> after returning it for the first time), it'll make the general logic
> >> more simple and robust. It also will allow to guard some checks which
> >> otherwise would stay unguarded.
> >>
> >> Signed-off-by: Roman Gushchin
>
> Fixes: ? or let Andrew squash it to some patch of your series (it's in mmotm I
> think)?

I would rather make it its own patch because this is really subtle.

-- 
Michal Hocko
SUSE Labs
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, Jul 02, 2020 at 06:35:31PM +0200, Vlastimil Babka wrote:
> On 7/2/20 6:22 PM, Michal Hocko wrote:
> > On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> > [...]
> >> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> >> From: Roman Gushchin
> >> Date: Wed, 1 Jul 2020 11:05:32 -0700
> >> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> >>
> >> Historically the kernel memory accounting was an opt-in feature, which
> >> could be enabled for individual cgroups. But now it's not true, and
> >> it's on by default both on cgroup v1 and cgroup v2. And as long as a
> >> user has at least one non-root memory cgroup, the kernel memory
> >> accounting is on. So in most setups it's either always on (if memory
> >> cgroups are in use and kmem accounting is not disabled), either always
> >> off (otherwise).
> >>
> >> memcg_kmem_enabled() is used in many places to guard the kernel memory
> >> accounting code. If memcg_kmem_enabled() can reverse from returning
> >> true to returning false (as now), we can't rely on it on release paths
> >> and have to check if it was on before.
> >>
> >> If we'll make memcg_kmem_enabled() irreversible (always returning true
> >> after returning it for the first time), it'll make the general logic
> >> more simple and robust. It also will allow to guard some checks which
> >> otherwise would stay unguarded.
> >>
> >> Signed-off-by: Roman Gushchin
>
> Fixes: ? or let Andrew squash it to some patch of your series (it's in mmotm I
> think)?

Hm, it's actually complicated. One obvious case was added by
"mm: memcg/slab: save obj_cgroup for non-root slab objects",
which is currently in the mm tree, so no stable hash.

But I suspect that there are more cases where we just silently leaking
a memcg reference. But because the whole setup (going back and forth between
0 and 1+ memory cgroups) can not be easily found in the real life,
nobody cares. So I don't think we really need a stable backport.

So IMO the best option is to put it as a standalone patch _before_
my series.

Does it sound good to you?

>
> Acked-by: Vlastimil Babka

Thanks!

>
> But see below:
>
> >> ---
> >>  mm/memcontrol.c | 6 ++
> >>  1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 50ae77f3985e..2d018a51c941 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
> >>  	objcg->memcg = memcg;
> >>  	rcu_assign_pointer(memcg->objcg, objcg);
> >>
> >> -	static_branch_inc(&memcg_kmem_enabled_key);
> >> +	if (!memcg_kmem_enabled())
> >> +		static_branch_inc(&memcg_kmem_enabled_key);
> >
> > Wouldn't be static_branch_enable() more readable?
>
> Yes, and drop the if(). It will just do nothing and return if already enabled.
> Maybe slightly less efficient, but this is not a fast path anyway, and it feels
> weird to modify the static key in a branch controlled by the static key itself
> (CC peterz in case he wants to add something).

Ok, will do in v2.

Thanks!
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, Jul 02, 2020 at 06:22:02PM +0200, Michal Hocko wrote:
> On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> [...]
> > From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin
> > Date: Wed, 1 Jul 2020 11:05:32 -0700
> > Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> >
> > Historically the kernel memory accounting was an opt-in feature, which
> > could be enabled for individual cgroups. But now it's not true, and
> > it's on by default both on cgroup v1 and cgroup v2. And as long as a
> > user has at least one non-root memory cgroup, the kernel memory
> > accounting is on. So in most setups it's either always on (if memory
> > cgroups are in use and kmem accounting is not disabled), either always
> > off (otherwise).
> >
> > memcg_kmem_enabled() is used in many places to guard the kernel memory
> > accounting code. If memcg_kmem_enabled() can reverse from returning
> > true to returning false (as now), we can't rely on it on release paths
> > and have to check if it was on before.
> >
> > If we'll make memcg_kmem_enabled() irreversible (always returning true
> > after returning it for the first time), it'll make the general logic
> > more simple and robust. It also will allow to guard some checks which
> > otherwise would stay unguarded.
> >
> > Signed-off-by: Roman Gushchin
> > ---
> >  mm/memcontrol.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 50ae77f3985e..2d018a51c941 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
> >  	objcg->memcg = memcg;
> >  	rcu_assign_pointer(memcg->objcg, objcg);
> >
> > -	static_branch_inc(&memcg_kmem_enabled_key);
> > +	if (!memcg_kmem_enabled())
> > +		static_branch_inc(&memcg_kmem_enabled_key);
>
> Wouldn't be static_branch_enable() more readable?

Agree, will change, add reported-by and tested-by tags and resend.
Thanks!

Btw, don't we wanna to change memcg_kmem_enabled() definition
from static_branch_unlikely() to static_branch_likely()?
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On 7/2/20 6:22 PM, Michal Hocko wrote:
> On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
> [...]
>> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
>> From: Roman Gushchin
>> Date: Wed, 1 Jul 2020 11:05:32 -0700
>> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
>>
>> Historically the kernel memory accounting was an opt-in feature, which
>> could be enabled for individual cgroups. But now it's not true, and
>> it's on by default both on cgroup v1 and cgroup v2. And as long as a
>> user has at least one non-root memory cgroup, the kernel memory
>> accounting is on. So in most setups it's either always on (if memory
>> cgroups are in use and kmem accounting is not disabled), either always
>> off (otherwise).
>>
>> memcg_kmem_enabled() is used in many places to guard the kernel memory
>> accounting code. If memcg_kmem_enabled() can reverse from returning
>> true to returning false (as now), we can't rely on it on release paths
>> and have to check if it was on before.
>>
>> If we'll make memcg_kmem_enabled() irreversible (always returning true
>> after returning it for the first time), it'll make the general logic
>> more simple and robust. It also will allow to guard some checks which
>> otherwise would stay unguarded.
>>
>> Signed-off-by: Roman Gushchin

Fixes: ? or let Andrew squash it to some patch of your series (it's in mmotm I
think)?

Acked-by: Vlastimil Babka

But see below:

>> ---
>>  mm/memcontrol.c | 6 ++
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 50ae77f3985e..2d018a51c941 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
>>  	objcg->memcg = memcg;
>>  	rcu_assign_pointer(memcg->objcg, objcg);
>>
>> -	static_branch_inc(&memcg_kmem_enabled_key);
>> +	if (!memcg_kmem_enabled())
>> +		static_branch_inc(&memcg_kmem_enabled_key);
>
> Wouldn't be static_branch_enable() more readable?

Yes, and drop the if(). It will just do nothing and return if already enabled.
Maybe slightly less efficient, but this is not a fast path anyway, and it feels
weird to modify the static key in a branch controlled by the static key itself
(CC peterz in case he wants to add something).

>>  	/*
>>  	 * A memory cgroup is considered kmem-online as soon as it gets
>>  	 * kmemcg_id. Setting the id after enabling static branching will
>> @@ -3643,9 +3644,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
>>  	/* css_alloc() failed, offlining didn't happen */
>>  	if (unlikely(memcg->kmem_state == KMEM_ONLINE))
>>  		memcg_offline_kmem(memcg);
>> -
>> -	if (memcg->kmem_state == KMEM_ALLOCATED)
>> -		static_branch_dec(&memcg_kmem_enabled_key);
>>  }
>>  #else
>>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>> --
>> 2.26.2
>
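The difference between the reference-counted calls (static_branch_inc() and static_branch_dec()) and the idempotent static_branch_enable() discussed above can be sketched with a small userspace model. This is only an illustration under stated assumptions: `struct model_key` and the `model_key_*()` helpers are hypothetical names that track a plain counter, and they do not reproduce the kernel's jump-label patching or its locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a static key; only the count is modelled. */
struct model_key {
	int count;	/* > 0 means the guarded branch is taken */
};

/* Models static_branch_inc(): each call takes one more reference. */
static void model_key_inc(struct model_key *key)
{
	key->count++;
}

/* Models static_branch_dec(): dropping the last reference turns the key off. */
static void model_key_dec(struct model_key *key)
{
	assert(key->count > 0);
	key->count--;
}

/* Models static_branch_enable(): a no-op when the key is already on. */
static void model_key_enable(struct model_key *key)
{
	if (key->count == 0)
		key->count = 1;
}

/* Models the check that memcg_kmem_enabled() performs on its key. */
static bool model_key_enabled(const struct model_key *key)
{
	return key->count > 0;
}
```

In this model, balanced inc/dec pairs bring the key back to "off", which is the reversibility the patch wants to get rid of. Calling `model_key_enable()` any number of times, with no matching decrement, leaves the key on, so the surrounding `if (!enabled)` check becomes unnecessary.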
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
[...]
> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> From: Roman Gushchin
> Date: Wed, 1 Jul 2020 11:05:32 -0700
> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
>
> Historically the kernel memory accounting was an opt-in feature, which
> could be enabled for individual cgroups. But now it's not true, and
> it's on by default both on cgroup v1 and cgroup v2. And as long as a
> user has at least one non-root memory cgroup, the kernel memory
> accounting is on. So in most setups it's either always on (if memory
> cgroups are in use and kmem accounting is not disabled), either always
> off (otherwise).
>
> memcg_kmem_enabled() is used in many places to guard the kernel memory
> accounting code. If memcg_kmem_enabled() can reverse from returning
> true to returning false (as now), we can't rely on it on release paths
> and have to check if it was on before.
>
> If we'll make memcg_kmem_enabled() irreversible (always returning true
> after returning it for the first time), it'll make the general logic
> more simple and robust. It also will allow to guard some checks which
> otherwise would stay unguarded.
>
> Signed-off-by: Roman Gushchin
> ---
>  mm/memcontrol.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 50ae77f3985e..2d018a51c941 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
>  	objcg->memcg = memcg;
>  	rcu_assign_pointer(memcg->objcg, objcg);
>
> -	static_branch_inc(&memcg_kmem_enabled_key);
> +	if (!memcg_kmem_enabled())
> +		static_branch_inc(&memcg_kmem_enabled_key);

Wouldn't be static_branch_enable() more readable?

>  	/*
>  	 * A memory cgroup is considered kmem-online as soon as it gets
>  	 * kmemcg_id. Setting the id after enabling static branching will
> @@ -3643,9 +3644,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
>  	/* css_alloc() failed, offlining didn't happen */
>  	if (unlikely(memcg->kmem_state == KMEM_ONLINE))
>  		memcg_offline_kmem(memcg);
> -
> -	if (memcg->kmem_state == KMEM_ALLOCATED)
> -		static_branch_dec(&memcg_kmem_enabled_key);
>  }
>  #else
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
> --
> 2.26.2

-- 
Michal Hocko
SUSE Labs
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Wed, Jul 1, 2020 at 11:46 AM Roman Gushchin wrote:
...
> --
>
> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> From: Roman Gushchin
> Date: Wed, 1 Jul 2020 11:05:32 -0700
> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
>
> Historically the kernel memory accounting was an opt-in feature, which
> could be enabled for individual cgroups. But now it's not true, and
> it's on by default both on cgroup v1 and cgroup v2. And as long as a
> user has at least one non-root memory cgroup, the kernel memory
> accounting is on. So in most setups it's either always on (if memory
> cgroups are in use and kmem accounting is not disabled), either always
> off (otherwise).
>
> memcg_kmem_enabled() is used in many places to guard the kernel memory
> accounting code. If memcg_kmem_enabled() can reverse from returning
> true to returning false (as now), we can't rely on it on release paths
> and have to check if it was on before.
>
> If we'll make memcg_kmem_enabled() irreversible (always returning true
> after returning it for the first time), it'll make the general logic
> more simple and robust. It also will allow to guard some checks which
> otherwise would stay unguarded.
>
> Signed-off-by: Roman Gushchin

Reviewed-by: Shakeel Butt
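The release-path problem described in the commit message can be illustrated with a toy userspace model. It is a sketch under loud assumptions: `struct model_page`, `model_kmem_enabled`, `model_charge()` and `model_uncharge()` are hypothetical names invented for this example, and real charging involves counters and references that are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct model_page {
	bool charged;
};

static bool model_kmem_enabled;	/* plays the role of memcg_kmem_enabled() */

/* Charge path: account the page only while accounting is on. */
static void model_charge(struct model_page *page)
{
	if (model_kmem_enabled)
		page->charged = true;
}

/* Release path guarded by the same flag as the charge path. */
static void model_uncharge(struct model_page *page)
{
	if (!model_kmem_enabled)
		return;	/* skipped entirely once the flag has flipped back */
	page->charged = false;
}
```

If the flag flips from true to false between `model_charge()` and `model_uncharge()`, the page stays marked as charged, which loosely mirrors the "page still charged to cgroup" report in the subject line. Once the flag can never go back to false, the guard on the release path is safe.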
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, Jul 02, 2020 at 09:25:08PM +0530, Naresh Kamboju wrote: > On Thu, 2 Jul 2020 at 21:19, Roman Gushchin wrote: > > > > On Thu, Jul 02, 2020 at 12:22:03PM +0530, Naresh Kamboju wrote: > > > On Thu, 2 Jul 2020 at 00:16, Roman Gushchin wrote: > > > > > > > > On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote: > > > > > Smells like a different observable problem with the same/similar > > > > > culprit > > > > > as > > > > > http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com > > > > > > > > > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > > > > > > While running LTP mm test suite on x86_64 device the BUG: Bad page > > > > > > state in process > > > > > > noticed on linux-next 20200630 tag. > > > > > > > > > > > > Steps to reproduce: > > > > > > - boot linux-next 20200630 kernel on x86_64 device > > > > > > - cd /opt/ltp > > > > > > - ./runltp -f mm > > > > > > > > > > > > metadata: > > > > > > git branch: master > > > > > > git repo: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > > > > > git commit: f2b92b14533e646e434523abdbafddb727c23898 > > > > > > git describe: next-20200630 > > > > > > kernel-config: > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e= > > > > > > > > > > > > Test crash dump: > > > > > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) > > > > > > 64*16kB > > > > > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > > > > > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > > > > > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > > > > > > hugepages_surp=0 hugepages_size=2048kB > > > > > > [ 803.930806] 2418 total pagecache pages > > > > > > [ 803.934559] 0 pages in swap cache 
> > > > > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > > > > > > [ 803.943108] Free swap = 0kB > > > > > > [ 803.945997] Total swap = 0kB > > > > > > [ 803.948885] 4181245 pages RAM > > > > > > [ 803.951857] 0 pages HighMem/MovableOnly > > > > > > [ 803.955695] 626062 pages reserved > > > > > > [ 803.959016] Tasks state (memory values in pages): > > > > > > [ 803.963722] [ pid ] uid tgid total_vm rss > > > > > > pgtables_bytes > > > > > > swapents oom_score_adj name > > > > > > [ 803.972336] [332] 0 332 8529 507 106496 > > > > > > 0 0 systemd-journal > > > > > > [ 803.981387] [349] 0 34910730 508 118784 > > > > > > 0 -1000 systemd-udevd > > > > > > [ 803.990262] [371] 993 371 8666 108 118784 > > > > > > 0 0 systemd-network > > > > > > [ 803.999306] [379] 992 379 9529 99 110592 > > > > > > 0 0 systemd-resolve > > > > > > [ 804.008347] [388] 0 388 2112 1961440 > > > > > > 0 0 syslogd > > > > > > [ 804.016709] [389] 995 389 9308 108 122880 > > > > > > 0 0 avahi-daemon > > > > > > [ 804.025517] [391] 0 391 1075 2157344 > > > > > > 0 0 acpid > > > > > > [ 804.033695] [394] 995 394 9277 68 114688 > > > > > > 0 0 avahi-daemon > > > > > > [ 804.042476] [396] 996 396 7241 154 102400 > > > > > > 0 -900 dbus-daemon > > > > > > [ 804.051170] [397] 0 397 2313 7265536 > > > > > > 0 0 crond > > > > > > [ 804.059349] [399] 0 39934025 161 167936 > > > > > > 0 0 thermald > > > > > > [ 804.067783] [400] 0 400 8615 115 110592 > > > > > > 0 0 systemd-logind > > > > > > [ 804.076734] [401] 0 401 2112 3257344 > > > > > > 0 0 klogd > > > > > > [ 804.084907] [449] 65534 449 3245 3969632 > > > > > > 0 0 dnsmasq > > > > > > [ 804.093254] [450] 0 450 3187 3373728 > > > > > > 0 0 agetty > > > > > > [ 804.101541] [452] 0 452 3187 3373728 > > > > > > 0 0 agetty > > > > > > [ 804.109826] [453] 0 45314707 107 159744 > > > > > > 0 0 login > > > > > > [ 804.118007] [463] 0 463 9532 163 122880 > > > > > > 0 0 systemd > > > > > > [ 804.126362] [464] 0 46416132 424 172032 > > > > > > 0 
0 (sd-pam) > > > > > > [ 804.134803] [468] 0 468 4538 10581920 > > > > > > 0 0 sh > > > > > > [ 804.142741] [472] 0 47211102 83 131072 > > > > > > 0 0 su > > > > > > [ 804.150680] [473] 0 473 4538 9981920 > > > > > >
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, 2 Jul 2020 at 21:19, Roman Gushchin wrote: > > On Thu, Jul 02, 2020 at 12:22:03PM +0530, Naresh Kamboju wrote: > > On Thu, 2 Jul 2020 at 00:16, Roman Gushchin wrote: > > > > > > On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote: > > > > Smells like a different observable problem with the same/similar culprit > > > > as > > > > http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com > > > > > > > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > > > > > While running LTP mm test suite on x86_64 device the BUG: Bad page > > > > > state in process > > > > > noticed on linux-next 20200630 tag. > > > > > > > > > > Steps to reproduce: > > > > > - boot linux-next 20200630 kernel on x86_64 device > > > > > - cd /opt/ltp > > > > > - ./runltp -f mm > > > > > > > > > > metadata: > > > > > git branch: master > > > > > git repo: > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > > > > git commit: f2b92b14533e646e434523abdbafddb727c23898 > > > > > git describe: next-20200630 > > > > > kernel-config: > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e= > > > > > > > > > > Test crash dump: > > > > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB > > > > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > > > > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > > > > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > > > > > hugepages_surp=0 hugepages_size=2048kB > > > > > [ 803.930806] 2418 total pagecache pages > > > > > [ 803.934559] 0 pages in swap cache > > > > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > > > > > [ 803.943108] Free swap = 0kB > > > > > [ 803.945997] Total swap = 0kB > > > > > [ 
803.948885] 4181245 pages RAM > > > > > [ 803.951857] 0 pages HighMem/MovableOnly > > > > > [ 803.955695] 626062 pages reserved > > > > > [ 803.959016] Tasks state (memory values in pages): > > > > > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes > > > > > swapents oom_score_adj name > > > > > [ 803.972336] [332] 0 332 8529 507 106496 > > > > > 0 0 systemd-journal > > > > > [ 803.981387] [349] 0 34910730 508 118784 > > > > > 0 -1000 systemd-udevd > > > > > [ 803.990262] [371] 993 371 8666 108 118784 > > > > > 0 0 systemd-network > > > > > [ 803.999306] [379] 992 379 9529 99 110592 > > > > > 0 0 systemd-resolve > > > > > [ 804.008347] [388] 0 388 2112 1961440 > > > > > 0 0 syslogd > > > > > [ 804.016709] [389] 995 389 9308 108 122880 > > > > > 0 0 avahi-daemon > > > > > [ 804.025517] [391] 0 391 1075 2157344 > > > > > 0 0 acpid > > > > > [ 804.033695] [394] 995 394 9277 68 114688 > > > > > 0 0 avahi-daemon > > > > > [ 804.042476] [396] 996 396 7241 154 102400 > > > > > 0 -900 dbus-daemon > > > > > [ 804.051170] [397] 0 397 2313 7265536 > > > > > 0 0 crond > > > > > [ 804.059349] [399] 0 39934025 161 167936 > > > > > 0 0 thermald > > > > > [ 804.067783] [400] 0 400 8615 115 110592 > > > > > 0 0 systemd-logind > > > > > [ 804.076734] [401] 0 401 2112 3257344 > > > > > 0 0 klogd > > > > > [ 804.084907] [449] 65534 449 3245 3969632 > > > > > 0 0 dnsmasq > > > > > [ 804.093254] [450] 0 450 3187 3373728 > > > > > 0 0 agetty > > > > > [ 804.101541] [452] 0 452 3187 3373728 > > > > > 0 0 agetty > > > > > [ 804.109826] [453] 0 45314707 107 159744 > > > > > 0 0 login > > > > > [ 804.118007] [463] 0 463 9532 163 122880 > > > > > 0 0 systemd > > > > > [ 804.126362] [464] 0 46416132 424 172032 > > > > > 0 0 (sd-pam) > > > > > [ 804.134803] [468] 0 468 4538 10581920 > > > > > 0 0 sh > > > > > [ 804.142741] [472] 0 47211102 83 131072 > > > > > 0 0 su > > > > > [ 804.150680] [473] 0 473 4538 9981920 > > > > > 0 0 sh > > > > > [ 804.158637] [519] 0 519 2396 5761440 > > > 
> > 0 0 lava-test-runne > > > > > [ 804.167700] [ 1220] 0 1220 2396 5261440 > > > > > 0 0 lava-test-shell > > > > > [ 804.176738] [
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, Jul 02, 2020 at 12:22:03PM +0530, Naresh Kamboju wrote: > On Thu, 2 Jul 2020 at 00:16, Roman Gushchin wrote: > > > > On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote: > > > Smells like a different observable problem with the same/similar culprit > > > as > > > http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com > > > > > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > > > > While running LTP mm test suite on x86_64 device the BUG: Bad page > > > > state in process > > > > noticed on linux-next 20200630 tag. > > > > > > > > Steps to reproduce: > > > > - boot linux-next 20200630 kernel on x86_64 device > > > > - cd /opt/ltp > > > > - ./runltp -f mm > > > > > > > > metadata: > > > > git branch: master > > > > git repo: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > > > git commit: f2b92b14533e646e434523abdbafddb727c23898 > > > > git describe: next-20200630 > > > > kernel-config: > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e= > > > > > > > > Test crash dump: > > > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB > > > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > > > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > > > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > > > > hugepages_surp=0 hugepages_size=2048kB > > > > [ 803.930806] 2418 total pagecache pages > > > > [ 803.934559] 0 pages in swap cache > > > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > > > > [ 803.943108] Free swap = 0kB > > > > [ 803.945997] Total swap = 0kB > > > > [ 803.948885] 4181245 pages RAM > > > > [ 803.951857] 0 pages HighMem/MovableOnly > > > > [ 803.955695] 626062 pages reserved > > > 
> [ 803.959016] Tasks state (memory values in pages): > > > > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes > > > > swapents oom_score_adj name > > > > [ 803.972336] [332] 0 332 8529 507 106496 > > > > 0 0 systemd-journal > > > > [ 803.981387] [349] 0 34910730 508 118784 > > > > 0 -1000 systemd-udevd > > > > [ 803.990262] [371] 993 371 8666 108 118784 > > > > 0 0 systemd-network > > > > [ 803.999306] [379] 992 379 9529 99 110592 > > > > 0 0 systemd-resolve > > > > [ 804.008347] [388] 0 388 2112 1961440 > > > > 0 0 syslogd > > > > [ 804.016709] [389] 995 389 9308 108 122880 > > > > 0 0 avahi-daemon > > > > [ 804.025517] [391] 0 391 1075 2157344 > > > > 0 0 acpid > > > > [ 804.033695] [394] 995 394 9277 68 114688 > > > > 0 0 avahi-daemon > > > > [ 804.042476] [396] 996 396 7241 154 102400 > > > > 0 -900 dbus-daemon > > > > [ 804.051170] [397] 0 397 2313 7265536 > > > > 0 0 crond > > > > [ 804.059349] [399] 0 39934025 161 167936 > > > > 0 0 thermald > > > > [ 804.067783] [400] 0 400 8615 115 110592 > > > > 0 0 systemd-logind > > > > [ 804.076734] [401] 0 401 2112 3257344 > > > > 0 0 klogd > > > > [ 804.084907] [449] 65534 449 3245 3969632 > > > > 0 0 dnsmasq > > > > [ 804.093254] [450] 0 450 3187 3373728 > > > > 0 0 agetty > > > > [ 804.101541] [452] 0 452 3187 3373728 > > > > 0 0 agetty > > > > [ 804.109826] [453] 0 45314707 107 159744 > > > > 0 0 login > > > > [ 804.118007] [463] 0 463 9532 163 122880 > > > > 0 0 systemd > > > > [ 804.126362] [464] 0 46416132 424 172032 > > > > 0 0 (sd-pam) > > > > [ 804.134803] [468] 0 468 4538 10581920 > > > > 0 0 sh > > > > [ 804.142741] [472] 0 47211102 83 131072 > > > > 0 0 su > > > > [ 804.150680] [473] 0 473 4538 9981920 > > > > 0 0 sh > > > > [ 804.158637] [519] 0 519 2396 5761440 > > > > 0 0 lava-test-runne > > > > [ 804.167700] [ 1220] 0 1220 2396 5261440 > > > > 0 0 lava-test-shell > > > > [ 804.176738] [ 1221] 0 1221 2396 5561440 > > > > 0 0 sh > > > > [ 804.184680] [ 1223] 0 1223 2462 13561440 > > > > 0 0 
ltp.sh > > > > [ 804.192946] [ 1242] 0 1242 2462 134
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Thu, 2 Jul 2020 at 00:16, Roman Gushchin wrote: > > On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote: > > Smells like a different observable problem with the same/similar culprit > > as > > http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com > > > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > > > While running LTP mm test suite on x86_64 device the BUG: Bad page > > > state in process > > > noticed on linux-next 20200630 tag. > > > > > > Steps to reproduce: > > > - boot linux-next 20200630 kernel on x86_64 device > > > - cd /opt/ltp > > > - ./runltp -f mm > > > > > > metadata: > > > git branch: master > > > git repo: > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > > git commit: f2b92b14533e646e434523abdbafddb727c23898 > > > git describe: next-20200630 > > > kernel-config: > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e= > > > > > > Test crash dump: > > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB > > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > > > hugepages_surp=0 hugepages_size=2048kB > > > [ 803.930806] 2418 total pagecache pages > > > [ 803.934559] 0 pages in swap cache > > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > > > [ 803.943108] Free swap = 0kB > > > [ 803.945997] Total swap = 0kB > > > [ 803.948885] 4181245 pages RAM > > > [ 803.951857] 0 pages HighMem/MovableOnly > > > [ 803.955695] 626062 pages reserved > > > [ 803.959016] Tasks state (memory values in pages): > > > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes > > > swapents 
oom_score_adj name > > > [ 803.972336] [332] 0 332 8529 507 106496 > > > 0 0 systemd-journal > > > [ 803.981387] [349] 0 34910730 508 118784 > > > 0 -1000 systemd-udevd > > > [ 803.990262] [371] 993 371 8666 108 118784 > > > 0 0 systemd-network > > > [ 803.999306] [379] 992 379 9529 99 110592 > > > 0 0 systemd-resolve > > > [ 804.008347] [388] 0 388 2112 1961440 > > > 0 0 syslogd > > > [ 804.016709] [389] 995 389 9308 108 122880 > > > 0 0 avahi-daemon > > > [ 804.025517] [391] 0 391 1075 2157344 > > > 0 0 acpid > > > [ 804.033695] [394] 995 394 9277 68 114688 > > > 0 0 avahi-daemon > > > [ 804.042476] [396] 996 396 7241 154 102400 > > > 0 -900 dbus-daemon > > > [ 804.051170] [397] 0 397 2313 7265536 > > > 0 0 crond > > > [ 804.059349] [399] 0 39934025 161 167936 > > > 0 0 thermald > > > [ 804.067783] [400] 0 400 8615 115 110592 > > > 0 0 systemd-logind > > > [ 804.076734] [401] 0 401 2112 3257344 > > > 0 0 klogd > > > [ 804.084907] [449] 65534 449 3245 3969632 > > > 0 0 dnsmasq > > > [ 804.093254] [450] 0 450 3187 3373728 > > > 0 0 agetty > > > [ 804.101541] [452] 0 452 3187 3373728 > > > 0 0 agetty > > > [ 804.109826] [453] 0 45314707 107 159744 > > > 0 0 login > > > [ 804.118007] [463] 0 463 9532 163 122880 > > > 0 0 systemd > > > [ 804.126362] [464] 0 46416132 424 172032 > > > 0 0 (sd-pam) > > > [ 804.134803] [468] 0 468 4538 10581920 > > > 0 0 sh > > > [ 804.142741] [472] 0 47211102 83 131072 > > > 0 0 su > > > [ 804.150680] [473] 0 473 4538 9981920 > > > 0 0 sh > > > [ 804.158637] [519] 0 519 2396 5761440 > > > 0 0 lava-test-runne > > > [ 804.167700] [ 1220] 0 1220 2396 5261440 > > > 0 0 lava-test-shell > > > [ 804.176738] [ 1221] 0 1221 2396 5561440 > > > 0 0 sh > > > [ 804.184680] [ 1223] 0 1223 2462 13561440 > > > 0 0 ltp.sh > > > [ 804.192946] [ 1242] 0 1242 2462 13461440 > > > 0 0 ltp.sh > > > [ 804.201207] [ 1243] 0 1243 2462 13461440 > > > 0 0 ltp.sh > > > [ 804.209475] [ 1244] 0 1244 2462 13461440 > > > 0 0 ltp.sh > > > [ 804.217742] [
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Wed 01-07-20 11:45:52, Roman Gushchin wrote:
[...]
> So it makes me think that somehow memcg_kmem_enabled() became false
> after being true, which can cause refcounting problems as well.

Isn't this a similar class of problem as
http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsha...@redhat.com?
-- 
Michal Hocko
SUSE Labs
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote: > Smells like a different observable problem with the same/similar culprit > as > http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > > While running LTP mm test suite on x86_64 device the BUG: Bad page > > state in process > > noticed on linux-next 20200630 tag. > > > > Steps to reproduce: > > - boot linux-next 20200630 kernel on x86_64 device > > - cd /opt/ltp > > - ./runltp -f mm > > > > metadata: > > git branch: master > > git repo: > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > git commit: f2b92b14533e646e434523abdbafddb727c23898 > > git describe: next-20200630 > > kernel-config: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e= > > > > > > Test crash dump: > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > > hugepages_surp=0 hugepages_size=2048kB > > [ 803.930806] 2418 total pagecache pages > > [ 803.934559] 0 pages in swap cache > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > > [ 803.943108] Free swap = 0kB > > [ 803.945997] Total swap = 0kB > > [ 803.948885] 4181245 pages RAM > > [ 803.951857] 0 pages HighMem/MovableOnly > > [ 803.955695] 626062 pages reserved > > [ 803.959016] Tasks state (memory values in pages): > > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes > > swapents oom_score_adj name > > [ 803.972336] [332] 0 332 8529 507 106496 > > 0 0 systemd-journal > > [ 803.981387] [349] 0 34910730 508 118784 > > 0 
-1000 systemd-udevd > > [ 803.990262] [371] 993 371 8666 108 118784 > > 0 0 systemd-network > > [ 803.999306] [379] 992 379 9529 99 110592 > > 0 0 systemd-resolve > > [ 804.008347] [388] 0 388 2112 1961440 > > 0 0 syslogd > > [ 804.016709] [389] 995 389 9308 108 122880 > > 0 0 avahi-daemon > > [ 804.025517] [391] 0 391 1075 2157344 > > 0 0 acpid > > [ 804.033695] [394] 995 394 9277 68 114688 > > 0 0 avahi-daemon > > [ 804.042476] [396] 996 396 7241 154 102400 > > 0 -900 dbus-daemon > > [ 804.051170] [397] 0 397 2313 7265536 > > 0 0 crond > > [ 804.059349] [399] 0 39934025 161 167936 > > 0 0 thermald > > [ 804.067783] [400] 0 400 8615 115 110592 > > 0 0 systemd-logind > > [ 804.076734] [401] 0 401 2112 3257344 > > 0 0 klogd > > [ 804.084907] [449] 65534 449 3245 3969632 > > 0 0 dnsmasq > > [ 804.093254] [450] 0 450 3187 3373728 > > 0 0 agetty > > [ 804.101541] [452] 0 452 3187 3373728 > > 0 0 agetty > > [ 804.109826] [453] 0 45314707 107 159744 > > 0 0 login > > [ 804.118007] [463] 0 463 9532 163 122880 > > 0 0 systemd > > [ 804.126362] [464] 0 46416132 424 172032 > > 0 0 (sd-pam) > > [ 804.134803] [468] 0 468 4538 10581920 > > 0 0 sh > > [ 804.142741] [472] 0 47211102 83 131072 > > 0 0 su > > [ 804.150680] [473] 0 473 4538 9981920 > > 0 0 sh > > [ 804.158637] [519] 0 519 2396 5761440 > > 0 0 lava-test-runne > > [ 804.167700] [ 1220] 0 1220 2396 5261440 > > 0 0 lava-test-shell > > [ 804.176738] [ 1221] 0 1221 2396 5561440 > > 0 0 sh > > [ 804.184680] [ 1223] 0 1223 2462 13561440 > > 0 0 ltp.sh > > [ 804.192946] [ 1242] 0 1242 2462 13461440 > > 0 0 ltp.sh > > [ 804.201207] [ 1243] 0 1243 2462 13461440 > > 0 0 ltp.sh > > [ 804.209475] [ 1244] 0 1244 2462 13461440 > > 0 0 ltp.sh > > [ 804.217742] [ 1245] 0 1245 2561 22965536 > > 0 0 runltp > > [ 804.226010] [ 1246] 0 1246 1072 1553248 > > 0 0 tee > > [ 804.234012] [ 1313] 0 1313 1070 2953248 > > 0
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
On Wed, 1 Jul 2020 at 13:59, Michal Hocko wrote:
>
> Smells like a different observable problem with the same/similar culprit
> as
> http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com

Before I start bisecting here I am sharing more details of the problem.

>
> On Wed 01-07-20 13:48:57, Naresh Kamboju wrote:
> > While running LTP mm test suite on x86_64 device the BUG: Bad page
> > state in process
> > noticed on linux-next 20200630 tag.

Bad: next-20200621
Good: next-20200618

> >
> > Steps to reproduce:
> > - boot linux-next 20200630 kernel on x86_64 device
> > - cd /opt/ltp
> > - ./runltp -f mm
> >
> > metadata:
> > git branch: master
> > git repo:
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > git commit: f2b92b14533e646e434523abdbafddb727c23898
> > git describe: next-20200630
> > kernel-config:
> > https://builds.tuxbuild.com/j60yrp7CUpq3LCmqMB8Wdg/kernel.config
> >
> > Test crash dump:
> > [ 804.293956] [ 3261] 0 3261 4726385 3349389 26939392
> > 0 0 oom01
> > [ 804.302129] [ 3265] 0 3265 3187 3373728
> > 0 0 agetty
> > [ 804.310397]
> > oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=3261,uid=0
> > [ 804.322751] Out of memory: Killed process 3261 (oom01)
> > total-vm:18905540kB, anon-rss:13397556kB, file-rss:0kB, shmem-rss:0kB,
> > UID:0 pgtables:26308kB oom_score_adj:0
> > [ 806.652952] oom_reaper: reaped process 3261 (oom01), now
> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 807.579373] BUG: Bad page state in process kworker/u8:12 pfn:374308
> > [ 807.579521] BUG: Bad page state in process kworker/u8:13 pfn:4182a4

[ 858.236961] ------------[ cut here ]------------
[ 858.236963] WARNING: CPU: 1 PID: 5526 at mm/page_counter.c:57 page_counter_cancel+0x3e/0x50
[ 858.236963] Modules linked in: x86_pkg_temp_thermal
[ 858.236965] CPU: 1 PID: 5526 Comm: (agetty) Tainted: GB D 5.8.0-rc1-next-20200621 #1
[ 858.236966] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[ 858.236967] RIP: 0010:page_counter_cancel+0x3e/0x50
[ 858.236969] Code: 6e fe ff 48 89 d8 48 f7 d8 f0 49 0f c1 04 24 48 29 d8 4c 89 e7 48 89 c3 48 89 c6 e8 8c fe ff ff 48 85 db 78 05 5b 41 5c 5d c3 <0f> 0b 5b 41 5c 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff 74
[ 858.236970] RSP: 0018:888226ba7728 EFLAGS: 00010286
[ 858.236971] RAX: RBX: RCX: 936f547f
[ 858.236972] RDX: RSI: dc00 RDI: 88822fb51601
[ 858.236973] RBP: 888226ba7738 R08: 111045f6a2c0 R09: ed1045f6a2bd
[ 858.236974] R10: 88822fb515e0 R11: ed1045f6a2bc R12: 88822fb515d9
[ 858.236975] R13: 888226ba7808 R14: R15: 888226ba7800
[ 858.236976] FS: () GS:88823088() knlGS:
[ 858.236976] CS: 0010 DS: ES: CR0: 80050033
[ 858.236977] CR2: 55a355aff1e8 CR3: 000223e8c001 CR4: 003606e0
[ 858.236978] DR0: DR1: DR2:
[ 858.236979] DR3: DR6: fffe0ff0 DR7: 0400
[ 858.236979] Call Trace:
[ 858.236980] page_counter_uncharge+0x1d/0x40
[ 858.236981] uncharge_batch+0x126/0x180
[ 858.236981] uncharge_page+0x48/0x190
[ 858.236982] mem_cgroup_uncharge_list+0xd8/0x130
[ 858.236983] ? mem_cgroup_uncharge+0x100/0x100
[ 858.236983] ? _raw_write_lock_irqsave+0xd0/0xd0
[ 858.236984] release_pages+0x56c/0x670
[ 858.236984] ? __put_compound_page+0x50/0x50
[ 858.236985] ? lru_add_drain_cpu+0xce/0x1d0
[ 858.236986] free_pages_and_swap_cache+0x134/0x150
[ 858.236986] tlb_batch_pages_flush+0x25/0x70
[ 858.236987] tlb_finish_mmu+0x68/0x3e0
[ 858.236988] exit_mmap+0x158/0x2b0
[ 858.236988] ? do_munmap+0x10/0x10
[ 858.236989] ? __kasan_check_write+0x14/0x20
[ 858.236989] ? mutex_unlock+0x1d/0x40
[ 858.236990] mmput+0xaf/0x200
[ 858.236991] begin_new_exec+0x737/0x11e7
[ 858.236991] load_elf_binary+0x4c3/0x23a4
[ 858.236992] ? fsnotify+0x5c6/0x5f0
[ 858.236992] ? fsnotify+0x5f0/0x5f0
[ 858.236993] ? __fsnotify_update_child_dentry_flags.part.0+0x180/0x180
[ 858.236994] ? elf_map+0x190/0x190
[ 858.236994] ? __kasan_check_write+0x14/0x20
[ 858.236995] ? load_misc_binary+0x118/0x490
[ 858.236996] __do_execve_file.isra.0+0xa39/0x1240
[ 858.236996] ? open_exec+0x50/0x50
[ 858.236997] ? __kasan_check_write+0x14/0x20
[ 858.236997] ? strncpy_from_user+0xb7/0x1c0
[ 858.236998] ? getname_flags+0x11b/0x2a0
[ 858.236999] __x64_sys_execve+0x54/0x70
[ 858.236999] do_syscall_64+0x43/0x70
[ 858.237000] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 858.237001] RIP: 0033:0x7f35029a1877
[ 858.237001] Code: Bad RIP value.
[ 858.237002] RSP: 002b:7fff0a4b9fd8 EFLAGS: 0246 ORIG_RAX: 003b
[ 858.237003
Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
Smells like a different observable problem with the same/similar culprit as http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=x...@mail.gmail.com On Wed 01-07-20 13:48:57, Naresh Kamboju wrote: > While running LTP mm test suite on x86_64 device the BUG: Bad page > state in process > noticed on linux-next 20200630 tag. > > Steps to reproduce: > - boot linux-next 20200630 kernel on x86_64 device > - cd /opt/ltp > - ./runltp -f mm > > metadata: > git branch: master > git repo: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > git commit: f2b92b14533e646e434523abdbafddb727c23898 > git describe: next-20200630 > kernel-config: > https://builds.tuxbuild.com/j60yrp7CUpq3LCmqMB8Wdg/kernel.config > > Test crash dump: > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE) > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0 > hugepages_surp=0 hugepages_size=2048kB > [ 803.930806] 2418 total pagecache pages > [ 803.934559] 0 pages in swap cache > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0 > [ 803.943108] Free swap = 0kB > [ 803.945997] Total swap = 0kB > [ 803.948885] 4181245 pages RAM > [ 803.951857] 0 pages HighMem/MovableOnly > [ 803.955695] 626062 pages reserved > [ 803.959016] Tasks state (memory values in pages): > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > [ 803.972336] [332] 0 332 8529 507 106496 > 0 0 systemd-journal > [ 803.981387] [349] 0 34910730 508 118784 > 0 -1000 systemd-udevd > [ 803.990262] [371] 993 371 8666 108 118784 > 0 0 systemd-network > [ 803.999306] [379] 992 379 9529 99 110592 > 0 0 systemd-resolve > [ 804.008347] [388] 0 388 2112 1961440 > 0 0 syslogd > [ 804.016709] [389] 995 389 9308 108 122880 > 0 0 avahi-daemon > [ 804.025517] [391] 0 391 1075 2157344 > 0 0 acpid > [ 804.033695] [394] 995 
394 9277 68 114688 > 0 0 avahi-daemon > [ 804.042476] [396] 996 396 7241 154 102400 > 0 -900 dbus-daemon > [ 804.051170] [397] 0 397 2313 7265536 > 0 0 crond > [ 804.059349] [399] 0 39934025 161 167936 > 0 0 thermald > [ 804.067783] [400] 0 400 8615 115 110592 > 0 0 systemd-logind > [ 804.076734] [401] 0 401 2112 3257344 > 0 0 klogd > [ 804.084907] [449] 65534 449 3245 3969632 > 0 0 dnsmasq > [ 804.093254] [450] 0 450 3187 3373728 > 0 0 agetty > [ 804.101541] [452] 0 452 3187 3373728 > 0 0 agetty > [ 804.109826] [453] 0 45314707 107 159744 > 0 0 login > [ 804.118007] [463] 0 463 9532 163 122880 > 0 0 systemd > [ 804.126362] [464] 0 46416132 424 172032 > 0 0 (sd-pam) > [ 804.134803] [468] 0 468 4538 10581920 > 0 0 sh > [ 804.142741] [472] 0 47211102 83 131072 > 0 0 su > [ 804.150680] [473] 0 473 4538 9981920 > 0 0 sh > [ 804.158637] [519] 0 519 2396 5761440 > 0 0 lava-test-runne > [ 804.167700] [ 1220] 0 1220 2396 5261440 > 0 0 lava-test-shell > [ 804.176738] [ 1221] 0 1221 2396 5561440 > 0 0 sh > [ 804.184680] [ 1223] 0 1223 2462 13561440 > 0 0 ltp.sh > [ 804.192946] [ 1242] 0 1242 2462 13461440 > 0 0 ltp.sh > [ 804.201207] [ 1243] 0 1243 2462 13461440 > 0 0 ltp.sh > [ 804.209475] [ 1244] 0 1244 2462 13461440 > 0 0 ltp.sh > [ 804.217742] [ 1245] 0 1245 2561 22965536 > 0 0 runltp > [ 804.226010] [ 1246] 0 1246 1072 1553248 > 0 0 tee > [ 804.234012] [ 1313] 0 1313 1070 2953248 > 0 0 ltp-pan > [ 804.242374] [ 3216] 0 3216 1613 2053248 > 0 0 oom01 > [ 804.250554] [ 3217] 0 3217 1646 3157344 > 0 0 oom01 > [ 804.258728] [ 3245] 0 324581271 469 266240 > 0 0 NetworkManager > [ 804.267688] [ 3249] 0 3249 6422 5498304 > 0 0 systemd-hostnam > [ 804.276734] [ 3250] 0 325052976 178 172032 > 0