Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Wed, Feb 07, 2018 at 11:06:42AM -0800, Andrew Morton wrote: > From: Michal Hocko> Subject: net/netfilter/x_tables.c: remove size check > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > behaved these days and vmalloc simply returns NULL for those. Remove the > check as it simply not needed and the comment is even misleading. Applied, thanks!
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Wed, Feb 07, 2018 at 11:06:42AM -0800, Andrew Morton wrote: > From: Michal Hocko > Subject: net/netfilter/x_tables.c: remove size check > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > behaved these days and vmalloc simply returns NULL for those. Remove the > check as it simply not needed and the comment is even misleading. Applied, thanks!
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Wed, 7 Feb 2018 18:44:39 +0100 Pablo Neira Ayusowrote: > Hi, > > On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote: > [...] > > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. > > My excavation tools pointed me to "VM: Rework vmalloc code to support > > mapping of arbitray pages" > > by Christoph back in 2002. So yes, we can safely remove it finally. Se > > below. > > > > > > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Wed, 31 Jan 2018 09:16:56 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: remove size check > > > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > > behaved these days and vmalloc simply returns NULL for those. Remove > > the check as it simply not needed and the comment even misleading. > > > > Suggested-by: Andrew Morton > > Signed-off-by: Michal Hocko > > --- > > net/netfilter/x_tables.c | 4 > > 1 file changed, 4 deletions(-) > > > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > > index b55ec5aa51a6..48a6ff620493 100644 > > --- a/net/netfilter/x_tables.c > > +++ b/net/netfilter/x_tables.c > > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > > size) > > if (sz < sizeof(*info)) > > return NULL; > > > > - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ > > - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > > - return NULL; > > - > > /* __GFP_NORETRY is not fully supported by kvmalloc but it should > > * work reasonably well if sz is too large and bail out rather > > * than shoot all processes down before realizing there is nothing > > Patchwork didn't catch this patch for some reason, would you mind to > resend? From: Michal Hocko Subject: net/netfilter/x_tables.c: remove size check Back in 2002 vmalloc used to BUG on too large sizes. We are much better behaved these days and vmalloc simply returns NULL for those. Remove the check as it simply not needed and the comment is even misleading. Link: http://lkml.kernel.org/r/20180131081916.go21...@dhcp22.suse.cz Suggested-by: Andrew Morton Signed-off-by: Michal Hocko Reviewed-by: Andrew Morton Cc: Florian Westphal Cc: David S. Miller Signed-off-by: Andrew Morton --- net/netfilter/x_tables.c |4 1 file changed, 4 deletions(-) diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check net/netfilter/x_tables.c --- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check +++ a/net/netfilter/x_tables.c @@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf if (sz < sizeof(*info)) return NULL; - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ - if ((size >> PAGE_SHIFT) + 2 > totalram_pages) - return NULL; - /* __GFP_NORETRY is not fully supported by kvmalloc but it should * work reasonably well if sz is too large and bail out rather * than shoot all processes down before realizing there is nothing _
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Wed, 7 Feb 2018 18:44:39 +0100 Pablo Neira Ayuso wrote: > Hi, > > On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote: > [...] > > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. > > My excavation tools pointed me to "VM: Rework vmalloc code to support > > mapping of arbitray pages" > > by Christoph back in 2002. So yes, we can safely remove it finally. Se > > below. > > > > > > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Wed, 31 Jan 2018 09:16:56 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: remove size check > > > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > > behaved these days and vmalloc simply returns NULL for those. Remove > > the check as it simply not needed and the comment even misleading. > > > > Suggested-by: Andrew Morton > > Signed-off-by: Michal Hocko > > --- > > net/netfilter/x_tables.c | 4 > > 1 file changed, 4 deletions(-) > > > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > > index b55ec5aa51a6..48a6ff620493 100644 > > --- a/net/netfilter/x_tables.c > > +++ b/net/netfilter/x_tables.c > > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > > size) > > if (sz < sizeof(*info)) > > return NULL; > > > > - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ > > - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > > - return NULL; > > - > > /* __GFP_NORETRY is not fully supported by kvmalloc but it should > > * work reasonably well if sz is too large and bail out rather > > * than shoot all processes down before realizing there is nothing > > Patchwork didn't catch this patch for some reason, would you mind to > resend? From: Michal Hocko Subject: net/netfilter/x_tables.c: remove size check Back in 2002 vmalloc used to BUG on too large sizes. We are much better behaved these days and vmalloc simply returns NULL for those. Remove the check as it simply not needed and the comment is even misleading. Link: http://lkml.kernel.org/r/20180131081916.go21...@dhcp22.suse.cz Suggested-by: Andrew Morton Signed-off-by: Michal Hocko Reviewed-by: Andrew Morton Cc: Florian Westphal Cc: David S. Miller Signed-off-by: Andrew Morton --- net/netfilter/x_tables.c |4 1 file changed, 4 deletions(-) diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check net/netfilter/x_tables.c --- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check +++ a/net/netfilter/x_tables.c @@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf if (sz < sizeof(*info)) return NULL; - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ - if ((size >> PAGE_SHIFT) + 2 > totalram_pages) - return NULL; - /* __GFP_NORETRY is not fully supported by kvmalloc but it should * work reasonably well if sz is too large and bail out rather * than shoot all processes down before realizing there is nothing _
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Hi, On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote: [...] > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. > My excavation tools pointed me to "VM: Rework vmalloc code to support mapping > of arbitray pages" > by Christoph back in 2002. So yes, we can safely remove it finally. Se > below. > > > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 > From: Michal Hocko> Date: Wed, 31 Jan 2018 09:16:56 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: remove size check > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > behaved these days and vmalloc simply returns NULL for those. Remove > the check as it simply not needed and the comment even misleading. > > Suggested-by: Andrew Morton > Signed-off-by: Michal Hocko > --- > net/netfilter/x_tables.c | 4 > 1 file changed, 4 deletions(-) > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index b55ec5aa51a6..48a6ff620493 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > size) > if (sz < sizeof(*info)) > return NULL; > > - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ > - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > - return NULL; > - > /* __GFP_NORETRY is not fully supported by kvmalloc but it should >* work reasonably well if sz is too large and bail out rather >* than shoot all processes down before realizing there is nothing Patchwork didn't catch this patch for some reason, would you mind to resend? Thanks!
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Hi, On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote: [...] > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. > My excavation tools pointed me to "VM: Rework vmalloc code to support mapping > of arbitray pages" > by Christoph back in 2002. So yes, we can safely remove it finally. Se > below. > > > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Wed, 31 Jan 2018 09:16:56 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: remove size check > > Back in 2002 vmalloc used to BUG on too large sizes. We are much better > behaved these days and vmalloc simply returns NULL for those. Remove > the check as it simply not needed and the comment even misleading. > > Suggested-by: Andrew Morton > Signed-off-by: Michal Hocko > --- > net/netfilter/x_tables.c | 4 > 1 file changed, 4 deletions(-) > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index b55ec5aa51a6..48a6ff620493 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > size) > if (sz < sizeof(*info)) > return NULL; > > - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ > - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > - return NULL; > - > /* __GFP_NORETRY is not fully supported by kvmalloc but it should >* work reasonably well if sz is too large and bail out rather >* than shoot all processes down before realizing there is nothing Patchwork didn't catch this patch for some reason, would you mind to resend? Thanks!
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 03:39:58PM +0100, Michal Hocko wrote: > On Tue 30-01-18 15:01:11, Florian Westphal wrote: > > > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > > From: Michal Hocko> > > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > > > Acked-by: Florian Westphal > > Thanks! How should we route this change? Andrew, David? I'll place this in the nf.git tree if everyone is happy with it.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 03:39:58PM +0100, Michal Hocko wrote: > On Tue 30-01-18 15:01:11, Florian Westphal wrote: > > > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > > From: Michal Hocko > > > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > > > Acked-by: Florian Westphal > > Thanks! How should we route this change? Andrew, David? I'll place this in the nf.git tree if everyone is happy with it.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 11:27:45, Andrew Morton wrote: > On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hockowrote: > > > > Well, this is not about syzkaller, it merely pointed out a potential > > > DoS... And that has to be addressed somehow. > > > > So how about this? > > --- > > argh ;) doh, those hardwired moves... > > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > > > syzbot has noticed that xt_alloc_table_info can allocate a lot of > > memory. This is an admin only interface but an admin in a namespace > > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use > > kvmalloc() in xt_alloc_table_info()") has changed the opencoded > > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on > > the way because vmalloc has simply never fully supported __GFP_NORETRY > > semantic. This is still the case because e.g. page tables backing the > > vmalloc area are hardcoded GFP_KERNEL. > > > > Revert back to __GFP_NORETRY as a poors man defence against excessively > > large allocation request here. We will not rule out the OOM killer > > completely but __GFP_NORETRY should at least stop the large request > > in most cases. > > > > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in > > xt_alloc_table_info()") > > Signed-off-by: Michal Hocko > > --- > > net/netfilter/x_tables.c | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > > index d8571f414208..a5f5c29bcbdc 100644 > > --- a/net/netfilter/x_tables.c > > +++ b/net/netfilter/x_tables.c > > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned > > int size) > > if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > > return NULL; > > offtopic: preceding comment here is "prevent them from hitting BUG() in > vmalloc.c". I suspect this is ancient code and vmalloc sure as heck > shouldn't go BUG with this input. And it should be using `sz' ;) Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages" by Christoph back in 2002. So yes, we can safely remove it finally. Se below. >From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 31 Jan 2018 09:16:56 +0100 Subject: [PATCH] net/netfilter/x_tables.c: remove size check Back in 2002 vmalloc used to BUG on too large sizes. We are much better behaved these days and vmalloc simply returns NULL for those. Remove the check as it simply not needed and the comment even misleading. Suggested-by: Andrew Morton Signed-off-by: Michal Hocko --- net/netfilter/x_tables.c | 4 1 file changed, 4 deletions(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index b55ec5aa51a6..48a6ff620493 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size) if (sz < sizeof(*info)) return NULL; - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) - return NULL; - /* __GFP_NORETRY is not fully supported by kvmalloc but it should * work reasonably well if sz is too large and bail out rather * than shoot all processes down before realizing there is nothing -- 2.15.1 -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 11:27:45, Andrew Morton wrote: > On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko wrote: > > > > Well, this is not about syzkaller, it merely pointed out a potential > > > DoS... And that has to be addressed somehow. > > > > So how about this? > > --- > > argh ;) doh, those hardwired moves... > > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > > > syzbot has noticed that xt_alloc_table_info can allocate a lot of > > memory. This is an admin only interface but an admin in a namespace > > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use > > kvmalloc() in xt_alloc_table_info()") has changed the opencoded > > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on > > the way because vmalloc has simply never fully supported __GFP_NORETRY > > semantic. This is still the case because e.g. page tables backing the > > vmalloc area are hardcoded GFP_KERNEL. > > > > Revert back to __GFP_NORETRY as a poors man defence against excessively > > large allocation request here. We will not rule out the OOM killer > > completely but __GFP_NORETRY should at least stop the large request > > in most cases. > > > > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in > > xt_alloc_table_info()") > > Signed-off-by: Michal Hocko > > --- > > net/netfilter/x_tables.c | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > > index d8571f414208..a5f5c29bcbdc 100644 > > --- a/net/netfilter/x_tables.c > > +++ b/net/netfilter/x_tables.c > > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned > > int size) > > if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > > return NULL; > > offtopic: preceding comment here is "prevent them from hitting BUG() in > vmalloc.c". I suspect this is ancient code and vmalloc sure as heck > shouldn't go BUG with this input. And it should be using `sz' ;) Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range. My excavation tools pointed me to "VM: Rework vmalloc code to support mapping of arbitray pages" by Christoph back in 2002. So yes, we can safely remove it finally. Se below. >From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 31 Jan 2018 09:16:56 +0100 Subject: [PATCH] net/netfilter/x_tables.c: remove size check Back in 2002 vmalloc used to BUG on too large sizes. We are much better behaved these days and vmalloc simply returns NULL for those. Remove the check as it simply not needed and the comment even misleading. Suggested-by: Andrew Morton Signed-off-by: Michal Hocko --- net/netfilter/x_tables.c | 4 1 file changed, 4 deletions(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index b55ec5aa51a6..48a6ff620493 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size) if (sz < sizeof(*info)) return NULL; - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */ - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) - return NULL; - /* __GFP_NORETRY is not fully supported by kvmalloc but it should * work reasonably well if sz is too large and bail out rather * than shoot all processes down before realizing there is nothing -- 2.15.1 -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hockowrote: > > Well, this is not about syzkaller, it merely pointed out a potential > > DoS... And that has to be addressed somehow. > > So how about this? > --- argh ;) > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Tue, 30 Jan 2018 14:51:07 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > syzbot has noticed that xt_alloc_table_info can allocate a lot of > memory. This is an admin only interface but an admin in a namespace > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use > kvmalloc() in xt_alloc_table_info()") has changed the opencoded > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on > the way because vmalloc has simply never fully supported __GFP_NORETRY > semantic. This is still the case because e.g. page tables backing the > vmalloc area are hardcoded GFP_KERNEL. > > Revert back to __GFP_NORETRY as a poors man defence against excessively > large allocation request here. We will not rule out the OOM killer > completely but __GFP_NORETRY should at least stop the large request > in most cases. > > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in > xt_alloc_table_info()") > Signed-off-by: Michal Hocko > --- > net/netfilter/x_tables.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index d8571f414208..a5f5c29bcbdc 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > size) > if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > return NULL; offtopic: preceding comment here is "prevent them from hitting BUG() in vmalloc.c". I suspect this is ancient code and vmalloc sure as heck shouldn't go BUG with this input. And it should be using `sz' ;) So I suspect and hope that this code can be removed. If not, let's fix vmalloc! > - info = kvmalloc(sz, GFP_KERNEL); > + /* > + * __GFP_NORETRY is not fully supported by kvmalloc but it should > + * work reasonably well if sz is too large and bail out rather > + * than shoot all processes down before realizing there is nothing > + * more to reclaim. > + */ > + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); > if (!info) > return NULL; checkpatch sayeth networking block comments don't use an empty /* line, use /* Comment... So I'll do that and shall scoot the patch Davewards.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, 30 Jan 2018 15:01:04 +0100 Michal Hocko wrote: > > Well, this is not about syzkaller, it merely pointed out a potential > > DoS... And that has to be addressed somehow. > > So how about this? > --- argh ;) > >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Tue, 30 Jan 2018 14:51:07 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > syzbot has noticed that xt_alloc_table_info can allocate a lot of > memory. This is an admin only interface but an admin in a namespace > is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use > kvmalloc() in xt_alloc_table_info()") has changed the opencoded > kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on > the way because vmalloc has simply never fully supported __GFP_NORETRY > semantic. This is still the case because e.g. page tables backing the > vmalloc area are hardcoded GFP_KERNEL. > > Revert back to __GFP_NORETRY as a poors man defence against excessively > large allocation request here. We will not rule out the OOM killer > completely but __GFP_NORETRY should at least stop the large request > in most cases. > > Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in > xt_alloc_table_info()") > Signed-off-by: Michal Hocko > --- > net/netfilter/x_tables.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index d8571f414208..a5f5c29bcbdc 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int > size) > if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) > return NULL; offtopic: preceding comment here is "prevent them from hitting BUG() in vmalloc.c". I suspect this is ancient code and vmalloc sure as heck shouldn't go BUG with this input. And it should be using `sz' ;) So I suspect and hope that this code can be removed. If not, let's fix vmalloc! > - info = kvmalloc(sz, GFP_KERNEL); > + /* > + * __GFP_NORETRY is not fully supported by kvmalloc but it should > + * work reasonably well if sz is too large and bail out rather > + * than shoot all processes down before realizing there is nothing > + * more to reclaim. > + */ > + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); > if (!info) > return NULL; checkpatch sayeth networking block comments don't use an empty /* line, use /* Comment... So I'll do that and shall scoot the patch Davewards.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 15:01:11, Florian Westphal wrote: > > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > From: Michal Hocko> > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > Acked-by: Florian Westphal Thanks! How should we route this change? Andrew, David? -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 15:01:11, Florian Westphal wrote: > > From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Tue, 30 Jan 2018 14:51:07 +0100 > > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive > > Acked-by: Florian Westphal Thanks! How should we route this change? Andrew, David? -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > From: Michal Hocko> Date: Tue, 30 Jan 2018 14:51:07 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive Acked-by: Florian Westphal
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
> From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Tue, 30 Jan 2018 14:51:07 +0100 > Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive Acked-by: Florian Westphal
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 10:57:39, Michal Hocko wrote: > On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote: > > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov > >wrote: > > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > > >> Michal Hocko wrote: > > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > >> > > Kirill A. Shutemov wrote: > > >> > [...] > > >> > > > I hate what I'm saying, but I guess we need some tunable here. > > >> > > > Not sure what exactly. > > >> > > > > >> > > Would memcg help? > > >> > > > >> > That really depends. I would have to check whether vmalloc path obeys > > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > >> > that shouldn't be a big deal). But then the other potential problem is > > >> > the life time of the xt_table_info (or other potentially large) data > > >> > structures. Are they bound to any process life time. > > >> > > >> No. > > > > > > Well, IIUC they bound to net namespace life time, so killing all > > > proccesses in the namespace would help to get memory back. :) > > > > ... unless the namespace is mounted into file system. > > > > Let's start with NOWARN as that's what kernel generally uses for > > allocations with user-controllable size. ENOMEM is roughly as > > informative as the WARNING message in this case. > > You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc > right now. More specifically kvmalloc doesn't guanratee that the request > will not trigger the OOM killer (like regular __GFP_NORETRY). This is > because of internal vmalloc restrictions. If you are however OK to > simply bail out in most cases then __GFP_NORETRY should work reasonably > fine. > > > I think we also need to consider setting up memory cgroup for > > syzkaller test processes (we do RLIMIT_AS, but that's weak). > > Well, this is not about syzkaller, it merely pointed out a potential > DoS... And that has to be addressed somehow. So how about this? --- >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 30 Jan 2018 14:51:07 +0100 Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. This is an admin only interface but an admin in a namespace is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on the way because vmalloc has simply never fully supported __GFP_NORETRY semantic. This is still the case because e.g. page tables backing the vmalloc area are hardcoded GFP_KERNEL. Revert back to __GFP_NORETRY as a poors man defence against excessively large allocation request here. We will not rule out the OOM killer completely but __GFP_NORETRY should at least stop the large request in most cases. Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") Signed-off-by: Michal Hocko --- net/netfilter/x_tables.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index d8571f414208..a5f5c29bcbdc 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size) if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) return NULL; - info = kvmalloc(sz, GFP_KERNEL); + /* +* __GFP_NORETRY is not fully supported by kvmalloc but it should +* work reasonably well if sz is too large and bail out rather +* than shoot all processes down before realizing there is nothing +* more to reclaim. +*/ + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); if (!info) return NULL; -- 2.15.1 -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 10:57:39, Michal Hocko wrote: > On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote: > > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov > > wrote: > > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > > >> Michal Hocko wrote: > > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > >> > > Kirill A. Shutemov wrote: > > >> > [...] > > >> > > > I hate what I'm saying, but I guess we need some tunable here. > > >> > > > Not sure what exactly. > > >> > > > > >> > > Would memcg help? > > >> > > > >> > That really depends. I would have to check whether vmalloc path obeys > > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > >> > that shouldn't be a big deal). But then the other potential problem is > > >> > the life time of the xt_table_info (or other potentially large) data > > >> > structures. Are they bound to any process life time. > > >> > > >> No. > > > > > > Well, IIUC they bound to net namespace life time, so killing all > > > proccesses in the namespace would help to get memory back. :) > > > > ... unless the namespace is mounted into file system. > > > > Let's start with NOWARN as that's what kernel generally uses for > > allocations with user-controllable size. ENOMEM is roughly as > > informative as the WARNING message in this case. > > You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc > right now. More specifically kvmalloc doesn't guanratee that the request > will not trigger the OOM killer (like regular __GFP_NORETRY). This is > because of internal vmalloc restrictions. If you are however OK to > simply bail out in most cases then __GFP_NORETRY should work reasonably > fine. > > > I think we also need to consider setting up memory cgroup for > > syzkaller test processes (we do RLIMIT_AS, but that's weak). > > Well, this is not about syzkaller, it merely pointed out a potential > DoS... And that has to be addressed somehow. So how about this? --- >From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 30 Jan 2018 14:51:07 +0100 Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive syzbot has noticed that xt_alloc_table_info can allocate a lot of memory. This is an admin only interface but an admin in a namespace is sufficient as well. eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc fallback into kvmalloc. It has dropped __GFP_NORETRY on the way because vmalloc has simply never fully supported __GFP_NORETRY semantic. This is still the case because e.g. page tables backing the vmalloc area are hardcoded GFP_KERNEL. Revert back to __GFP_NORETRY as a poors man defence against excessively large allocation request here. We will not rule out the OOM killer completely but __GFP_NORETRY should at least stop the large request in most cases. Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") Signed-off-by: Michal Hocko --- net/netfilter/x_tables.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index d8571f414208..a5f5c29bcbdc 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size) if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages) return NULL; - info = kvmalloc(sz, GFP_KERNEL); + /* +* __GFP_NORETRY is not fully supported by kvmalloc but it should +* work reasonably well if sz is too large and bail out rather +* than shoot all processes down before realizing there is nothing +* more to reclaim. +*/ + info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); if (!info) return NULL; -- 2.15.1 -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote: > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov >wrote: > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > >> Michal Hocko wrote: > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > >> > > Kirill A. Shutemov wrote: > >> > [...] > >> > > > I hate what I'm saying, but I guess we need some tunable here. > >> > > > Not sure what exactly. > >> > > > >> > > Would memcg help? > >> > > >> > That really depends. I would have to check whether vmalloc path obeys > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > >> > that shouldn't be a big deal). But then the other potential problem is > >> > the life time of the xt_table_info (or other potentially large) data > >> > structures. Are they bound to any process life time. > >> > >> No. > > > > Well, IIUC they bound to net namespace life time, so killing all > > proccesses in the namespace would help to get memory back. :) > > ... unless the namespace is mounted into file system. > > Let's start with NOWARN as that's what kernel generally uses for > allocations with user-controllable size. ENOMEM is roughly as > informative as the WARNING message in this case. You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc right now. More specifically kvmalloc doesn't guanratee that the request will not trigger the OOM killer (like regular __GFP_NORETRY). This is because of internal vmalloc restrictions. If you are however OK to simply bail out in most cases then __GFP_NORETRY should work reasonably fine. > I think we also need to consider setting up memory cgroup for > syzkaller test processes (we do RLIMIT_AS, but that's weak). Well, this is not about syzkaller, it merely pointed out a potential DoS... And that has to be addressed somehow. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote: > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov > wrote: > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > >> Michal Hocko wrote: > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > >> > > Kirill A. Shutemov wrote: > >> > [...] > >> > > > I hate what I'm saying, but I guess we need some tunable here. > >> > > > Not sure what exactly. > >> > > > >> > > Would memcg help? > >> > > >> > That really depends. I would have to check whether vmalloc path obeys > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > >> > that shouldn't be a big deal). But then the other potential problem is > >> > the life time of the xt_table_info (or other potentially large) data > >> > structures. Are they bound to any process life time. > >> > >> No. > > > > Well, IIUC they bound to net namespace life time, so killing all > > proccesses in the namespace would help to get memory back. :) > > ... unless the namespace is mounted into file system. > > Let's start with NOWARN as that's what kernel generally uses for > allocations with user-controllable size. ENOMEM is roughly as > informative as the WARNING message in this case. You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc right now. More specifically kvmalloc doesn't guanratee that the request will not trigger the OOM killer (like regular __GFP_NORETRY). This is because of internal vmalloc restrictions. If you are however OK to simply bail out in most cases then __GFP_NORETRY should work reasonably fine. > I think we also need to consider setting up memory cgroup for > syzkaller test processes (we do RLIMIT_AS, but that's weak). Well, this is not about syzkaller, it merely pointed out a potential DoS... And that has to be addressed somehow. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 09:11:27, Florian Westphal wrote: > Michal Hockowrote: > > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > > Kirill A. Shutemov wrote: > > [...] > > > > I hate what I'm saying, but I guess we need some tunable here. > > > > Not sure what exactly. > > > > > > Would memcg help? > > > > That really depends. I would have to check whether vmalloc path obeys > > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > that shouldn't be a big deal). But then the other potential problem is > > the life time of the xt_table_info (or other potentially large) data > > structures. Are they bound to any process life time. > > No. > > > Because if they are > > not then the OOM killer will not help. The OOM panic earlier in this > > thread suggests it doesn't because the test case managed to eat all the > > available memory and killed all the eligible tasks which didn't help. > > Yes, which is why we do not want any OOM killer invocation in first > place... The problem is that as soon as you eat that memory and ask for more until you fail with ENOMEM then the OOM is simply unavoidable. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue 30-01-18 09:11:27, Florian Westphal wrote: > Michal Hocko wrote: > > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > > Kirill A. Shutemov wrote: > > [...] > > > > I hate what I'm saying, but I guess we need some tunable here. > > > > Not sure what exactly. > > > > > > Would memcg help? > > > > That really depends. I would have to check whether vmalloc path obeys > > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > that shouldn't be a big deal). But then the other potential problem is > > the life time of the xt_table_info (or other potentially large) data > > structures. Are they bound to any process life time. > > No. > > > Because if they are > > not then the OOM killer will not help. The OOM panic earlier in this > > thread suggests it doesn't because the test case managed to eat all the > > available memory and killed all the eligible tasks which didn't help. > > Yes, which is why we do not want any OOM killer invocation in first > place... The problem is that as soon as you eat that memory and ask for more until you fail with ENOMEM then the OOM is simply unavoidable. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemovwrote: > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: >> Michal Hocko wrote: >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: >> > > Kirill A. Shutemov wrote: >> > [...] >> > > > I hate what I'm saying, but I guess we need some tunable here. >> > > > Not sure what exactly. >> > > >> > > Would memcg help? >> > >> > That really depends. I would have to check whether vmalloc path obeys >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but >> > that shouldn't be a big deal). But then the other potential problem is >> > the life time of the xt_table_info (or other potentially large) data >> > structures. Are they bound to any process life time. >> >> No. > > Well, IIUC they bound to net namespace life time, so killing all > proccesses in the namespace would help to get memory back. :) ... unless the namespace is mounted into file system. Let's start with NOWARN as that's what kernel generally uses for allocations with user-controllable size. ENOMEM is roughly as informative as the WARNING message in this case. I think we also need to consider setting up memory cgroup for syzkaller test processes (we do RLIMIT_AS, but that's weak).
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov wrote: > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: >> Michal Hocko wrote: >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote: >> > > Kirill A. Shutemov wrote: >> > [...] >> > > > I hate what I'm saying, but I guess we need some tunable here. >> > > > Not sure what exactly. >> > > >> > > Would memcg help? >> > >> > That really depends. I would have to check whether vmalloc path obeys >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but >> > that shouldn't be a big deal). But then the other potential problem is >> > the life time of the xt_table_info (or other potentially large) data >> > structures. Are they bound to any process life time. >> >> No. > > Well, IIUC they bound to net namespace life time, so killing all > proccesses in the namespace would help to get memory back. :) ... unless the namespace is mounted into file system. Let's start with NOWARN as that's what kernel generally uses for allocations with user-controllable size. ENOMEM is roughly as informative as the WARNING message in this case. I think we also need to consider setting up memory cgroup for syzkaller test processes (we do RLIMIT_AS, but that's weak).
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > Michal Hockowrote: > > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > > Kirill A. Shutemov wrote: > > [...] > > > > I hate what I'm saying, but I guess we need some tunable here. > > > > Not sure what exactly. > > > > > > Would memcg help? > > > > That really depends. I would have to check whether vmalloc path obeys > > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > that shouldn't be a big deal). But then the other potential problem is > > the life time of the xt_table_info (or other potentially large) data > > structures. Are they bound to any process life time. > > No. Well, IIUC they bound to net namespace life time, so killing all proccesses in the namespace would help to get memory back. :) -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote: > Michal Hocko wrote: > > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > > Kirill A. Shutemov wrote: > > [...] > > > > I hate what I'm saying, but I guess we need some tunable here. > > > > Not sure what exactly. > > > > > > Would memcg help? > > > > That really depends. I would have to check whether vmalloc path obeys > > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > > that shouldn't be a big deal). But then the other potential problem is > > the life time of the xt_table_info (or other potentially large) data > > structures. Are they bound to any process life time. > > No. Well, IIUC they bound to net namespace life time, so killing all proccesses in the namespace would help to get memory back. :) -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Michal Hockowrote: > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > Kirill A. Shutemov wrote: > [...] > > > I hate what I'm saying, but I guess we need some tunable here. > > > Not sure what exactly. > > > > Would memcg help? > > That really depends. I would have to check whether vmalloc path obeys > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > that shouldn't be a big deal). But then the other potential problem is > the life time of the xt_table_info (or other potentially large) data > structures. Are they bound to any process life time. No. > Because if they are > not then the OOM killer will not help. The OOM panic earlier in this > thread suggests it doesn't because the test case managed to eat all the > available memory and killed all the eligible tasks which didn't help. Yes, which is why we do not want any OOM killer invocation in first place... > So in some sense the memcg would help to stop the excessive allocation, > but it wouldn't resolve it other than kill all tasks in the affected > memcg/container. Whether this is sufficient or not, I dunno. It sounds > quite suboptimal to me. But it is true this would be less tricky then > adding a global knob... Global knob doesn't really help at all, I can add multiple large iptables rulesets (so we would have to account), and we have same issue in virtually all of networking, so we need limits for interface count, tunnel count, ipsec policies/SAs, nftables, tc, etc etc.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Michal Hocko wrote: > On Mon 29-01-18 23:35:22, Florian Westphal wrote: > > Kirill A. Shutemov wrote: > [...] > > > I hate what I'm saying, but I guess we need some tunable here. > > > Not sure what exactly. > > > > Would memcg help? > > That really depends. I would have to check whether vmalloc path obeys > __GFP_ACCOUNT (I suspect it does except for page tables allocations but > that shouldn't be a big deal). But then the other potential problem is > the life time of the xt_table_info (or other potentially large) data > structures. Are they bound to any process life time. No. > Because if they are > not then the OOM killer will not help. The OOM panic earlier in this > thread suggests it doesn't because the test case managed to eat all the > available memory and killed all the eligible tasks which didn't help. Yes, which is why we do not want any OOM killer invocation in first place... > So in some sense the memcg would help to stop the excessive allocation, > but it wouldn't resolve it other than kill all tasks in the affected > memcg/container. Whether this is sufficient or not, I dunno. It sounds > quite suboptimal to me. But it is true this would be less tricky then > adding a global knob... Global knob doesn't really help at all, I can add multiple large iptables rulesets (so we would have to account), and we have same issue in virtually all of networking, so we need limits for interface count, tunnel count, ipsec policies/SAs, nftables, tc, etc etc.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon 29-01-18 23:35:22, Florian Westphal wrote: > Kirill A. Shutemovwrote: [...] > > I hate what I'm saying, but I guess we need some tunable here. > > Not sure what exactly. > > Would memcg help? That really depends. I would have to check whether vmalloc path obeys __GFP_ACCOUNT (I suspect it does except for page tables allocations but that shouldn't be a big deal). But then the other potential problem is the life time of the xt_table_info (or other potentially large) data structures. Are they bound to any process life time. Because if they are not then the OOM killer will not help. The OOM panic earlier in this thread suggests it doesn't because the test case managed to eat all the available memory and killed all the eligible tasks which didn't help. So in some sense the memcg would help to stop the excessive allocation, but it wouldn't resolve it other than kill all tasks in the affected memcg/container. Whether this is sufficient or not, I dunno. It sounds quite suboptimal to me. But it is true this would be less tricky then adding a global knob... -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon 29-01-18 23:35:22, Florian Westphal wrote: > Kirill A. Shutemov wrote: [...] > > I hate what I'm saying, but I guess we need some tunable here. > > Not sure what exactly. > > Would memcg help? That really depends. I would have to check whether vmalloc path obeys __GFP_ACCOUNT (I suspect it does except for page tables allocations but that shouldn't be a big deal). But then the other potential problem is the life time of the xt_table_info (or other potentially large) data structures. Are they bound to any process life time. Because if they are not then the OOM killer will not help. The OOM panic earlier in this thread suggests it doesn't because the test case managed to eat all the available memory and killed all the eligible tasks which didn't help. So in some sense the memcg would help to stop the excessive allocation, but it wouldn't resolve it other than kill all tasks in the affected memcg/container. Whether this is sufficient or not, I dunno. It sounds quite suboptimal to me. But it is true this would be less tricky then adding a global knob... -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Kirill A. Shutemovwrote: > On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote: > > Kirill A. Shutemov wrote: > > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > > back > > > > > off when the current task is killed") but then became unkillable by > > > > > commit > > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > > Please consider adding some limit from networking side. > > > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > > arbitrary large buffer in kernel. > > > > Isn't that what we do everywhere in network stack? > > > > I think we should try to allocate whatever amount of memory is needed > > for the given xtables ruleset, given that is what admin requested us to do. > > Is it correct that "admin" in this case is root in random container? Yes. > I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET? Yes. > This can be fun. Do we prevent "admin in random container" to insert 2m ipv6 routes (alternatively: ipsec tunnels, interfaces etc etc)? > > I also would not know what limit is sane -- I've seen setups with as much > > as 100k iptables rules, and that was 5 years ago. > > > > And even if we add a "Xk rules" limit, it might be too much for > > low-memory systems, or not enough for whatever other use case there > > might be. > > I hate what I'm saying, but I guess we need some tunable here. > Not sure what exactly. Would memcg help? (I don't buy the "run untrusted binaries on linux is safe" thing, so I would not know).
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Kirill A. Shutemov wrote: > On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote: > > Kirill A. Shutemov wrote: > > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > > back > > > > > off when the current task is killed") but then became unkillable by > > > > > commit > > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > > Please consider adding some limit from networking side. > > > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > > arbitrary large buffer in kernel. > > > > Isn't that what we do everywhere in network stack? > > > > I think we should try to allocate whatever amount of memory is needed > > for the given xtables ruleset, given that is what admin requested us to do. > > Is it correct that "admin" in this case is root in random container? Yes. > I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET? Yes. > This can be fun. Do we prevent "admin in random container" to insert 2m ipv6 routes (alternatively: ipsec tunnels, interfaces etc etc)? > > I also would not know what limit is sane -- I've seen setups with as much > > as 100k iptables rules, and that was 5 years ago. > > > > And even if we add a "Xk rules" limit, it might be too much for > > low-memory systems, or not enough for whatever other use case there > > might be. > > I hate what I'm saying, but I guess we need some tunable here. > Not sure what exactly. Would memcg help? (I don't buy the "run untrusted binaries on linux is safe" thing, so I would not know).
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Monday 2018-01-29 17:57, Florian Westphal wrote: >> > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back >> > > off when the current task is killed") but then became unkillable by >> > > commit >> > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is >> > > killed""). Therefore, we can't handle this problem from MM side. >> > > Please consider adding some limit from networking side. >> > >> > I don't know what "some limit" would be. I would prefer if there was >> > a way to supress OOM Killer in first place so we can just -ENOMEM user. >> >> Just supressing OOM kill is a bad idea. We still leave a way to allocate >> arbitrary large buffer in kernel. At the very least, mm could limit that kind of "arbitrary". If the machine has x GB (swap included) and the admin tries to make the kernel allocate space for an x GB ruleset, no way is it going to be satisfiable _even with OOM_. >I think we should try to allocate whatever amount of memory is needed >for the given xtables ruleset, given that is what admin requested us to do.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Monday 2018-01-29 17:57, Florian Westphal wrote: >> > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back >> > > off when the current task is killed") but then became unkillable by >> > > commit >> > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is >> > > killed""). Therefore, we can't handle this problem from MM side. >> > > Please consider adding some limit from networking side. >> > >> > I don't know what "some limit" would be. I would prefer if there was >> > a way to supress OOM Killer in first place so we can just -ENOMEM user. >> >> Just supressing OOM kill is a bad idea. We still leave a way to allocate >> arbitrary large buffer in kernel. At the very least, mm could limit that kind of "arbitrary". If the machine has x GB (swap included) and the admin tries to make the kernel allocate space for an x GB ruleset, no way is it going to be satisfiable _even with OOM_. >I think we should try to allocate whatever amount of memory is needed >for the given xtables ruleset, given that is what admin requested us to do.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote: > Kirill A. Shutemovwrote: > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > back > > > > off when the current task is killed") but then became unkillable by > > > > commit > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > Please consider adding some limit from networking side. > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > arbitrary large buffer in kernel. > > Isn't that what we do everywhere in network stack? > > I think we should try to allocate whatever amount of memory is needed > for the given xtables ruleset, given that is what admin requested us to do. Is it correct that "admin" in this case is root in random container? I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET? This can be fun. > I also would not know what limit is sane -- I've seen setups with as much > as 100k iptables rules, and that was 5 years ago. > > And even if we add a "Xk rules" limit, it might be too much for > low-memory systems, or not enough for whatever other use case there > might be. I hate what I'm saying, but I guess we need some tunable here. Not sure what exactly. -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon, Jan 29, 2018 at 05:57:22PM +0100, Florian Westphal wrote: > Kirill A. Shutemov wrote: > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > back > > > > off when the current task is killed") but then became unkillable by > > > > commit > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > Please consider adding some limit from networking side. > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > arbitrary large buffer in kernel. > > Isn't that what we do everywhere in network stack? > > I think we should try to allocate whatever amount of memory is needed > for the given xtables ruleset, given that is what admin requested us to do. Is it correct that "admin" in this case is root in random container? I mean, can we get access to it with CLONE_NEWUSER|CLONE_NEWNET? This can be fun. > I also would not know what limit is sane -- I've seen setups with as much > as 100k iptables rules, and that was 5 years ago. > > And even if we add a "Xk rules" limit, it might be too much for > low-memory systems, or not enough for whatever other use case there > might be. I hate what I'm saying, but I guess we need some tunable here. Not sure what exactly. -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon 29-01-18 17:57:22, Florian Westphal wrote: > Kirill A. Shutemovwrote: > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > back > > > > off when the current task is killed") but then became unkillable by > > > > commit > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > Please consider adding some limit from networking side. > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > arbitrary large buffer in kernel. > > Isn't that what we do everywhere in network stack? > > I think we should try to allocate whatever amount of memory is needed > for the given xtables ruleset, given that is what admin requested us to do. If this is a root only thing then __GFP_NORETRY sounds like the most straightforward way to go. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon 29-01-18 17:57:22, Florian Westphal wrote: > Kirill A. Shutemov wrote: > > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: > > > > back > > > > off when the current task is killed") but then became unkillable by > > > > commit > > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > > killed""). Therefore, we can't handle this problem from MM side. > > > > Please consider adding some limit from networking side. > > > > > > I don't know what "some limit" would be. I would prefer if there was > > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > > arbitrary large buffer in kernel. > > Isn't that what we do everywhere in network stack? > > I think we should try to allocate whatever amount of memory is needed > for the given xtables ruleset, given that is what admin requested us to do. If this is a root only thing then __GFP_NORETRY sounds like the most straightforward way to go. -- Michal Hocko SUSE Labs
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Kirill A. Shutemovwrote: > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > > > off when the current task is killed") but then became unkillable by commit > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > killed""). Therefore, we can't handle this problem from MM side. > > > Please consider adding some limit from networking side. > > > > I don't know what "some limit" would be. I would prefer if there was > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > arbitrary large buffer in kernel. Isn't that what we do everywhere in network stack? I think we should try to allocate whatever amount of memory is needed for the given xtables ruleset, given that is what admin requested us to do. I also would not know what limit is sane -- I've seen setups with as much as 100k iptables rules, and that was 5 years ago. And even if we add a "Xk rules" limit, it might be too much for low-memory systems, or not enough for whatever other use case there might be.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Kirill A. Shutemov wrote: > On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > > > off when the current task is killed") but then became unkillable by commit > > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > > killed""). Therefore, we can't handle this problem from MM side. > > > Please consider adding some limit from networking side. > > > > I don't know what "some limit" would be. I would prefer if there was > > a way to supress OOM Killer in first place so we can just -ENOMEM user. > > Just supressing OOM kill is a bad idea. We still leave a way to allocate > arbitrary large buffer in kernel. Isn't that what we do everywhere in network stack? I think we should try to allocate whatever amount of memory is needed for the given xtables ruleset, given that is what admin requested us to do. I also would not know what limit is sane -- I've seen setups with as much as 100k iptables rules, and that was 5 years ago. And even if we add a "Xk rules" limit, it might be too much for low-memory systems, or not enough for whatever other use case there might be.
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > > off when the current task is killed") but then became unkillable by commit > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > killed""). Therefore, we can't handle this problem from MM side. > > Please consider adding some limit from networking side. > > I don't know what "some limit" would be. I would prefer if there was > a way to supress OOM Killer in first place so we can just -ENOMEM user. Just supressing OOM kill is a bad idea. We still leave a way to allocate arbitrary large buffer in kernel. -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
On Mon, Jan 29, 2018 at 08:23:57AM +0100, Florian Westphal wrote: > > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > > off when the current task is killed") but then became unkillable by commit > > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > > killed""). Therefore, we can't handle this problem from MM side. > > Please consider adding some limit from networking side. > > I don't know what "some limit" would be. I would prefer if there was > a way to supress OOM Killer in first place so we can just -ENOMEM user. Just supressing OOM kill is a bad idea. We still leave a way to allocate arbitrary large buffer in kernel. -- Kirill A. Shutemov
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Tetsuo Handawrote: > syzbot wrote: > > syzbot hit the following crash on net-next commit > > 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +) > > Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed' > > > > C reproducer is attached. > > syzkaller reproducer is attached. > > Raw console output is attached. > > compiler: gcc (GCC) 7.1.1 20170620 > > .config is attached. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+8630e35fc7287b392...@syzkaller.appspotmail.com > > It will help syzbot understand when the bug is fixed. See footer for > > details. > > If you forward the report, please keep this part and the footer. > > > > [ 3685] 0 3685178211 1843200 0 sshd > > [ 3692] 0 3692 43760327680 0 > > syzkaller025682 > > [ 3695] 0 3695 43760368640 0 > > syzkaller025682 > > Kernel panic - not syncing: Out of memory and no killable processes... > > > > This sounds like too huge vmalloc() request where size is controlled by > userspace. Right. Before eacd86ca3b036e55e172b7279f101cef4a6ff3a4 this used info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, would it help to re-add that? > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > off when the current task is killed") but then became unkillable by commit > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > killed""). Therefore, we can't handle this problem from MM side. > Please consider adding some limit from networking side. I don't know what "some limit" would be. I would prefer if there was a way to supress OOM Killer in first place so we can just -ENOMEM user. AFAIU NOWARN|NORETRY does that, so I would propose to just read-add it?
Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Tetsuo Handa wrote: > syzbot wrote: > > syzbot hit the following crash on net-next commit > > 6bb46bc57c8e9ce947cc605e555b7204b44d2b10 (Fri Jan 26 16:00:23 2018 +) > > Merge branch 'cxgb4-fix-dump-collection-when-firmware-crashed' > > > > C reproducer is attached. > > syzkaller reproducer is attached. > > Raw console output is attached. > > compiler: gcc (GCC) 7.1.1 20170620 > > .config is attached. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+8630e35fc7287b392...@syzkaller.appspotmail.com > > It will help syzbot understand when the bug is fixed. See footer for > > details. > > If you forward the report, please keep this part and the footer. > > > > [ 3685] 0 3685178211 1843200 0 sshd > > [ 3692] 0 3692 43760327680 0 > > syzkaller025682 > > [ 3695] 0 3695 43760368640 0 > > syzkaller025682 > > Kernel panic - not syncing: Out of memory and no killable processes... > > > > This sounds like too huge vmalloc() request where size is controlled by > userspace. Right. Before eacd86ca3b036e55e172b7279f101cef4a6ff3a4 this used info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY, would it help to re-add that? > vmalloc() once became killable by commit 5d17a73a2ebeb8d1 ("vmalloc: back > off when the current task is killed") but then became unkillable by commit > b8c8a338f75e052d ("Revert "vmalloc: back off when the current task is > killed""). Therefore, we can't handle this problem from MM side. > Please consider adding some limit from networking side. I don't know what "some limit" would be. I would prefer if there was a way to supress OOM Killer in first place so we can just -ENOMEM user. AFAIU NOWARN|NORETRY does that, so I would propose to just read-add it?