Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-08-07 Thread Vlastimil Babka
On 08/07/2018 01:26 PM, Michal Hocko wrote:
> On Tue 07-08-18 13:19:26, Florian Westphal wrote:
>> Michal Hocko  wrote:
 I can't reproduce it anymore.
 If i understand correctly this way memory allocated will be
 accounted to kmem of this cgroup (if inside cgroup).
>>>
>>> s@this@caller's@
>>>
>>> Florian, is this patch acceptable
>>
>> I am no mm expert.  Should all longlived GFP_KERNEL allocations set ACCOUNT?
> 
> No. We should focus only on those that are under direct userspace
> control and it can be triggered by an untrusted user.

Looks like the description in include/linux/gfp.h could use some details
to guide developers, possibly also Mike's new/improved docs (+CC).

>> If so, there are more places that should get same treatment.
>> The change looks fine to me, but again, I don't know when ACCOUNT should
>> be set in the first place.
> 
> see above.
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-08-07 Thread Vlastimil Babka
On 08/02/2018 10:50 AM, Michal Hocko wrote:
> On Wed 01-08-18 19:03:03, Georgi Nikolov wrote:
>>
>> *Georgi Nikolov*
>> System Administrator
>> www.icdsoft.com <http://www.icdsoft.com>
>>
>> On 08/01/2018 11:33 AM, Michal Hocko wrote:
>>> On Wed 01-08-18 09:34:23, Vlastimil Babka wrote:
>>>> On 07/31/2018 04:05 PM, Florian Westphal wrote:
>>>>> Georgi Nikolov  wrote:
>>>>>>> No, I think that's rather for the netfilter folks to decide. However, it
>>>>>>> seems there has been the debate already [1] and it was not found. The
>>>>>>> conclusion was that __GFP_NORETRY worked fine before, so it should work
>>>>>>> again after it's added back. But now we know that it doesn't...
>>>>>>>
>>>>>>> [1] 
>>>>>>> https://lore.kernel.org/lkml/20180130140104.ge21...@dhcp22.suse.cz/T/#u
>>>>>> Yes i see. I will add Florian Westphal to CC list. netfilter-devel is
>>>>>> already in this list so probably have to wait for their opinion.
>>>>> It hasn't changed, I think having OOM killer zap random processes
>>>>> just because userspace wants to import large iptables ruleset is not a
>>>>> good idea.
>>>> If we denied the allocation instead of OOM (e.g. by using
>>>> __GFP_RETRY_MAYFAIL), a slightly smaller one may succeed, still leaving
>>>> the system without much memory, so it will invoke OOM killer sooner or
>>>> later anyway.
>>>>
>>>> I don't see any silver-bullet solution, unfortunately. If this can be
>>>> abused by (multiple) namespaces, then they have to be contained by
>>>> kmemcg as that's the generic mechanism intended for this. Then we could
>>>> use the __GFP_RETRY_MAYFAIL.
>>>> The only limit we could impose to outright deny the allocation (to
>>>> prevent obvious bugs/admin mistakes or abuses) could be based on the
>>>> amount of RAM, as was suggested in the old thread.
>>
>> Can we make this configurable - on/off switch or size above which
>> to pass GFP_NORETRY.
> 
> Yet another tunable? How do you decide which one to select? Seriously,
> configuration knobs sound attractive but they are rarely a good idea.
> Either we trust privileged users or we don't and we have kmem accounting
> for that.
> 
>> Probably hard coded based on amount of RAM is a good idea too.
> 
> How do you scale that?
> 
> In other words, why don't we simply do the following? Note that this is
> not tested. I have also no idea what is the lifetime of this allocation.
> Is it bound to any specific process or is it a namespace bound? If the
> later then the memcg OOM killer might wipe the whole memcg down without
> making any progress. This would make the whole namespace unsuable until
> somebody intervenes. Is this acceptable?
> ---
> From 4dec96eb64954a7e58264ed551afadf62ca4c5f7 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Thu, 2 Aug 2018 10:38:57 +0200
> Subject: [PATCH] netfilter/x_tables: do not fail xt_alloc_table_info too
>  easilly
> 
> eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc()
> in xt_alloc_table_info()") has unintentionally fortified
> xt_alloc_table_info allocation when __GFP_RETRY has been dropped from
> the vmalloc fallback. Later on there was a syzbot report that this
> can lead to OOM killer invocations when tables are too large and
> 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> has been merged to restore the original behavior. Georgi Nikolov however
> noticed that he is not able to install his iptables anymore so this can
> be seen as a regression.
> 
> The primary argument for 0537250fdc6c was that this allocation path
> shouldn't really trigger the OOM killer and kill innocent tasks. On the
> other hand the interface requires root and as such should allow what the
> admin asks for. Root inside a namespaces makes this more complicated
> because those might be not trusted in general. If they are not then such
> namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY
> and replace it by __GFP_ACCOUNT to enfore memcg constrains on it.
> 
> Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive")
> Reported-by: Georgi Nikolov 
> Suggested-by: Vlastimil Babka 
> Signed-off-by: Michal Hocko 
> ---
>  net/netfilter/x_tables.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index d0d8397c9588..b76

Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-08-01 Thread Vlastimil Babka
On 07/31/2018 04:05 PM, Florian Westphal wrote:
> Georgi Nikolov  wrote:
>>> No, I think that's rather for the netfilter folks to decide. However, it
>>> seems there has been the debate already [1] and it was not found. The
>>> conclusion was that __GFP_NORETRY worked fine before, so it should work
>>> again after it's added back. But now we know that it doesn't...
>>>
>>> [1] https://lore.kernel.org/lkml/20180130140104.ge21...@dhcp22.suse.cz/T/#u
>>
>> Yes i see. I will add Florian Westphal to CC list. netfilter-devel is
>> already in this list so probably have to wait for their opinion.
> 
> It hasn't changed, I think having OOM killer zap random processes
> just because userspace wants to import large iptables ruleset is not a
> good idea.

If we denied the allocation instead of OOM (e.g. by using
__GFP_RETRY_MAYFAIL), a slightly smaller one may succeed, still leaving
the system without much memory, so it will invoke OOM killer sooner or
later anyway.

I don't see any silver-bullet solution, unfortunately. If this can be
abused by (multiple) namespaces, then they have to be contained by
kmemcg as that's the generic mechanism intended for this. Then we could
use the __GFP_RETRY_MAYFAIL.
The only limit we could impose to outright deny the allocation (to
prevent obvious bugs/admin mistakes or abuses) could be based on the
amount of RAM, as was suggested in the old thread.

__GFP_NORETRY might look like a good match at first sight as that stops
allocating when "reclaim becomes hard" which means the system is still
relatively far from OOM. But it's not reliable in principle, and as this
bug report shows. That's fine when __GFP_NORETRY is used for optimistic
allocations that have some other fallback (e.g. huge page with fallback
to base page), but far from ideal when failure means returning -ENOMEM
to userspace.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-08-01 Thread Vlastimil Babka
On 07/31/2018 04:25 PM, Georgi Nikolov wrote:
> On 07/31/2018 05:05 PM, Florian Westphal wrote:
>> Georgi Nikolov  wrote:
 No, I think that's rather for the netfilter folks to decide. However, it
 seems there has been the debate already [1] and it was not found. The
 conclusion was that __GFP_NORETRY worked fine before, so it should work
 again after it's added back. But now we know that it doesn't...

 [1] https://lore.kernel.org/lkml/20180130140104.ge21...@dhcp22.suse.cz/T/#u
>>> Yes i see. I will add Florian Westphal to CC list. netfilter-devel is
>>> already in this list so probably have to wait for their opinion.
>> It hasn't changed, I think having OOM killer zap random processes
>> just because userspace wants to import large iptables ruleset is not a
>> good idea.
> And what about passing GFP_NORETRY only above some reasonable threshold?

What is the reasonable threshold?

> Or situation has to be handled in userspace.

How?

> 
> Regards,
> 
> --
> Georgi Nikolov
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-31 Thread Vlastimil Babka
On 07/30/2018 08:51 PM, Georgi Nikolov wrote:
> On 07/30/2018 09:38 PM, Michal Hocko wrote:
>> On Mon 30-07-18 18:54:24, Georgi Nikolov wrote:
>> [...]
>>> No i was wrong. The regression starts actually with 0537250fdc6c8.
>>> - old code, which opencodes kvmalloc, is masking error but error is there
>>> - kvmalloc without GFP_NORETRY works fine, but probably can consume a
>>> lot of memory - commit: eacd86ca3b036
>>> - kvmalloc with GFP_NORETRY shows error - commit: 0537250fdc6c8
>> OK.
>>
>>>>> What is correct way to fix it.
>>>>> - inside xt_alloc_table_info remove GFP_NORETRY from kvmalloc or add
>>>>> this flag only for sizes bigger than some threshold
>>>> This would reintroduce issue fixed by 0537250fdc6c8. Note that
>>>> kvmalloc(GFP_KERNEL | __GFP_NORETRY) is more or less equivalent to the
>>>> original code (well, except for __GFP_NOWARN).
>>> So probably we should pass GFP_NORETRY only for large requests (above
>>> some threshold).
>> What would be the treshold? This is not really my area so I just wanted
>> to keep the original code semantic.
>>  
>>>>> - inside kvmalloc_node remove GFP_NORETRY from
>>>>> __vmalloc_node_flags_caller (i don't know if it honors this flag, or
>>>>> the problem is elsewhere)
>>>> No, not really. This is basically equivalent to kvmalloc(GFP_KERNEL).
>>>>
>>>> I strongly suspect that this is not a regression in this code but rather
>>>> a side effect of larger memory fragmentation caused by something else.
>>>> In any case do you see this failure also without artificial test case
>>>> with a standard workload?
>>> Yes i can see failures with standard workload, in fact it was hard to
>>> reproduce it.
>>> Here is the error from production servers where allocation is smaller:
>>> iptables: vmalloc: allocation failure, allocated 131072 of 225280 bytes,
>>> mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)
>>>
>>> I didn't understand if vmalloc honors GFP_NORETRY.
>> 0537250fdc6c8 changelog tries to explain. kvmalloc doesn't really
>> support the GFP_NORETRY remantic because that would imply the request
>> wouldn't trigger the oom killer but in rare cases this might happen
>> (e.g. when page tables are allocated because those are hardcoded GFP_KERNEL).
>>
>> That being said, I have no objection to use GFP_KERNEL if it helps real
>> workloads but we probably need some cap...
> 
> Probably Vlastimil Babka can propose some limit:

No, I think that's rather for the netfilter folks to decide. However, it
seems there has been the debate already [1] and it was not found. The
conclusion was that __GFP_NORETRY worked fine before, so it should work
again after it's added back. But now we know that it doesn't...

[1] https://lore.kernel.org/lkml/20180130140104.ge21...@dhcp22.suse.cz/T/#u

> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
> ("netfilter: x_tables: make allocation less aggressive") was backported
> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
> issues. Less than 4MB is not that much though, maybe find some "sane"
> limit and use __GFP_NORETRY only above that?
> 
> 
> Regards,
> 
> --
> Georgi Nikolov
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-26 Thread Vlastimil Babka
On 07/26/2018 10:31 AM, Vlastimil Babka wrote:
> On 07/26/2018 10:03 AM, Michal Hocko wrote:
>> On Thu 26-07-18 09:50:45, Vlastimil Babka wrote:
>>> On 07/26/2018 09:42 AM, Michal Hocko wrote:
>>>> On Thu 26-07-18 09:34:58, Vlastimil Babka wrote:
>>>>> On 07/26/2018 09:26 AM, Michal Hocko wrote:
>>>>>> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>>>>>>> On 07/25/2018 09:52 PM, Andrew Morton wrote:
>>>>>>>
>>>>>>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>>>>>>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>>>>>>> ("netfilter: x_tables: make allocation less aggressive") was backported
>>>>>>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>>>>>>> issues. Less than 4MB is not that much though, maybe find some "sane"
>>>>>>> limit and use __GFP_NORETRY only above that?
>>>>>>
>>>>>> I have seen the same report via 
>>>>>> http://lkml.kernel.org/r/df6f501c-8546-1f55-40b1-7e3a8f54d...@icdsoft.com
>>>>>> and the reported confirmed that kvmalloc is not a real culprit
>>>>>> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com
>>>>>
>>>>> Hmm but that was revert of eacd86ca3b03 ("net/netfilter/x_tables.c: use
>>>>> kvmalloc() in xt_alloc_table_info()") which was the 4.13 commit that
>>>>> removed __GFP_NORETRY (there's no __GFP_NORETRY under net/netfilter in
>>>>> v4.14). I assume it was reverted on top of vanilla v4.14 as there would
>>>>> be conflict on the stable with 0537250fdc6c backport. So what should be
>>>>> tested to be sure is either vanilla v4.14 without stable backports, or
>>>>> latest v4.14.y with revert of 0537250fdc6c.
>>>>
>>>> But 0537250fdc6c simply restored the previous NORETRY behavior from
>>>> before eacd86ca3b03. So whatever causes these issues doesn't seem to be
>>>> directly related to the kvmalloc change. Or do I miss what you are
>>>> saying?
>>>
>>> I'm saying that although it's not a regression, as you say (the
>>> vmalloc() there was only for a few kernel versions called without
>>> __GFP_NORETRY), it's still possible that removing __GFP_NORETRY will fix
>>> the issue and thus we will rule out other possibilities.
>>
>> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com
>> claims that reverting eacd86ca3b03 didn't really help.

Ah, I see, that mail thread references a different kernel bugzilla
#200639 which doesn't mention 4.14, but outright blames commit
eacd86ca3b03. Yet the alloc fail message contains __GFP_NORETRY, so I
still suspect the kernel also had 0537250fdc6c backport. Georgi can you
please clarify which exact kernel version had the alloc failures, and
how exactly you tested the revert (which version was the baseline for
revert). Thanks.

> Of course not. eacd86ca3b03 *removed* __GFP_NORETRY, so the revert
> reintroduced it. I tried to explain it in the quoted part above starting
> with "Hmm but that was revert of eacd86ca3b03 ...". What I'm saying is
> that eacd86ca3b03 might have actually *fixed* (or rather prevented) this
> alloc failure, if there was not 0537250fdc6c and its 4.14 stable
> backport (the kernel bugzilla report says 4.14, I'm assuming new enough
> stable to contain 0537250fdc6c as the failure message contains
> __GFP_NORETRY).
> 
> The mail you reference also says "seems that old version is masking
> errors", which confirms that we are indeed looking at the right
> vmalloc(), because eacd86ca3b03 also removed __GFP_NOWARN there (and
> thus the revert reintroduced it).
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-26 Thread Vlastimil Babka
On 07/26/2018 10:03 AM, Michal Hocko wrote:
> On Thu 26-07-18 09:50:45, Vlastimil Babka wrote:
>> On 07/26/2018 09:42 AM, Michal Hocko wrote:
>>> On Thu 26-07-18 09:34:58, Vlastimil Babka wrote:
>>>> On 07/26/2018 09:26 AM, Michal Hocko wrote:
>>>>> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>>>>>> On 07/25/2018 09:52 PM, Andrew Morton wrote:
>>>>>>
>>>>>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>>>>>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>>>>>> ("netfilter: x_tables: make allocation less aggressive") was backported
>>>>>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>>>>>> issues. Less than 4MB is not that much though, maybe find some "sane"
>>>>>> limit and use __GFP_NORETRY only above that?
>>>>>
>>>>> I have seen the same report via 
>>>>> http://lkml.kernel.org/r/df6f501c-8546-1f55-40b1-7e3a8f54d...@icdsoft.com
>>>>> and the reported confirmed that kvmalloc is not a real culprit
>>>>> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com
>>>>
>>>> Hmm but that was revert of eacd86ca3b03 ("net/netfilter/x_tables.c: use
>>>> kvmalloc() in xt_alloc_table_info()") which was the 4.13 commit that
>>>> removed __GFP_NORETRY (there's no __GFP_NORETRY under net/netfilter in
>>>> v4.14). I assume it was reverted on top of vanilla v4.14 as there would
>>>> be conflict on the stable with 0537250fdc6c backport. So what should be
>>>> tested to be sure is either vanilla v4.14 without stable backports, or
>>>> latest v4.14.y with revert of 0537250fdc6c.
>>>
>>> But 0537250fdc6c simply restored the previous NORETRY behavior from
>>> before eacd86ca3b03. So whatever causes these issues doesn't seem to be
>>> directly related to the kvmalloc change. Or do I miss what you are
>>> saying?
>>
>> I'm saying that although it's not a regression, as you say (the
>> vmalloc() there was only for a few kernel versions called without
>> __GFP_NORETRY), it's still possible that removing __GFP_NORETRY will fix
>> the issue and thus we will rule out other possibilities.
> 
> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com
> claims that reverting eacd86ca3b03 didn't really help.

Of course not. eacd86ca3b03 *removed* __GFP_NORETRY, so the revert
reintroduced it. I tried to explain it in the quoted part above starting
with "Hmm but that was revert of eacd86ca3b03 ...". What I'm saying is
that eacd86ca3b03 might have actually *fixed* (or rather prevented) this
alloc failure, if there was not 0537250fdc6c and its 4.14 stable
backport (the kernel bugzilla report says 4.14, I'm assuming new enough
stable to contain 0537250fdc6c as the failure message contains
__GFP_NORETRY).

The mail you reference also says "seems that old version is masking
errors", which confirms that we are indeed looking at the right
vmalloc(), because eacd86ca3b03 also removed __GFP_NOWARN there (and
thus the revert reintroduced it).


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-26 Thread Vlastimil Babka
On 07/26/2018 09:42 AM, Michal Hocko wrote:
> On Thu 26-07-18 09:34:58, Vlastimil Babka wrote:
>> On 07/26/2018 09:26 AM, Michal Hocko wrote:
>>> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>>>> On 07/25/2018 09:52 PM, Andrew Morton wrote:
>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>>> bugzilla web interface).
>>>>>
>>>>> On Wed, 25 Jul 2018 11:42:57 + bugzilla-dae...@bugzilla.kernel.org 
>>>>> wrote:
>>>>>
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=200651
>>>>>>
>>>>>> Bug ID: 200651
>>>>>>Summary: cgroups iptables-restor: vmalloc: allocation failure
>>>>>
>>>>> Thanks.  Please do note the above request.
>>>>>
>>>>>>Product: Memory Management
>>>>>>Version: 2.5
>>>>>> Kernel Version: 4.14
>>>>>>   Hardware: All
>>>>>> OS: Linux
>>>>>>   Tree: Mainline
>>>>>> Status: NEW
>>>>>>   Severity: normal
>>>>>>   Priority: P1
>>>>>>  Component: Other
>>>>>>   Assignee: a...@linux-foundation.org
>>>>>>   Reporter: gniko...@icdsoft.com
>>>>>> Regression: No
>>>>>>
>>>>>> Created attachment 277505
>>>>>>   --> https://bugzilla.kernel.org/attachment.cgi?id=277505=edit
>>>>>> iptables save
>>>>>>
>>>>>> After creating large number of cgroups and under memory pressure, 
>>>>>> iptables
>>>>>> command fails with following error:
>>>>>>
>>>>>> "iptables-restor: vmalloc: allocation failure, allocated 3047424 of 
>>>>>> 3465216
>>>>>> bytes, mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)"
>>>>
>>>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>>>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>>>> ("netfilter: x_tables: make allocation less aggressive") was backported
>>>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>>>> issues. Less than 4MB is not that much though, maybe find some "sane"
>>>> limit and use __GFP_NORETRY only above that?
>>>
>>> I have seen the same report via 
>>> http://lkml.kernel.org/r/df6f501c-8546-1f55-40b1-7e3a8f54d...@icdsoft.com
>>> and the reported confirmed that kvmalloc is not a real culprit
>>> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com
>>
>> Hmm but that was revert of eacd86ca3b03 ("net/netfilter/x_tables.c: use
>> kvmalloc() in xt_alloc_table_info()") which was the 4.13 commit that
>> removed __GFP_NORETRY (there's no __GFP_NORETRY under net/netfilter in
>> v4.14). I assume it was reverted on top of vanilla v4.14 as there would
>> be conflict on the stable with 0537250fdc6c backport. So what should be
>> tested to be sure is either vanilla v4.14 without stable backports, or
>> latest v4.14.y with revert of 0537250fdc6c.
> 
> But 0537250fdc6c simply restored the previous NORETRY behavior from
> before eacd86ca3b03. So whatever causes these issues doesn't seem to be
> directly related to the kvmalloc change. Or do I miss what you are
> saying?

I'm saying that although it's not a regression, as you say (the
vmalloc() there was only for a few kernel versions called without
__GFP_NORETRY), it's still possible that removing __GFP_NORETRY will fix
the issue and thus we will rule out other possibilities.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-26 Thread Vlastimil Babka
On 07/26/2018 09:26 AM, Michal Hocko wrote:
> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>> On 07/25/2018 09:52 PM, Andrew Morton wrote:
>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>> bugzilla web interface).
>>>
>>> On Wed, 25 Jul 2018 11:42:57 + bugzilla-dae...@bugzilla.kernel.org 
>>> wrote:
>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=200651
>>>>
>>>> Bug ID: 200651
>>>>Summary: cgroups iptables-restor: vmalloc: allocation failure
>>>
>>> Thanks.  Please do note the above request.
>>>
>>>>Product: Memory Management
>>>>Version: 2.5
>>>> Kernel Version: 4.14
>>>>   Hardware: All
>>>> OS: Linux
>>>>   Tree: Mainline
>>>> Status: NEW
>>>>   Severity: normal
>>>>   Priority: P1
>>>>  Component: Other
>>>>   Assignee: a...@linux-foundation.org
>>>>   Reporter: gniko...@icdsoft.com
>>>> Regression: No
>>>>
>>>> Created attachment 277505
>>>>   --> https://bugzilla.kernel.org/attachment.cgi?id=277505=edit
>>>> iptables save
>>>>
>>>> After creating large number of cgroups and under memory pressure, iptables
>>>> command fails with following error:
>>>>
>>>> "iptables-restor: vmalloc: allocation failure, allocated 3047424 of 3465216
>>>> bytes, mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)"
>>
>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>> ("netfilter: x_tables: make allocation less aggressive") was backported
>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>> issues. Less than 4MB is not that much though, maybe find some "sane"
>> limit and use __GFP_NORETRY only above that?
> 
> I have seen the same report via 
> http://lkml.kernel.org/r/df6f501c-8546-1f55-40b1-7e3a8f54d...@icdsoft.com
> and the reported confirmed that kvmalloc is not a real culprit
> http://lkml.kernel.org/r/d99a9598-808a-6968-4131-c3949b752...@icdsoft.com

Hmm but that was revert of eacd86ca3b03 ("net/netfilter/x_tables.c: use
kvmalloc() in xt_alloc_table_info()") which was the 4.13 commit that
removed __GFP_NORETRY (there's no __GFP_NORETRY under net/netfilter in
v4.14). I assume it was reverted on top of vanilla v4.14 as there would
be conflict on the stable with 0537250fdc6c backport. So what should be
tested to be sure is either vanilla v4.14 without stable backports, or
latest v4.14.y with revert of 0537250fdc6c.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure

2018-07-26 Thread Vlastimil Babka
On 07/25/2018 09:52 PM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 25 Jul 2018 11:42:57 + bugzilla-dae...@bugzilla.kernel.org wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=200651
>>
>> Bug ID: 200651
>>Summary: cgroups iptables-restor: vmalloc: allocation failure
> 
> Thanks.  Please do note the above request.
> 
>>Product: Memory Management
>>Version: 2.5
>> Kernel Version: 4.14
>>   Hardware: All
>> OS: Linux
>>   Tree: Mainline
>> Status: NEW
>>   Severity: normal
>>   Priority: P1
>>  Component: Other
>>   Assignee: a...@linux-foundation.org
>>   Reporter: gniko...@icdsoft.com
>> Regression: No
>>
>> Created attachment 277505
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=277505=edit
>> iptables save
>>
>> After creating large number of cgroups and under memory pressure, iptables
>> command fails with following error:
>>
>> "iptables-restor: vmalloc: allocation failure, allocated 3047424 of 3465216
>> bytes, mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)"

This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
("netfilter: x_tables: make allocation less aggressive") was backported
to 4.14. Removing __GFP_NORETRY might help here, but bring back other
issues. Less than 4MB is not that much though, maybe find some "sane"
limit and use __GFP_NORETRY only above that?

> I'm not sure what the problem is here, apart from iptables being
> over-optimistic about vmalloc()'s abilities.
> 
> Are cgroups having any impact on this, or is it simply vmalloc arena
> fragmentation, and the iptables code should use some data structure
> more sophisticated than a massive array?
> 
> Maybe all that ccgroup metadata is contributing to the arena
> fragmentation, but that allocations will be small and the two systems
> should be able to live alongside, by being realistic about vmalloc.
> 
>> System which is used to reproduce the bug is with 2 vcpus and 2GB of ram, but
>> it happens on more powerfull systems.
>>
>> Steps to reproduce:
>>
>> mkdir /cgroup
>> mount cgroup -t cgroup -omemory,pids,blkio,cpuacct /cgroup
>> for a in `seq 1 1000`; do for b in `seq 1 4` ; do mkdir -p
>> "/cgroup/user/$a/$b"; done; done
>>
>> Then in separate consoles
>>
>> cat /dev/vda > /dev/null
>> ./test
>> ./test
>> i=0;while sleep 0 ; do iptables-restore < iptables.save ; i=$(($i+1)); echo 
>> $i;
>> done
>>
>> Here is the source of "test" program and attached iptables.save. It happens
>> also with smaller iptables.save file.
>>
>> #include 
>> #include 
>>
>> int main(void) {
>>
>> srand(time(NULL));
>> int i = 0, j = 0, randnum=0;
>> int arr[6] = { 3072, 7168, 15360 , 31744, 64512, 130048}; 
>> while(1) {
>>
>> for (i = 0; i < 6 ; i++) {
>>
>> int *ptr = (int*) malloc(arr[i] * 93);  
>>
>> for(j = 0 ; j < arr[i] * 93 / sizeof(int); j++) {
>> *(ptr+j) = j+1;
>> }
>>
>> free(ptr);
>> }
>> }   
>> }
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html