Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote: > > Is it correct that you fix one local DoS by introducing a new one? > > With the page the !priv user can block root from registering a threshold. > > Is it really the way we want to fix it? > > OK, I will think about it some more. The only thing the patch does is replacing implicit global resource limit with an explicit one. Whether that's useful or not, I don't know, but it doesn't really change the nature of the problem or actually fix anything. The only way to fix it is rewriting the whole thing so that allocations are broken up per source, which I don't think is a good idea at this point. I'd just add a comment noting why it's broken. Given that delegating to !priv users is horribly broken anyway, I don't think this worsens the situation by too much anyway. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote: > On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote: > > On Wed 07-08-13 09:58:18, Tejun Heo wrote: > > > Hello, > > > > > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: > > > > OK, I have obviously misunderstood your concern mentioned in the other > > > > email. Could you be more specific what is the DoS scenario which was > > > > your concern, then? > > > > > > So, let's say the file is write-accessible to !priv user which is > > > under reasonable resource limits. Normally this shouldn't affect priv > > > system tools which are monitoring the same event as it shouldn't be > > > able to deplete resources as long as the resource control mechanisms > > > are configured and functioning properly; however, the memory usage > > > event puts all event listeners into a single contiguous table which a > > > !priv user can easily expand to a size where the table can no longer > > > be enlarged and if a priv system tool or another user tries to > > > register event afterwards, it'll fail. IOW, it creates a shared > > > resource which isn't properly provisioned and can be trivially filled > > > up making it an easy DoS target. > > > > OK, got your point. You are right and I haven't considered the size of > > the table and the size restrictions of kmalloc. Thanks for pointing this > > out! > > --- > > From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Wed, 7 Aug 2013 11:11:22 +0200 > > Subject: [PATCH] memcg: limit the number of thresholds per-memcg > > > > There is no limit for the maximum number of threshold events registered > > per memcg. It is even worse that all the events are stored in a > > per-memcg table which is enlarged when a new event is registered. This > > can lead to the following issue mentioned by Tejun: > > " > > So, let's say the file is write-accessible to !priv user which is > > under reasonable resource limits. Normally this shouldn't affect priv > > system tools which are monitoring the same event as it shouldn't be > > able to deplete resources as long as the resource control mechanisms > > are configured and functioning properly; however, the memory usage > > event puts all event listeners into a single contiguous table which a > > !priv user can easily expand to a size where the table can no longer > > be enlarged and if a priv system tool or another user tries to > > register event afterwards, it'll fail. IOW, it creates a shared > > resource which isn't properly provisioned and can be trivially filled > > up making it an easy DoS target. > > " > > > > Let's be more strict and cap the number of events that might be > > registered. MAX_THRESHOLD_EVENTS value is more or less random. The > > expectation is that it should be high enough to cover reasonable > > usecases while not too high to allow excessive resources consumption. > > 1024 events consume something like 16KB which shouldn't be a big deal > > and it should be good enough. > > Is it correct that you fix one local DoS by introducing a new one? > With the page the !priv user can block root from registering a threshold. > Is it really the way we want to fix it? OK, I will think about it some more. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote: On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote: On Wed 07-08-13 09:58:18, Tejun Heo wrote: Hello, On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. OK, got your point. You are right and I haven't considered the size of the table and the size restrictions of kmalloc. Thanks for pointing this out! --- From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 From: Michal Hocko mho...@suse.cz Date: Wed, 7 Aug 2013 11:11:22 +0200 Subject: [PATCH] memcg: limit the number of thresholds per-memcg There is no limit for the maximum number of threshold events registered per memcg. It is even worse that all the events are stored in a per-memcg table which is enlarged when a new event is registered. This can lead to the following issue mentioned by Tejun: So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Is it correct that you fix one local DoS by introducing a new one? With the page the !priv user can block root from registering a threshold. Is it really the way we want to fix it? OK, I will think about it some more. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote: Is it correct that you fix one local DoS by introducing a new one? With the page the !priv user can block root from registering a threshold. Is it really the way we want to fix it? OK, I will think about it some more. The only thing the patch does is replacing implicit global resource limit with an explicit one. Whether that's useful or not, I don't know, but it doesn't really change the nature of the problem or actually fix anything. The only way to fix it is rewriting the whole thing so that allocations are broken up per source, which I don't think is a good idea at this point. I'd just add a comment noting why it's broken. Given that delegating to !priv users is horribly broken anyway, I don't think this worsens the situation by too much anyway. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote: > On Wed 07-08-13 09:58:18, Tejun Heo wrote: > > Hello, > > > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: > > > OK, I have obviously misunderstood your concern mentioned in the other > > > email. Could you be more specific what is the DoS scenario which was > > > your concern, then? > > > > So, let's say the file is write-accessible to !priv user which is > > under reasonable resource limits. Normally this shouldn't affect priv > > system tools which are monitoring the same event as it shouldn't be > > able to deplete resources as long as the resource control mechanisms > > are configured and functioning properly; however, the memory usage > > event puts all event listeners into a single contiguous table which a > > !priv user can easily expand to a size where the table can no longer > > be enlarged and if a priv system tool or another user tries to > > register event afterwards, it'll fail. IOW, it creates a shared > > resource which isn't properly provisioned and can be trivially filled > > up making it an easy DoS target. > > OK, got your point. You are right and I haven't considered the size of > the table and the size restrictions of kmalloc. Thanks for pointing this > out! > --- > From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Wed, 7 Aug 2013 11:11:22 +0200 > Subject: [PATCH] memcg: limit the number of thresholds per-memcg > > There is no limit for the maximum number of threshold events registered > per memcg. It is even worse that all the events are stored in a > per-memcg table which is enlarged when a new event is registered. This > can lead to the following issue mentioned by Tejun: > " > So, let's say the file is write-accessible to !priv user which is > under reasonable resource limits. Normally this shouldn't affect priv > system tools which are monitoring the same event as it shouldn't be > able to deplete resources as long as the resource control mechanisms > are configured and functioning properly; however, the memory usage > event puts all event listeners into a single contiguous table which a > !priv user can easily expand to a size where the table can no longer > be enlarged and if a priv system tool or another user tries to > register event afterwards, it'll fail. IOW, it creates a shared > resource which isn't properly provisioned and can be trivially filled > up making it an easy DoS target. > " > > Let's be more strict and cap the number of events that might be > registered. MAX_THRESHOLD_EVENTS value is more or less random. The > expectation is that it should be high enough to cover reasonable > usecases while not too high to allow excessive resources consumption. > 1024 events consume something like 16KB which shouldn't be a big deal > and it should be good enough. Is it correct that you fix one local DoS by introducing a new one? With the page the !priv user can block root from registering a threshold. Is it really the way we want to fix it? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed 07-08-13 09:58:18, Tejun Heo wrote: > Hello, > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: > > OK, I have obviously misunderstood your concern mentioned in the other > > email. Could you be more specific what is the DoS scenario which was > > your concern, then? > > So, let's say the file is write-accessible to !priv user which is > under reasonable resource limits. Normally this shouldn't affect priv > system tools which are monitoring the same event as it shouldn't be > able to deplete resources as long as the resource control mechanisms > are configured and functioning properly; however, the memory usage > event puts all event listeners into a single contiguous table which a > !priv user can easily expand to a size where the table can no longer > be enlarged and if a priv system tool or another user tries to > register event afterwards, it'll fail. IOW, it creates a shared > resource which isn't properly provisioned and can be trivially filled > up making it an easy DoS target. OK, got your point. You are right and I haven't considered the size of the table and the size restrictions of kmalloc. Thanks for pointing this out! --- >From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 7 Aug 2013 11:11:22 +0200 Subject: [PATCH] memcg: limit the number of thresholds per-memcg There is no limit for the maximum number of threshold events registered per memcg. It is even worse that all the events are stored in a per-memcg table which is enlarged when a new event is registered. This can lead to the following issue mentioned by Tejun: " So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. " Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Reported-by: Tejun Heo Signed-off-by: Michal Hocko --- mm/memcontrol.c |8 1 file changed, 8 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e4330cd..8247db3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) mem_cgroup_oom_notify_cb(iter); } +/* Maximum number of treshold events registered per memcg. */ +#define MAX_THRESHOLD_EVENTS 1024 + static int mem_cgroup_usage_register_event(struct cgroup *cgrp, struct cftype *cft, struct eventfd_ctx *eventfd, const char *args) { @@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp, else BUG(); + if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) { + ret = -ENOSPC; + goto unlock; + } + usage = mem_cgroup_usage(memcg, type == _MEMSWAP); /* Check if a threshold crossed before adding a new one */ -- 1.7.10.4 -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: > OK, I have obviously misunderstood your concern mentioned in the other > email. Could you be more specific what is the DoS scenario which was > your concern, then? So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. Putting an extra limit on it isn't an actual solution but could be better, I think. It at least makes it clear that this is a limited global resource. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed 07-08-13 09:22:10, Tejun Heo wrote: > Hello, > > On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote: > > There is no limit for the maximum number of threshold events registered > > per memcg. This might lead to an user triggered memory depletion if a > > regular user is allowed to register on memory.[memsw.]usage_in_bytes > > eventfd interface. > > > > Let's be more strict and cap the number of events that might be > > registered. MAX_THRESHOLD_EVENTS value is more or less random. The > > expectation is that it should be high enough to cover reasonable > > usecases while not too high to allow excessive resources consumption. > > 1024 events consume something like 16KB which shouldn't be a big deal > > and it should be good enough. > > I don't think the memory consumption per-se is the issue to be handled > here (as kernel memory consumption is a different generic problem) but > rather that all listeners, regardless of their priv level, cgroup > membership and so on, end up contributing to this single shared > contiguous table, The table is per-memcg but you are right that everybody who has file write access to the particular group's usage file can register to it. > which makes it quite easy to do DoS attack on it if > the event control is actually delegated to untrusted security domain, OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? [...] > Can you please update the patch description to reflect the actual > problem? As soon as I understand what is your concern ;) -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote: > There is no limit for the maximum number of threshold events registered > per memcg. This might lead to an user triggered memory depletion if a > regular user is allowed to register on memory.[memsw.]usage_in_bytes > eventfd interface. > > Let's be more strict and cap the number of events that might be > registered. MAX_THRESHOLD_EVENTS value is more or less random. The > expectation is that it should be high enough to cover reasonable > usecases while not too high to allow excessive resources consumption. > 1024 events consume something like 16KB which shouldn't be a big deal > and it should be good enough. I don't think the memory consumption per-se is the issue to be handled here (as kernel memory consumption is a different generic problem) but rather that all listeners, regardless of their priv level, cgroup membership and so on, end up contributing to this single shared contiguous table, which makes it quite easy to do DoS attack on it if the event control is actually delegated to untrusted security domain, which BTW kinda makes all these complexities kinda pointless as it nullifies the only use case (many un-coordinated listeners watching different thresholds) which the event mechanism can actually do better. A proper fix would be making it build sorted data structure, be it list or tree, and letting each listener insert its own probe at the appropriate position and updating the event generation maintain cursor in the tree and fire events as appropriate, but given that the whole usage model is being obsoleted, it probably isn't worth doing that and this fixed limit is better than just letting things go and allow allocation to fail at some point, I suppose. Can you please update the patch description to reflect the actual problem? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] memcg: limit the number of thresholds per-memcg
There is no limit for the maximum number of threshold events registered per memcg. This might lead to an user triggered memory depletion if a regular user is allowed to register on memory.[memsw.]usage_in_bytes eventfd interface. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Reported-by: Tejun Heo Signed-off-by: Michal Hocko --- mm/memcontrol.c |8 1 file changed, 8 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e4330cd..8247db3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) mem_cgroup_oom_notify_cb(iter); } +/* Maximum number of treshold events registered per memcg. */ +#define MAX_THRESHOLD_EVENTS 1024 + static int mem_cgroup_usage_register_event(struct cgroup *cgrp, struct cftype *cft, struct eventfd_ctx *eventfd, const char *args) { @@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp, else BUG(); + if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) { + ret = -ENOSPC; + goto unlock; + } + usage = mem_cgroup_usage(memcg, type == _MEMSWAP); /* Check if a threshold crossed before adding a new one */ -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] memcg: limit the number of thresholds per-memcg
There is no limit for the maximum number of threshold events registered per memcg. This might lead to an user triggered memory depletion if a regular user is allowed to register on memory.[memsw.]usage_in_bytes eventfd interface. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Reported-by: Tejun Heo t...@kernel.org Signed-off-by: Michal Hocko mho...@suse.cz --- mm/memcontrol.c |8 1 file changed, 8 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e4330cd..8247db3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) mem_cgroup_oom_notify_cb(iter); } +/* Maximum number of treshold events registered per memcg. */ +#define MAX_THRESHOLD_EVENTS 1024 + static int mem_cgroup_usage_register_event(struct cgroup *cgrp, struct cftype *cft, struct eventfd_ctx *eventfd, const char *args) { @@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp, else BUG(); + if (thresholds-primary-size == MAX_THRESHOLD_EVENTS) { + ret = -ENOSPC; + goto unlock; + } + usage = mem_cgroup_usage(memcg, type == _MEMSWAP); /* Check if a threshold crossed before adding a new one */ -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote: There is no limit for the maximum number of threshold events registered per memcg. This might lead to an user triggered memory depletion if a regular user is allowed to register on memory.[memsw.]usage_in_bytes eventfd interface. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. I don't think the memory consumption per-se is the issue to be handled here (as kernel memory consumption is a different generic problem) but rather that all listeners, regardless of their priv level, cgroup membership and so on, end up contributing to this single shared contiguous table, which makes it quite easy to do DoS attack on it if the event control is actually delegated to untrusted security domain, which BTW kinda makes all these complexities kinda pointless as it nullifies the only use case (many un-coordinated listeners watching different thresholds) which the event mechanism can actually do better. A proper fix would be making it build sorted data structure, be it list or tree, and letting each listener insert its own probe at the appropriate position and updating the event generation maintain cursor in the tree and fire events as appropriate, but given that the whole usage model is being obsoleted, it probably isn't worth doing that and this fixed limit is better than just letting things go and allow allocation to fail at some point, I suppose. Can you please update the patch description to reflect the actual problem? Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed 07-08-13 09:22:10, Tejun Heo wrote: Hello, On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote: There is no limit for the maximum number of threshold events registered per memcg. This might lead to an user triggered memory depletion if a regular user is allowed to register on memory.[memsw.]usage_in_bytes eventfd interface. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. I don't think the memory consumption per-se is the issue to be handled here (as kernel memory consumption is a different generic problem) but rather that all listeners, regardless of their priv level, cgroup membership and so on, end up contributing to this single shared contiguous table, The table is per-memcg but you are right that everybody who has file write access to the particular group's usage file can register to it. which makes it quite easy to do DoS attack on it if the event control is actually delegated to untrusted security domain, OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? [...] Can you please update the patch description to reflect the actual problem? As soon as I understand what is your concern ;) -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
Hello, On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. Putting an extra limit on it isn't an actual solution but could be better, I think. It at least makes it clear that this is a limited global resource. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed 07-08-13 09:58:18, Tejun Heo wrote: Hello, On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. OK, got your point. You are right and I haven't considered the size of the table and the size restrictions of kmalloc. Thanks for pointing this out! --- From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 From: Michal Hocko mho...@suse.cz Date: Wed, 7 Aug 2013 11:11:22 +0200 Subject: [PATCH] memcg: limit the number of thresholds per-memcg There is no limit for the maximum number of threshold events registered per memcg. It is even worse that all the events are stored in a per-memcg table which is enlarged when a new event is registered. This can lead to the following issue mentioned by Tejun: So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Reported-by: Tejun Heo t...@kernel.org Signed-off-by: Michal Hocko mho...@suse.cz --- mm/memcontrol.c |8 1 file changed, 8 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e4330cd..8247db3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) mem_cgroup_oom_notify_cb(iter); } +/* Maximum number of treshold events registered per memcg. */ +#define MAX_THRESHOLD_EVENTS 1024 + static int mem_cgroup_usage_register_event(struct cgroup *cgrp, struct cftype *cft, struct eventfd_ctx *eventfd, const char *args) { @@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup *cgrp, else BUG(); + if (thresholds-primary-size == MAX_THRESHOLD_EVENTS) { + ret = -ENOSPC; + goto unlock; + } + usage = mem_cgroup_usage(memcg, type == _MEMSWAP); /* Check if a threshold crossed before adding a new one */ -- 1.7.10.4 -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg
On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote: On Wed 07-08-13 09:58:18, Tejun Heo wrote: Hello, On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote: OK, I have obviously misunderstood your concern mentioned in the other email. Could you be more specific what is the DoS scenario which was your concern, then? So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. OK, got your point. You are right and I haven't considered the size of the table and the size restrictions of kmalloc. Thanks for pointing this out! --- From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001 From: Michal Hocko mho...@suse.cz Date: Wed, 7 Aug 2013 11:11:22 +0200 Subject: [PATCH] memcg: limit the number of thresholds per-memcg There is no limit for the maximum number of threshold events registered per memcg. It is even worse that all the events are stored in a per-memcg table which is enlarged when a new event is registered. This can lead to the following issue mentioned by Tejun: So, let's say the file is write-accessible to !priv user which is under reasonable resource limits. Normally this shouldn't affect priv system tools which are monitoring the same event as it shouldn't be able to deplete resources as long as the resource control mechanisms are configured and functioning properly; however, the memory usage event puts all event listeners into a single contiguous table which a !priv user can easily expand to a size where the table can no longer be enlarged and if a priv system tool or another user tries to register event afterwards, it'll fail. IOW, it creates a shared resource which isn't properly provisioned and can be trivially filled up making it an easy DoS target. Let's be more strict and cap the number of events that might be registered. MAX_THRESHOLD_EVENTS value is more or less random. The expectation is that it should be high enough to cover reasonable usecases while not too high to allow excessive resources consumption. 1024 events consume something like 16KB which shouldn't be a big deal and it should be good enough. Is it correct that you fix one local DoS by introducing a new one? With the page the !priv user can block root from registering a threshold. Is it really the way we want to fix it? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/