Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-08 Thread Tejun Heo
Hello,

On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote:
> > Is it correct that you fix one local DoS by introducing a new one?
> > With the page the !priv user can block root from registering a threshold.
> > Is it really the way we want to fix it?
> 
> OK, I will think about it some more.

The only thing the patch does is replacing implicit global resource
limit with an explicit one.  Whether that's useful or not, I don't
know, but it doesn't really change the nature of the problem or
actually fix anything.  The only way to fix it is rewriting the whole
thing so that allocations are broken up per source, which I don't
think is a good idea at this point.  I'd just add a comment noting why
it's broken.  Given that delegating to !priv users is horribly broken
anyway, I don't think this worsens the situation by too much anyway.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-08 Thread Michal Hocko
On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote:
> On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> > On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > > OK, I have obviously misunderstood your concern mentioned in the other
> > > > email. Could you be more specific what is the DoS scenario which was
> > > > your concern, then?
> > > 
> > > So, let's say the file is write-accessible to !priv user which is
> > > under reasonable resource limits.  Normally this shouldn't affect priv
> > > system tools which are monitoring the same event as it shouldn't be
> > > able to deplete resources as long as the resource control mechanisms
> > > are configured and functioning properly; however, the memory usage
> > > event puts all event listeners into a single contiguous table which a
> > > !priv user can easily expand to a size where the table can no longer
> > > be enlarged and if a priv system tool or another user tries to
> > > register event afterwards, it'll fail.  IOW, it creates a shared
> > > resource which isn't properly provisioned and can be trivially filled
> > > up making it an easy DoS target.
> > 
> > OK, got your point. You are right and I haven't considered the size of
> > the table and the size restrictions of kmalloc. Thanks for pointing this
> > out!
> > ---
> > From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> > From: Michal Hocko 
> > Date: Wed, 7 Aug 2013 11:11:22 +0200
> > Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> > 
> > There is no limit for the maximum number of threshold events registered
> > per memcg. It is even worse that all the events are stored in a
> > per-memcg table which is enlarged when a new event is registered. This
> > can lead to the following issue mentioned by Tejun:
> > "
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> > "
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> Is it correct that you fix one local DoS by introducing a new one?
> With the page the !priv user can block root from registering a threshold.
> Is it really the way we want to fix it?

OK, I will think about it some more.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-08 Thread Michal Hocko
On Thu 08-08-13 01:05:13, Kirill A. Shutemov wrote:
 On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
  On Wed 07-08-13 09:58:18, Tejun Heo wrote:
   Hello,
   
   On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?
   
   So, let's say the file is write-accessible to !priv user which is
   under reasonable resource limits.  Normally this shouldn't affect priv
   system tools which are monitoring the same event as it shouldn't be
   able to deplete resources as long as the resource control mechanisms
   are configured and functioning properly; however, the memory usage
   event puts all event listeners into a single contiguous table which a
   !priv user can easily expand to a size where the table can no longer
   be enlarged and if a priv system tool or another user tries to
   register event afterwards, it'll fail.  IOW, it creates a shared
   resource which isn't properly provisioned and can be trivially filled
   up making it an easy DoS target.
  
  OK, got your point. You are right and I haven't considered the size of
  the table and the size restrictions of kmalloc. Thanks for pointing this
  out!
  ---
  From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
  From: Michal Hocko mho...@suse.cz
  Date: Wed, 7 Aug 2013 11:11:22 +0200
  Subject: [PATCH] memcg: limit the number of thresholds per-memcg
  
  There is no limit for the maximum number of threshold events registered
  per memcg. It is even worse that all the events are stored in a
  per-memcg table which is enlarged when a new event is registered. This
  can lead to the following issue mentioned by Tejun:
  
  So, let's say the file is write-accessible to !priv user which is
  under reasonable resource limits.  Normally this shouldn't affect priv
  system tools which are monitoring the same event as it shouldn't be
  able to deplete resources as long as the resource control mechanisms
  are configured and functioning properly; however, the memory usage
  event puts all event listeners into a single contiguous table which a
  !priv user can easily expand to a size where the table can no longer
  be enlarged and if a priv system tool or another user tries to
  register event afterwards, it'll fail.  IOW, it creates a shared
  resource which isn't properly provisioned and can be trivially filled
  up making it an easy DoS target.
  
  
  Let's be more strict and cap the number of events that might be
  registered. MAX_THRESHOLD_EVENTS value is more or less random. The
  expectation is that it should be high enough to cover reasonable
  usecases while not too high to allow excessive resources consumption.
  1024 events consume something like 16KB which shouldn't be a big deal
  and it should be good enough.
 
 Is it correct that you fix one local DoS by introducing a new one?
 With the page the !priv user can block root from registering a threshold.
 Is it really the way we want to fix it?

OK, I will think about it some more.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-08 Thread Tejun Heo
Hello,

On Thu, Aug 08, 2013 at 04:43:51PM +0200, Michal Hocko wrote:
  Is it correct that you fix one local DoS by introducing a new one?
  With the page the !priv user can block root from registering a threshold.
  Is it really the way we want to fix it?
 
 OK, I will think about it some more.

The only thing the patch does is replacing implicit global resource
limit with an explicit one.  Whether that's useful or not, I don't
know, but it doesn't really change the nature of the problem or
actually fix anything.  The only way to fix it is rewriting the whole
thing so that allocations are broken up per source, which I don't
think is a good idea at this point.  I'd just add a comment noting why
it's broken.  Given that delegating to !priv users is horribly broken
anyway, I don't think this worsens the situation by too much anyway.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Kirill A. Shutemov
On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
> On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > > OK, I have obviously misunderstood your concern mentioned in the other
> > > email. Could you be more specific what is the DoS scenario which was
> > > your concern, then?
> > 
> > So, let's say the file is write-accessible to !priv user which is
> > under reasonable resource limits.  Normally this shouldn't affect priv
> > system tools which are monitoring the same event as it shouldn't be
> > able to deplete resources as long as the resource control mechanisms
> > are configured and functioning properly; however, the memory usage
> > event puts all event listeners into a single contiguous table which a
> > !priv user can easily expand to a size where the table can no longer
> > be enlarged and if a priv system tool or another user tries to
> > register event afterwards, it'll fail.  IOW, it creates a shared
> > resource which isn't properly provisioned and can be trivially filled
> > up making it an easy DoS target.
> 
> OK, got your point. You are right and I haven't considered the size of
> the table and the size restrictions of kmalloc. Thanks for pointing this
> out!
> ---
> From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 7 Aug 2013 11:11:22 +0200
> Subject: [PATCH] memcg: limit the number of thresholds per-memcg
> 
> There is no limit for the maximum number of threshold events registered
> per memcg. It is even worse that all the events are stored in a
> per-memcg table which is enlarged when a new event is registered. This
> can lead to the following issue mentioned by Tejun:
> "
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.
> "
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

Is it correct that you fix one local DoS by introducing a new one?
With the page the !priv user can block root from registering a threshold.
Is it really the way we want to fix it?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:58:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> > OK, I have obviously misunderstood your concern mentioned in the other
> > email. Could you be more specific what is the DoS scenario which was
> > your concern, then?
> 
> So, let's say the file is write-accessible to !priv user which is
> under reasonable resource limits.  Normally this shouldn't affect priv
> system tools which are monitoring the same event as it shouldn't be
> able to deplete resources as long as the resource control mechanisms
> are configured and functioning properly; however, the memory usage
> event puts all event listeners into a single contiguous table which a
> !priv user can easily expand to a size where the table can no longer
> be enlarged and if a priv system tool or another user tries to
> register event afterwards, it'll fail.  IOW, it creates a shared
> resource which isn't properly provisioned and can be trivially filled
> up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---
>From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Wed, 7 Aug 2013 11:11:22 +0200
Subject: [PATCH] memcg: limit the number of thresholds per-memcg

There is no limit for the maximum number of threshold events registered
per memcg. It is even worse that all the events are stored in a
per-memcg table which is enlarged when a new event is registered. This
can lead to the following issue mentioned by Tejun:
"
So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.
"

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo 
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c |8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup 
*memcg)
mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS   1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup 
*cgrp,
else
BUG();
 
+   if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
> OK, I have obviously misunderstood your concern mentioned in the other
> email. Could you be more specific what is the DoS scenario which was
> your concern, then?

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.

Putting an extra limit on it isn't an actual solution but could be
better, I think.  It at least makes it clear that this is a limited
global resource.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:22:10, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> > There is no limit for the maximum number of threshold events registered
> > per memcg. This might lead to an user triggered memory depletion if a
> > regular user is allowed to register on memory.[memsw.]usage_in_bytes
> > eventfd interface.
> > 
> > Let's be more strict and cap the number of events that might be
> > registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> > expectation is that it should be high enough to cover reasonable
> > usecases while not too high to allow excessive resources consumption.
> > 1024 events consume something like 16KB which shouldn't be a big deal
> > and it should be good enough.
> 
> I don't think the memory consumption per-se is the issue to be handled
> here (as kernel memory consumption is a different generic problem) but
> rather that all listeners, regardless of their priv level, cgroup
> membership and so on, end up contributing to this single shared
> contiguous table,

The table is per-memcg but you are right that everybody who has file
write access to the particular group's usage file can register to it.

> which makes it quite easy to do DoS attack on it if
> the event control is actually delegated to untrusted security domain,

OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?

[...]
> Can you please update the patch description to reflect the actual
> problem?

As soon as I understand what is your concern ;)
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
> There is no limit for the maximum number of threshold events registered
> per memcg. This might lead to an user triggered memory depletion if a
> regular user is allowed to register on memory.[memsw.]usage_in_bytes
> eventfd interface.
> 
> Let's be more strict and cap the number of events that might be
> registered. MAX_THRESHOLD_EVENTS value is more or less random. The
> expectation is that it should be high enough to cover reasonable
> usecases while not too high to allow excessive resources consumption.
> 1024 events consume something like 16KB which shouldn't be a big deal
> and it should be good enough.

I don't think the memory consumption per-se is the issue to be handled
here (as kernel memory consumption is a different generic problem) but
rather that all listeners, regardless of their priv level, cgroup
membership and so on, end up contributing to this single shared
contiguous table, which makes it quite easy to do DoS attack on it if
the event control is actually delegated to untrusted security domain,
which BTW kinda makes all these complexities kinda pointless as it
nullifies the only use case (many un-coordinated listeners watching
different thresholds) which the event mechanism can actually do
better.

A proper fix would be making it build sorted data structure, be it
list or tree, and letting each listener insert its own probe at the
appropriate position and updating the event generation maintain cursor
in the tree and fire events as appropriate, but given that the whole
usage model is being obsoleted, it probably isn't worth doing that and
this fixed limit is better than just letting things go and allow
allocation to fail at some point, I suppose.

Can you please update the patch description to reflect the actual
problem?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
There is no limit for the maximum number of threshold events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register on memory.[memsw.]usage_in_bytes
eventfd interface.

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo 
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c |8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup 
*memcg)
mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS   1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup 
*cgrp,
else
BUG();
 
+   if (thresholds->primary->size == MAX_THRESHOLD_EVENTS) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
There is no limit for the maximum number of threshold events registered
per memcg. This might lead to an user triggered memory depletion if a
regular user is allowed to register on memory.[memsw.]usage_in_bytes
eventfd interface.

Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo t...@kernel.org
Signed-off-by: Michal Hocko mho...@suse.cz
---
 mm/memcontrol.c |8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup 
*memcg)
mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS   1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup 
*cgrp,
else
BUG();
 
+   if (thresholds-primary-size == MAX_THRESHOLD_EVENTS) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
 There is no limit for the maximum number of threshold events registered
 per memcg. This might lead to an user triggered memory depletion if a
 regular user is allowed to register on memory.[memsw.]usage_in_bytes
 eventfd interface.
 
 Let's be more strict and cap the number of events that might be
 registered. MAX_THRESHOLD_EVENTS value is more or less random. The
 expectation is that it should be high enough to cover reasonable
 usecases while not too high to allow excessive resources consumption.
 1024 events consume something like 16KB which shouldn't be a big deal
 and it should be good enough.

I don't think the memory consumption per-se is the issue to be handled
here (as kernel memory consumption is a different generic problem) but
rather that all listeners, regardless of their priv level, cgroup
membership and so on, end up contributing to this single shared
contiguous table, which makes it quite easy to do DoS attack on it if
the event control is actually delegated to untrusted security domain,
which BTW kinda makes all these complexities kinda pointless as it
nullifies the only use case (many un-coordinated listeners watching
different thresholds) which the event mechanism can actually do
better.

A proper fix would be making it build sorted data structure, be it
list or tree, and letting each listener insert its own probe at the
appropriate position and updating the event generation maintain cursor
in the tree and fire events as appropriate, but given that the whole
usage model is being obsoleted, it probably isn't worth doing that and
this fixed limit is better than just letting things go and allow
allocation to fail at some point, I suppose.

Can you please update the patch description to reflect the actual
problem?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:22:10, Tejun Heo wrote:
 Hello,
 
 On Wed, Aug 07, 2013 at 01:28:25PM +0200, Michal Hocko wrote:
  There is no limit for the maximum number of threshold events registered
  per memcg. This might lead to an user triggered memory depletion if a
  regular user is allowed to register on memory.[memsw.]usage_in_bytes
  eventfd interface.
  
  Let's be more strict and cap the number of events that might be
  registered. MAX_THRESHOLD_EVENTS value is more or less random. The
  expectation is that it should be high enough to cover reasonable
  usecases while not too high to allow excessive resources consumption.
  1024 events consume something like 16KB which shouldn't be a big deal
  and it should be good enough.
 
 I don't think the memory consumption per-se is the issue to be handled
 here (as kernel memory consumption is a different generic problem) but
 rather that all listeners, regardless of their priv level, cgroup
 membership and so on, end up contributing to this single shared
 contiguous table,

The table is per-memcg but you are right that everybody who has file
write access to the particular group's usage file can register to it.

 which makes it quite easy to do DoS attack on it if
 the event control is actually delegated to untrusted security domain,

OK, I have obviously misunderstood your concern mentioned in the other
email. Could you be more specific what is the DoS scenario which was
your concern, then?

[...]
 Can you please update the patch description to reflect the actual
 problem?

As soon as I understand what is your concern ;)
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Tejun Heo
Hello,

On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
 OK, I have obviously misunderstood your concern mentioned in the other
 email. Could you be more specific what is the DoS scenario which was
 your concern, then?

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.

Putting an extra limit on it isn't an actual solution but could be
better, I think.  It at least makes it clear that this is a limited
global resource.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Michal Hocko
On Wed 07-08-13 09:58:18, Tejun Heo wrote:
 Hello,
 
 On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
  OK, I have obviously misunderstood your concern mentioned in the other
  email. Could you be more specific what is the DoS scenario which was
  your concern, then?
 
 So, let's say the file is write-accessible to !priv user which is
 under reasonable resource limits.  Normally this shouldn't affect priv
 system tools which are monitoring the same event as it shouldn't be
 able to deplete resources as long as the resource control mechanisms
 are configured and functioning properly; however, the memory usage
 event puts all event listeners into a single contiguous table which a
 !priv user can easily expand to a size where the table can no longer
 be enlarged and if a priv system tool or another user tries to
 register event afterwards, it'll fail.  IOW, it creates a shared
 resource which isn't properly provisioned and can be trivially filled
 up making it an easy DoS target.

OK, got your point. You are right and I haven't considered the size of
the table and the size restrictions of kmalloc. Thanks for pointing this
out!
---
From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
From: Michal Hocko mho...@suse.cz
Date: Wed, 7 Aug 2013 11:11:22 +0200
Subject: [PATCH] memcg: limit the number of thresholds per-memcg

There is no limit for the maximum number of threshold events registered
per memcg. It is even worse that all the events are stored in a
per-memcg table which is enlarged when a new event is registered. This
can lead to the following issue mentioned by Tejun:

So, let's say the file is write-accessible to !priv user which is
under reasonable resource limits.  Normally this shouldn't affect priv
system tools which are monitoring the same event as it shouldn't be
able to deplete resources as long as the resource control mechanisms
are configured and functioning properly; however, the memory usage
event puts all event listeners into a single contiguous table which a
!priv user can easily expand to a size where the table can no longer
be enlarged and if a priv system tool or another user tries to
register event afterwards, it'll fail.  IOW, it creates a shared
resource which isn't properly provisioned and can be trivially filled
up making it an easy DoS target.


Let's be more strict and cap the number of events that might be
registered. MAX_THRESHOLD_EVENTS value is more or less random. The
expectation is that it should be high enough to cover reasonable
usecases while not too high to allow excessive resources consumption.
1024 events consume something like 16KB which shouldn't be a big deal
and it should be good enough.

Reported-by: Tejun Heo t...@kernel.org
Signed-off-by: Michal Hocko mho...@suse.cz
---
 mm/memcontrol.c |8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4330cd..8247db3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5401,6 +5401,9 @@ static void mem_cgroup_oom_notify(struct mem_cgroup 
*memcg)
mem_cgroup_oom_notify_cb(iter);
 }
 
+/* Maximum number of treshold events registered per memcg. */
+#define MAX_THRESHOLD_EVENTS   1024
+
 static int mem_cgroup_usage_register_event(struct cgroup *cgrp,
struct cftype *cft, struct eventfd_ctx *eventfd, const char *args)
 {
@@ -5424,6 +5427,11 @@ static int mem_cgroup_usage_register_event(struct cgroup 
*cgrp,
else
BUG();
 
+   if (thresholds-primary-size == MAX_THRESHOLD_EVENTS) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
usage = mem_cgroup_usage(memcg, type == _MEMSWAP);
 
/* Check if a threshold crossed before adding a new one */
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] memcg: limit the number of thresholds per-memcg

2013-08-07 Thread Kirill A. Shutemov
On Wed, Aug 07, 2013 at 04:37:27PM +0200, Michal Hocko wrote:
 On Wed 07-08-13 09:58:18, Tejun Heo wrote:
  Hello,
  
  On Wed, Aug 07, 2013 at 03:46:54PM +0200, Michal Hocko wrote:
   OK, I have obviously misunderstood your concern mentioned in the other
   email. Could you be more specific what is the DoS scenario which was
   your concern, then?
  
  So, let's say the file is write-accessible to !priv user which is
  under reasonable resource limits.  Normally this shouldn't affect priv
  system tools which are monitoring the same event as it shouldn't be
  able to deplete resources as long as the resource control mechanisms
  are configured and functioning properly; however, the memory usage
  event puts all event listeners into a single contiguous table which a
  !priv user can easily expand to a size where the table can no longer
  be enlarged and if a priv system tool or another user tries to
  register event afterwards, it'll fail.  IOW, it creates a shared
  resource which isn't properly provisioned and can be trivially filled
  up making it an easy DoS target.
 
 OK, got your point. You are right and I haven't considered the size of
 the table and the size restrictions of kmalloc. Thanks for pointing this
 out!
 ---
 From cde8a296eddd288780e78803610127401b6a Mon Sep 17 00:00:00 2001
 From: Michal Hocko mho...@suse.cz
 Date: Wed, 7 Aug 2013 11:11:22 +0200
 Subject: [PATCH] memcg: limit the number of thresholds per-memcg
 
 There is no limit for the maximum number of threshold events registered
 per memcg. It is even worse that all the events are stored in a
 per-memcg table which is enlarged when a new event is registered. This
 can lead to the following issue mentioned by Tejun:
 
 So, let's say the file is write-accessible to !priv user which is
 under reasonable resource limits.  Normally this shouldn't affect priv
 system tools which are monitoring the same event as it shouldn't be
 able to deplete resources as long as the resource control mechanisms
 are configured and functioning properly; however, the memory usage
 event puts all event listeners into a single contiguous table which a
 !priv user can easily expand to a size where the table can no longer
 be enlarged and if a priv system tool or another user tries to
 register event afterwards, it'll fail.  IOW, it creates a shared
 resource which isn't properly provisioned and can be trivially filled
 up making it an easy DoS target.
 
 
 Let's be more strict and cap the number of events that might be
 registered. MAX_THRESHOLD_EVENTS value is more or less random. The
 expectation is that it should be high enough to cover reasonable
 usecases while not too high to allow excessive resources consumption.
 1024 events consume something like 16KB which shouldn't be a big deal
 and it should be good enough.

Is it correct that you fix one local DoS by introducing a new one?
With the page the !priv user can block root from registering a threshold.
Is it really the way we want to fix it?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/