Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-06 Thread Balbir Singh
On Tue, Aug 6, 2013 at 7:39 PM, Tejun Heo  wrote:
> Hello, Balbir.
>
> On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
>> [off-topic] Has the unified hierarchy been agreed upon? I did not
>> follow that thread
>
> I consider it agreed upon enough.  There of course are objections but
> I feel fairly comfortable with the amount of existing consensus and
> considering the current state of cgroup in general, especially the API
> leaks, I don't think we have many other choices.  The devil is always
> in the details but unless we meet a major technical barrier, I'm
> pretty sure it's happening.
>

I guess we'll need to see what the details look like :)

>> > events at fixed points, or if that's too restrictive, configureable
>> > cadence or single set of configureable points should be enough.
>>
>> Nit-pick: typo on the spelling of configurable
>
> Will update.
>
>> Tejun, I think the framework was designed to be flexible. Do you see
>> cgroup subsystems never using this?
>
> I can't be a hundred percent sure that we won't need events which are
> configureable per-listener but it's highly unlikely given that we're
> moving onto single agent model and the nature of event delivery -
> spurious events are unlikely to be noticeable unless the frequency is
> very high.  In general, as anything, aiming for extremes isn't a
> healthy design practice.  Maximum flexibility sounds good in isolation
> but nothing is free and it entails unneeded complexity both in
> implementation and usage.  Note that even for memcg, both oom and
> vmpressure don't benefit in any way from all the added complexity at
> all.  The only other place that I can see event being useful at the
> moment is freezer state change notification and that also would only
> require unconditional file modified notification.
>

Hmm.. I think it would be interesting to set thresholds on other
resources instead of pure monitoring. One might want to set thresholds
for CPUACCT usage for example. Freezer is another example like you
mentioned.

>> > +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>> > + struct cftype *cft, const char 
>> > *buffer)
>> > +{
>> > +   struct cgroup *cgrp = css->cgroup;
>> > +   struct cgroup_event *event;
>> > +   struct cgroup *cgrp_cfile;
>> > +   unsigned int efd, cfd;
>> > +   struct file *efile;
>> > +   struct file *cfile;
>> > +   char *endp;
>> > +   int ret;
>> > +
>>
>> Can we assert that buffer is NOT NULL here?
>
> The patch moves the code as-is as things become difficult to review
> otherwise.  After the patchset, it belongs to memcg, please feel free
> to modify as memcg people see fit.

OK, Thanks

Balbir Singh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-06 Thread Tejun Heo
Hello, Balbir.

On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
> [off-topic] Has the unified hierarchy been agreed upon? I did not
> follow that thread

I consider it agreed upon enough.  There of course are objections but
I feel fairly comfortable with the amount of existing consensus and
considering the current state of cgroup in general, especially the API
leaks, I don't think we have many other choices.  The devil is always
in the details but unless we meet a major technical barrier, I'm
pretty sure it's happening.

> > events at fixed points, or if that's too restrictive, configureable
> > cadence or single set of configureable points should be enough.
> 
> Nit-pick: typo on the spelling of configurable

Will update.

> Tejun, I think the framework was designed to be flexible. Do you see
> cgroup subsystems never using this?

I can't be a hundred percent sure that we won't need events which are
configureable per-listener but it's highly unlikely given that we're
moving onto single agent model and the nature of event delivery -
spurious events are unlikely to be noticeable unless the frequency is
very high.  In general, as anything, aiming for extremes isn't a
healthy design practice.  Maximum flexibility sounds good in isolation
but nothing is free and it entails unneeded complexity both in
implementation and usage.  Note that even for memcg, both oom and
vmpressure don't benefit in any way from all the added complexity at
all.  The only other place that I can see event being useful at the
moment is freezer state change notification and that also would only
require unconditional file modified notification.

> > +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
> > + struct cftype *cft, const char 
> > *buffer)
> > +{
> > +   struct cgroup *cgrp = css->cgroup;
> > +   struct cgroup_event *event;
> > +   struct cgroup *cgrp_cfile;
> > +   unsigned int efd, cfd;
> > +   struct file *efile;
> > +   struct file *cfile;
> > +   char *endp;
> > +   int ret;
> > +
> 
> Can we assert that buffer is NOT NULL here?

The patch moves the code as-is as things become difficult to review
otherwise.  After the patchset, it belongs to memcg, please feel free
to modify as memcg people see fit.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-06 Thread Tejun Heo
Hello, Balbir.

On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
 [off-topic] Has the unified hierarchy been agreed upon? I did not
 follow that thread

I consider it agreed upon enough.  There of course are objections but
I feel fairly comfortable with the amount of existing consensus and
considering the current state of cgroup in general, especially the API
leaks, I don't think we have many other choices.  The devil is always
in the details but unless we meet a major technical barrier, I'm
pretty sure it's happening.

  events at fixed points, or if that's too restrictive, configureable
  cadence or single set of configureable points should be enough.
 
 Nit-pick: typo on the spelling of configurable

Will update.

 Tejun, I think the framework was designed to be flexible. Do you see
 cgroup subsystems never using this?

I can't be a hundred percent sure that we won't need events which are
configureable per-listener but it's highly unlikely given that we're
moving onto single agent model and the nature of event delivery -
spurious events are unlikely to be noticeable unless the frequency is
very high.  In general, as anything, aiming for extremes isn't a
healthy design practice.  Maximum flexibility sounds good in isolation
but nothing is free and it entails unneeded complexity both in
implementation and usage.  Note that even for memcg, both oom and
vmpressure don't benefit in any way from all the added complexity at
all.  The only other place that I can see event being useful at the
moment is freezer state change notification and that also would only
require unconditional file modified notification.

  +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
  + struct cftype *cft, const char 
  *buffer)
  +{
  +   struct cgroup *cgrp = css-cgroup;
  +   struct cgroup_event *event;
  +   struct cgroup *cgrp_cfile;
  +   unsigned int efd, cfd;
  +   struct file *efile;
  +   struct file *cfile;
  +   char *endp;
  +   int ret;
  +
 
 Can we assert that buffer is NOT NULL here?

The patch moves the code as-is as things become difficult to review
otherwise.  After the patchset, it belongs to memcg, please feel free
to modify as memcg people see fit.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-06 Thread Balbir Singh
On Tue, Aug 6, 2013 at 7:39 PM, Tejun Heo t...@kernel.org wrote:
 Hello, Balbir.

 On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
 [off-topic] Has the unified hierarchy been agreed upon? I did not
 follow that thread

 I consider it agreed upon enough.  There of course are objections but
 I feel fairly comfortable with the amount of existing consensus and
 considering the current state of cgroup in general, especially the API
 leaks, I don't think we have many other choices.  The devil is always
 in the details but unless we meet a major technical barrier, I'm
 pretty sure it's happening.


I guess we'll need to see what the details look like :)

  events at fixed points, or if that's too restrictive, configureable
  cadence or single set of configureable points should be enough.

 Nit-pick: typo on the spelling of configurable

 Will update.

 Tejun, I think the framework was designed to be flexible. Do you see
 cgroup subsystems never using this?

 I can't be a hundred percent sure that we won't need events which are
 configureable per-listener but it's highly unlikely given that we're
 moving onto single agent model and the nature of event delivery -
 spurious events are unlikely to be noticeable unless the frequency is
 very high.  In general, as anything, aiming for extremes isn't a
 healthy design practice.  Maximum flexibility sounds good in isolation
 but nothing is free and it entails unneeded complexity both in
 implementation and usage.  Note that even for memcg, both oom and
 vmpressure don't benefit in any way from all the added complexity at
 all.  The only other place that I can see event being useful at the
 moment is freezer state change notification and that also would only
 require unconditional file modified notification.


Hmm.. I think it would be interesting to set thresholds on other
resources instead of pure monitoring. One might want to set thresholds
for CPUACCT usage for example. Freezer is another example like you
mentioned.

  +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
  + struct cftype *cft, const char 
  *buffer)
  +{
  +   struct cgroup *cgrp = css-cgroup;
  +   struct cgroup_event *event;
  +   struct cgroup *cgrp_cfile;
  +   unsigned int efd, cfd;
  +   struct file *efile;
  +   struct file *cfile;
  +   char *endp;
  +   int ret;
  +

 Can we assert that buffer is NOT NULL here?

 The patch moves the code as-is as things become difficult to review
 otherwise.  After the patchset, it belongs to memcg, please feel free
 to modify as memcg people see fit.

OK, Thanks

Balbir Singh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-05 Thread Balbir Singh
On Sun, Aug 4, 2013 at 9:37 PM, Tejun Heo  wrote:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating

[off-topic] Has the unified hierarchy been agreed upon? I did not
follow that thread

> events at fixed points, or if that's too restrictive, configureable
> cadence or single set of configureable points should be enough.
>

Nit-pick: typo on the spelling of configurable

> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
>
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
>
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
>

Tejun, I think the framework was designed to be flexible. Do you see
cgroup subsystems never using this?

> Signed-off-by: Tejun Heo 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Balbir Singh 
> ---
>  kernel/cgroup.c | 237 ---
>  mm/memcontrol.c | 238 
> 
>  2 files changed, 238 insertions(+), 237 deletions(-)
>
...
> +/*
> + * cgroup_event represents events which userspace want to receive.
> + */
> +struct cgroup_event {
> +   /*
> +* css which the event belongs to.
> +*/
> +   struct cgroup_subsys_state *css;
> +   /*
> +* Control file which the event associated.
> +*/
> +   struct cftype *cft;
> +   /*
> +* eventfd to signal userspace about the event.
> +*/
> +   struct eventfd_ctx *eventfd;
> +   /*
> +* Each of these stored in a list by the cgroup.
> +*/
> +   struct list_head list;
> +   /*
> +* All fields below needed to unregister event when
> +* userspace closes eventfd.
> +*/
> +   poll_table pt;
> +   wait_queue_head_t *wqh;
> +   wait_queue_t wait;
> +   struct work_struct remove;
> +};
> +
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
>  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>
> @@ -5926,6 +5956,194 @@ static void kmem_cgroup_css_offline(struct mem_cgroup 
> *memcg)
>  }
>  #endif
>
> +/*
> + * Unregister event and free resources.
> + *
> + * Gets called from workqueue.
> + */
> +static void cgroup_event_remove(struct work_struct *work)
> +{
> +   struct cgroup_event *event = container_of(work, struct cgroup_event,
> +   remove);
> +   struct cgroup_subsys_state *css = event->css;
> +   struct cgroup *cgrp = css->cgroup;
> +
> +   remove_wait_queue(event->wqh, >wait);
> +
> +   event->cft->unregister_event(css, event->cft, event->eventfd);
> +
> +   /* Notify userspace the event is going away. */
> +   eventfd_signal(event->eventfd, 1);
> +
> +   eventfd_ctx_put(event->eventfd);
> +   kfree(event);
> +   __cgroup_dput(cgrp);
> +}
> +
> +/*
> + * Gets called on POLLHUP on eventfd when user closes it.
> + *
> + * Called with wqh->lock held and interrupts disabled.
> + */
> +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> +   int sync, void *key)
> +{
> +   struct cgroup_event *event = container_of(wait,
> +   struct cgroup_event, wait);
> +   struct cgroup *cgrp = event->css->cgroup;
> +   unsigned long flags = (unsigned long)key;
> +
> +   if (flags & POLLHUP) {
> +   /*
> +* If the event has been detached at cgroup removal, we
> +* can simply return knowing the other side will cleanup
> +* for us.
> +*
> +* We can't race against event freeing since the other
> +* side will require wqh->lock via remove_wait_queue(),
> +* which we hold.
> +*/
> +   spin_lock(>event_list_lock);
> +   if (!list_empty(>list)) {
> +   list_del_init(>list);
> +   /*
> +* We are in atomic context, but cgroup_event_remove()
> +* may sleep, so we have to call it in workqueue.
> +*/
> +   schedule_work(>remove);
> +   }
> +   spin_unlock(>event_list_lock);
> +   }
> +
> +   return 0;
> +}
> +
> +static void 

Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-05 Thread Balbir Singh
On Sun, Aug 4, 2013 at 9:37 PM, Tejun Heo t...@kernel.org wrote:
 cgroup_event is way over-designed and tries to build a generic
 flexible event mechanism into cgroup - fully customizable event
 specification for each user of the interface.  This is utterly
 unnecessary and overboard especially in the light of the planned
 unified hierarchy as there's gonna be single agent.  Simply generating

[off-topic] Has the unified hierarchy been agreed upon? I did not
follow that thread

 events at fixed points, or if that's too restrictive, configureable
 cadence or single set of configureable points should be enough.


Nit-pick: typo on the spelling of configurable

 Thankfully, memcg is the only user and gets to keep it.  Replacing it
 with something simpler on sane_behavior is strongly recommended.

 This patch moves cgroup_event and cgroup.event_control
 implementation to mm/memcontrol.c.  Clearing of events on cgroup
 destruction is moved from cgroup_destroy_locked() to
 mem_cgroup_css_offline(), which shouldn't make any noticeable
 difference.

 Note that cgroup.event_control will now exist only on the hierarchy
 with memcg attached to it.  While this change is visible to userland,
 it is unlikely to be noticeable as the file has never been meaningful
 outside memcg.


Tejun, I think the framework was designed to be flexible. Do you see
cgroup subsystems never using this?

 Signed-off-by: Tejun Heo t...@kernel.org
 Cc: Johannes Weiner han...@cmpxchg.org
 Cc: Michal Hocko mho...@suse.cz
 Cc: Balbir Singh bsinghar...@gmail.com
 ---
  kernel/cgroup.c | 237 ---
  mm/memcontrol.c | 238 
 
  2 files changed, 238 insertions(+), 237 deletions(-)

...
 +/*
 + * cgroup_event represents events which userspace want to receive.
 + */
 +struct cgroup_event {
 +   /*
 +* css which the event belongs to.
 +*/
 +   struct cgroup_subsys_state *css;
 +   /*
 +* Control file which the event associated.
 +*/
 +   struct cftype *cft;
 +   /*
 +* eventfd to signal userspace about the event.
 +*/
 +   struct eventfd_ctx *eventfd;
 +   /*
 +* Each of these stored in a list by the cgroup.
 +*/
 +   struct list_head list;
 +   /*
 +* All fields below needed to unregister event when
 +* userspace closes eventfd.
 +*/
 +   poll_table pt;
 +   wait_queue_head_t *wqh;
 +   wait_queue_t wait;
 +   struct work_struct remove;
 +};
 +
  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);

 @@ -5926,6 +5956,194 @@ static void kmem_cgroup_css_offline(struct mem_cgroup 
 *memcg)
  }
  #endif

 +/*
 + * Unregister event and free resources.
 + *
 + * Gets called from workqueue.
 + */
 +static void cgroup_event_remove(struct work_struct *work)
 +{
 +   struct cgroup_event *event = container_of(work, struct cgroup_event,
 +   remove);
 +   struct cgroup_subsys_state *css = event-css;
 +   struct cgroup *cgrp = css-cgroup;
 +
 +   remove_wait_queue(event-wqh, event-wait);
 +
 +   event-cft-unregister_event(css, event-cft, event-eventfd);
 +
 +   /* Notify userspace the event is going away. */
 +   eventfd_signal(event-eventfd, 1);
 +
 +   eventfd_ctx_put(event-eventfd);
 +   kfree(event);
 +   __cgroup_dput(cgrp);
 +}
 +
 +/*
 + * Gets called on POLLHUP on eventfd when user closes it.
 + *
 + * Called with wqh-lock held and interrupts disabled.
 + */
 +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 +   int sync, void *key)
 +{
 +   struct cgroup_event *event = container_of(wait,
 +   struct cgroup_event, wait);
 +   struct cgroup *cgrp = event-css-cgroup;
 +   unsigned long flags = (unsigned long)key;
 +
 +   if (flags  POLLHUP) {
 +   /*
 +* If the event has been detached at cgroup removal, we
 +* can simply return knowing the other side will cleanup
 +* for us.
 +*
 +* We can't race against event freeing since the other
 +* side will require wqh-lock via remove_wait_queue(),
 +* which we hold.
 +*/
 +   spin_lock(cgrp-event_list_lock);
 +   if (!list_empty(event-list)) {
 +   list_del_init(event-list);
 +   /*
 +* We are in atomic context, but cgroup_event_remove()
 +* may sleep, so we have to call it in workqueue.
 +*/
 +   schedule_work(event-remove);
 +   }
 +   spin_unlock(cgrp-event_list_lock);
 +   }
 +
 +   return 0;
 +}
 +
 +static void cgroup_event_ptable_queue_proc(struct file 

Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-04 Thread Li Zefan
于 2013/8/5 0:07, Tejun Heo 写道:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating
> events at fixed points, or if that's too restrictive, configureable
> cadence or single set of configureable points should be enough.
> 
> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
> 
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
> 
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
> 
> Signed-off-by: Tejun Heo 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Balbir Singh 
> ---
>  kernel/cgroup.c | 237 ---
>  mm/memcontrol.c | 238 
> 
>  2 files changed, 238 insertions(+), 237 deletions(-)
> 

init/Kconfig needs to be updated too:

menuconfig CGROUPS
boolean "Control Group support"
depends on EVENTFD
...
config SCHED_AUTOGROUP
bool "Automatic process group scheduling"
select EVENTFD
select CGROUPS

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 2583b7b..a0b5e22 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -56,7 +56,6 @@
>  #include 
>  #include 
>  #include  /* TODO: replace with more sophisticated array */
> -#include 
>  #include 

poll.h also can be removed.

>  #include  /* used in cgroup_attach_task */
>  #include 
> @@ -154,36 +153,6 @@ struct css_id {
>   unsigned short stack[0]; /* Array of Length (depth+1) */
>  };
>  

[...]

>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2885e3e..3700b65 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c

#include 
#include 

> @@ -239,6 +239,36 @@ struct mem_cgroup_eventfd_list {
>   struct eventfd_ctx *eventfd;
>  };


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-04 Thread Tejun Heo
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent.  Simply generating
events at fixed points, or if that's too restrictive, configureable
cadence or single set of configureable points should be enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Signed-off-by: Tejun Heo 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Balbir Singh 
---
 kernel/cgroup.c | 237 ---
 mm/memcontrol.c | 238 
 2 files changed, 238 insertions(+), 237 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2583b7b..a0b5e22 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,7 +56,6 @@
 #include 
 #include 
 #include  /* TODO: replace with more sophisticated array */
-#include 
 #include 
 #include  /* used in cgroup_attach_task */
 #include 
@@ -154,36 +153,6 @@ struct css_id {
unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-   /*
-* css which the event belongs to.
-*/
-   struct cgroup_subsys_state *css;
-   /*
-* Control file which the event associated.
-*/
-   struct cftype *cft;
-   /*
-* eventfd to signal userspace about the event.
-*/
-   struct eventfd_ctx *eventfd;
-   /*
-* Each of these stored in a list by the cgroup.
-*/
-   struct list_head list;
-   /*
-* All fields below needed to unregister event when
-* userspace closes eventfd.
-*/
-   poll_table pt;
-   wait_queue_head_t *wqh;
-   wait_queue_t wait;
-   struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -3966,194 +3935,6 @@ void __cgroup_dput(struct cgroup *cgrp)
 }
 EXPORT_SYMBOL_GPL(__cgroup_dput);
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-   struct cgroup_event *event = container_of(work, struct cgroup_event,
-   remove);
-   struct cgroup_subsys_state *css = event->css;
-   struct cgroup *cgrp = css->cgroup;
-
-   remove_wait_queue(event->wqh, >wait);
-
-   event->cft->unregister_event(css, event->cft, event->eventfd);
-
-   /* Notify userspace the event is going away. */
-   eventfd_signal(event->eventfd, 1);
-
-   eventfd_ctx_put(event->eventfd);
-   kfree(event);
-   __cgroup_dput(cgrp);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-   int sync, void *key)
-{
-   struct cgroup_event *event = container_of(wait,
-   struct cgroup_event, wait);
-   struct cgroup *cgrp = event->css->cgroup;
-   unsigned long flags = (unsigned long)key;
-
-   if (flags & POLLHUP) {
-   /*
-* If the event has been detached at cgroup removal, we
-* can simply return knowing the other side will cleanup
-* for us.
-*
-* We can't race against event freeing since the other
-* side will require wqh->lock via remove_wait_queue(),
-* which we hold.
-*/
-   spin_lock(>event_list_lock);
-   if (!list_empty(>list)) {
-   list_del_init(>list);
-   /*
-* We are in atomic context, but cgroup_event_remove()
-* may sleep, so we have to call it in workqueue.
-*/
-   schedule_work(>remove);
-   }
-   spin_unlock(>event_list_lock);
-   }
-
-   return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-   wait_queue_head_t *wqh, poll_table *pt)
-{
-   struct cgroup_event *event = container_of(pt,
-   struct 

[PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-04 Thread Tejun Heo
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent.  Simply generating
events at fixed points, or if that's too restrictive, configureable
cadence or single set of configureable points should be enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and cgroup.event_control
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

Note that cgroup.event_control will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Signed-off-by: Tejun Heo t...@kernel.org
Cc: Johannes Weiner han...@cmpxchg.org
Cc: Michal Hocko mho...@suse.cz
Cc: Balbir Singh bsinghar...@gmail.com
---
 kernel/cgroup.c | 237 ---
 mm/memcontrol.c | 238 
 2 files changed, 238 insertions(+), 237 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2583b7b..a0b5e22 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,7 +56,6 @@
 #include linux/pid_namespace.h
 #include linux/idr.h
 #include linux/vmalloc.h /* TODO: replace with more sophisticated array */
-#include linux/eventfd.h
 #include linux/poll.h
 #include linux/flex_array.h /* used in cgroup_attach_task */
 #include linux/kthread.h
@@ -154,36 +153,6 @@ struct css_id {
unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-   /*
-* css which the event belongs to.
-*/
-   struct cgroup_subsys_state *css;
-   /*
-* Control file which the event associated.
-*/
-   struct cftype *cft;
-   /*
-* eventfd to signal userspace about the event.
-*/
-   struct eventfd_ctx *eventfd;
-   /*
-* Each of these stored in a list by the cgroup.
-*/
-   struct list_head list;
-   /*
-* All fields below needed to unregister event when
-* userspace closes eventfd.
-*/
-   poll_table pt;
-   wait_queue_head_t *wqh;
-   wait_queue_t wait;
-   struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 
 static LIST_HEAD(cgroup_roots);
@@ -3966,194 +3935,6 @@ void __cgroup_dput(struct cgroup *cgrp)
 }
 EXPORT_SYMBOL_GPL(__cgroup_dput);
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-   struct cgroup_event *event = container_of(work, struct cgroup_event,
-   remove);
-   struct cgroup_subsys_state *css = event-css;
-   struct cgroup *cgrp = css-cgroup;
-
-   remove_wait_queue(event-wqh, event-wait);
-
-   event-cft-unregister_event(css, event-cft, event-eventfd);
-
-   /* Notify userspace the event is going away. */
-   eventfd_signal(event-eventfd, 1);
-
-   eventfd_ctx_put(event-eventfd);
-   kfree(event);
-   __cgroup_dput(cgrp);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh-lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-   int sync, void *key)
-{
-   struct cgroup_event *event = container_of(wait,
-   struct cgroup_event, wait);
-   struct cgroup *cgrp = event-css-cgroup;
-   unsigned long flags = (unsigned long)key;
-
-   if (flags  POLLHUP) {
-   /*
-* If the event has been detached at cgroup removal, we
-* can simply return knowing the other side will cleanup
-* for us.
-*
-* We can't race against event freeing since the other
-* side will require wqh-lock via remove_wait_queue(),
-* which we hold.
-*/
-   spin_lock(cgrp-event_list_lock);
-   if (!list_empty(event-list)) {
-   list_del_init(event-list);
-   /*
-* We are in atomic context, but cgroup_event_remove()
-* may sleep, so we have to call it in workqueue.
-*/
-   schedule_work(event-remove);
-   }
-   spin_unlock(cgrp-event_list_lock);
-   }
-
-   return 0;
-}
-
-static void 

Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-04 Thread Li Zefan
于 2013/8/5 0:07, Tejun Heo 写道:
 cgroup_event is way over-designed and tries to build a generic
 flexible event mechanism into cgroup - fully customizable event
 specification for each user of the interface.  This is utterly
 unnecessary and overboard especially in the light of the planned
 unified hierarchy as there's gonna be single agent.  Simply generating
 events at fixed points, or if that's too restrictive, configureable
 cadence or single set of configureable points should be enough.
 
 Thankfully, memcg is the only user and gets to keep it.  Replacing it
 with something simpler on sane_behavior is strongly recommended.
 
 This patch moves cgroup_event and cgroup.event_control
 implementation to mm/memcontrol.c.  Clearing of events on cgroup
 destruction is moved from cgroup_destroy_locked() to
 mem_cgroup_css_offline(), which shouldn't make any noticeable
 difference.
 
 Note that cgroup.event_control will now exist only on the hierarchy
 with memcg attached to it.  While this change is visible to userland,
 it is unlikely to be noticeable as the file has never been meaningful
 outside memcg.
 
 Signed-off-by: Tejun Heo t...@kernel.org
 Cc: Johannes Weiner han...@cmpxchg.org
 Cc: Michal Hocko mho...@suse.cz
 Cc: Balbir Singh bsinghar...@gmail.com
 ---
  kernel/cgroup.c | 237 ---
  mm/memcontrol.c | 238 
 
  2 files changed, 238 insertions(+), 237 deletions(-)
 

init/Kconfig needs to be updated too:

menuconfig CGROUPS
boolean Control Group support
depends on EVENTFD
...
config SCHED_AUTOGROUP
bool Automatic process group scheduling
select EVENTFD
select CGROUPS

 diff --git a/kernel/cgroup.c b/kernel/cgroup.c
 index 2583b7b..a0b5e22 100644
 --- a/kernel/cgroup.c
 +++ b/kernel/cgroup.c
 @@ -56,7 +56,6 @@
  #include linux/pid_namespace.h
  #include linux/idr.h
  #include linux/vmalloc.h /* TODO: replace with more sophisticated array */
 -#include linux/eventfd.h
  #include linux/poll.h

poll.h also can be removed.

  #include linux/flex_array.h /* used in cgroup_attach_task */
  #include linux/kthread.h
 @@ -154,36 +153,6 @@ struct css_id {
   unsigned short stack[0]; /* Array of Length (depth+1) */
  };
  

[...]

  
 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index 2885e3e..3700b65 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c

#include linux/eventfd.h
#include linux/poll.h

 @@ -239,6 +239,36 @@ struct mem_cgroup_eventfd_list {
   struct eventfd_ctx *eventfd;
  };


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/