Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
On Tue, Aug 6, 2013 at 7:39 PM, Tejun Heo wrote:
> Hello, Balbir.
>
> On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
>> [off-topic] Has the unified hierarchy been agreed upon? I did not
>> follow that thread
>
> I consider it agreed upon enough.  There of course are objections but
> I feel fairly comfortable with the amount of existing consensus and
> considering the current state of cgroup in general, especially the API
> leaks, I don't think we have many other choices.  The devil is always
> in the details but unless we meet a major technical barrier, I'm
> pretty sure it's happening.

I guess we'll need to see what the details look like :)

>> > events at fixed points, or if that's too restrictive, configureable
>> > cadence or single set of configureable points should be enough.
>>
>> Nit-pick: typo on the spelling of configurable
>
> Will update.
>
>> Tejun, I think the framework was designed to be flexible. Do you see
>> cgroup subsystems never using this?
>
> I can't be a hundred percent sure that we won't need events which are
> configureable per-listener but it's highly unlikely given that we're
> moving onto single agent model and the nature of event delivery -
> spurious events are unlikely to be noticeable unless the frequency is
> very high.  In general, as anything, aiming for extremes isn't a
> healthy design practice.  Maximum flexibility sounds good in isolation
> but nothing is free and it entails unneeded complexity both in
> implementation and usage.  Note that even for memcg, both oom and
> vmpressure don't benefit in any way from all the added complexity at
> all.  The only other place that I can see event being useful at the
> moment is freezer state change notification and that also would only
> require unconditional file modified notification.

Hmm.. I think it would be interesting to set thresholds on other
resources instead of pure monitoring. One might want to set thresholds
for CPUACCT usage for example. Freezer is another example like you
mentioned.

>> > +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
>> > +                                      struct cftype *cft, const char *buffer)
>> > +{
>> > +       struct cgroup *cgrp = css->cgroup;
>> > +       struct cgroup_event *event;
>> > +       struct cgroup *cgrp_cfile;
>> > +       unsigned int efd, cfd;
>> > +       struct file *efile;
>> > +       struct file *cfile;
>> > +       char *endp;
>> > +       int ret;
>> > +
>>
>> Can we assert that buffer is NOT NULL here?
>
> The patch moves the code as-is as things become difficult to review
> otherwise.  After the patchset, it belongs to memcg, please feel free
> to modify as memcg people see fit.

OK, Thanks

Balbir Singh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
Hello, Balbir.

On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote:
> [off-topic] Has the unified hierarchy been agreed upon? I did not
> follow that thread

I consider it agreed upon enough.  There of course are objections but
I feel fairly comfortable with the amount of existing consensus and
considering the current state of cgroup in general, especially the API
leaks, I don't think we have many other choices.  The devil is always
in the details but unless we meet a major technical barrier, I'm
pretty sure it's happening.

> > events at fixed points, or if that's too restrictive, configureable
> > cadence or single set of configureable points should be enough.
>
> Nit-pick: typo on the spelling of configurable

Will update.

> Tejun, I think the framework was designed to be flexible. Do you see
> cgroup subsystems never using this?

I can't be a hundred percent sure that we won't need events which are
configureable per-listener but it's highly unlikely given that we're
moving onto single agent model and the nature of event delivery -
spurious events are unlikely to be noticeable unless the frequency is
very high.  In general, as anything, aiming for extremes isn't a
healthy design practice.  Maximum flexibility sounds good in isolation
but nothing is free and it entails unneeded complexity both in
implementation and usage.  Note that even for memcg, both oom and
vmpressure don't benefit in any way from all the added complexity at
all.  The only other place that I can see event being useful at the
moment is freezer state change notification and that also would only
require unconditional file modified notification.

> > +static int cgroup_write_event_control(struct cgroup_subsys_state *css,
> > +                                      struct cftype *cft, const char *buffer)
> > +{
> > +       struct cgroup *cgrp = css->cgroup;
> > +       struct cgroup_event *event;
> > +       struct cgroup *cgrp_cfile;
> > +       unsigned int efd, cfd;
> > +       struct file *efile;
> > +       struct file *cfile;
> > +       char *endp;
> > +       int ret;
> > +
>
> Can we assert that buffer is NOT NULL here?

The patch moves the code as-is as things become difficult to review
otherwise.  After the patchset, it belongs to memcg, please feel free
to modify as memcg people see fit.

Thanks.

--
tejun
Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
On Tue, Aug 6, 2013 at 7:39 PM, Tejun Heo t...@kernel.org wrote: Hello, Balbir. On Tue, Aug 06, 2013 at 08:56:34AM +0530, Balbir Singh wrote: [off-topic] Has the unified hierarchy been agreed upon? I did not follow that thread I consider it agreed upon enough. There of course are objections but I feel fairly comfortable with the amount of existing consensus and considering the current state of cgroup in general, especially the API leaks, I don't think we have many other choices. The devil is always in the details but unless we meet a major technical barrier, I'm pretty sure it's happening. I guess we'll need to see what the details look like :) events at fixed points, or if that's too restrictive, configureable cadence or single set of configureable points should be enough. Nit-pick: typo on the spelling of configurable Will update. Tejun, I think the framework was designed to be flexible. Do you see cgroup subsystems never using this? I can't be a hundred percent sure that we won't need events which are configureable per-listener but it's highly unlikely given that we're moving onto single agent model and the nature of event delivery - spurious events are unlikely to be noticeable unless the frequency is very high. In general, as anything, aiming for extremes isn't a healthy design practice. Maximum flexibility sounds good in isolation but nothing is free and it entails unneeded complexity both in implementation and usage. Note that even for memcg, both oom and vmpressure don't benefit in any way from all the added complexity at all. The only other place that I can see event being useful at the moment is freezer state change notification and that also would only require unconditional file modified notification. Hmm.. I think it would be interesting to set thresholds on other resources instead of pure monitoring. One might want to set thresholds for CPUACCT usage for example. Freezer is another example like you mentioned. 
+static int cgroup_write_event_control(struct cgroup_subsys_state *css, + struct cftype *cft, const char *buffer) +{ + struct cgroup *cgrp = css-cgroup; + struct cgroup_event *event; + struct cgroup *cgrp_cfile; + unsigned int efd, cfd; + struct file *efile; + struct file *cfile; + char *endp; + int ret; + Can we assert that buffer is NOT NULL here? The patch moves the code as-is as things become difficult to review otherwise. After the patchset, it belongs to memcg, please feel free to modify as memcg people see fit. OK, Thanks Balbir Singh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
On Sun, Aug 4, 2013 at 9:37 PM, Tejun Heo wrote:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating

[off-topic] Has the unified hierarchy been agreed upon? I did not
follow that thread

> events at fixed points, or if that's too restrictive, configureable
> cadence or single set of configureable points should be enough.

Nit-pick: typo on the spelling of configurable

> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
>
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
>
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.

Tejun, I think the framework was designed to be flexible. Do you see
cgroup subsystems never using this?

> Signed-off-by: Tejun Heo
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Balbir Singh
> ---
>  kernel/cgroup.c | 237 ----------------
>  mm/memcontrol.c | 238 ++++++++++++++++
>  2 files changed, 238 insertions(+), 237 deletions(-)
...
> +/*
> + * cgroup_event represents events which userspace want to receive.
> + */
> +struct cgroup_event {
> +	/*
> +	 * css which the event belongs to.
> +	 */
> +	struct cgroup_subsys_state *css;
> +	/*
> +	 * Control file which the event associated.
> +	 */
> +	struct cftype *cft;
> +	/*
> +	 * eventfd to signal userspace about the event.
> +	 */
> +	struct eventfd_ctx *eventfd;
> +	/*
> +	 * Each of these stored in a list by the cgroup.
> +	 */
> +	struct list_head list;
> +	/*
> +	 * All fields below needed to unregister event when
> +	 * userspace closes eventfd.
> +	 */
> +	poll_table pt;
> +	wait_queue_head_t *wqh;
> +	wait_queue_t wait;
> +	struct work_struct remove;
> +};
> +
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
>  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>
> @@ -5926,6 +5956,194 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
>  }
>  #endif
>
> +/*
> + * Unregister event and free resources.
> + *
> + * Gets called from workqueue.
> + */
> +static void cgroup_event_remove(struct work_struct *work)
> +{
> +	struct cgroup_event *event = container_of(work, struct cgroup_event,
> +						  remove);
> +	struct cgroup_subsys_state *css = event->css;
> +	struct cgroup *cgrp = css->cgroup;
> +
> +	remove_wait_queue(event->wqh, &event->wait);
> +
> +	event->cft->unregister_event(css, event->cft, event->eventfd);
> +
> +	/* Notify userspace the event is going away. */
> +	eventfd_signal(event->eventfd, 1);
> +
> +	eventfd_ctx_put(event->eventfd);
> +	kfree(event);
> +	__cgroup_dput(cgrp);
> +}
> +
> +/*
> + * Gets called on POLLHUP on eventfd when user closes it.
> + *
> + * Called with wqh->lock held and interrupts disabled.
> + */
> +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> +			     int sync, void *key)
> +{
> +	struct cgroup_event *event = container_of(wait,
> +			struct cgroup_event, wait);
> +	struct cgroup *cgrp = event->css->cgroup;
> +	unsigned long flags = (unsigned long)key;
> +
> +	if (flags & POLLHUP) {
> +		/*
> +		 * If the event has been detached at cgroup removal, we
> +		 * can simply return knowing the other side will cleanup
> +		 * for us.
> +		 *
> +		 * We can't race against event freeing since the other
> +		 * side will require wqh->lock via remove_wait_queue(),
> +		 * which we hold.
> +		 */
> +		spin_lock(&cgrp->event_list_lock);
> +		if (!list_empty(&event->list)) {
> +			list_del_init(&event->list);
> +			/*
> +			 * We are in atomic context, but cgroup_event_remove()
> +			 * may sleep, so we have to call it in workqueue.
> +			 */
> +			schedule_work(&event->remove);
> +		}
> +		spin_unlock(&cgrp->event_list_lock);
> +	}
> +
> +	return 0;
> +}
> +
> +static void
Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
On 2013/8/5 0:07, Tejun Heo wrote:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating
> events at fixed points, or if that's too restrictive, configureable
> cadence or single set of configureable points should be enough.
>
> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
>
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
>
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
>
> Signed-off-by: Tejun Heo
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: Balbir Singh
> ---
>  kernel/cgroup.c | 237 ----------------
>  mm/memcontrol.c | 238 ++++++++++++++++
>  2 files changed, 238 insertions(+), 237 deletions(-)

init/Kconfig needs to be updated too:

menuconfig CGROUPS
	boolean "Control Group support"
	depends on EVENTFD
...
config SCHED_AUTOGROUP
	bool "Automatic process group scheduling"
	select EVENTFD
	select CGROUPS

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 2583b7b..a0b5e22 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -56,7 +56,6 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/idr.h>
>  #include <linux/vmalloc.h>	/* TODO: replace with more sophisticated array */
> -#include <linux/eventfd.h>
>  #include <linux/poll.h>

poll.h also can be removed.

>  #include <linux/flex_array.h>	/* used in cgroup_attach_task */
>  #include <linux/kthread.h>
>
> @@ -154,36 +153,6 @@ struct css_id {
>  	unsigned short stack[0]; /* Array of Length (depth+1) */
>  };

[...]

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2885e3e..3700b65 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c

#include <linux/eventfd.h>
#include <linux/poll.h>

> @@ -239,6 +239,36 @@ struct mem_cgroup_eventfd_list {
>  	struct eventfd_ctx *eventfd;
>  };
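The Kconfig follow-up Li asks for might look like this. A sketch only, not part of the patch: whether memcg should `select` or `depends on` EVENTFD, and whether CGROUPS can drop the dependency outright, is for the memcg maintainers to decide.

```
# init/Kconfig currently ties EVENTFD to all of cgroup:
menuconfig CGROUPS
	boolean "Control Group support"
	depends on EVENTFD

# With cgroup_event moved into memcg, the dependency could move with it:
menuconfig CGROUPS
	boolean "Control Group support"

config MEMCG
	bool "Memory Resource Controller for Control Groups"
	depends on RESOURCE_COUNTERS
	select EVENTFD	# only the memcg event code needs it now
```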
[PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg
cgroup_event is way over-designed and tries to build a generic
flexible event mechanism into cgroup - fully customizable event
specification for each user of the interface.  This is utterly
unnecessary and overboard especially in the light of the planned
unified hierarchy as there's gonna be single agent.  Simply generating
events at fixed points, or if that's too restrictive, configureable
cadence or single set of configureable points should be enough.

Thankfully, memcg is the only user and gets to keep it.  Replacing it
with something simpler on sane_behavior is strongly recommended.

This patch moves cgroup_event and "cgroup.event_control"
implementation to mm/memcontrol.c.  Clearing of events on cgroup
destruction is moved from cgroup_destroy_locked() to
mem_cgroup_css_offline(), which shouldn't make any noticeable
difference.

Note that "cgroup.event_control" will now exist only on the hierarchy
with memcg attached to it.  While this change is visible to userland,
it is unlikely to be noticeable as the file has never been meaningful
outside memcg.

Signed-off-by: Tejun Heo
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Balbir Singh
---
 kernel/cgroup.c | 237 ----------------
 mm/memcontrol.c | 238 ++++++++++++++++
 2 files changed, 238 insertions(+), 237 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2583b7b..a0b5e22 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,7 +56,6 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h>	/* TODO: replace with more sophisticated array */
-#include <linux/eventfd.h>
 #include <linux/poll.h>
 #include <linux/flex_array.h>	/* used in cgroup_attach_task */
 #include <linux/kthread.h>
@@ -154,36 +153,6 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
-/*
- * cgroup_event represents events which userspace want to receive.
- */
-struct cgroup_event {
-	/*
-	 * css which the event belongs to.
-	 */
-	struct cgroup_subsys_state *css;
-	/*
-	 * Control file which the event associated.
-	 */
-	struct cftype *cft;
-	/*
-	 * eventfd to signal userspace about the event.
-	 */
-	struct eventfd_ctx *eventfd;
-	/*
-	 * Each of these stored in a list by the cgroup.
-	 */
-	struct list_head list;
-	/*
-	 * All fields below needed to unregister event when
-	 * userspace closes eventfd.
-	 */
-	poll_table pt;
-	wait_queue_head_t *wqh;
-	wait_queue_t wait;
-	struct work_struct remove;
-};
-
 /* The list of hierarchy roots */
 static LIST_HEAD(cgroup_roots);
 
@@ -3966,194 +3935,6 @@ void __cgroup_dput(struct cgroup *cgrp)
 }
 EXPORT_SYMBOL_GPL(__cgroup_dput);
 
-/*
- * Unregister event and free resources.
- *
- * Gets called from workqueue.
- */
-static void cgroup_event_remove(struct work_struct *work)
-{
-	struct cgroup_event *event = container_of(work, struct cgroup_event,
-						  remove);
-	struct cgroup_subsys_state *css = event->css;
-	struct cgroup *cgrp = css->cgroup;
-
-	remove_wait_queue(event->wqh, &event->wait);
-
-	event->cft->unregister_event(css, event->cft, event->eventfd);
-
-	/* Notify userspace the event is going away. */
-	eventfd_signal(event->eventfd, 1);
-
-	eventfd_ctx_put(event->eventfd);
-	kfree(event);
-	__cgroup_dput(cgrp);
-}
-
-/*
- * Gets called on POLLHUP on eventfd when user closes it.
- *
- * Called with wqh->lock held and interrupts disabled.
- */
-static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
-			     int sync, void *key)
-{
-	struct cgroup_event *event = container_of(wait,
-			struct cgroup_event, wait);
-	struct cgroup *cgrp = event->css->cgroup;
-	unsigned long flags = (unsigned long)key;
-
-	if (flags & POLLHUP) {
-		/*
-		 * If the event has been detached at cgroup removal, we
-		 * can simply return knowing the other side will cleanup
-		 * for us.
-		 *
-		 * We can't race against event freeing since the other
-		 * side will require wqh->lock via remove_wait_queue(),
-		 * which we hold.
-		 */
-		spin_lock(&cgrp->event_list_lock);
-		if (!list_empty(&event->list)) {
-			list_del_init(&event->list);
-			/*
-			 * We are in atomic context, but cgroup_event_remove()
-			 * may sleep, so we have to call it in workqueue.
-			 */
-			schedule_work(&event->remove);
-		}
-		spin_unlock(&cgrp->event_list_lock);
-	}
-
-	return 0;
-}
-
-static void cgroup_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
-{
-	struct cgroup_event *event = container_of(pt,
-			struct