Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On 25/02/2016 15:34, Parav Pandit wrote:
> On Thu, Feb 25, 2016 at 5:33 PM, Haggai Eran wrote:
>>>>> +retry:
>>>>> +	spin_lock(&cg->rpool_list_lock);
>>>>> +	rpool = find_cg_rpool_locked(cg, device);
>>>>> +	if (!rpool) {
>>>>> +		spin_unlock(&cg->rpool_list_lock);
>>>>> +		ret = alloc_cg_rpool(cg, device);
>>>>> +		if (ret)
>>>>> +			goto err;
>>>>> +		else
>>>>> +			goto retry;
>>>> Instead of retrying after allocation of a new rpool, why not just return
>>>> the newly allocated rpool (or the existing one) from alloc_cg_rpool?
>>>
>>> It can be done, but the locking semantics just become difficult to
>>> review/maintain with that, where alloc_cg_rpool will unlock and lock
>>> conditionally later on.
>> Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
>> inside alloc_cg_rpool()? It already does that around its call to
>> find_cg_rpool_locked() and the insertion to cg_list.
>
> No. ref_count and usage counters are updated at the level where the lock
> is taken, in charge_cg_resource().
> If I move locking rpool_list_lock inside alloc_cg_rpool(), the unlocking
> will continue outside alloc_cg_rpool() once the pool is found or
> allocated.
> As you acknowledged in the comment below, this makes it confusing to
> lock/unlock from different contexts; I think the current implementation
> achieves both:
> (a) take the lock from a single context
> (b) keep the find and alloc functionality in two separate functions

Okay, fair enough.

>> I thought that was about functions that only locked the lock, called the
>> find function, and released the lock. What I'm suggesting is to have one
>> function that does "lock + find + allocate if needed + unlock",
>
> I had a similar function in the past which does
> "lock + find + allocate if needed + inc_ref_cnt + unlock" (get_cg_rpool)
> and updates the usage counter atomically, because another thread/process
> might update it too. It then checks the atomic ref count - on reaching
> zero, "lock + del_entry + unlock + free".
>
> Tejun asked to simplify this to
> "lock + find + allocate if needed + inc_ref_cnt_without_atomic + unlock",
> which I did in this patch v6.

Okay.

>> and another function that does (under caller's lock) "check ref count +
>> check max count + release rpool".
> This can be done. I have one dumb basic question about that:
> can we call kfree() with a spin_lock held? All these years I have tended
> to avoid doing so.

I think so. This is an old link but I think it still applies:
https://lkml.org/lkml/2004/11/21/130

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
> Can we call kfree() with a spin_lock held? All these years I have tended
> to avoid doing so.

Also, it doesn't look correct to hold the lock while freeing memory
which is totally unrelated to the lock.
With that, I think the current code appears ok, with the exception that
it is duplicated in two places for code readability around the lock.
What say?
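For what it's worth, the pattern being settled on here can be modeled in plain userspace C, with a pthread mutex standing in for the spinlock. All names (`rpool_add`, `rpool_put`, the singly linked list) are invented for illustration; the point is only the shape: unlink the object while holding the lock, but call the allocator's free only after the unlock.

```c
#include <pthread.h>
#include <stdlib.h>

/*
 * Userspace model of the pattern under discussion (hypothetical names,
 * not the kernel API): detach from the shared list under the lock,
 * defer the actual free until after the unlock.
 */
struct rpool {
	struct rpool *next;
	int refcnt;
};

static pthread_mutex_t rpool_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpool *rpool_head;

/* Create a pool with one reference and link it into the list. */
struct rpool *rpool_add(void)
{
	struct rpool *rp = calloc(1, sizeof(*rp));

	if (!rp)
		return NULL;
	rp->refcnt = 1;
	pthread_mutex_lock(&rpool_list_lock);
	rp->next = rpool_head;
	rpool_head = rp;
	pthread_mutex_unlock(&rpool_list_lock);
	return rp;
}

/* Drop a reference; on the last one, unlink under the lock and free
 * only after the lock is released. Returns 1 if the pool was freed. */
int rpool_put(struct rpool *rp)
{
	struct rpool *to_free = NULL;
	struct rpool **pp;

	pthread_mutex_lock(&rpool_list_lock);
	if (--rp->refcnt == 0) {
		for (pp = &rpool_head; *pp; pp = &(*pp)->next) {
			if (*pp == rp) {
				*pp = rp->next;	/* unlink while protected */
				to_free = rp;
				break;
			}
		}
	}
	pthread_mutex_unlock(&rpool_list_lock);

	free(to_free);	/* no lock held here; free(NULL) is a no-op */
	return to_free != NULL;
}
```

The object is fully unreachable by the time the lock drops, so nothing can race with the deferred free.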
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On Thu, Feb 25, 2016 at 5:33 PM, Haggai Eran wrote:
>>>> +retry:
>>>> +	spin_lock(&cg->rpool_list_lock);
>>>> +	rpool = find_cg_rpool_locked(cg, device);
>>>> +	if (!rpool) {
>>>> +		spin_unlock(&cg->rpool_list_lock);
>>>> +		ret = alloc_cg_rpool(cg, device);
>>>> +		if (ret)
>>>> +			goto err;
>>>> +		else
>>>> +			goto retry;
>>> Instead of retrying after allocation of a new rpool, why not just return
>>> the newly allocated rpool (or the existing one) from alloc_cg_rpool?
>>
>> It can be done, but the locking semantics just become difficult to
>> review/maintain with that, where alloc_cg_rpool will unlock and lock
>> conditionally later on.
> Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
> inside alloc_cg_rpool()? It already does that around its call to
> find_cg_rpool_locked() and the insertion to cg_list.

No. ref_count and usage counters are updated at the level where the lock
is taken, in charge_cg_resource().
If I move locking rpool_list_lock inside alloc_cg_rpool(), the unlocking
will continue outside alloc_cg_rpool() once the pool is found or
allocated.
As you acknowledged in the comment below, this makes it confusing to
lock/unlock from different contexts; I think the current implementation
achieves both:
(a) take the lock from a single context
(b) keep the find and alloc functionality in two separate functions

>
>> This path will be hit anyway on first allocation typically. Once the
>> application warms up, it will be unlikely to enter here.
>> I should change if (!rpool) to if (unlikely(!rpool)).
> Theoretically the newly allocated rpool can be released again by the time
> you get to the second call to find_cg_rpool_locked().
>
That's ok, because if that occurs, find_cg_rpool_locked() won't find the
entry and will try to allocate again. Things work fine in that case.

> I thought that was about functions that only locked the lock, called the
> find function, and released the lock. What I'm suggesting is to have one
> function that does "lock + find + allocate if needed + unlock",

I had a similar function in the past which does
"lock + find + allocate if needed + inc_ref_cnt + unlock" (get_cg_rpool)
and updates the usage counter atomically, because another thread/process
might update it too. It then checks the atomic ref count - on reaching
zero, "lock + del_entry + unlock + free".

Tejun asked to simplify this to
"lock + find + allocate if needed + inc_ref_cnt_without_atomic + unlock",
which I did in this patch v6.

> and another function that does (under caller's lock) "check ref count +
> check max count + release rpool".

This can be done. I have one dumb basic question about that:
can we call kfree() with a spin_lock held? All these years I have tended
to avoid doing so.
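The v6 charge path being defended above can be sketched in userspace C (pthread mutex for the spinlock; `struct dev`, `charge_resource`, and the list are hypothetical stand-ins for the kernel types). The key points it models: the lookup is retried after a lock-drop-and-allocate, and `alloc_rpool()` itself rechecks under the lock so a racing allocator loses gracefully.

```c
#include <pthread.h>
#include <stdlib.h>

/* Userspace model of the v6 retry pattern (hypothetical names). */
struct dev { int id; };
struct rpool {
	struct dev *device;
	long usage;
	int refcnt;
	struct rpool *next;
};

static pthread_mutex_t rpool_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpool *rpool_head;

static struct rpool *find_rpool_locked(struct dev *device)
{
	struct rpool *rp;

	for (rp = rpool_head; rp; rp = rp->next)
		if (rp->device == device)
			return rp;
	return NULL;
}

/* Allocate and insert a pool, tolerating a concurrent winner. */
static int alloc_rpool(struct dev *device)
{
	struct rpool *rp = calloc(1, sizeof(*rp));

	if (!rp)
		return -1;
	pthread_mutex_lock(&rpool_list_lock);
	if (find_rpool_locked(device)) {
		free(rp);	/* lost the race; an entry already exists */
	} else {
		rp->device = device;
		rp->next = rpool_head;
		rpool_head = rp;
	}
	pthread_mutex_unlock(&rpool_list_lock);
	return 0;
}

int charge_resource(struct dev *device, int num)
{
	struct rpool *rp;
retry:
	pthread_mutex_lock(&rpool_list_lock);
	rp = find_rpool_locked(device);
	if (!rp) {
		pthread_mutex_unlock(&rpool_list_lock);
		if (alloc_rpool(device))
			return -1;
		goto retry;	/* look up again now that a pool exists */
	}
	rp->usage += num;	/* counters updated where the lock is taken */
	rp->refcnt++;
	pthread_mutex_unlock(&rpool_list_lock);
	return 0;
}
```

Every lock is acquired and released within a single function, which is the property Parav is arguing for.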
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On 24/02/2016 18:16, Parav Pandit wrote:
>>> +	struct rdmacg_resource_pool *rpool;
>>> +	struct rdmacg_pool_info *pool_info = &device->pool_info;
>>> +
>>> +	spin_lock(&cg->rpool_list_lock);
>>> +	rpool = find_cg_rpool_locked(cg, device);
>> Is it possible for rpool to be NULL?
>>
> Unlikely, unless we have a bug in the cgroup implementation.
> It may be worth adding a WARN_ON and returning from here to avoid a
> kernel crash.

Sounds good.

>>> +static int charge_cg_resource(struct rdma_cgroup *cg,
>>> +			      struct rdmacg_device *device,
>>> +			      int index, int num)
>>> +{
>>> +	struct rdmacg_resource_pool *rpool;
>>> +	s64 new;
>>> +	int ret = 0;
>>> +
>>> +retry:
>>> +	spin_lock(&cg->rpool_list_lock);
>>> +	rpool = find_cg_rpool_locked(cg, device);
>>> +	if (!rpool) {
>>> +		spin_unlock(&cg->rpool_list_lock);
>>> +		ret = alloc_cg_rpool(cg, device);
>>> +		if (ret)
>>> +			goto err;
>>> +		else
>>> +			goto retry;
>> Instead of retrying after allocation of a new rpool, why not just return
>> the newly allocated rpool (or the existing one) from alloc_cg_rpool?
>
> It can be done, but the locking semantics just become difficult to
> review/maintain with that, where alloc_cg_rpool will unlock and lock
> conditionally later on.

Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
inside alloc_cg_rpool()? It already does that around its call to
find_cg_rpool_locked() and the insertion to cg_list.

> This path will be hit anyway on first allocation typically. Once the
> application warms up, it will be unlikely to enter here.
> I should change if (!rpool) to if (unlikely(!rpool)).

Theoretically the newly allocated rpool can be released again by the time
you get to the second call to find_cg_rpool_locked().

>>> +	spin_lock(&cg->rpool_list_lock);
>>> +	rpool = find_cg_rpool_locked(cg, device);
>>> +	if (!rpool) {
>>> +		spin_unlock(&cg->rpool_list_lock);
>>> +		ret = alloc_cg_rpool(cg, device);
>>> +		if (ret)
>>> +			goto opt_err;
>>> +		else
>>> +			goto retry;
>> You can avoid the retry here too. Perhaps this can go into a function.
>>
> In v5 I had a wrapper around this code which did similar hiding using
> get_cg_rpool and put_cg_rpool helper functions.
> But Tejun was of the opinion that I should have the locks outside of
> all those functions. With that approach, this is done.
> So I think it's ok to have it this way.

I thought that was about functions that only locked the lock, called the
find function, and released the lock. What I'm suggesting is to have one
function that does "lock + find + allocate if needed + unlock", and
another function that does (under caller's lock) "check ref count +
check max count + release rpool".

>>> +	}
>>> +
>>> +	/* now set the new limits of the rpool */
>>> +	while (enables) {
>>> +		/* if user set the limit, enables bit is set */
>>> +		if (enables & BIT(i)) {
>>> +			enables &= ~BIT(i);
>>> +			set_resource_limit(rpool, i, new_limits[i]);
>>> +		}
>>> +		i++;
>>> +	}
>>> +	if (rpool->refcnt == 0 &&
>>> +	    rpool->num_max_cnt == pool_info->table_len) {
>>> +		/*
>>> +		 * No user of the rpool and all entries are
>>> +		 * set to max, so safe to delete this rpool.
>>> +		 */
>>> +		list_del(&rpool->cg_list);
>>> +		spin_unlock(&cg->rpool_list_lock);
>>> +		free_cg_rpool(rpool);
>>> +	} else {
>>> +		spin_unlock(&cg->rpool_list_lock);
>>> +	}
>> You should consider putting this piece of code in a function (the
>> check of the reference counts and release of the rpool).
>>
> Yes, I did. Same as the above comment. Also, this function will have to
> unlock. It's usually better to lock/unlock at the same function level,
> instead of locking at one level and unlocking from inside the function.
> Or should I have
> cg_rpool_cond_free_unlock() for the above code (check of the reference
> counts and release of the rpool)?

It is confusing to lock and unlock in different contexts. Why not lock
in the caller context? free_cg_rpool() can be called under
rpool_list_lock, couldn't it? It locks device->rpool_lock, but
uncharge_cg_resource() also locks both in the same order.

Thanks,
Haggai
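Beyond the two options debated above (retry loop vs. a find-or-alloc that returns the pool), a third shape avoids both the retry and any conditional unlocking: allocate speculatively before taking the lock, and discard the allocation if the lookup hits. This is not what either party proposed verbatim, just a sketch of the alternative in userspace C with hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Speculative pre-allocation: no retry, lock taken and released once. */
struct dev { int id; };
struct rpool {
	struct dev *device;
	long usage;
	int refcnt;
	struct rpool *next;
};

static pthread_mutex_t rpool_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rpool *rpool_head;

static struct rpool *find_rpool_locked(struct dev *device)
{
	struct rpool *rp;

	for (rp = rpool_head; rp; rp = rp->next)
		if (rp->device == device)
			return rp;
	return NULL;
}

int charge_resource(struct dev *device, int num)
{
	struct rpool *rp, *fresh;

	fresh = calloc(1, sizeof(*fresh));	/* may go unused */
	if (!fresh)
		return -1;

	pthread_mutex_lock(&rpool_list_lock);
	rp = find_rpool_locked(device);
	if (!rp) {
		rp = fresh;		/* install the speculative pool */
		fresh = NULL;
		rp->device = device;
		rp->next = rpool_head;
		rpool_head = rp;
	}
	rp->usage += num;	/* counters still updated under the one lock */
	rp->refcnt++;
	pthread_mutex_unlock(&rpool_list_lock);

	free(fresh);	/* discard the unused allocation, outside the lock */
	return 0;
}
```

The cost is one wasted allocation on the (rare) racing path; the benefit is that both of Parav's properties hold with no `goto retry`.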
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
Hi,

Overall I think the patch looks good to me. I have a few comments below.

On 20/02/2016 13:00, Parav Pandit wrote:
> Resource pool is created/destroyed dynamically whenever
> charging/uncharging occurs respectively and whenever user
> configuration is done. Its a tradeoff of memory vs little more code

Its -> It's

> space that creates resource pool whenever necessary,
> instead of creating them during cgroup creation and device registration
> time.
>
> Signed-off-by: Parav Pandit

> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 000..b370733
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h

> +struct rdmacg_device {
> +	struct rdmacg_pool_info	pool_info;
> +	struct list_head	rdmacg_list;
> +	struct list_head	rpool_head;
> +	spinlock_t		rpool_lock; /* protects rsource pool list */

rsource -> resource

> +	char			*name;
> +};
> +
> +/* APIs for RDMA/IB stack to publish when a device wants to
> + * participate in resource accounting
> + */
> +int rdmacg_register_device(struct rdmacg_device *device);
> +void rdmacg_unregister_device(struct rdmacg_device *device);
> +
> +/* APIs for RDMA/IB stack to charge/uncharge pool specific resources */
> +int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
> +		      struct rdmacg_device *device,
> +		      int resource_index,
> +		      int num);
> +void rdmacg_uncharge(struct rdma_cgroup *cg,
> +		     struct rdmacg_device *device,
> +		     int resource_index,
> +		     int num);
> +void rdmacg_query_limit(struct rdmacg_device *device,
> +			int *limits, int max_count);

You can drop the max_count parameter, and require the caller to always
provide pool_info->table_len items, couldn't you?

> +
> +#endif /* CONFIG_CGROUP_RDMA */
> +#endif /* _CGROUP_RDMA_H */
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0df0336a..d0e597c 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -56,6 +56,10 @@ SUBSYS(hugetlb)
>  SUBSYS(pids)
>  #endif
>
> +#if IS_ENABLED(CONFIG_CGROUP_RDMA)
> +SUBSYS(rdma)
> +#endif
> +
>  /*
>   * The following subsystems are not supported on the default hierarchy.
>   */
> diff --git a/init/Kconfig b/init/Kconfig
> index 2232080..1741b65 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1054,6 +1054,16 @@ config CGROUP_PIDS
>  	  since the PIDs limit only affects a process's ability to fork, not to
>  	  attach to a cgroup.
>
> +config CGROUP_RDMA
> +	bool "RDMA controller"
> +	help
> +	  Provides enforcement of RDMA resources defined by IB stack.
> +	  It is fairly easy for consumers to exhaust RDMA resources, which
> +	  can result into resource unavailibility to other consumers.

unavailibility -> unavailability

> +	  RDMA controller is designed to stop this from happening.
> +	  Attaching processes with active RDMA resources to the cgroup
> +	  hierarchy is allowed even if can cross the hierarchy's limit.
> +
>  config CGROUP_FREEZER
>  	bool "Freezer controller"
>  	help
> diff --git a/kernel/Makefile b/kernel/Makefile
> index baa55e5..501f5df 100644

> +/**
> + * uncharge_cg_resource - uncharge resource for rdma cgroup
> + * @cg: pointer to cg to uncharge and all parents in hierarchy

It only uncharges a single cg, right?

> + * @device: pointer to ib device
> + * @index: index of the resource to uncharge in cg (resource pool)
> + * @num: the number of rdma resource to uncharge
> + *
> + * It also frees the resource pool in the hierarchy for the resource pool
> + * which was created as part of charing operation.

charing -> charging

> + */
> +static void uncharge_cg_resource(struct rdma_cgroup *cg,
> +				 struct rdmacg_device *device,
> +				 int index, int num)
> +{
> +	struct rdmacg_resource_pool *rpool;
> +	struct rdmacg_pool_info *pool_info = &device->pool_info;
> +
> +	spin_lock(&cg->rpool_list_lock);
> +	rpool = find_cg_rpool_locked(cg, device);

Is it possible for rpool to be NULL?

> +
> +	/*
> +	 * A negative count (or overflow) is invalid,
> +	 * it indicates a bug in the rdma controller.
> +	 */
> +	rpool->resources[index].usage -= num;
> +
> +	WARN_ON_ONCE(rpool->resources[index].usage < 0);
> +	rpool->refcnt--;
> +	if (rpool->refcnt == 0 && rpool->num_max_cnt == pool_info->table_len) {
> +		/*
> +		 * No user of the rpool and all entries are set to max, so
> +		 * safe to delete this rpool.
> +		 */
> +		list_del(&rpool->cg_list);
> +		spin_unlock(&cg->rpool_list_lock);
> +
> +		free_cg_rpool(rpool);
> +		return;
> +	}
> +	spin_unlock(&cg->rpool_list_lock);
> +}

> +/**
> + * charge_cg_resource - charge
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
> Added rdma cgroup controller that does accounting, limit enforcement
> on rdma/IB verbs and hw resources.
>
> Added rdma cgroup header file which defines its APIs to perform
> charing/uncharing functionality and device registration which will
> participate in controller functions of accounting and limit
> enforcements. It also define rdmacg_device structure to bind IB stack
> and RDMA cgroup controller.
>
> RDMA resources are tracked using resource pool. Resource pool is per
> device, per cgroup entity which allows setting up accounting limits
> on per device basis.
>
> Resources are not defined by the RDMA cgroup, instead they are defined
> by the external module IB stack. This allows extending IB stack
> without changing kernel, as IB stack is going through changes
> and enhancements.
>
> Resource pool is created/destroyed dynamically whenever
> charging/uncharging occurs respectively and whenever user
> configuration is done. Its a tradeoff of memory vs little more code
> space that creates resource pool whenever necessary,
> instead of creating them during cgroup creation and device registration
> time.
>
> Signed-off-by: Parav Pandit
> ---
>  include/linux/cgroup_rdma.h   |  53 +++
>  include/linux/cgroup_subsys.h |   4 +
>  init/Kconfig                  |  10 +
>  kernel/Makefile               |   1 +
>  kernel/cgroup_rdma.c          | 753 ++
>  5 files changed, 821 insertions(+)
>  create mode 100644 include/linux/cgroup_rdma.h
>  create mode 100644 kernel/cgroup_rdma.c
>
> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 000..b370733
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h
> @@ -0,0 +1,53 @@
> +#ifndef _CGROUP_RDMA_H
> +#define _CGROUP_RDMA_H
> +
> +#include <linux/cgroup.h>
> +
> +struct rdma_cgroup {
> +#ifdef CONFIG_CGROUP_RDMA
> +	struct cgroup_subsys_state	css;
> +
> +	spinlock_t	rpool_list_lock; /* protects resource pool list */
> +	struct list_head rpool_head;	/* head to keep track of all resource
> +					 * pools that belongs to this cgroup.
> +					 */
> +#endif
> +};
> +
> +#ifdef CONFIG_CGROUP_RDMA

I'm sure you were already asked about this, but why do you need the
ifdef embedded in struct rdma_cgroup and then the same one right after
it? Can you place this ifdef before declaring struct rdma_cgroup?

> +
> +struct rdmacg_device;
> +

Thanks
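For illustration, the layout Leon is suggesting would look roughly like this. This is a sketch of the header only, not the actual patch; the members are copied from the quoted hunk, and the API declarations are elided.

```c
#ifndef _CGROUP_RDMA_H
#define _CGROUP_RDMA_H

#include <linux/cgroup.h>

#ifdef CONFIG_CGROUP_RDMA

struct rdma_cgroup {
	struct cgroup_subsys_state	css;

	spinlock_t	rpool_list_lock; /* protects resource pool list */
	struct list_head rpool_head;	/* all resource pools of this cgroup */
};

struct rdmacg_device;

/* registration and charge/uncharge declarations follow here,
 * all under this same single ifdef */

#endif /* CONFIG_CGROUP_RDMA */
#endif /* _CGROUP_RDMA_H */
```

One consequence worth noting: with the controller disabled, struct rdma_cgroup is then not defined at all (rather than defined empty), so anything that embeds it must sit under the same config option.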
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On Sun, Feb 21, 2016 at 8:39 PM, Leon Romanovsky wrote:
> On Sun, Feb 21, 2016 at 07:41:08PM +0530, Parav Pandit wrote:
>> CONFIG_CGROUP_RDMA
>>
>> On Sun, Feb 21, 2016 at 7:15 PM, Leon Romanovsky wrote:
>> > On Sun, Feb 21, 2016 at 05:03:05PM +0530, Parav Pandit wrote:
>> >> On Sun, Feb 21, 2016 at 1:13 PM, Leon Romanovsky wrote:
>> >> > On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
>> >> > Can you place this ifdef before declaring struct rdma_cgroup?
>> >> Yes. I missed out this cleanup. Done locally now.
>> >
>> > Great, an additional thing which caught my attention was related to
>> > declaring and using the new cgroup functions. There are a number of
>> > places where you protected the calls by specific ifdefs in the
>> > IB/core c-files and not in h-files as is usually done.
>> >
>> ib_device_register_rdmacg and ib_device_unregister_rdmacg are the only
>> two functions called from IB/core, as they are tied to functionality.
>> They can also be implemented as NULL calls when CONFIG_CGROUP_RDMA is
>> undefined (similar to ib_rdmacg_try_charge and others).
>> I didn't do so because the register and unregister calls occur in a
>> single file, and only twice, compared to the charge/uncharge functions.
>> Either way is fine with me; I can make the changes you described.
>> Let me know.
>
> Please do.
> IMHO, it is better to have one place which handles all relevant ifdefs
> and functions. IB/core doesn't need to know about the cgroup
> implementation.
>
ok. Done. Thanks for the review. I will accumulate more comments from
Tejun and others before spinning v7.
Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
On Sun, Feb 21, 2016 at 07:41:08PM +0530, Parav Pandit wrote:
> CONFIG_CGROUP_RDMA
>
> On Sun, Feb 21, 2016 at 7:15 PM, Leon Romanovsky wrote:
> > On Sun, Feb 21, 2016 at 05:03:05PM +0530, Parav Pandit wrote:
> >> On Sun, Feb 21, 2016 at 1:13 PM, Leon Romanovsky wrote:
> >> > On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
> >> > Can you place this ifdef before declaring struct rdma_cgroup?
> >> Yes. I missed out this cleanup. Done locally now.
> >
> > Great, an additional thing which caught my attention was related to
> > declaring and using the new cgroup functions. There are a number of
> > places where you protected the calls by specific ifdefs in the
> > IB/core c-files and not in h-files as is usually done.
> >
> ib_device_register_rdmacg and ib_device_unregister_rdmacg are the only
> two functions called from IB/core, as they are tied to functionality.
> They can also be implemented as NULL calls when CONFIG_CGROUP_RDMA is
> undefined (similar to ib_rdmacg_try_charge and others).
> I didn't do so because the register and unregister calls occur in a
> single file, and only twice, compared to the charge/uncharge functions.
> Either way is fine with me; I can make the changes you described.
> Let me know.

Please do.
IMHO, it is better to have one place which handles all relevant ifdefs
and functions. IB/core doesn't need to know about the cgroup
implementation.

Thanks
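The "NULL calls when CONFIG_CGROUP_RDMA is undefined" approach mentioned above is the usual kernel stub-in-header idiom. A sketch of what that looks like for the two registration calls (function names are taken from the discussion; the exact signatures are assumed):

```c
#ifdef CONFIG_CGROUP_RDMA
int ib_device_register_rdmacg(struct ib_device *device);
void ib_device_unregister_rdmacg(struct ib_device *device);
#else
/*
 * With the controller compiled out, the header supplies no-op static
 * inline stubs, so IB/core .c files can call these unconditionally
 * with no ifdefs of their own.
 */
static inline int ib_device_register_rdmacg(struct ib_device *device)
{
	return 0;
}

static inline void ib_device_unregister_rdmacg(struct ib_device *device)
{
}
#endif
```

The compiler optimizes the empty stubs away entirely, so the disabled configuration pays no runtime cost, and all config knowledge stays in one header.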