Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-25 Thread Haggai Eran
On 25/02/2016 15:34, Parav Pandit wrote:
> On Thu, Feb 25, 2016 at 5:33 PM, Haggai Eran  wrote:
>>>>> +retry:
>>>>> + spin_lock(&cg->rpool_list_lock);
>>>>> + rpool = find_cg_rpool_locked(cg, device);
>>>>> + if (!rpool) {
>>>>> + spin_unlock(&cg->rpool_list_lock);
>>>>> + ret = alloc_cg_rpool(cg, device);
>>>>> + if (ret)
>>>>> + goto err;
>>>>> + else
>>>>> + goto retry;
>>>> Instead of retrying after allocation of a new rpool, why not just return the
>>>> newly allocated rpool (or the existing one) from alloc_cg_rpool?
>>>
>>> It can be done, but the locking semantics become difficult to
>>> review and maintain, because alloc_cg_rpool would then unlock and relock
>>> conditionally later on.
>> Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
>> inside alloc_cg_rpool()? It already does that around its call to
>> find_cg_rpool_locked() and the insertion to cg_list.
> 
> No. The ref_count and usage counters are updated at the level where the lock
> is taken, in charge_cg_resource().
> If I move the locking of rpool_list_lock inside alloc_cg_rpool(), the unlock
> would still happen outside of alloc_cg_rpool(), once the rpool is found or
> allocated.
> As you acknowledge in the comment below, locking and unlocking from different
> contexts is confusing, so I think the current implementation achieves both:
> (a) take the lock from a single context
> (b) keep the find and alloc functionality in two separate functions

Okay, fair enough.

>> I thought that was about functions that only locked the lock, called the
>> find function, and released the lock. What I'm suggesting is to have one
>> function that does "lock + find + allocate if needed + unlock",
> 
> I had a similar function (get_cg_rpool) in the past which did
> "lock + find + allocate if needed + inc_ref_cnt + unlock", then updated the
> usage counter atomically, because another thread/process might update it too,
> and on the uncharge side checked the atomic ref count and, on reaching zero,
> did "lock + del_entry + unlock + free".
> 
> Tejun asked to simplify this to,
> 
> "lock + find + allocate if needed + inc_ref_cnt_without_atomic" + unlock".
> which I did in this patch v6.

Okay.

>> and another
>> function that does (under caller's lock) "check ref count + check max count +
>> release rpool".
> This can be done. I have one dumb, basic question about that:
> can we call kfree() with a spin_lock held? All these years I have tended
> to avoid doing so.
> 

I think so. This is an old link but I think it still applies: 
https://lkml.org/lkml/2004/11/21/130
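
For example, a pattern like the following is fine, since kfree() never sleeps;
it is allocation under a spinlock that needs care (it would have to use
GFP_ATOMIC). Just an illustrative sketch:

	spin_lock(&cg->rpool_list_lock);
	list_del(&rpool->cg_list);
	kfree(rpool);			/* safe: kfree() does not sleep */
	spin_unlock(&cg->rpool_list_lock);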


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-25 Thread Parav Pandit
> Can we call kfree() with a spin_lock held? All these years I have tended
> to avoid doing so.
Also, it doesn't look correct to hold the lock while freeing memory that is
totally unrelated to the lock.
With that in mind, I think the current code is ok, with the exception that it
is duplicated in two places for the sake of readability around the lock.
What do you say?


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-25 Thread Parav Pandit
On Thu, Feb 25, 2016 at 5:33 PM, Haggai Eran  wrote:
>>>> +retry:
>>>> + spin_lock(&cg->rpool_list_lock);
>>>> + rpool = find_cg_rpool_locked(cg, device);
>>>> + if (!rpool) {
>>>> + spin_unlock(&cg->rpool_list_lock);
>>>> + ret = alloc_cg_rpool(cg, device);
>>>> + if (ret)
>>>> + goto err;
>>>> + else
>>>> + goto retry;
>>> Instead of retrying after allocation of a new rpool, why not just return the
>>> newly allocated rpool (or the existing one) from alloc_cg_rpool?
>>
>> It can be done, but the locking semantics become difficult to
>> review and maintain, because alloc_cg_rpool would then unlock and relock
>> conditionally later on.
> Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
> inside alloc_cg_rpool()? It already does that around its call to
> find_cg_rpool_locked() and the insertion to cg_list.

No. The ref_count and usage counters are updated at the level where the lock
is taken, in charge_cg_resource().
If I move the locking of rpool_list_lock inside alloc_cg_rpool(), the unlock
would still happen outside of alloc_cg_rpool(), once the rpool is found or
allocated.
As you acknowledge in the comment below, locking and unlocking from different
contexts is confusing, so I think the current implementation achieves both:
(a) take the lock from a single context
(b) keep the find and alloc functionality in two separate functions

>
>> This path will typically be hit only on the first allocation. Once the
>> application has warmed up, it will be unlikely to enter here.
>> I should change if (!rpool) to if (unlikely(!rpool)).
> Theoretically the newly allocated rpool can be released again by the time you
> get to the second call to find_cg_rpool_locked().
>
That's ok, because if that occurs, find_cg_rpool_locked() won't find the
entry and we will simply try to allocate again.
Things work fine in that case.

> I thought that was about functions that only locked the lock, called the
> find function, and released the lock. What I'm suggesting is to have one
> function that does "lock + find + allocate if needed + unlock",

I had a similar function (get_cg_rpool) in the past which did
"lock + find + allocate if needed + inc_ref_cnt + unlock", then updated the
usage counter atomically, because another thread/process might update it too,
and on the uncharge side checked the atomic ref count and, on reaching zero,
did "lock + del_entry + unlock + free".

Tejun asked to simplify this to,

"lock + find + allocate if needed + inc_ref_cnt_without_atomic" + unlock".
which I did in this patch v6.

> and another
> function that does (under caller's lock) "check ref count + check max count +
> release rpool".
This can be done. I have one dumb, basic question about that:
can we call kfree() with a spin_lock held? All these years I have tended
to avoid doing so.


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-25 Thread Haggai Eran
On 24/02/2016 18:16, Parav Pandit wrote:
>>> + struct rdmacg_resource_pool *rpool;
>>> + struct rdmacg_pool_info *pool_info = &device->pool_info;
>>> +
>>> + spin_lock(&cg->rpool_list_lock);
>>> + rpool = find_cg_rpool_locked(cg, device);
>> Is it possible for rpool to be NULL?
>>
> Unlikely, unless we have a bug in the cgroup implementation.
> It may be worth adding a WARN_ON and returning from here to avoid a kernel crash.
Sounds good.

>>> +static int charge_cg_resource(struct rdma_cgroup *cg,
>>> +   struct rdmacg_device *device,
>>> +   int index, int num)
>>> +{
>>> + struct rdmacg_resource_pool *rpool;
>>> + s64 new;
>>> + int ret = 0;
>>> +
>>> +retry:
>>> + spin_lock(&cg->rpool_list_lock);
>>> + rpool = find_cg_rpool_locked(cg, device);
>>> + if (!rpool) {
>>> + spin_unlock(&cg->rpool_list_lock);
>>> + ret = alloc_cg_rpool(cg, device);
>>> + if (ret)
>>> + goto err;
>>> + else
>>> + goto retry;
>> Instead of retrying after allocation of a new rpool, why not just return the
>> newly allocated rpool (or the existing one) from alloc_cg_rpool?
> 
> It can be done, but the locking semantics become difficult to
> review and maintain, because alloc_cg_rpool would then unlock and relock
> conditionally later on.
Maybe I'm missing something, but couldn't you simply lock rpool_list_lock
inside alloc_cg_rpool()? It already does that around its call to 
find_cg_rpool_locked() and the insertion to cg_list.

> This path will typically be hit only on the first allocation. Once the
> application has warmed up, it will be unlikely to enter here.
> I should change if (!rpool) to if (unlikely(!rpool)).
Theoretically the newly allocated rpool can be released again by the time you
get to the second call to find_cg_rpool_locked().

>>> + spin_lock(&cg->rpool_list_lock);
>>> + rpool = find_cg_rpool_locked(cg, device);
>>> + if (!rpool) {
>>> + spin_unlock(&cg->rpool_list_lock);
>>> + ret = alloc_cg_rpool(cg, device);
>>> + if (ret)
>>> + goto opt_err;
>>> + else
>>> + goto retry;
>> You can avoid the retry here too. Perhaps this can go into a function.
>>
> In v5 I had wrapper code which did similar hiding, using the get_cg_rpool
> and put_cg_rpool helper functions.
> But Tejun was of the opinion that I should take the locks outside of all
> those functions, which is what this approach does.
> So I think it's ok to have it this way.
I thought that was about functions that only locked the lock, called the 
find function, and released the lock. What I'm suggesting is to have one
function that does "lock + find + allocate if needed + unlock", and another
function that does (under caller's lock) "check ref count + check max count +
release rpool".
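
For the second function, I mean roughly the following (just a sketch: the name
put_cg_rpool_locked is made up, and it assumes free_cg_rpool() is safe to call
under rpool_list_lock, as discussed further down):

/* caller holds cg->rpool_list_lock */
static void put_cg_rpool_locked(struct rdma_cgroup *cg,
				struct rdmacg_device *device,
				struct rdmacg_resource_pool *rpool)
{
	struct rdmacg_pool_info *pool_info = &device->pool_info;

	/* no user of the rpool and all entries are set to max */
	if (rpool->refcnt == 0 &&
	    rpool->num_max_cnt == pool_info->table_len) {
		list_del(&rpool->cg_list);
		free_cg_rpool(rpool);
	}
}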

>>> + }
>>> +
>>> + /* now set the new limits of the rpool */
>>> + while (enables) {
>>> + /* if user set the limit, enables bit is set */
>>> + if (enables & BIT(i)) {
>>> + enables &= ~BIT(i);
>>> + set_resource_limit(rpool, i, new_limits[i]);
>>> + }
>>> + i++;
>>> + }
>>> + if (rpool->refcnt == 0 &&
>>> + rpool->num_max_cnt == pool_info->table_len) {
>>> + /*
>>> +  * No user of the rpool and all entries are
>>> +  * set to max, so safe to delete this rpool.
>>> +  */
>>> + list_del(&rpool->cg_list);
>>> + spin_unlock(&cg->rpool_list_lock);
>>> + free_cg_rpool(rpool);
>>> + } else {
>>> + spin_unlock(&cg->rpool_list_lock);
>>> + }
>> You should consider putting this piece of code in a function (the
>> check of the reference counts and release of the rpool).
>>
> Yes, I did; same as the comment above. Also, this function would have to
> unlock. It's usually better to lock/unlock at the same function level,
> instead of locking at one level and unlocking from inside the called
> function.
> Or
> should I have a cg_rpool_cond_free_unlock() for the above code (the check of
> the reference counts and release of the rpool)?
It is confusing to lock and unlock in different contexts. Why not lock
in the caller context? free_cg_rpool() can be called under rpool_list_lock,
can't it? It locks device->rpool_lock, but uncharge_cg_resource() also
locks both in the same order.
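
Concretely, with a helper like the put_cg_rpool_locked() sketch above, the
uncharge path could keep the lock and unlock at the same level, e.g. (again
just a sketch):

static void uncharge_cg_resource(struct rdma_cgroup *cg,
				 struct rdmacg_device *device,
				 int index, int num)
{
	struct rdmacg_resource_pool *rpool;

	spin_lock(&cg->rpool_list_lock);
	rpool = find_cg_rpool_locked(cg, device);
	rpool->resources[index].usage -= num;
	WARN_ON_ONCE(rpool->resources[index].usage < 0);
	rpool->refcnt--;
	put_cg_rpool_locked(cg, device, rpool);	/* may list_del() and free */
	spin_unlock(&cg->rpool_list_lock);
}

The limit-setting path would call the same helper right before its
spin_unlock(), so the lock never changes hands between functions.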

Thanks,
Haggai


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-24 Thread Haggai Eran
Hi,

Overall, the patch looks good to me. I have a few comments below.

On 20/02/2016 13:00, Parav Pandit wrote:
> Resource pool is created/destroyed dynamically whenever
> charging/uncharging occurs respectively and whenever user
> configuration is done. Its a tradeoff of memory vs little more code
Its -> It's
> space that creates resource pool whenever necessary,
> instead of creating them during cgroup creation and device registration
> time.
> 
> Signed-off-by: Parav Pandit 

> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 000..b370733
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h

> +struct rdmacg_device {
> + struct rdmacg_pool_info pool_info;
> + struct list_headrdmacg_list;
> + struct list_headrpool_head;
> + spinlock_t  rpool_lock; /* protects rsource pool list */
rsource -> resource

> + char*name;
> +};
> +
> +/* APIs for RDMA/IB stack to publish when a device wants to
> + * participate in resource accounting
> + */
> +int rdmacg_register_device(struct rdmacg_device *device);
> +void rdmacg_unregister_device(struct rdmacg_device *device);
> +
> +/* APIs for RDMA/IB stack to charge/uncharge pool specific resources */
> +int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
> +   struct rdmacg_device *device,
> +   int resource_index,
> +   int num);
> +void rdmacg_uncharge(struct rdma_cgroup *cg,
> +  struct rdmacg_device *device,
> +  int resource_index,
> +  int num);
> +void rdmacg_query_limit(struct rdmacg_device *device,
> + int *limits, int max_count);
You can drop the max_count parameter and require the caller to
always provide pool_info->table_len items, can't you?
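
That is, something like this (sketch):

/* @limits must point to an array of pool_info->table_len entries */
void rdmacg_query_limit(struct rdmacg_device *device, int *limits);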

> +
> +#endif   /* CONFIG_CGROUP_RDMA */
> +#endif   /* _CGROUP_RDMA_H */
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0df0336a..d0e597c 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -56,6 +56,10 @@ SUBSYS(hugetlb)
>  SUBSYS(pids)
>  #endif
>  
> +#if IS_ENABLED(CONFIG_CGROUP_RDMA)
> +SUBSYS(rdma)
> +#endif
> +
>  /*
>   * The following subsystems are not supported on the default hierarchy.
>   */
> diff --git a/init/Kconfig b/init/Kconfig
> index 2232080..1741b65 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1054,6 +1054,16 @@ config CGROUP_PIDS
> since the PIDs limit only affects a process's ability to fork, not to
> attach to a cgroup.
>  
> +config CGROUP_RDMA
> + bool "RDMA controller"
> + help
> +   Provides enforcement of RDMA resources defined by IB stack.
> +   It is fairly easy for consumers to exhaust RDMA resources, which
> +   can result into resource unavailibility to other consumers.
unavailibility -> unavailability
> +   RDMA controller is designed to stop this from happening.
> +   Attaching processes with active RDMA resources to the cgroup
> +   hierarchy is allowed even if can cross the hierarchy's limit.
> +
>  config CGROUP_FREEZER
>   bool "Freezer controller"
>   help
> diff --git a/kernel/Makefile b/kernel/Makefile
> index baa55e5..501f5df 100644

> +/**
> + * uncharge_cg_resource - uncharge resource for rdma cgroup
> + * @cg: pointer to cg to uncharge and all parents in hierarchy
It only uncharges a single cg, right?
> + * @device: pointer to ib device
> + * @index: index of the resource to uncharge in cg (resource pool)
> + * @num: the number of rdma resource to uncharge
> + *
> + * It also frees the resource pool in the hierarchy for the resource pool
> + * which was created as part of charing operation.
charing -> charging
> + */
> +static void uncharge_cg_resource(struct rdma_cgroup *cg,
> +  struct rdmacg_device *device,
> +  int index, int num)
> +{
> + struct rdmacg_resource_pool *rpool;
> + struct rdmacg_pool_info *pool_info = &device->pool_info;
> +
> + spin_lock(&cg->rpool_list_lock);
> + rpool = find_cg_rpool_locked(cg, device);
Is it possible for rpool to be NULL?

> +
> + /*
> +  * A negative count (or overflow) is invalid,
> +  * it indicates a bug in the rdma controller.
> +  */
> + rpool->resources[index].usage -= num;
> +
> + WARN_ON_ONCE(rpool->resources[index].usage < 0);
> + rpool->refcnt--;
> + if (rpool->refcnt == 0 && rpool->num_max_cnt == pool_info->table_len) {
> + /*
> +  * No user of the rpool and all entries are set to max, so
> +  * safe to delete this rpool.
> +  */
> + list_del(&rpool->cg_list);
> + spin_unlock(&cg->rpool_list_lock);
> +
> + free_cg_rpool(rpool);
> + return;
> + }
> + spin_unlock(&cg->rpool_list_lock);
> +}

> +/**
> + * charge_cg_resource - charge 

Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-21 Thread Leon Romanovsky
On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
> Added rdma cgroup controller that does accounting, limit enforcement
> on rdma/IB verbs and hw resources.
> 
> Added rdma cgroup header file which defines its APIs to perform
> charing/uncharing functionality and device registration which will
> participate in controller functions of accounting and limit
> enforcements. It also define rdmacg_device structure to bind IB stack
> and RDMA cgroup controller.
> 
> RDMA resources are tracked using resource pool. Resource pool is per
> device, per cgroup entity which allows setting up accounting limits
> on per device basis.
> 
> Resources are not defined by the RDMA cgroup, instead they are defined
> by the external module IB stack. This allows extending IB stack
> without changing kernel, as IB stack is going through changes
> and enhancements.
> 
> Resource pool is created/destroyed dynamically whenever
> charging/uncharging occurs respectively and whenever user
> configuration is done. Its a tradeoff of memory vs little more code
> space that creates resource pool whenever necessary,
> instead of creating them during cgroup creation and device registration
> time.
> 
> Signed-off-by: Parav Pandit 
> ---
>  include/linux/cgroup_rdma.h   |  53 +++
>  include/linux/cgroup_subsys.h |   4 +
>  init/Kconfig  |  10 +
>  kernel/Makefile   |   1 +
>  kernel/cgroup_rdma.c  | 753 ++
>  5 files changed, 821 insertions(+)
>  create mode 100644 include/linux/cgroup_rdma.h
>  create mode 100644 kernel/cgroup_rdma.c
> 
> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 000..b370733
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h
> @@ -0,0 +1,53 @@
> +#ifndef _CGROUP_RDMA_H
> +#define _CGROUP_RDMA_H
> +
> +#include 
> +
> +struct rdma_cgroup {
> +#ifdef CONFIG_CGROUP_RDMA
> + struct cgroup_subsys_state  css;
> +
> + spinlock_t rpool_list_lock; /* protects resource pool list */
> + struct list_head rpool_head;/* head to keep track of all resource
> +  * pools that belongs to this cgroup.
> +  */
> +#endif
> +};
> +
> +#ifdef CONFIG_CGROUP_RDMA

I'm sure you have already been asked about this, but why do you need an ifdef
embedded in struct rdma_cgroup and then the same one again right after it?
Can you place this ifdef before the declaration of struct rdma_cgroup?
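
Something like this, for example (a sketch of the same header with the ifdef
hoisted above the struct):

#ifdef CONFIG_CGROUP_RDMA

struct rdma_cgroup {
	struct cgroup_subsys_state	css;

	spinlock_t rpool_list_lock;	/* protects resource pool list */
	struct list_head rpool_head;	/* head to keep track of all resource
					 * pools that belong to this cgroup.
					 */
};

struct rdmacg_device;

/* ... rest of the rdmacg API ... */

#endif	/* CONFIG_CGROUP_RDMA */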

> +
> +struct rdmacg_device;
> +

Thanks


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-21 Thread Parav Pandit
On Sun, Feb 21, 2016 at 8:39 PM, Leon Romanovsky  wrote:
> On Sun, Feb 21, 2016 at 07:41:08PM +0530, Parav Pandit wrote:
>> CONFIG_CGROUP_RDMA
>>
>> On Sun, Feb 21, 2016 at 7:15 PM, Leon Romanovsky  wrote:
>> > On Sun, Feb 21, 2016 at 05:03:05PM +0530, Parav Pandit wrote:
>> >> On Sun, Feb 21, 2016 at 1:13 PM, Leon Romanovsky  wrote:
>> >> > On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
>> >> > Can you place this ifdef before declaring struct rdma_cgroup?
>> >> Yes. I missed out this cleanup. Done locally now.
>> >
>> > Great. An additional thing which caught my attention was related to
>> > declaring and using the new cgroup functions. There are a number of
>> > places where you protect the calls with specific ifdefs in the
>> > IB/core .c files rather than in the .h files, as is usually done.
>> >
>> ib_device_register_rdmacg and ib_device_unregister_rdmacg are the only
>> two functions called from IB/core, as they are tied to its functionality.
>> They can also be implemented as no-op calls when CONFIG_CGROUP_RDMA is
>> undefined (similar to ib_rdmacg_try_charge and the others).
>> I didn't do that because the register and unregister calls occur in a
>> single file, and only twice, compared to the charge/uncharge functions.
>> Either way is fine with me; I can make the changes you described.
>> Let me know.
>
> Please do.
> IMHO, it is better to have one place which handles all the relevant ifdefs
> and functions. IB/core doesn't need to know about the cgroup implementation.
>
ok. Done. Thanks for the review. I will accumulate more comments from
Tejun and others before spinning v7.


Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller

2016-02-21 Thread Leon Romanovsky
On Sun, Feb 21, 2016 at 07:41:08PM +0530, Parav Pandit wrote:
> CONFIG_CGROUP_RDMA
> 
> On Sun, Feb 21, 2016 at 7:15 PM, Leon Romanovsky  wrote:
> > On Sun, Feb 21, 2016 at 05:03:05PM +0530, Parav Pandit wrote:
> >> On Sun, Feb 21, 2016 at 1:13 PM, Leon Romanovsky  wrote:
> >> > On Sat, Feb 20, 2016 at 04:30:04PM +0530, Parav Pandit wrote:
> >> > Can you place this ifdef before declaring struct rdma_cgroup?
> >> Yes. I missed out this cleanup. Done locally now.
> >
> > Great. An additional thing which caught my attention was related to
> > declaring and using the new cgroup functions. There are a number of
> > places where you protect the calls with specific ifdefs in the
> > IB/core .c files rather than in the .h files, as is usually done.
> >
> ib_device_register_rdmacg and ib_device_unregister_rdmacg are the only
> two functions called from IB/core, as they are tied to its functionality.
> They can also be implemented as no-op calls when CONFIG_CGROUP_RDMA is
> undefined (similar to ib_rdmacg_try_charge and the others).
> I didn't do that because the register and unregister calls occur in a
> single file, and only twice, compared to the charge/uncharge functions.
> Either way is fine with me; I can make the changes you described.
> Let me know.

Please do.
IMHO, it is better to have one place which handles all the relevant ifdefs
and functions. IB/core doesn't need to know about the cgroup implementation.
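
For example, the IB/core private header could carry no-op stubs so the .c
files need no ifdefs at all (a sketch; I'm assuming the register/unregister
helpers take a struct ib_device *, per patch 2/3):

#ifdef CONFIG_CGROUP_RDMA
int ib_device_register_rdmacg(struct ib_device *device);
void ib_device_unregister_rdmacg(struct ib_device *device);
#else
static inline int ib_device_register_rdmacg(struct ib_device *device)
{
	return 0;
}

static inline void ib_device_unregister_rdmacg(struct ib_device *device)
{
}
#endif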

Thanks