Re: [PATCH 4/7] devcg: Added rdma resource tracker object per task

2015-09-07 Thread Haggai Eran
On 07/09/2015 23:38, Parav Pandit wrote:
> @@ -2676,7 +2686,7 @@ static inline int thread_group_empty(struct task_struct *p)
>   * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
>   * subscriptions and synchronises with wait4().  Also used in procfs.  Also
>   * pins the final release of task.io_context.  Also protects ->cpuset and
> - * ->cgroup.subsys[]. And ->vfork_done.
> + * ->cgroup.subsys[]. Also projtects ->vfork_done and ->rdma_res_counter.
s/projtects/protects/
>   *
>   * Nests both inside and outside of read_lock(&tasklist_lock).
>   * It must not be nested with write_lock_irq(&tasklist_lock),

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] devcg: Added infrastructure for rdma device cgroup.

2015-09-07 Thread Haggai Eran
On 07/09/2015 23:38, Parav Pandit wrote:
> diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
> index 8b64221..cdbdd60 100644
> --- a/include/linux/device_cgroup.h
> +++ b/include/linux/device_cgroup.h
> @@ -1,6 +1,57 @@
> +#ifndef _DEVICE_CGROUP
> +#define _DEVICE_CGROUP
> +
>  #include 
> +#include 
> +#include 

You cannot add this include line before adding the device_rdma_cgroup.h
(added in patch 5). You should reorder the patches so that after each
patch the kernel builds correctly.

I also noticed in patch 2 you add device_rdma_cgroup.o to the Makefile
before it was added to the kernel.

Regards,
Haggai


[PATCH for-next 5/5] iw_cxgb4: set the default MPA version to 2

2015-09-07 Thread Hariprasad Shenai
This enables ORD/IRD negotiation, and it's about time to enable it by
default.

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 0e2741b..79d6855 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -115,11 +115,11 @@ module_param(ep_timeout_secs, int, 0644);
 MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout "
   "in seconds (default=60)");
 
-static int mpa_rev = 1;
+static int mpa_rev = 2;
 module_param(mpa_rev, int, 0644);
 MODULE_PARM_DESC(mpa_rev, "MPA Revision, 0 supports amso1100, "
"1 is RFC0544 spec compliant, 2 is IETF MPA Peer Connect Draft"
-   " compliant (default=1)");
+   " compliant (default=2)");
 
 static int markers_enabled;
 module_param(markers_enabled, int, 0644);
-- 
2.3.4
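As a rough illustration of what the new default means, the helpers below restate the `mpa_rev` values documented in `MODULE_PARM_DESC`, together with the commit message's point that revision 2 is what enables ORD/IRD negotiation. `mpa_rev_describe()` and `mpa_rev_negotiates_ord_ird()` are hypothetical names used only for this sketch; they are not driver functions. Because the parameter is registered with mode 0644, its current value should also be readable at /sys/module/iw_cxgb4/parameters/mpa_rev.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper restating the mpa_rev values from MODULE_PARM_DESC. */
static const char *mpa_rev_describe(int rev)
{
	switch (rev) {
	case 0:
		return "amso1100 compatible";
	case 1:
		return "RFC 5044 (MPA) compliant";
	case 2:
		return "IETF MPA Peer Connect draft compliant";
	default:
		return "unknown";
	}
}

/* Hypothetical: per the commit message, revision 2 enables ORD/IRD negotiation. */
static bool mpa_rev_negotiates_ord_ird(int rev)
{
	return rev == 2;
}
```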



[PATCH for-next 4/5] iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall

2015-09-07 Thread Hariprasad Shenai
The ESTABLISHED event should carry the peer's ord/ird, so swap the
values in the event before the upcall.

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 06d208c..0e2741b 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1242,8 +1242,8 @@ static void established_upcall(struct c4iw_ep *ep)
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
memset(&event, 0, sizeof(event));
event.event = IW_CM_EVENT_ESTABLISHED;
-   event.ird = ep->ird;
-   event.ord = ep->ord;
+   event.ird = ep->ord;
+   event.ord = ep->ird;
if (ep->com.cm_id) {
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
ep->com.cm_id->event_handler(ep->com.cm_id, &event);
-- 
2.3.4
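The swap can look backwards at first. A minimal sketch of the reasoning, assuming simplified stand-in types (`struct ep_limits`, `struct cm_event` and `fill_established_event()` are hypothetical, not the driver's definitions): the number of reads the peer must accept (its ird) equals the number of reads this endpoint may issue (its ord), and vice versa.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the endpoint and the CM event. */
struct ep_limits {
	unsigned int ird;	/* inbound RDMA reads this side accepts */
	unsigned int ord;	/* outbound RDMA reads this side issues */
};

struct cm_event {
	unsigned int ird;
	unsigned int ord;
};

/*
 * Fill an ESTABLISHED event from the peer's point of view:
 * the peer's ird mirrors our ord, and the peer's ord mirrors our ird.
 */
static void fill_established_event(const struct ep_limits *ep,
				   struct cm_event *event)
{
	assert(ep != NULL && event != NULL);
	memset(event, 0, sizeof(*event));
	event->ird = ep->ord;
	event->ord = ep->ird;
}
```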



[PATCH for-next 1/5] iw_cxgb4: detect fatal errors while creating listening filters

2015-09-07 Thread Hariprasad Shenai
In c4iw_create_listen(), if we're using listen filters, bail out of
the busy loop if the device hits a fatal error.

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 3ad8dc7..42087c9 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3202,6 +3202,10 @@ static int create_server4(struct c4iw_dev *dev, struct c4iw_listen_ep *ep)
sin->sin_addr.s_addr, sin->sin_port, 0,
ep->com.dev->rdev.lldi.rxq_ids[0], 0, 0);
if (err == -EBUSY) {
+   if (c4iw_fatal_error(&ep->com.dev->rdev)) {
+   err = -EIO;
+   break;
+   }
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(usecs_to_jiffies(100));
}
-- 
2.3.4
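A user-space sketch of the loop shape after this change, assuming hypothetical callbacks in place of the filter-creation call and `c4iw_fatal_error()`: the loop retries on `-EBUSY`, sleeps about 100 microseconds between attempts (the driver uses `schedule_timeout()` for this), and converts a fatal device state into `-EIO` instead of spinning forever. `create_with_retry()` and the `fake_dev` test double are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callbacks standing in for the filter call and c4iw_fatal_error(). */
typedef int (*create_fn)(void *ctx);
typedef bool (*fatal_fn)(void *ctx);

static int create_with_retry(create_fn create, fatal_fn is_fatal, void *ctx)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 }; /* ~100 us */
	int err;

	do {
		err = create(ctx);
		if (err == -EBUSY) {
			/* A dead device may report -EBUSY forever; stop retrying. */
			if (is_fatal(ctx)) {
				err = -EIO;
				break;
			}
			nanosleep(&delay, NULL);
		}
	} while (err == -EBUSY);
	return err;
}

/* Hypothetical test double: busy for busy_calls attempts; if dies is set,
 * it reports a fatal error once two attempts have been made. */
struct fake_dev {
	int busy_calls;
	int calls;
	bool dies;
};

static int fake_create(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->calls++ < d->busy_calls ? -EBUSY : 0;
}

static bool fake_fatal(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->dies && d->calls >= 2;
}
```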



[PATCH for-next 3/5] iw_cxgb4: fix misuse of ep->ord for minimum ird calculation

2015-09-07 Thread Hariprasad Shenai
When calculating the minimum ird in c4iw_accept_cr(), we need to always
have a value of at least 1 if the RTR message is a 0B read.  The code was
incorrectly using ep->ord for this logic, which incorrectly adjusted the
ird and caused incorrect ord/ird negotiation when using MPAv2 to
negotiate these values.

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index a26d293..06d208c 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2878,7 +2878,7 @@ int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
} else {
if (peer2peer &&
(ep->mpa_attr.p2p_type != FW_RI_INIT_P2PTYPE_DISABLED) &&
-   (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) && ep->ord == 0)
+   (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) && ep->ird == 0)
ep->ird = 1;
}
 
-- 
2.3.4
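The corrected condition can be read as a small pure function. In this sketch the enum values stand in for the `FW_RI_INIT_P2PTYPE_*` constants and `min_ird_for_rtr()` is a hypothetical name; the point is that the check must look at `ird`, the value being adjusted, rather than `ord`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FW_RI_INIT_P2PTYPE_* values. */
enum p2p_type { P2P_DISABLED, P2P_RDMA_WRITE, P2P_READ_REQ };

/*
 * If the peer-to-peer RTR message is a zero-byte read, this side must accept
 * at least one inbound read, so bump ird to 1 when it is 0.
 */
static unsigned int min_ird_for_rtr(bool peer2peer, enum p2p_type negotiated,
				    enum p2p_type rtr, unsigned int ird)
{
	if (peer2peer && negotiated != P2P_DISABLED &&
	    rtr == P2P_READ_REQ && ird == 0)
		ird = 1;
	return ird;
}
```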



[PATCH for-next 0/5] set MPA revision to 2 and misc. fixes for iw_cxgb4

2015-09-07 Thread Hariprasad Shenai
Hi,

This patch series does the following: detects fatal errors while creating
listening servers, passes the ird/ord info in connect reply events, fixes
the misuse of ord in the minimum ird calculation, and swaps the ord/ird
values in the ESTABLISHED event before the upcall so that the event carries
the peer's ord/ird. Finally, it sets the default MPA version to 2.

This patch series has been created against Doug's linux tree and includes
patches for the iw_cxgb4 driver.

We have included the maintainers of the respective drivers. Kindly review
the changes and let us know if you have any review comments.

Thanks


Hariprasad Shenai (5):
  iw_cxgb4: detect fatal errors while creating listening filters
  iw_cxgb4: pass the ord/ird in connect reply events
  iw_cxgb4: fix misuse of ep->ord for minimum ird calculation
  iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall
  iw_cxgb4: set the default MPA version to 2

 drivers/infiniband/hw/cxgb4/cm.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

-- 
2.3.4



[PATCH for-next 2/5] iw_cxgb4: pass the ord/ird in connect reply events

2015-09-07 Thread Hariprasad Shenai
This allows client ULPs to get the negotiated ord/ird, which is useful
to avoid stalling the SQ by exceeding the ORD.

Signed-off-by: Hariprasad Shenai 
---
 drivers/infiniband/hw/cxgb4/cm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 42087c9..a26d293 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1169,6 +1169,8 @@ static void connect_reply_upcall(struct c4iw_ep *ep, int status)
if ((status == 0) || (status == -ECONNREFUSED)) {
if (!ep->tried_with_mpa_v1) {
/* this means MPA_v2 is used */
+   event.ord = ep->ird;
+   event.ird = ep->ord;
event.private_data_len = ep->plen -
sizeof(struct mpa_v2_conn_params);
event.private_data = ep->mpa_pkt +
@@ -1176,6 +1178,8 @@ static void connect_reply_upcall(struct c4iw_ep *ep, int status)
sizeof(struct mpa_v2_conn_params);
} else {
/* this means MPA_v1 is used */
+   event.ord = cur_max_read_depth(ep->com.dev);
+   event.ird = cur_max_read_depth(ep->com.dev);
event.private_data_len = ep->plen;
event.private_data = ep->mpa_pkt +
sizeof(struct mpa_message);
-- 
2.3.4



Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource

2015-09-07 Thread Parav Pandit
Hi Doug, Tejun,

This series is based on the cgroup for-4.3 branch.
The linux-rdma trunk will hit a compilation error because it is behind
Tejun's for-4.3 branch: the patches depend on some of the cgroup
subsystem functionality for fork().
Therefore those changes need to be merged into the linux-rdma trunk first.

Parav


On Tue, Sep 8, 2015 at 2:08 AM, Parav Pandit  wrote:
> Currently user space applications can easily take away all the rdma
> device specific resources such as AH, CQ, QP, MR etc. Due to which other
> applications in other cgroup or kernel space ULPs may not even get chance
> to allocate any rdma resources.
>
> This patch-set allows limiting rdma resources to set of processes.
> It extend device cgroup controller for limiting rdma device limits.
>
> With this patch, user verbs module queries rdma device cgroup controller
> to query process's limit to consume such resource. It uncharge resource
> counter after resource is being freed.
>
> It extends the task structure to hold the statistic information about process's
> rdma resource usage so that when process migrates from one to other controller,
> right amount of resources can be migrated from one to other cgroup.
>
> Future patches will support RDMA flows resource and will be enhanced further
> to enforce limit of other resources and capabilities.
>
> Parav Pandit (7):
>   devcg: Added user option to rdma resource tracking.
>   devcg: Added rdma resource tracking module.
>   devcg: Added infrastructure for rdma device cgroup.
>   devcg: Added rdma resource tracker object per task
>   devcg: device cgroup's extension for RDMA resource.
>   devcg: Added support to use RDMA device cgroup.
>   devcg: Added Documentation of RDMA device cgroup.
>
>  Documentation/cgroups/devices.txt |  32 ++-
>  drivers/infiniband/core/uverbs_cmd.c  | 139 +--
>  drivers/infiniband/core/uverbs_main.c |  39 +++-
>  include/linux/device_cgroup.h |  53 +
>  include/linux/device_rdma_cgroup.h|  83 +++
>  include/linux/sched.h |  12 +-
>  init/Kconfig  |  12 +
>  security/Makefile |   1 +
>  security/device_cgroup.c  | 119 +++---
>  security/device_rdma_cgroup.c | 422 ++
>  10 files changed, 850 insertions(+), 62 deletions(-)
>  create mode 100644 include/linux/device_rdma_cgroup.h
>  create mode 100644 security/device_rdma_cgroup.c
>
> --
> 1.8.3.1
>


[PATCH 3/7] devcg: Added infrastructure for rdma device cgroup.

2015-09-07 Thread Parav Pandit
1. Moved necessary functions and data structures to the header file so
that they can be reused by both the device cgroup whitelist functionality
and the rdma functionality.
2. Added infrastructure to invoke RDMA specific routines for resource
configuration, query and fork handling.
3. Added cgroup interface files for configuring the max limit of each rdma
resource and one file for querying the controller's current resource usage.

Signed-off-by: Parav Pandit 
---
 include/linux/device_cgroup.h |  53 +++
 security/device_cgroup.c  | 119 +-
 2 files changed, 136 insertions(+), 36 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 8b64221..cdbdd60 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,6 +1,57 @@
+#ifndef _DEVICE_CGROUP
+#define _DEVICE_CGROUP
+
 #include 
+#include 
+#include 
 
 #ifdef CONFIG_CGROUP_DEVICE
+
+enum devcg_behavior {
+   DEVCG_DEFAULT_NONE,
+   DEVCG_DEFAULT_ALLOW,
+   DEVCG_DEFAULT_DENY,
+};
+
+/*
+ * exception list locking rules:
+ * hold devcgroup_mutex for update/read.
+ * hold rcu_read_lock() for read.
+ */
+
+struct dev_exception_item {
+   u32 major, minor;
+   short type;
+   short access;
+   struct list_head list;
+   struct rcu_head rcu;
+};
+
+struct dev_cgroup {
+   struct cgroup_subsys_state css;
+   struct list_head exceptions;
+   enum devcg_behavior behavior;
+
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+   struct devcgroup_rdma rdma;
+#endif
+};
+
+static inline struct dev_cgroup *css_to_devcgroup(struct cgroup_subsys_state *s)
+{
+   return s ? container_of(s, struct dev_cgroup, css) : NULL;
+}
+
+static inline struct dev_cgroup *parent_devcgroup(struct dev_cgroup *dev_cg)
+{
+   return css_to_devcgroup(dev_cg->css.parent);
+}
+
+static inline struct dev_cgroup *task_devcgroup(struct task_struct *task)
+{
+   return css_to_devcgroup(task_css(task, devices_cgrp_id));
+}
+
 extern int __devcgroup_inode_permission(struct inode *inode, int mask);
 extern int devcgroup_inode_mknod(int mode, dev_t dev);
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
@@ -17,3 +68,5 @@ static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 static inline int devcgroup_inode_mknod(int mode, dev_t dev)
 { return 0; }
 #endif
+
+#endif
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 188c1d2..a0b3239 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -25,42 +25,6 @@
 
 static DEFINE_MUTEX(devcgroup_mutex);
 
-enum devcg_behavior {
-   DEVCG_DEFAULT_NONE,
-   DEVCG_DEFAULT_ALLOW,
-   DEVCG_DEFAULT_DENY,
-};
-
-/*
- * exception list locking rules:
- * hold devcgroup_mutex for update/read.
- * hold rcu_read_lock() for read.
- */
-
-struct dev_exception_item {
-   u32 major, minor;
-   short type;
-   short access;
-   struct list_head list;
-   struct rcu_head rcu;
-};
-
-struct dev_cgroup {
-   struct cgroup_subsys_state css;
-   struct list_head exceptions;
-   enum devcg_behavior behavior;
-};
-
-static inline struct dev_cgroup *css_to_devcgroup(struct cgroup_subsys_state *s)
-{
-   return s ? container_of(s, struct dev_cgroup, css) : NULL;
-}
-
-static inline struct dev_cgroup *task_devcgroup(struct task_struct *task)
-{
-   return css_to_devcgroup(task_css(task, devices_cgrp_id));
-}
-
 /*
  * called under devcgroup_mutex
  */
@@ -223,6 +187,9 @@ devcgroup_css_alloc(struct cgroup_subsys_state *parent_css)
INIT_LIST_HEAD(&dev_cgroup->exceptions);
dev_cgroup->behavior = DEVCG_DEFAULT_NONE;
 
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+   init_devcgroup_rdma_tracker(dev_cgroup);
+#endif
return &dev_cgroup->css;
 }
 
@@ -234,6 +201,25 @@ static void devcgroup_css_free(struct cgroup_subsys_state *css)
kfree(dev_cgroup);
 }
 
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+static int devcgroup_can_attach(struct cgroup_subsys_state *dst_css,
+   struct cgroup_taskset *tset)
+{
+   return devcgroup_rdma_can_attach(dst_css, tset);
+}
+
+static void devcgroup_cancel_attach(struct cgroup_subsys_state *dst_css,
+   struct cgroup_taskset *tset)
+{
+   devcgroup_rdma_cancel_attach(dst_css, tset);
+}
+
+static void devcgroup_fork(struct task_struct *task, void *priv)
+{
+   devcgroup_rdma_fork(task, priv);
+}
+#endif
+
 #define DEVCG_ALLOW 1
 #define DEVCG_DENY 2
 #define DEVCG_LIST 3
@@ -788,6 +774,62 @@ static struct cftype dev_cgroup_files[] = {
.seq_show = devcgroup_seq_show,
.private = DEVCG_LIST,
},
+
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+   {
+   .name = "rdma.resource.uctx.max",
+   .write = devcgroup_rdma_set_max_resource,
+   .seq_show = devcgroup_rdma_get_max_resource,
+   .private = DEVCG_RDMA_RES_TYPE_UCTX,
+  
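The helpers this patch moves into device_cgroup.h rely on `container_of()`: `struct dev_cgroup` embeds a `struct cgroup_subsys_state`, and a pointer to the embedded member is converted back to the enclosing structure. The following user-space sketch reproduces that pattern with a minimal `container_of()` built on `offsetof()` (the kernel's version also type-checks its argument) and simplified stand-in structures; it is not the kernel definition.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space container_of(); the kernel macro also type-checks ptr. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures. */
struct cgroup_subsys_state {
	struct cgroup_subsys_state *parent;
};

struct dev_cgroup {
	int behavior;
	struct cgroup_subsys_state css;	/* embedded; need not be the first member */
};

static struct dev_cgroup *css_to_devcgroup(struct cgroup_subsys_state *s)
{
	/* NULL-safe, like the patch's helper: the root has no parent css. */
	return s ? container_of(s, struct dev_cgroup, css) : NULL;
}

static struct dev_cgroup *parent_devcgroup(struct dev_cgroup *dev_cg)
{
	return css_to_devcgroup(dev_cg->css.parent);
}
```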

[PATCH 7/7] devcg: Added Documentation of RDMA device cgroup.

2015-09-07 Thread Parav Pandit
Modified the device cgroup documentation to reflect its dual purpose
without creating a new cgroup subsystem for rdma.

Added documentation to describe the functionality and usage of the device
cgroup extension for RDMA.

Signed-off-by: Parav Pandit 
---
 Documentation/cgroups/devices.txt | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt
index 3c1095c..eca5b70 100644
--- a/Documentation/cgroups/devices.txt
+++ b/Documentation/cgroups/devices.txt
@@ -1,9 +1,12 @@
-Device Whitelist Controller
+Device Controller
 
 1. Description:
 
-Implement a cgroup to track and enforce open and mknod restrictions
-on device files.  A device cgroup associates a device access
+Device controller implements a cgroup for two purposes.
+
+1.1 Device white list controller
+It implements a cgroup to track and enforce open and mknod
+restrictions on device files.  A device cgroup associates a device access
 whitelist with each cgroup.  A whitelist entry has 4 fields.
 'type' is a (all), c (char), or b (block).  'all' means it applies
 to all types and all major and minor numbers.  Major and minor are
@@ -15,8 +18,15 @@ cgroup gets a copy of the parent.  Administrators can then remove
 devices from the whitelist or add new entries.  A child cgroup can
 never receive a device access which is denied by its parent.
 
+1.2 RDMA device resource controller
+It implements a cgroup to limit various RDMA device resources for
+a cgroup. Such resources include RDMA PD, CQ, AH, MR, SRQ, QP, FLOW.
+It limits RDMA resource access for the tasks of the cgroup across multiple
+RDMA devices.
+
 2. User Interface
 
+2.1 Device white list controller
 An entry is added using devices.allow, and removed using
 devices.deny.  For instance
 
@@ -33,6 +43,22 @@ will remove the default 'a *:* rwm' entry. Doing
 
 will add the 'a *:* rwm' entry to the whitelist.
 
+2.2 RDMA device controller
+
+RDMA resources are limited using devices.rdma.resource.max..
+Doing
+   echo 200 > /sys/fs/cgroup/1/rdma.resource.max_qp
+will limit the maximum number of QPs across all the processes of the cgroup to 200.
+
+More examples:
+   echo 200 > /sys/fs/cgroup/1/rdma.resource.max_flow
+   echo 10  > /sys/fs/cgroup/1/rdma.resource.max_pd
+   echo 15  > /sys/fs/cgroup/1/rdma.resource.max_srq
+   echo 1   > /sys/fs/cgroup/1/rdma.resource.max_uctx
+
+Current RDMA resource usage can be queried using devices.rdma.resource.usage:
+   cat /sys/fs/cgroup/1/devices.rdma.resource.usage
+
 3. Security
 
 Any task can move itself between cgroups.  This clearly won't
-- 
1.8.3.1
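The examples write a number, and judging by `DEVCG_RDMA_MAX_RESOURCE_STR` ("max") and `DEVCG_RDMA_MAX_RESOURCES` (`S32_MAX`) in device_rdma_cgroup.h, a limit file may also accept the literal string "max". The sketch below shows one plausible way such a value could be parsed; `parse_rdma_limit()` is hypothetical and is not the patch's `devcgroup_rdma_set_max_resource()`, and rejecting negative numbers is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RDMA_MAX_RESOURCE_STR "max"	/* mirrors DEVCG_RDMA_MAX_RESOURCE_STR */
#define RDMA_MAX_RESOURCES INT32_MAX	/* mirrors DEVCG_RDMA_MAX_RESOURCES (S32_MAX) */

/* Hypothetical parser for a value written with e.g. "echo 200 > ...". */
static int parse_rdma_limit(const char *buf, int *limit)
{
	size_t len = strcspn(buf, "\n");	/* ignore echo's trailing newline */
	char *end;
	long val;

	if (len == strlen(RDMA_MAX_RESOURCE_STR) &&
	    strncmp(buf, RDMA_MAX_RESOURCE_STR, len) == 0) {
		*limit = RDMA_MAX_RESOURCES;	/* "max" means unlimited */
		return 0;
	}

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE || val < 0 || val > RDMA_MAX_RESOURCES)
		return -EINVAL;
	if (*end != '\0' && *end != '\n')	/* reject trailing garbage */
		return -EINVAL;
	*limit = (int)val;
	return 0;
}
```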



[PATCH 6/7] devcg: Added support to use RDMA device cgroup.

2015-09-07 Thread Parav Pandit
The RDMA uverbs module now queries the associated device cgroup rdma
controller before allocating device resources and uncharges them while
freeing rdma device resources.
Since the fput() sequence can free the resources from workqueue context
(instead of the context of the task that allocated the resource),
it passes the associated ucontext pointer during uncharge, so that the
rdma cgroup controller can correctly uncharge the resource from the right
task and the right cgroup.

Signed-off-by: Parav Pandit 
---
 drivers/infiniband/core/uverbs_cmd.c  | 139 +-
 drivers/infiniband/core/uverbs_main.c |  39 +-
 2 files changed, 156 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index bbb02ff..c080374 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -281,6 +282,19 @@ static void put_xrcd_read(struct ib_uobject *uobj)
put_uobj_read(uobj);
 }
 
+static void init_ucontext_lists(struct ib_ucontext *ucontext)
+{
+   INIT_LIST_HEAD(&ucontext->pd_list);
+   INIT_LIST_HEAD(&ucontext->mr_list);
+   INIT_LIST_HEAD(&ucontext->mw_list);
+   INIT_LIST_HEAD(&ucontext->cq_list);
+   INIT_LIST_HEAD(&ucontext->qp_list);
+   INIT_LIST_HEAD(&ucontext->srq_list);
+   INIT_LIST_HEAD(&ucontext->ah_list);
+   INIT_LIST_HEAD(&ucontext->xrcd_list);
+   INIT_LIST_HEAD(&ucontext->rule_list);
+}
+
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
  const char __user *buf,
  int in_len, int out_len)
@@ -313,22 +327,18 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
   (unsigned long) cmd.response + sizeof resp,
   in_len - sizeof cmd, out_len - sizeof resp);
 
+   ret = devcgroup_rdma_try_charge_resource(DEVCG_RDMA_RES_TYPE_UCTX, 1);
+   if (ret)
+   goto err;
+
ucontext = ibdev->alloc_ucontext(ibdev, &udata);
if (IS_ERR(ucontext)) {
ret = PTR_ERR(ucontext);
-   goto err;
+   goto err_alloc;
}
 
ucontext->device = ibdev;
-   INIT_LIST_HEAD(&ucontext->pd_list);
-   INIT_LIST_HEAD(&ucontext->mr_list);
-   INIT_LIST_HEAD(&ucontext->mw_list);
-   INIT_LIST_HEAD(&ucontext->cq_list);
-   INIT_LIST_HEAD(&ucontext->qp_list);
-   INIT_LIST_HEAD(&ucontext->srq_list);
-   INIT_LIST_HEAD(&ucontext->ah_list);
-   INIT_LIST_HEAD(&ucontext->xrcd_list);
-   INIT_LIST_HEAD(&ucontext->rule_list);
+   init_ucontext_lists(ucontext);
rcu_read_lock();
ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
rcu_read_unlock();
@@ -395,6 +405,8 @@ err_free:
put_pid(ucontext->tgid);
ibdev->dealloc_ucontext(ucontext);
 
+err_alloc:
+   devcgroup_rdma_uncharge_resource(NULL, DEVCG_RDMA_RES_TYPE_UCTX, 1);
 err:
mutex_unlock(&file->mutex);
return ret;
@@ -412,15 +424,23 @@ static void copy_query_dev_fields(struct ib_uverbs_file *file,
resp->vendor_id = attr->vendor_id;
resp->vendor_part_id= attr->vendor_part_id;
resp->hw_ver= attr->hw_ver;
-   resp->max_qp= attr->max_qp;
+   resp->max_qp= min_t(int, attr->max_qp,
+   devcgroup_rdma_query_resource_limit(
+   DEVCG_RDMA_RES_TYPE_QP));
resp->max_qp_wr = attr->max_qp_wr;
resp->device_cap_flags  = attr->device_cap_flags;
resp->max_sge   = attr->max_sge;
resp->max_sge_rd= attr->max_sge_rd;
-   resp->max_cq= attr->max_cq;
+   resp->max_cq= min_t(int, attr->max_cq,
+   devcgroup_rdma_query_resource_limit(
+   DEVCG_RDMA_RES_TYPE_CQ));
resp->max_cqe   = attr->max_cqe;
-   resp->max_mr= attr->max_mr;
-   resp->max_pd= attr->max_pd;
+   resp->max_mr= min_t(int, attr->max_mr,
+   devcgroup_rdma_query_resource_limit(
+   DEVCG_RDMA_RES_TYPE_MR));
+   resp->max_pd= min_t(int, attr->max_pd,
+   devcgroup_rdma_query_resource_limit(
+   DEVCG_RDMA_RES_TYPE_PD));
resp->max_qp_rd_atom= attr->max_qp_rd_atom;
resp->max_ee_rd_atom= attr->max_ee_rd_atom;
resp->max_res_rd_atom   = attr->max_res_rd_atom;
@@ -429,16 +449,22 @@ static void copy_query_dev_fields(struct ib_uverbs_file *file,
resp->atomic_cap= attr->atomic_cap;
resp->max_ee= attr->max_ee;
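The uverbs changes above follow two patterns: charge the cgroup before asking the driver to allocate, undoing the charge on the error path (the new `err_alloc` label), and clamp the limits reported by query_device with `min_t()`. A single-threaded user-space sketch of both, where the counter type, `try_charge()`, `get_context()`, `clamp_to_limit()` and the `-EAGAIN` error code are all hypothetical stand-ins for the devcgroup_rdma_* calls:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical single-level counter standing in for the cgroup controller. */
struct rdma_counter {
	int usage;
	int limit;
};

static int try_charge(struct rdma_counter *c, int num)
{
	if (c->usage + num > c->limit)
		return -EAGAIN;		/* assumed error code */
	c->usage += num;
	return 0;
}

static void uncharge(struct rdma_counter *c, int num)
{
	c->usage -= num;
}

/* Charge first, then allocate; undo the charge if the allocation fails. */
static void *get_context(struct rdma_counter *c, int simulate_alloc_failure)
{
	void *ctx;

	if (try_charge(c, 1))
		goto err;
	ctx = simulate_alloc_failure ? NULL : malloc(64);
	if (!ctx)
		goto err_alloc;
	return ctx;

err_alloc:
	uncharge(c, 1);
err:
	return NULL;
}

/* Report the smaller of the device capability and the cgroup limit. */
static int clamp_to_limit(int device_max, int cgroup_limit)
{
	return device_max < cgroup_limit ? device_max : cgroup_limit;
}
```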
   

[PATCH 4/7] devcg: Added rdma resource tracker object per task

2015-09-07 Thread Parav Pandit
Added a per-task RDMA device resource tracking object.
Updated comments to document that the device cgroup uses the task lock
for rdma.

Signed-off-by: Parav Pandit 
---
 include/linux/sched.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ae21f15..a5f79b6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1334,6 +1334,8 @@ union rcu_special {
 };
 struct rcu_node;
 
+struct task_rdma_res_counter;
+
 enum perf_event_task_context {
perf_invalid_context = -1,
perf_hw_context = 0,
@@ -1637,6 +1639,14 @@ struct task_struct {
struct css_set __rcu *cgroups;
/* cg_list protected by css_set_lock and tsk->alloc_lock */
struct list_head cg_list;
+
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+   /* RDMA resource accounting counters, allocated only
+* when RDMA resources are created by a task.
+*/
+   struct task_rdma_res_counter *rdma_res_counter;
+#endif
+
 #endif
 #ifdef CONFIG_FUTEX
struct robust_list_head __user *robust_list;
@@ -2676,7 +2686,7 @@ static inline int thread_group_empty(struct task_struct *p)
  * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
  * subscriptions and synchronises with wait4().  Also used in procfs.  Also
  * pins the final release of task.io_context.  Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. Also projtects ->vfork_done and ->rdma_res_counter.
  *
  * Nests both inside and outside of read_lock(&tasklist_lock).
  * It must not be nested with write_lock_irq(&tasklist_lock),
-- 
1.8.3.1
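The comment in this patch says `rdma_res_counter` is allocated only when a task first creates RDMA resources and is protected by the task lock. A minimal user-space sketch of that lazy-allocation idea, with a pthread mutex standing in for the task lock; `struct task`, `task_charge()` and `task_usage()` are hypothetical. The kernel's task lock is a spinlock, so real code could not sleep in an allocation while holding it, which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define RES_TYPE_MAX 9	/* mirrors DEVCG_RDMA_RES_TYPE_MAX in patch 5/7 */

struct task_rdma_res_counter {
	unsigned int usage[RES_TYPE_MAX];
};

/* Simplified task: `lock` plays the role of task_lock()/alloc_lock. */
struct task {
	pthread_mutex_t lock;
	struct task_rdma_res_counter *rdma_res_counter;	/* NULL until first charge */
};

/* Hypothetical: allocate the per-task counters on first charge, under the lock. */
static int task_charge(struct task *t, int type, unsigned int num)
{
	int ret = 0;

	assert(type >= 0 && type < RES_TYPE_MAX);
	pthread_mutex_lock(&t->lock);
	if (!t->rdma_res_counter)
		t->rdma_res_counter = calloc(1, sizeof(*t->rdma_res_counter));
	if (t->rdma_res_counter)
		t->rdma_res_counter->usage[type] += num;
	else
		ret = -ENOMEM;
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Tasks that never charged anything report zero usage without allocating. */
static unsigned int task_usage(struct task *t, int type)
{
	unsigned int v;

	assert(type >= 0 && type < RES_TYPE_MAX);
	pthread_mutex_lock(&t->lock);
	v = t->rdma_res_counter ? t->rdma_res_counter->usage[type] : 0;
	pthread_mutex_unlock(&t->lock);
	return v;
}
```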



[PATCH 5/7] devcg: device cgroup's extension for RDMA resource.

2015-09-07 Thread Parav Pandit
Extension of the device cgroup for RDMA device resources.
This implements an RDMA resource tracker to limit RDMA resources such as
AH, CQ, PD, QP, MR, SRQ, etc. for the processes of the cgroup.
It implements an RDMA resource limit module to limit the RDMA
resources consumed by the processes of the cgroup.
RDMA resources are tracked on a per-task basis.
RDMA resources across multiple such devices are limited among multiple
processes of the owning device cgroup.

The RDMA device cgroup extension returns an error when user space
applications try to allocate more resources than the configured limit.

Signed-off-by: Parav Pandit 
---
 include/linux/device_rdma_cgroup.h |  83 
 security/device_rdma_cgroup.c  | 422 +
 2 files changed, 505 insertions(+)
 create mode 100644 include/linux/device_rdma_cgroup.h
 create mode 100644 security/device_rdma_cgroup.c

diff --git a/include/linux/device_rdma_cgroup.h b/include/linux/device_rdma_cgroup.h
new file mode 100644
index 000..a2c261b
--- /dev/null
+++ b/include/linux/device_rdma_cgroup.h
@@ -0,0 +1,83 @@
+#ifndef _DEVICE_RDMA_CGROUP_H
+#define _DEVICE_RDMA_CGROUP_H
+
+#include 
+
+/* RDMA resources from device cgroup perspective */
+enum devcgroup_rdma_rt {
+   DEVCG_RDMA_RES_TYPE_UCTX,
+   DEVCG_RDMA_RES_TYPE_CQ,
+   DEVCG_RDMA_RES_TYPE_PD,
+   DEVCG_RDMA_RES_TYPE_AH,
+   DEVCG_RDMA_RES_TYPE_MR,
+   DEVCG_RDMA_RES_TYPE_MW,
+   DEVCG_RDMA_RES_TYPE_SRQ,
+   DEVCG_RDMA_RES_TYPE_QP,
+   DEVCG_RDMA_RES_TYPE_FLOW,
+   DEVCG_RDMA_RES_TYPE_MAX,
+};
+
+struct ib_ucontext;
+
+#define DEVCG_RDMA_MAX_RESOURCES S32_MAX
+
+#ifdef CONFIG_CGROUP_RDMA_RESOURCE
+
+#define DEVCG_RDMA_MAX_RESOURCE_STR "max"
+
+enum devcgroup_rdma_access_files {
+   DEVCG_RDMA_LIST_USAGE,
+};
+
+struct task_rdma_res_counter {
+   /* allows atomic increment of task and cgroup counters
+*  to avoid race with migration task.
+*/
+   spinlock_t lock;
+   u32 usage[DEVCG_RDMA_RES_TYPE_MAX];
+};
+
+struct devcgroup_rdma_tracker {
+   int limit;
+   atomic_t usage;
+   int failcnt;
+};
+
+struct devcgroup_rdma {
+   struct devcgroup_rdma_tracker tracker[DEVCG_RDMA_RES_TYPE_MAX];
+};
+
+struct dev_cgroup;
+
+void init_devcgroup_rdma_tracker(struct dev_cgroup *dev_cg);
+ssize_t devcgroup_rdma_set_max_resource(struct kernfs_open_file *of,
+   char *buf,
+   size_t nbytes, loff_t off);
+int devcgroup_rdma_get_max_resource(struct seq_file *m, void *v);
+int devcgroup_rdma_show_usage(struct seq_file *m, void *v);
+
+int devcgroup_rdma_try_charge_resource(enum devcgroup_rdma_rt type, int num);
+void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
+ enum devcgroup_rdma_rt type, int num);
+void devcgroup_rdma_fork(struct task_struct *task, void *priv);
+
+int devcgroup_rdma_can_attach(struct cgroup_subsys_state *css,
+ struct cgroup_taskset *tset);
+void devcgroup_rdma_cancel_attach(struct cgroup_subsys_state *css,
+ struct cgroup_taskset *tset);
+int devcgroup_rdma_query_resource_limit(enum devcgroup_rdma_rt type);
+#else
+
+static inline int devcgroup_rdma_try_charge_resource(
+   enum devcgroup_rdma_rt type, int num)
+{ return 0; }
+static inline void devcgroup_rdma_uncharge_resource(
+   struct ib_ucontext *ucontext,
+   enum devcgroup_rdma_rt type, int num)
+{ }
+static inline int devcgroup_rdma_query_resource_limit(
+   enum devcgroup_rdma_rt type)
+{ return DEVCG_RDMA_MAX_RESOURCES; }
+#endif
+
+#endif
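The file comment in device_rdma_cgroup.c describes the limits as hierarchical. One common way to enforce a per-cgroup limit hierarchically is to charge every ancestor and roll back at the first level that would exceed its limit. The sketch uses C11 atomics and a simplified tracker modeled on `struct devcgroup_rdma_tracker` (`limit`, `usage`, `failcnt`); it illustrates that technique under these assumptions, is not the patch's implementation, and `-EAGAIN` is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified per-cgroup tracker, modeled on struct devcgroup_rdma_tracker. */
struct tracker {
	struct tracker *parent;
	int limit;
	atomic_int usage;
	atomic_int failcnt;
};

/*
 * Hypothetical hierarchical try-charge: charge each level from the task's
 * cgroup up to the root; if a level would exceed its limit, count the failure
 * there and roll back every level already charged.
 */
static int tracker_try_charge(struct tracker *t, int num)
{
	struct tracker *p, *q;

	for (p = t; p; p = p->parent) {
		int new_usage = atomic_fetch_add(&p->usage, num) + num;

		if (new_usage > p->limit) {
			atomic_fetch_sub(&p->usage, num);
			atomic_fetch_add(&p->failcnt, 1);
			for (q = t; q != p; q = q->parent)
				atomic_fetch_sub(&q->usage, num);
			return -EAGAIN;	/* assumed error code */
		}
	}
	return 0;
}

static void tracker_uncharge(struct tracker *t, int num)
{
	for (; t; t = t->parent)
		atomic_fetch_sub(&t->usage, num);
}
```

In this scheme the tightest limit along the path wins. Concurrent chargers may briefly observe usage above a limit and fail spuriously, but a failed charge always restores every counter it touched.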
diff --git a/security/device_rdma_cgroup.c b/security/device_rdma_cgroup.c
new file mode 100644
index 000..fb4cc59
--- /dev/null
+++ b/security/device_rdma_cgroup.c
@@ -0,0 +1,422 @@
+/*
+ * RDMA device cgroup controller of device controller cgroup.
+ *
+ * Provides a cgroup hierarchy to limit various RDMA resource allocation to a
+ * configured limit of the cgroup.
+ *
+ * It's easy for user space applications to exhaust RDMA device specific
+ * hardware resources. Such resource exhaustion should be prevented so that
+ * other user space applications and kernel consumers get a chance to allocate
+ * and effectively use the hardware resources.
+ *
+ * In order to use the device rdma controller, set the maximum resource count
+ * per cgroup, which ensures that the total rdma resources of the processes
+ * belonging to a cgroup don't exceed the configured limit.
+ *
+ * RDMA resource limits are hierarchical, so the highest configured limit of
+ * the hierarchy is enforced. Allowing resource limit configuration to default
+ * cgroup allows fair share to kernel space ULPs as well.
+ *
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License.  See the file COPYING in the

[PATCH 2/7] devcg: Added rdma resource tracking module.

2015-09-07 Thread Parav Pandit
Added RDMA resource tracking object of device cgroup.

Signed-off-by: Parav Pandit 
---
 security/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/Makefile b/security/Makefile
index c9bfbc8..c9ad56d 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_SECURITY_TOMOYO) += tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)+= yama/
 obj-$(CONFIG_CGROUP_DEVICE)+= device_cgroup.o
+obj-$(CONFIG_CGROUP_RDMA_RESOURCE) += device_rdma_cgroup.o
 
 # Object integrity file lists
 subdir-$(CONFIG_INTEGRITY) += integrity
-- 
1.8.3.1



[PATCH 1/7] devcg: Added user option to rdma resource tracking.

2015-09-07 Thread Parav Pandit
Added a user configuration option to enable/disable the RDMA resource
tracking feature of the device cgroup as a submodule.

Signed-off-by: Parav Pandit 
---
 init/Kconfig | 12 
 1 file changed, 12 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 2184b34..089db85 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -977,6 +977,18 @@ config CGROUP_DEVICE
  Provides a cgroup implementing whitelists for devices which
  a process in the cgroup can mknod or open.
 
+config CGROUP_RDMA_RESOURCE
+   bool "RDMA Resource Controller for cgroups"
+   depends on CGROUP_DEVICE
+   default n
+   help
+ This option enables limiting rdma resources for a device cgroup.
+ Using this option, user space processes can be restricted to a
+ limited number of RDMA resources such as MR, PD, QP, AH, FLOW, CQ,
+ etc.
+
+ Say N if unsure.
+
 config CPUSETS
bool "Cpuset support"
help
-- 
1.8.3.1
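The header added in patch 5/7 pairs this option with `static inline` stubs, so callers such as uverbs compile unchanged when `CONFIG_CGROUP_RDMA_RESOURCE` is off: charging always succeeds and the queried limit is `DEVCG_RDMA_MAX_RESOURCES`. A self-contained sketch of that pattern with hypothetical names (`rdma_try_charge()`, `rdma_query_limit()`) and `INT32_MAX` standing in for `S32_MAX`:

```c
#include <assert.h>
#include <stdint.h>

enum rdma_res_type { RES_UCTX, RES_QP };	/* hypothetical subset of resource types */

#ifdef CONFIG_CGROUP_RDMA_RESOURCE
/* Real implementations would live in the controller's .c file. */
int rdma_try_charge(enum rdma_res_type type, int num);
int rdma_query_limit(enum rdma_res_type type);
#else
/* Option disabled: calls compile to no-ops so callers need no #ifdefs. */
static inline int rdma_try_charge(enum rdma_res_type type, int num)
{
	(void)type;
	(void)num;
	return 0;		/* charging always succeeds */
}

static inline int rdma_query_limit(enum rdma_res_type type)
{
	(void)type;
	return INT32_MAX;	/* "unlimited", like DEVCG_RDMA_MAX_RESOURCES */
}
#endif
```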



[PATCH 0/7] devcg: device cgroup extension for rdma resource

2015-09-07 Thread Parav Pandit
Currently user space applications can easily take away all the rdma
device specific resources such as AH, CQ, QP, MR etc. As a result, other
applications in other cgroups or kernel space ULPs may not even get a chance
to allocate any rdma resources.

This patch-set allows limiting rdma resources to a set of processes.
It extends the device cgroup controller to limit rdma device resources.

With this patch, the user verbs module queries the rdma device cgroup
controller for the process's limit before consuming such a resource. It
uncharges the resource counter after the resource is freed.

It extends the task structure to hold statistics about the process's
rdma resource usage so that when a process migrates from one cgroup to
another, the right amount of resources can be migrated between the cgroups.

Future patches will support the RDMA flow resource and will be enhanced
further to enforce limits on other resources and capabilities.

Parav Pandit (7):
  devcg: Added user option to rdma resource tracking.
  devcg: Added rdma resource tracking module.
  devcg: Added infrastructure for rdma device cgroup.
  devcg: Added rdma resource tracker object per task
  devcg: device cgroup's extension for RDMA resource.
  devcg: Added support to use RDMA device cgroup.
  devcg: Added Documentation of RDMA device cgroup.

 Documentation/cgroups/devices.txt |  32 ++-
 drivers/infiniband/core/uverbs_cmd.c  | 139 +--
 drivers/infiniband/core/uverbs_main.c |  39 +++-
 include/linux/device_cgroup.h |  53 +
 include/linux/device_rdma_cgroup.h|  83 +++
 include/linux/sched.h |  12 +-
 init/Kconfig  |  12 +
 security/Makefile |   1 +
 security/device_cgroup.c  | 119 +++---
 security/device_rdma_cgroup.c | 422 ++
 10 files changed, 850 insertions(+), 62 deletions(-)
 create mode 100644 include/linux/device_rdma_cgroup.h
 create mode 100644 security/device_rdma_cgroup.c

-- 
1.8.3.1
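The cover letter says per-task usage lets the right amount of resources move with a task when it changes cgroups. A single-threaded sketch of one way to do that, checking the destination first (in the spirit of a can_attach step) so that a rejected move changes nothing; the types, `migrate_task_usage()` and `-EAGAIN` are hypothetical and do not describe the patch's actual `devcgroup_rdma_can_attach()`.

```c
#include <assert.h>
#include <errno.h>

#define RES_TYPE_MAX 9	/* mirrors DEVCG_RDMA_RES_TYPE_MAX */

/* Hypothetical, single-threaded stand-ins. */
struct cgroup_usage {
	int usage[RES_TYPE_MAX];
	int limit[RES_TYPE_MAX];
};

struct task_usage {
	int usage[RES_TYPE_MAX];
};

/*
 * Move a task's per-type usage from src to dst. Every type is checked
 * against dst's limits before anything is modified, so a failure is atomic.
 */
static int migrate_task_usage(const struct task_usage *t,
			      struct cgroup_usage *src, struct cgroup_usage *dst)
{
	int i;

	for (i = 0; i < RES_TYPE_MAX; i++)
		if (dst->usage[i] + t->usage[i] > dst->limit[i])
			return -EAGAIN;	/* assumed error code */

	for (i = 0; i < RES_TYPE_MAX; i++) {
		dst->usage[i] += t->usage[i];
		src->usage[i] -= t->usage[i];
	}
	return 0;
}
```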
