[PATCH v4 0/7] Inode security label invalidation

2015-10-28 Thread Andreas Gruenbacher
Here is another version of the patch queue to make gfs2 and similar file
systems work with SELinux.

In this version, dentry_security() helper has been renamed to
backing_inode_security() to make it more obvious that it revalidates the
backing inode of its dentry argument.  The file_path_has_perm and
file_has_perm functions no longer revalidate inode security labels; callers
that may sleep can call inode_security_revalidate() themselves instead.
The revalidation functions now make use of might_sleep() when appropriate
so that any remaining bugs should turn up soon.

With this version of the patch queue, the SELinux test suite passes:

  https://github.com/SELinuxProject/selinux-testsuite

Could you please review?

Thanks,
Andreas

Andreas Gruenbacher (7):
  selinux: Remove unused variable in selinux_inode_init_security
  security: Make inode argument of inode_getsecurity non-const
  security: Make inode argument of inode_getsecid non-const
  selinux: Add accessor functions for inode->i_security
  security: Add hook to invalidate inode security labels
  selinux: Revalidate invalid inode security labels
  gfs2: Invalide security labels of inodes when they go invalid

 fs/gfs2/glops.c   |   2 +
 include/linux/audit.h |   8 +-
 include/linux/lsm_hooks.h |  10 +-
 include/linux/security.h  |  13 ++-
 kernel/audit.c|   2 +-
 kernel/audit.h|   2 +-
 kernel/auditsc.c  |   6 +-
 security/security.c   |  12 ++-
 security/selinux/hooks.c  | 197 +++---
 security/selinux/include/objsec.h |   6 ++
 security/smack/smack_lsm.c|   4 +-
 11 files changed, 186 insertions(+), 76 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/7] security: Make inode argument of inode_getsecurity non-const

2015-10-28 Thread Andreas Gruenbacher
Make the inode argument of the inode_getsecurity hook non-const so that
we can use it to revalidate invalid security labels.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/lsm_hooks.h  | 2 +-
 include/linux/security.h   | 4 ++--
 security/security.c| 2 +-
 security/selinux/hooks.c   | 2 +-
 security/smack/smack_lsm.c | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index ec3a6ba..bdd0a3a 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1413,7 +1413,7 @@ union security_list_options {
int (*inode_removexattr)(struct dentry *dentry, const char *name);
int (*inode_need_killpriv)(struct dentry *dentry);
int (*inode_killpriv)(struct dentry *dentry);
-   int (*inode_getsecurity)(const struct inode *inode, const char *name,
+   int (*inode_getsecurity)(struct inode *inode, const char *name,
void **buffer, bool alloc);
int (*inode_setsecurity)(struct inode *inode, const char *name,
const void *value, size_t size,
diff --git a/include/linux/security.h b/include/linux/security.h
index 2f4c1f7..9ee61b2 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -270,7 +270,7 @@ int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct dentry *dentry, const char *name);
 int security_inode_need_killpriv(struct dentry *dentry);
 int security_inode_killpriv(struct dentry *dentry);
-int security_inode_getsecurity(const struct inode *inode, const char *name, 
void **buffer, bool alloc);
+int security_inode_getsecurity(struct inode *inode, const char *name, void 
**buffer, bool alloc);
 int security_inode_setsecurity(struct inode *inode, const char *name, const 
void *value, size_t size, int flags);
 int security_inode_listsecurity(struct inode *inode, char *buffer, size_t 
buffer_size);
 void security_inode_getsecid(const struct inode *inode, u32 *secid);
@@ -719,7 +719,7 @@ static inline int security_inode_killpriv(struct dentry 
*dentry)
return cap_inode_killpriv(dentry);
 }
 
-static inline int security_inode_getsecurity(const struct inode *inode, const 
char *name, void **buffer, bool alloc)
+static inline int security_inode_getsecurity(struct inode *inode, const char 
*name, void **buffer, bool alloc)
 {
return -EOPNOTSUPP;
 }
diff --git a/security/security.c b/security/security.c
index 46f405c..73514c9 100644
--- a/security/security.c
+++ b/security/security.c
@@ -697,7 +697,7 @@ int security_inode_killpriv(struct dentry *dentry)
return call_int_hook(inode_killpriv, 0, dentry);
 }
 
-int security_inode_getsecurity(const struct inode *inode, const char *name, 
void **buffer, bool alloc)
+int security_inode_getsecurity(struct inode *inode, const char *name, void 
**buffer, bool alloc)
 {
if (unlikely(IS_PRIVATE(inode)))
return -EOPNOTSUPP;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index fc8f626..adec2e2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3110,7 +3110,7 @@ static int selinux_inode_removexattr(struct dentry 
*dentry, const char *name)
  *
  * Permission check is handled by selinux_inode_getxattr hook.
  */
-static int selinux_inode_getsecurity(const struct inode *inode, const char 
*name, void **buffer, bool alloc)
+static int selinux_inode_getsecurity(struct inode *inode, const char *name, 
void **buffer, bool alloc)
 {
u32 size;
int error;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 996c889..07d0344 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1435,7 +1435,7 @@ static int smack_inode_removexattr(struct dentry *dentry, 
const char *name)
  *
  * Returns the size of the attribute or an error code
  */
-static int smack_inode_getsecurity(const struct inode *inode,
+static int smack_inode_getsecurity(struct inode *inode,
   const char *name, void **buffer,
   bool alloc)
 {
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 7/7] gfs2: Invalide security labels of inodes when they go invalid

2015-10-28 Thread Andreas Gruenbacher
When gfs2 releases the glock of an inode, it must invalidate all
information cached for that inode, including the page cache and acls.  Use
the new security_inode_invalidate_secctx hook to also invalidate security
labels in that case.  These items will be reread from disk when needed
after reacquiring the glock.

Signed-off-by: Andreas Gruenbacher 
Cc: Steven Whitehouse 
Cc: Bob Peterson 
Cc: cluster-de...@redhat.com
---
 fs/gfs2/glops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1f6c9c3..0833076 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "gfs2.h"
 #include "incore.h"
@@ -262,6 +263,7 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
if (ip) {
set_bit(GIF_INVALID, &ip->i_flags);
forget_all_cached_acls(&ip->i_inode);
+   security_inode_invalidate_secctx(&ip->i_inode);
gfs2_dir_hash_inval(ip);
}
}
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 5/7] security: Add hook to invalidate inode security labels

2015-10-28 Thread Andreas Gruenbacher
Add a hook to invalidate an inode's security label when the cached
information becomes invalid.

Add the new hook in selinux: set a flag when a security label becomes
invalid.

Signed-off-by: Andreas Gruenbacher 
Reviewed-by: James Morris 
---
 include/linux/lsm_hooks.h |  6 ++
 include/linux/security.h  |  5 +
 security/security.c   |  8 
 security/selinux/hooks.c  | 30 --
 security/selinux/include/objsec.h |  6 ++
 5 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 4c48227..71969de 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1261,6 +1261,10 @@
  * audit_rule_init.
  * @rule contains the allocated rule
  *
+ * @inode_invalidate_secctx:
+ * Notify the security module that it must revalidate the security context
+ * of an inode.
+ *
  * @inode_notifysecctx:
  * Notify the security module of what the security context of an inode
  * should be.  Initializes the incore security context managed by the
@@ -1516,6 +1520,7 @@ union security_list_options {
int (*secctx_to_secid)(const char *secdata, u32 seclen, u32 *secid);
void (*release_secctx)(char *secdata, u32 seclen);
 
+   void (*inode_invalidate_secctx)(struct inode *inode);
int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
@@ -1757,6 +1762,7 @@ struct security_hook_heads {
struct list_head secid_to_secctx;
struct list_head secctx_to_secid;
struct list_head release_secctx;
+   struct list_head inode_invalidate_secctx;
struct list_head inode_notifysecctx;
struct list_head inode_setsecctx;
struct list_head inode_getsecctx;
diff --git a/include/linux/security.h b/include/linux/security.h
index e79149a..4824a4c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -353,6 +353,7 @@ int security_secid_to_secctx(u32 secid, char **secdata, u32 
*seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
 
+void security_inode_invalidate_secctx(struct inode *inode);
 int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
@@ -1093,6 +1094,10 @@ static inline void security_release_secctx(char 
*secdata, u32 seclen)
 {
 }
 
+static inline void security_inode_invalidate_secctx(struct inode *inode)
+{
+}
+
 static inline int security_inode_notifysecctx(struct inode *inode, void *ctx, 
u32 ctxlen)
 {
return -EOPNOTSUPP;
diff --git a/security/security.c b/security/security.c
index c5beb7e..e8ffd92 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1161,6 +1161,12 @@ void security_release_secctx(char *secdata, u32 seclen)
 }
 EXPORT_SYMBOL(security_release_secctx);
 
+void security_inode_invalidate_secctx(struct inode *inode)
+{
+   call_void_hook(inode_invalidate_secctx, inode);
+}
+EXPORT_SYMBOL(security_inode_invalidate_secctx);
+
 int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
 {
return call_int_hook(inode_notifysecctx, 0, inode, ctx, ctxlen);
@@ -1763,6 +1769,8 @@ struct security_hook_heads security_hook_heads = {
LIST_HEAD_INIT(security_hook_heads.secctx_to_secid),
.release_secctx =
LIST_HEAD_INIT(security_hook_heads.release_secctx),
+   .inode_invalidate_secctx =
+   LIST_HEAD_INIT(security_hook_heads.inode_invalidate_secctx),
.inode_notifysecctx =
LIST_HEAD_INIT(security_hook_heads.inode_notifysecctx),
.inode_setsecctx =
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 48d1908..dcac6dc 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -820,7 +820,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
goto out;
 
root_isec->sid = rootcontext_sid;
-   root_isec->initialized = 1;
+   root_isec->initialized = LABEL_INITIALIZED;
}
 
if (defcontext_sid) {
@@ -1308,11 +1308,11 @@ static int inode_doinit_with_dentry(struct inode 
*inode, struct dentry *opt_dent
unsigned len = 0;
int rc = 0;
 
-   if (isec->initialized)
+   if (isec->initialized == LABEL_INITIALIZED)
goto out;
 
mutex_lock(&isec->lock);
-   if (isec->initialized)
+   if (isec->initialized == LABEL_INITIALIZED)
goto out_unlock;
 
sbsec = inode->i_sb->s_security;
@@ -1484,7 +1484,7 @@ static int inode_doinit_with_dentry(stru

[PATCH v4 4/7] selinux: Add accessor functions for inode->i_security

2015-10-28 Thread Andreas Gruenbacher
Add functions dentry_security and inode_security for accessing
inode->i_security.  These functions initially don't do much, but they
will later be used to revalidate the security labels when necessary.

Signed-off-by: Andreas Gruenbacher 
---
 security/selinux/hooks.c | 97 
 1 file changed, 56 insertions(+), 41 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a8f09af..48d1908 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -241,6 +241,24 @@ static int inode_alloc_security(struct inode *inode)
return 0;
 }
 
+/*
+ * Get the security label of an inode.
+ */
+static struct inode_security_struct *inode_security(struct inode *inode)
+{
+   return inode->i_security;
+}
+
+/*
+ * Get the security label of a dentry's backing inode.
+ */
+static struct inode_security_struct *backing_inode_security(struct dentry 
*dentry)
+{
+   struct inode *inode = d_backing_inode(dentry);
+
+   return inode->i_security;
+}
+
 static void inode_free_rcu(struct rcu_head *head)
 {
struct inode_security_struct *isec;
@@ -564,8 +582,8 @@ static int selinux_get_mnt_opts(const struct super_block 
*sb,
opts->mnt_opts_flags[i++] = DEFCONTEXT_MNT;
}
if (sbsec->flags & ROOTCONTEXT_MNT) {
-   struct inode *root = d_backing_inode(sbsec->sb->s_root);
-   struct inode_security_struct *isec = root->i_security;
+   struct dentry *root = sbsec->sb->s_root;
+   struct inode_security_struct *isec = 
backing_inode_security(root);
 
rc = security_sid_to_context(isec->sid, &context, &len);
if (rc)
@@ -620,8 +638,8 @@ static int selinux_set_mnt_opts(struct super_block *sb,
int rc = 0, i;
struct superblock_security_struct *sbsec = sb->s_security;
const char *name = sb->s_type->name;
-   struct inode *inode = d_backing_inode(sbsec->sb->s_root);
-   struct inode_security_struct *root_isec = inode->i_security;
+   struct dentry *root = sbsec->sb->s_root;
+   struct inode_security_struct *root_isec = backing_inode_security(root);
u32 fscontext_sid = 0, context_sid = 0, rootcontext_sid = 0;
u32 defcontext_sid = 0;
char **mount_options = opts->mnt_opts;
@@ -852,8 +870,8 @@ static int selinux_cmp_sb_context(const struct super_block 
*oldsb,
if ((oldflags & DEFCONTEXT_MNT) && old->def_sid != new->def_sid)
goto mismatch;
if (oldflags & ROOTCONTEXT_MNT) {
-   struct inode_security_struct *oldroot = 
d_backing_inode(oldsb->s_root)->i_security;
-   struct inode_security_struct *newroot = 
d_backing_inode(newsb->s_root)->i_security;
+   struct inode_security_struct *oldroot = 
backing_inode_security(oldsb->s_root);
+   struct inode_security_struct *newroot = 
backing_inode_security(newsb->s_root);
if (oldroot->sid != newroot->sid)
goto mismatch;
}
@@ -903,17 +921,14 @@ static int selinux_sb_clone_mnt_opts(const struct 
super_block *oldsb,
if (!set_fscontext)
newsbsec->sid = sid;
if (!set_rootcontext) {
-   struct inode *newinode = d_backing_inode(newsb->s_root);
-   struct inode_security_struct *newisec = 
newinode->i_security;
+   struct inode_security_struct *newisec = 
backing_inode_security(newsb->s_root);
newisec->sid = sid;
}
newsbsec->mntpoint_sid = sid;
}
if (set_rootcontext) {
-   const struct inode *oldinode = d_backing_inode(oldsb->s_root);
-   const struct inode_security_struct *oldisec = 
oldinode->i_security;
-   struct inode *newinode = d_backing_inode(newsb->s_root);
-   struct inode_security_struct *newisec = newinode->i_security;
+   const struct inode_security_struct *oldisec = 
backing_inode_security(oldsb->s_root);
+   struct inode_security_struct *newisec = 
backing_inode_security(newsb->s_root);
 
newisec->sid = oldisec->sid;
}
@@ -1712,13 +1727,13 @@ out:
 /*
  * Determine the label for an inode that might be unioned.
  */
-static int selinux_determine_inode_label(const struct inode *dir,
+static int selinux_determine_inode_label(struct inode *dir,
 const struct qstr *name,
 u16 tclass,
 u32 *_new_isid)
 {
const struct superblock_security_struct *sbsec = dir->i_sb->s_security;
-   const struct inode_security_struct *dsec = dir->i_security;
+   const struct inode_security_struct *dsec = inode_security(dir);
const struct task_security_struct *tsec = current_security();
 
if ((sbsec->fla

[PATCH v4 3/7] security: Make inode argument of inode_getsecid non-const

2015-10-28 Thread Andreas Gruenbacher
Make the inode argument of the inode_getsecid hook non-const so that we
can use it to revalidate invalid security labels.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/audit.h  | 8 
 include/linux/lsm_hooks.h  | 2 +-
 include/linux/security.h   | 4 ++--
 kernel/audit.c | 2 +-
 kernel/audit.h | 2 +-
 kernel/auditsc.c   | 6 +++---
 security/security.c| 2 +-
 security/selinux/hooks.c   | 2 +-
 security/smack/smack_lsm.c | 2 +-
 9 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b2abc99..7a9e0d7 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -137,7 +137,7 @@ extern void __audit_getname(struct filename *name);
 extern void __audit_inode(struct filename *name, const struct dentry *dentry,
unsigned int flags);
 extern void __audit_file(const struct file *);
-extern void __audit_inode_child(const struct inode *parent,
+extern void __audit_inode_child(struct inode *parent,
const struct dentry *dentry,
const unsigned char type);
 extern void __audit_seccomp(unsigned long syscall, long signr, int code);
@@ -202,7 +202,7 @@ static inline void audit_inode_parent_hidden(struct 
filename *name,
__audit_inode(name, dentry,
AUDIT_INODE_PARENT | AUDIT_INODE_HIDDEN);
 }
-static inline void audit_inode_child(const struct inode *parent,
+static inline void audit_inode_child(struct inode *parent,
 const struct dentry *dentry,
 const unsigned char type) {
if (unlikely(!audit_dummy_context()))
@@ -359,7 +359,7 @@ static inline void __audit_inode(struct filename *name,
const struct dentry *dentry,
unsigned int flags)
 { }
-static inline void __audit_inode_child(const struct inode *parent,
+static inline void __audit_inode_child(struct inode *parent,
const struct dentry *dentry,
const unsigned char type)
 { }
@@ -373,7 +373,7 @@ static inline void audit_file(struct file *file)
 static inline void audit_inode_parent_hidden(struct filename *name,
const struct dentry *dentry)
 { }
-static inline void audit_inode_child(const struct inode *parent,
+static inline void audit_inode_child(struct inode *parent,
 const struct dentry *dentry,
 const unsigned char type)
 { }
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index bdd0a3a..4c48227 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1420,7 +1420,7 @@ union security_list_options {
int flags);
int (*inode_listsecurity)(struct inode *inode, char *buffer,
size_t buffer_size);
-   void (*inode_getsecid)(const struct inode *inode, u32 *secid);
+   void (*inode_getsecid)(struct inode *inode, u32 *secid);
 
int (*file_permission)(struct file *file, int mask);
int (*file_alloc_security)(struct file *file);
diff --git a/include/linux/security.h b/include/linux/security.h
index 9ee61b2..e79149a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -273,7 +273,7 @@ int security_inode_killpriv(struct dentry *dentry);
 int security_inode_getsecurity(struct inode *inode, const char *name, void 
**buffer, bool alloc);
 int security_inode_setsecurity(struct inode *inode, const char *name, const 
void *value, size_t size, int flags);
 int security_inode_listsecurity(struct inode *inode, char *buffer, size_t 
buffer_size);
-void security_inode_getsecid(const struct inode *inode, u32 *secid);
+void security_inode_getsecid(struct inode *inode, u32 *secid);
 int security_file_permission(struct file *file, int mask);
 int security_file_alloc(struct file *file);
 void security_file_free(struct file *file);
@@ -734,7 +734,7 @@ static inline int security_inode_listsecurity(struct inode 
*inode, char *buffer,
return 0;
 }
 
-static inline void security_inode_getsecid(const struct inode *inode, u32 
*secid)
+static inline void security_inode_getsecid(struct inode *inode, u32 *secid)
 {
*secid = 0;
 }
diff --git a/kernel/audit.c b/kernel/audit.c
index 662c007..d20f674 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1708,7 +1708,7 @@ static inline int audit_copy_fcaps(struct audit_names 
*name,
 
 /* Copy inode data into an audit_names. */
 void audit_copy_inode(struct audit_names *name, const struct dentry *dentry,
- const struct inode *inode)
+ struct inode *inode)
 {
name->ino   = inode->i_ino;
name->dev   = inode->i_sb->s_dev;
diff --git a/

[PATCH v4 1/7] selinux: Remove unused variable in selinux_inode_init_security

2015-10-28 Thread Andreas Gruenbacher
Signed-off-by: Andreas Gruenbacher 
Acked-by: Stephen Smalley 
---
 security/selinux/hooks.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e4369d8..fc8f626 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2756,13 +2756,11 @@ static int selinux_inode_init_security(struct inode 
*inode, struct inode *dir,
   void **value, size_t *len)
 {
const struct task_security_struct *tsec = current_security();
-   struct inode_security_struct *dsec;
struct superblock_security_struct *sbsec;
u32 sid, newsid, clen;
int rc;
char *context;
 
-   dsec = dir->i_security;
sbsec = dir->i_sb->s_security;
 
sid = tsec->sid;
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 6/7] selinux: Revalidate invalid inode security labels

2015-10-28 Thread Andreas Gruenbacher
When fetching an inode's security label, check if it is still valid, and
try reloading it if it is not. Reloading will fail when we are in RCU
context which doesn't allow sleeping, or when we can't find a dentry for
the inode.  (Reloading happens via iop->getxattr which takes a dentry
parameter.)  When reloading fails, continue using the old, invalid
label.

Signed-off-by: Andreas Gruenbacher 
---
 security/selinux/hooks.c | 70 
 1 file changed, 65 insertions(+), 5 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index dcac6dc..47dc704 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -241,11 +241,63 @@ static int inode_alloc_security(struct inode *inode)
return 0;
 }
 
+static int inode_doinit_with_dentry(struct inode *inode, struct dentry 
*opt_dentry);
+
+/*
+ * Try reloading inode security labels that have been marked as invalid.  The
+ * @may_sleep parameter indicates when sleeping and thus reloading labels is
+ * allowed; when set to false, returns ERR_PTR(-ECHILD) when the label is
+ * invalid.  The @opt_dentry parameter should be set to a dentry of the inode;
+ * when no dentry is available, set it to NULL instead.
+ */
+static int __inode_security_revalidate(struct inode *inode,
+  struct dentry *opt_dentry,
+  bool may_sleep)
+{
+   struct inode_security_struct *isec = inode->i_security;
+
+   might_sleep_if(may_sleep);
+
+   if (isec->initialized == LABEL_INVALID) {
+   if (!may_sleep)
+   return -ECHILD;
+
+   /*
+* Try reloading the inode security label.  This will fail if
+* @opt_dentry is NULL and no dentry for this inode can be
+* found; in that case, continue using the old label.
+*/
+   inode_doinit_with_dentry(inode, opt_dentry);
+   }
+   return 0;
+}
+
+static void inode_security_revalidate(struct inode *inode)
+{
+   __inode_security_revalidate(inode, NULL, true);
+}
+
+static struct inode_security_struct *inode_security_novalidate(struct inode 
*inode)
+{
+   return inode->i_security;
+}
+
+static struct inode_security_struct *inode_security_rcu(struct inode *inode, 
bool rcu)
+{
+   int error;
+
+   error = __inode_security_revalidate(inode, NULL, !rcu);
+   if (error)
+   return ERR_PTR(error);
+   return inode->i_security;
+}
+
 /*
  * Get the security label of an inode.
  */
 static struct inode_security_struct *inode_security(struct inode *inode)
 {
+   __inode_security_revalidate(inode, NULL, true);
return inode->i_security;
 }
 
@@ -256,6 +308,7 @@ static struct inode_security_struct 
*backing_inode_security(struct dentry *dentr
 {
struct inode *inode = d_backing_inode(dentry);
 
+   __inode_security_revalidate(inode, dentry, true);
return inode->i_security;
 }
 
@@ -362,8 +415,6 @@ static const char *labeling_behaviors[7] = {
"uses native labeling",
 };
 
-static int inode_doinit_with_dentry(struct inode *inode, struct dentry 
*opt_dentry);
-
 static inline int inode_doinit(struct inode *inode)
 {
return inode_doinit_with_dentry(inode, NULL);
@@ -1655,6 +1706,7 @@ static inline int dentry_has_perm(const struct cred *cred,
 
ad.type = LSM_AUDIT_DATA_DENTRY;
ad.u.dentry = dentry;
+   __inode_security_revalidate(inode, dentry, true);
return inode_has_perm(cred, inode, av, &ad);
 }
 
@@ -1670,6 +1722,7 @@ static inline int path_has_perm(const struct cred *cred,
 
ad.type = LSM_AUDIT_DATA_PATH;
ad.u.path = *path;
+   __inode_security_revalidate(inode, path->dentry, true);
return inode_has_perm(cred, inode, av, &ad);
 }
 
@@ -2874,7 +2927,9 @@ static int selinux_inode_follow_link(struct dentry 
*dentry, struct inode *inode,
ad.type = LSM_AUDIT_DATA_DENTRY;
ad.u.dentry = dentry;
sid = cred_sid(cred);
-   isec = inode_security(inode);
+   isec = inode_security_rcu(inode, rcu);
+   if (IS_ERR(isec))
+   return PTR_ERR(isec);
 
return avc_has_perm_flags(sid, isec->sid, isec->sclass, FILE__READ, &ad,
  rcu ? MAY_NOT_BLOCK : 0);
@@ -2926,7 +2981,9 @@ static int selinux_inode_permission(struct inode *inode, 
int mask)
perms = file_mask_to_av(inode->i_mode, mask);
 
sid = cred_sid(cred);
-   isec = inode_security(inode);
+   isec = inode_security_rcu(inode, flags & MAY_NOT_BLOCK);
+   if (IS_ERR(isec))
+   return PTR_ERR(isec);
 
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
audited = avc_audit_required(perms, &avd, rc,
@@ -3234,6 +3291,7 @@ static int selinux_file_permission(struct file *file, int 
mask)
/* No change since file_open check. */
   

Re: [PATCH v3 3/7] selinux: Get rid of file_path_has_perm

2015-10-28 Thread Andreas Gruenbacher
On Wed, Oct 28, 2015 at 7:56 PM, Stephen Smalley  wrote:
> On 10/28/2015 01:31 PM, Stephen Smalley wrote:
>>
>> On 10/28/2015 07:48 AM, Andreas Gruenbacher wrote:
>>>
>>> On Tue, Oct 27, 2015 at 5:40 PM, Stephen Smalley 
>>> wrote:

 On 10/26/2015 05:15 PM, Andreas Gruenbacher wrote:
>
>
> Use path_has_perm directly instead.



 This reverts:

 commit 13f8e9810bff12d01807b6f92329111f45218235
 Author: David Howells 
 Date:   Thu Jun 13 23:37:55 2013 +0100

  SELinux: Institute file_path_has_perm()

  Create a file_path_has_perm() function that is like path_has_perm()
 but
  instead takes a file struct that is the source of both the path and
 the
  inode (rather than getting the inode from the dentry in the path).
 This
  is then used where appropriate.

  This will be useful for situations like unionmount where it will be
  possible to have an apparently-negative dentry (eg. a fallthrough)
 that
 is
  open with the file struct pointing to an inode on the lower fs.

  Signed-off-by: David Howells 
  Signed-off-by: Al Viro 

 which I think David was intending to use as part of his
 SELinux/overlayfs
 support.
>>>
>>>
>>> Okay. As long as overlayfs support in SELinux is in half-finished
>>> state, let's leave this alone.
>>
>>
>> Also, the caller is holding a spinlock (tty_files_lock), so you can't call
>> inode_doinit from
>> here.
>>
>> Try stress testing your patch series by just always setting
>> isec->initialized to LABEL_INVALID.
>> Previously the *has_perm functions could be called under essentially any
>> condition, with the exception
>> of when in a RCU walk and needing to audit the dname (but they did not
>> previously block/sleep).

Using might_sleep() is even better, then CONFIG_DEBUG_ATOMIC_SLEEP
will catch any remaining problems.

> file_has_perm() also gets called from match_file() callback to iterate_fd(),
> which holds files->file_lock.

Yes, thanks.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/7] Inode security label invalidation

2015-10-28 Thread Andreas Gruenbacher
On Wed, Oct 28, 2015 at 10:12 PM, Paul Moore  wrote:
> On Mon, Oct 26, 2015 at 5:15 PM, Andreas Gruenbacher
>  wrote:
>> Here is another version of the patch queue to make gfs2 and similar file
>> systems work with SELinux.  As suggested by Stephen Smalley [*], the relevant
>> uses of inode->security are wrapped in function calls that try to revalidate
>> invalid labels.
>>
>>   [*] http://marc.info/?l=linux-kernel&m=144416710207686&w=2
>>
>> The patches are looking good from my point of view; is there anything else 
>> that
>> needs addressing?
>
> Hi Andreas,
>
> I'm largely staying out of the way on this patchset as Stephen has
> been providing good review and feedback (I see he identified a few
> more things in this latest revision),

Yes, Stephen is being very helpful.

> however, before I accept this
> upstream I'd like to see an ACK from one of the GFS developers on the
> last patch which touches the code under fs/gfs2.

Sure, no worries there ...

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/7] Inode security label invalidation

2015-10-28 Thread Paul Moore
On Mon, Oct 26, 2015 at 5:15 PM, Andreas Gruenbacher
 wrote:
> Here is another version of the patch queue to make gfs2 and similar file
> systems work with SELinux.  As suggested by Stephen Smalley [*], the relevant
> uses of inode->security are wrapped in function calls that try to revalidate
> invalid labels.
>
>   [*] http://marc.info/?l=linux-kernel&m=144416710207686&w=2
>
> The patches are looking good from my point of view; is there anything else 
> that
> needs addressing?

Hi Andreas,

I'm largely staying out of the way on this patchset as Stephen has
been providing good review and feedback (I see he identified a few
more things in this latest revision), however, before I accept this
upstream I'd like to see an ACK from one of the GFS developers on the
last patch which touches the code under fs/gfs2.  Yes, it's a minor
change, and probably not strictly necessary, but I would like for us
to be good neighbors when possible.

Thanks.

-- 
paul moore
www.paul-moore.com
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] selinux: export validatetrans decisions

2015-10-28 Thread Stephen Smalley

On 10/27/2015 04:48 PM, Andrew Perepechko wrote:

Make validatetrans decisions available through selinuxfs.
"/validatetrans" is added to selinuxfs for this purpose.
This functionality is needed by file system servers
implemented in userspace or kernelspace without the VFS
layer.

Writing "$oldcontext $newcontext $tclass $taskcontext"
to /validatetrans is expected to return 0 if the transition
is allowed and -EPERM otherwise.

Signed-off-by: Andrew Perepechko 
CC: andrew.perepec...@seagate.com
---
  security/selinux/include/classmap.h |2 +-
  security/selinux/include/security.h |3 +
  security/selinux/selinuxfs.c|   79 +++
  security/selinux/ss/services.c  |   41 ++
  4 files changed, 114 insertions(+), 11 deletions(-)

diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 5a4eef5..ef83c4b 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -21,7 +21,7 @@ struct security_class_mapping secclass_map[] = {
  { "compute_av", "compute_create", "compute_member",
"check_context", "load_policy", "compute_relabel",
"compute_user", "setenforce", "setbool", "setsecparam",
-   "setcheckreqprot", "read_policy", NULL } },
+   "setcheckreqprot", "read_policy", "validate_trans", NULL } },
{ "process",
  { "fork", "transition", "sigchld", "sigkill",
"sigstop", "signull", "signal", "ptrace", "getsched", "setsched",
diff --git a/security/selinux/include/security.h 
b/security/selinux/include/security.h
index 223e9fd..38feb55 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -187,6 +187,9 @@ int security_node_sid(u16 domain, void *addr, u32 addrlen,
  int security_validate_transition(u32 oldsid, u32 newsid, u32 tasksid,
 u16 tclass);

+int security_validate_transition_user(u32 oldsid, u32 newsid, u32 tasksid,
+ u16 tclass);
+
  int security_bounded_transition(u32 oldsid, u32 newsid);

  int security_sid_mls_copy(u32 sid, u32 mls_sid, u32 *new_sid);
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index c02da25..e460b4e 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -116,6 +116,7 @@ enum sel_inos {
SEL_DENY_UNKNOWN, /* export unknown deny handling to userspace */
SEL_STATUS, /* export current status using mmap() */
SEL_POLICY, /* allow userspace to read the in kernel policy */
+   SEL_VALIDATE_TRANS, /* compute validatetrans decision */
SEL_INO_NEXT,   /* The next inode number to use */
  };

@@ -653,6 +654,83 @@ static const struct file_operations sel_checkreqprot_ops = 
{
.llseek = generic_file_llseek,
  };

+static ssize_t sel_write_validatetrans(struct file *file,
+   const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   char *oldcon = NULL, *newcon = NULL, *taskcon = NULL;
+   char *req = NULL;
+   u32 osid, nsid, tsid;
+   u16 tclass;
+   int rc;
+
+   rc = task_has_security(current, SECURITY__VALIDATE_TRANS);
+   if (rc)
+   goto out;
+
+   rc = -ENOMEM;
+   if (count >= PAGE_SIZE)
+   goto out;
+
+   /* No partial writes. */
+   rc = -EINVAL;
+   if (*ppos != 0)
+   goto out;
+
+   rc = -ENOMEM;
+   req = kzalloc(count + 1, GFP_KERNEL);
+   if (!req)
+   goto out;
+
+   rc = -EFAULT;
+   if (copy_from_user(req, buf, count))
+   goto out;
+
+   rc = -ENOMEM;
+   oldcon = kzalloc(count + 1, GFP_KERNEL);
+   if (!oldcon)
+   goto out;
+
+   newcon = kzalloc(count + 1, GFP_KERNEL);
+   if (!newcon)
+   goto out;
+
+   taskcon = kzalloc(count + 1, GFP_KERNEL);
+   if (!taskcon)
+   goto out;
+
+   rc = -EINVAL;
+   if (sscanf(req, "%s %s %hu %s", oldcon, newcon, &tclass, taskcon) != 4)
+   goto out;
+
+   rc = security_context_str_to_sid(oldcon, &osid, GFP_KERNEL);
+   if (rc)
+   goto out;
+
+   rc = security_context_str_to_sid(newcon, &nsid, GFP_KERNEL);
+   if (rc)
+   goto out;
+
+   rc = security_context_str_to_sid(taskcon, &tsid, GFP_KERNEL);
+   if (rc)
+   goto out;
+
+   rc = security_validate_transition_user(osid, nsid, tsid, tclass);
+   if (!rc)
+   rc = count;
+out:
+   kfree(req);
+   kfree(oldcon);
+   kfree(newcon);
+   kfree(taskcon);
+   return rc;
+}
+
+static const struct file_operations sel_validatetrans_ops = {
+   .write  = sel_write_validatetrans,
+   .llseek = generic_file_llseek,
+};
+
  /*
   * Remaining nodes use transaction based IO methods

Re: [PATCH v3 3/7] selinux: Get rid of file_path_has_perm

2015-10-28 Thread Stephen Smalley

On 10/28/2015 01:31 PM, Stephen Smalley wrote:

On 10/28/2015 07:48 AM, Andreas Gruenbacher wrote:

On Tue, Oct 27, 2015 at 5:40 PM, Stephen Smalley  wrote:

On 10/26/2015 05:15 PM, Andreas Gruenbacher wrote:


Use path_has_perm directly instead.



This reverts:

commit 13f8e9810bff12d01807b6f92329111f45218235
Author: David Howells 
Date:   Thu Jun 13 23:37:55 2013 +0100

 SELinux: Institute file_path_has_perm()

 Create a file_path_has_perm() function that is like path_has_perm() but
 instead takes a file struct that is the source of both the path and the
 inode (rather than getting the inode from the dentry in the path).  This
 is then used where appropriate.

 This will be useful for situations like unionmount where it will be
 possible to have an apparently-negative dentry (eg. a fallthrough) that
is
 open with the file struct pointing to an inode on the lower fs.

 Signed-off-by: David Howells 
 Signed-off-by: Al Viro 

which I think David was intending to use as part of his SELinux/overlayfs
support.


Okay. As long as overlayfs support in SELinux is in half-finished
state, let's leave this alone.


Also, the caller is holding a spinlock (tty_files_lock), so you can't call 
inode_doinit from
here.

Try stress testing your patch series by just always setting isec->initialized 
to LABEL_INVALID.
Previously the *has_perm functions could be called under essentially any 
condition, with the exception
of when in a RCU walk and needing to audit the dname (but they did not 
previously block/sleep).


file_has_perm() also gets called from match_file() callback to 
iterate_fd(), which holds files->file_lock.




--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/7] selinux: Get rid of file_path_has_perm

2015-10-28 Thread Stephen Smalley
On 10/28/2015 07:48 AM, Andreas Gruenbacher wrote:
> On Tue, Oct 27, 2015 at 5:40 PM, Stephen Smalley  wrote:
>> On 10/26/2015 05:15 PM, Andreas Gruenbacher wrote:
>>>
>>> Use path_has_perm directly instead.
>>
>>
>> This reverts:
>>
>> commit 13f8e9810bff12d01807b6f92329111f45218235
>> Author: David Howells 
>> Date:   Thu Jun 13 23:37:55 2013 +0100
>>
>> SELinux: Institute file_path_has_perm()
>>
>> Create a file_path_has_perm() function that is like path_has_perm() but
>> instead takes a file struct that is the source of both the path and the
>> inode (rather than getting the inode from the dentry in the path).  This
>> is then used where appropriate.
>>
>> This will be useful for situations like unionmount where it will be
>> possible to have an apparently-negative dentry (eg. a fallthrough) that
>> is
>> open with the file struct pointing to an inode on the lower fs.
>>
>> Signed-off-by: David Howells 
>> Signed-off-by: Al Viro 
>>
>> which I think David was intending to use as part of his SELinux/overlayfs
>> support.
> 
> Okay. As long as overlayfs support in SELinux is in half-finished
> state, let's leave this alone.

Also, the caller is holding a spinlock (tty_files_lock), so you can't call 
inode_doinit from
here.

Try stress testing your patch series by just always setting isec->initialized 
to LABEL_INVALID.
Previously the *has_perm functions could be called under essentially any 
condition, with the exception
of when in a RCU walk and needing to audit the dname (but they did not 
previously block/sleep).



--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/7] selinux: Add accessor functions for inode->i_security

2015-10-28 Thread Andreas Gruenbacher
On Tue, Oct 27, 2015 at 6:20 PM, Stephen Smalley  wrote:
> On 10/26/2015 05:15 PM, Andreas Gruenbacher wrote:
>> @@ -2217,7 +2231,7 @@ static int selinux_bprm_set_creds(struct
>> linux_binprm *bprm)
>>
>> old_tsec = current_security();
>> new_tsec = bprm->cred->security;
>> -   isec = inode->i_security;
>> +   isec = dentry_security(bprm->file->f_path.dentry);
>
> IIUC, this could change which inode label gets used when using overlayfs
> (the overlay inode or the underlying inode).  Not sure whether the current
> code is correct for overlayfs (overlayfs + SELinux support still in
> progress).

Okay, let's stick with inode_security, at least for now.

>> @@ -3154,7 +3168,7 @@ out_nofree:
>>   static int selinux_inode_setsecurity(struct inode *inode, const char
>> *name,
>>  const void *value, size_t size, int
>> flags)
>>   {
>> -   struct inode_security_struct *isec = inode->i_security;
>> +   struct inode_security_struct *isec = inode_security(inode);
>
> Was it intentional to not do this for selinux_inode_getsecurity() and
> selinux_inode_getsecid()?

These two hooks both pass in a const inode *, so that needs to be
changed first. Then, selinux_inode_getsecurity should obviously use
inode_security.

I'm not really sure about selinux_inode_getsecid though: can it be
call it from a non-sleeping context?

>> @@ -3241,8 +3254,8 @@ int ioctl_has_perm(const struct cred *cred, struct
>> file *file,
>>   {
>> struct common_audit_data ad;
>> struct file_security_struct *fsec = file->f_security;
>> -   struct inode *inode = file_inode(file);
>> -   struct inode_security_struct *isec = inode->i_security;
>> +   struct dentry *dentry = file->f_path.dentry;
>> +   struct inode_security_struct *isec = dentry_security(dentry);
>> struct lsm_ioctlop_audit ioctl;
>> u32 ssid = cred_sid(cred);
>> int rc;
>> @@ -3263,7 +3276,7 @@ int ioctl_has_perm(const struct cred *cred, struct
>> file *file,
>> goto out;
>> }
>>
>> -   if (unlikely(IS_PRIVATE(inode)))
>> +   if (unlikely(IS_PRIVATE(dentry->d_inode)))
>> return 0;
>>
>> rc = avc_has_extended_perms(ssid, isec->sid, isec->sclass,
>> @@ -3506,7 +3519,7 @@ static int selinux_file_open(struct file *file,
>> const struct cred *cred)
>> struct inode_security_struct *isec;
>>
>> fsec = file->f_security;
>> -   isec = file_inode(file)->i_security;
>> +   isec = dentry_security(file->f_path.dentry);
>
>
> Similarly for these cases, switching from file_inode(file) to
> d_backing_inode(dentry) could affect overlayfs interaction IIUC.

Okay, let's stick with inode_security as well for now.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/7] selinux: Get rid of file_path_has_perm

2015-10-28 Thread Andreas Gruenbacher
On Tue, Oct 27, 2015 at 5:40 PM, Stephen Smalley  wrote:
> On 10/26/2015 05:15 PM, Andreas Gruenbacher wrote:
>>
>> Use path_has_perm directly instead.
>
>
> This reverts:
>
> commit 13f8e9810bff12d01807b6f92329111f45218235
> Author: David Howells 
> Date:   Thu Jun 13 23:37:55 2013 +0100
>
> SELinux: Institute file_path_has_perm()
>
> Create a file_path_has_perm() function that is like path_has_perm() but
> instead takes a file struct that is the source of both the path and the
> inode (rather than getting the inode from the dentry in the path).  This
> is then used where appropriate.
>
> This will be useful for situations like unionmount where it will be
> possible to have an apparently-negative dentry (eg. a fallthrough) that
> is
> open with the file struct pointing to an inode on the lower fs.
>
> Signed-off-by: David Howells 
> Signed-off-by: Al Viro 
>
> which I think David was intending to use as part of his SELinux/overlayfs
> support.

Okay. As long as overlayfs support in SELinux is in half-finished
state, let's leave this alone.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC rdma cgroup

2015-10-28 Thread Parav Pandit
Hi All,

Based on the review comments, feedback, discussion from/with Tejun,
Haggai, Doug, Jason, Liran, Sean, ORNL team, I have updated the design
as below.

This is fairly strong and simple design, addresses most of the points
raised to cover current RDMA use cases.
Feel free to skip design guidelines section and jump to design section
below if you find it too verbose. I had to describe it to set the
context and address comments from our past discussion.

Design guidelines:
---
1. There will be new rdma cgroup for accounting rdma resources
(instead of extending device cgroup).
Rationale: RDMA tracks different type of resources and it functions
differently than device cgroup. Though device cgroup could have been
extended for more generic nature, community feels that its better to
create RDMA cgroup, which might have more features than just resource
limit enforcement in future.

2. RDMA cgroup will allow resource accounting, limit enforcement on
per cgroup, per rdma device basis (instead of resource limiting across
all devices).
Rationale: this give granular control when multiple devices exist in the system.

3. Resources are not defined by the RDMA cgroup. Resources are defined
by RDMA/IB subsystem and optionally by HCA vendor device drivers.
Rationale: This allows rdma cgroup to remain constant while RDMA/IB
subsystem can evolve without the need of rdma cgroup update. A new
resource can be easily added by the RDMA/IB subsystem without touching
rdma cgroup.

4. RDMA uverbs layer will enforce limits on well defined RDMA verb
resources without any HCA vendor device driver involvement.
Rationale:
(a) RDMA verbs are very well defined set of resource abstraction in
Linux kernel stack for many years now and in use by many applications
directly working with RDMA resources in varied manner. Instead of
replicating code in every vendor driver, RDMA uverbs layer will
enforce such resource limits (with help of rdma cgroup).
(b) IB verbs resource is also a vendor agnostic representation of RDMA
resource; therefore its done at RDMA uverbs level.

6. RDMA uverbs layer will not do accounting of hw vendor specific resources.
Rationale: RDMA uverbs layer is not aware of which hw resource maps to
which verb resource and by how much amount. Therefore hw resource
accounting, charging, uncharging has to happen by the vendor driver.
This is optional and left to the HCA vendor device driver to
implement. HCA driver best knows on how to keep the mapping, therefore
its left to HCA vendor driver to do the accounting.

7. RDMA cgroup will provide unified APIs through which both RDMA
subsystem and vendor defined RDMA resource can be charged, uncharged
by verb layer and HCA driver respectively.

8. RDMA cgroup initial version will support only hard limits without
any kind of reservation of resources or ranges. In future it might be
extended for more dynamic nature.
Rationale: Typically RDMA resources are stateful resource unlike cpu
and RDMA resources don't follow work conserving nature.

9. Resource limit enforcement is hierarchical.

10. Process migration from one to other cgroup with active RDMA
resource is highly discouraged.

11. When process is migrated with active RDMA resources, rdma cgroup
continues to charge original cgroup.
Rationale:
Unlike other POSIX calls, RDMA resources are not defined as POSIX
level. These resources sit behind a file descriptor.
Multiple processes forked, belonging to different thread group, can
possibly placed in different cgroup sharing same rdma resources.
It could be well done where its allocated by one thread group and
release by other thread group from different cgroup.
Resource usage hierarchy can bet easily get complex even though that
is not primary use case.
Typically all processes which want to use RDMA resources will be part
of one leaf cgroup throughout their life cycle.
Therefore its not worth to complicate design around process migration.

Design:
-
1. New RDMA cgroup defines resource pool object, that connects cgroup
subsystem to RDMA subsystem.
2. Resource pool object is per cgroup, per device entity that is
managed, controlled, configured by the administrator via cgroup
interface.
3. There can be at maximum of 64 resources per resource pool (such as
MR, QP, AH, PD etc and other hardware resources). To manage resources
beyond 64, it will require RDMA cgroup subsystem update. This will be
done in future if at all its needed.

4. RDMA cgroup defines two class of resources.
(a) verb resources - tracks RDMA verb layer resources
(b) hw resources - tracks HCA HW specific resources
5. verbs resource template is defined by RDMA uverbs layer.
6. hw resource template is defined by HCA vendor driver. This is
optional and should be done by those driver which doesn't have one to
one mapping with verb resource and hw resource.

7. Processes in a cgroup without any configured limit (or in other
words without resource pools) has max limits of the resources. If one
of the r

Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource

2015-10-28 Thread Parav Pandit
Hi,

I finally got some chance and progress on redesigning rdma cgroup
controller for the most use cases that we discussed in this email
chain.
I am posting RFC and soon code in new email.

Parav


On Sun, Sep 20, 2015 at 4:05 PM, Haggai Eran  wrote:
> On 15/09/2015 06:45, Jason Gunthorpe wrote:
>> No, I'm saying the resource pool is *well defined* and *fixed* by each
>> hardware.
>>
>> The only question is how do we expose the N resource limits, the list
>> of which is totally vendor specific.
>
> I don't see why you say the limits are vendor specific. It is true that
> different RDMA devices have different implementations and capabilities,
> but they all use the expose the same set of RDMA objects with their
> limitations. Whether those limitations come from hardware limitations,
> from the driver, or just because the address space is limited, they can
> still be exhausted.
>
>> Yes, using a % scheme fixes the ratios, 1% is going to be a certain
>> number of PD's, QP's, MRs, CQ's, etc at a ratio fixed by the driver
>> configuration. That is the trade off for API simplicity.
>>
>>
>> Yes, this results in some resources being over provisioned.
>
> I agree that such a scheme will be easy to configure, but I don't think
> it can work well in all situations. Imagine you want to let one
> container use almost all RC QPs as you want it to connect to the entire
> cluster through RC. Other containers can still use a single datagram QP
> to connect to the entire cluster, but they would require many address
> handles. If you force a fixed ratio of resources given to each container
> it would be hard to describe such a partitioning.
>
> I think it would be better to expose different controls for the
> different RDMA resources.
>
> Regards,
> Haggai
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html