Re:Re: [PATCH net-next] net: Remove useless function skb_header_release

2017-09-20 Thread Gao Feng

At 2017-09-21 05:30:46, "David Miller" <da...@davemloft.net> wrote:
>From: gfree.w...@vip.163.com
>Date: Tue, 19 Sep 2017 22:32:48 +0800
>
>> From: Gao Feng <gfree.w...@vip.163.com>
>> 
>> There is no one which would invokes the function skb_header_release.
>> So just remove it now.
>> 
>> Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>
>Networking patches must be at least CC:'d to net...@vger.kernel.org,
>thank you.

Thanks your reminder.
I just used the result of get_maintainer.pl, and didn't noticed this point you 
mentioned.

I would send it again.

Regards
Feng


Re:Re: [PATCH net-next] net: Remove useless function skb_header_release

2017-09-20 Thread Gao Feng

At 2017-09-21 05:30:46, "David Miller"  wrote:
>From: gfree.w...@vip.163.com
>Date: Tue, 19 Sep 2017 22:32:48 +0800
>
>> From: Gao Feng 
>> 
>> There is no one which would invokes the function skb_header_release.
>> So just remove it now.
>> 
>> Signed-off-by: Gao Feng 
>
>Networking patches must be at least CC:'d to net...@vger.kernel.org,
>thank you.

Thanks your reminder.
I just used the result of get_maintainer.pl, and didn't noticed this point you 
mentioned.

I would send it again.

Regards
Feng


Re: [PATCH] net: avoid uninitialized variable

2016-10-26 Thread Gao Feng
On Thu, Oct 27, 2016 at 11:56 AM, zhongjiang  wrote:
> From: zhong jiang 
>
> when I compiler the newest kernel, I hit the following error with
> Werror=may-uninitalized.
>
> net/core/flow_dissector.c: In function ?._skb_flow_dissect?
> include/uapi/linux/swab.h:100:46: error: ?.lan?.may be used uninitialized in 
> this function [-Werror=maybe-uninitialized]
> net/core/flow_dissector.c:248:26: note: ?.lan?.was declared here
>
> This adds an additional check for proto to explicitly tell the compiler
> that vlan pointer have the correct value before it is used.
>
> Signed-off-by: zhong jiang 
> ---
>  net/core/flow_dissector.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1a7b80f..a04d9cf 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -245,7 +245,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> }
> case htons(ETH_P_8021AD):
> case htons(ETH_P_8021Q): {
> -   const struct vlan_hdr *vlan;
> +   const struct vlan_hdr *vlan = NULL;
>
> if (skb_vlan_tag_present(skb))
> proto = skb->protocol;
> @@ -276,7 +276,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> key_vlan->vlan_id = skb_vlan_tag_get_id(skb);
> key_vlan->vlan_priority =
> (skb_vlan_tag_get_prio(skb) >> 
> VLAN_PRIO_SHIFT);
> -   } else {
> +   } else if (proto == cpu_to_be16(ETH_P_8021Q) ||
> +   proto == 
> cpu_to_be16(ETH_P_8021AD)) {
> key_vlan->vlan_id = ntohs(vlan->h_vlan_TCI) &
> VLAN_VID_MASK;
> key_vlan->vlan_priority =
> --
> 1.8.3.1
>

It seems there is one similar patch already.
You could refer to https://patchwork.kernel.org/patch/9389565/

Regards
Feng




Re: [PATCH] net: avoid uninitialized variable

2016-10-26 Thread Gao Feng
On Thu, Oct 27, 2016 at 11:56 AM, zhongjiang  wrote:
> From: zhong jiang 
>
> when I compiler the newest kernel, I hit the following error with
> Werror=may-uninitalized.
>
> net/core/flow_dissector.c: In function ?._skb_flow_dissect?
> include/uapi/linux/swab.h:100:46: error: ?.lan?.may be used uninitialized in 
> this function [-Werror=maybe-uninitialized]
> net/core/flow_dissector.c:248:26: note: ?.lan?.was declared here
>
> This adds an additional check for proto to explicitly tell the compiler
> that vlan pointer have the correct value before it is used.
>
> Signed-off-by: zhong jiang 
> ---
>  net/core/flow_dissector.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1a7b80f..a04d9cf 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -245,7 +245,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> }
> case htons(ETH_P_8021AD):
> case htons(ETH_P_8021Q): {
> -   const struct vlan_hdr *vlan;
> +   const struct vlan_hdr *vlan = NULL;
>
> if (skb_vlan_tag_present(skb))
> proto = skb->protocol;
> @@ -276,7 +276,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> key_vlan->vlan_id = skb_vlan_tag_get_id(skb);
> key_vlan->vlan_priority =
> (skb_vlan_tag_get_prio(skb) >> 
> VLAN_PRIO_SHIFT);
> -   } else {
> +   } else if (proto == cpu_to_be16(ETH_P_8021Q) ||
> +   proto == 
> cpu_to_be16(ETH_P_8021AD)) {
> key_vlan->vlan_id = ntohs(vlan->h_vlan_TCI) &
> VLAN_VID_MASK;
> key_vlan->vlan_priority =
> --
> 1.8.3.1
>

It seems there is one similar patch already.
You could refer to https://patchwork.kernel.org/patch/9389565/

Regards
Feng




Re: [PATCH net] rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly

2016-08-30 Thread Gao Feng
On Wed, Aug 31, 2016 at 12:14 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2016-08-31 at 10:56 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> The original codes depend on that the function parameters are evaluated from
>> left to right. But the parameter's evaluation order is not defined in C
>> standard actually.
>>
>> When flow_keys_have_l4() is invoked before ___skb_get_hash(skb, ,
>> hashrnd) with some compilers or environment, the keys passed to
>> flow_keys_have_l4 is not initialized.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>
> Good catch, please add
>
> Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from 
> ___skb_get_hash")
> Acked-by: Eric Dumazet <eduma...@google.com>
>
>

Add it into the description and resend the patch again?

Best Regards
Feng




Re: [PATCH net] rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly

2016-08-30 Thread Gao Feng
On Wed, Aug 31, 2016 at 12:14 PM, Eric Dumazet  wrote:
> On Wed, 2016-08-31 at 10:56 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng 
>>
>> The original codes depend on that the function parameters are evaluated from
>> left to right. But the parameter's evaluation order is not defined in C
>> standard actually.
>>
>> When flow_keys_have_l4() is invoked before ___skb_get_hash(skb, ,
>> hashrnd) with some compilers or environment, the keys passed to
>> flow_keys_have_l4 is not initialized.
>>
>> Signed-off-by: Gao Feng 
>> ---
>
> Good catch, please add
>
> Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from 
> ___skb_get_hash")
> Acked-by: Eric Dumazet 
>
>

Add it into the description and resend the patch again?

Best Regards
Feng




Re: [PATCH audit-next 2/2] Audit: make audit netlink socket net namespace unaware

2014-01-16 Thread Gao feng
On 01/17/2014 06:29 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> Add a compare function which always return true for
>> audit netlink socket, this will cause audit netlink
>> sockets netns unaware, and no matter which netns the
>> user space audit netlink sockets belong to, they all
>> can find out and communicate with audit_sock.
>>
>> This gets rid of the necessary to create per-netns
>> audit kernel side socket(audit_sock), it's pain to
>> depend on and get reference of netns for auditns.
>>
>> Signed-off-by: Gao feng 
> 
> So whereas before you could prevent a task from spamming
> audit by putting it into a private netns, now you have to
> do it using a user namespace (to prevent capable(CAP_AUDIT_WRITE))
> right?
> 

Yes, the commit 1a938bec0090dc49abdb471e978e0d8155186845
"listen in all network namespaces" in audit-next already
did this change. this patch is another way to allow task
to generate audit msg in un-init netns. This is one of
the purpose of auditns. And this capable check has already
done in audit_netlink_ok.

> I don't know that anyone is depending on that, in any case, but
> it's a change.
> 

I think this change should be transparent to the userspace tools.
Since I don't know why a task should depend on audit is unavailable.

Or I misunderstand your question?

> Is this building up to something?
> 

Just allow task in un-init netns to communicate with kernel.

Thanks!
Gao

>> ---
>>  kernel/audit.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/audit.c b/kernel/audit.c
>> index b62153a..2ac6212 100644
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -1064,12 +1064,18 @@ static void audit_receive(struct sk_buff  *skb)
>>  mutex_unlock(_cmd_mutex);
>>  }
>>  
>> +static bool audit_compare(struct net *net, struct sock *sk)
>> +{
>> +return true;
>> +}
>> +
>>  /* Initialize audit support at boot time. */
>>  static int __init audit_init(void)
>>  {
>>  int i;
>>  struct netlink_kernel_cfg cfg = {
>>  .input  = audit_receive,
>> +.compare = audit_compare,
>>  };
>>  
>>  if (audit_initialized == AUDIT_DISABLED)
>> -- 
>> 1.8.4.2
>>
>> ___
>> Containers mailing list
>> contain...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/containers
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH audit-next 2/2] Audit: make audit netlink socket net namespace unaware

2014-01-16 Thread Gao feng
On 01/17/2014 06:29 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Add a compare function which always return true for
 audit netlink socket, this will cause audit netlink
 sockets netns unaware, and no matter which netns the
 user space audit netlink sockets belong to, they all
 can find out and communicate with audit_sock.

 This gets rid of the necessary to create per-netns
 audit kernel side socket(audit_sock), it's pain to
 depend on and get reference of netns for auditns.

 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 
 So whereas before you could prevent a task from spamming
 audit by putting it into a private netns, now you have to
 do it using a user namespace (to prevent capable(CAP_AUDIT_WRITE))
 right?
 

Yes, the commit 1a938bec0090dc49abdb471e978e0d8155186845
listen in all network namespaces in audit-next already
did this change. this patch is another way to allow task
to generate audit msg in un-init netns. This is one of
the purpose of auditns. And this capable check has already
done in audit_netlink_ok.

 I don't know that anyone is depending on that, in any case, but
 it's a change.
 

I think this change should be transparent to the userspace tools.
Since I don't know why a task should depend on audit is unavailable.

Or I misunderstand your question?

 Is this building up to something?
 

Just allow task in un-init netns to communicate with kernel.

Thanks!
Gao

 ---
  kernel/audit.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/kernel/audit.c b/kernel/audit.c
 index b62153a..2ac6212 100644
 --- a/kernel/audit.c
 +++ b/kernel/audit.c
 @@ -1064,12 +1064,18 @@ static void audit_receive(struct sk_buff  *skb)
  mutex_unlock(audit_cmd_mutex);
  }
  
 +static bool audit_compare(struct net *net, struct sock *sk)
 +{
 +return true;
 +}
 +
  /* Initialize audit support at boot time. */
  static int __init audit_init(void)
  {
  int i;
  struct netlink_kernel_cfg cfg = {
  .input  = audit_receive,
 +.compare = audit_compare,
  };
  
  if (audit_initialized == AUDIT_DISABLED)
 -- 
 1.8.4.2

 ___
 Containers mailing list
 contain...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/containers
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH audit-next 2/2] Audit: make audit netlink socket net namespace unaware

2014-01-09 Thread Gao feng
Add a compare function which always return true for
audit netlink socket, this will cause audit netlink
sockets netns unaware, and no matter which netns the
user space audit netlink sockets belong to, they all
can find out and communicate with audit_sock.

This gets rid of the necessary to create per-netns
audit kernel side socket(audit_sock), it's pain to
depend on and get reference of netns for auditns.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index b62153a..2ac6212 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1064,12 +1064,18 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(_cmd_mutex);
 }
 
+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return true;
+}
+
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};
 
if (audit_initialized == AUDIT_DISABLED)
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH audit-next 1/2] audit: revert commit listen in all network namespaces

2014-01-09 Thread Gao feng
This patch reverts the commit 1a938bec0090dc49abdb471e978e0d8155186845
"listen in all network namespaces",this commit brings the dependence
of net namespace for audit. it's a pain when we implement audit namespace.
we have to do lots of works to make sure audit socket is valid.
and unshare namespace will make things more worse.

In the next patch, I will add a compare function for audit
netlink socket, and this will make audit socket global. all
user space audit netlink will communicate with this global
socket, and kernel will send message to user space through
this socket. this will make things easy and we needn't to
consider the complicate cases.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 61 ++
 kernel/audit.h |  4 
 2 files changed, 10 insertions(+), 55 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index ff1d1d7..b62153a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -63,7 +63,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "audit.h"
 
@@ -124,7 +123,6 @@ static atomic_taudit_lost = ATOMIC_INIT(0);
 
 /* The netlink socket. */
 static struct sock *audit_sock;
-int audit_net_id;
 
 /* Hash for inode-based rules */
 struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
@@ -415,7 +413,6 @@ static void kauditd_send_skb(struct sk_buff *skb)
audit_pid);
audit_log_lost("auditd disappeared\n");
audit_pid = 0;
-   audit_sock = NULL;
}
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
@@ -501,15 +498,13 @@ int audit_send_list(void *_dest)
 {
struct audit_netlink_list *dest = _dest;
struct sk_buff *skb;
-   struct net *net = get_net_ns_by_pid(dest->pid);
-   struct audit_net *aunet = net_generic(net, audit_net_id);
 
/* wait for parent to finish and send an ACK */
mutex_lock(_cmd_mutex);
mutex_unlock(_cmd_mutex);
 
while ((skb = __skb_dequeue(>q)) != NULL)
-   netlink_unicast(aunet->nlsk, skb, dest->portid, 0);
+   netlink_unicast(audit_sock, skb, dest->portid, 0);
 
kfree(dest);
 
@@ -544,15 +539,13 @@ out_kfree_skb:
 static int audit_send_reply_thread(void *arg)
 {
struct audit_reply *reply = (struct audit_reply *)arg;
-   struct net *net = get_net_ns_by_pid(reply->pid);
-   struct audit_net *aunet = net_generic(net, audit_net_id);
 
mutex_lock(_cmd_mutex);
mutex_unlock(_cmd_mutex);
 
/* Ignore failure. It'll only happen if the sender goes away,
   because our timeout is set to infinite. */
-   netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0);
+   netlink_unicast(audit_sock, reply->skb, reply->portid, 0);
kfree(reply);
return 0;
 }
@@ -822,7 +815,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
audit_log_config_change("audit_pid", new_pid, 
audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
-   audit_sock = skb->sk;
}
if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(s.rate_limit);
@@ -1072,57 +1064,24 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(_cmd_mutex);
 }
 
-static int __net_init audit_net_init(struct net *net)
-{
-   struct netlink_kernel_cfg cfg = {
-   .input  = audit_receive,
-   };
-
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-
-   pr_info("audit: initializing netlink socket in namespace\n");
-
-   aunet->nlsk = netlink_kernel_create(net, NETLINK_AUDIT, );
-   if (aunet->nlsk == NULL) {
-   audit_panic("cannot initialize netlink socket in namespace");
-   return -ENOMEM;
-   }
-   aunet->nlsk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
-   return 0;
-}
-
-static void __net_exit audit_net_exit(struct net *net)
-{
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-   struct sock *sock = aunet->nlsk;
-   if (sock == audit_sock) {
-   audit_pid = 0;
-   audit_sock = NULL;
-   }
-
-   rcu_assign_pointer(aunet->nlsk, NULL);
-   synchronize_net();
-   netlink_kernel_release(sock);
-}
-
-static struct pernet_operations audit_net_ops __net_initdata = {
-   .init = audit_net_init,
-   .exit = audit_net_exit,
-   .id = _net_id,
-   .size = sizeof(struct audit_net),
-};
-
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
+   struct netlink_kernel_cfg cfg = {
+   .input  = au

[PATCH audit-next 1/2] audit: revert commit listen in all network namespaces

2014-01-09 Thread Gao feng
This patch reverts the commit 1a938bec0090dc49abdb471e978e0d8155186845
listen in all network namespaces,this commit brings the dependence
of net namespace for audit. it's a pain when we implement audit namespace.
we have to do lots of works to make sure audit socket is valid.
and unshare namespace will make things more worse.

In the next patch, I will add a compare function for audit
netlink socket, and this will make audit socket global. all
user space audit netlink will communicate with this global
socket, and kernel will send message to user space through
this socket. this will make things easy and we needn't to
consider the complicate cases.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 61 ++
 kernel/audit.h |  4 
 2 files changed, 10 insertions(+), 55 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index ff1d1d7..b62153a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -63,7 +63,6 @@
 #include linux/freezer.h
 #include linux/tty.h
 #include linux/pid_namespace.h
-#include net/netns/generic.h
 
 #include audit.h
 
@@ -124,7 +123,6 @@ static atomic_taudit_lost = ATOMIC_INIT(0);
 
 /* The netlink socket. */
 static struct sock *audit_sock;
-int audit_net_id;
 
 /* Hash for inode-based rules */
 struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
@@ -415,7 +413,6 @@ static void kauditd_send_skb(struct sk_buff *skb)
audit_pid);
audit_log_lost(auditd disappeared\n);
audit_pid = 0;
-   audit_sock = NULL;
}
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
@@ -501,15 +498,13 @@ int audit_send_list(void *_dest)
 {
struct audit_netlink_list *dest = _dest;
struct sk_buff *skb;
-   struct net *net = get_net_ns_by_pid(dest-pid);
-   struct audit_net *aunet = net_generic(net, audit_net_id);
 
/* wait for parent to finish and send an ACK */
mutex_lock(audit_cmd_mutex);
mutex_unlock(audit_cmd_mutex);
 
while ((skb = __skb_dequeue(dest-q)) != NULL)
-   netlink_unicast(aunet-nlsk, skb, dest-portid, 0);
+   netlink_unicast(audit_sock, skb, dest-portid, 0);
 
kfree(dest);
 
@@ -544,15 +539,13 @@ out_kfree_skb:
 static int audit_send_reply_thread(void *arg)
 {
struct audit_reply *reply = (struct audit_reply *)arg;
-   struct net *net = get_net_ns_by_pid(reply-pid);
-   struct audit_net *aunet = net_generic(net, audit_net_id);
 
mutex_lock(audit_cmd_mutex);
mutex_unlock(audit_cmd_mutex);
 
/* Ignore failure. It'll only happen if the sender goes away,
   because our timeout is set to infinite. */
-   netlink_unicast(aunet-nlsk , reply-skb, reply-portid, 0);
+   netlink_unicast(audit_sock, reply-skb, reply-portid, 0);
kfree(reply);
return 0;
 }
@@ -822,7 +815,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
audit_log_config_change(audit_pid, new_pid, 
audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
-   audit_sock = skb-sk;
}
if (s.mask  AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(s.rate_limit);
@@ -1072,57 +1064,24 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(audit_cmd_mutex);
 }
 
-static int __net_init audit_net_init(struct net *net)
-{
-   struct netlink_kernel_cfg cfg = {
-   .input  = audit_receive,
-   };
-
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-
-   pr_info(audit: initializing netlink socket in namespace\n);
-
-   aunet-nlsk = netlink_kernel_create(net, NETLINK_AUDIT, cfg);
-   if (aunet-nlsk == NULL) {
-   audit_panic(cannot initialize netlink socket in namespace);
-   return -ENOMEM;
-   }
-   aunet-nlsk-sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
-   return 0;
-}
-
-static void __net_exit audit_net_exit(struct net *net)
-{
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-   struct sock *sock = aunet-nlsk;
-   if (sock == audit_sock) {
-   audit_pid = 0;
-   audit_sock = NULL;
-   }
-
-   rcu_assign_pointer(aunet-nlsk, NULL);
-   synchronize_net();
-   netlink_kernel_release(sock);
-}
-
-static struct pernet_operations audit_net_ops __net_initdata = {
-   .init = audit_net_init,
-   .exit = audit_net_exit,
-   .id = audit_net_id,
-   .size = sizeof(struct audit_net),
-};
-
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
+   struct netlink_kernel_cfg cfg = {
+   .input  = audit_receive

[PATCH audit-next 2/2] Audit: make audit netlink socket net namespace unaware

2014-01-09 Thread Gao feng
Add a compare function which always return true for
audit netlink socket, this will cause audit netlink
sockets netns unaware, and no matter which netns the
user space audit netlink sockets belong to, they all
can find out and communicate with audit_sock.

This gets rid of the necessary to create per-netns
audit kernel side socket(audit_sock), it's pain to
depend on and get reference of netns for auditns.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index b62153a..2ac6212 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1064,12 +1064,18 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(audit_cmd_mutex);
 }
 
+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return true;
+}
+
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};
 
if (audit_initialized == AUDIT_DISABLED)
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] audit: print error message when fail to create audit socket

2014-01-07 Thread Gao feng
On 01/08/2014 08:53 AM, Andrew Morton wrote:
> On Tue, 17 Dec 2013 11:10:41 +0800 Gao feng  wrote:
> 
>> print the error message and then return -ENOMEM.
>>
>> ...
>>
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -1083,12 +1083,11 @@ static int __net_init audit_net_init(struct net *net)
>>  pr_info("audit: initializing netlink socket in namespace\n");
>>  
>>  aunet->nlsk = netlink_kernel_create(net, NETLINK_AUDIT, );
>> -if (aunet->nlsk == NULL)
>> -return -ENOMEM;
>> -if (!aunet->nlsk)
>> +if (aunet->nlsk == NULL) {
>>  audit_panic("cannot initialize netlink socket in namespace");
>> -else
>> -aunet->nlsk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
>> +return -ENOMEM;
>> +}
>> +aunet->nlsk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
>>  return 0;
>>  }
> 
> What kernel version are these against?  Something ancient, I expect -
> audit_net_init() doesn't exist.
> 
> Please check current kernels, redo and resend the patches if anything
> needs changing?

This patch is against Richard Guy Briggs's audit tree. the current kernel
doesn't have this problem.

BTW, Richard & Eric, when do you plan to push these changes to the upstream?
there are a lot of changes in Richard's tree.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] audit: print error message when fail to create audit socket

2014-01-07 Thread Gao feng
On 01/08/2014 08:53 AM, Andrew Morton wrote:
 On Tue, 17 Dec 2013 11:10:41 +0800 Gao feng gaof...@cn.fujitsu.com wrote:
 
 print the error message and then return -ENOMEM.

 ...

 --- a/kernel/audit.c
 +++ b/kernel/audit.c
 @@ -1083,12 +1083,11 @@ static int __net_init audit_net_init(struct net *net)
  pr_info(audit: initializing netlink socket in namespace\n);
  
  aunet-nlsk = netlink_kernel_create(net, NETLINK_AUDIT, cfg);
 -if (aunet-nlsk == NULL)
 -return -ENOMEM;
 -if (!aunet-nlsk)
 +if (aunet-nlsk == NULL) {
  audit_panic(cannot initialize netlink socket in namespace);
 -else
 -aunet-nlsk-sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
 +return -ENOMEM;
 +}
 +aunet-nlsk-sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
  return 0;
  }
 
 What kernel version are these against?  Something ancient, I expect -
 audit_net_init() doesn't exist.
 
 Please check current kernels, redo and resend the patches if anything
 needs changing?

This patch is against Richard Guy Briggs's audit tree. the current kernel
doesn't have this problem.

BTW, Richard  Eric, when do you plan to push these changes to the upstream?
there are a lot of changes in Richard's tree.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH net-next 0/4] net_cls for sys container

2014-01-06 Thread Gao feng
On 01/06/2014 03:54 PM, Libo Chen wrote:
> On 2014/1/3 13:20, Cong Wang wrote:
>> On Thu, Jan 2, 2014 at 7:11 PM, Libo Chen  
>> wrote:
>>> Hi guys,
>>>
>>> Now, lxc created with veth can not be under control by
>>> cls_cgroup.
>>>
>>> the former discussion:
>>> http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/00214.html
>>>
>>> In short, because cls_cgroup relys classid attached to sock
>>> filter skb, but sock will be cleared inside dev_forward_skb()
>>> in veth_xmit().
>>
>>
>> So what are you trying to achieve here?
> 
> sys container using veth can be controlled by cls_cgroup basing on physic 
> network interface
> 

It's a problem about virtual nic, not container/net namespace.

If veth device is running in host. the skb is transmitted firstly by veth 
device and then delivered
by physical device. if you set both qdisc rule on veth and physical device. 
which qdisc rule will take
effect?

In your patch, both qdisc rule are effective. it looks strange.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH net-next 0/4] net_cls for sys container

2014-01-06 Thread Gao feng
On 01/06/2014 03:54 PM, Libo Chen wrote:
 On 2014/1/3 13:20, Cong Wang wrote:
 On Thu, Jan 2, 2014 at 7:11 PM, Libo Chen clbchenlibo.c...@huawei.com 
 wrote:
 Hi guys,

 Now, lxc created with veth can not be under control by
 cls_cgroup.

 the former discussion:
 http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/00214.html

 In short, because cls_cgroup relys classid attached to sock
 filter skb, but sock will be cleared inside dev_forward_skb()
 in veth_xmit().


 So what are you trying to achieve here?
 
 sys container using veth can be controlled by cls_cgroup basing on physic 
 network interface
 

It's a problem about virtual nic, not container/net namespace.

If veth device is running in host. the skb is transmitted firstly by veth 
device and then delivered
by physical device. if you set both qdisc rule on veth and physical device. 
which qdisc rule will take
effect?

In your patch, both qdisc rule are effective. it looks strange.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-24 Thread Gao feng
On 12/24/2013 07:47 AM, Richard Guy Briggs wrote:
> On 13/12/09, Gao feng wrote:
>> On 12/07/2013 05:31 AM, Serge E. Hallyn wrote:
>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
> 
>>>> The main target of this patchset is allowing user in audit
>>>> namespace to generate the USER_MSG type of audit message,
>>>> some userspace tools need to generate audit message, or
>>>> these tools will broken.
>>>>
>>>> And the login process in container may want to setup
>>>> /proc//loginuid, right now this value is unalterable
>>>> once it being set. this will also broke the login problem
>>>> in container. After this patchset, we can reset this loginuid
>>>> to zero if task is running in a new audit namespace.
>>>>
>>>> Same with v1 patchset, in this patchset, only the privileged
>>>> user in init_audit_ns and init_user_ns has rights to
>>>> add/del audit rules. and these rules are gloabl. all
>>>> audit namespace will comply with the rules.
>>>>
>>>> Compared with v1, v2 patch has some big changes.
>>>> 1, the audit namespace is not assigned to user namespace.
>>>>since there is no available bit of flags for clone, we
>>>>create audit namespace through netlink, patch[18/20]
>>>>introduces a new audit netlink type AUDIT_CREATE_NS.
>>>>the privileged user in userns has rights to create a
>>>>audit namespace, it means the unprivileged user can
>>>>create auditns through create userns first. In order
>>>>to prevent them from doing harm to host, the default
>>>>audit_backlog_limit of un-init-audit-ns is zero(means
>>>>audit is unavailable in audit namespace). and it can't
>>>>be changed in auditns through netlink.
>>>
>>> So the unprivileged user can create an audit-ns, but can't
>>> then actually send any messages there?  I guess setting it
>>> to something small would just be hacky?
>>
>> Yes, if unprivileged user wants to send audit message, he should
>> ask privileged user to setup the audit_backlog_limit for him.
>>
>> I know it's a little of hack, but I don't have good idea :(
> 
> There's a recent patch that actually clarifies the way
> audit_backlog_limit works, since different parts of the code seemed to
> think different things.  It now means unlimited, since there are other
> places to shut off logging.
>   audit: allow unlimited backlog queue

Yep, thanks for your information, we can set a negative number to backlog_limit
to mark there is no available buff for this audit ns.

> 
> At first, I'd say each audit_ns should be able to set its own
> audit_backlog_limit.  The most obvious argument against this would be
> the vulnerability of a DoS. 

There are two problem we should conside, auditns costs lot's of memory by
setting large backlog_limit and costs lot's of cpu resources by generating
audit log all the time. So I think the privileged user should have the ability
to limit the backlog len.

And I think it's not very necessary to keep on allowing auditns to set its own
audit_backlog_limit. if you think this is necessary, we can add a field 
max_backlog_limit
for per audit namespace. and set this value when we create auditns.

And seem like the audit_rate_limit should not be change by unprivileged user.

I don't know if I really follow your request...

> Could there be some automatic metrics to
> set sub audit_ns backlog limits, such as default to the same as
> init_audit_ns and have the init_audit_ns override those defaults?
> Could this be done per user like ulimiit?
> 

I think something like ulimit cannot help us.
we should set sub-auditns's backlog_limit in parent auditns..
so maybe the proc file is the best way.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-24 Thread Gao feng
On 12/21/2013 05:15 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> On 12/11/2013 04:36 AM, Serge E. Hallyn wrote:
>>> Quoting Eric Paris (epa...@redhat.com):
>>>> On Tue, 2013-12-10 at 10:51 -0600, Serge Hallyn wrote:
>>>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>>>> On 12/10/2013 02:26 AM, Serge Hallyn wrote:
>>>>>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>>>>>> On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
>>>>>>>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> On 10/24/2013 03:31 PM, Gao feng wrote:
>>>>>>>>>>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>>>>>>>>>>
>>>>>>>>>>> The main target of this patchset is allowing user in audit
>>>>>>>>>>> namespace to generate the USER_MSG type of audit message,
>>>>>>>>>>> some userspace tools need to generate audit message, or
>>>>>>>>>>> these tools will broken.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I really need this feature, right now,some process such as
>>>>>>>>>> logind are broken in container becase we leak of this feature.
>>>>>>>>>
>>>>>>>>> Your set doesn't address loginuid though right?  How exactly do you
>>>>>>>>> expect to do that?  If user violates MAC policy and audit msg is
>>>>>>>>> sent to init user ns by mac subsys, you need the loginuid from
>>>>>>>>> init_audit_ns.  where will that be stored if you allow updates
>>>>>>>>> of loginuid in auditns?
>>>>>>>>>
>>>>>>>> This patchset doesn't include the loginuid part.
>>>>>>>>
>>>>>>>> the loginuid is stored in task as before.
>>>>>>>> In my opinion, when task creates a new audit namespace, this task's
>>>>>>>> loginuid will be reset to zero, so the children tasks can set their
>>>>>>>> loginuid. Does this change break the MAC?
>>>>>>>
>>>>>>> I think so, yes.  In an LSPP selinux environment, if the task
>>>>>>> manages to trigger an selinux deny rule which is audited, then
>>>>>>> the loginuid must make sense on the host.  Now presumably it
>>>>>>> will get translated to the mapped host uid, and we can figure
>>>>>>> out the host uid owning it through /etc/subuid.  But that adds
>>>>>>> /etc/subuid as a new part of the TCB without any warning 
>>>>>>> So in that sense, for LSPP, it breaks it.
>>>>>>>
>>>>>>
>>>>>> Looks like my opinion is incorrect.
>>>>>>
>>>>>> In the audit-next tree, Eric added a new audit feature to allow 
>>>>>> privileged
>>>>>> user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
>>>>>> is disabled, the privileged user can reset/set the loginuid of task. I
>>>>>> think this way is safe since only privileged user can do the change.
>>>>>>
>>>>>> So I will not change the loginuid part.
>>>>>>
>>>>>> Thanks for your information Serge :)
>>>>>
>>>>> Unfortunately this makes the patchset much less compelling :)  The
>>>>> problem I was looking into is that a container running in a user
>>>>> namespace cannot (bc he has ns_capable(CAP_AUDIT_*) but not
>>>>> capable(CAP_AUDIT_*)) set loginuids at all.
>>>>>
>>>>> Which from an LSPP pov is correct;  which is why I was hoping you were
>>>>> going to have the audit namespaces be hierarchical, with a task in a
>>>>> level 2 audit ns having two loginuids - one in his own auditns, and
>>>>> one in the initial one.
>>>>
>>>> Right now user namespace + audit is just total crud.  We all know
>>>> this...  (I'm not sure pid is must better, but I digress)   All thoughts
>>>> around loginuid in the kernel right this very moment only make sense in
>>>> the initial user 

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-24 Thread Gao feng
On 12/21/2013 05:15 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/11/2013 04:36 AM, Serge E. Hallyn wrote:
 Quoting Eric Paris (epa...@redhat.com):
 On Tue, 2013-12-10 at 10:51 -0600, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/10/2013 02:26 AM, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Hi

 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.


 I really need this feature, right now,some process such as
 logind are broken in container becase we leak of this feature.

 Your set doesn't address loginuid though right?  How exactly do you
 expect to do that?  If user violates MAC policy and audit msg is
 sent to init user ns by mac subsys, you need the loginuid from
 init_audit_ns.  where will that be stored if you allow updates
 of loginuid in auditns?

 This patchset doesn't include the loginuid part.

 the loginuid is stored in task as before.
 In my opinion, when task creates a new audit namespace, this task's
 loginuid will be reset to zero, so the children tasks can set their
 loginuid. Does this change break the MAC?

 I think so, yes.  In an LSPP selinux environment, if the task
 manages to trigger an selinux deny rule which is audited, then
 the loginuid must make sense on the host.  Now presumably it
 will get translated to the mapped host uid, and we can figure
 out the host uid owning it through /etc/subuid.  But that adds
 /etc/subuid as a new part of the TCB without any warning shrug
 So in that sense, for LSPP, it breaks it.


 Looks like my opinion is incorrect.

 In the audit-next tree, Eric added a new audit feature to allow 
 privileged
 user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
 is disabled, the privileged user can reset/set the loginuid of task. I
 think this way is safe since only privileged user can do the change.

 So I will not change the loginuid part.

 Thanks for your information Serge :)

 Unfortunately this makes the patchset much less compelling :)  The
 problem I was looking into is that a container running in a user
 namespace cannot (bc he has ns_capable(CAP_AUDIT_*) but not
 capable(CAP_AUDIT_*)) set loginuids at all.

 Which from an LSPP pov is correct;  which is why I was hoping you were
 going to have the audit namespaces be hierarchical, with a task in a
 level 2 audit ns having two loginuids - one in his own auditns, and
 one in the initial one.

 Right now user namespace + audit is just total crud.  We all know
 this...  (I'm not sure pid is must better, but I digress)   All thoughts
 around loginuid in the kernel right this very moment only make sense in
 the initial user namespace and all permission checks are in the initial
 user namespace as well.

 I think I'm a proponent of the hierarchical approach to audit
 namespaces.  An audit namespace would hold a reference to the
 pid/user/whatever namespace it was created in/with.  Each audit
 namespace should have it's own set of filter rules, etc.  Instead of
 just storing 'loginuid' we store 'loginuid+user namespace'.   When the

 So long as the kernel stores the kuid_t (which the only sane thing to
 do) that is a non-issue.

 kernel creates a record it should translate the loginuid to the
 namespace of the audit namespace and send the record.

 Yup, that should go without saying.  Use kuid_t in kernel and translate
 at the kernel-user boundary.


 I can implement audit namespace as a hierarchy, give per auditns a level 
 value
 and a pointer which point to parent auditns.

 but for the loginuid part, I think we can implement it after we push the 
 audit
 ns into the upstream.

 Is this ok?
 
 Well as Ive said the loginuid part is the only one that interests
 me.  I'll be out most of the rest of the year, but I'll review any
 patchset you send for what seems to me to be correctness :)
 

Thanks for your help!
As soon as the frame of auditns being accepted, I think it's easily
to push the loginuid part. :)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-24 Thread Gao feng
On 12/24/2013 07:47 AM, Richard Guy Briggs wrote:
 On 13/12/09, Gao feng wrote:
 On 12/07/2013 05:31 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 
 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.

 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.

 Same with v1 patchset, in this patchset, only the privileged
 user in init_audit_ns and init_user_ns has rights to
 add/del audit rules. and these rules are gloabl. all
 audit namespace will comply with the rules.

 Compared with v1, v2 patch has some big changes.
 1, the audit namespace is not assigned to user namespace.
since there is no available bit of flags for clone, we
create audit namespace through netlink, patch[18/20]
introduces a new audit netlink type AUDIT_CREATE_NS.
the privileged user in userns has rights to create a
audit namespace, it means the unprivileged user can
create auditns through create userns first. In order
to prevent them from doing harm to host, the default
audit_backlog_limit of un-init-audit-ns is zero(means
audit is unavailable in audit namespace). and it can't
be changed in auditns through netlink.

 So the unprivileged user can create an audit-ns, but can't
 then actually send any messages there?  I guess setting it
 to something small would just be hacky?

 Yes, if unprivileged user wants to send audit message, he should
 ask privileged user to setup the audit_backlog_limit for him.

 I know it's a little of hack, but I don't have good idea :(
 
 There's a recent patch that actually clarifies the way
 audit_backlog_limit works, since different parts of the code seemed to
 think different things.  It now means unlimited, since there are other
 places to shut off logging.
   audit: allow unlimited backlog queue

Yep, thanks for your information, we can set a negative number to backlog_limit
to mark there is no available buff for this audit ns.

 
 At first, I'd say each audit_ns should be able to set its own
 audit_backlog_limit.  The most obvious argument against this would be
 the vulnerability of a DoS. 

There are two problem we should conside, auditns costs lot's of memory by
setting large backlog_limit and costs lot's of cpu resources by generating
audit log all the time. So I think the privileged user should have the ability
to limit the backlog len.

And I think it's not very necessary to keep on allowing auditns to set its own
audit_backlog_limit. if you think this is necessary, we can add a field 
max_backlog_limit
for per audit namespace. and set this value when we create auditns.

And seem like the audit_rate_limit should not be change by unprivileged user.

I don't know if I really follow your request...

 Could there be some automatic metrics to
 set sub audit_ns backlog limits, such as default to the same as
 init_audit_ns and have the init_audit_ns override those defaults?
 Could this be done per user like ulimiit?
 

I think something like ulimit cannot help us.
we should set sub-auditns's backlog_limit in parent auditns..
so maybe the proc file is the best way.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 11:11 AM, Eric Paris wrote:
> On Fri, 2013-12-20 at 10:46 +0800, Gao feng wrote:
>> On 12/20/2013 02:40 AM, Eric Paris wrote:
>>> On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
>>>> On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
> 
>>>> we have to store audit_sock
>>>> into auditns(auditns will be passed to kauditd_send_skb),
>>>> this will cause auditns have to get a reference of netns.
>>>> and for some reason(netfilter audit target), netns will
>>>> get reference of auditns too. this is terrible...
>>>
>>> I'm not sure I agree/understand this entirely...
>>>
>>
>> Yes, the audit_sock is created and destroyed by net namespace,
>> so if auditns wants to use audit_sock, it must prevent netns
>> from being destroyed. so auditns has to get reference of netns.
> 
> Namespace == mind blown.  Ok, so:
> 
>  auditd in audit_ns2 and net_ns2. <--- ONLY process in net_ns2
>  some process in audit_ns2 and net_ns3
> 
> Lets assume that auditd is killed improperly/dies.  Because the last
> process in net_ns2 is gone net_ns2 is invalid/freed.
> 
> Today in the kernel the way we detect auditd is gone is by using the
> socket and getting ECONNREFUSSED.  So here you think that audit_ns2
> should hold a reference on net_ns2, to make sure that socket is always
> valid.
> 
> I instead propose that we could run all audit_ns and reset the audit_pid
> in that namespace and the audit_sock in the namespace to 0/null inside
> audit_net_exit.  Since obviously if the net_ns disappeared, the auditd
> which was running in any audit namespace in that net_ns isn't running
> any more.  We didn't need to hold a reference on the net_ns.  We just
> have to clear the skb_queue, reset the audit_pid to 0, and reset the
> socket to NULL...

multi auditns can share the same netns. it happens if you unshare
auditns. if you want to reset audit_sock to null inside audit_net_exit,
you have to maintain a list in netns, this list contains the auditnss
whose audit_sock is created in this netns. so you can foreach this
list and reset the audit socks of audit nss.

Above is unsharing auditns, consider unsharing netns. auditd is running
in auditns1 and netns1, and then who-know-why the auditd call 
unshare(CLONE_NEWNET)
to change it's netns from netns1 to new netns2. so the netns1 is released
and auditns->audit_sock being reset to NULL. the auditd cannot receive
the audit log. auditd will in chaos, "I'm still alive, why kernel think
I'm die?"

So maybe you will say, we can reset the audit_sock of netns2 to auditns.
ok, this is a way. but how can we decide if we should reset the 
auditns->audit_sock?
when we create the new netns, the old netns is still alive, so the 
auditns->audit_sock
is still valid in that time.

I don't know if there are some other problems we should consider.
it is too complex..

> 
> Maybe the one magic socket is the right answer.  I'm not arguing against
> your solution.  I'm really trying to understand why we are going that
> way...
> 

That's why we should discuss :)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 02:40 AM, Eric Paris wrote:
> On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
>> On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
>>> Convert audit from only listening in init_net to use 
>>> register_pernet_subsys()
>>> to dynamically manage the netlink socket list.
>>>
>>> Signed-off-by: Richard Guy Briggs 
>>> ---
>>
>> I think it's the time for us to discuss if we should revert this
>> commit, since this one prevent me from continuing to achieve
>> audit namespace.
>>
>>
>> The major problem is in kaudit_send_skb, we have no idea which
>> audit sock the skb should send to.
> 
> right, we have problems here no matter what...
> 
> If we stick with the current approach you will need to know socket +
> portid.  With your approach one only needs to know portid.  Since these
> are can both be part of the audit_ns structure I don't see a huge
> difference...
> 
>> we have to store audit_sock
>> into auditns(auditns will be passed to kauditd_send_skb),
>> this will cause auditns have to get a reference of netns.
>> and for some reason(netfilter audit target), netns will
>> get reference of auditns too. this is terrible...
> 
> I'm not sure I agree/understand this entirely...
> 

Yes, the audit_sock is created and destroyed by net namespace,
so if auditns wants to use audit_sock, it must prevent netns
from being destroyed. so auditns has to get reference of netns.

and in some case, netns will get reference of auditns too. this
is complex than making audit_sock global and getting rid of this
reference.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-19 Thread Gao feng
On 12/20/2013 09:40 AM, Richard Guy Briggs wrote:
> On 13/12/20, Gao feng wrote:
>> On 12/20/2013 09:19 AM, Richard Guy Briggs wrote:
>>> On 13/12/19, Gao feng wrote:
>>>> On 12/19/2013 10:34 AM, Gao feng wrote:
>>>>> kernel/capability.c: In function ‘SYSC_capset’:
>>>>> kernel/capability.c:280:2: warning: passing argument 1 of 
>>>>> ‘audit_log_capset’ makes integer from pointer without a cast [enabled by 
>>>>> default]
>>>>>   audit_log_capset(new, current_cred());
>>>>>   ^
>>>>> In file included from kernel/capability.c:10:0:
>>>>> include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of 
>>>>> type ‘struct cred *’
>>>>>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
>>>>> ^
>>>>> kernel/capability.c:280:2: error: too few arguments to function 
>>>>> ‘audit_log_capset’
>>>>>   audit_log_capset(new, current_cred());
>>>>>   ^
>>>>> In file included from kernel/capability.c:10:0:
>>>>> include/linux/audit.h:400:20: note: declared here
>>>>>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
>>>>> ^
>>>>> make[1]: *** [kernel/capability.o] Error 1
>>>>
>>>> BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
>>>> audit: Simplify and correct audit_log_capset
>>>
>>> Yup, found it.  Thanks for the patch.  Since it isn't upstream yet, I
>>> may merge it and rebase for linux-next.
>>
>> thank you Richard, btw, do you have plan to push these changes to the 
>> linux-next?
> 
> Yes, that's where I'm putting all these, via Eric.
> 

get it, thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-19 Thread Gao feng
On 12/20/2013 09:19 AM, Richard Guy Briggs wrote:
> On 13/12/19, Gao feng wrote:
>> On 12/19/2013 10:34 AM, Gao feng wrote:
>>> kernel/capability.c: In function ‘SYSC_capset’:
>>> kernel/capability.c:280:2: warning: passing argument 1 of 
>>> ‘audit_log_capset’ makes integer from pointer without a cast [enabled by 
>>> default]
>>>   audit_log_capset(new, current_cred());
>>>   ^
>>> In file included from kernel/capability.c:10:0:
>>> include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of 
>>> type ‘struct cred *’
>>>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
>>> ^
>>> kernel/capability.c:280:2: error: too few arguments to function 
>>> ‘audit_log_capset’
>>>   audit_log_capset(new, current_cred());
>>>   ^
>>> In file included from kernel/capability.c:10:0:
>>> include/linux/audit.h:400:20: note: declared here
>>>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
>>> ^
>>> make[1]: *** [kernel/capability.o] Error 1
>>
>> BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
>> audit: Simplify and correct audit_log_capset
> 
> Yup, found it.  Thanks for the patch.  Since it isn't upstream yet, I
> may merge it and rebase for linux-next.
> 

thank you Richard, btw, do you have plan to push these changes to the 
linux-next?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 02:40 AM, Eric Paris wrote:
> On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
>> On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
>>> Convert audit from only listening in init_net to use 
>>> register_pernet_subsys()
>>> to dynamically manage the netlink socket list.
>>>
>>> Signed-off-by: Richard Guy Briggs 
>>> ---
>>
>> I think it's the time for us to discuss if we should revert this
>> commit, since this one prevent me from continuing to achieve
>> audit namespace.
>>
>>
>> The major problem is in kaudit_send_skb, we have no idea which
>> audit sock the skb should send to.
> 
> right, we have problems here no matter what...
> 
> If we stick with the current approach you will need to know socket +
> portid.  With your approach one only needs to know portid.  Since these
> are can both be part of the audit_ns structure I don't see a huge
> difference...
> 
>> we have to store audit_sock
>> into auditns(auditns will be passed to kauditd_send_skb),
>> this will cause auditns have to get a reference of netns.
>> and for some reason(netfilter audit target), netns will
>> get reference of auditns too. this is terrible...
> 
> I'm not sure I agree/understand this entirely...
> 

My brain must be destroyed, I need to think about if auditns
should get reference of netns. it's not clear to me now. :(
but I intend to think you are right.

>> So why not we revert this one, and use a very simple one to
>> replace it? the below patch will save us from the refer to
>> each other case, achieve the same effect.
>>
>> what's your opinion?
> 
> Help me go all the way back to the beginning.  What's our end goal here
> again?
> 
> When thinking about this I realized we have another problem that I don't
> think we've considered.  Which makes me lean away from the single
> socket/kauditd  :(
> 
> I we have one socket and one kauditd ANY auditd can completely freeze
> the audit system.  Which seems problematic, especially if there isn't
> equal levels of trust between the different namespaces...  If one auditd
> gets hung (intentionally or not) the kernel will never send another
> audit message
> 
> Makes me think we really need a kauditd thread per namespace, possibly
> an skb queue per namespace.  At which point an audit socket per
> namespace makes a lot of sense too
> 

You are right, and My prototype supports per kauditd/auditd/sbk queue per
audit namespace. one auditd freeze in one auditns will not affect audit
subsystem in another auditns.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 02:40 AM, Eric Paris wrote:
 On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
 On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
 Convert audit from only listening in init_net to use 
 register_pernet_subsys()
 to dynamically manage the netlink socket list.

 Signed-off-by: Richard Guy Briggs r...@redhat.com
 ---

 I think it's the time for us to discuss if we should revert this
 commit, since this one prevent me from continuing to achieve
 audit namespace.


 The major problem is in kaudit_send_skb, we have no idea which
 audit sock the skb should send to.
 
 right, we have problems here no matter what...
 
 If we stick with the current approach you will need to know socket +
 portid.  With your approach one only needs to know portid.  Since these
 are can both be part of the audit_ns structure I don't see a huge
 difference...
 
 we have to store audit_sock
 into auditns(auditns will be passed to kauditd_send_skb),
 this will cause auditns have to get a reference of netns.
 and for some reason(netfilter audit target), netns will
 get reference of auditns too. this is terrible...
 
 I'm not sure I agree/understand this entirely...
 

My brain must be destroyed, I need to think about if auditns
should get reference of netns. it's not clear to me now. :(
but I intend to think you are right.

 So why not we revert this one, and use a very simple one to
 replace it? the below patch will save us from the refer to
 each other case, achieve the same effect.

 what's your opinion?
 
 Help me go all the way back to the beginning.  What's our end goal here
 again?
 
 When thinking about this I realized we have another problem that I don't
 think we've considered.  Which makes me lean away from the single
 socket/kauditd  :(
 
 I we have one socket and one kauditd ANY auditd can completely freeze
 the audit system.  Which seems problematic, especially if there isn't
 equal levels of trust between the different namespaces...  If one auditd
 gets hung (intentionally or not) the kernel will never send another
 audit message
 
 Makes me think we really need a kauditd thread per namespace, possibly
 an skb queue per namespace.  At which point an audit socket per
 namespace makes a lot of sense too
 

You are right, and My prototype supports per kauditd/auditd/sbk queue per
audit namespace. one auditd freeze in one auditns will not affect audit
subsystem in another auditns.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-19 Thread Gao feng
On 12/20/2013 09:19 AM, Richard Guy Briggs wrote:
 On 13/12/19, Gao feng wrote:
 On 12/19/2013 10:34 AM, Gao feng wrote:
 kernel/capability.c: In function ‘SYSC_capset’:
 kernel/capability.c:280:2: warning: passing argument 1 of 
 ‘audit_log_capset’ makes integer from pointer without a cast [enabled by 
 default]
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of 
 type ‘struct cred *’
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 kernel/capability.c:280:2: error: too few arguments to function 
 ‘audit_log_capset’
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: declared here
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 make[1]: *** [kernel/capability.o] Error 1

 BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
 audit: Simplify and correct audit_log_capset
 
 Yup, found it.  Thanks for the patch.  Since it isn't upstream yet, I
 may merge it and rebase for linux-next.
 

thank you Richard, btw, do you have plan to push these changes to the 
linux-next?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-19 Thread Gao feng
On 12/20/2013 09:40 AM, Richard Guy Briggs wrote:
 On 13/12/20, Gao feng wrote:
 On 12/20/2013 09:19 AM, Richard Guy Briggs wrote:
 On 13/12/19, Gao feng wrote:
 On 12/19/2013 10:34 AM, Gao feng wrote:
 kernel/capability.c: In function ‘SYSC_capset’:
 kernel/capability.c:280:2: warning: passing argument 1 of 
 ‘audit_log_capset’ makes integer from pointer without a cast [enabled by 
 default]
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of 
 type ‘struct cred *’
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 kernel/capability.c:280:2: error: too few arguments to function 
 ‘audit_log_capset’
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: declared here
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 make[1]: *** [kernel/capability.o] Error 1

 BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
 audit: Simplify and correct audit_log_capset

 Yup, found it.  Thanks for the patch.  Since it isn't upstream yet, I
 may merge it and rebase for linux-next.

 thank you Richard, btw, do you have plan to push these changes to the 
 linux-next?
 
 Yes, that's where I'm putting all these, via Eric.
 

get it, thanks!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 02:40 AM, Eric Paris wrote:
 On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
 On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
 Convert audit from only listening in init_net to use 
 register_pernet_subsys()
 to dynamically manage the netlink socket list.

 Signed-off-by: Richard Guy Briggs r...@redhat.com
 ---

 I think it's the time for us to discuss if we should revert this
 commit, since this one prevent me from continuing to achieve
 audit namespace.


 The major problem is in kaudit_send_skb, we have no idea which
 audit sock the skb should send to.
 
 right, we have problems here no matter what...
 
 If we stick with the current approach you will need to know socket +
 portid.  With your approach one only needs to know portid.  Since these
 are can both be part of the audit_ns structure I don't see a huge
 difference...
 
 we have to store audit_sock
 into auditns(auditns will be passed to kauditd_send_skb),
 this will cause auditns have to get a reference of netns.
 and for some reason(netfilter audit target), netns will
 get reference of auditns too. this is terrible...
 
 I'm not sure I agree/understand this entirely...
 

Yes, the audit_sock is created and destroyed by net namespace,
so if auditns wants to use audit_sock, it must prevent netns
from being destroyed. so auditns has to get reference of netns.

and in some case, netns will get reference of auditns too. this
is complex than making audit_sock global and getting rid of this
reference.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-19 Thread Gao feng
On 12/20/2013 11:11 AM, Eric Paris wrote:
 On Fri, 2013-12-20 at 10:46 +0800, Gao feng wrote:
 On 12/20/2013 02:40 AM, Eric Paris wrote:
 On Thu, 2013-12-19 at 11:59 +0800, Gao feng wrote:
 On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
 
 we have to store audit_sock
 into auditns(auditns will be passed to kauditd_send_skb),
 this will cause auditns have to get a reference of netns.
 and for some reason(netfilter audit target), netns will
 get reference of auditns too. this is terrible...

 I'm not sure I agree/understand this entirely...


 Yes, the audit_sock is created and destroyed by net namespace,
 so if auditns wants to use audit_sock, it must prevent netns
 from being destroyed. so auditns has to get reference of netns.
 
 Namespace == mind blown.  Ok, so:
 
  auditd in audit_ns2 and net_ns2. --- ONLY process in net_ns2
  some process in audit_ns2 and net_ns3
 
 Lets assume that auditd is killed improperly/dies.  Because the last
 process in net_ns2 is gone net_ns2 is invalid/freed.
 
 Today in the kernel the way we detect auditd is gone is by using the
 socket and getting ECONNREFUSSED.  So here you think that audit_ns2
 should hold a reference on net_ns2, to make sure that socket is always
 valid.
 
 I instead propose that we could run all audit_ns and reset the audit_pid
 in that namespace and the audit_sock in the namespace to 0/null inside
 audit_net_exit.  Since obviously if the net_ns disappeared, the auditd
 which was running in any audit namespace in that net_ns isn't running
 any more.  We didn't need to hold a reference on the net_ns.  We just
 have to clear the skb_queue, reset the audit_pid to 0, and reset the
 socket to NULL...

multi auditns can share the same netns. it happens if you unshare
auditns. if you want to reset audit_sock to null inside audit_net_exit,
you have to maintain a list in netns, this list contains the auditnss
whose audit_sock is created in this netns. so you can foreach this
list and reset the audit socks of audit nss.

Above is unsharing auditns, consider unsharing netns. auditd is running
in auditns1 and netns1, and then who-know-why the auditd call 
unshare(CLONE_NEWNET)
to change it's netns from netns1 to new netns2. so the netns1 is released
and auditns-audit_sock being reset to NULL. the auditd cannot receive
the audit log. auditd will in chaos, I'm still alive, why kernel think
I'm die?

So maybe you will say, we can reset the audit_sock of netns2 to auditns.
ok, this is a way. but how can we decide if we should reset the 
auditns-audit_sock?
when we create the new netns, the old netns is still alive, so the 
auditns-audit_sock
is still valid in that time.

I don't know if there are some other problems we should consider.
it is too complex..

 
 Maybe the one magic socket is the right answer.  I'm not arguing against
 your solution.  I'm really trying to understand why we are going that
 way...
 

That's why we should discuss :)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-18 Thread Gao feng
On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
> Convert audit from only listening in init_net to use register_pernet_subsys()
> to dynamically manage the netlink socket list.
> 
> Signed-off-by: Richard Guy Briggs 
> ---

I think it's the time for us to discuss if we should revert this
commit, since this one prevent me from continuing to achieve
audit namespace.


The major problem is in kaudit_send_skb, we have no idea which
audit sock the skb should send to.

in this patch, there only is one auditd proecess, so the
audit_sock is the only one. but when we have audit namespace.
there will be multi audit socks. we have to store audit_sock
into auditns(auditns will be passed to kauditd_send_skb),
this will cause auditns have to get a reference of netns.
and for some reason(netfilter audit target), netns will
get reference of auditns too. this is terrible...

So why not we revert this one, and use a very simple one to
replace it? the below patch will save us from the refer to
each other case, achieve the same effect.

what's your opinion?


Add a compare function which always return true for
audit netlink socket, this will cause audit netlink
sockets netns unaware, and no matter which netns the
user space audit netlink sockets belong to, they all
can find out and communicate with audit_sock.

This gets rid of the necessary to create per-netns
audit kernel side socket(audit_sock), it's pain to
depend on and get reference of netns for auditns.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 7b0e23a..468950b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -886,12 +886,18 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(_cmd_mutex);
 }

+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return true;
+}
+
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};

if (audit_initialized == AUDIT_DISABLED)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-18 Thread Gao feng
On 12/19/2013 10:34 AM, Gao feng wrote:
> kernel/capability.c: In function ‘SYSC_capset’:
> kernel/capability.c:280:2: warning: passing argument 1 of ‘audit_log_capset’ 
> makes integer from pointer without a cast [enabled by default]
>   audit_log_capset(new, current_cred());
>   ^
> In file included from kernel/capability.c:10:0:
> include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of type 
> ‘struct cred *’
>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
> ^
> kernel/capability.c:280:2: error: too few arguments to function 
> ‘audit_log_capset’
>   audit_log_capset(new, current_cred());
>   ^
> In file included from kernel/capability.c:10:0:
> include/linux/audit.h:400:20: note: declared here
>  static inline void audit_log_capset(pid_t pid, const struct cred *new,
> ^
> make[1]: *** [kernel/capability.o] Error 1
> 

BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
audit: Simplify and correct audit_log_capset
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] audit: fix build error when disable audit

2013-12-18 Thread Gao feng
kernel/capability.c: In function ‘SYSC_capset’:
kernel/capability.c:280:2: warning: passing argument 1 of ‘audit_log_capset’ 
makes integer from pointer without a cast [enabled by default]
  audit_log_capset(new, current_cred());
  ^
In file included from kernel/capability.c:10:0:
include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of type 
‘struct cred *’
 static inline void audit_log_capset(pid_t pid, const struct cred *new,
^
kernel/capability.c:280:2: error: too few arguments to function 
‘audit_log_capset’
  audit_log_capset(new, current_cred());
  ^
In file included from kernel/capability.c:10:0:
include/linux/audit.h:400:20: note: declared here
 static inline void audit_log_capset(pid_t pid, const struct cred *new,
^
make[1]: *** [kernel/capability.o] Error 1

Signed-off-by: Gao feng 
---
 include/linux/audit.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b4d5160..6976219 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -397,8 +397,8 @@ static inline int audit_log_bprm_fcaps(struct linux_binprm 
*bprm,
 {
return 0;
 }
-static inline void audit_log_capset(pid_t pid, const struct cred *new,
-  const struct cred *old)
+static inline void audit_log_capset(const struct cred *new,
+   const struct cred *old)
 { }
 static inline void audit_mmap_fd(int fd, int flags)
 { }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] audit: fix build error when disable audit

2013-12-18 Thread Gao feng
kernel/capability.c: In function ‘SYSC_capset’:
kernel/capability.c:280:2: warning: passing argument 1 of ‘audit_log_capset’ 
makes integer from pointer without a cast [enabled by default]
  audit_log_capset(new, current_cred());
  ^
In file included from kernel/capability.c:10:0:
include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of type 
‘struct cred *’
 static inline void audit_log_capset(pid_t pid, const struct cred *new,
^
kernel/capability.c:280:2: error: too few arguments to function 
‘audit_log_capset’
  audit_log_capset(new, current_cred());
  ^
In file included from kernel/capability.c:10:0:
include/linux/audit.h:400:20: note: declared here
 static inline void audit_log_capset(pid_t pid, const struct cred *new,
^
make[1]: *** [kernel/capability.o] Error 1

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 include/linux/audit.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index b4d5160..6976219 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -397,8 +397,8 @@ static inline int audit_log_bprm_fcaps(struct linux_binprm 
*bprm,
 {
return 0;
 }
-static inline void audit_log_capset(pid_t pid, const struct cred *new,
-  const struct cred *old)
+static inline void audit_log_capset(const struct cred *new,
+   const struct cred *old)
 { }
 static inline void audit_mmap_fd(int fd, int flags)
 { }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: fix build error when disable audit

2013-12-18 Thread Gao feng
On 12/19/2013 10:34 AM, Gao feng wrote:
 kernel/capability.c: In function ‘SYSC_capset’:
 kernel/capability.c:280:2: warning: passing argument 1 of ‘audit_log_capset’ 
 makes integer from pointer without a cast [enabled by default]
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: expected ‘pid_t’ but argument is of type 
 ‘struct cred *’
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 kernel/capability.c:280:2: error: too few arguments to function 
 ‘audit_log_capset’
   audit_log_capset(new, current_cred());
   ^
 In file included from kernel/capability.c:10:0:
 include/linux/audit.h:400:20: note: declared here
  static inline void audit_log_capset(pid_t pid, const struct cred *new,
 ^
 make[1]: *** [kernel/capability.o] Error 1
 

BTW, bug introduced by commmit 26b81eb408c411d86c7cc93278fb88fbcd785f65
audit: Simplify and correct audit_log_capset
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: listen in all network namespaces

2013-12-18 Thread Gao feng
On 07/17/2013 04:32 AM, Richard Guy Briggs wrote:
 Convert audit from only listening in init_net to use register_pernet_subsys()
 to dynamically manage the netlink socket list.
 
 Signed-off-by: Richard Guy Briggs r...@redhat.com
 ---

I think it's the time for us to discuss if we should revert this
commit, since this one prevent me from continuing to achieve
audit namespace.


The major problem is in kaudit_send_skb, we have no idea which
audit sock the skb should send to.

in this patch, there only is one auditd proecess, so the
audit_sock is the only one. but when we have audit namespace.
there will be multi audit socks. we have to store audit_sock
into auditns(auditns will be passed to kauditd_send_skb),
this will cause auditns have to get a reference of netns.
and for some reason(netfilter audit target), netns will
get reference of auditns too. this is terrible...

So why not we revert this one, and use a very simple one to
replace it? the below patch will save us from the refer to
each other case, achieve the same effect.

what's your opinion?


Add a compare function which always return true for
audit netlink socket, this will cause audit netlink
sockets netns unaware, and no matter which netns the
user space audit netlink sockets belong to, they all
can find out and communicate with audit_sock.

This gets rid of the necessary to create per-netns
audit kernel side socket(audit_sock), it's pain to
depend on and get reference of netns for auditns.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 7b0e23a..468950b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -886,12 +886,18 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(audit_cmd_mutex);
 }

+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return true;
+}
+
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};

if (audit_initialized == AUDIT_DISABLED)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] audit: print error message when fail to create audit socket

2013-12-16 Thread Gao feng
print the error message and then return -ENOMEM.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2a0ed0b..041b951 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1083,12 +1083,11 @@ static int __net_init audit_net_init(struct net *net)
pr_info("audit: initializing netlink socket in namespace\n");
 
aunet->nlsk = netlink_kernel_create(net, NETLINK_AUDIT, );
-   if (aunet->nlsk == NULL)
-   return -ENOMEM;
-   if (!aunet->nlsk)
+   if (aunet->nlsk == NULL) {
audit_panic("cannot initialize netlink socket in namespace");
-   else
-   aunet->nlsk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
+   return -ENOMEM;
+   }
+   aunet->nlsk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
return 0;
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] audit: fix incorrect set of audit_sock

2013-12-16 Thread Gao feng
NETLINK_CB(skb).sk is the socket of user space process,
netlink_unicast in kauditd_send_skb wants the kernel
side socket. Since the sk_state of audit netlink socket
is not NETLINK_CONNECTED, so the netlink_getsockbyportid
doesn't return -ECONNREFUSED.

And the socket of userspace process can be released anytime,
so the audit_sock may point to invalid socket.

this patch sets the audit_sock to the kernel side audit
netlink socket.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 041b951..ff1d1d7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -822,7 +822,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
audit_log_config_change("audit_pid", new_pid, 
audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
-   audit_sock = NETLINK_CB(skb).sk;
+   audit_sock = skb->sk;
}
if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(s.rate_limit);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] audit: fix incorrect set of audit_sock

2013-12-16 Thread Gao feng
NETLINK_CB(skb).sk is the socket of user space process,
netlink_unicast in kauditd_send_skb wants the kernel
side socket. Since the sk_state of audit netlink socket
is not NETLINK_CONNECTED, so the netlink_getsockbyportid
doesn't return -ECONNREFUSED.

And the socket of userspace process can be released anytime,
so the audit_sock may point to invalid socket.

this patch sets the audit_sock to the kernel side audit
netlink socket.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 041b951..ff1d1d7 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -822,7 +822,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
audit_log_config_change(audit_pid, new_pid, 
audit_pid, 1);
audit_pid = new_pid;
audit_nlk_portid = NETLINK_CB(skb).portid;
-   audit_sock = NETLINK_CB(skb).sk;
+   audit_sock = skb-sk;
}
if (s.mask  AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(s.rate_limit);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] audit: print error message when fail to create audit socket

2013-12-16 Thread Gao feng
print the error message and then return -ENOMEM.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2a0ed0b..041b951 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1083,12 +1083,11 @@ static int __net_init audit_net_init(struct net *net)
pr_info(audit: initializing netlink socket in namespace\n);
 
aunet-nlsk = netlink_kernel_create(net, NETLINK_AUDIT, cfg);
-   if (aunet-nlsk == NULL)
-   return -ENOMEM;
-   if (!aunet-nlsk)
+   if (aunet-nlsk == NULL) {
audit_panic(cannot initialize netlink socket in namespace);
-   else
-   aunet-nlsk-sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
+   return -ENOMEM;
+   }
+   aunet-nlsk-sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
return 0;
 }
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-15 Thread Gao feng
On 12/11/2013 04:36 AM, Serge E. Hallyn wrote:
> Quoting Eric Paris (epa...@redhat.com):
>> On Tue, 2013-12-10 at 10:51 -0600, Serge Hallyn wrote:
>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>> On 12/10/2013 02:26 AM, Serge Hallyn wrote:
>>>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>>>> On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
>>>>>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On 10/24/2013 03:31 PM, Gao feng wrote:
>>>>>>>>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>>>>>>>>
>>>>>>>>> The main target of this patchset is allowing user in audit
>>>>>>>>> namespace to generate the USER_MSG type of audit message,
>>>>>>>>> some userspace tools need to generate audit message, or
>>>>>>>>> these tools will broken.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I really need this feature, right now,some process such as
>>>>>>>> logind are broken in container becase we leak of this feature.
>>>>>>>
>>>>>>> Your set doesn't address loginuid though right?  How exactly do you
>>>>>>> expect to do that?  If user violates MAC policy and audit msg is
>>>>>>> sent to init user ns by mac subsys, you need the loginuid from
>>>>>>> init_audit_ns.  where will that be stored if you allow updates
>>>>>>> of loginuid in auditns?
>>>>>>>
>>>>>> This patchset doesn't include the loginuid part.
>>>>>>
>>>>>> the loginuid is stored in task as before.
>>>>>> In my opinion, when task creates a new audit namespace, this task's
>>>>>> loginuid will be reset to zero, so the children tasks can set their
>>>>>> loginuid. Does this change break the MAC?
>>>>>
>>>>> I think so, yes.  In an LSPP selinux environment, if the task
>>>>> manages to trigger an selinux deny rule which is audited, then
>>>>> the loginuid must make sense on the host.  Now presumably it
>>>>> will get translated to the mapped host uid, and we can figure
>>>>> out the host uid owning it through /etc/subuid.  But that adds
>>>>> /etc/subuid as a new part of the TCB without any warning 
>>>>> So in that sense, for LSPP, it breaks it.
>>>>>
>>>>
>>>> Looks like my opinion is incorrect.
>>>>
>>>> In the audit-next tree, Eric added a new audit feature to allow privileged
>>>> user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
>>>> is disabled, the privileged user can reset/set the loginuid of task. I
>>>> think this way is safe since only privileged user can do the change.
>>>>
>>>> So I will not change the loginuid part.
>>>>
>>>> Thanks for your information Serge :)
>>>
>>> Unfortunately this makes the patchset much less compelling :)  The
>>> problem I was looking into is that a container running in a user
>>> namespace cannot (bc he has ns_capable(CAP_AUDIT_*) but not
>>> capable(CAP_AUDIT_*)) set loginuids at all.
>>>
>>> Which from an LSPP pov is correct;  which is why I was hoping you were
>>> going to have the audit namespaces be hierarchical, with a task in a
>>> level 2 audit ns having two loginuids - one in his own auditns, and
>>> one in the initial one.
>>
>> Right now user namespace + audit is just total crud.  We all know
>> this...  (I'm not sure pid is must better, but I digress)   All thoughts
>> around loginuid in the kernel right this very moment only make sense in
>> the initial user namespace and all permission checks are in the initial
>> user namespace as well.
>>
>> I think I'm a proponent of the hierarchical approach to audit
>> namespaces.  An audit namespace would hold a reference to the
>> pid/user/whatever namespace it was created in/with.  Each audit
>> namespace should have it's own set of filter rules, etc.  Instead of
>> just storing 'loginuid' we store 'loginuid+user namespace'.   When the
> 
> So long as the kernel stores the kuid_t (which the only sane thing to
> do) that is a non-issue.
> 
>> kernel creates a record it should translate the loginuid to the
>> namespace of the audit namespace and send the record.
> 
> Yup, that should go without saying.  Use kuid_t in kernel and translate
> at the kernel-user boundary.
> 

I can implement audit namespace as a hierarchy, give per auditns a level value
and a pointer which point to parent auditns.

but for the loginuid part, I think we can implement it after we push the audit
ns into the upstream.

Is this ok?
>> It's a pretty major rewrite, but at least it makes sense.  Things like
>> AVC's might show up in multiple audit logs, but in every log they would
>> make sense to the admin of that namespace...
>>
>> But what the hell do I know...
> 
> Exactly how it would all affect selinux.  I'm happy it seems we agree.

This idea looks good to me, I will Investigate this. :)

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-15 Thread Gao feng
On 12/11/2013 04:36 AM, Serge E. Hallyn wrote:
 Quoting Eric Paris (epa...@redhat.com):
 On Tue, 2013-12-10 at 10:51 -0600, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/10/2013 02:26 AM, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Hi

 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.


 I really need this feature, right now,some process such as
 logind are broken in container becase we leak of this feature.

 Your set doesn't address loginuid though right?  How exactly do you
 expect to do that?  If user violates MAC policy and audit msg is
 sent to init user ns by mac subsys, you need the loginuid from
 init_audit_ns.  where will that be stored if you allow updates
 of loginuid in auditns?

 This patchset doesn't include the loginuid part.

 the loginuid is stored in task as before.
 In my opinion, when task creates a new audit namespace, this task's
 loginuid will be reset to zero, so the children tasks can set their
 loginuid. Does this change break the MAC?

 I think so, yes.  In an LSPP selinux environment, if the task
 manages to trigger an selinux deny rule which is audited, then
 the loginuid must make sense on the host.  Now presumably it
 will get translated to the mapped host uid, and we can figure
 out the host uid owning it through /etc/subuid.  But that adds
 /etc/subuid as a new part of the TCB without any warning shrug
 So in that sense, for LSPP, it breaks it.


 Looks like my opinion is incorrect.

 In the audit-next tree, Eric added a new audit feature to allow privileged
 user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
 is disabled, the privileged user can reset/set the loginuid of task. I
 think this way is safe since only privileged user can do the change.

 So I will not change the loginuid part.

 Thanks for your information Serge :)

 Unfortunately this makes the patchset much less compelling :)  The
 problem I was looking into is that a container running in a user
 namespace cannot (bc he has ns_capable(CAP_AUDIT_*) but not
 capable(CAP_AUDIT_*)) set loginuids at all.

 Which from an LSPP pov is correct;  which is why I was hoping you were
 going to have the audit namespaces be hierarchical, with a task in a
 level 2 audit ns having two loginuids - one in his own auditns, and
 one in the initial one.

 Right now user namespace + audit is just total crud.  We all know
 this...  (I'm not sure pid is must better, but I digress)   All thoughts
 around loginuid in the kernel right this very moment only make sense in
 the initial user namespace and all permission checks are in the initial
 user namespace as well.

 I think I'm a proponent of the hierarchical approach to audit
 namespaces.  An audit namespace would hold a reference to the
 pid/user/whatever namespace it was created in/with.  Each audit
 namespace should have it's own set of filter rules, etc.  Instead of
 just storing 'loginuid' we store 'loginuid+user namespace'.   When the
 
 So long as the kernel stores the kuid_t (which the only sane thing to
 do) that is a non-issue.
 
 kernel creates a record it should translate the loginuid to the
 namespace of the audit namespace and send the record.
 
 Yup, that should go without saying.  Use kuid_t in kernel and translate
 at the kernel-user boundary.
 

I can implement audit namespace as a hierarchy, give per auditns a level value
and a pointer which point to parent auditns.

but for the loginuid part, I think we can implement it after we push the audit
ns into the upstream.

Is this ok?
 It's a pretty major rewrite, but at least it makes sense.  Things like
 AVC's might show up in multiple audit logs, but in every log they would
 make sense to the admin of that namespace...

 But what the hell do I know...
 
 Exactly how it would all affect selinux.  I'm happy it seems we agree.

This idea looks good to me, I will Investigate this. :)

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-10 Thread Gao feng
On 12/10/2013 02:26 AM, Serge Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>> Hi
>>>>
>>>> On 10/24/2013 03:31 PM, Gao feng wrote:
>>>>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>>>>
>>>>> The main target of this patchset is allowing user in audit
>>>>> namespace to generate the USER_MSG type of audit message,
>>>>> some userspace tools need to generate audit message, or
>>>>> these tools will broken.
>>>>>
>>>>
>>>> I really need this feature, right now,some process such as
>>>> logind are broken in container becase we leak of this feature.
>>>
>>> Your set doesn't address loginuid though right?  How exactly do you
>>> expect to do that?  If user violates MAC policy and audit msg is
>>> sent to init user ns by mac subsys, you need the loginuid from
>>> init_audit_ns.  where will that be stored if you allow updates
>>> of loginuid in auditns?
>>>
>> This patchset doesn't include the loginuid part.
>>
>> the loginuid is stored in task as before.
>> In my opinion, when task creates a new audit namespace, this task's
>> loginuid will be reset to zero, so the children tasks can set their
>> loginuid. Does this change break the MAC?
> 
> I think so, yes.  In an LSPP selinux environment, if the task
> manages to trigger an selinux deny rule which is audited, then
> the loginuid must make sense on the host.  Now presumably it
> will get translated to the mapped host uid, and we can figure
> out the host uid owning it through /etc/subuid.  But that adds
> /etc/subuid as a new part of the TCB without any warning 
> So in that sense, for LSPP, it breaks it.
> 

Looks like my opinion is incorrect.

In the audit-next tree, Eric added a new audit feature to allow privileged
user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
is disabled, the privileged user can reset/set the loginuid of task. I
think this way is safe since only privileged user can do the change.

So I will not change the loginuid part.

Thanks for your information Serge :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-10 Thread Gao feng
On 12/10/2013 02:26 AM, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Hi

 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.


 I really need this feature, right now,some process such as
 logind are broken in container becase we leak of this feature.

 Your set doesn't address loginuid though right?  How exactly do you
 expect to do that?  If user violates MAC policy and audit msg is
 sent to init user ns by mac subsys, you need the loginuid from
 init_audit_ns.  where will that be stored if you allow updates
 of loginuid in auditns?

 This patchset doesn't include the loginuid part.

 the loginuid is stored in task as before.
 In my opinion, when task creates a new audit namespace, this task's
 loginuid will be reset to zero, so the children tasks can set their
 loginuid. Does this change break the MAC?
 
 I think so, yes.  In an LSPP selinux environment, if the task
 manages to trigger an selinux deny rule which is audited, then
 the loginuid must make sense on the host.  Now presumably it
 will get translated to the mapped host uid, and we can figure
 out the host uid owning it through /etc/subuid.  But that adds
 /etc/subuid as a new part of the TCB without any warning shrug
 So in that sense, for LSPP, it breaks it.
 

Looks like my opinion is incorrect.

In the audit-next tree, Eric added a new audit feature to allow privileged
user to disable AUDIT_LOGINUID_IMMUTABLE. after AUDIT_LOGINUID_IMMUTABLE
is disabled, the privileged user can reset/set the loginuid of task. I
think this way is safe since only privileged user can do the change.

So I will not change the loginuid part.

Thanks for your information Serge :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/20] audit: add new message type AUDIT_CREATE_NS

2013-12-09 Thread Gao feng
On 12/10/2013 01:53 AM, Serge Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> On 12/07/2013 06:10 AM, Serge E. Hallyn wrote:
>>> Quoting Gao feng (gaof...@cn.fujitsu.com):
>>>> Since there is no more place for flags of clone system call.
>>>> we need to find a way to create audit namespace.
>>>>
>>>> this patch add a new type of message AUDIT_CREATE_NS.
>>>> user space can create new audit namespace through
>>>> netlink.
>>>>
>>>> Right now, The privileged user in user namespace is allowed
>>>> to create audit namespace. it means the unprivileged user can
>>>> create an user namespace and then create audit namespace.
>>>>
>>>> Looks like it is not safe, but even the unprivileged user can
>>>> create audit namespace, it can do no harm to the host. un-init
>>>> audit namespace cann't effect the host.
>>>>
>>>> In the follow patches, the audit_backlog_limit will be per
>>>> audit namesapace, but only the privileged user has rights to
>>>> modify it. and the default value of audit_backlog_limit for
>>>> uninit audit namespace will be set to 0.
>>>>
>>>> And the audit_rate_limit will be limited too.
>>>>
>>>> Signed-off-by: Gao feng 
>>>> ---
>>>>  include/linux/audit_namespace.h |  7 +++
>>>>  include/uapi/linux/audit.h  |  1 +
>>>>  kernel/audit.c  | 22 ++
>>>>  kernel/audit_namespace.c| 29 +
>>>>  4 files changed, 59 insertions(+)
>>>>
>>>> diff --git a/include/linux/audit_namespace.h 
>>>> b/include/linux/audit_namespace.h
>>>> index 79a9b78..b17f052 100644
>>>> --- a/include/linux/audit_namespace.h
>>>> +++ b/include/linux/audit_namespace.h
>>>> @@ -54,6 +54,8 @@ void put_audit_ns(struct audit_namespace *ns)
>>>>rcu_read_unlock();
>>>>}
>>>>  }
>>>> +
>>>> +extern int unshare_audit_namespace(void);
>>>>  #else
>>>>  static inline
>>>>  struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
>>>> @@ -66,6 +68,11 @@ void put_audit_ns(struct audit_namespace *ns)
>>>>  {
>>>>  
>>>>  }
>>>> +
>>>> +static inline int unshare_audit_namespace()
>>>> +{
>>>> +  return -EINVAL;
>>>> +}
>>>>  #endif
>>>>  
>>>>  static inline struct
>>>> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
>>>> index 75cef3f..877d509 100644
>>>> --- a/include/uapi/linux/audit.h
>>>> +++ b/include/uapi/linux/audit.h
>>>> @@ -68,6 +68,7 @@
>>>>  #define AUDIT_MAKE_EQUIV  1015/* Append to watched tree */
>>>>  #define AUDIT_TTY_GET 1016/* Get TTY auditing status */
>>>>  #define AUDIT_TTY_SET 1017/* Set TTY auditing status */
>>>> +#define AUDIT_CREATE_NS   1018/* Create new audit namespace */
>>>>  
>>>>  #define AUDIT_FIRST_USER_MSG  1100/* Userspace messages mostly 
>>>> uninteresting to kernel */
>>>>  #define AUDIT_USER_AVC1107/* We filter this differently */
>>>> diff --git a/kernel/audit.c b/kernel/audit.c
>>>> index c4d4291..86212d3 100644
>>>> --- a/kernel/audit.c
>>>> +++ b/kernel/audit.c
>>>> @@ -596,6 +596,12 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
>>>> msg_type)
>>>>!capable(CAP_AUDIT_CONTROL))
>>>>err = -EPERM;
>>>>break;
>>>> +  case AUDIT_CREATE_NS:
>>>> +  /* Allow privileged user in user namespace to
>>>> +   * create audit namespace */
>>>> +  if (!ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
>>>> +  err = -EPERM;
>>>> +  break;
>>>>case AUDIT_USER:
>>>>case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
>>>>case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
>>>> @@ -735,6 +741,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
>>>> struct nlmsghdr *nlh)
>>>>if (status_get->mask & AUDIT_STATUS_BACKLOG_LIMIT)
>>>>err = 
>>>> audit_set_bac

Re: [PATCH 18/20] audit: add new message type AUDIT_CREATE_NS

2013-12-09 Thread Gao feng
On 12/10/2013 01:53 AM, Serge Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 On 12/07/2013 06:10 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Since there is no more place for flags of clone system call.
 we need to find a way to create audit namespace.

 this patch add a new type of message AUDIT_CREATE_NS.
 user space can create new audit namespace through
 netlink.

 Right now, The privileged user in user namespace is allowed
 to create audit namespace. it means the unprivileged user can
 create an user namespace and then create audit namespace.

 Looks like it is not safe, but even the unprivileged user can
 create audit namespace, it can do no harm to the host. un-init
 audit namespace cann't effect the host.

 In the follow patches, the audit_backlog_limit will be per
 audit namesapace, but only the privileged user has rights to
 modify it. and the default value of audit_backlog_limit for
 uninit audit namespace will be set to 0.

 And the audit_rate_limit will be limited too.

 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 ---
  include/linux/audit_namespace.h |  7 +++
  include/uapi/linux/audit.h  |  1 +
  kernel/audit.c  | 22 ++
  kernel/audit_namespace.c| 29 +
  4 files changed, 59 insertions(+)

 diff --git a/include/linux/audit_namespace.h 
 b/include/linux/audit_namespace.h
 index 79a9b78..b17f052 100644
 --- a/include/linux/audit_namespace.h
 +++ b/include/linux/audit_namespace.h
 @@ -54,6 +54,8 @@ void put_audit_ns(struct audit_namespace *ns)
rcu_read_unlock();
}
  }
 +
 +extern int unshare_audit_namespace(void);
  #else
  static inline
  struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
 @@ -66,6 +68,11 @@ void put_audit_ns(struct audit_namespace *ns)
  {
  
  }
 +
 +static inline int unshare_audit_namespace()
 +{
 +  return -EINVAL;
 +}
  #endif
  
  static inline struct
 diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
 index 75cef3f..877d509 100644
 --- a/include/uapi/linux/audit.h
 +++ b/include/uapi/linux/audit.h
 @@ -68,6 +68,7 @@
  #define AUDIT_MAKE_EQUIV  1015/* Append to watched tree */
  #define AUDIT_TTY_GET 1016/* Get TTY auditing status */
  #define AUDIT_TTY_SET 1017/* Set TTY auditing status */
 +#define AUDIT_CREATE_NS   1018/* Create new audit namespace */
  
  #define AUDIT_FIRST_USER_MSG  1100/* Userspace messages mostly 
 uninteresting to kernel */
  #define AUDIT_USER_AVC1107/* We filter this differently */
 diff --git a/kernel/audit.c b/kernel/audit.c
 index c4d4291..86212d3 100644
 --- a/kernel/audit.c
 +++ b/kernel/audit.c
 @@ -596,6 +596,12 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
 msg_type)
!capable(CAP_AUDIT_CONTROL))
err = -EPERM;
break;
 +  case AUDIT_CREATE_NS:
 +  /* Allow privileged user in user namespace to
 +   * create audit namespace */
 +  if (!ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
 +  err = -EPERM;
 +  break;
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
 @@ -735,6 +741,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
 struct nlmsghdr *nlh)
if (status_get-mask  AUDIT_STATUS_BACKLOG_LIMIT)
err = 
 audit_set_backlog_limit(status_get-backlog_limit);
break;
 +  case AUDIT_CREATE_NS:
 +  err = unshare_audit_namespace();
 +
 +  if (audit_enabled == AUDIT_OFF)
 +  break;
 +
 +  ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CREATE_NS);
 +  if (ab) {
 +  audit_log_format(ab, Create audit namespace);
 +  audit_log_session_info(ab);
 +  audit_log_task_context(ab);
 +  audit_log_format(ab, res=%d, err ? 0 : 1);
 +  audit_log_end_ns(ns, ab);
 +  }
 +
 +  break;
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
 diff --git a/kernel/audit_namespace.c b/kernel/audit_namespace.c
 index 6d9cb8f..28c608e 100644
 --- a/kernel/audit_namespace.c
 +++ b/kernel/audit_namespace.c
 @@ -6,3 +6,32 @@ struct audit_namespace init_audit_ns = {
.user_ns = init_user_ns,
  };
  EXPORT_SYMBOL_GPL(init_audit_ns);
 +
 +int unshare_audit_namespace(void)
 +{
 +  struct task_struct *tsk = current;
 +  struct audit_namespace *new_audit = NULL;
 +  struct nsproxy *new_nsp;
 +
 +  new_audit = kzalloc(sizeof(struct audit_namespace), GFP_KERNEL);
 +  if (!new_audit)
 +  return -ENOMEM;
 +
 +  skb_queue_head_init(new_audit-queue);
 +  skb_queue_head_init(new_audit-hold_queue);
 +  init_waitqueue_head(new_audit-kauditd_wait);
 +  init_waitqueue_head(new_audit

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-08 Thread Gao feng
Hi Serge,

Thanks for your comments!

On 12/07/2013 05:31 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>
>> The main target of this patchset is allowing user in audit
>> namespace to generate the USER_MSG type of audit message,
>> some userspace tools need to generate audit message, or
>> these tools will broken.
>>
>> And the login process in container may want to setup
>> /proc//loginuid, right now this value is unalterable
>> once it being set. this will also broke the login problem
>> in container. After this patchset, we can reset this loginuid
>> to zero if task is running in a new audit namespace.
>>
>> Same with v1 patchset, in this patchset, only the privileged
>> user in init_audit_ns and init_user_ns has rights to
>> add/del audit rules. and these rules are gloabl. all
>> audit namespace will comply with the rules.
>>
>> Compared with v1, v2 patch has some big changes.
>> 1, the audit namespace is not assigned to user namespace.
>>since there is no available bit of flags for clone, we
>>create audit namespace through netlink, patch[18/20]
>>introduces a new audit netlink type AUDIT_CREATE_NS.
>>the privileged user in userns has rights to create a
>>audit namespace, it means the unprivileged user can
>>create auditns through create userns first. In order
>>to prevent them from doing harm to host, the default
>>audit_backlog_limit of un-init-audit-ns is zero(means
>>audit is unavailable in audit namespace). and it can't
>>be changed in auditns through netlink.
> 
> So the unprivileged user can create an audit-ns, but can't
> then actually send any messages there?  I guess setting it
> to something small would just be hacky?

Yes, if unprivileged user wants to send audit message, he should
ask privileged user to setup the audit_backlog_limit for him.

I know it's a little of hack, but I don't have good idea :(

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-08 Thread Gao feng
On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> Hi
>>
>> On 10/24/2013 03:31 PM, Gao feng wrote:
>>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>>
>>> The main target of this patchset is allowing user in audit
>>> namespace to generate the USER_MSG type of audit message,
>>> some userspace tools need to generate audit message, or
>>> these tools will broken.
>>>
>>
>> I really need this feature, right now,some process such as
>> logind are broken in container becase we leak of this feature.
> 
> Your set doesn't address loginuid though right?  How exactly do you
> expect to do that?  If user violates MAC policy and audit msg is
> sent to init user ns by mac subsys, you need the loginuid from
> init_audit_ns.  where will that be stored if you allow updates
> of loginuid in auditns?
> 
This patchset doesn't include the loginuid part.

the loginuid is stored in task as before.
In my opinion, when task creates a new audit namespace, this task's
loginuid will be reset to zero, so the children tasks can set their
loginuid. Does this change break the MAC?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/20] audit: add new message type AUDIT_CREATE_NS

2013-12-08 Thread Gao feng
On 12/07/2013 06:10 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> Since there is no more place for flags of clone system call.
>> we need to find a way to create audit namespace.
>>
>> this patch add a new type of message AUDIT_CREATE_NS.
>> user space can create new audit namespace through
>> netlink.
>>
>> Right now, The privileged user in user namespace is allowed
>> to create audit namespace. it means the unprivileged user can
>> create an user namespace and then create audit namespace.
>>
>> Looks like it is not safe, but even the unprivileged user can
>> create audit namespace, it can do no harm to the host. un-init
>> audit namespace cann't effect the host.
>>
>> In the follow patches, the audit_backlog_limit will be per
>> audit namesapace, but only the privileged user has rights to
>> modify it. and the default value of audit_backlog_limit for
>> uninit audit namespace will be set to 0.
>>
>> And the audit_rate_limit will be limited too.
>>
>> Signed-off-by: Gao feng 
>> ---
>>  include/linux/audit_namespace.h |  7 +++
>>  include/uapi/linux/audit.h  |  1 +
>>  kernel/audit.c  | 22 ++
>>  kernel/audit_namespace.c| 29 +
>>  4 files changed, 59 insertions(+)
>>
>> diff --git a/include/linux/audit_namespace.h 
>> b/include/linux/audit_namespace.h
>> index 79a9b78..b17f052 100644
>> --- a/include/linux/audit_namespace.h
>> +++ b/include/linux/audit_namespace.h
>> @@ -54,6 +54,8 @@ void put_audit_ns(struct audit_namespace *ns)
>>  rcu_read_unlock();
>>  }
>>  }
>> +
>> +extern int unshare_audit_namespace(void);
>>  #else
>>  static inline
>>  struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
>> @@ -66,6 +68,11 @@ void put_audit_ns(struct audit_namespace *ns)
>>  {
>>  
>>  }
>> +
>> +static inline int unshare_audit_namespace()
>> +{
>> +return -EINVAL;
>> +}
>>  #endif
>>  
>>  static inline struct
>> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
>> index 75cef3f..877d509 100644
>> --- a/include/uapi/linux/audit.h
>> +++ b/include/uapi/linux/audit.h
>> @@ -68,6 +68,7 @@
>>  #define AUDIT_MAKE_EQUIV1015/* Append to watched tree */
>>  #define AUDIT_TTY_GET   1016/* Get TTY auditing status */
>>  #define AUDIT_TTY_SET   1017/* Set TTY auditing status */
>> +#define AUDIT_CREATE_NS 1018/* Create new audit namespace */
>>  
>>  #define AUDIT_FIRST_USER_MSG1100/* Userspace messages mostly 
>> uninteresting to kernel */
>>  #define AUDIT_USER_AVC  1107/* We filter this differently */
>> diff --git a/kernel/audit.c b/kernel/audit.c
>> index c4d4291..86212d3 100644
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -596,6 +596,12 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
>> msg_type)
>>  !capable(CAP_AUDIT_CONTROL))
>>  err = -EPERM;
>>  break;
>> +case AUDIT_CREATE_NS:
>> +/* Allow privileged user in user namespace to
>> + * create audit namespace */
>> +if (!ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
>> +err = -EPERM;
>> +break;
>>  case AUDIT_USER:
>>  case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
>>  case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
>> @@ -735,6 +741,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
>> struct nlmsghdr *nlh)
>>  if (status_get->mask & AUDIT_STATUS_BACKLOG_LIMIT)
>>  err = 
>> audit_set_backlog_limit(status_get->backlog_limit);
>>  break;
>> +case AUDIT_CREATE_NS:
>> +err = unshare_audit_namespace();
>> +
>> +if (audit_enabled == AUDIT_OFF)
>> +break;
>> +
>> +ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CREATE_NS);
>> +if (ab) {
>> +audit_log_format(ab, "Create audit namespace");
>> +audit_log_session_info(ab);
>> +audit_log_task_context(ab);
>> +audit_log_format(ab, "res=%d", err ? 0 : 1);
>> +audit_log_end_ns(ns, ab);
>> +}
>> +
>> +break;

Re: [PATCH 16/20] audit: allow GET, SET, USER MSG operations in audit namespace

2013-12-08 Thread Gao feng
On 12/07/2013 06:00 AM, Serge E. Hallyn wrote:
> Quoting Gao feng (gaof...@cn.fujitsu.com):
>> 1, remove the permission check of pid namespace. it's no reason
>>to deny un-init pid namespace to operate audit subsystem.
>>
>> 2, only allow init user namespace and init audit namespace to
>>operate list/add/del rule, tty set, trim, make equiv operations.
>>
>> 3, allow audit namespace to get/set audit configuration, send
>>    userspace audit message.
>>
>> Signed-off-by: Gao feng 
>> ---
>>  kernel/audit.c | 13 ++---
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/audit.c b/kernel/audit.c
>> index 095f54d..c4d4291 100644
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -573,11 +573,7 @@ out:
>>  static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
>>  {
>>  int err = 0;
>> -
>> -/* Only support the initial namespaces for now. */
>> -if ((current_user_ns() != _user_ns) ||
>> -(task_active_pid_ns(current) != _pid_ns))
>> -return -EPERM;
>> +struct audit_namespace *ns = current->nsproxy->audit_ns;
>>  
>>  switch (msg_type) {
>>  case AUDIT_LIST:
>> @@ -586,6 +582,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
>> msg_type)
>>  return -EOPNOTSUPP;
>>  case AUDIT_GET:
>>  case AUDIT_SET:
>> +break;
> 
> So, these AUDIT_SET and AUDIT_GET go from requiring CAP_AUDIT_CONTROL
> to not needing any privs at all?
> 

My mistake, there should be a check such as
ns_capable(ns, CAP_AUDIT_CONTROL).

will fix in next version.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/20] audit: allow GET, SET, USER MSG operations in audit namespace

2013-12-08 Thread Gao feng
On 12/07/2013 06:00 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 1, remove the permission check of pid namespace. it's no reason
to deny un-init pid namespace to operate audit subsystem.

 2, only allow init user namespace and init audit namespace to
operate list/add/del rule, tty set, trim, make equiv operations.

 3, allow audit namespace to get/set audit configuration, send
userspace audit message.

 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 ---
  kernel/audit.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

 diff --git a/kernel/audit.c b/kernel/audit.c
 index 095f54d..c4d4291 100644
 --- a/kernel/audit.c
 +++ b/kernel/audit.c
 @@ -573,11 +573,7 @@ out:
  static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
  {
  int err = 0;
 -
 -/* Only support the initial namespaces for now. */
 -if ((current_user_ns() != init_user_ns) ||
 -(task_active_pid_ns(current) != init_pid_ns))
 -return -EPERM;
 +struct audit_namespace *ns = current-nsproxy-audit_ns;
  
  switch (msg_type) {
  case AUDIT_LIST:
 @@ -586,6 +582,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
 msg_type)
  return -EOPNOTSUPP;
  case AUDIT_GET:
  case AUDIT_SET:
 +break;
 
 So, these AUDIT_SET and AUDIT_GET go from requiring CAP_AUDIT_CONTROL
 to not needing any privs at all?
 

My mistake, there should be a check such as
ns_capable(ns, CAP_AUDIT_CONTROL).

will fix in next version.

Thanks!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/20] audit: add new message type AUDIT_CREATE_NS

2013-12-08 Thread Gao feng
On 12/07/2013 06:10 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Since there is no more place for flags of clone system call.
 we need to find a way to create audit namespace.

 this patch add a new type of message AUDIT_CREATE_NS.
 user space can create new audit namespace through
 netlink.

 Right now, The privileged user in user namespace is allowed
 to create audit namespace. it means the unprivileged user can
 create an user namespace and then create audit namespace.

 Looks like it is not safe, but even the unprivileged user can
 create audit namespace, it can do no harm to the host. un-init
 audit namespace cann't effect the host.

 In the follow patches, the audit_backlog_limit will be per
 audit namesapace, but only the privileged user has rights to
 modify it. and the default value of audit_backlog_limit for
 uninit audit namespace will be set to 0.

 And the audit_rate_limit will be limited too.

 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 ---
  include/linux/audit_namespace.h |  7 +++
  include/uapi/linux/audit.h  |  1 +
  kernel/audit.c  | 22 ++
  kernel/audit_namespace.c| 29 +
  4 files changed, 59 insertions(+)

 diff --git a/include/linux/audit_namespace.h 
 b/include/linux/audit_namespace.h
 index 79a9b78..b17f052 100644
 --- a/include/linux/audit_namespace.h
 +++ b/include/linux/audit_namespace.h
 @@ -54,6 +54,8 @@ void put_audit_ns(struct audit_namespace *ns)
  rcu_read_unlock();
  }
  }
 +
 +extern int unshare_audit_namespace(void);
  #else
  static inline
  struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
 @@ -66,6 +68,11 @@ void put_audit_ns(struct audit_namespace *ns)
  {
  
  }
 +
 +static inline int unshare_audit_namespace()
 +{
 +return -EINVAL;
 +}
  #endif
  
  static inline struct
 diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
 index 75cef3f..877d509 100644
 --- a/include/uapi/linux/audit.h
 +++ b/include/uapi/linux/audit.h
 @@ -68,6 +68,7 @@
  #define AUDIT_MAKE_EQUIV1015/* Append to watched tree */
  #define AUDIT_TTY_GET   1016/* Get TTY auditing status */
  #define AUDIT_TTY_SET   1017/* Set TTY auditing status */
 +#define AUDIT_CREATE_NS 1018/* Create new audit namespace */
  
  #define AUDIT_FIRST_USER_MSG1100/* Userspace messages mostly 
 uninteresting to kernel */
  #define AUDIT_USER_AVC  1107/* We filter this differently */
 diff --git a/kernel/audit.c b/kernel/audit.c
 index c4d4291..86212d3 100644
 --- a/kernel/audit.c
 +++ b/kernel/audit.c
 @@ -596,6 +596,12 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
 msg_type)
  !capable(CAP_AUDIT_CONTROL))
  err = -EPERM;
  break;
 +case AUDIT_CREATE_NS:
 +/* Allow privileged user in user namespace to
 + * create audit namespace */
 +if (!ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
 +err = -EPERM;
 +break;
  case AUDIT_USER:
  case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
  case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
 @@ -735,6 +741,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
 struct nlmsghdr *nlh)
  if (status_get-mask  AUDIT_STATUS_BACKLOG_LIMIT)
  err = 
 audit_set_backlog_limit(status_get-backlog_limit);
  break;
 +case AUDIT_CREATE_NS:
 +err = unshare_audit_namespace();
 +
 +if (audit_enabled == AUDIT_OFF)
 +break;
 +
 +ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CREATE_NS);
 +if (ab) {
 +audit_log_format(ab, Create audit namespace);
 +audit_log_session_info(ab);
 +audit_log_task_context(ab);
 +audit_log_format(ab, res=%d, err ? 0 : 1);
 +audit_log_end_ns(ns, ab);
 +}
 +
 +break;
  case AUDIT_USER:
  case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
  case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
 diff --git a/kernel/audit_namespace.c b/kernel/audit_namespace.c
 index 6d9cb8f..28c608e 100644
 --- a/kernel/audit_namespace.c
 +++ b/kernel/audit_namespace.c
 @@ -6,3 +6,32 @@ struct audit_namespace init_audit_ns = {
  .user_ns = init_user_ns,
  };
  EXPORT_SYMBOL_GPL(init_audit_ns);
 +
 +int unshare_audit_namespace(void)
 +{
 +struct task_struct *tsk = current;
 +struct audit_namespace *new_audit = NULL;
 +struct nsproxy *new_nsp;
 +
 +new_audit = kzalloc(sizeof(struct audit_namespace), GFP_KERNEL);
 +if (!new_audit)
 +return -ENOMEM;
 +
 +skb_queue_head_init(new_audit-queue);
 +skb_queue_head_init(new_audit-hold_queue);
 +init_waitqueue_head(new_audit-kauditd_wait

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-08 Thread Gao feng
On 12/07/2013 06:12 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Hi

 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.


 I really need this feature, right now,some process such as
 logind are broken in container becase we leak of this feature.
 
 Your set doesn't address loginuid though right?  How exactly do you
 expect to do that?  If user violates MAC policy and audit msg is
 sent to init user ns by mac subsys, you need the loginuid from
 init_audit_ns.  where will that be stored if you allow updates
 of loginuid in auditns?
 
This patchset doesn't include the loginuid part.

the loginuid is stored in task as before.
In my opinion, when task creates a new audit namespace, this task's
loginuid will be reset to zero, so the children tasks can set their
loginuid. Does this change break the MAC?

Thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-08 Thread Gao feng
Hi Serge,

Thanks for your comments!

On 12/07/2013 05:31 AM, Serge E. Hallyn wrote:
 Quoting Gao feng (gaof...@cn.fujitsu.com):
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.

 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.

 Same with v1 patchset, in this patchset, only the privileged
 user in init_audit_ns and init_user_ns has rights to
 add/del audit rules. and these rules are gloabl. all
 audit namespace will comply with the rules.

 Compared with v1, v2 patch has some big changes.
 1, the audit namespace is not assigned to user namespace.
since there is no available bit of flags for clone, we
create audit namespace through netlink, patch[18/20]
introduces a new audit netlink type AUDIT_CREATE_NS.
the privileged user in userns has rights to create a
audit namespace, it means the unprivileged user can
create auditns through create userns first. In order
to prevent them from doing harm to host, the default
audit_backlog_limit of un-init-audit-ns is zero(means
audit is unavailable in audit namespace). and it can't
be changed in auditns through netlink.
 
 So the unprivileged user can create an audit-ns, but can't
 then actually send any messages there?  I guess setting it
 to something small would just be hacky?

Yes, if unprivileged user wants to send audit message, he should
ask privileged user to setup the audit_backlog_limit for him.

I know it's a little of hack, but I don't have good idea :(

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-04 Thread Gao feng
Hi

On 10/24/2013 03:31 PM, Gao feng wrote:
> Here is the v1 patchset: http://lwn.net/Articles/549546/
> 
> The main target of this patchset is allowing user in audit
> namespace to generate the USER_MSG type of audit message,
> some userspace tools need to generate audit message, or
> these tools will broken.
> 

I really need this feature, right now,some process such as
logind are broken in container becase we leak of this feature.

> And the login process in container may want to setup
> /proc//loginuid, right now this value is unalterable
> once it being set. this will also broke the login problem
> in container. After this patchset, we can reset this loginuid
> to zero if task is running in a new audit namespace.
> 

I notice this problem has been fixed in audit tree, so may be
I don't need to clean up loginuid when we create audit namespace.

And in audit tree, audit socket is per net namespace, hmm, I want
to know if anybody has a solution to allow processes to generate
USER_MSG type of audit message in un init pid/user namepsace.

Any idea? Eric,Steve,Richard?

Thanks
Gao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-12-04 Thread Gao feng
Hi

On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/
 
 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.
 

I really need this feature, right now,some process such as
logind are broken in container becase we leak of this feature.

 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.
 

I notice this problem has been fixed in audit tree, so may be
I don't need to clean up loginuid when we create audit namespace.

And in audit tree, audit socket is per net namespace, hmm, I want
to know if anybody has a solution to allow processes to generate
USER_MSG type of audit message in un init pid/user namepsace.

Any idea? Eric,Steve,Richard?

Thanks
Gao
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] nsproxy: Check to make sure count is truly zero before freeing

2013-11-18 Thread Gao feng
On 11/19/2013 08:04 AM, Steven Rostedt wrote:
> 
> I'll start out saying that this email was a complete oops. I only kept
> it around for reference, as this didn't fix the bug we were seeing, and
> I used this email to just document what I initially thought.
> 

Can you describe the panic situation and the way to reproduce?
it's useful for us to find out the real problem.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] nsproxy: Check to make sure count is truly zero before freeing

2013-11-18 Thread Gao feng
On 11/19/2013 08:04 AM, Steven Rostedt wrote:
 
 I'll start out saying that this email was a complete oops. I only kept
 it around for reference, as this didn't fix the bug we were seeing, and
 I used this email to just document what I initially thought.
 

Can you describe the panic situation and the way to reproduce?
it's useful for us to find out the real problem.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/15/2013 12:54 PM, Eric W. Biederman wrote:
> Gao feng  writes:
> 
>> On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
>>> On Thu, Nov 14, 2013 at 3:10 AM, Gao feng  wrote:
>>>> On 11/13/2013 03:26 PM, Gao feng wrote:
>>>>> On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
>>>>>> Right now I would rather not have the empty directory exception than
>>>>>> remove this code.
>>>>>>
>>>>>> The test is a little trickier to write than it might otherwise be
>>>>>> because /proc and /sys tend to be slightly imperfect filesystems.
>>>>>>
>>>>>> I think the only way to really test that is to call readdir on the
>>>>>> directory itself :(  I don't like that thought.
>>>>>>
>>>>>> I don't know what I was thinking when I wrote that test but I definitely
>>>>>> goofed up.  Grr!
>>>>>>
>>>>>> I can certainly filter out any directory with nlink > 2.  That would be
>>>>>> an easy partial step forward.
>>>>>>
>>>>>> The real question though is how do I detect directories it is safe to
>>>>>> mount on where there will not be files in them.  I can't call iterate
>>>>>> with the namespace_lock held so things are a bit tricky.
>>>>>>
>>>>>
>>>>> I know this problem is not easy to be resolved. why not let the user
>>>>> make the decision?  maybe we can introduce a new mount option MS_LOCK,
>>>>> if user wants to use mount to hide something, he should use mount with
>>>>> option MS_LOCK. so the unpriviged user can't umount this filesystem and
>>>>> fail to mount the filesystem if one of it's child mount is mounted with
>>>>> MS_LOCK option otherwise he use MS_REC too.
>>>>>
>>>>
>>>> Something like this.
>>>>
>>>> From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
>>>> From: Gao feng 
>>>> Date: Wed, 13 Nov 2013 16:06:46 +0800
>>>> Subject: [PATCH] userns: introduce new mount option MS_LOCK
>>>>
>>>> After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
>>>> vfs: Lock in place mounts from more privileged users,
>>>> in userns, the mounts of child mntns which copied from
>>>> parent mntns is locked and user has no rights to umount/move
>>>> them, it's too strict.
>>>>
>>>> The core purpose of above commit is trying to prevent
>>>> unprivileged user from accessing files hidden by mount.
>>>> This patch introduces a new mount option MS_LOCK, this
>>>> gives user the capable to mount filesystem as the type
>>>> of lock if he wants to use mount to hide something.
>>>>
>>>
>>> This is bad -- if something was secure in old kernels, it needs to
>>> stay secure.  If you had MS_NOT_A_LOCK, that would be okay, but it
>>> might not solve your problem.
>>>
>>
>> what you mean old kernels here? I saw patch "vfs: Lock in place mounts from 
>> more privileged users"
>> is merged into upstream in linux 3.12-rc1, this is not very old. I think 
>> there
>> are not many userspace processes rely on this feature.
> 
> Sort of true.  Most people aren't that silly.  This feature was added to
> defend against a theoretical attack that you can use with mount
> namespaces.
> 
> In particular the scenario we are concerned with is:
> 
> Suppose the file system looks like:
> 
> Suppose there are two filesystems a and b that look like:
> 
> a:/usr/
> a:/usr/my_very_secret_file
> a:/dev/
> a:/etc/
> a:/lib/
> 
> b:/bin/
> b:/etc/
> b:/games/
> b:/include/
> b:/lib/
> b:/lib32/
> b:/local/
> b:/sbin/
> b:/share/
> b:/src/
> 
> And filesystem b is mounted on a:/usr hiding a:/usr/my_very_secret_file
> 
> So the filesystem looks like:
> 
> /usr/
> /usr/bin/
> /usr/etc/
> /usr/games/
> /usr/include/
> /usr/lib/
> /usr/lib32/
> /usr/local/
> /usr/sbin/
> /usr/share/
> /usr/src/
> /dev/
> /etc/
> /lib/
> 
> Without locking mounts into place an unprivileged user can clone the
> mount namespace and do "umount /usr" and read /usr/my_very_secret_file.
> 
> Most systems don't hide sensitive things with mounts but it is very
> possible and guarding against is fairly cheap and easy.  And while a
> little annoying it should not be a large impediment to unpri

Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
> On Thu, Nov 14, 2013 at 3:10 AM, Gao feng  wrote:
>> On 11/13/2013 03:26 PM, Gao feng wrote:
>>> On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
>>>> Right now I would rather not have the empty directory exception than
>>>> remove this code.
>>>>
>>>> The test is a little trickier to write than it might otherwise be
>>>> because /proc and /sys tend to be slightly imperfect filesystems.
>>>>
>>>> I think the only way to really test that is to call readdir on the
>>>> directory itself :(  I don't like that thought.
>>>>
>>>> I don't know what I was thinking when I wrote that test but I definitely
>>>> goofed up.  Grr!
>>>>
>>>> I can certainly filter out any directory with nlink > 2.  That would be
>>>> an easy partial step forward.
>>>>
>>>> The real question though is how do I detect directories it is safe to
>>>> mount on where there will not be files in them.  I can't call iterate
>>>> with the namespace_lock held so things are a bit tricky.
>>>>
>>>
>>> I know this problem is not easy to be resolved. why not let the user
>>> make the decision?  maybe we can introduce a new mount option MS_LOCK,
>>> if user wants to use mount to hide something, he should use mount with
>>> option MS_LOCK. so the unpriviged user can't umount this filesystem and
>>> fail to mount the filesystem if one of it's child mount is mounted with
>>> MS_LOCK option otherwise he use MS_REC too.
>>>
>>
>> Something like this.
>>
>> From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
>> From: Gao feng 
>> Date: Wed, 13 Nov 2013 16:06:46 +0800
>> Subject: [PATCH] userns: introduce new mount option MS_LOCK
>>
>> After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
>> vfs: Lock in place mounts from more privileged users,
>> in userns, the mounts of child mntns which copied from
>> parent mntns is locked and user has no rights to umount/move
>> them, it's too strict.
>>
>> The core purpose of above commit is trying to prevent
>> unprivileged user from accessing files hidden by mount.
>> This patch introduces a new mount option MS_LOCK, this
>> gives user the capable to mount filesystem as the type
>> of lock if he wants to use mount to hide something.
>>
> 
> This is bad -- if something was secure in old kernels, it needs to
> stay secure.  If you had MS_NOT_A_LOCK, that would be okay, but it
> might not solve your problem.
> 

what you mean old kernels here? I saw patch "vfs: Lock in place mounts from 
more privileged users"
is merged into upstream in linux 3.12-rc1, this is not very old. I think there
are not many userspace processes rely on this feature.

If user think host needs to be secure, he should use MS_LOCK to mount 
filesystem.
we can't make decison for user.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] userns: allow privileged user to operate locked mount

2013-11-14 Thread Gao feng
On 11/15/2013 07:50 AM, Eric W. Biederman wrote:
> Gao feng  writes:
> 
>> Privileged user should have rights to mount/umount/move
>> these even locked mount.
> 
> Hmm. This is pretty much a can't happen case, as the only exist in mount
> namespaces where the global root isn't the root.  How are you getting
> into this situation?  Using setns() ?
> 

Before, priviged user can use setns to set his mount namespace to the
container's mount namespace, and change container's mount directly.
this patch just gives back host the control of container.

> Why would we even care?
> 
> As implemented this patch does not handle nested user namespaces and
> that really worries me at a semantic level.
> 
> We don't want to design cases where we can create containers in
> containers.
> 
> Eric
> 
> 
>> Signed-off-by: Gao feng 
>> ---
>>  fs/namespace.c | 14 ++
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index da5c494..7097fc7 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -1297,6 +1297,11 @@ static inline bool may_mount(void)
>>  return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
>>  }
>>  
>> +static inline bool may_mount_lock(struct mount *mnt)
>> +{
>> +return !(mnt->mnt.mnt_flags & MNT_LOCKED) || capable(CAP_SYS_ADMIN);
>> +}
>> +
>>  /*
>>   * Now umount can handle mount points as well as block devices.
>>   * This is important for filesystems which use unnamed block devices.
>> @@ -1330,7 +1335,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, 
>> flags)
>>  goto dput_and_out;
>>  if (!check_mnt(mnt))
>>  goto dput_and_out;
>> -if (mnt->mnt.mnt_flags & MNT_LOCKED)
>> +if (!may_mount_lock(mnt))
>>  goto dput_and_out;
>>  
>>  retval = do_umount(mnt, flags);
>> @@ -1768,7 +1773,8 @@ static int do_loopback(struct path *path, const char 
>> *old_name,
>>  if (!check_mnt(parent) || !check_mnt(old))
>>  goto out2;
>>  
>> -if (!recurse && has_locked_children(old, old_path.dentry))
>> +if (!recurse && has_locked_children(old, old_path.dentry) &&
>> +!capable(CAP_SYS_ADMIN))
>>  goto out2;
>>  
>>  if (recurse)
>> @@ -1895,7 +1901,7 @@ static int do_move_mount(struct path *path, const char 
>> *old_name)
>>  if (!check_mnt(p) || !check_mnt(old))
>>  goto out1;
>>  
>> -if (old->mnt.mnt_flags & MNT_LOCKED)
>> +if (!may_mount_lock(old))
>>  goto out1;
>>  
>>  err = -EINVAL;
>> @@ -2679,7 +2685,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, 
>> new_root,
>>  goto out4;
>>  if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
>>  goto out4;
>> -if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
>> +if (!may_mount_lock(new_mnt))
>>  goto out4;
>>  error = -ENOENT;
>>  if (d_unlinked(new.dentry))
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/13/2013 03:26 PM, Gao feng wrote:
> On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
>> Gao feng  writes:
>>
>>> On 11/02/2013 02:06 PM, Gao feng wrote:
>>>> Hi Eric,
>>>>
>>>> On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
>>>>>
>>>>> Rely on the fact that another flavor of the filesystem is already
>>>>> mounted and do not rely on state in the user namespace.
>>>>>
>>>>> Verify that the mounted filesystem is not covered in any significant
>>>>> way.  I would love to verify that the previously mounted filesystem
>>>>> has no mounts on top but there are at least the directories
>>>>> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
>>>>> for other filesystems to mount on top of.
>>>>>
>>>>> Refactor the test into a function named fs_fully_visible and call that
>>>>> function from the mount routines of proc and sysfs.  This makes this
>>>>> test local to the filesystems involved and the results current of when
>>>>> the mounts take place, removing a weird threading of the user
>>>>> namespace, the mount namespace and the filesystems themselves.
>>>>>
>>>>> Signed-off-by: "Eric W. Biederman" 
>>>>> ---
>>>>>  fs/namespace.c |   37 
>>>>> +
>>>>>  fs/proc/root.c |7 +--
>>>>>  fs/sysfs/mount.c   |3 ++-
>>>>>  include/linux/fs.h |1 +
>>>>>  include/linux/user_namespace.h |4 
>>>>>  kernel/user.c  |2 --
>>>>>  kernel/user_namespace.c|2 --
>>>>>  7 files changed, 33 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>>>> index 64627f8..877e427 100644
>>>>> --- a/fs/namespace.c
>>>>> +++ b/fs/namespace.c
>>>>> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>>>>>   return chrooted;
>>>>>  }
>>>>>  
>>>>> -void update_mnt_policy(struct user_namespace *userns)
>>>>> +bool fs_fully_visible(struct file_system_type *type)
>>>>>  {
>>>>>   struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>>>>>   struct mount *mnt;
>>>>> + bool visible = false;
>>>>>  
>>>>> - down_read(_sem);
>>>>> + if (unlikely(!ns))
>>>>> + return false;
>>>>> +
>>>>> + namespace_lock();
>>>>>   list_for_each_entry(mnt, >list, mnt_list) {
>>>>> - switch (mnt->mnt.mnt_sb->s_magic) {
>>>>> - case SYSFS_MAGIC:
>>>>> - userns->may_mount_sysfs = true;
>>>>> - break;
>>>>> - case PROC_SUPER_MAGIC:
>>>>> - userns->may_mount_proc = true;
>>>>> - break;
>>>>> + struct mount *child;
>>>>> + if (mnt->mnt.mnt_sb->s_type != type)
>>>>> + continue;
>>>>> +
>>>>> + /* This mount is not fully visible if there are any child mounts
>>>>> +  * that cover anything except for empty directories.
>>>>> +  */
>>>>> + list_for_each_entry(child, >mnt_mounts, mnt_child) {
>>>>> + struct inode *inode = child->mnt_mountpoint->d_inode;
>>>>> + if (!S_ISDIR(inode->i_mode))
>>>>> + goto next;
>>>>> + if (inode->i_nlink != 2)
>>>>> + goto next;
>>>>
>>>>
>>>> I met a problem that proc filesystem failed to mount in user namespace,
>>>> The problem is the i_nlink of sysctl entries under proc filesystem is not
>>>> 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
>>>> btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
>>>> filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
>>>> on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
>>>> 1.
>>
>> Yes. 1 is what filesystems that are too

Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/13/2013 03:26 PM, Gao feng wrote:
 On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:

 On 11/02/2013 02:06 PM, Gao feng wrote:
 Hi Eric,

 On 08/28/2013 05:44 AM, Eric W. Biederman wrote:

 Rely on the fact that another flavor of the filesystem is already
 mounted and do not rely on state in the user namespace.

 Verify that the mounted filesystem is not covered in any significant
 way.  I would love to verify that the previously mounted filesystem
 has no mounts on top but there are at least the directories
 /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
 for other filesystems to mount on top of.

 Refactor the test into a function named fs_fully_visible and call that
 function from the mount routines of proc and sysfs.  This makes this
 test local to the filesystems involved and the results current of when
 the mounts take place, removing a weird threading of the user
 namespace, the mount namespace and the filesystems themselves.

 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |   37 
 +
  fs/proc/root.c |7 +--
  fs/sysfs/mount.c   |3 ++-
  include/linux/fs.h |1 +
  include/linux/user_namespace.h |4 
  kernel/user.c  |2 --
  kernel/user_namespace.c|2 --
  7 files changed, 33 insertions(+), 23 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index 64627f8..877e427 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
   return chrooted;
  }
  
 -void update_mnt_policy(struct user_namespace *userns)
 +bool fs_fully_visible(struct file_system_type *type)
  {
   struct mnt_namespace *ns = current-nsproxy-mnt_ns;
   struct mount *mnt;
 + bool visible = false;
  
 - down_read(namespace_sem);
 + if (unlikely(!ns))
 + return false;
 +
 + namespace_lock();
   list_for_each_entry(mnt, ns-list, mnt_list) {
 - switch (mnt-mnt.mnt_sb-s_magic) {
 - case SYSFS_MAGIC:
 - userns-may_mount_sysfs = true;
 - break;
 - case PROC_SUPER_MAGIC:
 - userns-may_mount_proc = true;
 - break;
 + struct mount *child;
 + if (mnt-mnt.mnt_sb-s_type != type)
 + continue;
 +
 + /* This mount is not fully visible if there are any child mounts
 +  * that cover anything except for empty directories.
 +  */
 + list_for_each_entry(child, mnt-mnt_mounts, mnt_child) {
 + struct inode *inode = child-mnt_mountpoint-d_inode;
 + if (!S_ISDIR(inode-i_mode))
 + goto next;
 + if (inode-i_nlink != 2)
 + goto next;


 I met a problem that proc filesystem failed to mount in user namespace,
 The problem is the i_nlink of sysctl entries under proc filesystem is not
 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
 btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
 filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
 on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
 1.

 Yes. 1 is what filesystems that are too lazy to count the number of
 links to a directory return, and /proc/sys is currently such a
 filesystem.

 Ordinarily nlink == 2 means a directory does not have any subdirectories.

 btw, I'm not quite understand what's the inode-i_nlink != 2 here means?
 is this directory empty? as I know, when we create a file(not dir) under
 a dir, the i_nlink of this dir will not increase.

 And another question, it looks like if we don't have proc/sys fs mounted,
 then proc/sys will be failed to be mounted?


 Any Idea?? or should we need to revert this patch??

 The patch is mostly doing what it is supposed to be doing.

 Now the code is slightly buggy.  inode-i_nlink will test to see if a
 directory has subdirectories but it won't test to see if a directory is
 empty.  Where did my brain go when I was writing that test?

 Right now I would rather not have the empty directory exception than
 remove this code.

 The test is a little trickier to write than it might otherwise be
 because /proc and /sys tend to be slightly imperfect filesystems.

 I think the only way to really test that is to call readdir on the
 directory itself :(  I don't like that thought.

 I don't know what I was thinking when I wrote that test but I definitely
 goofed up.  Grr!

 I can certainly filter out any directory with nlink  2.  That would be
 an easy partial step forward.

 The real question though is how do I detect directories it is safe to
 mount on where there will not be files in them.  I can't call iterate
 with the namespace_lock held so things are a bit tricky.

 
 I know this problem is not easy to be resolved. why not let

Re: [PATCH] userns: allow privileged user to operate locked mount

2013-11-14 Thread Gao feng
On 11/15/2013 07:50 AM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:
 
 Privileged user should have rights to mount/umount/move
 these even locked mount.
 
 Hmm. This is pretty much a can't happen case, as the only exist in mount
 namespaces where the global root isn't the root.  How are you getting
 into this situation?  Using setns() ?
 

Before, priviged user can use setns to set his mount namespace to the
container's mount namespace, and change container's mount directly.
this patch just gives back host the control of container.

 Why would we even care?
 
 As implemented this patch does not handle nested user namespaces and
 that really worries me at a semantic level.
 
 We don't want to design cases where we can create containers in
 containers.
 
 Eric
 
 
 Signed-off-by: Gao feng gaof...@cn.fujitsu.com
 ---
  fs/namespace.c | 14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index da5c494..7097fc7 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -1297,6 +1297,11 @@ static inline bool may_mount(void)
  return ns_capable(current-nsproxy-mnt_ns-user_ns, CAP_SYS_ADMIN);
  }
  
 +static inline bool may_mount_lock(struct mount *mnt)
 +{
 +return !(mnt-mnt.mnt_flags  MNT_LOCKED) || capable(CAP_SYS_ADMIN);
 +}
 +
  /*
   * Now umount can handle mount points as well as block devices.
   * This is important for filesystems which use unnamed block devices.
 @@ -1330,7 +1335,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, 
 flags)
  goto dput_and_out;
  if (!check_mnt(mnt))
  goto dput_and_out;
 -if (mnt-mnt.mnt_flags  MNT_LOCKED)
 +if (!may_mount_lock(mnt))
  goto dput_and_out;
  
  retval = do_umount(mnt, flags);
 @@ -1768,7 +1773,8 @@ static int do_loopback(struct path *path, const char 
 *old_name,
  if (!check_mnt(parent) || !check_mnt(old))
  goto out2;
  
 -if (!recurse  has_locked_children(old, old_path.dentry))
 +if (!recurse  has_locked_children(old, old_path.dentry) 
 +!capable(CAP_SYS_ADMIN))
  goto out2;
  
  if (recurse)
 @@ -1895,7 +1901,7 @@ static int do_move_mount(struct path *path, const char 
 *old_name)
  if (!check_mnt(p) || !check_mnt(old))
  goto out1;
  
 -if (old-mnt.mnt_flags  MNT_LOCKED)
 +if (!may_mount_lock(old))
  goto out1;
  
  err = -EINVAL;
 @@ -2679,7 +2685,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, 
 new_root,
  goto out4;
  if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
  goto out4;
 -if (new_mnt-mnt.mnt_flags  MNT_LOCKED)
 +if (!may_mount_lock(new_mnt))
  goto out4;
  error = -ENOENT;
  if (d_unlinked(new.dentry))
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
 On Thu, Nov 14, 2013 at 3:10 AM, Gao feng gaof...@cn.fujitsu.com wrote:
 On 11/13/2013 03:26 PM, Gao feng wrote:
 On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
 Right now I would rather not have the empty directory exception than
 remove this code.

 The test is a little trickier to write than it might otherwise be
 because /proc and /sys tend to be slightly imperfect filesystems.

 I think the only way to really test that is to call readdir on the
 directory itself :(  I don't like that thought.

 I don't know what I was thinking when I wrote that test but I definitely
 goofed up.  Grr!

 I can certainly filter out any directory with nlink  2.  That would be
 an easy partial step forward.

 The real question though is how do I detect directories it is safe to
 mount on where there will not be files in them.  I can't call iterate
 with the namespace_lock held so things are a bit tricky.


 I know this problem is not easy to be resolved. why not let the user
 make the decision?  maybe we can introduce a new mount option MS_LOCK,
 if user wants to use mount to hide something, he should use mount with
 option MS_LOCK. so the unpriviged user can't umount this filesystem and
 fail to mount the filesystem if one of it's child mount is mounted with
 MS_LOCK option otherwise he use MS_REC too.


 Something like this.

 From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
 From: Gao feng gaof...@cn.fujitsu.com
 Date: Wed, 13 Nov 2013 16:06:46 +0800
 Subject: [PATCH] userns: introduce new mount option MS_LOCK

 After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
 vfs: Lock in place mounts from more privileged users,
 in userns, the mounts of child mntns which copied from
 parent mntns is locked and user has no rights to umount/move
 them, it's too strict.

 The core purpose of above commit is trying to prevent
 unprivileged user from accessing files hidden by mount.
 This patch introduces a new mount option MS_LOCK, this
 gives user the capable to mount filesystem as the type
 of lock if he wants to use mount to hide something.

 
 This is bad -- if something was secure in old kernels, it needs to
 stay secure.  If you had MS_NOT_A_LOCK, that would be okay, but it
 might not solve your problem.
 

what you mean old kernels here? I saw patch vfs: Lock in place mounts from 
more privileged users
is merged into upstream in linux 3.12-rc1, this is not very old. I think there
are not many userspace processes rely on this feature.

If user think host needs to be secure, he should use MS_LOCK to mount 
filesystem.
we can't make decison for user.

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-14 Thread Gao feng
On 11/15/2013 12:54 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:
 
 On 11/15/2013 12:54 AM, Andy Lutomirski wrote:
 On Thu, Nov 14, 2013 at 3:10 AM, Gao feng gaof...@cn.fujitsu.com wrote:
 On 11/13/2013 03:26 PM, Gao feng wrote:
 On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
 Right now I would rather not have the empty directory exception than
 remove this code.

 The test is a little trickier to write than it might otherwise be
 because /proc and /sys tend to be slightly imperfect filesystems.

 I think the only way to really test that is to call readdir on the
 directory itself :(  I don't like that thought.

 I don't know what I was thinking when I wrote that test but I definitely
 goofed up.  Grr!

 I can certainly filter out any directory with nlink  2.  That would be
 an easy partial step forward.

 The real question though is how do I detect directories it is safe to
 mount on where there will not be files in them.  I can't call iterate
 with the namespace_lock held so things are a bit tricky.


 I know this problem is not easy to be resolved. why not let the user
 make the decision?  maybe we can introduce a new mount option MS_LOCK,
 if user wants to use mount to hide something, he should use mount with
 option MS_LOCK. so the unpriviged user can't umount this filesystem and
 fail to mount the filesystem if one of it's child mount is mounted with
 MS_LOCK option otherwise he use MS_REC too.


 Something like this.

 From 437f33ea366623c7a9d557b2e84cae424876a44f Mon Sep 17 00:00:00 2001
 From: Gao feng gaof...@cn.fujitsu.com
 Date: Wed, 13 Nov 2013 16:06:46 +0800
 Subject: [PATCH] userns: introduce new mount option MS_LOCK

 After commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
 vfs: Lock in place mounts from more privileged users,
 in userns, the mounts of child mntns which copied from
 parent mntns is locked and user has no rights to umount/move
 them, it's too strict.

 The core purpose of above commit is trying to prevent
 unprivileged user from accessing files hidden by mount.
 This patch introduces a new mount option MS_LOCK, this
 gives user the capable to mount filesystem as the type
 of lock if he wants to use mount to hide something.


 This is bad -- if something was secure in old kernels, it needs to
 stay secure.  If you had MS_NOT_A_LOCK, that would be okay, but it
 might not solve your problem.


 what you mean old kernels here? I saw patch vfs: Lock in place mounts from 
 more privileged users
 is merged into upstream in linux 3.12-rc1, this is not very old. I think 
 there
 are not many userspace processes rely on this feature.
 
 Sort of true.  Most people aren't that silly.  This feature was added to
 defend against a theoretical attack that you can use with mount
 namespaces.
 
 In particular the scenario we are concerned with is:
 
 Suppose the file system looks like:
 
 Suppose there are two filesystems a and b that look like:
 
 a:/usr/
 a:/usr/my_very_secret_file
 a:/dev/
 a:/etc/
 a:/lib/
 
 b:/bin/
 b:/etc/
 b:/games/
 b:/include/
 b:/lib/
 b:/lib32/
 b:/local/
 b:/sbin/
 b:/share/
 b:/src/
 
 And filesystem b is mounted on a:/usr hiding a:/usr/my_very_secret_file
 
 So the filesystem looks like:
 
 /usr/
 /usr/bin/
 /usr/etc/
 /usr/games/
 /usr/include/
 /usr/lib/
 /usr/lib32/
 /usr/local/
 /usr/sbin/
 /usr/share/
 /usr/src/
 /dev/
 /etc/
 /lib/
 
 Without locking mounts into place an unprivileged user can clone the
 mount namespace and do umount /usr and read /usr/my_very_secret_file.
 
 Most systems don't hide sensitive things with mounts but it is very
 possible and guarding against is fairly cheap and easy.  And while a
 little annoying it should not be a large impediment to unprivileged user
 of the user namespace because pivot root still works.
 
 This thread started talking about bugs in fs_fully_visible.  And those
 bugs are fixable and I aim to get to them shortly.  At the very least
 I can lie and test for nlink = 2 which fixes the regression in mounting
 proc.
 
 Then I can write the fun version that takes references and drops locks
 so it can call the internal version of readdir to see if a directory is
 actually empty.
 
 But the principle remains the same we really don't want to reveal
 anything that is hidden under a mount on purpose or by mistake.  Just
 because then we don't have to think about those things from a security
 point of view making everyone's life easier.
 

Ok,I agree with you that we should make container security by default.

What's your idea that introduces option MS_NOT_A_LOCK just like Andy's
advisement?

In libvirt, host creates dev and devpts directories for container,then
mount devpts, tmpfs on them and create device nodes inside these dirs
for container. and then in container, these filesystems are moved to
container's /dev/ /dev/pts directory. We really have no need to lock
these mounts. they are just created for container.

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel

[PATCH] userns: allow privileged user to operate locked mount

2013-11-12 Thread Gao feng
Privileged user should have rights to mount/umount/move
these even locked mount.

Signed-off-by: Gao feng 
---
 fs/namespace.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index da5c494..7097fc7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1297,6 +1297,11 @@ static inline bool may_mount(void)
return ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN);
 }
 
+static inline bool may_mount_lock(struct mount *mnt)
+{
+   return !(mnt->mnt.mnt_flags & MNT_LOCKED) || capable(CAP_SYS_ADMIN);
+}
+
 /*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
@@ -1330,7 +1335,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
goto dput_and_out;
if (!check_mnt(mnt))
goto dput_and_out;
-   if (mnt->mnt.mnt_flags & MNT_LOCKED)
+   if (!may_mount_lock(mnt))
goto dput_and_out;
 
retval = do_umount(mnt, flags);
@@ -1768,7 +1773,8 @@ static int do_loopback(struct path *path, const char 
*old_name,
if (!check_mnt(parent) || !check_mnt(old))
goto out2;
 
-   if (!recurse && has_locked_children(old, old_path.dentry))
+   if (!recurse && has_locked_children(old, old_path.dentry) &&
+   !capable(CAP_SYS_ADMIN))
goto out2;
 
if (recurse)
@@ -1895,7 +1901,7 @@ static int do_move_mount(struct path *path, const char 
*old_name)
if (!check_mnt(p) || !check_mnt(old))
goto out1;
 
-   if (old->mnt.mnt_flags & MNT_LOCKED)
+   if (!may_mount_lock(old))
goto out1;
 
err = -EINVAL;
@@ -2679,7 +2685,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
goto out4;
if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
goto out4;
-   if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
+   if (!may_mount_lock(new_mnt))
goto out4;
error = -ENOENT;
if (d_unlinked(new.dentry))
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-12 Thread Gao feng
On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
> Gao feng  writes:
> 
>> On 11/02/2013 02:06 PM, Gao feng wrote:
>>> Hi Eric,
>>>
>>> On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
>>>>
>>>> Rely on the fact that another flavor of the filesystem is already
>>>> mounted and do not rely on state in the user namespace.
>>>>
>>>> Verify that the mounted filesystem is not covered in any significant
>>>> way.  I would love to verify that the previously mounted filesystem
>>>> has no mounts on top but there are at least the directories
>>>> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
>>>> for other filesystems to mount on top of.
>>>>
>>>> Refactor the test into a function named fs_fully_visible and call that
>>>> function from the mount routines of proc and sysfs.  This makes this
>>>> test local to the filesystems involved and the results current of when
>>>> the mounts take place, removing a weird threading of the user
>>>> namespace, the mount namespace and the filesystems themselves.
>>>>
>>>> Signed-off-by: "Eric W. Biederman" 
>>>> ---
>>>>  fs/namespace.c |   37 
>>>> +
>>>>  fs/proc/root.c |7 +--
>>>>  fs/sysfs/mount.c   |3 ++-
>>>>  include/linux/fs.h |1 +
>>>>  include/linux/user_namespace.h |4 
>>>>  kernel/user.c  |2 --
>>>>  kernel/user_namespace.c|2 --
>>>>  7 files changed, 33 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>>> index 64627f8..877e427 100644
>>>> --- a/fs/namespace.c
>>>> +++ b/fs/namespace.c
>>>> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>>>>return chrooted;
>>>>  }
>>>>  
>>>> -void update_mnt_policy(struct user_namespace *userns)
>>>> +bool fs_fully_visible(struct file_system_type *type)
>>>>  {
>>>>struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>>>>struct mount *mnt;
>>>> +  bool visible = false;
>>>>  
>>>> -  down_read(_sem);
>>>> +  if (unlikely(!ns))
>>>> +  return false;
>>>> +
>>>> +  namespace_lock();
>>>>list_for_each_entry(mnt, >list, mnt_list) {
>>>> -  switch (mnt->mnt.mnt_sb->s_magic) {
>>>> -  case SYSFS_MAGIC:
>>>> -  userns->may_mount_sysfs = true;
>>>> -  break;
>>>> -  case PROC_SUPER_MAGIC:
>>>> -  userns->may_mount_proc = true;
>>>> -  break;
>>>> +  struct mount *child;
>>>> +  if (mnt->mnt.mnt_sb->s_type != type)
>>>> +  continue;
>>>> +
>>>> +  /* This mount is not fully visible if there are any child mounts
>>>> +   * that cover anything except for empty directories.
>>>> +   */
>>>> +  list_for_each_entry(child, >mnt_mounts, mnt_child) {
>>>> +  struct inode *inode = child->mnt_mountpoint->d_inode;
>>>> +  if (!S_ISDIR(inode->i_mode))
>>>> +  goto next;
>>>> +  if (inode->i_nlink != 2)
>>>> +  goto next;
>>>
>>>
>>> I met a problem that proc filesystem failed to mount in user namespace,
>>> The problem is the i_nlink of sysctl entries under proc filesystem is not
>>> 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
>>> btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
>>> filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
>>> on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
>>> 1.
> 
> Yes. 1 is what filesystems that are too lazy to count the number of
> links to a directory return, and /proc/sys is currently such a
> filesystem.
> 
> Ordinarily nlink == 2 means a directory does not have any subdirectories.
> 
>>> btw, I'm not quite understand what's the inode->i_nlink != 2 here means?
>>> is this directory empty? as I know, when we creat

Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-12 Thread Gao feng
On 11/09/2013 01:42 PM, Eric W. Biederman wrote:
 Gao feng gaof...@cn.fujitsu.com writes:
 
 On 11/02/2013 02:06 PM, Gao feng wrote:
 Hi Eric,

 On 08/28/2013 05:44 AM, Eric W. Biederman wrote:

 Rely on the fact that another flavor of the filesystem is already
 mounted and do not rely on state in the user namespace.

 Verify that the mounted filesystem is not covered in any significant
 way.  I would love to verify that the previously mounted filesystem
 has no mounts on top but there are at least the directories
 /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
 for other filesystems to mount on top of.

 Refactor the test into a function named fs_fully_visible and call that
 function from the mount routines of proc and sysfs.  This makes this
 test local to the filesystems involved and the results current of when
 the mounts take place, removing a weird threading of the user
 namespace, the mount namespace and the filesystems themselves.

 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |   37 
 +
  fs/proc/root.c |7 +--
  fs/sysfs/mount.c   |3 ++-
  include/linux/fs.h |1 +
  include/linux/user_namespace.h |4 
  kernel/user.c  |2 --
  kernel/user_namespace.c|2 --
  7 files changed, 33 insertions(+), 23 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index 64627f8..877e427 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
return chrooted;
  }
  
 -void update_mnt_policy(struct user_namespace *userns)
 +bool fs_fully_visible(struct file_system_type *type)
  {
struct mnt_namespace *ns = current-nsproxy-mnt_ns;
struct mount *mnt;
 +  bool visible = false;
  
 -  down_read(namespace_sem);
 +  if (unlikely(!ns))
 +  return false;
 +
 +  namespace_lock();
list_for_each_entry(mnt, ns-list, mnt_list) {
 -  switch (mnt-mnt.mnt_sb-s_magic) {
 -  case SYSFS_MAGIC:
 -  userns-may_mount_sysfs = true;
 -  break;
 -  case PROC_SUPER_MAGIC:
 -  userns-may_mount_proc = true;
 -  break;
 +  struct mount *child;
 +  if (mnt-mnt.mnt_sb-s_type != type)
 +  continue;
 +
 +  /* This mount is not fully visible if there are any child mounts
 +   * that cover anything except for empty directories.
 +   */
 +  list_for_each_entry(child, mnt-mnt_mounts, mnt_child) {
 +  struct inode *inode = child-mnt_mountpoint-d_inode;
 +  if (!S_ISDIR(inode-i_mode))
 +  goto next;
 +  if (inode-i_nlink != 2)
 +  goto next;


 I met a problem that proc filesystem failed to mount in user namespace,
 The problem is the i_nlink of sysctl entries under proc filesystem is not
 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
 btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
 filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
 on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
 1.
 
 Yes. 1 is what filesystems that are too lazy to count the number of
 links to a directory return, and /proc/sys is currently such a
 filesystem.
 
 Ordinarily nlink == 2 means a directory does not have any subdirectories.
 
 btw, I'm not quite understand what's the inode-i_nlink != 2 here means?
 is this directory empty? as I know, when we create a file(not dir) under
 a dir, the i_nlink of this dir will not increase.

 And another question, it looks like if we don't have proc/sys fs mounted,
 then proc/sys will be failed to be mounted?


 Any Idea?? or should we need to revert this patch??
 
 The patch is mostly doing what it is supposed to be doing.
 
 Now the code is slightly buggy.  inode-i_nlink will test to see if a
 directory has subdirectories but it won't test to see if a directory is
 empty.  Where did my brain go when I was writing that test?
 
 Right now I would rather not have the empty directory exception than
 remove this code.
 
 The test is a little trickier to write than it might otherwise be
 because /proc and /sys tend to be slightly imperfect filesystems.
 
 I think the only way to really test that is to call readdir on the
 directory itself :(  I don't like that thought.
 
 I don't know what I was thinking when I wrote that test but I definitely
 goofed up.  Grr!
 
 I can certainly filter out any directory with nlink  2.  That would be
 an easy partial step forward.
 
 The real question though is how do I detect directories it is safe to
 mount on where there will not be files in them.  I can't call iterate
 with the namespace_lock held so things are a bit tricky.
 

I know this problem is not easy to be resolved. why not let

[PATCH] userns: allow privileged user to operate locked mount

2013-11-12 Thread Gao feng
Privileged user should have rights to mount/umount/move
these even locked mount.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 fs/namespace.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index da5c494..7097fc7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1297,6 +1297,11 @@ static inline bool may_mount(void)
return ns_capable(current-nsproxy-mnt_ns-user_ns, CAP_SYS_ADMIN);
 }
 
+static inline bool may_mount_lock(struct mount *mnt)
+{
+   return !(mnt-mnt.mnt_flags  MNT_LOCKED) || capable(CAP_SYS_ADMIN);
+}
+
 /*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
@@ -1330,7 +1335,7 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
goto dput_and_out;
if (!check_mnt(mnt))
goto dput_and_out;
-   if (mnt-mnt.mnt_flags  MNT_LOCKED)
+   if (!may_mount_lock(mnt))
goto dput_and_out;
 
retval = do_umount(mnt, flags);
@@ -1768,7 +1773,8 @@ static int do_loopback(struct path *path, const char 
*old_name,
if (!check_mnt(parent) || !check_mnt(old))
goto out2;
 
-   if (!recurse  has_locked_children(old, old_path.dentry))
+   if (!recurse  has_locked_children(old, old_path.dentry) 
+   !capable(CAP_SYS_ADMIN))
goto out2;
 
if (recurse)
@@ -1895,7 +1901,7 @@ static int do_move_mount(struct path *path, const char 
*old_name)
if (!check_mnt(p) || !check_mnt(old))
goto out1;
 
-   if (old-mnt.mnt_flags  MNT_LOCKED)
+   if (!may_mount_lock(old))
goto out1;
 
err = -EINVAL;
@@ -2679,7 +2685,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
goto out4;
if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
goto out4;
-   if (new_mnt-mnt.mnt_flags  MNT_LOCKED)
+   if (!may_mount_lock(new_mnt))
goto out4;
error = -ENOENT;
if (d_unlinked(new.dentry))
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-07 Thread Gao feng
On 11/02/2013 02:06 PM, Gao feng wrote:
> Hi Eric,
> 
> On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
>>
>> Rely on the fact that another flavor of the filesystem is already
>> mounted and do not rely on state in the user namespace.
>>
>> Verify that the mounted filesystem is not covered in any significant
>> way.  I would love to verify that the previously mounted filesystem
>> has no mounts on top but there are at least the directories
>> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
>> for other filesystems to mount on top of.
>>
>> Refactor the test into a function named fs_fully_visible and call that
>> function from the mount routines of proc and sysfs.  This makes this
>> test local to the filesystems involved and the results current of when
>> the mounts take place, removing a weird threading of the user
>> namespace, the mount namespace and the filesystems themselves.
>>
>> Signed-off-by: "Eric W. Biederman" 
>> ---
>>  fs/namespace.c |   37 +
>>  fs/proc/root.c |7 +--
>>  fs/sysfs/mount.c   |3 ++-
>>  include/linux/fs.h |1 +
>>  include/linux/user_namespace.h |4 
>>  kernel/user.c  |2 --
>>  kernel/user_namespace.c|2 --
>>  7 files changed, 33 insertions(+), 23 deletions(-)
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 64627f8..877e427 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>>  return chrooted;
>>  }
>>  
>> -void update_mnt_policy(struct user_namespace *userns)
>> +bool fs_fully_visible(struct file_system_type *type)
>>  {
>>  struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>>  struct mount *mnt;
>> +bool visible = false;
>>  
>> -down_read(_sem);
>> +if (unlikely(!ns))
>> +return false;
>> +
>> +namespace_lock();
>>  list_for_each_entry(mnt, >list, mnt_list) {
>> -switch (mnt->mnt.mnt_sb->s_magic) {
>> -case SYSFS_MAGIC:
>> -userns->may_mount_sysfs = true;
>> -break;
>> -case PROC_SUPER_MAGIC:
>> -userns->may_mount_proc = true;
>> -break;
>> +struct mount *child;
>> +if (mnt->mnt.mnt_sb->s_type != type)
>> +continue;
>> +
>> +/* This mount is not fully visible if there are any child mounts
>> + * that cover anything except for empty directories.
>> + */
>> +list_for_each_entry(child, >mnt_mounts, mnt_child) {
>> +struct inode *inode = child->mnt_mountpoint->d_inode;
>> +if (!S_ISDIR(inode->i_mode))
>> +goto next;
>> +if (inode->i_nlink != 2)
>> +goto next;
> 
> 
> I met a problem that proc filesystem failed to mount in user namespace,
> The problem is the i_nlink of sysctl entries under proc filesystem is not
> 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
> btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
> filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
> on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
> 1.
> 
> btw, I'm not quite understand what's the inode->i_nlink != 2 here means?
> is this directory empty? as I know, when we create a file(not dir) under
> a dir, the i_nlink of this dir will not increase.
> 
> And another question, it looks like if we don't have proc/sys fs mounted,
> then proc/sys will be failed to be mounted?
> 

Any Idea?? or should we need to revert this patch??

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-07 Thread Gao feng
On 11/02/2013 02:06 PM, Gao feng wrote:
 Hi Eric,
 
 On 08/28/2013 05:44 AM, Eric W. Biederman wrote:

 Rely on the fact that another flavor of the filesystem is already
 mounted and do not rely on state in the user namespace.

 Verify that the mounted filesystem is not covered in any significant
 way.  I would love to verify that the previously mounted filesystem
 has no mounts on top but there are at least the directories
 /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
 for other filesystems to mount on top of.

 Refactor the test into a function named fs_fully_visible and call that
 function from the mount routines of proc and sysfs.  This makes this
 test local to the filesystems involved and the results current of when
 the mounts take place, removing a weird threading of the user
 namespace, the mount namespace and the filesystems themselves.

 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |   37 +
  fs/proc/root.c |7 +--
  fs/sysfs/mount.c   |3 ++-
  include/linux/fs.h |1 +
  include/linux/user_namespace.h |4 
  kernel/user.c  |2 --
  kernel/user_namespace.c|2 --
  7 files changed, 33 insertions(+), 23 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index 64627f8..877e427 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
  return chrooted;
  }
  
 -void update_mnt_policy(struct user_namespace *userns)
 +bool fs_fully_visible(struct file_system_type *type)
  {
  struct mnt_namespace *ns = current-nsproxy-mnt_ns;
  struct mount *mnt;
 +bool visible = false;
  
 -down_read(namespace_sem);
 +if (unlikely(!ns))
 +return false;
 +
 +namespace_lock();
  list_for_each_entry(mnt, ns-list, mnt_list) {
 -switch (mnt-mnt.mnt_sb-s_magic) {
 -case SYSFS_MAGIC:
 -userns-may_mount_sysfs = true;
 -break;
 -case PROC_SUPER_MAGIC:
 -userns-may_mount_proc = true;
 -break;
 +struct mount *child;
 +if (mnt-mnt.mnt_sb-s_type != type)
 +continue;
 +
 +/* This mount is not fully visible if there are any child mounts
 + * that cover anything except for empty directories.
 + */
 +list_for_each_entry(child, mnt-mnt_mounts, mnt_child) {
 +struct inode *inode = child-mnt_mountpoint-d_inode;
 +if (!S_ISDIR(inode-i_mode))
 +goto next;
 +if (inode-i_nlink != 2)
 +goto next;
 
 
 I met a problem that proc filesystem failed to mount in user namespace,
 The problem is the i_nlink of sysctl entries under proc filesystem is not
 2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
 btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
 filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
 on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
 1.
 
 btw, I'm not quite understand what's the inode-i_nlink != 2 here means?
 is this directory empty? as I know, when we create a file(not dir) under
 a dir, the i_nlink of this dir will not increase.
 
 And another question, it looks like if we don't have proc/sys fs mounted,
 then proc/sys will be failed to be mounted?
 

Any Idea?? or should we need to revert this patch??

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-06 Thread Gao feng
On 11/06/2013 03:14 AM, Richard Guy Briggs wrote:
> On Tue, Nov 05, 2013 at 04:56:55PM +0800, Gao feng wrote:
>> On 11/05/2013 04:11 PM, Li Zefan wrote:
>>> On 2013/11/5 15:52, Gao feng wrote:
>>>> On 11/05/2013 03:51 PM, Gao feng wrote:
>>>>> Ping...
>>>>
>>>> I want to catch up the merge window..
>>>
>>> Even if your patches are accepted by a certain maintainer immediately,
>>> he will in no doubt queue them for 3.14.
>>
>> Yes, you are right...
>> But I still want to get some comments..
> 
> Definitely won't make it in in this merge window.  I agree it needs
> discussion, but I won't start that for at least another week (I'm
> currently at IETF in Vancouver.).
> 

Get it,hope we can start discussion ASAP.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-06 Thread Gao feng
On 11/06/2013 03:14 AM, Richard Guy Briggs wrote:
 On Tue, Nov 05, 2013 at 04:56:55PM +0800, Gao feng wrote:
 On 11/05/2013 04:11 PM, Li Zefan wrote:
 On 2013/11/5 15:52, Gao feng wrote:
 On 11/05/2013 03:51 PM, Gao feng wrote:
 Ping...

 I want to catch up the merge window..

 Even if your patches are accepted by a certain maintainer immediately,
 he will in no doubt queue them for 3.14.

 Yes, you are right...
 But I still want to get some comments..
 
 Definitely won't make it in in this merge window.  I agree it needs
 discussion, but I won't start that for at least another week (I'm
 currently at IETF in Vancouver.).
 

Get it,hope we can start discussion ASAP.

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-05 Thread Gao feng
On 11/05/2013 04:11 PM, Li Zefan wrote:
> On 2013/11/5 15:52, Gao feng wrote:
>> On 11/05/2013 03:51 PM, Gao feng wrote:
>>> Ping...
>>>
>>
>> I want to catch up the merge window..
>>
> 
> Even if your patches are accepted by a certain maintainer immediately,
> he will in no doubt queue them for 3.14.
> 

Yes, you are right...
But I still want to get some comments..

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-05 Thread Gao feng
On 11/05/2013 04:11 PM, Li Zefan wrote:
 On 2013/11/5 15:52, Gao feng wrote:
 On 11/05/2013 03:51 PM, Gao feng wrote:
 Ping...


 I want to catch up the merge window..

 
 Even if your patches are accepted by a certain maintainer immediately,
 he will in no doubt queue them for 3.14.
 

Yes, you are right...
But I still want to get some comments..

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-04 Thread Gao feng
On 11/05/2013 03:51 PM, Gao feng wrote:
> Ping...
> 

I want to catch up the merge window..

> On 10/31/2013 11:52 AM, Gao feng wrote:
>> Hi Eric Paris,
>>
>> Can you give me some comments?
>>
>> You think the tying audit namespace to user namespace is a bad idea,
>> so this patchset doesn't assign auditns to userns and introduce an
>> new audit netlink type to help to create audit namespace.
>>
>> and this patchset also introduces an new proc interface to make
>> sure container can't influence the whole system.
>>
>> and the audit rules are not namespace aware, all of audit namespaces
>> should comply with the rules. in next step, if we find it's need to
>> make audit rules per audit namespace, then it's the time to do that
>> job.
>>
>> This patchset also makes all of net namespaces have ability to send/
>> receive audit netlink message.
>>
>> I may miss some points, if you find there are some shortage or loophole,
>> please let me know.
>>
>> Thanks!
>>
>> On 10/24/2013 03:31 PM, Gao feng wrote:
>>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>>
>>> The main target of this patchset is allowing user in audit
>>> namespace to generate the USER_MSG type of audit message,
>>> some userspace tools need to generate audit message, or
>>> these tools will broken.
>>>
>>> And the login process in container may want to setup
>>> /proc//loginuid, right now this value is unalterable
>>> once it being set. this will also broke the login problem
>>> in container. After this patchset, we can reset this loginuid
>>> to zero if task is running in a new audit namespace.
>>>
>>> Same with v1 patchset, in this patchset, only the privileged
>>> user in init_audit_ns and init_user_ns has rights to
>>> add/del audit rules. and these rules are gloabl. all
>>> audit namespace will comply with the rules.
>>>
>>> Compared with v1, v2 patch has some big changes.
>>> 1, the audit namespace is not assigned to user namespace.
>>>since there is no available bit of flags for clone, we
>>>create audit namespace through netlink, patch[18/20]
>>>introduces a new audit netlink type AUDIT_CREATE_NS.
>>>the privileged user in userns has rights to create a
>>>audit namespace, it means the unprivileged user can
>>>create auditns through create userns first. In order
>>>to prevent them from doing harm to host, the default
>>>audit_backlog_limit of un-init-audit-ns is zero(means
>>>audit is unavailable in audit namespace). and it can't
>>>be changed in auditns through netlink.
>>>
>>> 2, introduce /proc//audit_log_limit
>>>this interface is used to setup log_limit of audit
>>>namespace.  we need this interface to make audit
>>>available in un-init-audit-ns. Only the privileged user
>>>has right to set this value, it means only the root user
>>>of host can change it.
>>>
>>> 3, make audit namespace don't depend on net namespace.
>>>patch[1/20] add a compare function audit_compare for
>>>audit netlink, it always return true, it means the
>>>netlink subsystem will find out the netlink socket
>>>only through portid and netlink type. So we needn't
>>>to create kernel side audit netlink socket for per
>>>net namespace, all userspace audit netlink socket
>>>can find out the audit_sock, and audit_sock can
>>>communicate with them through the proper portid.
>>>it's just like the behavior we don't have net
>>>namespace before.
>>>
>>>
>>> This patchset still need some work, such as allow changing
>>> audit_enabled in audit namespace, auditd wants this feature.
>>>
>>> I send this patchset now in order to get more comments, so
>>> I can keep on improving namespace support for audit.
>>>
>>> Gao feng (20):
>>>   Audit: make audit netlink socket net namespace unaware
>>>   audit: introduce configure option CONFIG_AUDIT_NS
>>>   audit: make audit_skb_queue per audit namespace
>>>   audit: make audit_skb_hold_queue per audit namespace
>>>   audit: make audit_pid per audit namespace
>>>   audit: make kauditd_task per audit namespace
>>>   aduit: make audit_nlk_portid per audit namespace
>>>   audit: make kaudit_wait queue per audit namespace
>>>   audit: make audit_backlog_wait per audit namespace

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-04 Thread Gao feng
Ping...

On 10/31/2013 11:52 AM, Gao feng wrote:
> Hi Eric Paris,
> 
> Can you give me some comments?
> 
> You think the tying audit namespace to user namespace is a bad idea,
> so this patchset doesn't assign auditns to userns and introduce an
> new audit netlink type to help to create audit namespace.
> 
> and this patchset also introduces an new proc interface to make
> sure container can't influence the whole system.
> 
> and the audit rules are not namespace aware, all of audit namespaces
> should comply with the rules. in next step, if we find it's need to
> make audit rules per audit namespace, then it's the time to do that
> job.
> 
> This patchset also makes all of net namespaces have ability to send/
> receive audit netlink message.
> 
> I may miss some points, if you find there are some shortage or loophole,
> please let me know.
> 
> Thanks!
> 
> On 10/24/2013 03:31 PM, Gao feng wrote:
>> Here is the v1 patchset: http://lwn.net/Articles/549546/
>>
>> The main target of this patchset is allowing user in audit
>> namespace to generate the USER_MSG type of audit message,
>> some userspace tools need to generate audit message, or
>> these tools will broken.
>>
>> And the login process in container may want to setup
>> /proc//loginuid, right now this value is unalterable
>> once it being set. this will also broke the login problem
>> in container. After this patchset, we can reset this loginuid
>> to zero if task is running in a new audit namespace.
>>
>> Same with v1 patchset, in this patchset, only the privileged
>> user in init_audit_ns and init_user_ns has rights to
>> add/del audit rules. and these rules are gloabl. all
>> audit namespace will comply with the rules.
>>
>> Compared with v1, v2 patch has some big changes.
>> 1, the audit namespace is not assigned to user namespace.
>>since there is no available bit of flags for clone, we
>>create audit namespace through netlink, patch[18/20]
>>introduces a new audit netlink type AUDIT_CREATE_NS.
>>the privileged user in userns has rights to create a
>>audit namespace, it means the unprivileged user can
>>create auditns through create userns first. In order
>>to prevent them from doing harm to host, the default
>>audit_backlog_limit of un-init-audit-ns is zero(means
>>audit is unavailable in audit namespace). and it can't
>>be changed in auditns through netlink.
>>
>> 2, introduce /proc//audit_log_limit
>>this interface is used to setup log_limit of audit
>>namespace.  we need this interface to make audit
>>available in un-init-audit-ns. Only the privileged user
>>has right to set this value, it means only the root user
>>of host can change it.
>>
>> 3, make audit namespace don't depend on net namespace.
>>patch[1/20] add a compare function audit_compare for
>>audit netlink, it always return true, it means the
>>netlink subsystem will find out the netlink socket
>>only through portid and netlink type. So we needn't
>>to create kernel side audit netlink socket for per
>>net namespace, all userspace audit netlink socket
>>can find out the audit_sock, and audit_sock can
>>communicate with them through the proper portid.
>>it's just like the behavior we don't have net
>>namespace before.
>>
>>
>> This patchset still need some work, such as allow changing
>> audit_enabled in audit namespace, auditd wants this feature.
>>
>> I send this patchset now in order to get more comments, so
>> I can keep on improving namespace support for audit.
>>
>> Gao feng (20):
>>   Audit: make audit netlink socket net namespace unaware
>>   audit: introduce configure option CONFIG_AUDIT_NS
>>   audit: make audit_skb_queue per audit namespace
>>   audit: make audit_skb_hold_queue per audit namespace
>>   audit: make audit_pid per audit namespace
>>   audit: make kauditd_task per audit namespace
>>   aduit: make audit_nlk_portid per audit namespace
>>   audit: make kaudit_wait queue per audit namespace
>>   audit: make audit_backlog_wait per audit namespace
>>   audit: allow un-init audit ns to change pid and portid only
>>   audit: use proper audit namespace in audit_receive_msg
>>   audit: use proper audit_namespace in kauditd_thread
>>   audit: introduce new audit logging interface for audit namespace
>>   audit: pass proper audit namespace to audit_log_common_recv_msg
>>   audit: Log audit pid config change in audit namespace
>>   audit: allow GET,SET,USER MSG o

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-04 Thread Gao feng
Ping...

On 10/31/2013 11:52 AM, Gao feng wrote:
 Hi Eric Paris,
 
 Can you give me some comments?
 
 You think the tying audit namespace to user namespace is a bad idea,
 so this patchset doesn't assign auditns to userns and introduce an
 new audit netlink type to help to create audit namespace.
 
 and this patchset also introduces an new proc interface to make
 sure container can't influence the whole system.
 
 and the audit rules are not namespace aware, all of audit namespaces
 should comply with the rules. in next step, if we find it's need to
 make audit rules per audit namespace, then it's the time to do that
 job.
 
 This patchset also makes all of net namespaces have ability to send/
 receive audit netlink message.
 
 I may miss some points, if you find there are some shortage or loophole,
 please let me know.
 
 Thanks!
 
 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.

 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.

 Same with v1 patchset, in this patchset, only the privileged
 user in init_audit_ns and init_user_ns has rights to
 add/del audit rules. and these rules are gloabl. all
 audit namespace will comply with the rules.

 Compared with v1, v2 patch has some big changes.
 1, the audit namespace is not assigned to user namespace.
since there is no available bit of flags for clone, we
create audit namespace through netlink, patch[18/20]
introduces a new audit netlink type AUDIT_CREATE_NS.
the privileged user in userns has rights to create a
audit namespace, it means the unprivileged user can
create auditns through create userns first. In order
to prevent them from doing harm to host, the default
audit_backlog_limit of un-init-audit-ns is zero(means
audit is unavailable in audit namespace). and it can't
be changed in auditns through netlink.

 2, introduce /proc/pid/audit_log_limit
this interface is used to setup log_limit of audit
namespace.  we need this interface to make audit
available in un-init-audit-ns. Only the privileged user
has right to set this value, it means only the root user
of host can change it.

 3, make audit namespace don't depend on net namespace.
patch[1/20] add a compare function audit_compare for
audit netlink, it always return true, it means the
netlink subsystem will find out the netlink socket
only through portid and netlink type. So we needn't
to create kernel side audit netlink socket for per
net namespace, all userspace audit netlink socket
can find out the audit_sock, and audit_sock can
communicate with them through the proper portid.
it's just like the behavior we don't have net
namespace before.


 This patchset still need some work, such as allow changing
 audit_enabled in audit namespace, auditd wants this feature.

 I send this patchset now in order to get more comments, so
 I can keep on improving namespace support for audit.

 Gao feng (20):
   Audit: make audit netlink socket net namespace unaware
   audit: introduce configure option CONFIG_AUDIT_NS
   audit: make audit_skb_queue per audit namespace
   audit: make audit_skb_hold_queue per audit namespace
   audit: make audit_pid per audit namespace
   audit: make kauditd_task per audit namespace
   aduit: make audit_nlk_portid per audit namespace
   audit: make kaudit_wait queue per audit namespace
   audit: make audit_backlog_wait per audit namespace
   audit: allow un-init audit ns to change pid and portid only
   audit: use proper audit namespace in audit_receive_msg
   audit: use proper audit_namespace in kauditd_thread
   audit: introduce new audit logging interface for audit namespace
   audit: pass proper audit namespace to audit_log_common_recv_msg
   audit: Log audit pid config change in audit namespace
   audit: allow GET,SET,USER MSG operations in audit namespace
   nsproxy: don't make create_new_namespaces static
   audit: add new message type AUDIT_CREATE_NS
   audit: make audit_backlog_limit per audit namespace
   audit: introduce /proc/pid/audit_backlog_limit

  fs/proc/base.c  |  53 ++
  include/linux/audit.h   |  26 ++-
  include/linux/audit_namespace.h |  92 ++
  include/linux/nsproxy.h |  15 +-
  include/uapi/linux/audit.h  |   1 +
  init/Kconfig|  10 ++
  kernel/Makefile |   2 +-
  kernel/audit.c  | 364 
 +---
  kernel/audit.h  |   5 +-
  kernel

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-11-04 Thread Gao feng
On 11/05/2013 03:51 PM, Gao feng wrote:
 Ping...
 

I want to catch up the merge window..

 On 10/31/2013 11:52 AM, Gao feng wrote:
 Hi Eric Paris,

 Can you give me some comments?

 You think the tying audit namespace to user namespace is a bad idea,
 so this patchset doesn't assign auditns to userns and introduce an
 new audit netlink type to help to create audit namespace.

 and this patchset also introduces an new proc interface to make
 sure container can't influence the whole system.

 and the audit rules are not namespace aware, all of audit namespaces
 should comply with the rules. in next step, if we find it's need to
 make audit rules per audit namespace, then it's the time to do that
 job.

 This patchset also makes all of net namespaces have ability to send/
 receive audit netlink message.

 I may miss some points, if you find there are some shortage or loophole,
 please let me know.

 Thanks!

 On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/

 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.

 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.

 Same with v1 patchset, in this patchset, only the privileged
 user in init_audit_ns and init_user_ns has rights to
 add/del audit rules. and these rules are gloabl. all
 audit namespace will comply with the rules.

 Compared with v1, v2 patch has some big changes.
 1, the audit namespace is not assigned to user namespace.
since there is no available bit of flags for clone, we
create audit namespace through netlink, patch[18/20]
introduces a new audit netlink type AUDIT_CREATE_NS.
the privileged user in userns has rights to create a
audit namespace, it means the unprivileged user can
create auditns through create userns first. In order
to prevent them from doing harm to host, the default
audit_backlog_limit of un-init-audit-ns is zero(means
audit is unavailable in audit namespace). and it can't
be changed in auditns through netlink.

 2, introduce /proc/pid/audit_log_limit
this interface is used to setup log_limit of audit
namespace.  we need this interface to make audit
available in un-init-audit-ns. Only the privileged user
has right to set this value, it means only the root user
of host can change it.

 3, make audit namespace don't depend on net namespace.
patch[1/20] add a compare function audit_compare for
audit netlink, it always return true, it means the
netlink subsystem will find out the netlink socket
only through portid and netlink type. So we needn't
to create kernel side audit netlink socket for per
net namespace, all userspace audit netlink socket
can find out the audit_sock, and audit_sock can
communicate with them through the proper portid.
it's just like the behavior we don't have net
namespace before.


 This patchset still need some work, such as allow changing
 audit_enabled in audit namespace, auditd wants this feature.

 I send this patchset now in order to get more comments, so
 I can keep on improving namespace support for audit.

 Gao feng (20):
   Audit: make audit netlink socket net namespace unaware
   audit: introduce configure option CONFIG_AUDIT_NS
   audit: make audit_skb_queue per audit namespace
   audit: make audit_skb_hold_queue per audit namespace
   audit: make audit_pid per audit namespace
   audit: make kauditd_task per audit namespace
   aduit: make audit_nlk_portid per audit namespace
   audit: make kaudit_wait queue per audit namespace
   audit: make audit_backlog_wait per audit namespace
   audit: allow un-init audit ns to change pid and portid only
   audit: use proper audit namespace in audit_receive_msg
   audit: use proper audit_namespace in kauditd_thread
   audit: introduce new audit logging interface for audit namespace
   audit: pass proper audit namespace to audit_log_common_recv_msg
   audit: Log audit pid config change in audit namespace
   audit: allow GET,SET,USER MSG operations in audit namespace
   nsproxy: don't make create_new_namespaces static
   audit: add new message type AUDIT_CREATE_NS
   audit: make audit_backlog_limit per audit namespace
   audit: introduce /proc/pid/audit_backlog_limit

  fs/proc/base.c  |  53 ++
  include/linux/audit.h   |  26 ++-
  include/linux/audit_namespace.h |  92 ++
  include/linux/nsproxy.h |  15 +-
  include/uapi/linux/audit.h  |   1 +
  init/Kconfig|  10 ++
  kernel/Makefile |   2 +-
  kernel/audit.c  | 364

Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-02 Thread Gao feng
Hi Eric,

On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
> 
> Rely on the fact that another flavor of the filesystem is already
> mounted and do not rely on state in the user namespace.
> 
> Verify that the mounted filesystem is not covered in any significant
> way.  I would love to verify that the previously mounted filesystem
> has no mounts on top but there are at least the directories
> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
> for other filesystems to mount on top of.
> 
> Refactor the test into a function named fs_fully_visible and call that
> function from the mount routines of proc and sysfs.  This makes this
> test local to the filesystems involved and the results current of when
> the mounts take place, removing a weird threading of the user
> namespace, the mount namespace and the filesystems themselves.
> 
> Signed-off-by: "Eric W. Biederman" 
> ---
>  fs/namespace.c |   37 +
>  fs/proc/root.c |7 +--
>  fs/sysfs/mount.c   |3 ++-
>  include/linux/fs.h |1 +
>  include/linux/user_namespace.h |4 
>  kernel/user.c  |2 --
>  kernel/user_namespace.c|2 --
>  7 files changed, 33 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 64627f8..877e427 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>   return chrooted;
>  }
>  
> -void update_mnt_policy(struct user_namespace *userns)
> +bool fs_fully_visible(struct file_system_type *type)
>  {
>   struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>   struct mount *mnt;
> + bool visible = false;
>  
> - down_read(_sem);
> + if (unlikely(!ns))
> + return false;
> +
> + namespace_lock();
>   list_for_each_entry(mnt, >list, mnt_list) {
> - switch (mnt->mnt.mnt_sb->s_magic) {
> - case SYSFS_MAGIC:
> - userns->may_mount_sysfs = true;
> - break;
> - case PROC_SUPER_MAGIC:
> - userns->may_mount_proc = true;
> - break;
> + struct mount *child;
> + if (mnt->mnt.mnt_sb->s_type != type)
> + continue;
> +
> + /* This mount is not fully visible if there are any child mounts
> +  * that cover anything except for empty directories.
> +  */
> + list_for_each_entry(child, >mnt_mounts, mnt_child) {
> + struct inode *inode = child->mnt_mountpoint->d_inode;
> + if (!S_ISDIR(inode->i_mode))
> + goto next;
> + if (inode->i_nlink != 2)
> + goto next;


I met a problem that proc filesystem failed to mount in user namespace,
The problem is the i_nlink of sysctl entries under proc filesystem is not
2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
1.

btw, I'm not quite understand what's the inode->i_nlink != 2 here means?
is this directory empty? as I know, when we create a file(not dir) under
a dir, the i_nlink of this dir will not increase.

And another question, it looks like if we don't have proc/sys fs mounted,
then proc/sys will be failed to be mounted?

Thanks!

>   }
> - if (userns->may_mount_sysfs && userns->may_mount_proc)
> - break;
> + visible = true;
> + goto found;
> + next:   ;
>   }
> - up_read(_sem);
> +found:
> + namespace_unlock();
> + return visible;
>  }
>  
>  static void *mntns_get(struct task_struct *task)
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 38bd5d4..45e5fb7 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -110,8 +110,11 @@ static struct dentry *proc_mount(struct file_system_type 
> *fs_type,
>   ns = task_active_pid_ns(current);
>   options = data;
>  
> - if (!current_user_ns()->may_mount_proc ||
> - !ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> + if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
> + return ERR_PTR(-EPERM);
> +
> + /* Does the mounter have privilege over the pid namespace? */
> + if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
>   return ERR_PTR(-EPERM);
>   }
>  
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index afd8327..4a2da3a 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -112,7 +112,8 @@ static struct dentry *sysfs_mount(struct file_system_type 
> *fs_type,
>   struct super_block *sb;
>   

Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted

2013-11-02 Thread Gao feng
Hi Eric,

On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
 
 Rely on the fact that another flavor of the filesystem is already
 mounted and do not rely on state in the user namespace.
 
 Verify that the mounted filesystem is not covered in any significant
 way.  I would love to verify that the previously mounted filesystem
 has no mounts on top but there are at least the directories
 /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
 for other filesystems to mount on top of.
 
 Refactor the test into a function named fs_fully_visible and call that
 function from the mount routines of proc and sysfs.  This makes this
 test local to the filesystems involved and the results current of when
 the mounts take place, removing a weird threading of the user
 namespace, the mount namespace and the filesystems themselves.
 
 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |   37 +
  fs/proc/root.c |7 +--
  fs/sysfs/mount.c   |3 ++-
  include/linux/fs.h |1 +
  include/linux/user_namespace.h |4 
  kernel/user.c  |2 --
  kernel/user_namespace.c|2 --
  7 files changed, 33 insertions(+), 23 deletions(-)
 
 diff --git a/fs/namespace.c b/fs/namespace.c
 index 64627f8..877e427 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
   return chrooted;
  }
  
 -void update_mnt_policy(struct user_namespace *userns)
 +bool fs_fully_visible(struct file_system_type *type)
  {
   struct mnt_namespace *ns = current-nsproxy-mnt_ns;
   struct mount *mnt;
 + bool visible = false;
  
 - down_read(namespace_sem);
 + if (unlikely(!ns))
 + return false;
 +
 + namespace_lock();
   list_for_each_entry(mnt, ns-list, mnt_list) {
 - switch (mnt-mnt.mnt_sb-s_magic) {
 - case SYSFS_MAGIC:
 - userns-may_mount_sysfs = true;
 - break;
 - case PROC_SUPER_MAGIC:
 - userns-may_mount_proc = true;
 - break;
 + struct mount *child;
 + if (mnt-mnt.mnt_sb-s_type != type)
 + continue;
 +
 + /* This mount is not fully visible if there are any child mounts
 +  * that cover anything except for empty directories.
 +  */
 + list_for_each_entry(child, mnt-mnt_mounts, mnt_child) {
 + struct inode *inode = child-mnt_mountpoint-d_inode;
 + if (!S_ISDIR(inode-i_mode))
 + goto next;
 + if (inode-i_nlink != 2)
 + goto next;


I met a problem that proc filesystem failed to mount in user namespace,
The problem is the i_nlink of sysctl entries under proc filesystem is not
2. it always is 1 even it's a directory, see proc_sys_make_inode. and for
btrfs, the i_nlink for an empty dir is 2 too. it seems like depends on the
filesystem itself,not depends on vfs. In my system binfmt_misc is mounted
on /proc/sys/fs/binfmt_misc, and the i_nlink of this directory's inode is
1.

btw, I'm not quite understand what's the inode-i_nlink != 2 here means?
is this directory empty? as I know, when we create a file(not dir) under
a dir, the i_nlink of this dir will not increase.

And another question, it looks like if we don't have proc/sys fs mounted,
then proc/sys will be failed to be mounted?

Thanks!

   }
 - if (userns-may_mount_sysfs  userns-may_mount_proc)
 - break;
 + visible = true;
 + goto found;
 + next:   ;
   }
 - up_read(namespace_sem);
 +found:
 + namespace_unlock();
 + return visible;
  }
  
  static void *mntns_get(struct task_struct *task)
 diff --git a/fs/proc/root.c b/fs/proc/root.c
 index 38bd5d4..45e5fb7 100644
 --- a/fs/proc/root.c
 +++ b/fs/proc/root.c
 @@ -110,8 +110,11 @@ static struct dentry *proc_mount(struct file_system_type 
 *fs_type,
   ns = task_active_pid_ns(current);
   options = data;
  
 - if (!current_user_ns()-may_mount_proc ||
 - !ns_capable(ns-user_ns, CAP_SYS_ADMIN))
 + if (!capable(CAP_SYS_ADMIN)  !fs_fully_visible(fs_type))
 + return ERR_PTR(-EPERM);
 +
 + /* Does the mounter have privilege over the pid namespace? */
 + if (!ns_capable(ns-user_ns, CAP_SYS_ADMIN))
   return ERR_PTR(-EPERM);
   }
  
 diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
 index afd8327..4a2da3a 100644
 --- a/fs/sysfs/mount.c
 +++ b/fs/sysfs/mount.c
 @@ -112,7 +112,8 @@ static struct dentry *sysfs_mount(struct file_system_type 
 *fs_type,
   struct super_block *sb;
   int error;
  
 - if (!(flags  MS_KERNMOUNT)  !current_user_ns()-may_mount_sysfs)

[PATCH v2] audit: remove useless code in audit_enable

2013-10-31 Thread Gao feng
Since kernel parameter is operated before
initcall, so the audit_initialized must be
AUDIT_UNINITIALIZED or DISABLED in audit_enable.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

change from v1:
convert "printk(KERN_INFO " to "pr_info(".
thanks Richard for pointing out.

diff --git a/kernel/audit.c b/kernel/audit.c
index 8378c5e..7c7c028 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1038,17 +1038,8 @@ static int __init audit_enable(char *str)
if (!audit_default)
audit_initialized = AUDIT_DISABLED;
 
-   printk(KERN_INFO "audit: %s", audit_default ? "enabled" : "disabled");
-
-   if (audit_initialized == AUDIT_INITIALIZED) {
-   audit_enabled = audit_default;
-   audit_ever_enabled |= !!audit_default;
-   } else if (audit_initialized == AUDIT_UNINITIALIZED) {
-   printk(" (after initialization)");
-   } else {
-   printk(" (until reboot)");
-   }
-   printk("\n");
+   pr_info("audit: %s\n", audit_default ?
+   "enabled (after initialization)" : "disabled (until reboot)");
 
return 1;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] audit: remove useless code in audit_enable

2013-10-31 Thread Gao feng
Since kernel parameter is operated before
initcall, so the audit_initialized must be
AUDIT_UNINITIALIZED or DISABLED in audit_enable.

Signed-off-by: Gao feng gaof...@cn.fujitsu.com
---
 kernel/audit.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

change from v1:
convert printk(KERN_INFO  to pr_info(.
thanks Richard for pointing out.

diff --git a/kernel/audit.c b/kernel/audit.c
index 8378c5e..7c7c028 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1038,17 +1038,8 @@ static int __init audit_enable(char *str)
if (!audit_default)
audit_initialized = AUDIT_DISABLED;
 
-   printk(KERN_INFO audit: %s, audit_default ? enabled : disabled);
-
-   if (audit_initialized == AUDIT_INITIALIZED) {
-   audit_enabled = audit_default;
-   audit_ever_enabled |= !!audit_default;
-   } else if (audit_initialized == AUDIT_UNINITIALIZED) {
-   printk( (after initialization));
-   } else {
-   printk( (until reboot));
-   }
-   printk(\n);
+   pr_info(audit: %s\n, audit_default ?
+   enabled (after initialization) : disabled (until reboot));
 
return 1;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-10-30 Thread Gao feng
Hi Eric Paris,

Can you give me some comments?

You think the tying audit namespace to user namespace is a bad idea,
so this patchset doesn't assign auditns to userns and introduce an
new audit netlink type to help to create audit namespace.

and this patchset also introduces an new proc interface to make
sure container can't influence the whole system.

and the audit rules are not namespace aware, all of audit namespaces
should comply with the rules. in next step, if we find it's need to
make audit rules per audit namespace, then it's the time to do that
job.

This patchset also makes all of net namespaces have ability to send/
receive audit netlink message.

I may miss some points, if you find there are some shortage or loophole,
please let me know.

Thanks!

On 10/24/2013 03:31 PM, Gao feng wrote:
> Here is the v1 patchset: http://lwn.net/Articles/549546/
> 
> The main target of this patchset is allowing user in audit
> namespace to generate the USER_MSG type of audit message,
> some userspace tools need to generate audit message, or
> these tools will broken.
> 
> And the login process in container may want to setup
> /proc//loginuid, right now this value is unalterable
> once it being set. this will also broke the login problem
> in container. After this patchset, we can reset this loginuid
> to zero if task is running in a new audit namespace.
> 
> Same with v1 patchset, in this patchset, only the privileged
> user in init_audit_ns and init_user_ns has rights to
> add/del audit rules. and these rules are gloabl. all
> audit namespace will comply with the rules.
> 
> Compared with v1, v2 patch has some big changes.
> 1, the audit namespace is not assigned to user namespace.
>since there is no available bit of flags for clone, we
>create audit namespace through netlink, patch[18/20]
>introduces a new audit netlink type AUDIT_CREATE_NS.
>the privileged user in userns has rights to create a
>audit namespace, it means the unprivileged user can
>create auditns through create userns first. In order
>to prevent them from doing harm to host, the default
>audit_backlog_limit of un-init-audit-ns is zero(means
>audit is unavailable in audit namespace). and it can't
>be changed in auditns through netlink.
> 
> 2, introduce /proc//audit_log_limit
>this interface is used to setup log_limit of audit
>namespace.  we need this interface to make audit
>available in un-init-audit-ns. Only the privileged user
>has right to set this value, it means only the root user
>of host can change it.
> 
> 3, make audit namespace don't depend on net namespace.
>patch[1/20] add a compare function audit_compare for
>audit netlink, it always return true, it means the
>netlink subsystem will find out the netlink socket
>only through portid and netlink type. So we needn't
>to create kernel side audit netlink socket for per
>net namespace, all userspace audit netlink socket
>can find out the audit_sock, and audit_sock can
>communicate with them through the proper portid.
>it's just like the behavior we don't have net
>namespace before.
> 
> 
> This patchset still need some work, such as allow changing
> audit_enabled in audit namespace, auditd wants this feature.
> 
> I send this patchset now in order to get more comments, so
> I can keep on improving namespace support for audit.
> 
> Gao feng (20):
>   Audit: make audit netlink socket net namespace unaware
>   audit: introduce configure option CONFIG_AUDIT_NS
>   audit: make audit_skb_queue per audit namespace
>   audit: make audit_skb_hold_queue per audit namespace
>   audit: make audit_pid per audit namespace
>   audit: make kauditd_task per audit namespace
>   aduit: make audit_nlk_portid per audit namespace
>   audit: make kaudit_wait queue per audit namespace
>   audit: make audit_backlog_wait per audit namespace
>   audit: allow un-init audit ns to change pid and portid only
>   audit: use proper audit namespace in audit_receive_msg
>   audit: use proper audit_namespace in kauditd_thread
>   audit: introduce new audit logging interface for audit namespace
>   audit: pass proper audit namespace to audit_log_common_recv_msg
>   audit: Log audit pid config change in audit namespace
>   audit: allow GET,SET,USER MSG operations in audit namespace
>   nsproxy: don't make create_new_namespaces static
>   audit: add new message type AUDIT_CREATE_NS
>   audit: make audit_backlog_limit per audit namespace
>   audit: introduce /proc//audit_backlog_limit
> 
>  fs/proc/base.c  |  53 ++
>  include/linux/audit.h   |  26 ++-
>  include/linux/audit_namespace.h |  92 ++
>  include/linux/nsproxy.h |  15 +-
>  i

Re: [RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-10-30 Thread Gao feng
Hi Eric Paris,

Can you give me some comments?

You think the tying audit namespace to user namespace is a bad idea,
so this patchset doesn't assign auditns to userns and introduce an
new audit netlink type to help to create audit namespace.

and this patchset also introduces an new proc interface to make
sure container can't influence the whole system.

and the audit rules are not namespace aware, all of audit namespaces
should comply with the rules. in next step, if we find it's need to
make audit rules per audit namespace, then it's the time to do that
job.

This patchset also makes all of net namespaces have ability to send/
receive audit netlink message.

I may miss some points, if you find there are some shortage or loophole,
please let me know.

Thanks!

On 10/24/2013 03:31 PM, Gao feng wrote:
 Here is the v1 patchset: http://lwn.net/Articles/549546/
 
 The main target of this patchset is allowing user in audit
 namespace to generate the USER_MSG type of audit message,
 some userspace tools need to generate audit message, or
 these tools will broken.
 
 And the login process in container may want to setup
 /proc/pid/loginuid, right now this value is unalterable
 once it being set. this will also broke the login problem
 in container. After this patchset, we can reset this loginuid
 to zero if task is running in a new audit namespace.
 
 Same with v1 patchset, in this patchset, only the privileged
 user in init_audit_ns and init_user_ns has rights to
 add/del audit rules. and these rules are gloabl. all
 audit namespace will comply with the rules.
 
 Compared with v1, v2 patch has some big changes.
 1, the audit namespace is not assigned to user namespace.
since there is no available bit of flags for clone, we
create audit namespace through netlink, patch[18/20]
introduces a new audit netlink type AUDIT_CREATE_NS.
the privileged user in userns has rights to create a
audit namespace, it means the unprivileged user can
create auditns through create userns first. In order
to prevent them from doing harm to host, the default
audit_backlog_limit of un-init-audit-ns is zero(means
audit is unavailable in audit namespace). and it can't
be changed in auditns through netlink.
 
 2, introduce /proc/pid/audit_log_limit
this interface is used to setup log_limit of audit
namespace.  we need this interface to make audit
available in un-init-audit-ns. Only the privileged user
has right to set this value, it means only the root user
of host can change it.
 
 3, make audit namespace don't depend on net namespace.
patch[1/20] add a compare function audit_compare for
audit netlink, it always return true, it means the
netlink subsystem will find out the netlink socket
only through portid and netlink type. So we needn't
to create kernel side audit netlink socket for per
net namespace, all userspace audit netlink socket
can find out the audit_sock, and audit_sock can
communicate with them through the proper portid.
it's just like the behavior we don't have net
namespace before.
 
 
 This patchset still need some work, such as allow changing
 audit_enabled in audit namespace, auditd wants this feature.
 
 I send this patchset now in order to get more comments, so
 I can keep on improving namespace support for audit.
 
 Gao feng (20):
   Audit: make audit netlink socket net namespace unaware
   audit: introduce configure option CONFIG_AUDIT_NS
   audit: make audit_skb_queue per audit namespace
   audit: make audit_skb_hold_queue per audit namespace
   audit: make audit_pid per audit namespace
   audit: make kauditd_task per audit namespace
   aduit: make audit_nlk_portid per audit namespace
   audit: make kaudit_wait queue per audit namespace
   audit: make audit_backlog_wait per audit namespace
   audit: allow un-init audit ns to change pid and portid only
   audit: use proper audit namespace in audit_receive_msg
   audit: use proper audit_namespace in kauditd_thread
   audit: introduce new audit logging interface for audit namespace
   audit: pass proper audit namespace to audit_log_common_recv_msg
   audit: Log audit pid config change in audit namespace
   audit: allow GET,SET,USER MSG operations in audit namespace
   nsproxy: don't make create_new_namespaces static
   audit: add new message type AUDIT_CREATE_NS
   audit: make audit_backlog_limit per audit namespace
   audit: introduce /proc/pid/audit_backlog_limit
 
  fs/proc/base.c  |  53 ++
  include/linux/audit.h   |  26 ++-
  include/linux/audit_namespace.h |  92 ++
  include/linux/nsproxy.h |  15 +-
  include/uapi/linux/audit.h  |   1 +
  init/Kconfig|  10 ++
  kernel/Makefile |   2 +-
  kernel/audit.c  | 364 
 +---
  kernel/audit.h  |   5 +-
  kernel/audit_namespace.c| 123 ++
  kernel/auditsc.c

[PATCH 15/20] audit: Log audit pid config change in audit namespace

2013-10-24 Thread Gao feng
This patch allow to log audit config change in
audit namespace.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 92da21d..095f54d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -252,13 +252,15 @@ void audit_log_lost(const char *message)
}
 }
 
-static int audit_log_config_change(char *function_name, int new, int old,
+static int audit_log_config_change(struct audit_namespace *ns,
+  char *function_name,
+  int new, int old,
   int allow_changes)
 {
struct audit_buffer *ab;
int rc = 0;
 
-   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
+   ab = audit_log_start_ns(ns, NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
if (unlikely(!ab))
return rc;
audit_log_format(ab, "%s=%d old=%d", function_name, new, old);
@@ -267,7 +269,7 @@ static int audit_log_config_change(char *function_name, int 
new, int old,
if (rc)
allow_changes = 0; /* Something weird, deny request */
audit_log_format(ab, " res=%d", allow_changes);
-   audit_log_end(ab);
+   audit_log_end_ns(ns, ab);
return rc;
 }
 
@@ -282,7 +284,10 @@ static int audit_do_config_change(char *function_name, int 
*to_change, int new)
allow_changes = 1;
 
if (audit_enabled != AUDIT_OFF) {
-   rc = audit_log_config_change(function_name, new, old, 
allow_changes);
+   rc = audit_log_config_change(_audit_ns,
+function_name,
+new, old,
+allow_changes);
if (rc)
allow_changes = 0;
}
@@ -697,7 +702,10 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
new_pid = task_pid_nr_ns(task, _pid_ns);
 
if (audit_enabled != AUDIT_OFF)
-   audit_log_config_change("audit_pid", new_pid, 
ns->pid, 1);
+   audit_log_config_change(_audit_ns,
+   "audit_pid",
+   new_pid,
+   ns->pid, 1);
 
ns->pid = new_pid;
rcu_read_unlock();
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/20] audit: make audit_skb_queue per audit namespace

2013-10-24 Thread Gao feng
This patch makes audit_skb_queue per audit namespace,
Since we haven't finished the preparations, only
allow user to attach/detach skb to the queue of
init_audit_ns.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h |  3 +++
 kernel/audit.c  | 18 +-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index ac22649..65acab1 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -4,11 +4,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct audit_namespace {
atomic_t count;
struct user_namespace *user_ns;
+   struct sk_buff_head queue;
 };
 
 extern struct audit_namespace init_audit_ns;
@@ -26,6 +28,7 @@ void put_audit_ns(struct audit_namespace *ns)
 {
if (atomic_dec_and_test(>count)) {
put_user_ns(ns->user_ns);
+   skb_queue_purge(>queue);
kfree(ns);
}
 }
diff --git a/kernel/audit.c b/kernel/audit.c
index 468950b..f3c2150 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "audit.h"
 
@@ -133,7 +134,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static struct sk_buff_head audit_skb_queue;
 /* queue of skbs to send to auditd when/if it comes back */
 static struct sk_buff_head audit_skb_hold_queue;
 static struct task_struct *kauditd_task;
@@ -446,7 +446,7 @@ static int kauditd_thread(void *dummy)
 
flush_hold_queue();
 
-   skb = skb_dequeue(_skb_queue);
+   skb = skb_dequeue(_audit_ns.queue);
wake_up(_backlog_wait);
if (skb) {
if (audit_pid)
@@ -458,7 +458,7 @@ static int kauditd_thread(void *dummy)
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(_wait, );
 
-   if (!skb_queue_len(_skb_queue)) {
+   if (!skb_queue_len(_audit_ns.queue)) {
try_to_freeze();
schedule();
}
@@ -665,7 +665,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
status_set.rate_limit= audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost  = atomic_read(_lost);
-   status_set.backlog   = skb_queue_len(_skb_queue);
+   status_set.backlog   = skb_queue_len(_audit_ns.queue);
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
 _set, sizeof(status_set));
break;
@@ -911,7 +911,7 @@ static int __init audit_init(void)
else
audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
 
-   skb_queue_head_init(_skb_queue);
+   skb_queue_head_init(_audit_ns.queue);
skb_queue_head_init(_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
@@ -1066,7 +1066,7 @@ static void wait_for_auditd(unsigned long sleep_time)
add_wait_queue(_backlog_wait, );
 
if (audit_backlog_limit &&
-   skb_queue_len(_skb_queue) > audit_backlog_limit)
+   skb_queue_len(_audit_ns.queue) > audit_backlog_limit)
schedule_timeout(sleep_time);
 
__set_current_state(TASK_RUNNING);
@@ -1117,7 +1117,7 @@ struct audit_buffer *audit_log_start(struct audit_context 
*ctx, gfp_t gfp_mask,
entries over the normal backlog limit */
 
while (audit_backlog_limit
-  && skb_queue_len(_skb_queue) > audit_backlog_limit + 
reserve) {
+  && skb_queue_len(_audit_ns.queue) > audit_backlog_limit + 
reserve) {
if (gfp_mask & __GFP_WAIT && audit_backlog_wait_time) {
unsigned long sleep_time;
 
@@ -1132,7 +1132,7 @@ struct audit_buffer *audit_log_start(struct audit_context 
*ctx, gfp_t gfp_mask,
printk(KERN_WARNING
   "audit: audit_backlog=%d > "
   "audit_backlog_limit=%d\n",
-  skb_queue_len(_skb_queue),
+  skb_queue_len(_audit_ns.queue),
   audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
@@ -1677,7 +1677,7 @@ void audit_log_end(struct audit_buffer *ab)
nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;
 
if (audit_pid) {
-   skb_queue_tail(_skb_queue, ab->skb);
+   skb

[PATCH 06/20] audit: make kauditd_task per audit namespace

2013-10-24 Thread Gao feng
kauditd_task is used to send audit netlink messages
to the user space auditd process. Because the netlink
messages are per audit namespace, we should make
kaudit_task per auditns to operate the right netlink
skb.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h | 12 +++
 kernel/audit.c  | 47 +++--
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index fdbb6c1..2c0eede 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -15,6 +15,7 @@ struct audit_namespace {
struct sk_buff_head queue;
/* queue of skbs to send to auditd when/if it comes back */
struct sk_buff_head hold_queue;
+   struct task_struct *kauditd_task;
 };
 
 extern struct audit_namespace init_audit_ns;
@@ -35,6 +36,17 @@ void put_audit_ns(struct audit_namespace *ns)
skb_queue_purge(>queue);
skb_queue_purge(>hold_queue);
kfree(ns);
+   } else if (atomic_read(>count) == 1) {
+   /* If the last user of audit namespace is kauditd,
+* we should wake up kauditd and let it kill itself,
+* Then this audit namespace will be destroyed. */
+   struct task_struct *task;
+
+   rcu_read_lock();
+   task = ACCESS_ONCE(ns->kauditd_task);
+   if (task)
+   wake_up_process(task);
+   rcu_read_unlock();
}
 }
 #else
diff --git a/kernel/audit.c b/kernel/audit.c
index e5e8cb1..ceb1cbd 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -133,7 +133,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static struct task_struct *kauditd_task;
 static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
 
@@ -410,20 +409,20 @@ static void kauditd_send_skb(struct sk_buff *skb)
  * in 5 years when I want to play with this again I'll see this
  * note and still have no friggin idea what i'm thinking today.
  */
-static void flush_hold_queue(void)
+static void flush_hold_queue(struct audit_namespace *ns)
 {
struct sk_buff *skb;
 
-   if (!audit_default || !init_audit_ns.pid)
+   if (!audit_default || !ns->pid)
return;
 
-   skb = skb_dequeue(_audit_ns.hold_queue);
+   skb = skb_dequeue(>hold_queue);
if (likely(!skb))
return;
 
-   while (skb && init_audit_ns.pid) {
+   while (skb && ns->pid) {
kauditd_send_skb(skb);
-   skb = skb_dequeue(_audit_ns.hold_queue);
+   skb = skb_dequeue(>hold_queue);
}
 
/*
@@ -436,17 +435,25 @@ static void flush_hold_queue(void)
 
 static int kauditd_thread(void *dummy)
 {
+   struct audit_namespace *ns = (struct audit_namespace *)dummy;
+
set_freezable();
while (!kthread_should_stop()) {
struct sk_buff *skb;
DECLARE_WAITQUEUE(wait, current);
 
-   flush_hold_queue();
+   /* Ok, We are the last user of this audit namespace,
+* it's time to go. Kill kauditd thread and release
+* the audit namespace. */
+   if (atomic_read(>count) == 1)
+   break;
+
+   flush_hold_queue(ns);
 
-   skb = skb_dequeue(_audit_ns.queue);
+   skb = skb_dequeue(>queue);
wake_up(_backlog_wait);
if (skb) {
-   if (init_audit_ns.pid)
+   if (ns->pid)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
@@ -455,7 +462,7 @@ static int kauditd_thread(void *dummy)
set_current_state(TASK_INTERRUPTIBLE);
add_wait_queue(_wait, );
 
-   if (!skb_queue_len(_audit_ns.queue)) {
+   if (!skb_queue_len(>queue)) {
try_to_freeze();
schedule();
}
@@ -463,6 +470,9 @@ static int kauditd_thread(void *dummy)
__set_current_state(TASK_RUNNING);
remove_wait_queue(_wait, );
}
+
+   ns->kauditd_task = NULL;
+   put_audit_ns(ns);
return 0;
 }
 
@@ -635,6 +645,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
u16 msg_type = nlh->nlmsg_type;
struct audit_sig_info   *sig_data;
char*ctx = NULL;
+   struct audit_namespace  *ns = current_audit_ns();
u32 len;
 
err = audit_netlink_ok(skb, msg_type);
@@ -643,13 +654,16 @@ static int audit_receive_msg(struct sk_buff *skb, struc

[PATCH 01/20] Audit: make audit netlink socket net namespace unaware

2013-10-24 Thread Gao feng
Add a compare function which always return true for
audit netlink socket, this will cause audit netlink
sockets netns unaware, and no matter which netns the
user space audit netlink sockets belong to, they all
can find out and communicate with audit_sock.

This gets rid of the necessary to create per-netns
audit kernel side socket(audit_sock), it's pain to
depend on and get reference of netns for auditns.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 7b0e23a..468950b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -886,12 +886,18 @@ static void audit_receive(struct sk_buff  *skb)
mutex_unlock(_cmd_mutex);
 }
 
+static bool audit_compare(struct net *net, struct sock *sk)
+{
+   return true;
+}
+
 /* Initialize audit support at boot time. */
 static int __init audit_init(void)
 {
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   .compare = audit_compare,
};
 
if (audit_initialized == AUDIT_DISABLED)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/20] audit: make audit_backlog_wait per audit namespace

2013-10-24 Thread Gao feng
Tasks are added to audit_backlog_wait when the
audit_skb_queue of audit namespace is full, so
audit_backlog_wait should be per audit namespace too.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h |  1 +
 kernel/audit.c  | 11 +--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index 2888238..79a9b78 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -20,6 +20,7 @@ struct audit_namespace {
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
wait_queue_head_t kauditd_wait;
+   wait_queue_head_t backlog_wait;
 };
 
 extern struct audit_namespace init_audit_ns;
diff --git a/kernel/audit.c b/kernel/audit.c
index a4bfd7f..d7a0993 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -126,8 +126,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
-
 /* Serialize requests from userspace. */
 DEFINE_MUTEX(audit_cmd_mutex);
 
@@ -443,7 +441,7 @@ static int kauditd_thread(void *dummy)
flush_hold_queue(ns);
 
skb = skb_dequeue(>queue);
-   wake_up(_backlog_wait);
+   wake_up(_audit_ns.backlog_wait);
if (skb) {
if (ns->pid)
kauditd_send_skb(skb);
@@ -941,6 +939,7 @@ static int __init audit_init(void)
skb_queue_head_init(_audit_ns.queue);
skb_queue_head_init(_audit_ns.hold_queue);
init_waitqueue_head(_audit_ns.kauditd_wait);
+   init_waitqueue_head(_audit_ns.backlog_wait);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
@@ -1091,14 +1090,14 @@ static void wait_for_auditd(unsigned long sleep_time)
 {
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
-   add_wait_queue(_backlog_wait, );
+   add_wait_queue(_audit_ns.backlog_wait, );
 
if (audit_backlog_limit &&
skb_queue_len(_audit_ns.queue) > audit_backlog_limit)
schedule_timeout(sleep_time);
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(_backlog_wait, );
+   remove_wait_queue(_audit_ns.backlog_wait, );
 }
 
 /* Obtain an audit buffer.  This routine does locking to obtain the
@@ -1164,7 +1163,7 @@ struct audit_buffer *audit_log_start(struct audit_context 
*ctx, gfp_t gfp_mask,
   audit_backlog_limit);
audit_log_lost("backlog limit exceeded");
audit_backlog_wait_time = audit_backlog_wait_overflow;
-   wake_up(_backlog_wait);
+   wake_up(_audit_ns.backlog_wait);
return NULL;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/20] audit: use proper audit_namespace in kauditd_thread

2013-10-24 Thread Gao feng
Signed-off-by: Gao feng 
---
 kernel/audit.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 5524deb..b203017 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -338,11 +338,11 @@ static int audit_set_failure(int state)
  * This only holds messages is audit_default is set, aka booting with audit=1
  * or building your kernel that way.
  */
-static void audit_hold_skb(struct sk_buff *skb)
+static void audit_hold_skb(struct audit_namespace *ns, struct sk_buff *skb)
 {
if (audit_default &&
-   skb_queue_len(_audit_ns.hold_queue) < audit_backlog_limit)
-   skb_queue_tail(_audit_ns.hold_queue, skb);
+   skb_queue_len(>hold_queue) < audit_backlog_limit)
+   skb_queue_tail(>hold_queue, skb);
else
kfree_skb(skb);
 }
@@ -351,7 +351,7 @@ static void audit_hold_skb(struct sk_buff *skb)
  * For one reason or another this nlh isn't getting delivered to the userspace
  * audit daemon, just send it to printk.
  */
-static void audit_printk_skb(struct sk_buff *skb)
+static void audit_printk_skb(struct audit_namespace *ns, struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
char *data = nlmsg_data(nlh);
@@ -363,22 +363,22 @@ static void audit_printk_skb(struct sk_buff *skb)
audit_log_lost("printk limit exceeded\n");
}
 
-   audit_hold_skb(skb);
+   audit_hold_skb(ns, skb);
 }
 
-static void kauditd_send_skb(struct sk_buff *skb)
+static void kauditd_send_skb(struct audit_namespace *ns, struct sk_buff *skb)
 {
int err;
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
-   err = netlink_unicast(audit_sock, skb, init_audit_ns.portid, 0);
+   err = netlink_unicast(audit_sock, skb, ns->portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
-   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", 
init_audit_ns.pid);
+   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", 
ns->pid);
audit_log_lost("auditd disappeared\n");
-   init_audit_ns.pid = 0;
+   ns->pid = 0;
/* we might get lucky and get this in the next auditd */
-   audit_hold_skb(skb);
+   audit_hold_skb(ns, skb);
} else
/* drop the extra reference if sent ok */
consume_skb(skb);
@@ -411,7 +411,7 @@ static void flush_hold_queue(struct audit_namespace *ns)
return;
 
while (skb && ns->pid) {
-   kauditd_send_skb(skb);
+   kauditd_send_skb(ns, skb);
skb = skb_dequeue(>hold_queue);
}
 
@@ -441,16 +441,16 @@ static int kauditd_thread(void *dummy)
flush_hold_queue(ns);
 
skb = skb_dequeue(>queue);
-   wake_up(_audit_ns.backlog_wait);
+   wake_up(>backlog_wait);
if (skb) {
if (ns->pid)
-   kauditd_send_skb(skb);
+   kauditd_send_skb(ns, skb);
else
-   audit_printk_skb(skb);
+   audit_printk_skb(ns, skb);
continue;
}
set_current_state(TASK_INTERRUPTIBLE);
-   add_wait_queue(_audit_ns.kauditd_wait, );
+   add_wait_queue(>kauditd_wait, );
 
if (!skb_queue_len(>queue)) {
try_to_freeze();
@@ -458,7 +458,7 @@ static int kauditd_thread(void *dummy)
}
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(_audit_ns.kauditd_wait, );
+   remove_wait_queue(>kauditd_wait, );
}
 
ns->kauditd_task = NULL;
@@ -1713,7 +1713,7 @@ void audit_log_end(struct audit_buffer *ab)
skb_queue_tail(_audit_ns.queue, ab->skb);
wake_up_interruptible(_audit_ns.kauditd_wait);
} else {
-   audit_printk_skb(ab->skb);
+   audit_printk_skb(_audit_ns, ab->skb);
}
ab->skb = NULL;
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 05/20] audit: make audit_pid per audit namespace

2013-10-24 Thread Gao feng
Every audit namespace has its own auditd process,
so audit_pid should be per audit namespace too.

Since some places such as audit_filter_syscall
use audit_pid to identify if the task is auditd,
so we should store auditd's pid which for init
pid namespace not for current pid ns.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h |  2 ++
 kernel/audit.c  | 43 ++---
 kernel/audit.h  |  5 ++---
 kernel/auditsc.c|  6 +++---
 4 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index 9b7b4c5..fdbb6c1 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -9,6 +9,8 @@
 
 struct audit_namespace {
atomic_t count;
+   /* pid of the auditd process */
+   int pid;
struct user_namespace *user_ns;
struct sk_buff_head queue;
/* queue of skbs to send to auditd when/if it comes back */
diff --git a/kernel/audit.c b/kernel/audit.c
index 51ee701..e5e8cb1 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -94,7 +94,6 @@ static intaudit_failure = AUDIT_FAIL_PRINTK;
  * contains the pid of the auditd process and audit_nlk_portid contains
  * the portid to use to send netlink messages to that process.
  */
-intaudit_pid;
 static int audit_nlk_portid;
 
 /* If audit_rate_limit is non-zero, limit the rate of sending audit records
@@ -187,7 +186,7 @@ void audit_panic(const char *message)
break;
case AUDIT_FAIL_PANIC:
/* test audit_pid since printk is always losey, why bother? */
-   if (audit_pid)
+   if (init_audit_ns.pid)
panic("audit: %s\n", message);
break;
}
@@ -386,9 +385,9 @@ static void kauditd_send_skb(struct sk_buff *skb)
err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
-   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", 
audit_pid);
+   printk(KERN_ERR "audit: *NO* daemon at audit_pid=%d\n", 
init_audit_ns.pid);
audit_log_lost("auditd disappeared\n");
-   audit_pid = 0;
+   init_audit_ns.pid = 0;
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
} else
@@ -415,14 +414,14 @@ static void flush_hold_queue(void)
 {
struct sk_buff *skb;
 
-   if (!audit_default || !audit_pid)
+   if (!audit_default || !init_audit_ns.pid)
return;
 
skb = skb_dequeue(_audit_ns.hold_queue);
if (likely(!skb))
return;
 
-   while (skb && audit_pid) {
+   while (skb && init_audit_ns.pid) {
kauditd_send_skb(skb);
skb = skb_dequeue(_audit_ns.hold_queue);
}
@@ -447,7 +446,7 @@ static int kauditd_thread(void *dummy)
skb = skb_dequeue(_audit_ns.queue);
wake_up(_backlog_wait);
if (skb) {
-   if (audit_pid)
+   if (init_audit_ns.pid)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
@@ -659,11 +658,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
case AUDIT_GET:
status_set.enabled   = audit_enabled;
status_set.failure   = audit_failure;
-   status_set.pid   = audit_pid;
+   status_set.pid   = init_audit_ns.pid;
status_set.rate_limit= audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost  = atomic_read(_lost);
status_set.backlog   = skb_queue_len(_audit_ns.queue);
+
+   rcu_read_lock();
+   {
+   struct task_struct *task;
+
+   task = find_task_by_pid_ns(ns->pid, _pid_ns);
+
+   if (task)
+   status_set.pid = task_pid_vnr(task);
+   }
+   rcu_read_unlock();
audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
 _set, sizeof(status_set));
break;
@@ -683,10 +693,20 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
}
if (status_get->mask & AUDIT_STATUS_PID) {
int new_pid = status_get->pid;
+   struct task_struct *task;
+
+   rcu_read_lock();
+   task = find_task_by_vpid(new_pid);
+
+   if (task

[PATCH 02/20] audit: introduce configure option CONFIG_AUDIT_NS

2013-10-24 Thread Gao feng
This patch adds a new field audit_ns for struct
nsproxy, so task can access the audit_ns through
task->nsproxy->audit_ns.

Right now, we don't support create new audit_ns,
all tasks's audit_ns will point to the init_audit_ns.
next patches will add the feature creating new
audit namespace.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h | 51 +
 include/linux/nsproxy.h | 11 +
 init/Kconfig| 10 
 kernel/Makefile |  2 +-
 kernel/audit_namespace.c|  8 +++
 kernel/nsproxy.c| 16 -
 6 files changed, 91 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/audit_namespace.h
 create mode 100644 kernel/audit_namespace.c

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
new file mode 100644
index 000..ac22649
--- /dev/null
+++ b/include/linux/audit_namespace.h
@@ -0,0 +1,51 @@
+#ifndef __LINUX_AUDIT_NAMESPACE_H
+#define __LINUX_AUDIT_NAMESPACE_H
+
+#include 
+#include 
+#include 
+#include 
+
+struct audit_namespace {
+   atomic_t count;
+   struct user_namespace *user_ns;
+};
+
+extern struct audit_namespace init_audit_ns;
+
+#if defined(CONFIG_AUDIT_NS)
+static inline
+struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
+{
+   atomic_inc(>count);
+   return ns;
+}
+
+static inline
+void put_audit_ns(struct audit_namespace *ns)
+{
+   if (atomic_dec_and_test(>count)) {
+   put_user_ns(ns->user_ns);
+   kfree(ns);
+   }
+}
+#else
+static inline
+struct audit_namespace *get_audit_ns(struct audit_namespace *ns)
+{
+   return ns;
+}
+
+static inline
+void put_audit_ns(struct audit_namespace *ns)
+{
+
+}
+#endif
+
+static inline struct
+audit_namespace *copy_audit_ns(struct audit_namespace *audit)
+{
+   return get_audit_ns(audit);
+}
+#endif
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index b4ec59d..dc7af11 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -28,11 +28,12 @@ struct fs_struct;
  */
 struct nsproxy {
atomic_t count;
-   struct uts_namespace *uts_ns;
-   struct ipc_namespace *ipc_ns;
-   struct mnt_namespace *mnt_ns;
-   struct pid_namespace *pid_ns_for_children;
-   struct net   *net_ns;
+   struct uts_namespace*uts_ns;
+   struct ipc_namespace*ipc_ns;
+   struct mnt_namespace*mnt_ns;
+   struct pid_namespace*pid_ns_for_children;
+   struct net  *net_ns;
+   struct audit_namespace  *audit_ns;
 };
 extern struct nsproxy init_nsproxy;
 
diff --git a/init/Kconfig b/init/Kconfig
index 3ecd8a1..05e3d2c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1154,6 +1154,16 @@ config NET_NS
  Allow user space to create what appear to be multiple instances
  of the network stack.
 
+config AUDIT_NS
+   bool "Audit namespace"
+   depends on AUDIT
+   default n
+   help
+ Support audit namespace.  This allows processes write audit message
+ to the audit namespace they belong to.
+
+ If unsure, say N.
+
 endif # NAMESPACES
 
 config UIDGID_STRICT_TYPE_CHECKS
diff --git a/kernel/Makefile b/kernel/Makefile
index 1ce4755..6e64333 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
 obj-$(CONFIG_SMP) += stop_machine.o
 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
-obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
+obj-$(CONFIG_AUDIT) += audit.o auditfilter.o audit_namespace.o
 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
 obj-$(CONFIG_AUDIT_WATCH) += audit_watch.o
 obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
diff --git a/kernel/audit_namespace.c b/kernel/audit_namespace.c
new file mode 100644
index 000..6d9cb8f
--- /dev/null
+++ b/kernel/audit_namespace.c
@@ -0,0 +1,8 @@
+#include 
+#include 
+
+struct audit_namespace init_audit_ns = {
+   .count = ATOMIC_INIT(1),
+   .user_ns = _user_ns,
+};
+EXPORT_SYMBOL_GPL(init_audit_ns);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 8e78110..e8374aa 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -39,6 +40,9 @@ struct nsproxy init_nsproxy = {
 #ifdef CONFIG_NET
.net_ns = _net,
 #endif
+#ifdef CONFIG_AUDIT
+   .audit_ns   = _audit_ns,
+#endif
 };
 
 static inline struct nsproxy *create_nsproxy(void)
@@ -98,8 +102,16 @@ static struct nsproxy *create_new_namespaces(unsigned long 
flags,
goto out_net;
}
 
-   return new_nsp;
+   new_nsp->audit_ns = copy_audit_ns(tsk->nsproxy->audit_ns);
+   if (IS_ERR(new_nsp->audit_ns)) {
+   err = PTR_ERR(new_nsp->audit_ns);
+   goto out_audit;
+ 

[PATCH 04/20] audit: make audit_skb_hold_queue per audit namespace

2013-10-24 Thread Gao feng
This patch makes audit_skb_hold_queue per audit namespace.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h |  3 +++
 kernel/audit.c  | 12 +---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index 65acab1..9b7b4c5 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -11,6 +11,8 @@ struct audit_namespace {
atomic_t count;
struct user_namespace *user_ns;
struct sk_buff_head queue;
+   /* queue of skbs to send to auditd when/if it comes back */
+   struct sk_buff_head hold_queue;
 };
 
 extern struct audit_namespace init_audit_ns;
@@ -29,6 +31,7 @@ void put_audit_ns(struct audit_namespace *ns)
if (atomic_dec_and_test(>count)) {
put_user_ns(ns->user_ns);
skb_queue_purge(>queue);
+   skb_queue_purge(>hold_queue);
kfree(ns);
}
 }
diff --git a/kernel/audit.c b/kernel/audit.c
index f3c2150..51ee701 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -134,8 +134,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-/* queue of skbs to send to auditd when/if it comes back */
-static struct sk_buff_head audit_skb_hold_queue;
 static struct task_struct *kauditd_task;
 static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
@@ -355,8 +353,8 @@ static int audit_set_failure(int state)
 static void audit_hold_skb(struct sk_buff *skb)
 {
if (audit_default &&
-   skb_queue_len(_skb_hold_queue) < audit_backlog_limit)
-   skb_queue_tail(_skb_hold_queue, skb);
+   skb_queue_len(_audit_ns.hold_queue) < audit_backlog_limit)
+   skb_queue_tail(_audit_ns.hold_queue, skb);
else
kfree_skb(skb);
 }
@@ -420,13 +418,13 @@ static void flush_hold_queue(void)
if (!audit_default || !audit_pid)
return;
 
-   skb = skb_dequeue(_skb_hold_queue);
+   skb = skb_dequeue(_audit_ns.hold_queue);
if (likely(!skb))
return;
 
while (skb && audit_pid) {
kauditd_send_skb(skb);
-   skb = skb_dequeue(_skb_hold_queue);
+   skb = skb_dequeue(_audit_ns.hold_queue);
}
 
/*
@@ -912,7 +910,7 @@ static int __init audit_init(void)
audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
 
skb_queue_head_init(_audit_ns.queue);
-   skb_queue_head_init(_skb_hold_queue);
+   skb_queue_head_init(_audit_ns.hold_queue);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC Part1 PATCH 00/20 v2] Add namespace support for audit

2013-10-24 Thread Gao feng
Here is the v1 patchset: http://lwn.net/Articles/549546/

The main target of this patchset is allowing user in audit
namespace to generate the USER_MSG type of audit message,
some userspace tools need to generate audit message, or
these tools will broken.

And the login process in container may want to setup
/proc//loginuid, right now this value is unalterable
once it being set. this will also broke the login problem
in container. After this patchset, we can reset this loginuid
to zero if task is running in a new audit namespace.

Same with v1 patchset, in this patchset, only the privileged
user in init_audit_ns and init_user_ns has rights to
add/del audit rules. and these rules are gloabl. all
audit namespace will comply with the rules.

Compared with v1, v2 patch has some big changes.
1, the audit namespace is not assigned to user namespace.
   since there is no available bit of flags for clone, we
   create audit namespace through netlink, patch[18/20]
   introduces a new audit netlink type AUDIT_CREATE_NS.
   the privileged user in userns has rights to create a
   audit namespace, it means the unprivileged user can
   create auditns through create userns first. In order
   to prevent them from doing harm to host, the default
   audit_backlog_limit of un-init-audit-ns is zero(means
   audit is unavailable in audit namespace). and it can't
   be changed in auditns through netlink.

2, introduce /proc//audit_log_limit
   this interface is used to setup log_limit of audit
   namespace.  we need this interface to make audit
   available in un-init-audit-ns. Only the privileged user
   has right to set this value, it means only the root user
   of host can change it.

3, make audit namespace don't depend on net namespace.
   patch[1/20] add a compare function audit_compare for
   audit netlink, it always return true, it means the
   netlink subsystem will find out the netlink socket
   only through portid and netlink type. So we needn't
   to create kernel side audit netlink socket for per
   net namespace, all userspace audit netlink socket
   can find out the audit_sock, and audit_sock can
   communicate with them through the proper portid.
   it's just like the behavior we don't have net
   namespace before.


This patchset still need some work, such as allow changing
audit_enabled in audit namespace, auditd wants this feature.

I send this patchset now in order to get more comments, so
I can keep on improving namespace support for audit.

Gao feng (20):
  Audit: make audit netlink socket net namespace unaware
  audit: introduce configure option CONFIG_AUDIT_NS
  audit: make audit_skb_queue per audit namespace
  audit: make audit_skb_hold_queue per audit namespace
  audit: make audit_pid per audit namespace
  audit: make kauditd_task per audit namespace
  aduit: make audit_nlk_portid per audit namespace
  audit: make kaudit_wait queue per audit namespace
  audit: make audit_backlog_wait per audit namespace
  audit: allow un-init audit ns to change pid and portid only
  audit: use proper audit namespace in audit_receive_msg
  audit: use proper audit_namespace in kauditd_thread
  audit: introduce new audit logging interface for audit namespace
  audit: pass proper audit namespace to audit_log_common_recv_msg
  audit: Log audit pid config change in audit namespace
  audit: allow GET,SET,USER MSG operations in audit namespace
  nsproxy: don't make create_new_namespaces static
  audit: add new message type AUDIT_CREATE_NS
  audit: make audit_backlog_limit per audit namespace
  audit: introduce /proc//audit_backlog_limit

 fs/proc/base.c  |  53 ++
 include/linux/audit.h   |  26 ++-
 include/linux/audit_namespace.h |  92 ++
 include/linux/nsproxy.h |  15 +-
 include/uapi/linux/audit.h  |   1 +
 init/Kconfig|  10 ++
 kernel/Makefile |   2 +-
 kernel/audit.c  | 364 +---
 kernel/audit.h  |   5 +-
 kernel/audit_namespace.c| 123 ++
 kernel/auditsc.c|   6 +-
 kernel/nsproxy.c|  18 +-
 12 files changed, 561 insertions(+), 154 deletions(-)
 create mode 100644 include/linux/audit_namespace.h
 create mode 100644 kernel/audit_namespace.c

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/20] audit: allow un-init audit ns to change pid and portid only

2013-10-24 Thread Gao feng
Only these two vars are namespace aware.

Signed-off-by: Gao feng 
---
 kernel/audit.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index d7a0993..2132929 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -685,16 +685,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
if (nlh->nlmsg_len < sizeof(struct audit_status))
return -EINVAL;
status_get   = (struct audit_status *)data;
-   if (status_get->mask & AUDIT_STATUS_ENABLED) {
-   err = audit_set_enabled(status_get->enabled);
-   if (err < 0)
-   return err;
-   }
-   if (status_get->mask & AUDIT_STATUS_FAILURE) {
-   err = audit_set_failure(status_get->failure);
-   if (err < 0)
-   return err;
-   }
if (status_get->mask & AUDIT_STATUS_PID) {
int new_pid = status_get->pid;
struct task_struct *task;
@@ -713,6 +703,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
 
init_audit_ns.portid = NETLINK_CB(skb).portid;
}
+
+   /* Right now, only audit_pid and audit_portid are namesapce
+* aware. */
+   if (ns != _audit_ns)
+   return -EPERM;
+
+   if (status_get->mask & AUDIT_STATUS_ENABLED) {
+   err = audit_set_enabled(status_get->enabled);
+   if (err < 0)
+   return err;
+   }
+   if (status_get->mask & AUDIT_STATUS_FAILURE) {
+   err = audit_set_failure(status_get->failure);
+   if (err < 0)
+   return err;
+   }
if (status_get->mask & AUDIT_STATUS_RATE_LIMIT) {
err = audit_set_rate_limit(status_get->rate_limit);
if (err < 0)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 11/20] audit: use proper audit namespace in audit_receive_msg

2013-10-24 Thread Gao feng
Signed-off-by: Gao feng 
---
 kernel/audit.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 2132929..5524deb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -662,11 +662,11 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
case AUDIT_GET:
status_set.enabled   = audit_enabled;
status_set.failure   = audit_failure;
-   status_set.pid   = init_audit_ns.pid;
+   status_set.pid   = ns->pid;
status_set.rate_limit= audit_rate_limit;
status_set.backlog_limit = audit_backlog_limit;
status_set.lost  = atomic_read(_lost);
-   status_set.backlog   = skb_queue_len(_audit_ns.queue);
+   status_set.backlog   = skb_queue_len(>queue);
 
rcu_read_lock();
{
@@ -696,12 +696,12 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
new_pid = task_pid_nr_ns(task, _pid_ns);
 
if (audit_enabled != AUDIT_OFF)
-   audit_log_config_change("audit_pid", new_pid, 
init_audit_ns.pid, 1);
+   audit_log_config_change("audit_pid", new_pid, 
ns->pid, 1);
 
-   init_audit_ns.pid = new_pid;
+   ns->pid = new_pid;
rcu_read_unlock();
 
-   init_audit_ns.portid = NETLINK_CB(skb).portid;
+   ns->portid = NETLINK_CB(skb).portid;
}
 
/* Right now, only audit_pid and audit_portid are namesapce
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 13/20] audit: introduce new audit logging interface for audit namespace

2013-10-24 Thread Gao feng
This interface audit_log_start_ns and audit_log_end_ns
will be used for logging audit logs in audit namespace.

Signed-off-by: Gao feng 
---
 include/linux/audit.h | 26 +--
 kernel/audit.c| 92 ++-
 2 files changed, 77 insertions(+), 41 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 729a4d1..717e1d1 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -43,6 +43,7 @@ struct mq_attr;
 struct mqstat;
 struct audit_watch;
 struct audit_tree;
+struct audit_namespace;
 
 struct audit_krule {
int vers_ops;
@@ -421,10 +422,19 @@ extern __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
   const char *fmt, ...);
 
-extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t 
gfp_mask, int type);
+extern struct audit_buffer *
+audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+
+extern struct audit_buffer *
+audit_log_start_ns(struct audit_namespace *ns,
+  struct audit_context *ctx,
+  gfp_t gfp_mask, int type);
+
 extern __printf(2, 3)
 void audit_log_format(struct audit_buffer *ab, const char *fmt, ...);
 extern voidaudit_log_end(struct audit_buffer *ab);
+extern voidaudit_log_end_ns(struct audit_namespace *ns,
+struct audit_buffer *ab);
 extern int audit_string_contains_control(const char *string,
  size_t len);
 extern voidaudit_log_n_hex(struct audit_buffer *ab,
@@ -470,8 +480,15 @@ static inline __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
   const char *fmt, ...)
 { }
-static inline struct audit_buffer *audit_log_start(struct audit_context *ctx,
-  gfp_t gfp_mask, int type)
+static inline struct audit_buffer *
+audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type)
+{
+   return NULL;
+}
+static inline struct audit_buffer *
+audit_log_start_ns(struct audit_namespace *ns,
+  struct audit_context *ctx,
+  gfp_t gfp_mask, int type)
 {
return NULL;
 }
@@ -480,6 +497,9 @@ void audit_log_format(struct audit_buffer *ab, const char 
*fmt, ...)
 { }
 static inline void audit_log_end(struct audit_buffer *ab)
 { }
+static inline void audit_log_end_ns(struct audit_namespace *ns,
+   struct audit_buffer *ab)
+{ }
 static inline void audit_log_n_hex(struct audit_buffer *ab,
   const unsigned char *buf, size_t len)
 { }
diff --git a/kernel/audit.c b/kernel/audit.c
index b203017..5ac7365 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1092,18 +1092,19 @@ static inline void audit_get_stamp(struct audit_context 
*ctx,
 /*
  * Wait for auditd to drain the queue a little
  */
-static void wait_for_auditd(unsigned long sleep_time)
+static void wait_for_auditd(struct audit_namespace *ns,
+   unsigned long sleep_time)
 {
DECLARE_WAITQUEUE(wait, current);
set_current_state(TASK_UNINTERRUPTIBLE);
-   add_wait_queue(_audit_ns.backlog_wait, );
+   add_wait_queue(>backlog_wait, );
 
if (audit_backlog_limit &&
-   skb_queue_len(_audit_ns.queue) > audit_backlog_limit)
+   skb_queue_len(>queue) > audit_backlog_limit)
schedule_timeout(sleep_time);
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(_audit_ns.backlog_wait, );
+   remove_wait_queue(>backlog_wait, );
 }
 
 /* Obtain an audit buffer.  This routine does locking to obtain the
@@ -1113,23 +1114,10 @@ static void wait_for_auditd(unsigned long sleep_time)
  * will be written at syscall exit.  If there is no associated task, tsk
  * should be NULL. */
 
-/**
- * audit_log_start - obtain an audit buffer
- * @ctx: audit_context (may be NULL)
- * @gfp_mask: type of allocation
- * @type: audit message type
- *
- * Returns audit_buffer pointer on success or NULL on error.
- *
- * Obtain an audit buffer.  This routine does locking to obtain the
- * audit buffer, but then no locking is required for calls to
- * audit_log_*format.  If the task (ctx) is a task that is currently in a
- * syscall, then the syscall is marked as auditable and an audit record
- * will be written at syscall exit.  If there is no associated task, then
- * task context (ctx) should be NULL.
- */
-struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
-int type)
+struct audit_buffer *
+audit_log_start_ns(struct audit_namespace *ns,
+  struct audit_context *ctx,
+  gfp_t gfp_mask, int type)
 {
struct audit_buffer *ab = NULL;
st

[PATCH 08/20] audit: make kaudit_wait queue per audit namespace

2013-10-24 Thread Gao feng
kauditd_task is added to the wait queue kaudit_wait when
there is no audit message being generated in audit namespace,
so the kaudit_wait should be per audit namespace too.

Signed-off-by: Gao feng 
---
 include/linux/audit_namespace.h | 2 ++
 kernel/audit.c  | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/audit_namespace.h b/include/linux/audit_namespace.h
index a9e6a40..2888238 100644
--- a/include/linux/audit_namespace.h
+++ b/include/linux/audit_namespace.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -18,6 +19,7 @@ struct audit_namespace {
/* queue of skbs to send to auditd when/if it comes back */
struct sk_buff_head hold_queue;
struct task_struct *kauditd_task;
+   wait_queue_head_t kauditd_wait;
 };
 
 extern struct audit_namespace init_audit_ns;
diff --git a/kernel/audit.c b/kernel/audit.c
index 37375fb..a4bfd7f 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -126,7 +126,6 @@ static DEFINE_SPINLOCK(audit_freelist_lock);
 static intaudit_freelist_count;
 static LIST_HEAD(audit_freelist);
 
-static DECLARE_WAIT_QUEUE_HEAD(kauditd_wait);
 static DECLARE_WAIT_QUEUE_HEAD(audit_backlog_wait);
 
 /* Serialize requests from userspace. */
@@ -453,7 +452,7 @@ static int kauditd_thread(void *dummy)
continue;
}
set_current_state(TASK_INTERRUPTIBLE);
-   add_wait_queue(_wait, );
+   add_wait_queue(_audit_ns.kauditd_wait, );
 
if (!skb_queue_len(>queue)) {
try_to_freeze();
@@ -461,7 +460,7 @@ static int kauditd_thread(void *dummy)
}
 
__set_current_state(TASK_RUNNING);
-   remove_wait_queue(_wait, );
+   remove_wait_queue(_audit_ns.kauditd_wait, );
}
 
ns->kauditd_task = NULL;
@@ -941,6 +940,7 @@ static int __init audit_init(void)
init_audit_ns.kauditd_task = NULL;
skb_queue_head_init(_audit_ns.queue);
skb_queue_head_init(_audit_ns.hold_queue);
+   init_waitqueue_head(_audit_ns.kauditd_wait);
audit_initialized = AUDIT_INITIALIZED;
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
@@ -1706,7 +1706,7 @@ void audit_log_end(struct audit_buffer *ab)
 
if (init_audit_ns.pid) {
skb_queue_tail(_audit_ns.queue, ab->skb);
-   wake_up_interruptible(_wait);
+   wake_up_interruptible(_audit_ns.kauditd_wait);
} else {
audit_printk_skb(ab->skb);
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   >