Re: [PATCH V4 0/8] namespaces: log namespaces per task

2014-08-21 Thread Richard Guy Briggs
On 14/08/21, Aristeu Rozanski wrote:
> Hi Richard,

Hi Aris,

> On Wed, Aug 20, 2014 at 09:09:33PM -0400, Richard Guy Briggs wrote:
> > Is there a way to link serial numbers of namespaces involved in migration of a
> > container to another kernel?  It sounds like what is needed is a part of a
> > management application that is able to pull the audit records from constituent
> > hosts to build an audit trail of a container.
> 
> since you're introducing a brand new serial number to make it unique
> across different procfs mounts, why not instead of a simple counter,
> use the hash output of say, $hostname-$creation_time-$random?

I had thought of this earlier on, but I could see many VMs started up
from an identical image, making the resulting hash possibly identical.

Besides, hostname isn't known yet when we are creating initial
namespaces.
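
To illustrate the cloned-image concern, here is a minimal sketch, assuming
the ID is a SHA-256 digest of the "$hostname-$creation_time-$random" string.
The names ns_hash_id and boot_vm and the sample values are hypothetical and
not part of the patchset.  If every clone restores the same random state and
hostname from the image, the hash inputs (and so the IDs) come out identical:

```python
import hashlib
import random

def ns_hash_id(hostname: str, creation_time: int, rand: int) -> str:
    """Hypothetical ID: short hex digest of "$hostname-$creation_time-$random"."""
    material = f"{hostname}-{creation_time}-{rand}".encode()
    return hashlib.sha256(material).hexdigest()[:16]

def boot_vm(image_seed: int) -> str:
    # Model a VM whose random state comes from the image, so the first
    # "random" value drawn after boot is the same on every clone.
    rng = random.Random(image_seed)
    # Assumption: hostname is still the image default and the creation
    # time is boot-relative, so both are equal across clones too.
    return ns_hash_id("localhost", 0, rng.getrandbits(32))

vm_a = boot_vm(image_seed=1234)
vm_b = boot_vm(image_seed=1234)
print(vm_a == vm_b)  # both clones derive the same ID
```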

> Or perhaps
> get a short hash of the hostname (generated once whenever hostname is
> set) and append the serial number you're implementing? It'd be way less human
> readable than your current proposal but it'd be unique "globally" (as long as you
> don't have machines with the same hostname migrating containers between them),
> allowing the migrated namespaces to retain their unique identification across
> audit logs. It'd of course be way more costly than just using an atomic counter,
> but could be useful to anything that needs to refer to a namespace and could be
> migrated to another machine.

This also means that any namespace that is migrated would have to be
recreated on another host with its existing ID injected into it, rather
than having the host that creates it generate that ID.  Some namespaces
are peers that take the kernel default, while others are hierarchical and
inherit from their creating namespaces.

It was much easier at my layer to punt that management to a higher
layer that already knew about the other hosts in play, and to let it
manage that information as it saw fit.
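
As a rough sketch of what such a higher layer might keep, assuming it is
told which per-host serial numbers belong to which container: the class
ContainerTrail, the host and container names, and the abbreviated record
strings are all hypothetical placeholders.

```python
from collections import defaultdict

class ContainerTrail:
    """Hypothetical index: (host, ns type, serial) -> container."""

    def __init__(self):
        self._owner = {}
        self._trail = defaultdict(list)

    def register(self, container, host, ns_type, serial):
        # Called by the management layer whenever a container gets a
        # namespace on some host, including after a migration.
        self._owner[(host, ns_type, serial)] = container

    def add_record(self, host, ns_type, serial, record):
        # Attach an audit record to the owning container, if known.
        container = self._owner.get((host, ns_type, serial))
        if container is not None:
            self._trail[container].append((host, record))
        return container

    def trail(self, container):
        return list(self._trail[container])

trail = ContainerTrail()
trail.register("web1", "hostA", "net", 0x8)
trail.register("web1", "hostB", "net", 0x3)  # same container after migration
trail.add_record("hostA", "net", 0x8, "record from hostA")
trail.add_record("hostB", "net", 0x3, "record from hostB")
print(trail.trail("web1"))
```

The serial numbers stay per-kernel and per-boot; only this index knows that
two different serials on two hosts refer to the same container.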

> What do you think? Sounds too crazy? :)

Yup.  I was hoping there would be some kind of unique identifier per
running kernel, built from something like CPU_ID (which may not exist or
may be shut off), the RTC boot value (which may be identical for VMs), or
the initial random state (which could be identical for VMs).

> Aristeu

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V4 0/8] namespaces: log namespaces per task

2014-08-21 Thread Aristeu Rozanski
Hi Richard,
On Wed, Aug 20, 2014 at 09:09:33PM -0400, Richard Guy Briggs wrote:
> Is there a way to link serial numbers of namespaces involved in migration of a
> container to another kernel?  It sounds like what is needed is a part of a
> management application that is able to pull the audit records from constituent
> hosts to build an audit trail of a container.

since you're introducing a brand new serial number to make it unique
across different procfs mounts, why not instead of a simple counter,
use the hash output of say, $hostname-$creation_time-$random? Or perhaps
get a short hash of the hostname (generated once whenever hostname is
set) and append the serial number you're implementing? It'd be way less human
readable than your current proposal but it'd be unique "globally" (as long as you
don't have machines with the same hostname migrating containers between them),
allowing the migrated namespaces to retain their unique identification across
audit logs. It'd of course be way more costly than just using an atomic counter,
but could be useful to anything that needs to refer to a namespace and could be
migrated to another machine.
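
A minimal sketch of the second variant (short hostname hash plus the serial
number), assuming SHA-256 truncated to 8 hex digits and a "prefix-serial"
layout.  The class NamespaceIds and that layout are hypothetical, chosen
only for illustration:

```python
import hashlib
import itertools

class NamespaceIds:
    """Hypothetical generator of "<hostname hash>-<serial>" IDs."""

    def __init__(self, hostname):
        self._serial = itertools.count(1)
        self.set_hostname(hostname)

    def set_hostname(self, hostname):
        # Short hash computed once, whenever the hostname is set.
        self._prefix = hashlib.sha256(hostname.encode()).hexdigest()[:8]

    def next_id(self):
        # Serial part printed in hex.
        return f"{self._prefix}-{next(self._serial):x}"

host_a = NamespaceIds("hostA")
host_b = NamespaceIds("hostB")
print(host_a.next_id(), host_b.next_id())
```

Two machines that share a hostname would still hand out the same IDs, which
is the caveat noted above.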

What do you think? Sounds too crazy? :)

-- 
Aristeu



[PATCH V4 0/8] namespaces: log namespaces per task

2014-08-20 Thread Richard Guy Briggs
The purpose is to track namespace instances in use by logged processes from the
perspective of init_*_ns by assigning each a per-kernel, per-boot serial
number.

1/8 defines a function to generate them and assigns them.

Use a serial number per namespace (unique across one boot of one kernel)
instead of the inode number (whose right to change is claimed to have been
reserved, and which is not necessarily unique if there is more than one proc
fs).  It could be argued that the inode numbers have now become a de facto
interface and can't change now, but I'm proposing this approach to see if it
helps address some of the objections to the earlier patchset.

2/8 adds access functions to get to the serial numbers in a similar way to
inode access for namespace proc operations.

3/8 makes these serial numbers available in
/proc/self/ns/{ipc,mnt,net,pid,user,uts}_snum, as suggested by Serge Hallyn.
I chose "snum" instead of "seq" for consistency with inum and because there
are already a number of other uses of "seq" in the namespace code.

4/8 documents proc's ns entries structure in Documentation/filesystems/proc.txt.

5/8 exposes proc's ns entries structure which lists a number of useful
operations per namespace type for other subsystems to use.

6/8 provides an example of usage for audit_log_task_info() which is used by
syscall audits, among others.  audit_log_task() and audit_common_recv_message()
would be other potential use cases.

Proposed output format:
This differs slightly from Aristeu's patch because of the label conflict with
"pid=" that arose from including it in existing records rather than in a
separate record.  It has now returned to being a separate record.  The serial
numbers are printed in hex.
type=NS_INFO msg=audit(1408577535.306:82):  netns=8 utsns=2 ipcns=1 pidns=4 userns=3 mntns=5
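
As a sketch of how a log consumer might read such a record, assuming the
field layout shown above (the helper name parse_ns_info is hypothetical),
the serial numbers are converted from hex:

```python
import re

RECORD = ("type=NS_INFO msg=audit(1408577535.306:82):  netns=8 utsns=2 "
          "ipcns=1 pidns=4 userns=3 mntns=5")

def parse_ns_info(line):
    """Return (timestamp, event serial, {ns field: serial number})."""
    header = re.search(r"msg=audit\((\d+\.\d+):(\d+)\):", line)
    if header is None:
        raise ValueError("not an audit record")
    # Fields such as netns=8; values are hex per the cover letter.
    fields = re.findall(r"\b(\w+ns)=([0-9a-fA-F]+)", line)
    serials = {name: int(value, 16) for name, value in fields}
    return float(header.group(1)), int(header.group(2)), serials

timestamp, event, serials = parse_ns_info(RECORD)
print(event, serials)
```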

7/8 tracks the creation and deletion of namespaces, listing the type of
namespace instance, the related namespace id if there is one, and the newly
minted serial number.

Proposed output format for initial namespace creation:
type=AUDIT_NS_INIT_UTS msg=audit(1408577534.868:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_utsns=0 utsns=2 res=1
type=AUDIT_NS_INIT_USER msg=audit(1408577534.868:6): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_userns=0 userns=3 res=1
type=AUDIT_NS_INIT_PID msg=audit(1408577534.868:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_pidns=0 pidns=4 res=1
type=AUDIT_NS_INIT_MNT msg=audit(1408577534.868:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_mntns=0 mntns=5 res=1
type=AUDIT_NS_INIT_IPC msg=audit(1408577534.868:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_ipcns=0 ipcns=1 res=1
type=AUDIT_NS_INIT_NET msg=audit(1408577533.500:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel old_netns=0 netns=7 res=1

And a CLONE action would result in:
type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 old_netns=7 netns=8 res=1
type=AUDIT_NS_INIT_MNT msg=audit(1408577535.307:83): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 old_mntns=5 mntns=9 res=1

While deleting a namespace would result in:
type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 mntns=9 res=1
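
To show how the creation and deletion records could be replayed to find
which namespaces are still alive and where they came from, here is a sketch
assuming the proposed formats above.  The function name replay is
hypothetical, and the records are the CLONE and delete examples from this
letter:

```python
import re

RECORDS = [
    "type=AUDIT_NS_INIT_NET msg=audit(1408577535.306:81): pid=481 uid=0 "
    "auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 "
    "old_netns=7 netns=8 res=1",
    "type=AUDIT_NS_INIT_MNT msg=audit(1408577535.307:83): pid=481 uid=0 "
    "auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 "
    "old_mntns=5 mntns=9 res=1",
    "type=AUDIT_NS_DEL_MNT msg=audit(1408577552.221:85): pid=481 uid=0 "
    "auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 "
    "mntns=9 res=1",
]

def replay(records):
    """Return {(ns field, serial): parent serial} for live namespaces."""
    live = {}
    for line in records:
        kind = re.search(r"AUDIT_NS_(INIT|DEL)_(\w+)", line)
        if kind is None:
            continue  # not a namespace lifecycle record
        action, ns = kind.group(1), kind.group(2).lower() + "ns"
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        serial = int(fields[ns], 16)  # serial numbers are printed in hex
        if action == "INIT":
            live[(ns, serial)] = int(fields["old_" + ns], 16)
        else:
            live.pop((ns, serial), None)
    return live

print(replay(RECORDS))
```

After the three records, only the net namespace 8 (created from 7) remains;
mount namespace 9 was created and then deleted.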

8/8 changes audit startup from __initcall to subsys_initcall so that it
starts early enough to receive the initial namespace log messages.


v3 -> v4:
Separate out the NS_INFO message from the SYSCALL message.
Moved audit_log_namespace_info() out of audit_log_task_info().
Use a separate message type per namespace type for each of INIT/DEL.
Make ns= easier to search across NS_INFO and NS_INIT/DEL_XXX msg types.
Add /proc//ns/ documentation.
Fix dynamic initial ns logging.

v2 -> v3:
Use atomic64_t in ns_serial to simplify it.
Avoid function duplication in proc, keying on dentry.
Squash down audit patch to avoid rcu sleep issues.
Add tracking for creation and deletion of namespace instances.

v1 -> v2:
Avoid rollover by switching from an int to a long long.
Change rollover behaviour from simply avoiding zero to raising a BUG.
Expose serial numbers in /proc//ns/*_snum.
Expose ns_entries and use it in audit.
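
The counter behaviour described in the changelogs above (a 64-bit serial,
zero never issued, rollover treated as a bug) can be modelled in userspace
roughly as follows.  This is a sketch and not the kernel code: the class
NsSerial is hypothetical, a lock stands in for atomic64_t, the signed
64-bit limit is an assumption, and an exception stands in for BUG.

```python
import threading

S64_MAX = 2**63 - 1  # assumed limit of a signed 64-bit counter

class NsSerial:
    """Userspace model of a per-boot namespace serial counter."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()  # stands in for atomic64_t

    def next(self):
        with self._lock:
            if self._value >= S64_MAX:
                # Model "raising a BUG" instead of wrapping around.
                raise OverflowError("namespace serial number rollover")
            self._value += 1
            return self._value

serial = NsSerial()
print(serial.next(), serial.next())
```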


Notes:
As for CAP_AUDIT_READ, a patchset has been accepted upstream to check
capabilities of userspace processes that try to join netlink broadcast groups.

This set does not try to solve the non-init namespace audit messages and
auditd problem yet.  That will come later, likely with additional auditd
instances running in another namespace with a limited ability to influence the
master auditd.  I echo Eric B's idea that messages destined for different
namespaces would ha