RE: [RFC]Pid conversion between pid namespace

2014-07-21 Thread chenhanx...@cn.fujitsu.com
Hi,

> -Original Message-
> From: Serge Hallyn [mailto:serge.hal...@ubuntu.com]
> Sent: Tuesday, July 15, 2014 12:16 PM
> To: Chen, Hanxiao/陈 晗霄
> Subject: Re: [RFC]Pid conversion between pid namespace
> > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > pros:
> > - ns procfs free, easy to use.
> > We could get rid of mounted ns procfs.
> >
> > cons:
> > - may find multiple results in nested ns.
> >   We wished the new API could tell us the exact answer.
> >   But if getnspid returns more than one result, it will cause trouble for admins:
> 
> (See below for more, but) the question being posed to getnspid has precisely
> one answer.
> 
> >   they had to make another decision.
> >   Or we marked the deepest level for translation as prerequisite.
> >
> > -based on current pidns, no reference ns.
> 
> Hm, no.  The intent here was that
> 
>   observer_pid would be in current ns
>   query_pid would be in observer_pid's ns.
> 
> So this would be ideal for "I got a pid in a logfile created by rsyslog in
> a nested container, what is the logged pid in my pidns."
> 
> Taking a set of tasks (like a container with nesting) and building a tree
> of all pids shouldn't be too difficult either.  Start with the init pid,
> call getnspid($pid, $init_pid) for every $pid in the container;  to figure
> out whether any $pid is itself a nested init_pid, we can compare the
> /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
I'm a little confused in this section:

Ex:
init_pid_ns    ns1      ns2
t1  2
t2   `- 3      1
t3   `- 4      `- 5     1
t4   `- 6      `- 8     `- 9
t5   `- 10     `- 9     `- 10

For getnspid($pid, $init_pid),
does init_pid mean the container's init pid, such as 3 for t2?

In nested containers, does this syscall work as:
getnspid(9, 4) -> (6, 8, 9)
with 9 in ns2, and 4 being t3's pid in init_pid_ns (the current ns)?

And for:
getnspid($pid, $pid)
if the pid in the host and the pid in the container happen to be the same,
e.g. getnspid(10, 10) for t5, it may not work.

Thanks,
- Chen
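
A minimal userspace sketch of the semantics Serge describes above, applied to the example table: observer_pid is looked up in the caller's namespace (init_pid_ns here), query_pid is looked up in the observer's own namespace, and the result is the matching task's pid in the caller's namespace. The struct, the table encoding and getnspid_model() are illustrative assumptions, not kernel code, and the model treats each column of the table as a single namespace.

```c
#include <stddef.h>

#define MAX_LEVEL 3

/* One task from the example table: nr[i] is its pid as seen from level i. */
struct task_model {
    const char *name;
    int level;           /* deepest pid namespace level the task lives in */
    int nr[MAX_LEVEL];   /* nr[0] = init_pid_ns, nr[1] = ns1, nr[2] = ns2 */
};

static const struct task_model tasks[] = {
    { "t1", 0, { 2 } },
    { "t2", 1, { 3, 1 } },
    { "t3", 2, { 4, 5, 1 } },
    { "t4", 2, { 6, 8, 9 } },
    { "t5", 2, { 10, 9, 10 } },
};
#define NTASKS (sizeof(tasks) / sizeof(tasks[0]))

/*
 * Hypothetical model of getnspid(query_pid, observer_pid) as called from
 * init_pid_ns. Returns 0 when either pid cannot be resolved.
 */
int getnspid_model(int query_pid, int observer_pid)
{
    const struct task_model *observer = NULL;
    size_t i;

    /* observer_pid is interpreted in the caller's (level 0) namespace. */
    for (i = 0; i < NTASKS; i++)
        if (tasks[i].nr[0] == observer_pid)
            observer = &tasks[i];
    if (!observer)
        return 0;

    /* query_pid is interpreted in the observer's own namespace. */
    for (i = 0; i < NTASKS; i++)
        if (tasks[i].level >= observer->level &&
            tasks[i].nr[observer->level] == query_pid)
            return tasks[i].nr[0];
    return 0;
}
```

Under this reading getnspid_model(9, 4) gives the single value 6 (t4) rather than a list, and getnspid_model(10, 10) gives 10, because the second argument is always a pid in the caller's namespace.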
> 
> > B) make/change proc file/directories
> > B-1) expand /proc/pid/status
> > pros:
> > - easy to use and to debug
> > - already had existed interface in kernel
> >
> > cons:
> > - based on current ns
> >   for middle level, we had to make another decision.
> > - do not have hierarchy info.
> >
> > B-2) /proc//ns/proc/ which would contain everything
> > pros:
> > - have enough info from /proc in container
> >
> > cons:
> > - Requirements unclear.
> >   We need more discussion to decide which items should not be exposed.
> > - do not have hierarchy info.
> >
> >
> > How about do these things in two steps:
> >
> > C)  1. expose all sets of pid, pgid, sid and tgid
> > via expanded /proc/PID/status
> >   We could get translated IDs from container like:
> > NStgid: 16465   5   1
> > NSpid:  16465   5   1
> > NSpgid: 16465   5   1
> > NSsid:  16423   1   0
> > (a set of IDs with 3 level of ns)
> >
> > 2. add hierarchy info under /proc
> >   We lacked of method of getting hierarchy info, which is useful.
> >   Then we could know the relationship of ns.
> >   How about adding a new proc file just under /proc
> >   to show the hierarchy like readlink did:
> >   pid:[4026531836]-> [4026532390] -> [4026532484]
> >   pid:[4026531836]-> [4026532491]
> >   (A 3 level pid and 2 level pid_
> >
> > Any comments would be appreciated.
> >
> > Thanks,
> > - Chen
> >
> > > -Original Message-
> > > Subject: [RFC]Pid conversion between pid namespace
> > >
> > > Hi,
> > >
> > > We had some discussions on how to carry out
> > > pid conversion between pid namespace via:
> > > syscall[1] and procfs[2].
> > >
> > > Pavel suggested that a syscall like
> > > (ID, NS1, NS2) into (ID).
> > >
> > > Serge suggested that a syscall
> > > pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> > >
> > >
> > > Eric and Richard suggested a procfs solution is
> > > more appropriate.
> > >
> > > Oleg suggested that we should expand /proc/pid/status
> > > to report this kind of information.
> > >
> > > And Richard suggested adding a directory like
> > > /proc//ns/proc/ which would contain everything
> > > from /proc//.
> > >
> > > As procfs provided a more user friendly interface,
> > > how about expose all sets of tgid, pid, pgid, sid
> > > by expanding /proc/PID/status in procfs?
> > > And we could also expose ns hierarchy under /proc,
> > > which could be another reference.
> > >
> > > Ex:
> > > init_pid_ns    ns1      ns2
> > > t1  2
> > > t2   `- 3      1
> > > t3   `- 4      `- 5     1
> > >
> > > We could get in /proc/t3/status:
> > > NSpid: 4 5 1
> > > We knew that pid 1 in container is pid 4 in init ns.
> > >
> > > And we could get ns hierarchy under /proc/ns_h

RE: [Resend][PATCH] ns,proc: introduce pid_in_ns

2014-05-13 Thread chenhanx...@cn.fujitsu.com


> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Saturday, April 26, 2014 3:18 AM
> To: Oleg Nesterov
> Cc: Chen, Hanxiao/陈 晗霄; contain...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; Andrew Morton; Serge Hallyn; Daniel P. Berrange;
> Al Viro; David Howells
> Subject: Re: [Resend][PATCH] ns,proc: introduce pid_in_ns
> 
> Oleg Nesterov  writes:
> 
> > On 04/25, Chen Hanxiao wrote:
> >>
> >> We lacked of convenient method of getting the pid inside containers.
> 
> Are unix domain sockets not convenient?
> 

It's a very good method, but not so direct when all we want is pid translation.

> >> If some issues occurred inside container guest, host user
> >> could not know which process is in trouble just by guest pid:
> >> the users of container guest only knew the pid inside containers.
> >> This will bring obstacle for trouble shooting.
> >>
> >> This patch introduces pid_in_ns:
> >> If one process is in init_pid_ns, /proc/PID/pid_in_ns
> >> equals to /proc/PID;
> >> if one process is in pidns, /proc/PID/pid_in_ns
> >> will tell the pid inside containers;
> >> if pidns is nested, it depends on which pidns are you in.
> >
> > Yet another /proc/pid/ file...
> >
> > Perhaps it would be better to change /proc/pid/status["Pid:"] to report the
> > list of pid_nr's, from its namespace up to the observer's namespace. The same
> > for "Tgid:".
> >
> > (Hmm. And why "Ngid:" was inserted between tid and tgid ?)
> 
> Add to that Ngid has a completely hosed implementation.  It is a pid
> stored in a pid_t, not a struct pid *.  Sigh.
> 
> I am getting more and more tempted to obliterate task->pid.  It just
> encourages bad code.
> 
> >> +int proc_pid_in_ns(struct seq_file *m, struct pid_namespace *ns,
> >> +  struct pid *pid, struct task_struct *task)
> >> +{
> >> +  pid_t pid_in_ns;
> >> +  unsigned int level;
> >> +
> >> +  level = pid->level;
> >> +  pid_in_ns = task_pid_nr_ns(task, pid->numbers[level].ns);
> >
> > This looks overcomplicated or I missed something?
> 
> I do think if we care we need to print the entire set of pids.
> I don't know if /proc/pid/status is the proper place but ...
> 

Let's print the entire set of pids in /proc/pid/status.

> Eric

Thanks for the comments.
v2 will come soon.

- Chen


RE: [PATCH] ns: introduce getnspid syscall

2014-06-18 Thread chenhanx...@cn.fujitsu.com
Hi,

> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Wednesday, June 18, 2014 9:31 AM
> To: Chen, Hanxiao/陈 晗霄
> Cc: contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> Andrew Morton; Serge Hallyn; Daniel P. Berrange; Oleg Nesterov; Al Viro; David
> Howells; Richard Weinberger; Pavel Emelyanov; Vasiliy Kulikov; Gotou, Yasunori/五島 康文;
> Linux API; Michael Kerrisk-manpages
> Subject: Re: [PATCH] ns: introduce getnspid syscall
> 
> Chen Hanxiao  writes:
> 
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> 
> There is also some ongoing work to export this information via a proc
> file which seems more appropriate for solving your problem.  Certainly
> for debugging something easily human discoverable is needed.
> 

Do you mean this patch:
/proc/pid/status: show all sets of pid according to ns
https://lkml.org/lkml/2014/5/26/145

There have been no new comments on that patch, though,
and Pavel suggested that a syscall would be a good choice.
Should we continue this kind of work?

> > int getnspid(pid_t pid, int fd1, int fd2, int pidtype);
> 
> The pidtype is nonsense.  The translation of a pid does not depend upon
> type.  Using that kind of nonsense will lead you and others into confusion.
> 

I see.

> > pid: the pid number need to be translated.
> >
> > fd: a file descriptor referring to one of
> > the namespace entries in a /proc/[pid]/ns/pid.
> > fd1 for destination ns(ns1), where the pid came from.
> > fd2 for reference ns(ns2), while fd2 = -2 means for current ns.
> >
> > pidtype: 0 PIDTYPE_PID; 1 PIDTYPE_PGID; 2 PIDTYPE_SID.
> >
> > return value:
> > >0: translated pid in ns1(fd1) seen from ns2(fd2).
> > <0: on failure.
> 
> Elsewhere we use 0 on pid translation failure.  Why be different here?
> 

It should be <=0: 0 means the pid translation failed, and <0 means some other failure.

> Eric
> 
> 
> > Signed-off-by: Chen Hanxiao 
> > +
> > +   rcu_read_lock();
> > +   task = find_task_by_pid_ns(pid, ns1);
> 
> The functions you want to be using here are:
> find_pid_ns and pid_nr_ns.
> 

Thanks for your hint.

- Chen
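
A small userspace model of the two-step translation Eric points at (look the number up in the source namespace, then ask for its number in the reference namespace), returning 0 when there is no translation. The model_* names and structs are stand-ins written for this sketch; they only mirror the shape of the kernel's find_pid_ns() and pid_nr_ns() and are not the kernel implementation.

```c
#include <stddef.h>

#define MAX_LEVEL 3

struct ns_model { int level; };            /* stand-in for struct pid_namespace */

struct upid_model {                        /* one (number, namespace) pair */
    int nr;
    const struct ns_model *ns;
};

struct pid_model {                         /* stand-in for struct pid */
    int level;                             /* deepest level this pid exists at */
    struct upid_model numbers[MAX_LEVEL];  /* numbers[i] is the view from level i */
};

/* Model of find_pid_ns(): find the pid that has number nr inside ns. */
const struct pid_model *model_find_pid_ns(const struct pid_model *pids, size_t n,
                                          int nr, const struct ns_model *ns)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pids[i].level >= ns->level &&
            pids[i].numbers[ns->level].ns == ns &&
            pids[i].numbers[ns->level].nr == nr)
            return &pids[i];
    return NULL;
}

/* Model of pid_nr_ns(): the number of pid as seen from ns, or 0 if not visible. */
int model_pid_nr_ns(const struct pid_model *pid, const struct ns_model *ns)
{
    if (pid && ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}

/* Translate nr from namespace `from` into namespace `to`; 0 means no translation. */
int model_translate(const struct pid_model *pids, size_t n, int nr,
                    const struct ns_model *from, const struct ns_model *to)
{
    return model_pid_nr_ns(model_find_pid_ns(pids, n, nr, from), to);
}
```

With three nested namespaces and a task numbered (4, 5, 1), translating 1 from the innermost namespace to the outermost gives 4, while translating a host-only pid into a child namespace gives 0. That keeps 0 for "no translation" and leaves negative values free for real errors such as a bad fd.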



RE: [PATCH] ns: introduce getnspid syscall

2014-06-20 Thread chenhanx...@cn.fujitsu.com


> -Original Message-
> From: Oleg Nesterov [mailto:o...@redhat.com]
> Sent: Thursday, June 19, 2014 1:58 AM
> To: Chen, Hanxiao/陈 晗霄
> Cc: contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> Andrew Morton; Eric W. Biederman; Serge Hallyn; Daniel P. Berrange; Al Viro; David
> Howells; Richard Weinberger; Pavel Emelyanov; Vasiliy Kulikov; Gotou, Yasunori/五島 康文
> Subject: Re: [PATCH] ns: introduce getnspid syscall
> 
> On 06/17, Chen Hanxiao wrote:
> >
> > +SYSCALL_DEFINE4(getnspid, pid_t, pid, int, fd1, int, fd2, int, pidtype)
> > +{
> > +	struct file *file1 = NULL, *file2 = NULL;
> > +	struct task_struct *task;
> > +	struct pid_namespace *ns1, *ns2;
> > +	struct proc_ns *ei;
> > +	int ret = -1;
> > +
> > +	if (pidtype >= PIDTYPE_MAX)
> > +		return -EINVAL;
> > +
> > +	file1 = proc_ns_fget(fd1);
> > +	if (IS_ERR(file1))
> > +		return PTR_ERR(file1);
> > +	ei = get_proc_ns(file_inode(file1));
> > +	ns1 = (struct pid_namespace *)ei->ns;
> 
> and I am not sure this part is correct... shouldn't we also verify that
> ns_ops == pidns_operations ?
> 
You're right. We should check this part.

Thanks,
- Chen

> Perhaps it makes sense to generalize get_net_ns_by_fd() into
> "void *get_ns_by_fd(fd, type)"... this probably needs another "check-and-get"
> method in proc_ns_operations(). I dunno.
> 
> Oleg.


RE: [RFC]Pid conversion between pid namespace

2014-07-25 Thread chenhanx...@cn.fujitsu.com
Hi,

We discussed two ways of doing pid conversion:
a syscall and procfs.

Both of them can do the pid translation job.
But for the ns hierarchy, a syscall like:

pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
or
pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)

would not work: we would know which ns a pid lives in, but
not how the namespaces relate to each other.
Either of them can return the entire set of pids.

So using procfs is the better way.

Ex:
init_pid_ns    ns1      ns2
t1  2
t2   `- 3      1
t3   `- 4      `- 5     1
t4   `- 6      `- 8     `- 9
t5   `- 10     `- 9     `- 10

1. How procfs would work:
a) add an nspid hierarchy under /proc/ like:
[root@localhost proc]# tree /proc/nspid
/proc/nspid
├── ns0
│└── ns1
│   ├── ns2
│   │   └── pid -> /proc/9/ns
│   └── pid -> /proc/4/ns
└── pid -> /proc/1/ns 

We create a dir for each ns and add a link to the first process of that ns.

b) expose all sets of pid, pgid, sid and tgid
via expanded /proc/PID/status
  We could get translated IDs from container like:
NStgid: 6   8   9 
NSpid:  6   8   9
NSpgid: 6   8   9 
NSsid:  6   1   0
(a set of IDs with 3 level of ns)

2. Advantage of procfs solution
a) easy to use:
getnspid(6, 10) -> (10, 9, 10)
or
getnspid(10, ns1_fd, ns0_fd) -> 9
getnspid(10, ns2_fd, ns0_fd) -> 10

And we could also get it by:
cat /proc/10/status | grep NSpid:
NSpid:  10  9   10
...

b) hierarchy info:
We cannot get the ns hierarchy info with just one syscall.
If we had to, it would complicate the interface.

With procfs we can check whether two processes are related:
readlink /proc/PID1/ns/pid -> aaa
readlink /proc/PID2/ns/pid -> bbb

Then we can look under /proc/nspid/nsX/nsY/nsZ
and find out their relationship.
Ex:
We know t4 lives in ns2:
readlink /proc/t4/ns/pid -> AAA
Then we look under /proc/nspid/ and find the same inum AAA under
/proc/nspid/ns0/ns1/ns2
So we know that t4 has pid 9 in ns2 and pid 8 in ns1.

Any comments would be warmly welcomed!

Thanks,
- Chen
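
A minimal sketch of how a tool might read the proposed fields, assuming the "NSpid:" line format shown above (a key followed by tab-separated numbers, outermost namespace first). parse_ns_ids() is a hypothetical helper written for this note; it works on the text of a status file that the caller has already read into memory.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find the line starting with `key` (e.g. "NSpid:") in the text of a
 * /proc/PID/status file and store its whitespace-separated numbers in out[].
 * Returns the number of values stored (at most max), or -1 if the key is absent.
 */
int parse_ns_ids(const char *status, const char *key, long *out, int max)
{
    size_t klen = strlen(key);
    const char *line = status;
    int n = 0;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0) {
            const char *p = line + klen;

            while (n < max) {
                char *end;
                long v;

                /* Skip separators by hand so strtol never crosses a newline. */
                while (*p == ' ' || *p == '\t')
                    p++;
                if (*p == '\n' || *p == '\0')
                    break;
                v = strtol(p, &end, 10);
                if (end == p)
                    break;          /* not a number: stop parsing this line */
                out[n++] = v;
                p = end;
            }
            return n;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

For t5 in the example, the values would come back as 10, 9, 10: the first is the pid in the reader's namespace and the last is the pid in the task's own namespace.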

> -Original Message-
> From: containers-boun...@lists.linux-foundation.org
> [mailto:containers-boun...@lists.linux-foundation.org] On Behalf Of
> chenhanx...@cn.fujitsu.com
> Sent: Wednesday, July 09, 2014 6:34 PM
> To: Eric W. Biederman (ebied...@xmission.com); Serge Hallyn
> (serge.hal...@ubuntu.com); Oleg Nesterov (o...@redhat.com); Richard Weinberger
> (rich...@nod.at); Pavel Emelyanov (xe...@parallels.com); Vasily Kulikov
> (seg...@openwall.com); Gotou, Yasunori/五島 康文; 'Daniel P. Berrange
> (berra...@redhat.com)'
> Cc: contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: RE: [RFC]Pid conversion between pid namespace
> 
> Hi,
> 
> Let me summarize our discussions of ID conversion by pros/cons:
> 
> A) make new system call for translation
> A-1) systemcall(ID, NS1, NS2) into (ID).
> pros:
> - has a reference ns(NS2)
>   We could get any lower level ID directly.
> 
> cons:
> - lack of hierarchy information.
>   CRIU needs hierarchy info for checkpoint/restore in nested containers.
> - not easy for debug.
>   And a lot of tools/libs need be modified.
> 
> A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> pros:
> - ns procfs free, easy to use.
> We could get rid of mounted ns procfs.
> 
> cons:
> - may find multiple results in nested ns.
>   We wished the new API could tell us the exact answer.
>   But if getnspid returns more than one result, it will cause trouble for admins:
>   they had to make another decision.
>   Or we marked the deepest level for translation as prerequisite.
> 
> -based on current pidns, no reference ns.
> 
> B) make/change proc file/directories
>   B-1) expand /proc/pid/status
>   pros:
> - easy to use and to debug
> - already had existed interface in kernel
> 
>   cons:
> - based on current ns
>   for middle level, we had to make another decision.
> - do not have hierarchy info.
> 
>   B-2) /proc//ns/proc/ which would contain everything
>   pros:
> - have enough info from /proc in container
> 
>   cons:
> - Requirements unclear.
>   We need more discussion to decide which items should not be exposed.
> - do not have hierarchy info.
> 
> 
> How about do these things in two steps:
> 
> C)  1. expose all sets of pid, pgid, sid and tgid
> via expanded /proc/PID/status
>   We could get translated IDs from container like:
> NStgid:   16465   5   1
> NSpid:16465   5   1
> N

Could not mount sysfs when enable userns but disable netns

2014-07-11 Thread chenhanx...@cn.fujitsu.com
Hello,

How to reproduce:
1. Prepare a container, enable userns and disable netns
2. use libvirt-lxc to start a container
3. libvirt could not mount sysfs then failed to start.

Then I found that
commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
"Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
over the net namespace."

But why should we check the sysfs mount permission against the net namespace?
We've already checked CAP_SYS_ADMIN.

What is the relationship between sysfs and the net namespace?
Or is this check a little redundant?

Any insights on this?

Thanks,
- Chen

PS: the code below could be a workaround

@@ -34,7 +35,8 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 		if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
 			return ERR_PTR(-EPERM);
 
-		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+		if (current->nsproxy->net_ns != &init_net &&
+		    !kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
 			return ERR_PTR(-EPERM);
 	}

RE: Could not mount sysfs when enable userns but disable netns

2014-07-14 Thread chenhanx...@cn.fujitsu.com


> -Original Message-
> From: Eric W. Biederman [mailto:ebied...@xmission.com]
> Sent: Saturday, July 12, 2014 12:29 AM
> To: Serge E. Hallyn
> Cc: Chen, Hanxiao/陈 晗霄; Serge Hallyn (serge.hal...@ubuntu.com); Greg
> Kroah-Hartman; contain...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Subject: Re: Could not mount sysfs when enable userns but disable netns
> 
> "Serge E. Hallyn"  writes:
> 
> > Quoting chenhanx...@cn.fujitsu.com (chenhanx...@cn.fujitsu.com):
> >> Hello,
> >>
> >> How to reproduce:
> >> 1. Prepare a container, enable userns and disable netns
> >> 2. use libvirt-lxc to start a container
> >> 3. libvirt could not mount sysfs then failed to start.
> >>
> >> Then I found that
> >> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
> >> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
> >> over the net namespace."
> >>
> >> But why should we check sysfs mount permission over net namespace?
> >> We've already checked CAP_SYS_ADMIN though.
> 
> We already checked capable(CAP_SYS_ADMIN) and it failed.

But on my machine, capable(CAP_SYS_ADMIN) passed,
and the mount failed in kobj_ns_current_may_mount.

I added some printks in sysfs_mount:
 	if (!(flags & MS_KERNMOUNT)) {
-		if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
+		if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) {
+			printk(KERN_WARNING "Failed in capable\n");
 			return ERR_PTR(-EPERM);
+		}
 
-		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) {
+			printk(KERN_WARNING "Failed in kobj_ns_current_may_mount\n");
 			return ERR_PTR(-EPERM);
+		}

And found: 
Jul 14 09:55:26 localhost systemd: Starting Container lxc-chx.
Jul 14 09:55:26 localhost systemd-machined: New machine lxc-chx.
Jul 14 09:55:26 localhost systemd: Started Container lxc-chx.
Jul 14 09:55:26 localhost kernel: [  784.044709] Failed in kobj_ns_current_may_mount
Jul 14 09:55:26 localhost systemd-machined: Machine lxc-chx terminated.

> 
> >> What the relationship between sysfs and net namespace,
> >> or this check is a little redundant?
> 
> You want a bind mount not a new fresh mount.
> 

Yes, we need to modify libvirt's code to handle sysfs
when userns is enabled but netns is disabled.

Thanks,
- Chen

> When looking at how evil actors could abuse things it turned out that in
> some circumstances the root user (before a user namespace is created)
> needs to control the policy on which filesystems may be mounted.  There
> are files in sysfs and in proc that you never want to see in a chroot
> jail, as they just create more surface area to attack.
> 
> The only reason for creating a new fresh mount of sysfs is to get access
> to /sys/class/net.  So to keep things simple we restrict creation of
> that mount to cases where the mounter has permissions over the network
> namespace, and cases where nothing interesting is mounted on top of
> sysfs.
> 
> If a new /sys/class/net is not needed it is possible to bind mount the
> existing copy of sysfs to the new location without loss of
> functionality.
> 
> > It is not redundant.  The whole point is that after clone(CLONE_NEWUSER)
> > you get a newly filled set of capabilities.  But you should not have
> > privileges over the host's network namespace.  After you unshare a new
> > network namespace, you *should* have privilege over it.  So the fact
> > that we've already checked CAP_SYS_ADMIN means nothing, because the
> > capabilities need to be targeted.
> 
> Exactly the tests are failing because the caller is not the global root
> and so the code is properly failing the permission checks.
> 
> Eric
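
Eric's advice above (bind mount the existing sysfs unless a new /sys/class/net is needed) could be sketched in userspace C roughly as follows. This is not libvirt's implementation: plan_sysfs_mount(), mount_sysfs_at() and the new_netns flag are hypothetical names, and the sketch assumes the host's sysfs is visible at /sys.

```c
#include <stddef.h>
#include <sys/mount.h>

struct sysfs_mount_plan {
    const char *source;
    const char *fstype;     /* NULL for a bind mount */
    unsigned long flags;
};

/*
 * Decide how to provide /sys inside a container.
 * new_netns != 0: the container has its own network namespace, so a fresh
 *                 sysfs mount is what gives it its own /sys/class/net.
 * new_netns == 0: the container shares the host's network namespace, so
 *                 bind mount the existing sysfs instead of mounting a new one.
 */
struct sysfs_mount_plan plan_sysfs_mount(int new_netns)
{
    struct sysfs_mount_plan plan;

    if (new_netns) {
        plan.source = "sysfs";
        plan.fstype = "sysfs";
        plan.flags = 0;
    } else {
        plan.source = "/sys";
        plan.fstype = NULL;
        plan.flags = MS_BIND;
    }
    return plan;
}

/* Apply the plan; returns mount(2)'s result (0 on success, -1 with errno set). */
int mount_sysfs_at(const char *target, int new_netns)
{
    struct sysfs_mount_plan plan = plan_sysfs_mount(new_netns);

    return mount(plan.source, target, plan.fstype, plan.flags, NULL);
}
```

Whether the fresh mount succeeds still depends on the kernel's permission checks discussed in this thread; the sketch only chooses which kind of mount to attempt.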


[RFC]Pid conversion between pid namespace

2014-07-03 Thread chenhanx...@cn.fujitsu.com
Hi,

We had some discussions on how to carry out
pid conversion between pid namespaces via
a syscall[1] and procfs[2].

Pavel suggested a syscall that converts
(ID, NS1, NS2) into (ID).

Serge suggested a syscall
pid_t getnspid(pid_t query_pid, pid_t observer_pid).


Eric and Richard suggested that a procfs solution is
more appropriate.

Oleg suggested that we should expand /proc/pid/status
to report this kind of information.

And Richard suggested adding a directory like
/proc//ns/proc/ which would contain everything
from /proc//.

As procfs provides a more user-friendly interface,
how about exposing all sets of tgid, pid, pgid, sid
by expanding /proc/PID/status in procfs?
And we could also expose the ns hierarchy under /proc,
which could be another reference.

Ex:
init_pid_ns    ns1      ns2
t1  2
t2   `- 3      1
t3   `- 4      `- 5     1

We could get in /proc/t3/status:
NSpid: 4 5 1
So we know that pid 1 in the container is pid 4 in the init ns.

And we could get the ns hierarchy under /proc/ns_hierarchy like:
init_ns->ns1->ns2   (as the result of readlink)
 ->ns3
So we know that t3 is in ns2, and we know its hierarchy.

What do you think of these ideas?
Any comments would be appreciated.

Thanks,
- Chen


[1] syscall
http://lwn.net/Articles/602987/

[2] procfs
http://www.spinics.net/lists/kernel/msg1751688.html



RE: [PATCH v2] ns: introduce getnspid syscall

2014-06-26 Thread chenhanx...@cn.fujitsu.com


> -Original Message-
> From: Serge Hallyn [mailto:serge.hal...@ubuntu.com]
> Sent: Wednesday, June 25, 2014 10:39 PM
> To: Chen, Hanxiao/陈 晗霄
> Cc: Serge E. Hallyn; Eric W. Biederman; Richard Weinberger;
> contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Oleg
> Nesterov; David Howells; Al Viro; linux-...@vger.kernel.org
> > > > > >
> > > > >
> > > > > > I don't think that adding a new system call for this is a good solution.
> > > > > > We need a more generic way. I bet people are interested in more than just PID
> > > > > > numbers.
> > > >
> > > > Could you please give some hints on how to expand this interface?
> > > >
> > > > >
> > > > > I agree with Eric that a procfs solution is more appropriate.
> > > > >
> > > >
> > > > Procfs is a good solution, but syscall is not bad though.
> > >
> > > I might be inclined to agree, except that in this case you are still
> > > needing mounted procfs anyway to get the proc/$pid/ns/pid fds.
> > >
> > > I'm sorry, I've not been watching this thread, so this probably has been
> > > considered and decided against, but I'm going to ask anyway.  Keeping
> > > > in mind both checkpoint-restart and introspection for use in a
> > > > setns'd command, why not make it
> > >
> > > pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > >
> > > which returns the process id of query_pid as seen from observer_pid's
> > > pidns?
> > >
> >
> > But this could be confused in nested ns.
> >
> > Ex:
> > (thanks for Pavel's figure)
> > init_pid_ns    ns1      ns2
> > t1  2
> > t2   `- 3      1
> > t3   `- 4      `- 5     1
> > t4  5
> >
> > a) getnspid(1, 1):
> > We expected it could return t2's pid(2nd 1 as pid
> 
> Clearly the passed-in pids should be interpreted as relative
> to current's pidns.  There can be no ambiguity at that point,
> unless I'm overlooking something.
> 

Defaulting to current's pidns looks reasonable.
But nested namespaces will still cause us trouble.
Since the middle levels of namespaces are less interesting to users,
how about ignoring them and just showing the pid at the deepest level?

Ex:
(Thanks for Pavel's figure again)
init_pid_ns    ns1      ns2
t1  2
t2   `- 3      1
t3   `- 4      `- 5     1
t4   `- 5      `- 7     `- 2

1. In init_pid_ns:
a) getnspid(2, 1):
returns 2 (t1)

b) getnspid(1, 3):
returns 3 (t2)

c) getnspid(1, 4):
returns 4 (t3)

getnspid(2, 4):
returns 5 (t3)

2. In ns1
a) getnspid(2, 5):
returns 7 (t4)

How do you like this idea?

Thanks,
- Chen

> > such as systemd in init_pid_ns),
> > but t3'pid is also an appropriate result.
> > We may get more than one returns.
> >
> > b) getnspid(5, 1):
> > (1st 5 was expected as  pid in ns1)
> > t3'pid and t4's pid could both be the answer.
> > We could not determine which one is what we want.
> >
> > So something unique like fds of ns should be
> > a better reference.
> >
> > Thanks,
> > - Chen
> >
> > >
> > > > Procfs works for me, but that seems could not fit
> > > > Pavel's requirement.
> > > > His opinion is that a syscall is a more generic interface
> > > > than proc files, and  also very helpful.
> > > > And syscall could tell whether a pid lives in a specific pid namespace,
> > > > much convenient than procfs.
> > > >
> > > > Thanks,
> > > > - Chen
> > >
> > > > ___
> > > > Containers mailing list
> > > > contain...@lists.linux-foundation.org
> > > > https://lists.linuxfoundation.org/mailman/listinfo/containers
> >


RE: [RFC]Pid conversion between pid namespace

2014-07-09 Thread chenhanx...@cn.fujitsu.com
Hi,

Let me summarize our discussions of ID conversion by pros/cons:

A) make a new system call for translation
A-1) systemcall(ID, NS1, NS2) into (ID).
pros:
- has a reference ns (NS2)
  We could get any lower-level ID directly.

cons:
- lacks hierarchy information.
  CRIU needs hierarchy info for checkpoint/restore in nested containers.
- not easy to debug.
  And a lot of tools/libs would need to be modified.

A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
pros:
- does not need ns procfs, easy to use.
  We could get rid of the mounted ns procfs.

cons:
- may find multiple results in nested ns.
  We want the new API to give us the exact answer.
  But if getnspid returns more than one result, it will cause trouble for admins:
  they would have to make another decision.
  Or we make translating to the deepest level a prerequisite.

- based on the current pidns, no reference ns.

B) make/change proc file/directories
B-1) expand /proc/pid/status
pros:
- easy to use and to debug
- the interface already exists in the kernel

cons:
- based on current ns
  for middle level, we had to make another decision.
- do not have hierarchy info.

B-2) /proc//ns/proc/ which would contain everything
pros:
- have enough info from /proc in container

cons:
- Requirements unclear.
  We need more discussion to decide which items should not be exposed.
- do not have hierarchy info.


How about doing these things in two steps:

C)  1. expose all sets of pid, pgid, sid and tgid
via expanded /proc/PID/status
  We could get translated IDs from the container like:
NStgid: 16465   5   1
NSpid:  16465   5   1
NSpgid: 16465   5   1
NSsid:  16423   1   0
(a set of IDs with 3 levels of ns)

2. add hierarchy info under /proc
  We lack a method of getting hierarchy info, which is useful.
  With it we could know the relationship between namespaces.
  How about adding a new proc file just under /proc
  to show the hierarchy like readlink does:
  pid:[4026531836]-> [4026532390] -> [4026532484]
  pid:[4026531836]-> [4026532491]
  (A 3 level pid and 2 level pid_

Any comments would be appreciated.

Thanks,
- Chen
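
A minimal sketch of comparing namespaces by the inode number that readlink reports, assuming link targets of the form "pid:[4026531836]" as in the example above. parse_pidns_inum() and same_pidns() are hypothetical helpers; they take the link text as a string, so obtaining it (e.g. with readlink(2) on /proc/PID/ns/pid) is left to the caller.

```c
#include <stdio.h>

/*
 * Parse the target of a /proc/PID/ns/pid symlink, e.g. "pid:[4026531836]",
 * and return the namespace inode number, or 0 if the text is malformed.
 */
unsigned long parse_pidns_inum(const char *link)
{
    unsigned long inum = 0;
    char close = 0;

    /* Require both the number and the closing bracket to be present. */
    if (sscanf(link, "pid:[%lu%c", &inum, &close) != 2 || close != ']')
        return 0;
    return inum;
}

/* Two tasks share a pid namespace if their ns/pid links name the same inode. */
int same_pidns(const char *link_a, const char *link_b)
{
    unsigned long a = parse_pidns_inum(link_a);

    return a != 0 && a == parse_pidns_inum(link_b);
}
```

This only answers "same namespace or not"; it says nothing about parent/child relationships, which is the gap the proposed hierarchy file is meant to fill.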

> -Original Message-
> Subject: [RFC]Pid conversion between pid namespace
> 
> Hi,
> 
> We had some discussions on how to carry out
> pid conversion between pid namespace via:
> syscall[1] and procfs[2].
> 
> Pavel suggested that a syscall like
> (ID, NS1, NS2) into (ID).
> 
> Serge suggested that a syscall
> pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> 
> 
> Eric and Richard suggested a procfs solution is
> more appropriate.
> 
> Oleg suggested that we should expand /proc/pid/status
> to report this kind of information.
> 
> And Richard suggested adding a directory like
> /proc//ns/proc/ which would contain everything
> from /proc//.
> 
> As procfs provided a more user friendly interface,
> how about expose all sets of tgid, pid, pgid, sid
> by expanding /proc/PID/status in procfs?
> And we could also expose ns hierarchy under /proc,
> which could be another reference.
> 
> Ex:
> init_pid_ns    ns1      ns2
> t1  2
> t2   `- 3      1
> t3   `- 4      `- 5     1
> 
> We could get in /proc/t3/status:
> NSpid: 4 5 1
> We knew that pid 1 in container is pid 4 in init ns.
> 
> And we could get ns hierarchy under /proc/ns_hierarchy like:
> init_ns->ns1->ns2 (as the result of readlink)
>  ->ns3
> We knew that t3 in ns2, and its hierarchy.
> 
> How these ideas looks like?
> Any comments would be appreciated.
> 
> Thanks,
> - Chen
> 
> 
> a) syscall
> http://lwn.net/Articles/602987/
> 
> b) procfs
> http://www.spinics.net/lists/kernel/msg1751688.html
> 

RE: [PATCH] /proc/pid/status: show all sets of pid according to ns

2014-05-27 Thread chenhanx...@cn.fujitsu.com


> -Original Message-
> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
> Sent: Monday, May 26, 2014 7:10 PM
> To: Chen, Hanxiao/陈 晗霄
> Cc: Linux Containers; LKML; Andrew Morton; Eric W. Biederman; Serge Hallyn;
> Daniel P. Berrange; Oleg Nesterov; Al Viro; David Howells
> Subject: Re: [PATCH] /proc/pid/status: show all sets of pid according to ns
> 
> On Mon, May 26, 2014 at 12:05 PM, Chen Hanxiao
>  wrote:
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside a container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> >
> > This patch expands fields of Tgid and Pid:
> > a) In init_pid_ns, nothing changed;
> >
> > b) In one pidns, they will tell the pid inside containers:
> > Tgid:   1628    9       3
> > Pid:    1628    9       3
> > ** process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> >
> > c) If pidns is nested, it depends on which pidns are you in.
> > Tgid:   9   3
> > Pid:9   3
> > ** Views from level 1 for Pid 1628 in host.
> >
> > Signed-off-by: Chen Hanxiao 
> > ---
> >  fs/proc/array.c | 20 +---
> >  1 file changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index 64db2bc..eef20dd 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -173,17 +173,23 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
> > cred = get_task_cred(p);
> > seq_printf(m,
> > "State:\t%s\n"
> > -   "Tgid:\t%d\n"
> > -   "Ngid:\t%d\n"
> > -   "Pid:\t%d\n"
> > +   "Ngid:\t%d\n",
> 
> You're changing the ordering of Tgid and Ngid here.

I just wanted to put Tgid and Pid together, to show all of their sets of pids.

> 
> > +   get_task_state(p),
> > +   task_numa_group_id(p));
> > +   seq_puts(m, "Tgid:");
> > +   for (g = ns->level; g <= pid->level; g++)
> > +   seq_printf(m, "\t%d ",
> > +   task_tgid_nr_ns(p, pid->numbers[g].ns));
> 
> I like the idea but IMHO we should keep Tgid and Pid as is and better
> add two new fields to /proc/pid/status.
> What about NSpid and NSgid?
> 

That's a good idea.
As Vasily commented,
keeping Pid unchanged would be better for backward compatibility.

> > +   seq_puts(m, "\nPid:");
> > +   for (g = ns->level; g <= pid->level; g++)
> > +   seq_printf(m, "\t%d ",
> > +   task_pid_nr_ns(p, pid->numbers[g].ns));
> > +   seq_putc(m, '\n');
> > +   seq_printf(m,
> > "PPid:\t%d\n"
> > "TracerPid:\t%d\n"
> > "Uid:\t%d\t%d\t%d\t%d\n"
> > "Gid:\t%d\t%d\t%d\t%d\n",
> > -   get_task_state(p),
> > -   task_tgid_nr_ns(p, ns),
> > -   task_numa_group_id(p),
> > -   pid_nr_ns(pid, ns),
> > ppid, tpid,
> > from_kuid_munged(user_ns, cred->uid),
> > from_kuid_munged(user_ns, cred->euid),
> > --
> > 1.9.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> 
> 
> --
> Thanks,
> //richard


RE: [PATCH] /proc/pid/status: show all sets of pid according to ns

2014-05-27 Thread chenhanx...@cn.fujitsu.com
Hi Vasily,

> -----Original Message-----
> From: Vasily Kulikov [mailto:sego...@gmail.com]
> Sent: Tuesday, May 27, 2014 2:05 AM
> To: Chen, Hanxiao/陈 晗霄
> Cc: contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Serge
> Hallyn; Oleg Nesterov; David Howells; Eric W. Biederman; Andrew Morton;
> Al Viro
> Subject: Re: [PATCH] /proc/pid/status: show all sets of pid according to ns
> 
> Hi Chen,
> 
> On Mon, May 26, 2014 at 18:05 +0800, Chen Hanxiao wrote:
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside a container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> >
> > This patch expands fields of Tgid and Pid:
> > a) In init_pid_ns, nothing changed;
> >
> > b) In one pidns, they will tell the pid inside containers:
> > > Tgid:   1628    9   3
> > > Pid:    1628    9   3
> > ** process id is 1628 in level 0, 9 in level 1, 3 in level 2.
> 
> 1. It breaks ABI.  Any application which does something like
> "grep pid: | cut -d: -f2" is now broken by the patch.
> Maybe add a new field like 'Pid-ns', 'PidNS',
> or 'Pids' and leave the old one for compatibility?
> 

Thanks for your comments.
Adding a new field would solve the backward compatibility issue.
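
To show why a separate field keeps old parsers working, here is a minimal
userspace sketch. It assumes the new field is named "NSpid:" (the name
suggested in this thread, not a settled interface) and that it lists one pid
per namespace level, outermost first. The function name parse_nspid is only
for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse the (assumed) "NSpid:" line out of a /proc/<pid>/status buffer.
 * Stores up to max pids in out[], outermost namespace first, and returns
 * how many were found, or -1 if the field is absent. The existing "Pid:"
 * line is never touched, so tools that only read "Pid:" keep working.
 */
int parse_nspid(const char *status, pid_t *out, int max)
{
	const char *line = status;
	int n = 0;

	/* Walk line by line until one starts with the field name. */
	while (line && strncmp(line, "NSpid:", 6) != 0) {
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	if (!line)
		return -1;

	line += 6;
	while (n < max) {
		char *end;
		long v;

		/* Skip separators, but never cross into the next line. */
		while (*line == ' ' || *line == '\t')
			line++;
		if (*line == '\n' || *line == '\0')
			break;
		v = strtol(line, &end, 10);
		if (end == line)
			break;
		out[n++] = (pid_t)v;
		line = end;
	}
	return n;
}
```

A caller would read /proc/<pid>/status into a buffer and pass it in; a
return value of -1 simply means the kernel does not provide the field.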

> 2. Is it OK to show internal pids to unprivileged processes?  I cannot
> see anything obviously dangerous with it, though.
> 

I think just showing them would not cause any trouble.

> > c) If pidns is nested, it depends on which pidns are you in.
> > Tgid:   9   3
> > > Pid:    9   3
> > ** Views from level 1 for Pid 1628 in host.
> 
> --
> Vasily

Thanks,
- Chen


RE: [PATCH v2] /proc/pid/status: show all sets of pid according to ns

2014-05-29 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: containers-boun...@lists.linux-foundation.org
> On 05/29/2014 09:59 AM, Vasily Kulikov wrote:
> > On Wed, May 28, 2014 at 23:27 +0400, Pavel Emelyanov wrote:
> >> On 05/28/2014 10:28 PM, Vasily Kulikov wrote:
> >>> On Wed, May 28, 2014 at 16:44 +0400, Pavel Emelyanov wrote:
> >>> It will be simplier
> >>> to parse the file -- if 'ns_ids' file contains some ID then this ID for
> >>> every ns can be obtained regardless of the specific ID name (SID, PID,
> >>> PGID, etc.).
> >>
> >> True, but given a task PID how to determine which pid namespaces it lives
> >> in to get the idea of how PIDs map to each other? Maybe we need some
> >> explicit API for converting (ID, NS1, NS2) into (ID)?
> >
> > AFAIU the idea of the patch is to add a new debugging information which
> > can be trivially obtained via 'cat /proc/...':
> 
> I agree, but this ability will be very useful by checkpoint-restore project
> too and I'd really appreciate if the API we have for that would be scalable
> enough. Per-task proc file works for me, but how about sid-s and pgid-s?
> 

Yes, a new syscall would be very useful, but it should be a separate task.
Just for pids, I think a proc file is good enough.

> > ] We need a direct method of getting the pid inside containers.
> > ] If some issues occurred inside container guest, host user
> > ] could not know which process is in trouble just by guest pid:
> > ] the users of container guest only knew the pid inside containers.
> > ] This will bring obstacle for trouble shooting.
> >
> > A new syscall might complicate trouble shooting by admin.
> 
> Pure syscall -- yes. What if we teach the ps and top utilities to show
> additional info? I think that would help.
>

Thanks,
- Chen


RE: [PATCH] ns: introduce getnspid syscall

2014-06-18 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: Pavel Emelyanov [mailto:xe...@parallels.com]
> Sent: Tuesday, June 17, 2014 8:13 PM
> To: Chen, Hanxiao/陈 晗霄
> Cc: contain...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> Andrew Morton; Eric W. Biederman; Serge Hallyn; Daniel P. Berrange; Oleg 
> Nesterov;
> Al Viro; David Howells; Richard Weinberger; Vasiliy Kulikov; Gotou, Yasunori/
> 五�u 康文
> Subject: Re: [PATCH] ns: introduce getnspid syscall
> 
> On 06/17/2014 02:21 PM, Chen Hanxiao wrote:
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> >
> > int getnspid(pid_t pid, int fd1, int fd2, int pidtype);
> >
> > pid: the pid number need to be translated.
> >
> > fd: a file descriptor referring to one of
> > the namespace entries in a /proc/[pid]/ns/pid.
> > fd1 for destination ns(ns1), where the pid came from.
> > fd2 for reference ns(ns2), while fd2 = -2 means for current ns.
> >
> > pidtype: 0 PIDTYPE_PID; 1 PIDTYPE_PGID; 2 PIDTYPE_SID.
> >
> > return value:
> > >0: translated pid in ns1(fd1) seen from ns2(fd2).
> > <0: on failure.
> >
> > +   }
> > +
> > +   switch (pidtype) {
> 
> There's no need in switch, the __task_pid_nr_ns() accepts
> the type argument.
> 

Yes. I think we still have those per-type helper functions, so I used them...

> > +   case PIDTYPE_PID:
> > +   ret = task_pid_nr_ns(task, ns2);
> 
> But this is not correct. If task doesn't live in ns2, but ns2
> just has the ns->level small enough, then the wrong pid value
> would be reported.
> 

Right, we should first check whether the task belongs to that namespace.
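
The check discussed here can be sketched with a small userspace model. The
struct and function names below are stand-ins invented for illustration;
they are not the kernel's definitions. The point is only that comparing
levels is not sufficient: the slot at that level must also belong to the
namespace being asked about.

```c
#include <stddef.h>

#define MODEL_MAX_LEVEL 4	/* arbitrary depth limit for this model */

/* Simplified stand-ins for a pid namespace and a per-level pid entry. */
struct model_ns {
	int level;			/* 0 for the initial namespace */
};

struct model_upid {
	int nr;				/* pid number at this level */
	const struct model_ns *ns;	/* namespace that owns this number */
};

struct model_pid {
	int level;			/* deepest level the task lives in */
	struct model_upid numbers[MODEL_MAX_LEVEL];
};

/*
 * Return the pid number as seen from ns, or 0 when the task is not
 * visible there. A sibling namespace at the same depth passes the level
 * test, so the owner of the slot is compared as well.
 */
int model_pid_nr_ns(const struct model_pid *pid, const struct model_ns *ns)
{
	if (!pid || !ns)
		return 0;
	if (ns->level < 0 || ns->level >= MODEL_MAX_LEVEL)
		return 0;
	if (ns->level > pid->level)
		return 0;
	if (pid->numbers[ns->level].ns != ns)
		return 0;
	return pid->numbers[ns->level].nr;
}
```

With two level-1 namespaces, a task living in one of them yields its number
for that namespace and 0 for the other, which a syscall wrapper could then
turn into an error.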

Thanks,
- Chen

> > +   break;
> > +   case PIDTYPE_PGID:
> > +   ret = task_pgrp_nr_ns(task, ns2);
> > +   break;
> > +   case PIDTYPE_SID:
> > +   ret = task_session_nr_ns(task, ns2);
> > +   break;
> > +   default:
> > +   ret = -EINVAL;
> > +   }
> > +   ret = (ret == 0) ? -ESRCH : ret;
> > +
> > +out:
> > +   fput(file1);
> > +   if (file2)
> > +   fput(file2);
> > +   return ret;
> > +}
> > +
> >  int __init nsproxy_cache_init(void)
> >  {
> > nsproxy_cachep = KMEM_CACHE(nsproxy, SLAB_PANIC);
> >
> 



RE: [PATCH] ns: introduce getnspid syscall

2014-06-18 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: mtk.linux.li...@gmail.com [mailto:mtk.linux.li...@gmail.com]
> On Behalf Of Michael Kerrisk
> Sent: Wednesday, June 18, 2014 2:27 AM
> To: Chen, Hanxiao/陈 晗霄
> Cc: containers; Linux Kernel; Richard Weinberger; Serge Hallyn; Oleg Nesterov;
> David Howells; Eric W. Biederman; Andrew Morton; Al Viro; Linux API; Michael
> Kerrisk-manpages
> Subject: Re: [PATCH] ns: introduce getnspid syscall
> 
> Hello Chen Hanxiao
> 
> On Tue, Jun 17, 2014 at 12:21 PM, Chen Hanxiao
>  wrote:
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> >
> > int getnspid(pid_t pid, int fd1, int fd2, int pidtype);
> 
> Please CC linux-...@vger.kernel.org on all patches that change the API
> that the kernel presents to user space. See
> https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> Thanks,
> 
> Michael
> 
> 
Thanks for the reminder.

- Chen

RE: [PATCH v2] ns: introduce getnspid syscall

2014-06-23 Thread chenhanx...@cn.fujitsu.com
Hi

> -----Original Message-----
> From: Richard Weinberger [mailto:rich...@nod.at]
> Sent: Friday, June 20, 2014 7:02 PM
> To: Chen, Hanxiao/陈 晗霄; contain...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Cc: Eric W. Biederman; Serge Hallyn; Daniel P. Berrange; Oleg Nesterov; Al 
> Viro;
> David Howells; Pavel Emelyanov; Vasiliy Kulikov; Gotou, Yasunori/五�u 康文;
> linux-...@vger.kernel.org
> Subject: Re: [PATCH v2] ns: introduce getnspid syscall
> 
> Am 20.06.2014 12:18, schrieb Chen Hanxiao:
> > We need a direct method of getting the pid inside containers.
> > If some issues occurred inside container guest, host user
> > could not know which process is in trouble just by guest pid:
> > the users of container guest only knew the pid inside containers.
> > This will bring obstacle for trouble shooting.
> >
> > int getnspid(pid_t pid, int fd1, int fd2);
> >
> > pid: the pid number need to be translated.
> >
> > fd: a file descriptor referring to one of
> > the namespace entries in a /proc/[pid]/ns/pid.
> > fd1 for destination ns(ns1), where the pid came from.
> > fd2 for reference ns(ns2), while fd2 = -2 means for current ns.
> >
> > return value:
> > >0 : translated pid in ns1(fd1) seen from ns2(fd2).
> > <=0: on failure.
> >
> 
> I don't think that adding a new system call for this is a good solution.
> We need a more generic way. I bet people are interested in more than just PID
> numbers.

Could you please give some hints on how to expand this interface?

> 
> I agree with Eric that a procfs solution is more appropriate.
> 

Procfs is a good solution, but a syscall is not bad either.
Procfs works for me, but it does not seem to fit
Pavel's requirement.
His opinion is that a syscall is a more generic interface
than proc files, and also very helpful.
A syscall could also tell whether a pid lives in a specific pid namespace,
which is more convenient than procfs.

Thanks,
- Chen


RE: [PATCH v2] ns: introduce getnspid syscall

2014-06-25 Thread chenhanx...@cn.fujitsu.com
Hi,

> -----Original Message-----
> From: Serge E. Hallyn [mailto:se...@hallyn.com]
> Sent: Monday, June 23, 2014 9:33 PM
> To: Chen, Hanxiao/陈 晗霄
> Cc: Richard Weinberger; contain...@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; Pavel Emelyanov; linux-...@vger.kernel.org;
> Serge Hallyn; Oleg Nesterov; David Howells; Eric W. Biederman; Al Viro
> Subject: Re: [PATCH v2] ns: introduce getnspid syscall
> 
> > > >
> > >
> > > I don't think that adding a new system call for this is a good solution.
> > > We need a more generic way. I bet people are interested in more than just
> > > PID numbers.
> >
> > Could you please give some hints on how to expand this interface?
> >
> > >
> > > I agree with Eric that a procfs solution is more appropriate.
> > >
> >
> > Procfs is a good solution, but syscall is not bad though.
> 
> I might be inclined to agree, except that in this case you are still
> needing mounted procfs anyway to get the proc/$pid/ns/pid fds.
> 
> I'm sorry, I've not been watching this thread, so this probably has been
> considered and decided against, but I'm going to ask anyway.  Keeping
> in mind both checkpoint-restart and and introspection for use in a
> setns'd commend, why not make it
> 
> pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> 
> which returns the process id of query_pid as seen from observer_pid's
> pidns?
> 

But this could be ambiguous with nested namespaces.

Ex:
(thanks to Pavel for the figure)
     init_pid_ns   ns1    ns2
t1   2
t2   `- 3          1
t3   `- 4          `- 5   1
t4   5

a) getnspid(1, 1):
We expect it to return t2's pid (the second 1 being the observer,
e.g. systemd in init_pid_ns),
but t3's pid is an equally valid result.
We may get more than one answer.

b) getnspid(5, 1):
(the first argument, 5, is meant as a pid in ns1)
Both t3's pid and t4's pid could be the answer.
We cannot determine which one is wanted.

So something unique, such as an fd referring to the ns,
would be a better reference.
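
The ambiguity can be checked against the figure with a small model. This is
only an illustration: the table is hard-coded from the example above, the
namespace indexes 0/1/2 stand in for whatever unique handle (such as an ns
fd) a real interface would use, and both function names are made up.

```c
#include <stddef.h>

#define NS_COUNT 3	/* 0 = init_pid_ns, 1 = ns1, 2 = ns2 */

/* One row per task; nr[i] is its pid in namespace i, 0 if not visible. */
struct task_row {
	const char *name;
	int nr[NS_COUNT];
};

static const struct task_row tasks[] = {
	{ "t1", { 2, 0, 0 } },
	{ "t2", { 3, 1, 0 } },
	{ "t3", { 4, 5, 1 } },
	{ "t4", { 5, 0, 0 } },
};

#define TASK_COUNT (sizeof(tasks) / sizeof(tasks[0]))

/* How many tasks carry this pid number in at least one namespace? */
int count_by_number(int nr)
{
	int hits = 0;

	if (nr <= 0)
		return 0;
	for (size_t t = 0; t < TASK_COUNT; t++) {
		for (int ns = 0; ns < NS_COUNT; ns++) {
			if (tasks[t].nr[ns] == nr) {
				hits++;
				break;	/* count each task once */
			}
		}
	}
	return hits;
}

/* Index of the task with this pid number in one given namespace, or -1. */
int find_in_ns(int nr, int ns)
{
	if (nr <= 0 || ns < 0 || ns >= NS_COUNT)
		return -1;
	for (size_t t = 0; t < TASK_COUNT; t++)
		if (tasks[t].nr[ns] == nr)
			return (int)t;
	return -1;
}
```

A bare number such as 1 or 5 matches two tasks here, while the pair
(number, namespace) matches at most one, which is the reason for wanting a
unique namespace reference in the interface.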

Thanks,
- Chen

> 
> > Procfs works for me, but that seems could not fit
> > Pavel's requirement.
> > His opinion is that a syscall is a more generic interface
> > than proc files, and  also very helpful.
> > And syscall could tell whether a pid lives in a specific pid namespace,
> > much convenient than procfs.
> >
> > Thanks,
> > - Chen
> 
> > ___
> > Containers mailing list
> > contain...@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/containers


RE: [PATCH] cgroup: fix a typo in Documentation/cgroups/cgroups.txt

2014-06-25 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: Li Zefan [mailto:lize...@huawei.com]
> Sent: Wednesday, June 25, 2014 11:44 AM
> On 2014/6/25 11:30, Chen Hanxiao wrote:
> > s/iff/if
> >
> 
> This is not a typo. iff == if and only if.
> 

I see.
Thanks for the explanation.

- Chen



RE: [RFC]Pid conversion between pid namespace

2014-08-29 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: Serge E. Hallyn [mailto:se...@hallyn.com]
> Sent: Thursday, August 28, 2014 9:50 PM
> To: Chen, Hanxiao/陈 晗霄
> Cc: Serge Hallyn; Richard Weinberger (rich...@nod.at);
[snip]

> > > I like your proc approach.  Do you have an implementation?
> >
> > Thanks for your comments.
> > I'm preparing the pidns hierarchy patch.
> > It seems that it's not easy to carry it out.
> 
> :)  Not entirely surprised.
> 
> Please do send patches earlier rather than later to avoid going
> down a path that someone's going to nack anyway.

I've almost finished the ns hierarchy patch
and am testing it now.
It will be sent next week.

Thanks,
- Chen

RE: [RFC]Pid conversion between pid namespace

2014-08-07 Thread chenhanx...@cn.fujitsu.com
Hi,

> -----Original Message-----
> From: Serge Hallyn [mailto:serge.hal...@ubuntu.com]
> Sent: Tuesday, August 05, 2014 6:21 AM
> 
> Quoting chenhanx...@cn.fujitsu.com (chenhanx...@cn.fujitsu.com):
> > Hi,
> >
> > We discussed two ways of pid conversion:
> > syscall and procfs.
> >
> > Both of them could do a pid translation job.
> > But for ns hierarchy, syscall like:
> >
> > pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
> > or
> > pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)
> >
> > could not work, we knew a pid lived in one ns, but we
> 
> Note I still disagree here. 
> 
> > did not know their relationships.
> > For getting the entire set of pids, both of them can do.
> >
> > So using procfs is a better way.
> >
> > Ex:
> >      init_pid_ns   ns1     ns2
> > t1   2
> > t2   `- 3          1
> > t3   `- 4          `- 5    1
> > t4   `- 6          `- 8    `- 9
> > t5   `- 10         `- 9    `- 10
> >
> > 1. How procfs work:
> > a) adding a nspid hierarchy  under /proc/ like:
> > [root@localhost proc]# tree /proc/nspid
> > /proc/nspid
> > ├── ns0
> > │└── ns1
> 
> Are these actually called 'ns1' etc?  Adding a namespace of pid
> namespace names is a bad thing.

That's just an example.
We are inclined to name it ns$(inum),
similar to what proc_ns_readlink does.

> 
> > │   ├── ns2
> > │   │   └── pid -> /proc/9/ns
> > │   └── pid -> /proc/4/ns
> > └── pid -> /proc/1/ns
> >
> > We created dirs and add a link to the 1st process of this ns.
> 
> How much more kernel space does this take up?
> 

Only the first process of each newly created ns is added here,
so there would not be many entries.

> Is there an easy way to go from a pid in your own namespace
> to its proper node under /proc/nspid?  I.e. if I am interested
> in pid 9987, which happens to be pid 5 inside a container in
> ns2, and then I want to know what it means when it (pid 9987)
> is talking about 'pid 10'.  Is there a link under /proc/9987/
> leading to /proc/nspid/ns2/5 ?

If you want to query pid 9987, you could:
a) readlink /proc/9987/ns/pid
b) look up /proc/nspid/ns$(inum)/ns$(inum)..
c) find the link to the first process of the new ns under ns$(inum).
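
Step a) can be sketched in userspace as follows. readlink on
/proc/<pid>/ns/pid gives a target of the form "pid:[inum]"; the ns$(inum)
directory naming under /proc/nspid is only the proposal from this thread,
and the two function names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extract the inode number from a namespace link target such as
 * "pid:[4026531836]". Returns 0 on success, -1 if the text does not match.
 */
int parse_ns_inum(const char *link, unsigned long *inum)
{
	unsigned long v;
	int used = 0;

	/* %n is only reached if the closing ']' matched as well. */
	if (sscanf(link, "pid:[%lu]%n", &v, &used) != 1 || used == 0 ||
	    link[used] != '\0')
		return -1;
	*inum = v;
	return 0;
}

/* Read the /proc/<pid>/ns/pid link and return its inode number via inum. */
int read_pidns_inum(pid_t pid, unsigned long *inum)
{
	char path[64], target[64];
	ssize_t n;

	snprintf(path, sizeof(path), "/proc/%ld/ns/pid", (long)pid);
	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink does not terminate the buffer */
	return parse_ns_inum(target, inum);
}
```

The number obtained this way is what would select the ns$(inum) directory
in step b), assuming the hierarchy is exported as proposed.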

Or, as you suggested above,
we could change /proc/PID/ns/pid:
a) when a new ns is created, we put it under /proc/nspid
b) create a link from /proc/PID/ns/pid to /proc/nspid/ns$(inum)/pid

Then we could get a clearer view:
1. pidns view
/proc/nspid
├── ns_4026531836   (ns0)
│  ├─ ns1
│  │   ├─── ns2
│  │   └── pid -> pid:[4026531836]
│  └── pid -> pid:[4026531816]
└── pid -> pid:[4026531806]

Then /proc/9987/ns/pid would be a link to ns2:
2. PID1 lives in ns0, PID2 lives in ns2
/proc/PID1/ns/pid->/proc/nspid/ns_4026531806

/proc/PID2/ns/pid->/proc/nspid/ns_4026531836

> 
> > b) expose all sets of pid, pgid, sid and tgid
> > via expanded /proc/PID/status
> >   We could get translated IDs from container like:
> > NStgid: 6   8   9
> > NSpid:  6   8   9
> > NSpgid: 6   8   9
> > NSsid:  6   1   0
> > (a set of IDs with 3 level of ns)
> 
> This sure does seem the simplest route.  But it actually still
> does not provide us an easy answer to "what does pid 9987 mean
> when it talks about pid 10?".

Do you mean:
init_pid_ns   ns1   ns2
9987          10    5
Neither the getnspid syscall nor the /proc/PID/status expansion
can answer this without hierarchy information.
For users in init_pid_ns, getnspid needs
an observer pid that lives in ns1 and only in ns1,
or we have to call getnspid from inside ns1.
See below for more.

> 
> > 2. Advantage of procfs solution
> > a) easy to use:
> > getnspid(6, 10) -> (10, 9, 10)
> > or
> > getnspid(10, ns1_fd, ns0_fd) -> 9
> > getnspid(10, ns2_fd, ns0_fd) -> 10
> >
> > And we could also get it by:
> > cat /proc/10/status | grep NSpid:
> > NSpid:  10  9   10
> > ...
> 
> It looks nice, but I'm not convinced it gives us the info we
> need.
> 
> It's certainly possible that I've just not thought it through
> enough.
> 
> Question: are you proposing this (/proc/pid/status expansion) as an
> alternative to /proc/nspid, or are they meant to be complementary?
> 

We want /proc/nspid as a complement for pid translation.
Ex:
     init_pid_ns   ns1     ns2
t1   2
t2   `- 3          1
t3   `- 4          `- 5    1
t4   `- 6          `- 8    `- 9
t5   `- 10         `- 9    `- 10
Suppose we were in in

RE: [RFC]Pid conversion between pid namespace

2014-08-08 Thread chenhanx...@cn.fujitsu.com


> -----Original Message-----
> From: Serge Hallyn [mailto:serge.hal...@ubuntu.com]
> Sent: Friday, August 08, 2014 12:12 AM
> To: Chen, Hanxiao/陈 晗霄

> > > How much more kernel space does this take up?
> > >
> >
> > Only first process when creating new ns will be add here.
> > So there would not so many items.
> 
> Oh, I see.
> 
> > > Is there an easy way to go from a pid in your own namespace
> > > to its proper node under /proc/nspid?  I.e. if I am interested
> > > in pid 9987, which happens to be pid 5 inside a container in
> > > ns2, and then I want to know what it means when it (pid 9987)
> > > is talking about 'pid 10'.  Is there a link under /proc/9987/
> > > leading to /proc/nspid/ns2/5 ?
> >
> > If you want to query pid 9987, you could:
> > a) readlink /proc/9987/ns/pid
> > b) refer to /proc/nspid/ns$(inum)/ns$(inum)..
> > c) Also the link to the 1st new ns process could be found under ns$(inum).
> 
> This is good.  Let's go with it.

OK

> 
> > Or as what you said above,
> 
> Nah.  Let's not change /proc/PID/ns/pid.
> 
> > > This sure does seem the simplest route.  But it actually still
> > > does not provide us an easy answer to "what does pid 9987 mean
> > > when it talks about pid 10?".
> >
> > Do you mean:
> > init_pid_ns   ns1   ns2
> > 9987          10    5
> > Neither getnspid syscall nor proc/PID/status expansion
> > could answer this without hierarchy information.
> > For users in init_pid_ns, getnspid needs
> > an observer pid live and only live in ns1,
> 
> Yes, good point.  That's a definite disadvantage of getnspid
> compared to your proc approach.
> 
> > or we should call getnspid in ns1.
> > See below for more.
> >
> > >
> > > > 2. Advantage of procfs solution
> > > > a) easy to use:
> > > > getnspid(6, 10) -> (10, 9, 10)
> > > > or
> > > > getnspid(10, ns1_fd, ns0_fd) -> 9
> > > > getnspid(10, ns2_fd, ns0_fd) -> 10
> > > >
> > > > And we could also get it by:
> > > > cat /proc/10/status | grep NSpid:
> > > > NSpid:  10  9   10
> > > > ...
> > >
> > > It looks nice, but I'm not convinced it gives us the info we
> > > need.
> > >
> > > It's certainly possible that I've just not thought it through
> > > enough.
> > >
> > > Question: are you proposing this (/proc/pid/status expansion) as an
> > > alternative to /proc/nspid, or are they meant to be complementary?
> > >
> >
> > We want /proc/nspid as a complement for pid translation.
> 
> Ok.
> 
> > Ex:
> >      init_pid_ns   ns1     ns2
> > t1   2
> > t2   `- 3          1
> > t3   `- 4          `- 5    1
> > t4   `- 6          `- 8    `- 9
> > t5   `- 10         `- 9    `- 10
> > Suppose we were in init_pid_ns:
> > getnspid(9,4)->6 (t4)
> > getnspid(9,3)->10(t5)
> > We knew t2 in ns1 and t3 in ns2, but we don't know their relationship.
> > If we want to query pid 9 in ns1, we could use getnspid(9,3)->10(t5)
> > but the pre-requisite is that we know ns2 is the child of ns1.
> 
> I like your proc approach.  Do you have an implementation?

Thanks for your comments.
I'm preparing the pidns hierarchy patch.
It turns out not to be easy to implement.

Thanks,
- Chen