[Devel] linux-next: lockdep whinge in cgroup_rmdir

2011-01-13 Thread Valdis . Kletnieks
Seen booting yesterday's linux-next, was not present in 2.6.37-rc7-mmotm1202.

Not sure if it's an SELinux or cgroup issue, so I'm throwing it at every
address I can find for either.  This is easily replicable and happens at
every boot, so I can test patches if needed.  Am willing to bisect it down if
nobody knows right off the bat what the problem is.

The 'W' taint is from the already-reported kernel/workqueue.c worker_enter_idle 
issue.

[   85.100795] systemd[1]: readahead-replay.service: main process exited, 
code=exited, status=1
[   85.101530] 
[   85.101531] =============================================
[   85.101796] [ INFO: possible recursive locking detected ]
[   85.102002] 2.6.37-next-20110111 #1
[   85.102009] ---------------------------------------------
[   85.102009] systemd/1 is trying to acquire lock:
[   85.102009]  (&(&dentry->d_lock)->rlock){+.+...}, at: [8107ca5c] 
cgroup_rmdir+0x339/0x479
[   85.102009] 
[   85.102009] but task is already holding lock:
[   85.102009]  (&(&dentry->d_lock)->rlock){+.+...}, at: [8107ca54] 
cgroup_rmdir+0x331/0x479
[   85.102009] 
[   85.102009] other info that might help us debug this:
[   85.102009] 4 locks held by systemd/1:
[   85.102009]  #0:  (&sb->s_type->i_mutex_key#14/1){+.+.+.}, at: 
[810fea4d] do_rmdir+0x7d/0x121
[   85.102009]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: 
[810fd4bc] vfs_rmdir+0x4a/0xbe
[   85.102009]  #2:  (cgroup_mutex){+.+.+.}, at: [8107cb84] 
cgroup_rmdir+0x461/0x479
[   85.102009]  #3:  (&(&dentry->d_lock)->rlock){+.+...}, at: 
[8107ca54] cgroup_rmdir+0x331/0x479
[   85.102009] 
[   85.102009] stack backtrace:
[   85.102009] Pid: 1, comm: systemd Tainted: GW   2.6.37-next-20110111 
#1
[   85.102009] Call Trace:
[   85.102009]  [81069f22] ? __lock_acquire+0x929/0xd4e
[   85.102009]  [8107c6f1] ? cgroup_clear_directory+0xff/0x131
[   85.102009]  [8107c6f1] ? cgroup_clear_directory+0xff/0x131
[   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
[   85.102009]  [8106a859] ? lock_acquire+0x100/0x126
[   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
[   85.102009]  [815521ef] ? sub_preempt_count+0x35/0x48
[   85.102009]  [8154e401] ? _raw_spin_lock+0x36/0x45
[   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
[   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
[   85.102009]  [810579cd] ? autoremove_wake_function+0x0/0x34
[   85.102009]  [811e1839] ? selinux_inode_rmdir+0x15/0x17
[   85.102009]  [810fd4eb] ? vfs_rmdir+0x79/0xbe
[   85.102009]  [810feaa0] ? do_rmdir+0xd0/0x121
[   85.102009]  [8100256c] ? sysret_check+0x27/0x62
[   85.102009]  [8106ac79] ? trace_hardirqs_on_caller+0x117/0x13b
[   85.102009]  [8154e201] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   85.102009]  [8110040b] ? sys_rmdir+0x11/0x13
[   85.102009]  [8100253b] ? system_call_fastpath+0x16/0x1b
[   85.268272] systemd[1]: readahead-collect.service: main process exited, 
code=exited, status=1

Any ideas?





[Devel] [RFD] reboot / shutdown of a container

2011-01-13 Thread Daniel Lezcano

Hi all,

in the container implementation, we are facing the problem of a process
calling the sys_reboot syscall, which of course makes the host power off
or reboot.

If we drop the cap_sys_reboot capability, sys_reboot fails and the
container reaches a shutdown state, but the init process stays there; hence
the container becomes stuck, waiting indefinitely for process '1' to exit.

The current way we make shutdown / reboot of the container work is to
watch, from a process outside of the container, the <rootfs>/var/run/utmp
file and check the runlevel each time the file changes. When the 'reboot'
or 'shutdown' runlevel is detected, we wait for a single remaining process
in the container and then we kill it.
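For illustration only, here is a rough sketch of such a watcher (my own
example, not the actual container tooling; it assumes one utmp file per
container at <rootfs>/var/run/utmp and uses the glibc utmp API plus inotify):

/*
 * Illustrative watcher: block until the container's utmp changes,
 * then read the RUN_LVL record and react to runlevel 0 or 6.
 */
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>
#include <utmp.h>

static int current_runlevel(const char *utmp_path)
{
	struct utmp *ut;
	int level = -1;

	utmpname(utmp_path);
	setutent();
	while ((ut = getutent()) != NULL)
		if (ut->ut_type == RUN_LVL)
			/* by convention the low byte of ut_pid holds the runlevel char */
			level = ut->ut_pid & 0xff;
	endutent();
	return level;
}

int main(int argc, char **argv)
{
	char buf[4096];
	int fd, level;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <rootfs>/var/run/utmp\n", argv[0]);
		return 1;
	}

	fd = inotify_init();
	inotify_add_watch(fd, argv[1], IN_MODIFY);

	for (;;) {
		read(fd, buf, sizeof(buf));	/* block until the file changes */
		level = current_runlevel(argv[1]);
		if (level == '0' || level == '6') {
			printf("runlevel %c: shut the container down\n", level);
			break;	/* here the real tool would reap/kill the container */
		}
	}
	return 0;
}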

That works, but it is not efficient with a large number of
containers, as we have to watch a lot of utmp files. In addition,
the /var/run directory must *not* be mounted as tmpfs in the distro.
Unfortunately, that is the default setup on most distros and is becoming
the norm. It implies that the rootfs init scripts must be modified
for the container when we put its rootfs in place, and as /var/run is
supposed to be a tmpfs, most applications do not clean up the
directory, so we need to add extra services to wipe out the files.

More problems arise when we upgrade the distro inside the
container, because all the setup we did at creation time is lost:
the upgrade overwrites the scripts, the fstab and so on.

We did what we could to solve the problem from userspace, but we
always hit a limit, because there are different implementations of the
'init' process, and the init scripts differ from one distro to another
and from one version to another.

We think this problem can only be solved from the kernel.

The idea is to send a SIGPWR signal to the parent of pid '1' of the
pid namespace when sys_reboot is called. Of course that won't occur
for the init pid namespace.
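As a rough sketch of how the monitor side could look (my illustration only:
the SIGPWR delivery on sys_reboot is exactly the behaviour proposed above
and does not exist in current kernels; the rest is plain clone(2)/waitpid(2)):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpwr;

static void on_sigpwr(int sig)
{
	(void)sig;
	got_sigpwr = 1;
}

static int container_init(void *arg)
{
	(void)arg;
	/* a real tool would set up the rootfs and exec the container's init */
	execl("/sbin/init", "init", (char *)NULL);
	return 1;
}

int main(void)
{
	static char stack[1024 * 1024];
	struct sigaction sa;
	pid_t child;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigpwr;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGPWR, &sa, NULL);

	/* needs CAP_SYS_ADMIN: the child becomes pid '1' of a new pid namespace */
	child = clone(container_init, stack + sizeof(stack),
		      CLONE_NEWPID | SIGCHLD, NULL);
	if (child < 0) {
		perror("clone");
		return 1;
	}

	/*
	 * Proposed behaviour: sys_reboot() inside the container makes the
	 * kernel send us SIGPWR, which interrupts waitpid() here and lets
	 * us decide whether to restart the container or tear it down.
	 */
	if (waitpid(child, NULL, 0) < 0 && got_sigpwr) {
		kill(child, SIGKILL);
		waitpid(child, NULL, 0);
	}
	return 0;
}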

Does it make sense?

Any idea is very welcome :)

   -- Daniel






[Devel] Re: [PATCH] Teach cifs about network namespaces (take 3)

2011-01-13 Thread Jeff Layton
On Thu, 13 Jan 2011 12:55:04 -0600
Rob Landley rland...@parallels.com wrote:

 From: Rob Landley rland...@parallels.com
 
 Teach cifs about network namespaces, so mounting uses addresses/routing
 visible from the container rather than from init context.
 
 Signed-off-by: Rob Landley rland...@parallels.com
 ---
 
 Now using net_eq(), with the initialization moved up so the error path doesn't
 dereference a null on the put.
 
  fs/cifs/cifsglob.h |   33 +
  fs/cifs/connect.c  |   12 ++--
  2 files changed, 43 insertions(+), 2 deletions(-)
 

Looks good to me:

Reviewed-by: Jeff Layton jlay...@redhat.com
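For readers following along, a rough sketch of the general pattern such a
change uses (my illustration with made-up names, not the patch under review):
capture the mount caller's network namespace and create the transport socket
in it rather than in init_net.

#include <linux/in.h>
#include <linux/net.h>
#include <linux/nsproxy.h>
#include <linux/sched.h>
#include <net/net_namespace.h>

struct my_server {
	struct net *net;	/* namespace captured at mount time */
	struct socket *sock;
};

static int my_server_connect(struct my_server *srv)
{
	/* take a reference on the mounting task's network namespace */
	srv->net = get_net(current->nsproxy->net_ns);

	/* create the socket in that namespace instead of &init_net */
	return __sock_create(srv->net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
			     &srv->sock, 1);
}

static void my_server_destroy(struct my_server *srv)
{
	if (srv->sock)
		sock_release(srv->sock);
	put_net(srv->net);	/* drop the namespace reference */
}

When deciding whether an existing connection can be reused, the stored
namespace is compared against the caller's with net_eq(), which is what the
note above about net_eq() refers to.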


[Devel] Re: [RFD] reboot / shutdown of a container

2011-01-13 Thread Daniel Lezcano
On 01/13/2011 10:50 PM, Bruno Prémont wrote:
 On Thu, 13 January 2011 Daniel Lezcano <daniel.lezc...@free.fr> wrote:

 On 01/13/2011 09:09 PM, Bruno Prémont wrote:
 On Thu, 13 January 2011 Daniel Lezcano <daniel.lezc...@free.fr> wrote:
 in the container implementation, we are facing the problem of a process
 calling the sys_reboot syscall which of course makes the host to
 poweroff/reboot.

 If we drop the cap_sys_reboot capability, sys_reboot fails and the
 container reach a shutdown state but the init process stay there, hence
 the container becomes stuck waiting indefinitely the process '1' to exit.

 The current implementation to make the shutdown / reboot of the
 container to work is we watch, from a process outside of the container,
 the <rootfs>/var/run/utmp file and check the runlevel each time the file
 changes. When the 'reboot' or 'shutdown' level is detected, we wait for
 a single remaining in the container and then we kill it.

 That works but this is not efficient in case of a large number of
 containers as we will have to watch a lot of utmp files. In addition,
 the /var/run directory must *not* mounted as tmpfs in the distro.
 Unfortunately, it is the default setup on most of the distros and tends
 to generalize. That implies, the rootfs init's scripts must be modified
 for the container when we put in place its rootfs and as /var/run is
 supposed to be a tmpfs, most of the applications do not cleanup the
 directory, so we need to add extra services to wipeout the files.

 More problems arise when we do an upgrade of the distro inside the
 container, because all the setup we made at creation time will be lost.
 The upgrade overwrite the scripts, the fstab and so on.

 We did what was possible to solve the problem from userspace but we
 reach always a limit because there are different implementations of the
 'init' process and the init's scripts differ from a distro to another
 and the same with the versions.

 We think this problem can only be solved from the kernel.

 The idea was to send a signal SIGPWR to the parent of the pid '1' of the
 pid namespace when the sys_reboot is called. Of course that won't occur
 for the init pid namespace.
 Wouldn't sending SIGKILL to the pid '1' process of the originating PID
 namespace be sufficient (that would trigger a SIGCHLD for the parent
 process in the outer PID namespace)?
 This is already the case. The question is : when do we send this signal ?
 We have to wait for the container system shutdown before killing it.
 I meant that sys_reboot() would kill the namespace's init if it's not
 called from boot namespace.

 See below

 (as far as I remember the PID namespace is killed when its 'init' exits,
 if this is not the case all other processes in the given namespace would
 have to be killed as well)
 Yes, absolutely but this is not the point, reaping the container is not
 a problem.

 What we are trying to achieve is to shutdown properly the container from
 inside (from outside will be possible too with the setns syscall).

 Assuming the process '1234' creates a new process in a new namespace set
 and wait for it.

 The new process '1' will exec /sbin/init and the system will boot up.
 But, when the system is shutdown or rebooted, after the down scripts are
 executed the kill -15 -1 will be invoked, killing all the processes
 expect the process '1' and the caller. This one will then call
 'sys_reboot' and exit. Hence we still have the init process idle and its
 parent '1234' waiting for it to die.
 This call to sys_reboot() would kill new process '1' instead of trying to
 operate on the HW box.
 This also has the advantage that a container would not require an informed
 parent monitoring it from outside (though it would not be restarted even if
 requested without such informed outside parent).

Oh, ok. Sorry I misunderstood.

Yes, that could be better than crossing the namespace boundaries.

 If we are able to receive the information in the process '1234' : the
 sys_reboot was called in the child pid namespace, we can take then kill
 our child pid.  If this information is raised via a signal sent by the
 kernel with the proper information in the siginfo_t (eg. si_code
 contains LINUX_REBOOT_CMD_RESTART, LINUX_REBOOT_CMD_HALT, ... ), the
 solution will be generic for all the shutdown/reboot of any kind of
 container and init version.
 Could this be passed for a SIGCHLD? (when namespace is reaped, and received
 by 1234 from above example assuming sys_reboot() kills the new process '1')

Yes, that sounds a good idea.

 Looks like yes, but with the need to define new values for si_code (reusing
 LINUX_REBOOT_CMD_* would certainly clash, no matter which signal is chosen).

CLD_REBOOT_CMD_RESTART
CLD_REBOOT_CMD_HALT
CLD_REBOOT_CMD_POWER_OFF
CLD_REBOOT_CMD_RESTART2 (what about the cmd buffer, shall we ignore it ?)
CLD_REBOOT_CMD_KEXEC (?)
CLD_REBOOT_CMD_SW_SUSPEND (useful for the future checkpoint/restart)

LINUX_REBOOT_CMD_CAD_ON and LINUX_REBOOT_CMD_CAD_OFF 
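A rough userspace sketch of how a monitor could consume such a code
(illustration only: the CLD_REBOOT_CMD_* values and the numbers below are
made up here, they do not exist in mainline):

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* proposed, purely hypothetical si_code values */
#define CLD_REBOOT_CMD_RESTART		10
#define CLD_REBOOT_CMD_HALT		11
#define CLD_REBOOT_CMD_POWER_OFF	12

/* reap the container's init and look at how it went away */
int handle_container_exit(pid_t container_init)
{
	siginfo_t si;

	if (waitid(P_PID, container_init, &si, WEXITED) < 0)
		return -1;

	switch (si.si_code) {
	case CLD_REBOOT_CMD_RESTART:
		printf("container asked for a reboot: restart it\n");
		break;
	case CLD_REBOOT_CMD_HALT:
	case CLD_REBOOT_CMD_POWER_OFF:
		printf("container asked for halt/power-off: clean it up\n");
		break;
	default:
		printf("ordinary exit, si_code=%d\n", si.si_code);
		break;
	}
	return 0;
}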

[Devel] Re: [RFD] reboot / shutdown of a container

2011-01-13 Thread Daniel Lezcano
On 01/13/2011 09:09 PM, Bruno Prémont wrote:
 On Thu, 13 January 2011 Daniel Lezcano <daniel.lezc...@free.fr> wrote:
 in the container implementation, we are facing the problem of a process
 calling the sys_reboot syscall which of course makes the host to
 poweroff/reboot.

 If we drop the cap_sys_reboot capability, sys_reboot fails and the
 container reach a shutdown state but the init process stay there, hence
 the container becomes stuck waiting indefinitely the process '1' to exit.

 The current implementation to make the shutdown / reboot of the
 container to work is we watch, from a process outside of the container,
 the <rootfs>/var/run/utmp file and check the runlevel each time the file
 changes. When the 'reboot' or 'shutdown' level is detected, we wait for
 a single remaining in the container and then we kill it.

 That works but this is not efficient in case of a large number of
 containers as we will have to watch a lot of utmp files. In addition,
 the /var/run directory must *not* mounted as tmpfs in the distro.
 Unfortunately, it is the default setup on most of the distros and tends
 to generalize. That implies, the rootfs init's scripts must be modified
 for the container when we put in place its rootfs and as /var/run is
 supposed to be a tmpfs, most of the applications do not cleanup the
 directory, so we need to add extra services to wipeout the files.

 More problems arise when we do an upgrade of the distro inside the
 container, because all the setup we made at creation time will be lost.
 The upgrade overwrite the scripts, the fstab and so on.

 We did what was possible to solve the problem from userspace but we
 reach always a limit because there are different implementations of the
 'init' process and the init's scripts differ from a distro to another
 and the same with the versions.

 We think this problem can only be solved from the kernel.

 The idea was to send a signal SIGPWR to the parent of the pid '1' of the
 pid namespace when the sys_reboot is called. Of course that won't occur
 for the init pid namespace.
 Wouldn't sending SIGKILL to the pid '1' process of the originating PID
 namespace be sufficient (that would trigger a SIGCHLD for the parent
 process in the outer PID namespace)?

This is already the case. The question is: when do we send this signal?
We have to wait for the container system to shut down before killing it.

 (as far as I remember the PID namespace is killed when its 'init' exits,
 if this is not the case all other processes in the given namespace would
 have to be killed as well)

Yes, absolutely but this is not the point, reaping the container is not 
a problem.

What we are trying to achieve is to shut down the container properly from
inside (doing it from outside will also be possible with the setns syscall).

Assume the process '1234' creates a new process in a new namespace set
and waits for it.

The new process '1' will exec /sbin/init and the system will boot up.
But when the system is shut down or rebooted, after the down scripts are
executed, kill -15 -1 will be invoked, killing all the processes
except the process '1' and the caller. The caller will then call
'sys_reboot' and exit. Hence we still have the init process idle and its
parent '1234' waiting for it to die.

If the process '1234' is able to receive the information that sys_reboot
was called in the child pid namespace, we can then kill our child pid.
If this information is delivered via a signal sent by the kernel with the
proper information in the siginfo_t (e.g. si_code contains
LINUX_REBOOT_CMD_RESTART, LINUX_REBOOT_CMD_HALT, ...), the solution will
be generic for the shutdown/reboot of any kind of container and init version.

 Only issue is how to differentiate the various reboot() modes (restart,
 power-off/halt) from outside, though that one also exists with the SIGPWR
 signal.




[Devel] Re: linux-next: lockdep whinge in cgroup_rmdir

2011-01-13 Thread Li Zefan
Nick Piggin wrote:
 On Fri, Jan 14, 2011 at 2:34 AM,  valdis.kletni...@vt.edu wrote:
 Seen booting yesterday's linux-next, was not present in 2.6.37-rc7-mmotm1202.

 Not sure if it's an selinux or cgroup issue, so I'm throwing it at every
 address I can find for either.  This is easily replicatable and happens at
 every boot, so I can test patches if needed.  Am willing to bisect it down if
 nobody knows right off the bat what the problem is.

 The 'W' taint is from the already-reported kernel/workqueue.c 
 worker_enter_idle issue.

 [   85.100795] systemd[1]: readahead-replay.service: main process exited, 
 code=exited, status=1
 [   85.101530]
 [   85.101531] =============================================
 [   85.101796] [ INFO: possible recursive locking detected ]
 [   85.102002] 2.6.37-next-20110111 #1
 [   85.102009] ---------------------------------------------
 [   85.102009] systemd/1 is trying to acquire lock:
 [   85.102009]  (&(&dentry->d_lock)->rlock){+.+...}, at: 
 [8107ca5c] cgroup_rmdir+0x339/0x479
 [   85.102009]
 [   85.102009] but task is already holding lock:
 [   85.102009]  (&(&dentry->d_lock)->rlock){+.+...}, at: 
 [8107ca54] cgroup_rmdir+0x331/0x479
 [   85.102009]
 [   85.102009] other info that might help us debug this:
 [   85.102009] 4 locks held by systemd/1:
 [   85.102009]  #0:  (&sb->s_type->i_mutex_key#14/1){+.+.+.}, at: 
 [810fea4d] do_rmdir+0x7d/0x121
 [   85.102009]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: 
 [810fd4bc] vfs_rmdir+0x4a/0xbe
 [   85.102009]  #2:  (cgroup_mutex){+.+.+.}, at: [8107cb84] 
 cgroup_rmdir+0x461/0x479
 [   85.102009]  #3:  (&(&dentry->d_lock)->rlock){+.+...}, at: 
 [8107ca54] cgroup_rmdir+0x331/0x479
 [   85.102009]
 [   85.102009] stack backtrace:
 [   85.102009] Pid: 1, comm: systemd Tainted: GW   
 2.6.37-next-20110111 #1
 [   85.102009] Call Trace:
 [   85.102009]  [81069f22] ? __lock_acquire+0x929/0xd4e
 [   85.102009]  [8107c6f1] ? cgroup_clear_directory+0xff/0x131
 [   85.102009]  [8107c6f1] ? cgroup_clear_directory+0xff/0x131
 [   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
 [   85.102009]  [8106a859] ? lock_acquire+0x100/0x126
 [   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
 [   85.102009]  [815521ef] ? sub_preempt_count+0x35/0x48
 [   85.102009]  [8154e401] ? _raw_spin_lock+0x36/0x45
 [   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
 [   85.102009]  [8107ca5c] ? cgroup_rmdir+0x339/0x479
 [   85.102009]  [810579cd] ? autoremove_wake_function+0x0/0x34
 [   85.102009]  [811e1839] ? selinux_inode_rmdir+0x15/0x17
 [   85.102009]  [810fd4eb] ? vfs_rmdir+0x79/0xbe
 [   85.102009]  [810feaa0] ? do_rmdir+0xd0/0x121
 [   85.102009]  [8100256c] ? sysret_check+0x27/0x62
 [   85.102009]  [8106ac79] ? trace_hardirqs_on_caller+0x117/0x13b
 [   85.102009]  [8154e201] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [   85.102009]  [8110040b] ? sys_rmdir+0x11/0x13
 [   85.102009]  [8100253b] ? system_call_fastpath+0x16/0x1b
 [   85.268272] systemd[1]: readahead-collect.service: main process exited, 
 code=exited, status=1

 Any ideas?
 
 It looks like it is just a missing parent-child lock order annotation, but
 mainline cgroupfs code looks to be OK there. What does
 cgroup_clear_directory() look like in mmotm?

It's not from cgroup_clear_directory()..

This should fix it:

=

From: Li Zefan l...@cn.fujitsu.com
Date: Fri, 14 Jan 2011 11:34:34 +0800
Subject: [PATCH] cgroups: Fix a lockdep warning at cgroup removal

Commit 2fd6b7f5 (fs: dcache scale subdirs) forgot to annotate a dentry
lock, which caused a lockdep warning.

Reported-by: Valdis Kletnieks valdis.kletni...@vt.edu
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 kernel/cgroup.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5c5f4cc..db983e2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -910,7 +910,7 @@ static void cgroup_d_remove_dir(struct dentry *dentry)

 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
-	spin_lock(&dentry->d_lock);
+	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	list_del_init(&dentry->d_u.d_child);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&parent->d_lock);
-- 
1.7.3.1



[Devel] kernel BUG at fs/dcache.c:1363 (from cgroup)

2011-01-13 Thread Li Zefan
Just mount the cgroupfs:

# mount -t cgroup -o cpuset xxx /mnt
(oops!!)

The bug is caused by:

=
commit 0df6a63f8735a7c8a877878bc215d4312e41ef81
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Dec 21 13:29:29 2010 -0500

switch cgroup

switching it to s_d_op allows to kill the cgroup_lookup() kludge.

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
=

This line:

+	sb->s_d_op = &cgroup_dops;

will cause the dentry op to be set twice, and thus trigger the bomb:

struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
{
...
if (parent) {
...
d_set_d_op(dentry, dentry->d_sb->s_d_op);
...
}
...
}

static struct dentry *d_alloc_and_lookup(struct dentry *parent,
struct qstr *name, struct nameidata *nd)
{
...
dentry = d_alloc(parent, name);
...
old = inode->i_op->lookup(inode, dentry, nd);
...
}

simple_lookup() will then call d_set_d_op() again on a dentry whose op was
already set from sb->s_d_op in d_alloc(), which triggers the BUG...


==

[   90.740906] kernel BUG at fs/dcache.c:1360!
..
[   90.810321] Call Trace:
[   90.814166]  [c04f97ad] simple_lookup+0x26/0x3c
[   90.818015]  [c04e86ce] d_alloc_and_lookup+0x36/0x54
[   90.818021]  [c04e8aa8] __lookup_hash+0x6a/0x71
[   90.818026]  [c04e8f33] lookup_one_len+0x81/0x90
[   90.818034]  [c0473083] cgroup_add_file+0x8e/0x132
[   90.818041]  [c0473152] cgroup_add_files+0x2b/0x3d
[   90.818047]  [c0473188] cgroup_populate_dir+0x24/0xdb
[   90.818053]  [c047360b] cgroup_mount+0x3cc/0x431
[   90.818061]  [c04e238d] vfs_kern_mount+0x57/0x109
[   90.818066]  [c047323f] ? cgroup_mount+0x0/0x431
[   90.818072]  [c04e248e] do_kern_mount+0x38/0xba
[   90.818077]  [c04f6706] do_mount+0x5e4/0x60f
[   90.818082]  [c04f6094] ? copy_mount_options+0x78/0xd7
[   90.818087]  [c04f68de] sys_mount+0x66/0x94
[   90.818093]  [c040329f] sysenter_do_call+0x12/0x38


[Devel] Re: kernel BUG at fs/dcache.c:1363 (from cgroup)

2011-01-13 Thread Al Viro
On Fri, Jan 14, 2011 at 12:56:17PM +0800, Li Zefan wrote:
 Just mount the cgroupfs:
 
 # mount -t cgroup -o cpuset xxx /mnt
 (oops!!)
 
 The bug is caused by:
 
 =
 commit 0df6a63f8735a7c8a877878bc215d4312e41ef81
 Author: Al Viro v...@zeniv.linux.org.uk
 Date:   Tue Dec 21 13:29:29 2010 -0500
 
 switch cgroup
 
 switching it to s_d_op allows to kill the cgroup_lookup() kludge.
 
 Signed-off-by: Al Viro v...@zeniv.linux.org.uk
 =
 
 This line:
 
 +	sb->s_d_op = &cgroup_dops;

Oh, crap...  Right, it's using simple_lookup().  Let me check if anything
else might be stepping on that.

Umm...  There's a very strange codepath in btrfs that also might.
Interesting.  Fix for cgroup, AFAICS, should be this:

Signed-off-by: Al Viro v...@zeniv.linux.org.uk
---
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5c5f4cc..ffb7bba 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(cgroup_unlock);
  */
 
 static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, int mode);
+static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct 
nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
 static int cgroup_populate_dir(struct cgroup *cgrp);
 static const struct inode_operations cgroup_dir_inode_operations;
@@ -860,6 +861,11 @@ static void cgroup_diput(struct dentry *dentry, struct 
inode *inode)
iput(inode);
 }
 
+static int cgroup_delete(const struct dentry *d)
+{
+   return 1;
+}
+
 static void remove_dir(struct dentry *d)
 {
	struct dentry *parent = dget(d->d_parent);
@@ -1451,6 +1457,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 {
static const struct dentry_operations cgroup_dops = {
.d_iput = cgroup_diput,
+   .d_delete = cgroup_delete,
};
 
struct inode *inode =
@@ -2195,12 +2202,20 @@ static const struct file_operations 
cgroup_file_operations = {
 };
 
 static const struct inode_operations cgroup_dir_inode_operations = {
-   .lookup = simple_lookup,
+   .lookup = cgroup_lookup,
.mkdir = cgroup_mkdir,
.rmdir = cgroup_rmdir,
.rename = cgroup_rename,
 };
 
+static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, 
struct nameidata *nd)
+{
+	if (dentry->d_name.len > NAME_MAX)
+   return ERR_PTR(-ENAMETOOLONG);
+   d_add(dentry, NULL);
+   return NULL;
+}
+
 /*
  * Check if a file is a control file
  */


[Devel] Re: kernel BUG at fs/dcache.c:1363 (from cgroup)

2011-01-13 Thread Li Zefan
Al Viro wrote:
 On Fri, Jan 14, 2011 at 12:56:17PM +0800, Li Zefan wrote:
 Just mount the cgroupfs:

 # mount -t cgroup -o cpuset xxx /mnt
 (oops!!)

 The bug is caused by:

 =
 commit 0df6a63f8735a7c8a877878bc215d4312e41ef81
 Author: Al Viro v...@zeniv.linux.org.uk
 Date:   Tue Dec 21 13:29:29 2010 -0500

 switch cgroup
 
 switching it to s_d_op allows to kill the cgroup_lookup() kludge.
 
 Signed-off-by: Al Viro v...@zeniv.linux.org.uk
 =

 This line:

 +	sb->s_d_op = &cgroup_dops;
 
 Oh, crap...  Right, it's using simple_lookup().  Let me check if anything
 else might be stepping on that.
 
 Umm...  There's a very strange codepath in btrfs that also might.
 Interesting.  Fix for cgroup, AFAICS, should be this:
 

patch tested. thanks!


[Devel] Re: linux-cr: v23-rc1 pushed

2011-01-13 Thread Sukadev Bhattiprolu
Oren Laadan [or...@cs.columbia.edu] wrote:
| Folks,
| 
| I just pushed out a new v23-rc1 branch of linux-cr. This one is
| rebased to 2.6.37, and contains nearly all the patches pulled
| on v22-dev. I only gave it a brief test drive... feel free to 
| throw all your ammo at it.

Oren,

We need the file_tty() helper to get the tty object from the file pointer
(otherwise we will be off by 4 bytes and fail tty_paranoia_check() in
tty_file_checkpoint()).

Thanks,

Sukadev

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index c89f055..6aa458e 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2781,7 +2784,7 @@ static int tty_file_checkpoint(struct ckpt_ctx *ctx, struc
int master_objref, slave_objref;
int ret;
 
-	tty = (struct tty_struct *)file->private_data;
+	tty = file_tty(file);
 	inode = file->f_path.dentry->d_inode;
if (tty_paranoia_check(tty, inode, tty_file_checkpoint))
return -EIO;