[Devel] Re: USB Write Checkpoint
Quoting Ganzoo Awaasuren (ganaa.8...@gmail.com): Dear All Now I am changing and developing some kernel stuffs which is related to Linux FileSystems. My work is just to find the solution for the inconvinence use of the USB disk. We have to click safe remove button for flush the buffer data to disk before the plug off the USB disk . ... As far as i understand from your developed checkpoint technique on linux kernel , probably i can get it and apply it for my purpose. So is it possible to call that sys_checkpoint function in kernel's sys_write system call ? if it is possible then , is this checkpointed image will include that all of the open files ? Short answer is no. We freezer the userspace tasks for the duration of the checkpoint, but we are only guaranteeing consistency with respect to the read/write syscalls (we currently even refuse checkpoint for aio). What happens in the kernel to effect a write is not doesn't matter to us, it's assumed to complete. I'd imagine you could catch the fact that a write occurs to a disconnected device, start queueing up the data to be written, send an error message up through to userspace, through dbus to an error dialog, and warn the user to re-insert the device. But application c/r work is not going to be helpful here - unless an application is still actively writing, in which case part of the userspace callbacks in response to the error message from the kernel might be to do an lsof to find guilty tasks, and freeze them. -serge ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: LXC bringup issue on Fedora
On 06/02/2010 07:09 AM, Nirmal Guhan wrote: Hi, Am trying to get lenny (latest debian from http://ftp.us.debian.org/debian) run as a container on Fedora12 with 2.6.32.13 kernel and running into below error : lxc-start -n lennycont SELinux: Could not open policy file= /etc/selinux/targeted/policy/policy.24: No such file or directory INIT: version 2.86 booting INIT: Entering runlevel: 2 Starting enhanced syslogd: rsyslogd. Starting periodic command scheduler: crond. INIT: Id 4 respawning too fast: disabled for 5 minutes INIT: Id 2 respawning too fast: disabled for 5 minutes INIT: Id T1 respawning too fast: disabled for 5 minutes INIT: Id 1 respawning too fast: disabled for 5 minutes INIT: Id 5 respawning too fast: disabled for 5 minutes INIT: Id 3 respawning too fast: disabled for 5 minutes INIT: Id T0 respawning too fast: disabled for 5 minutes INIT: Id 6 respawning too fast: disabled for 5 minutes INIT: no more processes left in this runlevel My config file is as below : lxc.utsname = lennycont lxc.network.type = veth lxc.network.flags = up lxc.network.link = br0 lxc.network.ipv4 = 128.107.159.180/22 lxc.network.name = eth0 lxc.rootfs = /lxc/lenny-chroot lxc.mount = /lxc/lenny.fstab lxc.tty = 1 fstab : none /lxc/lenny-chroot/dev/pts devpts defaults 0 0 none /lxc/lenny-chroot/procproc defaults 0 0 none /lxc/lenny-chroot/sys sysfs defaults 0 0 none /lxc/lenny-chroot/dev/shm tmpfs defaults 0 0 I googled and found some solutions but none of them worked for me :-( Could you please help? Hi, it is probable the number of ttys of your container configuration does not match the number of ttys used by the container. Please ask to lxc-us...@lists.sourceforge.net Thanks -- Daniel ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On 06/01, Paul Menage wrote: On Mon, May 31, 2010 at 11:04 AM, Oleg Nesterov o...@redhat.com wrote: And, forgot to mention, I do not understand the PF_EXITING check in attach_task_by_pid() (and some others). At first glance, it buys nothing. PF_EXITING can be set right after the check. It can, but it's a benign race. Yes, Moving a non-current thread into a cgroup takes task-alloc_lock and checks for PF_EXITING before manipulating that thread's cgroup links. The exit procedure sets PF_EXITING and then (somewhat later, but guaranteed) moves current to the root cgroups while holding alloc_lock. Yes sure, I understand this part. cgroup_attach_task() correctly checks PF_EXITING under task_lock(), this protects from the case when this task has already passed cgroup_exit() which takes this lock too. But. This exactly means that the PF_EXITING check in attach_task_by_pid() is not necessary from the correctness pov (while probably can be considered as optimization), right? And, so it's fine to refuse to move it to a new cgroup. I am not sure. It doesn't hurt when we try to move a thread. But if we want to move the whole thread group, we should proceed even if the group leader has already exited and thus has PF_EXITING bit set. That was my question. Oleg. ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 2, 2010 at 7:06 AM, Oleg Nesterov o...@redhat.com wrote: Yes sure, I understand this part. cgroup_attach_task() correctly checks PF_EXITING under task_lock(), this protects from the case when this task has already passed cgroup_exit() which takes this lock too. Right. But. This exactly means that the PF_EXITING check in attach_task_by_pid() is not necessary from the correctness pov (while probably can be considered as optimization), right? For the value of correctness that was relevant when writing that code originally, I think it's fine. But ... I am not sure. It doesn't hurt when we try to move a thread. But if we want to move the whole thread group, we should proceed even if the group leader has already exited and thus has PF_EXITING bit set. Hmm, maybe. I could see this being argued both ways. Can the process hang around indefinitely with the leader as a zombie and the other threads still running? It wouldn't be that hard to make it possible to avoid relying on PF_EXITING as the check - instead we'd have an exiting_css_set object that have the same pointer set as init_css_set, but would be distinguishable from it as a task-cgroups pointer by virtue of being a different object. Then cgroup_exit() can reassign tasks to point to exiting_css_set rather than init_css_set, and the various checks that are currently made for (task-flags PF_EXITING) could check for (task-cgroups == exiting_css_set) instead. This would allow task movement further into the exit process. Paul ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On 06/02, Paul Menage wrote: On Wed, Jun 2, 2010 at 7:06 AM, Oleg Nesterov o...@redhat.com wrote: I am not sure. It doesn't hurt when we try to move a thread. But if we want to move the whole thread group, we should proceed even if the group leader has already exited and thus has PF_EXITING bit set. Hmm, maybe. I could see this being argued both ways. Can the process hang around indefinitely with the leader as a zombie and the other threads still running? Yes sure. The main thread can exit via sys_exit(), this doesn't affect the thread group. Of course, I am not saying this is common practice, perhaps no important app does this. It wouldn't be that hard to make it possible to avoid relying on PF_EXITING as the check - instead we'd have an exiting_css_set object that have the same pointer set as init_css_set, but would be distinguishable from it as a task-cgroups pointer by virtue of being a different object. Then cgroup_exit() can reassign tasks to point to exiting_css_set rather than init_css_set, and the various checks that are currently made for (task-flags PF_EXITING) could check for (task-cgroups == exiting_css_set) instead. This would allow task movement further into the exit process. It is too late for me to even try to understand the above ;) But I still can't understand why we can't just remove it. Both cgroup_attach_task() and cgroup_attach_proc() should handle the possible races correctly anyway. Oleg. ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 2, 2010 at 1:20 PM, Oleg Nesterov o...@redhat.com wrote: Yes sure. The main thread can exit via sys_exit(), this doesn't affect the thread group. Of course, I am not saying this is common practice, perhaps no important app does this. This check has been in cgroups for quite a while and no-one's complained so far... But I still can't understand why we can't just remove it. Both cgroup_attach_task() and cgroup_attach_proc() should handle the possible races correctly anyway. The it that you're proposing to remove is in fact the code that handles those races. Anyway, I think this issue is orthogonal to the movability of entire processes - with Ben's patch, an exited-but-not-reaped group leader is immovable either via tasks or cgroup.procs. Changing the race condition handling to allow such movement would be a separate issue. Paul Paul ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On 06/02, Paul Menage wrote: On Wed, Jun 2, 2010 at 1:20 PM, Oleg Nesterov o...@redhat.com wrote: Yes sure. The main thread can exit via sys_exit(), this doesn't affect the thread group. Of course, I am not saying this is common practice, perhaps no important app does this. This check has been in cgroups for quite a while and no-one's complained so far... Sure. because currently attach_task_by_pid() works with the single thread. But Ben's patch reuses this code to call cgroup_attach_proc(). But I still can't understand why we can't just remove it. Both cgroup_attach_task() and cgroup_attach_proc() should handle the possible races correctly anyway. The it that you're proposing to remove is in fact the code that handles those races. In that case I confused, and I thought we already agreed that the PF_EXITING check in attach_task_by_pid() is not strictly needed for correctness. Once again, the task can call do_exit() and set PF_EXITING right after the check. Anyway, I think this issue is orthogonal to the movability of entire processes - with Ben's patch, an exited-but-not-reaped group leader is immovable either via tasks or cgroup.procs. Hmm. As I said, I didn't really read this patch. But I thought that cgroup_attach_proc() tries to handle this. Anyway I agree, this is minor and I never pretended I understand this code. Hmm. The usage of -thread_group in -can_attach() methods doesn't look safe to me... but currently bool threadgroup is always false. Oleg. ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 2, 2010 at 1:58 PM, Oleg Nesterov o...@redhat.com wrote: The it that you're proposing to remove is in fact the code that handles those races. In that case I confused, and I thought we already agreed that the PF_EXITING check in attach_task_by_pid() is not strictly needed for correctness. Not quite - something is required for correctness, and the PF_EXITING check provides that correctness, with a very small window (between setting PF_EXITING and calling cgroup_exit) where we might arguably have been able to move the thread but decline to do so because it's simpler not to do so and no-one cares. That's the optimization that I meant - the data structures are slightly simpler since there's no way to tell when a task has passed cgroup_exit(), and instead we just see if they've passed PF_EXITING. Once again, the task can call do_exit() and set PF_EXITING right after the check. Yes, the important part is that they haven't set it *before* the check in attach_task_by_pid(). If they have set it before that, then they could be anywhere in the exit path after PF_EXITING, and we decline to move them since it's possible that they've already passed cgroup_exit(). If the exiting task has not yet set PF_EXITING, then it can't possibly get into the critical section in cgroup_exit() since attach_task_by_pid() holds task-alloc_lock. Paul ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On 06/02, Paul Menage wrote: On Wed, Jun 2, 2010 at 1:58 PM, Oleg Nesterov o...@redhat.com wrote: The it that you're proposing to remove is in fact the code that handles those races. In that case I confused, and I thought we already agreed that the PF_EXITING check in attach_task_by_pid() is not strictly needed for correctness. Not quite - something is required for correctness, and the PF_EXITING check provides that correctness, with a very small window (between setting PF_EXITING and calling cgroup_exit) where we might arguably have been able to move the thread but decline to do so because it's simpler not to do so and no-one cares. That's the optimization that I meant - the data structures are slightly simpler since there's no way to tell when a task has passed cgroup_exit(), and instead we just see if they've passed PF_EXITING. Once again, the task can call do_exit() and set PF_EXITING right after the check. Yes, the important part is that they haven't set it *before* the check in attach_task_by_pid(). If they have set it before that, then they could be anywhere in the exit path after PF_EXITING, and we decline to move them since it's possible that they've already passed cgroup_exit(). If the exiting task has not yet set PF_EXITING, then it can't possibly get into the critical section in cgroup_exit() since attach_task_by_pid() holds task-alloc_lock. It doesn't ? At least in Linus's tree. cgroup_attach_task() does, and this time PF_EXITING is understandable. Oleg. ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 2, 2010 at 2:38 PM, Oleg Nesterov o...@redhat.com wrote: cgroup_attach_task() does, and this time PF_EXITING is understandable. Oh, OK, I was talking about the one in cgroup_attach_task(), which is called by attach_task_by_pid(), and assumed that you were referring to the same one. I'd forgotten about the pre-check in attach_task_by_pid(). Yes, that one could probably be removed without affecting any final outcomes. Paul ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 02, 2010 at 10:58:55PM +0200, Oleg Nesterov wrote: Hmm. The usage of -thread_group in -can_attach() methods doesn't look safe to me... but currently bool threadgroup is always false. Oleg. I recall putting a rcu_read_lock() around that part and being assured that made it safe. But I don't remember the details. Maybe taking tasklist_lock is necessary? -- Ben ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Wed, Jun 02, 2010 at 03:03:49PM -0700, Paul Menage wrote: On Wed, Jun 2, 2010 at 2:38 PM, Oleg Nesterov o...@redhat.com wrote: cgroup_attach_task() does, and this time PF_EXITING is understandable. Oh, OK, I was talking about the one in cgroup_attach_task(), which is called by attach_task_by_pid(), and assumed that you were referring to the same one. I'd forgotten about the pre-check in attach_task_by_pid(). Yes, that one could probably be removed without affecting any final outcomes. Paul Yes, I agree. The check should be moved into cgroup_attach_task before the ss-can_attach calls, so the optimization stays in the case where it's wanted. And the use of __task_cred after the exiting check seems safe too if the check is gone - from cred.h: The caller must make sure task doesn't go away, either by holding a ref on task or by holding tasklist_lock to prevent it from being unlinked. So that means it'd be safe to take a reference, task sets exiting and does everything before unhashing, then access the cred pointer; since we hold tasklist lock it's the same case. -- Ben ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable
On Mon, May 31, 2010 at 07:52:42PM +0200, Oleg Nesterov wrote: I only glanced into one function, cgroup_attach_proc(), and some things look obviously wrong. Sorry, I can't really read these patches now, most probably I misunderstood the code... +int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader) +{ + int retval; + struct cgroup_subsys *ss, *failed_ss = NULL; + struct cgroup *oldcgrp; + struct css_set *oldcg; + struct cgroupfs_root *root = cgrp-root; + /* threadgroup list cursor */ + struct task_struct *tsk; + /* +* we need to make sure we have css_sets for all the tasks we're +* going to move -before- we actually start moving them, so that in +* case we get an ENOMEM we can bail out before making any changes. +*/ + struct list_head newcg_list; + struct cg_list_entry *cg_entry, *temp_nobe; + + /* +* Note: Because of possible races with de_thread(), we can't +* distinguish between the case where the user gives a non-leader tid +* and the case where it changes out from under us. So both are allowed. +*/ OK, the caller has a reference to the argument, leader, + leader = leader-group_leader; But why it is safe to use leader-group_leader if we race with exec? This line means let's try to find who the leader is, since attach_task_by_pid doesn't grab it for us. It's not safe, and we still check if it's really the leader later (just before the 'commit point'). Note that before this line 'leader' doesn't really mean the leader - perhaps i should rename the variables :P But maybe I also want to grab a reference on the new task? I can't remember whether I need to or not. I'm not sure whether or not I need to grab an rcu lock, but it doesn't seem necessary because of the commit point check later on. Plus can_attach takes the rcu lock itself for iterating if it needs it. + list_for_each_entry_rcu(tsk, leader-thread_group, thread_group) { Even if we didn't change leader above, this is not safe in theory. We already discussed this, list_for_each_rcu(head) is only safe when we know that head itself is valid. Suppose that this leader exits, then leader-thread_group.next exits too before we take rcu_read_lock(). Why is that a problem? I thought leader-thread_group is supposed to stay sane as long as leader is the leader. This looks like it needs a check to see if 'leader' is still really the leader, but nothing more. + oldcgrp = task_cgroup_from_root(leader, root); + if (cgrp != oldcgrp) { + retval = cgroup_task_migrate(cgrp, oldcgrp, leader, true); + BUG_ON(retval != 0 retval != -ESRCH); + } + /* Now iterate over each thread in the group. */ + list_for_each_entry_rcu(tsk, leader-thread_group, thread_group) { + BUG_ON(tsk-signal != leader-signal); + /* leave current thread as it is if it's already there */ + oldcgrp = task_cgroup_from_root(tsk, root); + if (cgrp == oldcgrp) + continue; + /* we don't care whether these threads are exiting */ + retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true); + BUG_ON(retval != 0 retval != -ESRCH); + } This looks strange. Why do we move leader outside of the loop ? Of course, list_for_each_entry() can't work to move all sub-threads, but do while_each_thread() can. do/while_each_thread oves over all threads in the system, rather than just the threadgroup... this isn't supposed to be a fast operation, but that seems like overkill. From 0/2: recentish changes to signal_struct's lifetime rules (which don't seem to appear when I check out mmotm with git clone, already in Linus's tree. Oleg. -- Ben ___ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel