[Devel] Re: USB Write Checkpoint

2010-06-02 Thread Serge E. Hallyn
Quoting Ganzoo Awaasuren (ganaa.8...@gmail.com):
 Dear All
 
 Now I am changing and developing some kernel stuffs which is related to
 Linux FileSystems.
 
 My work is just to find the solution for the inconvinence use of the USB
 disk. We have to click safe remove button for flush the buffer data to
 disk before the plug off the USB disk .

...

 As far as i understand from your developed checkpoint technique on linux
 kernel , probably i can get it and apply it for my purpose. So is it
 possible to call that sys_checkpoint function in kernel's sys_write system
 call ? if it is possible then , is this checkpointed image will include that
 all of the open files ?

Short answer is no.  We freezer the userspace tasks for the duration
of the checkpoint, but we are only guaranteeing consistency with respect
to the read/write syscalls (we currently even refuse checkpoint for
aio).  What happens in the kernel to effect a write is not doesn't
matter to us, it's assumed to complete.

I'd imagine you could catch the fact that a write occurs to a disconnected
device, start queueing up the data to be written, send an error message
up through to userspace, through dbus to an error dialog, and warn the user
to re-insert the device.  But application c/r work is not going to be
helpful here - unless an application is still actively writing, in which
case part of the userspace callbacks in response to the error message from
the kernel might be to do an lsof to find guilty tasks, and freeze them.

-serge
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: LXC bringup issue on Fedora

2010-06-02 Thread Daniel Lezcano
On 06/02/2010 07:09 AM, Nirmal Guhan wrote:
 Hi,

 Am trying to get lenny (latest debian from http://ftp.us.debian.org/debian)
 run as a container on Fedora12 with 2.6.32.13 kernel and running into below
 error :

 lxc-start -n lennycont
 SELinux:  Could not open policy file=
 /etc/selinux/targeted/policy/policy.24:  No such file or directory
 INIT: version 2.86 booting
 INIT: Entering runlevel: 2
 Starting enhanced syslogd: rsyslogd.
 Starting periodic command scheduler: crond.

 INIT: Id 4 respawning too fast: disabled for 5 minutes
 INIT: Id 2 respawning too fast: disabled for 5 minutes
 INIT: Id T1 respawning too fast: disabled for 5 minutes
 INIT: Id 1 respawning too fast: disabled for 5 minutes
 INIT: Id 5 respawning too fast: disabled for 5 minutes
 INIT: Id 3 respawning too fast: disabled for 5 minutes
 INIT: Id T0 respawning too fast: disabled for 5 minutes
 INIT: Id 6 respawning too fast: disabled for 5 minutes
 INIT: no more processes left in this runlevel

 My config file is as below :

 lxc.utsname = lennycont
 lxc.network.type = veth
 lxc.network.flags = up
 lxc.network.link = br0
 lxc.network.ipv4 = 128.107.159.180/22
 lxc.network.name = eth0
 lxc.rootfs = /lxc/lenny-chroot
 lxc.mount = /lxc/lenny.fstab
 lxc.tty = 1

 fstab :
 none /lxc/lenny-chroot/dev/pts devpts defaults 0 0
 none /lxc/lenny-chroot/procproc   defaults 0 0
 none /lxc/lenny-chroot/sys sysfs  defaults 0 0
 none /lxc/lenny-chroot/dev/shm tmpfs  defaults 0 0

 I googled and found some solutions but none of them worked for me :-( Could
 you please help?


Hi,

it is probable the number of ttys of your container configuration does 
not match the number of ttys used by the container.
Please ask to lxc-us...@lists.sourceforge.net

Thanks
   -- Daniel
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Oleg Nesterov
On 06/01, Paul Menage wrote:

 On Mon, May 31, 2010 at 11:04 AM, Oleg Nesterov o...@redhat.com wrote:
 
  And, forgot to mention, I do not understand the PF_EXITING check in
  attach_task_by_pid() (and some others).
 
  At first glance, it buys nothing. PF_EXITING can be set right after
  the check.

 It can, but it's a benign race.

Yes,

 Moving a non-current thread into a cgroup takes task-alloc_lock and
 checks for PF_EXITING before manipulating that thread's cgroup links.
 The exit procedure sets PF_EXITING and then (somewhat later, but
 guaranteed) moves current to the root cgroups while holding
 alloc_lock.

Yes sure, I understand this part. cgroup_attach_task() correctly checks
PF_EXITING under task_lock(), this protects from the case when this
task has already passed cgroup_exit() which takes this lock too.

But. This exactly means that the PF_EXITING check in attach_task_by_pid()
is not necessary from the correctness pov (while probably can be considered
as optimization), right?

And,

 so it's fine to refuse to move it to a new cgroup.

I am not sure. It doesn't hurt when we try to move a thread. But if
we want to move the whole thread group, we should proceed even if the
group leader has already exited and thus has PF_EXITING bit set.

That was my question.

Oleg.

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Paul Menage
On Wed, Jun 2, 2010 at 7:06 AM, Oleg Nesterov o...@redhat.com wrote:

 Yes sure, I understand this part. cgroup_attach_task() correctly checks
 PF_EXITING under task_lock(), this protects from the case when this
 task has already passed cgroup_exit() which takes this lock too.

Right.


 But. This exactly means that the PF_EXITING check in attach_task_by_pid()
 is not necessary from the correctness pov (while probably can be considered
 as optimization), right?

For the value of correctness that was relevant when writing that code
originally, I think it's fine. But ...

 I am not sure. It doesn't hurt when we try to move a thread. But if
 we want to move the whole thread group, we should proceed even if the
 group leader has already exited and thus has PF_EXITING bit set.

Hmm, maybe. I could see this being argued both ways. Can the process
hang around indefinitely with the leader as a zombie and the other
threads still running?

It wouldn't be that hard to make it possible to avoid relying on
PF_EXITING as the check - instead we'd have an exiting_css_set object
that have the same pointer set as init_css_set, but would be
distinguishable from it as a task-cgroups pointer by virtue of being
a different object. Then cgroup_exit() can reassign tasks to point to
exiting_css_set rather than init_css_set, and the various checks that
are currently made for (task-flags  PF_EXITING) could check for
(task-cgroups == exiting_css_set) instead. This would allow task
movement further into the exit process.

Paul
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Oleg Nesterov
On 06/02, Paul Menage wrote:

 On Wed, Jun 2, 2010 at 7:06 AM, Oleg Nesterov o...@redhat.com wrote:
 
  I am not sure. It doesn't hurt when we try to move a thread. But if
  we want to move the whole thread group, we should proceed even if the
  group leader has already exited and thus has PF_EXITING bit set.

 Hmm, maybe. I could see this being argued both ways. Can the process
 hang around indefinitely with the leader as a zombie and the other
 threads still running?

Yes sure. The main thread can exit via sys_exit(), this doesn't affect
the thread group. Of course, I am not saying this is common practice,
perhaps no important app does this.

 It wouldn't be that hard to make it possible to avoid relying on
 PF_EXITING as the check - instead we'd have an exiting_css_set object
 that have the same pointer set as init_css_set, but would be
 distinguishable from it as a task-cgroups pointer by virtue of being
 a different object. Then cgroup_exit() can reassign tasks to point to
 exiting_css_set rather than init_css_set, and the various checks that
 are currently made for (task-flags  PF_EXITING) could check for
 (task-cgroups == exiting_css_set) instead. This would allow task
 movement further into the exit process.

It is too late for me to even try to understand the above ;)

But I still can't understand why we can't just remove it. Both
cgroup_attach_task() and cgroup_attach_proc() should handle the
possible races correctly anyway.

Oleg.

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Paul Menage
On Wed, Jun 2, 2010 at 1:20 PM, Oleg Nesterov o...@redhat.com wrote:

 Yes sure. The main thread can exit via sys_exit(), this doesn't affect
 the thread group. Of course, I am not saying this is common practice,
 perhaps no important app does this.

This check has been in cgroups for quite a while and no-one's
complained so far...

 But I still can't understand why we can't just remove it. Both
 cgroup_attach_task() and cgroup_attach_proc() should handle the
 possible races correctly anyway.

The it that you're proposing to remove is in fact the code that
handles those races.

Anyway, I think this issue is orthogonal to the movability of entire
processes - with Ben's patch, an exited-but-not-reaped group leader is
immovable either via tasks or cgroup.procs. Changing the race
condition handling to allow such movement would be a separate issue.

Paul

Paul
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Oleg Nesterov
On 06/02, Paul Menage wrote:

 On Wed, Jun 2, 2010 at 1:20 PM, Oleg Nesterov o...@redhat.com wrote:
 
  Yes sure. The main thread can exit via sys_exit(), this doesn't affect
  the thread group. Of course, I am not saying this is common practice,
  perhaps no important app does this.

 This check has been in cgroups for quite a while and no-one's
 complained so far...

Sure. because currently attach_task_by_pid() works with the single
thread. But Ben's patch reuses this code to call cgroup_attach_proc().

  But I still can't understand why we can't just remove it. Both
  cgroup_attach_task() and cgroup_attach_proc() should handle the
  possible races correctly anyway.

 The it that you're proposing to remove is in fact the code that
 handles those races.

In that case I confused, and I thought we already agreed that
the PF_EXITING check in attach_task_by_pid() is not strictly needed
for correctness.

Once again, the task can call do_exit() and set PF_EXITING right
after the check.

 Anyway, I think this issue is orthogonal to the movability of entire
 processes - with Ben's patch, an exited-but-not-reaped group leader is
 immovable either via tasks or cgroup.procs.

Hmm. As I said, I didn't really read this patch. But I thought that
cgroup_attach_proc() tries to handle this.

Anyway I agree, this is minor and I never pretended I understand
this code.


Hmm. The usage of -thread_group in -can_attach() methods doesn't
look safe to me... but currently bool threadgroup is always false.

Oleg.

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Paul Menage
On Wed, Jun 2, 2010 at 1:58 PM, Oleg Nesterov o...@redhat.com wrote:
 The it that you're proposing to remove is in fact the code that
 handles those races.

 In that case I confused, and I thought we already agreed that
 the PF_EXITING check in attach_task_by_pid() is not strictly needed
 for correctness.

Not quite - something is required for correctness, and the PF_EXITING
check provides that correctness, with a very small window (between
setting PF_EXITING and calling cgroup_exit) where we might arguably
have been able to move the thread but decline to do so because it's
simpler not to do so and no-one cares. That's the optimization that I
meant - the data structures are slightly simpler since there's no way
to tell when a task has passed cgroup_exit(), and instead we just see
if they've passed PF_EXITING.


 Once again, the task can call do_exit() and set PF_EXITING right
 after the check.

Yes, the important part is that they haven't set it *before* the check
in attach_task_by_pid(). If they have set it before that, then they
could be anywhere in the exit path after PF_EXITING, and we decline to
move them since it's possible that they've already passed
cgroup_exit(). If the exiting task has not yet set PF_EXITING, then it
can't possibly get into the critical section in cgroup_exit() since
attach_task_by_pid() holds task-alloc_lock.

Paul
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Oleg Nesterov
On 06/02, Paul Menage wrote:

 On Wed, Jun 2, 2010 at 1:58 PM, Oleg Nesterov o...@redhat.com wrote:
  The it that you're proposing to remove is in fact the code that
  handles those races.
 
  In that case I confused, and I thought we already agreed that
  the PF_EXITING check in attach_task_by_pid() is not strictly needed
  for correctness.

 Not quite - something is required for correctness, and the PF_EXITING
 check provides that correctness, with a very small window (between
 setting PF_EXITING and calling cgroup_exit) where we might arguably
 have been able to move the thread but decline to do so because it's
 simpler not to do so and no-one cares. That's the optimization that I
 meant - the data structures are slightly simpler since there's no way
 to tell when a task has passed cgroup_exit(), and instead we just see
 if they've passed PF_EXITING.

 
  Once again, the task can call do_exit() and set PF_EXITING right
  after the check.

 Yes, the important part is that they haven't set it *before* the check
 in attach_task_by_pid(). If they have set it before that, then they
 could be anywhere in the exit path after PF_EXITING, and we decline to
 move them since it's possible that they've already passed
 cgroup_exit(). If the exiting task has not yet set PF_EXITING, then it
 can't possibly get into the critical section in cgroup_exit() since
 attach_task_by_pid() holds task-alloc_lock.

It doesn't ? At least in Linus's tree.

cgroup_attach_task() does, and this time PF_EXITING is understandable.

Oleg.

___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Paul Menage
On Wed, Jun 2, 2010 at 2:38 PM, Oleg Nesterov o...@redhat.com wrote:

 cgroup_attach_task() does, and this time PF_EXITING is understandable.


Oh, OK, I was talking about the one in cgroup_attach_task(), which is
called by attach_task_by_pid(), and assumed that you were referring to
the same one. I'd forgotten about the pre-check in
attach_task_by_pid(). Yes, that one could probably be removed without
affecting any final outcomes.

Paul
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Ben Blum
On Wed, Jun 02, 2010 at 10:58:55PM +0200, Oleg Nesterov wrote:
 Hmm. The usage of -thread_group in -can_attach() methods doesn't
 look safe to me... but currently bool threadgroup is always false.
 
 Oleg.
 
 

I recall putting a rcu_read_lock() around that part and being assured
that made it safe. But I don't remember the details. Maybe taking
tasklist_lock is necessary?

-- Ben
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Ben Blum
On Wed, Jun 02, 2010 at 03:03:49PM -0700, Paul Menage wrote:
 On Wed, Jun 2, 2010 at 2:38 PM, Oleg Nesterov o...@redhat.com wrote:
 
  cgroup_attach_task() does, and this time PF_EXITING is understandable.
 
 
 Oh, OK, I was talking about the one in cgroup_attach_task(), which is
 called by attach_task_by_pid(), and assumed that you were referring to
 the same one. I'd forgotten about the pre-check in
 attach_task_by_pid(). Yes, that one could probably be removed without
 affecting any final outcomes.
 
 Paul

Yes, I agree. The check should be moved into cgroup_attach_task before
the ss-can_attach calls, so the optimization stays in the case where
it's wanted.

And the use of __task_cred after the exiting check seems safe too if the
check is gone - from cred.h: The caller must make sure task doesn't go
away, either by holding a ref on task or by holding tasklist_lock to
prevent it from being unlinked. So that means it'd be safe to take a
reference, task sets exiting and does everything before unhashing, then
access the cred pointer; since we hold tasklist lock it's the same case.

-- Ben
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: [RFC] [PATCH 2/2] cgroups: make procs file writable

2010-06-02 Thread Ben Blum
On Mon, May 31, 2010 at 07:52:42PM +0200, Oleg Nesterov wrote:
 I only glanced into one function, cgroup_attach_proc(), and some things
 look obviously wrong. Sorry, I can't really read these patches now,
 most probably I misunderstood the code...
 
  +int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
  +{
  +   int retval;
  +   struct cgroup_subsys *ss, *failed_ss = NULL;
  +   struct cgroup *oldcgrp;
  +   struct css_set *oldcg;
  +   struct cgroupfs_root *root = cgrp-root;
  +   /* threadgroup list cursor */
  +   struct task_struct *tsk;
  +   /*
  +* we need to make sure we have css_sets for all the tasks we're
  +* going to move -before- we actually start moving them, so that in
  +* case we get an ENOMEM we can bail out before making any changes.
  +*/
  +   struct list_head newcg_list;
  +   struct cg_list_entry *cg_entry, *temp_nobe;
  +
  +   /*
  +* Note: Because of possible races with de_thread(), we can't
  +* distinguish between the case where the user gives a non-leader tid
  +* and the case where it changes out from under us. So both are allowed.
  +*/
 
 OK, the caller has a reference to the argument, leader,
 
  +   leader = leader-group_leader;
 
 But why it is safe to use leader-group_leader if we race with exec?

This line means let's try to find who the leader is, since
attach_task_by_pid doesn't grab it for us. It's not safe, and we still
check if it's really the leader later (just before the 'commit point').
Note that before this line 'leader' doesn't really mean the leader -
perhaps i should rename the variables :P

But maybe I also want to grab a reference on the new task? I can't
remember whether I need to or not. I'm not sure whether or not I need to
grab an rcu lock, but it doesn't seem necessary because of the commit
point check later on. Plus can_attach takes the rcu lock itself for
iterating if it needs it.

 
  +   list_for_each_entry_rcu(tsk, leader-thread_group, thread_group) {
 
 Even if we didn't change leader above, this is not safe in theory.
 We already discussed this, list_for_each_rcu(head) is only safe when
 we know that head itself is valid.
 
 Suppose that this leader exits, then leader-thread_group.next exits
 too before we take rcu_read_lock().

Why is that a problem? I thought leader-thread_group is supposed to
stay sane as long as leader is the leader.

This looks like it needs a check to see if 'leader' is still really the
leader, but nothing more.

 
  +   oldcgrp = task_cgroup_from_root(leader, root);
  +   if (cgrp != oldcgrp) {
  +   retval = cgroup_task_migrate(cgrp, oldcgrp, leader, true);
  +   BUG_ON(retval != 0  retval != -ESRCH);
  +   }
  +   /* Now iterate over each thread in the group. */
  +   list_for_each_entry_rcu(tsk, leader-thread_group, thread_group) {
  +   BUG_ON(tsk-signal != leader-signal);
  +   /* leave current thread as it is if it's already there */
  +   oldcgrp = task_cgroup_from_root(tsk, root);
  +   if (cgrp == oldcgrp)
  +   continue;
  +   /* we don't care whether these threads are exiting */
  +   retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
  +   BUG_ON(retval != 0  retval != -ESRCH);
  +   }
 
 This looks strange. Why do we move leader outside of the loop ?
 Of course, list_for_each_entry() can't work to move all sub-threads,
 but do while_each_thread() can.

do/while_each_thread oves over all threads in the system, rather than
just the threadgroup... this isn't supposed to be a fast operation, but
that seems like overkill.

 
 From 0/2:
 
  recentish changes to signal_struct's lifetime rules (which don't seem to
  appear when I check out mmotm with git clone,
 
 already in Linus's tree.
 
 Oleg.
 

-- Ben
___
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel