Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-07-26 Thread Jan Engelhardt
Hi,


sorry for the really delayed mail..

On May 28 2007 22:53, Eric W. Biederman wrote:
>> Jan Engelhardt writes:
>>> On Apr 10 2007 17:47, Jan Engelhardt wrote:
>>>
>>> Done that and the result is that `ps afwx` now looks like:
>>>
>>>   PID TTY      STAT   TIME COMMAND
>>>  2722 ?        S      0:00 [lockd]
>> ...
>>>     3 ?        S<     0:00 [events/0]
>>>     2 ?        SN     0:00 [ksoftirqd/0]
>>>     1 ?        Ss     0:02 init [3]
>>>   537 ?        S<s    0:02  \_ /sbin/udevd --daemon
>>>  1600 ?        Ss     0:00  \_ /usr/bin/dbus-daemon --system
>>>  1692 ?        Ss     0:00  \_ /sbin/acpid
>>>  1923 ?        Ss     0:00  \_ /sbin/resmgrd
>> ...
>>> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
>>> +if(ADOPTED(processes[i]) && forest_type!='u')
>>
>> That's not compatible because init's children are now in the
>> logical place. Since the days of procps-1.x.x or earlier,
>> such processes have been listed at top level.
>>
>> BTW, what does "ps -ejH" do for you, with and without the patch?

2.6.22, without kernel patch, ps -ejH (shortened):

  PID  PGID   SID TTY          TIME CMD
    2     0     0 ?        00:00:00 kthreadd
    3     0     0 ?        00:00:00   migration/0
    1     1     1 ?        00:00:00 init
  821   821   821 ?        00:00:00   udevd
 2228  2228  2228 ?        00:00:00   klogd

and `ps afx`:

  PID TTY      STAT   TIME COMMAND
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     0:00  \_ [migration/0]
    1 ?        Ss     0:00 init [5]
  821 ?        S<s    0:00 /sbin/udevd --daemon

With procps patch: it's all thrown up again, possibly due to some
kernel patch that made it into 2.6.22.

(ps -ejH)
  PID  PGID   SID TTY          TIME CMD
    2     0     0 ?        00:00:00 kthreadd
    3     0     0 ?        00:00:00   migration/0
    4     0     0 ?        00:00:00   ksoftirqd/0
    1     1     1 ?        00:00:00 init
  821   821   821 ?        00:00:00   udevd
 2228  2228  2228 ?        00:00:00   klogd


(ps afx)
  PID TTY      STAT   TIME COMMAND
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     0:00 [migration/0]
    4 ?        SN     0:00 [ksoftirqd/0]
    1 ?        Ss     0:00 init [5]
  821 ?        S<s    0:00 /sbin/udevd --daemon
 2228 ?        Ss     0:00 /sbin/klogd -c 5 -2 -x


(procps 3.2.7, the one used back when this thread was alive :^))

>ps -ejH displays everything.

So does `ps afx` for me ;-)

>For 2.6.22 we will only have kthreadd
>as a sibling of init with ppid == 0.  Depending on what happens
>in the evolution of how we start kernel threads we may be able
>to remove kthreadd and have all kthreads with a ppid of 0, but only
>time will tell.
>

Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-29 Thread Eric W. Biederman
"Albert Cahalan" <[EMAIL PROTECTED]> writes:

> On 5/29/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
>
> That's not what I mean. (the "-e" causes that of course)
> I'm asking about the parent-child relationships shown.
> The "-H" option is a bit different from the "f" option.

Yes.  Sorry, on the unmodified ps the parent-child relationship
seems to be displayed properly.

>>> I'd be a lot happier about breaking compatibility in this area
>>> if I could get a functional adoption flag. That is, I really
>>> would like to show a process as child of init if it naturally
>>> was created as a child of init. It's less informative to have
>>> fake children showing up the same as real ones. The original
>>> parent PID would do. (BTW, the original parent name and/or
>>> grandparent PID would be great to have) As a bonus, the kernel
>>> could reap these processes more quickly than init can... and
>>> then maybe we can stop caring if init is alive.
>>
>> Having the kernel not reparent user processes to init is an interesting
>> idea, especially when those processes have not existed.  I'm not
>> certain that is POSIX compliant and otherwise backwards compatible.
>
> I'm not suggesting that this be visible via POSIX APIs.
>
> It's almost certainly a given that getppid() must return 1, and
> probably /proc needs to show this as well. Without question,
> any process created by init must be reaped by init.
>
> Processes NOT created by init could be silently reaped by
> the kernel. They need to see their own PPID as 1, but there
> need not be any parent-child relationship in the kernel data
> structures. The kernel can fake the whole thing, which is nice
> because then the kernel isn't depending on userspace to
> correctly perform the pointless action of playing with zombies.
> (might setting the death signal to 0 be useful here?)
>
> For "ps fax" and such, I'd like to distinguish between init's
> real and adopted children. Right now the adopted children
> look like they were created by init, which is not true. I only
> need a simple boolean flag, set upon reparenting, to tell me.
> Such a flag may also be useful for optimizing away the whole
> wait/waitpid/wait4/waitid/wait3 nonsense when an adopted
> child dies.

I will keep it in mind.  A simple "this process has been reparented"
flag probably won't be too bad.  As for the rest I'm not certain.

With pid namespaces there is a certain sense in doing something like
this, but I'm not certain /sbin/init and all of its replacements
don't care (although admittedly it would be a stretch to tell the
difference).

Eric



Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-29 Thread Albert Cahalan

On 5/29/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
>> Jan Engelhardt writes:
>>> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
>>> +if(ADOPTED(processes[i]) && forest_type!='u')
>>
>> That's not compatible because init's children are now in the
>> logical place. Since the days of procps-1.x.x or earlier,
>> such processes have been listed at top level.
>>
>> BTW, what does "ps -ejH" do for you, with and without the patch?
>
> ps -ejH displays everything.

That's not what I mean. (the "-e" causes that of course)
I'm asking about the parent-child relationships shown.
The "-H" option is a bit different from the "f" option.

>> I'd be a lot happier about breaking compatibility in this area
>> if I could get a functional adoption flag. That is, I really
>> would like to show a process as child of init if it naturally
>> was created as a child of init. It's less informative to have
>> fake children showing up the same as real ones. The original
>> parent PID would do. (BTW, the original parent name and/or
>> grandparent PID would be great to have) As a bonus, the kernel
>> could reap these processes more quickly than init can... and
>> then maybe we can stop caring if init is alive.
>
> Having the kernel not reparent user processes to init is an interesting
> idea, especially when those processes have not existed.  I'm not
> certain that is POSIX compliant and otherwise backwards compatible.


I'm not suggesting that this be visible via POSIX APIs.

It's almost certainly a given that getppid() must return 1, and
probably /proc needs to show this as well. Without question,
any process created by init must be reaped by init.

Processes NOT created by init could be silently reaped by
the kernel. They need to see their own PPID as 1, but there
need not be any parent-child relationship in the kernel data
structures. The kernel can fake the whole thing, which is nice
because then the kernel isn't depending on userspace to
correctly perform the pointless action of playing with zombies.
(might setting the death signal to 0 be useful here?)

For "ps fax" and such, I'd like to distinguish between init's
real and adopted children. Right now the adopted children
look like they were created by init, which is not true. I only
need a simple boolean flag, set upon reparenting, to tell me.
Such a flag may also be useful for optimizing away the whole
wait/waitpid/wait4/waitid/wait3 nonsense when an adopted
child dies.
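
The point that an adopted child cannot be told apart from a real one
can be observed from userspace. The following is only a sketch, not
code from this thread: the function name observe_reparenting() is made
up for illustration. It forks a short-lived middle process whose child
is orphaned, and reports the PPID the orphan sees afterwards.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a middle process that forks a grandchild and
 * exits at once. The orphaned grandchild waits until getppid() changes and
 * reports the new value through a pipe.
 * Returns the PPID seen after reparenting, or -1 on error.
 * The PID of the middle process is stored in *old_parent. */
pid_t observe_reparenting(pid_t *old_parent)
{
    int fds[2];
    pid_t child;
    pid_t result[2];

    if (pipe(fds) < 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {                   /* middle process */
        pid_t me = getpid();
        pid_t grandchild = fork();

        if (grandchild == 0) {          /* grandchild, soon an orphan */
            pid_t report[2];

            close(fds[0]);
            while (getppid() == me)     /* wait until we are adopted */
                usleep(1000);
            report[0] = me;
            report[1] = getppid();
            if (write(fds[1], report, sizeof report) < 0)
                _exit(1);
            _exit(0);
        }
        _exit(0);                       /* exit without waiting */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);            /* reap the middle process */
    if (read(fds[0], result, sizeof result) != (ssize_t)sizeof result) {
        close(fds[0]);
        return -1;
    }
    close(fds[0]);
    *old_parent = result[0];
    return result[1];
}
```

Where PID 1 adopts orphans, the returned value should be 1, and nothing
in it hints at the original parent. On newer kernels a process that has
set PR_SET_CHILD_SUBREAPER (which postdates this thread) may be reported
instead, so the exact number is an assumption about the system.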


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-28 Thread Roland McGrath
> Having the kernel not reparent user processes to init is an interesting
> idea, especially when those processes have not existed.  I'm not
> certain that is POSIX compliant and otherwise backwards compatible.

It's hard to see how it would work.  There has to be some parent PID.  The
reason using 1 makes sense is that it is always there.  Anything >0 and not
the PID of some live process could be reused for a new process at some point.


Thanks,
Roland


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-28 Thread Eric W. Biederman
"Albert Cahalan" <[EMAIL PROTECTED]> writes:

> This has long been rotten. Mind fixing it for us? :-)
>
> We have N types of thread on M CPUs. Pick something, N or M,
> to be at the top level in /proc. The other goes below, in the
> per-process task directories.
>
> You then have either N or M things showing up in ps, not N*M.
>
> Note that both ps and top can print the CPU number just fine.
> Abusing the task name for this is just retarded. This suggests
> that the top level should be the type of task, with the lower
> level in /proc/*/task being per-CPU and not needing distinct
> naming at all.

In a lot of ways that is reasonable.  However, kernel threads don't
share signal handling, and getting to the point where they could share
signal handling would be difficult, so we cannot use the generic
CLONE_THREAD handling; they really are more like individual processes.
So at that level the cpu number in the name is just to help tell them
apart.

Eric



Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-28 Thread Eric W. Biederman
"Albert Cahalan" <[EMAIL PROTECTED]> writes:

> Jan Engelhardt writes:
>> On Apr 10 2007 17:47, Jan Engelhardt wrote:
>>> On Apr 8 2007 20:57, Oleg Nesterov wrote:
>
>>>> Anyway, re-parenting to swapper breaks pstree, it doesn't
>>>> show kernel threads. And if ->parent == /sbin/init, we can't
>>>> remove us from ->children (unless we forbid sub-thread-of-init
>>>> exec). So the only safe change is set ->exit_state = -1.
>>>
>>> Then we have to fix pstree and all that. (In fact, I'm
>>> trying to patch `ps f` to DTRT ;p)
>>
>> Done that and the result is that `ps afwx` now looks like:
>>
>>   PID TTY      STAT   TIME COMMAND
>>  2722 ?        S      0:00 [lockd]
> ...
>>     3 ?        S<     0:00 [events/0]
>>     2 ?        SN     0:00 [ksoftirqd/0]
>>     1 ?        Ss     0:02 init [3]
>>   537 ?        S<s    0:02  \_ /sbin/udevd --daemon
>>  1600 ?        Ss     0:00  \_ /usr/bin/dbus-daemon --system
>>  1692 ?        Ss     0:00  \_ /sbin/acpid
>>  1923 ?        Ss     0:00  \_ /sbin/resmgrd
> ...
>> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
>> +if(ADOPTED(processes[i]) && forest_type!='u')
>
> That's not compatible because init's children are now in the
> logical place. Since the days of procps-1.x.x or earlier,
> such processes have been listed at top level.
>
> BTW, what does "ps -ejH" do for you, with and without the patch?

ps -ejH displays everything.  For 2.6.22 we will only have kthreadd
as a sibling of init with ppid == 0.  Depending on what happens
in the evolution of how we start kernel threads we may be able
to remove kthreadd and have all kthreads with a ppid of 0, but only
time will tell.
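
For anyone who wants to check the ppid == 0 claim on their own box, the
parent PID is the fourth field of /proc/<pid>/stat. What follows is a
minimal sketch, not procps code; parse_stat_ppid() and read_ppid() are
names invented here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Extract the parent PID from one /proc/<pid>/stat line.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_stat_ppid(const char *stat_line, int *ppid)
{
    /* The command name is wrapped in parentheses and may itself contain
     * spaces or ')', so parsing starts after the LAST ')'. */
    const char *p = strrchr(stat_line, ')');
    char state;

    if (p == NULL)
        return -1;
    if (sscanf(p + 1, " %c %d", &state, ppid) != 2)
        return -1;
    return 0;
}

/* Read /proc/<pid>/stat and return the parent PID, or -1 on error. */
int read_ppid(int pid)
{
    char path[64];
    char buf[512];
    int ppid;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_stat_ppid(buf, &ppid) == 0 ? ppid : -1;
}
```

On the 2.6.22 setup described above, read_ppid() on kthreadd's PID and
on PID 1 would both be expected to give 0; on other kernels the result
may differ.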

> I'd be a lot happier about breaking compatibility in this area
> if I could get a functional adoption flag. That is, I really
> would like to show a process as child of init if it naturally
> was created as a child of init. It's less informative to have
> fake children showing up the same as real ones. The original
> parent PID would do. (BTW, the original parent name and/or
> grandparent PID would be great to have) As a bonus, the kernel
> could reap these processes more quickly than init can... and
> then maybe we can stop caring if init is alive.

Having the kernel not reparent user processes to init is an interesting
idea, especially when those processes have not existed.  I'm not
certain that is POSIX compliant and otherwise backwards compatible.

Eric





Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-28 Thread Albert Cahalan

Jan Engelhardt writes:
> On Apr 10 2007 17:47, Jan Engelhardt wrote:
>> On Apr 8 2007 20:57, Oleg Nesterov wrote:
>>> Anyway, re-parenting to swapper breaks pstree, it doesn't
>>> show kernel threads. And if ->parent == /sbin/init, we can't
>>> remove us from ->children (unless we forbid sub-thread-of-init
>>> exec). So the only safe change is set ->exit_state = -1.
>>
>> Then we have to fix pstree and all that. (In fact, I'm
>> trying to patch `ps f` to DTRT ;p)
>
> Done that and the result is that `ps afwx` now looks like:
>
>   PID TTY      STAT   TIME COMMAND
>  2722 ?        S      0:00 [lockd]
...
>     3 ?        S<     0:00 [events/0]
>     2 ?        SN     0:00 [ksoftirqd/0]
>     1 ?        Ss     0:02 init [3]
>   537 ?        S<s    0:02  \_ /sbin/udevd --daemon
>  1600 ?        Ss     0:00  \_ /usr/bin/dbus-daemon --system
>  1692 ?        Ss     0:00  \_ /sbin/acpid
>  1923 ?        Ss     0:00  \_ /sbin/resmgrd
...
> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
> +if(ADOPTED(processes[i]) && forest_type!='u')


That's not compatible because init's children are now in the
logical place. Since the days of procps-1.x.x or earlier,
such processes have been listed at top level.

BTW, what does "ps -ejH" do for you, with and without the patch?

I'd be a lot happier about breaking compatibility in this area
if I could get a functional adoption flag. That is, I really
would like to show a process as child of init if it naturally
was created as a child of init. It's less informative to have
fake children showing up the same as real ones. The original
parent PID would do. (BTW, the original parent name and/or
grandparent PID would be great to have) As a bonus, the kernel
could reap these processes more quickly than init can... and
then maybe we can stop caring if init is alive.


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-05-28 Thread Albert Cahalan

Robin Holt writes:
> On Mon, Apr 09, 2007 at 08:36:21AM -0600, Eric W. Biederman wrote:
>> Robin Holt <[EMAIL PROTECTED]> writes:
>>> I would say this is more a benefit than a problem.  With a couple
>>> of these systems we are testing, the number of kernel threads is
>>> far greater than the number of user processes and having pstree
>>> not normally show them, but maybe have an option we add later to
>>> show them again would be beneficial.
>>
>> Sure.
>>
>> Robin how many kernel thread per cpu are you seeing?
>
> 10.


This has long been rotten. Mind fixing it for us? :-)

We have N types of thread on M CPUs. Pick something, N or M,
to be at the top level in /proc. The other goes below, in the
per-process task directories.

You then have either N or M things showing up in ps, not N*M.

Note that both ps and top can print the CPU number just fine.
Abusing the task name for this is just retarded. This suggests
that the top level should be the type of task, with the lower
level in /proc/*/task being per-CPU and not needing distinct
naming at all.
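
The per-process task directory referred to above already exists as
/proc/<pid>/task, with one numeric entry per task in the thread group.
As a rough sketch of what a tool would walk under such a layout (this
shows the existing interface only, not the proposed regrouping of
kernel threads; count_tasks() is a name made up here):

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Count the numeric entries in /proc/<pid>/task.
 * Returns -1 if the directory cannot be opened. */
int count_tasks(int pid)
{
    char path[64];
    struct dirent *de;
    DIR *d;
    int n = 0;

    snprintf(path, sizeof path, "/proc/%d/task", pid);
    d = opendir(path);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        /* skip "." and ".."; task entries are named by their TID */
        if (isdigit((unsigned char)de->d_name[0]))
            n++;
    }
    closedir(d);
    return n;
}
```

As things stand, each per-CPU kernel thread is a thread group of its
own, so it shows up once at the top level of /proc rather than under a
shared task directory.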


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-10 Thread Jan Engelhardt

On Apr 8 2007 20:57, Oleg Nesterov wrote:
>
>Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
>threads. And if ->parent == /sbin/init, we can't remove us from ->children
>(unless we forbid sub-thread-of-init exec). So the only safe change is
>set ->exit_state = -1.

Then we have to fix pstree and all that. (In fact, I'm secretly
trying to patch `ps f` to DTRT ;p)


Jan


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-10 Thread Robin Holt
On Mon, Apr 09, 2007 at 06:35:39PM -0600, Eric W. Biederman wrote:
> Robin Holt <[EMAIL PROTECTED]> writes:
> 
> > OK.  I just got the OK from management.  The system we were booting was
> > for research only.  We had NR_CPUS=num_online_cpus()=4096 which were
> > non-hyperthreaded.  With no attached I/O and the tweak I originally
> > posted plus one change Jack has already gotten accepted, the machine
> > booted in approx 12 minutes.
> 
> How much of that time was between the time the kernel was loaded
> and before user space was started?

I am scrambling for my timed boot logs.  All I am finding right now
are boots from 512p machines.  For that machine, it took 178 seconds
to get to mounting the root filesystem and 75 seconds to get to the
login prompt.  Of that 75 seconds, 24 was because an NFS server was
down and 14 was due to PBS taking time to start.


68.994913| Memory: 2074872032k/2082172528k available (9785k code, 7319136k reserved, 5228k data, 672k init)
36.566117| migration_cost=3869,61218,76048
24.074113| mount: RPC: Remote system error - No route to host
13.972892| PBS mom

Those are the long standouts.  There are quite a few lines in the boot
output which take 2 seconds or less, but none of those are too surprising.
Plain and simply, booting that large of a machine has always taken
considerable time.  The first time we booted a 512p, it took nearly an
hour, now that is down to 8 or 9 minutes.

>
> Twelve minutes sounds like a long time for a boot, if you aren't fsck'ing
> filesystems.

Not fsck'ing filesystems (xfs).  One of the NFS servers that was specified
in /etc/fstab was missing, but that is unimportant.

Thanks,
Robin


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-10 Thread Jan Engelhardt

On Apr 10 2007 17:47, Jan Engelhardt wrote:
>On Apr 8 2007 20:57, Oleg Nesterov wrote:
>>
>>Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
>>threads. And if ->parent == /sbin/init, we can't remove us from ->children
>>(unless we forbid sub-thread-of-init exec). So the only safe change is
>>set ->exit_state = -1.
>
>Then we have to fix pstree and all that. (In fact, I'm
>trying to patch `ps f` to DTRT ;p)

Done that and the result is that `ps afwx` now looks like:

  PID TTY      STAT   TIME COMMAND
 2722 ?        S      0:00 [lockd]
 2721 ?        S<     0:00 [rpciod/0]
  749 ?        S<     0:00 [ata_aux]
  748 ?        S<     0:00 [ata/0]
  496 ?        S<     0:00 [xfssyncd]
  495 ?        S<     0:00 [xfsbufd]
  484 ?        S<     0:00 [xfsdatad/0]
  482 ?        S<     0:00 [xfslogd/0]
  427 ?        S<     0:00 [scsi_eh_0]
  204 ?        S<     0:00 [kpsmoused]
  110 ?        S<     0:00 [aio/0]
  109 ?        S<     0:00 [kswapd0]
  108 ?        S      0:00 [pdflush]
  107 ?        S      0:00 [pdflush]
   85 ?        S<     0:00 [kseriod]
   21 ?        S<     0:00 [kacpid]
   20 ?        S<     0:00 [kblockd/0]
    5 ?        S<     0:00 [kthread]
    4 ?        S<     0:00 [khelper]
    3 ?        S<     0:00 [events/0]
    2 ?        SN     0:00 [ksoftirqd/0]
    1 ?        Ss     0:02 init [3]
  537 ?        S<s    0:02  \_ /sbin/udevd --daemon
 1600 ?        Ss     0:00  \_ /usr/bin/dbus-daemon --system
 1692 ?        Ss     0:00  \_ /sbin/acpid
 1923 ?        Ss     0:00  \_ /sbin/resmgrd
 1985 ?        Ss     0:00  \_ /usr/sbin/polkitd
 2014 ?        Ss     0:02  \_ /usr/sbin/hald --daemon=yes
 2022 ?        S      0:00  |   \_ hald-runner
 2051 ?        S      0:00  |   \_ hald-addon-keyboard: listening on /d
 2061 ?        S      0:00  |   \_ hald-addon-keyboard: listening on /d
 2085 ?        S      0:00  |   \_ hald-addon-acpi: listening on acpid
 2086 ?        S      0:00  |   \_ hald-addon-storage: polling /dev/hdc
 2601 ?        Ss     0:00  \_ /sbin/syslog-ng
 2602 ?        Ss     0:00  \_ /usr/sbin/sshd -o PidFile=/var/run/sshd.init
 2607 ?        Ss     0:00  \_ /sbin/klogd -c 1 -x -x
 2617 ?        Ss     0:00  \_ /usr/sbin/gpm -m /dev/input/mice -t ps2
 2623 ?        Ss     0:00  \_ /sbin/portmap
 2634 ?        Ss     0:00  \_ login -- root
 2688 tty1     Ss+    0:00  |   \_ -bash
 2635 ?        Ss     0:00  \_ login -- jengelh
 2733 tty2     Ss     0:00  |   \_ -bash
 2850 tty2     R+     0:00  |   \_ ./ps afwx
 2636 tty3     Ss+    0:00  \_ /sbin/mingetty tty3
 2637 tty4     Ss+    0:00  \_ /sbin/mingetty tty4
 2638 tty5     Ss+    0:00  \_ /sbin/mingetty tty5
 2651 tty6     Ss+    0:00  \_ /sbin/mingetty tty6

Except for the unsorted bunch before process 1, this looks good, and
of course it only took a single line change.

Index: procps-3.2.7/ps/display.c
===
--- procps-3.2.7.orig/ps/display.c
+++ procps-3.2.7/ps/display.c
@@ -482,7 +482,7 @@ static void show_tree(const int self, co
   more_children = 0;
 else
   if(processes[i+1]->ppid != self_pid) more_children = 0;
-if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
+if(ADOPTED(processes[i]) && forest_type!='u')
   show_tree(i++, n, level,   more_children);
 else
   show_tree(i++, n, level+1, more_children);
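
Read literally, the hunk above only changes which branch picks the nesting
level for a child.  A minimal C sketch of that decision, before and after
the change, might look like the following.  The function names are
hypothetical, and `adopted` simply stands in for whatever ADOPTED()
evaluates to in procps, which this thread does not show.

```c
/* Level at which a child of `self_pid` is drawn, before the patch.
 * `adopted` stands in for ADOPTED(processes[i]); its real definition
 * lives in procps and is not reproduced here. */
static int child_level_before(int self_pid, int adopted, char forest_type, int level)
{
    if (self_pid == 1 && adopted && forest_type != 'u')
        return level;      /* drawn at the parent's own level */
    return level + 1;      /* drawn one level deeper */
}

/* Same decision after the patch: the self_pid == 1 test is gone. */
static int child_level_after(int self_pid, int adopted, char forest_type, int level)
{
    (void)self_pid;        /* no longer consulted */
    if (adopted && forest_type != 'u')
        return level;
    return level + 1;
}
```

Under these assumptions the patched rule treats a child of PPID 0 exactly
like a child of PID 1; only the value of `adopted` and the forest type
decide the level.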


Jan


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Eric W. Biederman
Robin Holt <[EMAIL PROTECTED]> writes:

> OK.  I just got the OK from management.  The system we were booting was
> for research only.  We had NR_CPUS=num_online_cpus()=4096 which were
> non-hyperthreaded.  With no attached I/O and the tweak I originally
> posted plus one change Jack has already gotten accepted, the machine
> booted in approx 12 minutes.

How much of that time was between the time the kernel was loaded
and before user space was started?

Twelve minutes sounds like a long time for a boot, if you aren't fsck'ing
filesystems.

Eric


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Robin Holt
OK.  I just got the OK from management.  The system we were booting was
for research only.  We had NR_CPUS=num_online_cpus()=4096 which were
non-hyperthreaded.  With no attached I/O and the tweak I originally
posted plus one change Jack has already gotten accepted, the machine
booted in approx 12 minutes.

Thanks,
Robin


On Mon, Apr 09, 2007 at 10:20:27AM -0600, Eric W. Biederman wrote:
> Roland McGrath <[EMAIL PROTECTED]> writes:
> 
> > I concur with Eric's assessment.  Adding new magic bits to the generic
> > clone path seems like a poor way to cope with kernel threads.  I think
> > it's better if kernel thread setup gets less like normal user process
> > setup.  I also agree with Eric that PPID of 0 is a very natural way for
> > kernel threads to be displayed.  We need to know more about the nature
> > of the compatibility issue in procps to judge whether there is good
> > reason to avoid changing it.
> 
> I just investigated the procps issue.  Using init_task as the parent,
> nothing sticks out as being wrong in /proc.
> 
> Further, when I modified pstree to accept 0 as its starting pid (from
> which all else would be rooted), all of the kernel threads showed up.
> 
> So if anything it is a feature that kernel threads don't show up
> by default in pstree (when PPID == 0).  It isn't a subtle kernel bug.
> 
> Eric


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Eric W. Biederman
Roland McGrath <[EMAIL PROTECTED]> writes:

> I concur with Eric's assessment.  Adding new magic bits to the generic
> clone path seems like a poor way to cope with kernel threads.  I think
> it's better if kernel thread setup gets less like normal user process
> setup.  I also agree with Eric that PPID of 0 is a very natural way for
> kernel threads to be displayed.  We need to know more about the nature
> of the compatibility issue in procps to judge whether there is good
> reason to avoid changing it.

I just investigated the procps issue.  Using init_task as the parent,
nothing sticks out as being wrong in /proc.

Further, when I modified pstree to accept 0 as its starting pid (from
which all else would be rooted), all of the kernel threads showed up.

So if anything it is a feature that kernel threads don't show up
by default in pstree (when PPID == 0).  It isn't a subtle kernel bug.
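
To illustrate why the choice of root matters, here is a small C sketch.
It is not pstree's actual code: it only counts the processes reachable
from a given root PID in a table of (pid, ppid) pairs.  The struct and
function names are made up for the example.

```c
#include <stddef.h>

/* One row of a process table: a PID and its parent's PID. */
struct proc_entry {
    int pid;
    int ppid;
};

/* Count every process that descends from `root`, the way a tree
 * printer rooted at `root` would reach them.  Assumes the table
 * describes a forest (no cycles); entries that name themselves as
 * parent are skipped. */
static size_t count_descendants(const struct proc_entry *tab, size_t n, int root)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].ppid == root && tab[i].pid != root)
            total += 1 + count_descendants(tab, n, tab[i].pid);
    }
    return total;
}
```

For a table holding init (pid 1, ppid 0), a kernel thread (pid 2, ppid 0),
a child of that thread (pid 3, ppid 2) and udevd (pid 821, ppid 1), rooting
at 1 reaches only udevd, while rooting at 0 reaches all four entries.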

Eric


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Robin Holt
On Mon, Apr 09, 2007 at 08:36:21AM -0600, Eric W. Biederman wrote:
> Robin Holt <[EMAIL PROTECTED]> writes:
> 
> > I would say this is more a benefit than a problem.  With a couple of these
> > systems we are testing, the number of kernel threads is far greater than
> > the number of user processes and having pstree not normally show them, but
> > maybe have an option we add later to show them again would be beneficial.
> 
> Sure.
> 
> Robin, how many kernel threads per cpu are you seeing?

10.

FYI, pid 1539 is kthread.

a01:~ # ps -ef | egrep "\[.*\/255\]"
root       512     1  0 Apr08 ?        00:00:00 [migration/255]
root       513     1  0 Apr08 ?        00:00:00 [ksoftirqd/255]
root      1281     1  0 Apr08 ?        00:00:02 [events/255]
root      2435  1539  0 Apr08 ?        00:00:00 [kblockd/255]
root      3159  1539  0 Apr08 ?        00:00:00 [aio/255]
root      4007  1539  0 Apr08 ?        00:00:00 [cqueue/255]
root      8653  1539  0 Apr08 ?        00:00:00 [ata/255]
root     17438  1539  0 Apr08 ?        00:00:00 [xfslogd/255]
root     17950  1539  0 Apr08 ?        00:00:00 [xfsdatad/255]
root     18426  1539  0 Apr08 ?        00:00:00 [rpciod/255]


Thanks,
Robin


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Eric W. Biederman
Robin Holt <[EMAIL PROTECTED]> writes:

> I would say this is more a benefit than a problem.  With a couple of these
> systems we are testing, the number of kernel threads is far greater than
> the number of user processes and having pstree not normally show them, but
> maybe have an option we add later to show them again would be beneficial.

Sure.

Robin, how many kernel threads per cpu are you seeing?

I don't know that we have a problem in this regard but it feels like
you are seeing a lot more kernel threads per cpu than I would expect
from looking at small systems.

I am wondering if somewhere we are using per cpu kernel threads when
we should not be...

Eric


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Robin Holt
On Mon, Apr 09, 2007 at 01:06:43PM +0400, Oleg Nesterov wrote:
> On 04/08, Eric W. Biederman wrote:
> > There is a practical question how much we care about pstree being
> > confused (I assume it doesn't crash).  If this is just a confusion
> > issue then I say go for it.  PPID == 0 is a very legitimate way to say
> > the kernel is the parent process.
> 
> No, it doesn't crash. It just doesn't show kernel threads (ps ax is OK).
> I didn't look into the sources, but I guess the reason is that pstree
> assumes that the "root" of the tree is "pid == 1" process.

I would say this is more a benefit than a problem.  With a couple of these
systems we are testing, the number of kernel threads is far greater than
the number of user processes and having pstree not normally show them, but
maybe have an option we add later to show them again would be beneficial.

Thanks,
Robin


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Oleg Nesterov
On 04/08, Eric W. Biederman wrote:
>
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
> 
> > Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> > and forget about CLONE_KERNEL_THREAD.
> 
> Please. 

OK, will do tomorrow.

> > Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> > threads. And if ->parent == /sbin/init, we can't remove us from ->children
> > (unless we forbid sub-thread-of-init exec). So the only safe change is
> > set ->exit_state = -1.
> 
> Yes.  We certainly need ->exit_state = -1.
> Earlier I had forgotten about the second use of ->children to update
> the parent pointer of processes when their parent exits.
> 
> There is a practical question how much we care about pstree being
> confused (I assume it doesn't crash).  If this is just a confusion
> issue then I say go for it.  PPID == 0 is a very legitimate way to say
> the kernel is the parent process.

No, it doesn't crash. It just doesn't show kernel threads (ps ax is OK).
I didn't look into the sources, but I guess the reason is that pstree
assumes that the "root" of the tree is "pid == 1" process.

I personally think this is acceptable (and Roland seems to think the same).
Still, to be safe, I'll break this into 2 patches, the first one sets
->exit_state, the second re-parents to swapper.

In fact, we can do some odd things to make pstree happy. We need ->parent
only because /proc needs some ->parent fields. But I'd prefer to avoid
these hacks.

Still, it is sad that we can't have additional flags for kernel_thread().
However, I agree with your and Roland's objections.

Oleg.



Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Roland McGrath
I concur with Eric's assessment.  Adding new magic bits to the generic
clone path seems like a poor way to cope with kernel threads.  I think
it's better if kernel thread setup gets less like normal user process
setup.  I also agree with Eric that PPID of 0 is a very natural way for
kernel threads to be displayed.  We need to know more about the nature
of the compatibility issue in procps to judge whether there is good
reason to avoid changing it.


Thanks,
Roland


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/08, Eric W. Biederman wrote:
>
>> If we are going to have kernel only flags please use an additional
>> argument to do_fork and copy_process.
>
> Yes, we can do this. But we have a number of architectures which use
> sys_clone() to implement kernel_thread(). It would be nice to have an
> architecture neutral kernel_thread() implementation as you proposed.
> We should change all of them if we want to add a new parameter to
> do_fork().
>
> Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> and forget about CLONE_KERNEL_THREAD.

Please. 

> Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> threads. And if ->parent == /sbin/init, we can't remove us from ->children
> (unless we forbid sub-thread-of-init exec). So the only safe change is
> set ->exit_state = -1.

Yes.  We certainly need ->exit_state = -1.
Earlier I had forgotten about the second use of ->children to update
the parent pointer of processes when their parent exits.

There is a practical question how much we care about pstree being
confused (I assume it doesn't crash).  If this is just a confusion
issue then I say go for it.  PPID == 0 is a very legitimate way to say
the kernel is the parent process.

There are a few more cases where we are likely to get PPID == 0 in the
future and /sbin/init already has that now.  Plus there is a lot of
historic precedent.  The odd part is PPID = 0 having multiple
children.

If we decide maintaining a tree is important I would much rather put
init_task on the task_list so we can see it in /proc than go the other
way around.

I would like confirmation that PPID == 0 is what is confusing
pstree, just to make certain we haven't half filled in some field
in init_task and are thus giving incorrect /proc output.  But that is
all the double checking I would do.

>> Your current scheme also has the bad side that if user space supplied
>> a kernel flag it is hard to detect it and return -EINVAL.  Which
>> limits future expansion.  Silently dropping clone flags is a real
>> pain, if you are trying to detect if a new flag has been implemented.
>
> Yes. But that is what we are doing now. copy_process() just ignores
> unknown flags.

Agreed.  I fixed that in sys_unshare but I should really submit a
patch to do the same for sys_clone at some point.

When known flags aren't implemented we certainly return -EINVAL.
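
The difference between silently dropping flags and rejecting them can be
sketched in plain C.  This is a userspace illustration only, not the
kernel's code: the flag names, bit values and function names below are
invented for the example.

```c
#include <errno.h>

/* Hypothetical flag layout, for illustration only. */
#define EX_FLAG_A      0x01UL
#define EX_FLAG_B      0x02UL
#define EX_KNOWN_FLAGS (EX_FLAG_A | EX_FLAG_B)

/* Silent filtering: unknown bits vanish and the caller never finds out. */
static unsigned long filter_flags(unsigned long flags)
{
    return flags & EX_KNOWN_FLAGS;
}

/* Strict checking: any unknown bit is reported as -EINVAL, so a caller
 * can probe whether a flag is implemented. */
static int check_flags(unsigned long flags)
{
    if (flags & ~EX_KNOWN_FLAGS)
        return -EINVAL;
    return 0;
}
```

With the strict variant, a caller that passes a bit outside the known set
gets an error back instead of a result that quietly ignores the request.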

Given that this line of work looks to fix the race that allows
a threaded init to generate unkillable zombies, I can probably find
some time in the next while to work on it.

Eric


Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Oleg Nesterov
On 04/08, Eric W. Biederman wrote:
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
> 
> > For review only.
> >
> > To implement for-in-kernel-use-only CLONE_ flags, we need to filter them out
> > in sys_clone().
> 
> Nack
> 
> The current clone_flags field is for user space consumption and we
> have proposed users for all or almost all of the remaining bits.

OK.

> If we are going to have kernel only flags please use an additional
> argument to do_fork and copy_process.

Yes, we can do this. But we have a number of architectures which use
sys_clone() to implement kernel_thread(). It would be nice to have an
architecture neutral kernel_thread() implementation as you proposed.
We would have to change all of them if we want to add a new parameter
to do_fork().
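
To make the "additional argument" idea concrete, here is a small
stand-alone model of it, not kernel code.  The names KFORK_KERNEL_THREAD,
struct fork_request, sys_clone_request() and kernel_thread_request() are
all invented for this sketch; the only point is that the kernel-only bits
travel in a word that the syscall path can never populate.

```c
/* Hypothetical kernel-internal flag; NOT part of the user-visible CLONE_* space. */
#define KFORK_KERNEL_THREAD 0x1UL

/* Illustrative stand-in for "do_fork() with one more argument". */
struct fork_request {
	unsigned long clone_flags;	/* user-visible CLONE_* bits */
	unsigned long kernel_flags;	/* kernel-only bits */
};

/* Syscall entry path: user space supplies clone_flags and nothing else. */
struct fork_request sys_clone_request(unsigned long clone_flags)
{
	struct fork_request req = { clone_flags, 0 };

	return req;
}

/* kernel_thread() path: in-kernel callers add their private bits explicitly. */
struct fork_request kernel_thread_request(unsigned long clone_flags)
{
	struct fork_request req = { clone_flags, KFORK_KERNEL_THREAD };

	return req;
}
```

With this split no bit of clone_flags has to be reserved or masked, at
the cost of touching every caller, which is the per-architecture churn
described above.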

Perhaps it is better to add reparent_kthread() (next patch) to kthread()
and forget about CLONE_KERNEL_THREAD.

Anyway, re-parenting to swapper breaks pstree: it doesn't show kernel
threads. And if ->parent == /sbin/init, we can't remove ourselves from
->children (unless we forbid sub-thread-of-init exec). So the only safe
change is to set ->exit_state = -1.

> Your current scheme also has the bad side that if user space supplied
> a kernel flag it is hard to detect it and return -EINVAL.  Which
> limits future expansion.  Silently dropping clone flags is a real
> pain, if you are trying to detect if a new flag has been implemented.

Yes. But that is what we are doing now. copy_process() just ignores
unknown flags.

Oleg.



Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Eric W. Biederman
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> For review only.
>
> To implement for-in-kernel-use-only CLONE_ flags, we need to filter them
> out in sys_clone().

Nack

The current clone_flags field is for user space consumption and we
have proposed users for all or almost all of the remaining bits.

If we are going to have kernel only flags please use an additional
argument to do_fork and copy_process.

Your current scheme also has the bad side that if user space supplied
a kernel flag it is hard to detect it and return -EINVAL.  Which
limits future expansion.  Silently dropping clone flags is a real
pain, if you are trying to detect if a new flag has been implemented.
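
The difference between masking and rejecting can be shown with a small
user-space sketch.  DEMO_KNOWN_FLAGS, filter_by_mask() and check_flags()
are hypothetical names chosen for this example; the mask value is
arbitrary and does not reflect the real set of CLONE_* bits.

```c
#include <errno.h>

/* Hypothetical set of implemented flag bits, for illustration only. */
#define DEMO_KNOWN_FLAGS 0x0fffffffUL

/*
 * Masking: unknown bits silently disappear, so the caller cannot tell
 * whether the flag it asked for was honoured.
 */
unsigned long filter_by_mask(unsigned long flags)
{
	return flags & DEMO_KNOWN_FLAGS;
}

/*
 * Rejecting: any bit outside the known set is reported as -EINVAL, so a
 * caller can probe whether a new flag has been implemented.
 */
int check_flags(unsigned long flags)
{
	if (flags & ~DEMO_KNOWN_FLAGS)
		return -EINVAL;
	return 0;
}
```

A caller passing an unimplemented bit gets the same outcome as if it had
not set the bit under the first scheme, and an explicit error under the
second.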

Eric


[RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Oleg Nesterov
For review only.

To implement for-in-kernel-use-only CLONE_ flags, we need to filter them
out in sys_clone().

These below

arch/sparc/
arch/sparc64/
arch/ia64/
arch/v850/
arch/xtensa/

are not changed, they use assembly to implement sys_clone().

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

 include/linux/sched.h               |    3 +++
 arch/alpha/kernel/process.c         |    3 ++-
 arch/arm/kernel/sys_arm.c           |    3 ++-
 arch/arm26/kernel/sys_arm.c         |    2 +-
 arch/avr32/kernel/process.c         |    2 +-
 arch/blackfin/kernel/process.c      |    2 +-
 arch/cris/arch-v10/kernel/process.c |    3 ++-
 arch/cris/arch-v32/kernel/process.c |    3 ++-
 arch/frv/kernel/process.c           |    3 ++-
 arch/h8300/kernel/process.c         |    2 +-
 arch/i386/kernel/process.c          |    2 +-
 arch/m32r/kernel/process.c          |    2 +-
 arch/m68k/kernel/process.c          |    2 +-
 arch/m68knommu/kernel/process.c     |    2 +-
 arch/mips/kernel/linux32.c          |    2 +-
 arch/mips/kernel/syscall.c          |    2 +-
 arch/parisc/kernel/process.c        |    3 ++-
 arch/powerpc/kernel/process.c       |    3 ++-
 arch/s390/kernel/compat_linux.c     |    2 +-
 arch/s390/kernel/process.c          |    2 +-
 arch/sh/kernel/process.c            |    2 +-
 arch/sh64/kernel/process.c          |    2 +-
 arch/um/sys-i386/syscalls.c         |    4 ++--
 arch/um/sys-x86_64/syscalls.c       |    4 ++--
 arch/x86_64/ia32/sys_ia32.c         |    3 ++-
 arch/x86_64/kernel/process.c        |    3 ++-
 arch/xtensa/kernel/process.c        |    3 ++-
 27 files changed, 41 insertions(+), 28 deletions(-)

--- 2.6.21-rc5-mm4/include/linux/sched.h~1_MASK 2007-04-07 20:11:14.0 +0400
+++ 2.6.21-rc5-mm4/include/linux/sched.h 2007-04-07 22:07:50.0 +0400
@@ -27,6 +27,9 @@
 #define CLONE_NEWUTS   0x04000000  /* New utsname group? */
 #define CLONE_NEWIPC   0x08000000  /* New ipcs */
 
+/* user-space visible */
+#define SYS_CLONE_MASK (~0x0)
+
 /*
  * Scheduling policies
  */
--- 2.6.21-rc5-mm4/arch/alpha/kernel/process.c~1_MASK 2007-04-07 20:11:11.0 +0400
+++ 2.6.21-rc5-mm4/arch/alpha/kernel/process.c 2007-04-07 22:32:37.0 +0400
@@ -249,7 +249,8 @@ alpha_clone(unsigned long clone_flags, u
if (!usp)
usp = rdusp();
 
-   return do_fork(clone_flags, usp, regs, 0, parent_tid, child_tid);
+   return do_fork(clone_flags & SYS_CLONE_MASK, usp, regs, 0,
+   parent_tid, child_tid);
 }
 
 int
--- 2.6.21-rc5-mm4/arch/arm/kernel/sys_arm.c~1_MASK 2007-01-07 22:57:52.0 +0300
+++ 2.6.21-rc5-mm4/arch/arm/kernel/sys_arm.c 2007-04-07 22:10:02.0 +0400
@@ -252,7 +252,8 @@ asmlinkage int sys_clone(unsigned long c
if (!newsp)
newsp = regs->ARM_sp;
 
-   return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
+   return do_fork(clone_flags & SYS_CLONE_MASK, newsp, regs, 0,
+   parent_tidptr, child_tidptr);
 }
 
 asmlinkage int sys_vfork(struct pt_regs *regs)
--- 2.6.21-rc5-mm4/arch/arm26/kernel/sys_arm.c~1_MASK 2006-10-22 18:23:57.0 +0400
+++ 2.6.21-rc5-mm4/arch/arm26/kernel/sys_arm.c 2007-04-07 22:29:56.0 +0400
@@ -256,7 +256,7 @@ asmlinkage int sys_clone(unsigned long c
if (!newsp)
newsp = regs->ARM_sp;
 
-   return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+   return do_fork(clone_flags & SYS_CLONE_MASK, newsp, regs, 0, NULL, NULL);
 }
 
 asmlinkage int sys_vfork(struct pt_regs *regs)
--- 2.6.21-rc5-mm4/arch/avr32/kernel/process.c~1_MASK 2007-04-07 20:11:12.0 +0400
+++ 2.6.21-rc5-mm4/arch/avr32/kernel/process.c 2007-04-07 22:33:33.0 +0400
@@ -359,7 +359,7 @@ asmlinkage int sys_clone(unsigned long c
 {
if (!newsp)
newsp = regs->sp;
-   return do_fork(clone_flags, newsp, regs, 0,
+   return do_fork(clone_flags & SYS_CLONE_MASK, newsp, regs, 0,
   (int __user *)parent_tidptr,
   (int __user *)child_tidptr);
 }
--- 2.6.21-rc5-mm4/arch/blackfin/kernel/process.c~1_MASK 2007-04-07 20:11:12.0 +0400
+++ 2.6.21-rc5-mm4/arch/blackfin/kernel/process.c 2007-04-07 22:51:34.0 +0400
@@ -239,7 +239,7 @@ asmlinkage int bfin_clone(struct pt_regs
newsp = rdusp();
else
newsp -= 12;
-   return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+   return do_fork(clone_flags & SYS_CLONE_MASK, newsp, regs, 0, NULL, NULL);
 }
 
 int
--- 2.6.21-rc5-mm4/arch/cris/arch-v10/kernel/process.c~1_MASK 2006-07-29 05:05:33.0 +0400
+++ 2.6.21-rc5-mm4/arch/cris/arch-v10/kernel/process.c 2007-04-07 22:25:20.0 +0400
@@ -189,7 +189,8 @@ asmlinkage int sys_clone(unsigned long n
 {
if (!newusp)
newusp = rdusp();
-   return 

Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-08 Thread Roland McGrath
I concur with Eric's assessment.  Adding new magic bits to the generic
clone path seems like a poor way to cope with kernel threads.  I think
it's better if kernel thread setup gets less like normal user process
setup.  I also agree with Eric that PPID of 0 is a very natural way for
kernel threads to be displayed.  We need to know more about the nature
of the compatibility issue in procps to judge whether there is good
reason to avoid changing it.


Thanks,
Roland