Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Hi, sorry for the really delayed mail..

On May 28 2007 22:53, Eric W. Biederman wrote:
>> Jan Engelhardt writes:
>>> On Apr 10 2007 17:47, Jan Engelhardt wrote:
>>>
>>> Done that and the result is that `ps afwx` now looks like:
>>>
>>>   PID TTY      STAT   TIME COMMAND
>>>  2722 ?        S      0:00 [lockd]
>> ...
>>>     3 ?        S<     0:00 [events/0]
>>>     2 ?        SN     0:00 [ksoftirqd/0]
>>>     1 ?        Ss     0:02 init [3]
>>>   537 ?        S[...]
>>>  1600 ?        Ss     0:00  \_ /usr/bin/dbus-daemon --system
>>>  1692 ?        Ss     0:00  \_ /sbin/acpid
>>>  1923 ?        Ss     0:00  \_ /sbin/resmgrd
>> ...
>>> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
>>> +if(ADOPTED(processes[i]) && forest_type!='u')
>>
>> That's not compatible because init's children are now in the
>> logical place. Since the days of procps-1.x.x or earlier,
>> such processes have been listed at top level.
>>
>> BTW, what does "ps -ejH" do for you, with and without the patch?

2.6.22, without kernel patch, ps -ejH (shortened):

  PID  PGID   SID TTY          TIME CMD
    2     0     0 ?        00:00:00 kthreadd
    3     0     0 ?        00:00:00 migration/0
    1     1     1 ?        00:00:00 init
  821   821   821 ?        00:00:00 udevd
 2228  2228  2228 ?        00:00:00 klogd

and `ps afx`:

  PID TTY      STAT   TIME COMMAND
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     0:00  \_ [migration/0]
    1 ?        Ss     0:00 init [5]
  821 ?        S[...]

>ps -ejH displays everything.

So does `ps afx` for me ;-)

>For 2.6.22 we will only have kthreadd
>as a sibling of init with ppid == 0. Depending on what happens
>in the evolution of how we start kernel thread we may be able
>to remove kthreadd and have all kthreads with a ppid of 0, but only
>time will tell.
>

	Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > On 5/29/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: >> "Albert Cahalan" <[EMAIL PROTECTED]> writes: > > That's not what I mean. (the "-e" causes that of course) > I'm asking about the parent-child relationships shown. > The "-H" option is a bit different from the "f" option. Yes. Sorry on the unmodified ps the parent-child relationship seems to be displayed properly. >>> I'd be a lot happier about breaking compatibility in this area >>> if I could get a functional adoption flag. That is, I really >>> would like to show a process as child of init if it naturally >>> was created as a child of init. It's less informative to have >>> fake children showing up the same as real ones. The original >>> parent PID would do. (BTW, the original parent name and/or >>> grandparent PID would be great to have) As a bonus, the kernel >>> could reap these processes more quickly than init can... and >>> then maybe we can stop caring if init is alive. >> >> Having the kernel not reparent user processes to init is an interesting >> idea, especially when those processes have not existed. I'm not >> certain that is POSIX complaint and otherwise backwards compatible. > > I'm not suggesting that this be visible via POSIX APIs. > > It's almost certainly a given that getppid() must return 1, and > probably /proc needs to show this as well. Without question, > any process created by init must be reaped by init. > > Processes NOT created by init could be silently reaped by > the kernel. They need to see their own PPID as 1, but there > need not be any parent-child relationship in the kernel data > structures. The kernel can fake the whole thing, which is nice > because then the kernel isn't depending on userspace to > correctly perform the pointless action of playing with zombies. > (might setting the death signal to 0 be useful here?) > > For "ps fax" and such, I'd like to distinguish between init's > real and adopted children. 
Right now the adopted children > look like they were created by init, which is not true. I only > need a simple boolean flag, set upon reparenting, to tell me. > Such a flag may also be useful for optimizing away the whole > wait/waitpid/wait4/waitid/wait3 nonsense when an adopted > child dies. I will keep it in mind. A simple this process has been reparented flag probably won't be too bad. As for the rest I'm not certain. With pid namespaces there is a certain sense in doing something like this, but I'm not certain /sbin/init and all of it's replacements don't care (although admittedly it would be a stretch to tell the difference). Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On 5/29/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> "Albert Cahalan" <[EMAIL PROTECTED]> writes:
>> Jan Engelhardt writes:
>>> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
>>> +if(ADOPTED(processes[i]) && forest_type!='u')
>>
>> That's not compatible because init's children are now in the
>> logical place. Since the days of procps-1.x.x or earlier,
>> such processes have been listed at top level.
>>
>> BTW, what does "ps -ejH" do for you, with and without the patch?
>
> ps -ejH displays everything.

That's not what I mean. (the "-e" causes that of course)
I'm asking about the parent-child relationships shown.
The "-H" option is a bit different from the "f" option.

>> I'd be a lot happier about breaking compatibility in this area
>> if I could get a functional adoption flag. That is, I really
>> would like to show a process as child of init if it naturally
>> was created as a child of init. It's less informative to have
>> fake children showing up the same as real ones. The original
>> parent PID would do. (BTW, the original parent name and/or
>> grandparent PID would be great to have) As a bonus, the kernel
>> could reap these processes more quickly than init can... and
>> then maybe we can stop caring if init is alive.
>
> Having the kernel not reparent user processes to init is an interesting
> idea, especially when those processes have not existed. I'm not
> certain that is POSIX compliant and otherwise backwards compatible.

I'm not suggesting that this be visible via POSIX APIs.

It's almost certainly a given that getppid() must return 1, and
probably /proc needs to show this as well. Without question,
any process created by init must be reaped by init.

Processes NOT created by init could be silently reaped by
the kernel. They need to see their own PPID as 1, but there
need not be any parent-child relationship in the kernel data
structures. The kernel can fake the whole thing, which is nice
because then the kernel isn't depending on userspace to
correctly perform the pointless action of playing with zombies.
(might setting the death signal to 0 be useful here?)

For "ps fax" and such, I'd like to distinguish between init's
real and adopted children. Right now the adopted children
look like they were created by init, which is not true. I only
need a simple boolean flag, set upon reparenting, to tell me.
Such a flag may also be useful for optimizing away the whole
wait/waitpid/wait4/waitid/wait3 nonsense when an adopted
child dies.
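The "simple boolean flag, set upon reparenting" asked for above can be pictured with a toy model. This is only a sketch under stated assumptions: `struct demo_task`, its `adopted` field and both helper functions are hypothetical names invented for illustration, and none of this is kernel code or an existing /proc field.

```c
#include <stdbool.h>

/* Toy task record for illustration; not the kernel's struct task_struct. */
struct demo_task {
    int pid;
    int ppid;      /* parent PID as reported to user space */
    bool adopted;  /* hypothetical flag: set once the task is reparented */
};

/* Reparent an orphan to init (PID 1) and remember that this happened.
 * A task that was created by init keeps adopted == false. */
void demo_reparent_to_init(struct demo_task *t)
{
    if (t->ppid != 1) {
        t->ppid = 1;
        t->adopted = true;
    }
}

/* A ps-like tool could use the flag to tell init's real children
 * from adopted ones, even though both report a PPID of 1. */
bool demo_is_real_child_of_init(const struct demo_task *t)
{
    return t->ppid == 1 && !t->adopted;
}
```

With such a flag, getppid() would still report 1 for an adopted task, which matches the requirement stated above that the change not be visible through POSIX APIs.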
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
> Having the kernel not reparent user processes to init is an interesting
> idea, especially when those processes have not existed. I'm not
> certain that is POSIX compliant and otherwise backwards compatible.

It's hard to see how it would work. There has to be some parent PID.
The reason using 1 makes sense is that it is always there. Anything >0
and not the PID of some live process could be reused for a new process
at some point.

Thanks,
Roland
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > This has long been rotten. Mind fixing it for us? :-) > > We have N types of thread on M CPUs. Pick something, N or M, > to be at the top level in /proc. The other goes below, in the > per-process task directories. > > You then have either N or M things showing up in ps, not N*M. > > Note that both ps and top can print the CPU number just fine. > Abusing the task name for this is just retarded. This suggests > that the top level should be the type of task, with the lower > level in /proc/*/task being per-CPU and not needing distinct > naming at all. In a lot of ways that is reasonable. However kernel threads don't share signal handling and getting to the point where they could share signal handling would be difficult so we cannot use the generic CLONE_THREAD handling they really are more like individual processes. So at that level the cpu number in the name is just to help tell them apart. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > Jan Engelhardt writes: >> On Apr 10 2007 17:47, Jan Engelhardt wrote: >>> On Apr 8 2007 20:57, Oleg Nesterov wrote: > Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel threads. And if ->parent == /sbin/init, we can't remove us from ->children (unless we forbid sub-thread-of-init exec). So the only safe change is set ->exit_state = -1. >>> >>> Then we have to fix pstree and all that. (In fact, I'm >>> trying to patch `ps f` to DTRT ;p) >> >> Done that and the result is that `ps afwx` now looks like: >> >> PID TTY STAT TIME COMMAND >> 2722 ?S 0:00 [lockd] > ... >> 3 ?S< 0:00 [events/0] >> 2 ?SN 0:00 [ksoftirqd/0] >> 1 ?Ss 0:02 init [3] >> 537 ?S> 1600 ?Ss 0:00 \_ /usr/bin/dbus-daemon --system >> 1692 ?Ss 0:00 \_ /sbin/acpid >> 1923 ?Ss 0:00 \_ /sbin/resmgrd > ... >> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u') >> +if(ADOPTED(processes[i]) && forest_type!='u') > > That's not compatible because init's children are now in the > logical place. Since the days of procps-1.x.x or earlier, > such processes have been listed at top level. > > BTW, what does "ps -ejH" do for you, with and without the patch? ps -ejH displays everything. For 2.6.22 we will only have kthreadd as a sibling of init with ppid == 0. Depending on what happens in the evolution of how we start kernel thread we may be able to remove kthreadd and have all kthreads with a ppid of 0, but only time will tell. > I'd be a lot happier about breaking compatibility in this area > if I could get a functional adoption flag. That is, I really > would like to show a process as child of init if it naturally > was created as a child of init. It's less informative to have > fake children showing up the same as real ones. The original > parent PID would do. (BTW, the original parent name and/or > grandparent PID would be great to have) As a bonus, the kernel > could reap these processes more quickly than init can... 
and > then maybe we can stop caring if init is alive. Having the kernel not reparent user processes to init is an interesting idea, especially when those processes have not existed. I'm not certain that is POSIX complaint and otherwise backwards compatible. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Jan Engelhardt writes:
> On Apr 10 2007 17:47, Jan Engelhardt wrote:
>> On Apr 8 2007 20:57, Oleg Nesterov wrote:
>>> Anyway, re-parenting to swapper breaks pstree, it doesn't show
>>> kernel threads. And if ->parent == /sbin/init, we can't remove
>>> us from ->children (unless we forbid sub-thread-of-init exec).
>>> So the only safe change is set ->exit_state = -1.
>>
>> Then we have to fix pstree and all that. (In fact, I'm
>> trying to patch `ps f` to DTRT ;p)
>
> Done that and the result is that `ps afwx` now looks like:
>
>   PID TTY      STAT   TIME COMMAND
>  2722 ?        S      0:00 [lockd]
...
>     3 ?        S<     0:00 [events/0]
>     2 ?        SN     0:00 [ksoftirqd/0]
>     1 ?        Ss     0:02 init [3]
>   537 ?        S[...]
...
> -if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
> +if(ADOPTED(processes[i]) && forest_type!='u')

That's not compatible because init's children are now in the
logical place. Since the days of procps-1.x.x or earlier,
such processes have been listed at top level.

BTW, what does "ps -ejH" do for you, with and without the patch?

I'd be a lot happier about breaking compatibility in this area
if I could get a functional adoption flag. That is, I really
would like to show a process as child of init if it naturally
was created as a child of init. It's less informative to have
fake children showing up the same as real ones. The original
parent PID would do. (BTW, the original parent name and/or
grandparent PID would be great to have) As a bonus, the kernel
could reap these processes more quickly than init can... and
then maybe we can stop caring if init is alive.
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Robin Holt writes:
> On Mon, Apr 09, 2007 at 08:36:21AM -0600, Eric W. Biederman wrote:
>> Robin Holt <[EMAIL PROTECTED]> writes:
>>> I would say this is more a benefit than a problem. With a couple
>>> of these systems we are testing, the number of kernel threads is
>>> far greater than the number of user processes and having pstree
>>> not normally show them, but maybe have an option we add later to
>>> show them again would be beneficial.
>>
>> Sure.
>>
>> Robin how many kernel thread per cpu are you seeing?
>
> 10.

This has long been rotten. Mind fixing it for us? :-)

We have N types of thread on M CPUs. Pick something, N or M,
to be at the top level in /proc. The other goes below, in the
per-process task directories.

You then have either N or M things showing up in ps, not N*M.

Note that both ps and top can print the CPU number just fine.
Abusing the task name for this is just retarded. This suggests
that the top level should be the type of task, with the lower
level in /proc/*/task being per-CPU and not needing distinct
naming at all.
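To make the "type of task" versus "CPU number" split concrete, here is a small user-space sketch that separates a name such as "events/255" into its two parts, which is the grouping a tool would need in order to show N entries instead of N*M. The function name `demo_split_kthread_name` is hypothetical, and the assumption that per-CPU threads are always named "type/cpu" is taken only from the listings quoted in this thread.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split a per-CPU kernel thread name such as "events/255" into its type
 * ("events") and CPU number (255). Returns 1 on success, or 0 if the
 * name has no "/<digits>" suffix or the type does not fit in `type`. */
int demo_split_kthread_name(const char *comm, char *type, size_t type_len,
                            int *cpu)
{
    const char *slash = strrchr(comm, '/');

    /* Require a non-empty type and a suffix that starts with a digit. */
    if (slash == NULL || slash == comm)
        return 0;
    if (slash[1] < '0' || slash[1] > '9')
        return 0;

    char *end;
    long n = strtol(slash + 1, &end, 10);
    if (*end != '\0' || n > INT_MAX)
        return 0;

    size_t len = (size_t)(slash - comm);
    if (len >= type_len)
        return 0;

    memcpy(type, comm, len);
    type[len] = '\0';
    *cpu = (int)n;
    return 1;
}
```

Names without a slash, such as "pdflush" or "kswapd0", are reported as not per-CPU by this sketch, so a caller would list them as they are.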
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On Apr 10 2007 17:47, Jan Engelhardt wrote:
>On Apr 8 2007 20:57, Oleg Nesterov wrote:
>>
>>Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
>>threads. And if ->parent == /sbin/init, we can't remove us from ->children
>>(unless we forbid sub-thread-of-init exec). So the only safe change is
>>set ->exit_state = -1.
>
>Then we have to fix pstree and all that. (In fact, I'm
>trying to patch `ps f` to DTRT ;p)

Done that and the result is that `ps afwx` now looks like:

  PID TTY      STAT   TIME COMMAND
 2722 ?        S      0:00 [lockd]
 2721 ?        S<     0:00 [rpciod/0]
  749 ?        S<     0:00 [ata_aux]
  748 ?        S<     0:00 [ata/0]
  496 ?        S<     0:00 [xfssyncd]
  495 ?        S<     0:00 [xfsbufd]
  484 ?        S<     0:00 [xfsdatad/0]
  482 ?        S<     0:00 [xfslogd/0]
  427 ?        S<     0:00 [scsi_eh_0]
  204 ?        S<     0:00 [kpsmoused]
  110 ?        S<     0:00 [aio/0]
  109 ?        S<     0:00 [kswapd0]
  108 ?        S      0:00 [pdflush]
  107 ?        S      0:00 [pdflush]
   85 ?        S<     0:00 [kseriod]
   21 ?        S<     0:00 [kacpid]
   20 ?        S<     0:00 [kblockd/0]
    5 ?        S<     0:00 [kthread]
    4 ?        S<     0:00 [khelper]
    3 ?        S<     0:00 [events/0]
    2 ?        SN     0:00 [ksoftirqd/0]
    1 ?        Ss     0:02 init [3]
  537 ?        S[...]

[...]ppid != self_pid) more_children = 0;
-if(self_pid==1 && ADOPTED(processes[i]) && forest_type!='u')
+if(ADOPTED(processes[i]) && forest_type!='u')
   show_tree(i++, n, level, more_children);
 else
   show_tree(i++, n, level+1, more_children);

	Jan
-- 
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On Apr 8 2007 20:57, Oleg Nesterov wrote:
>
>Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
>threads. And if ->parent == /sbin/init, we can't remove us from ->children
>(unless we forbid sub-thread-of-init exec). So the only safe change is
>set ->exit_state = -1.

Then we have to fix pstree and all that. (In fact, I'm secretly
trying to patch `ps f` to DTRT ;p)

	Jan
-- 
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On Mon, Apr 09, 2007 at 06:35:39PM -0600, Eric W. Biederman wrote:
> Robin Holt <[EMAIL PROTECTED]> writes:
>
> > OK. I just got the OK from management. The system we were booting was
> > for research only. We had NR_CPUS=num_online_cpus()=4096 which were
> > non-hyperthreaded. With no attached I/O and the tweak I originally
> > posted plus one change Jack has already gotten accepted, the machine
> > booted in approx 12 minutes.
>
> How much of that time was between the time the kernel was loaded
> and before user space was started?

I am scrambling for my timed boot logs. All I am finding right now are
boots from 512p machines. For that machine, it took 178 seconds to get
to mounting the root filesystem and 75 seconds to get to the login
prompt. Of that 75 seconds, 24 was because an NFS server was down and
14 was due to PBS taking time to start.

 68.994913| Memory: 2074872032k/2082172528k available (9785k code, 7319136k reserved, 5228k data, 672k init)
 36.566117| migration_cost=3869,61218,76048
 24.074113| mount: RPC: Remote system error - No route to host
 13.972892| PBS mom

Those are the long standouts. There are quite a few lines in the boot
output which take 2 seconds or less, but none of those are too
surprising.

Plain and simple, booting that large a machine has always taken
considerable time. The first time we booted a 512p, it took nearly an
hour; now that is down to 8 or 9 minutes.

>
> Twelve minutes sounds like a long time for a boot, if you aren't fsck'ing
> filesystems.

Not fsck'ing filesystems (xfs). One of the NFS servers that was
specified in /etc/fstab was missing, but that is unimportant.

Thanks,
Robin
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Robin Holt <[EMAIL PROTECTED]> writes:

> OK. I just got the OK from management. The system we were booting was
> for research only. We had NR_CPUS=num_online_cpus()=4096 which were
> non-hyperthreaded. With no attached I/O and the tweak I originally
> posted plus one change Jack has already gotten accepted, the machine
> booted in approx 12 minutes.

How much of that time was between the time the kernel was loaded
and before user space was started?

Twelve minutes sounds like a long time for a boot, if you aren't
fsck'ing filesystems.

Eric
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
OK. I just got the OK from management. The system we were booting was
for research only. We had NR_CPUS=num_online_cpus()=4096 which were
non-hyperthreaded. With no attached I/O and the tweak I originally
posted plus one change Jack has already gotten accepted, the machine
booted in approx 12 minutes.

Thanks,
Robin

On Mon, Apr 09, 2007 at 10:20:27AM -0600, Eric W. Biederman wrote:
> Roland McGrath <[EMAIL PROTECTED]> writes:
>
> > I concur with Eric's assessment. Adding new magic bits to the generic
> > clone path seems like a poor way to cope with kernel threads. I think
> > it's better if kernel thread setup gets less like normal user process
> > setup. I also agree with Eric that PPID of 0 is a very natural way for
> > kernel threads to be displayed. We need to know more about the nature
> > of the compatibility issue in procps to judge whether there is good
> > reason to avoid changing it.
>
> I just investigated the procps issue. Using init_task as the parent
> nothing sticks out as being wrong in /proc.
>
> Further, when I modified pstree to accept 0 as its starting pid (from
> which all else would be rooted), all of the kernel threads showed up.
>
> So if anything it is a feature that kernel threads don't show up
> by default in pstree (when PPID == 0). It isn't a subtle kernel bug.
>
> Eric
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Roland McGrath <[EMAIL PROTECTED]> writes:

> I concur with Eric's assessment. Adding new magic bits to the generic
> clone path seems like a poor way to cope with kernel threads. I think
> it's better if kernel thread setup gets less like normal user process
> setup. I also agree with Eric that PPID of 0 is a very natural way for
> kernel threads to be displayed. We need to know more about the nature
> of the compatibility issue in procps to judge whether there is good
> reason to avoid changing it.

I just investigated the procps issue. Using init_task as the parent,
nothing sticks out as being wrong in /proc.

Further, when I modified pstree to accept 0 as its starting pid (from
which all else would be rooted), all of the kernel threads showed up.

So if anything it is a feature that kernel threads don't show up
by default in pstree (when PPID == 0). It isn't a subtle kernel bug.

Eric
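The effect described here, that a tree rooted at PID 1 misses anything whose PPID is 0 while a tree rooted at PID 0 shows everything, can be reproduced with a tiny model. This is a sketch only: `struct demo_proc`, `demo_count_descendants` and the sample table are hypothetical, and it makes no claim about how pstree is actually implemented.

```c
#include <stddef.h>

/* One row of a hypothetical process table: a PID and its parent PID. */
struct demo_proc {
    int pid;
    int ppid;
};

/* Count every process reachable from `root` by following ppid links.
 * `root` itself need not appear in the table (PID 0 usually does not).
 * Assumes the table has no parent cycles other than self-parented rows,
 * which are skipped so that the recursion always terminates. */
size_t demo_count_descendants(const struct demo_proc *tab, size_t n, int root)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].ppid == root && tab[i].pid != root)
            count += 1 + demo_count_descendants(tab, n, tab[i].pid);
    }
    return count;
}
```

For a table holding init (PID 1, PPID 0), two kernel threads with PPID 0 and one user process under init, starting from root 1 finds only the user process, while starting from root 0 finds all four entries.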
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On Mon, Apr 09, 2007 at 08:36:21AM -0600, Eric W. Biederman wrote:
> Robin Holt <[EMAIL PROTECTED]> writes:
>
> > I would say this is more a benefit than a problem. With a couple of these
> > systems we are testing, the number of kernel threads is far greater than
> > the number of user processes and having pstree not normally show them, but
> > maybe have an option we add later to show them again would be beneficial.
>
> Sure.
>
> Robin how many kernel thread per cpu are you seeing?

10.

FYI, pid 1539 is kthread.

a01:~ # ps -ef | egrep "\[.*\/255\]"
root       512     1  0 Apr08 ?        00:00:00 [migration/255]
root       513     1  0 Apr08 ?        00:00:00 [ksoftirqd/255]
root      1281     1  0 Apr08 ?        00:00:02 [events/255]
root      2435  1539  0 Apr08 ?        00:00:00 [kblockd/255]
root      3159  1539  0 Apr08 ?        00:00:00 [aio/255]
root      4007  1539  0 Apr08 ?        00:00:00 [cqueue/255]
root      8653  1539  0 Apr08 ?        00:00:00 [ata/255]
root     17438  1539  0 Apr08 ?        00:00:00 [xfslogd/255]
root     17950  1539  0 Apr08 ?        00:00:00 [xfsdatad/255]
root     18426  1539  0 Apr08 ?        00:00:00 [rpciod/255]

Thanks,
Robin
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Robin Holt <[EMAIL PROTECTED]> writes:

> I would say this is more a benefit than a problem. With a couple of these
> systems we are testing, the number of kernel threads is far greater than
> the number of user processes and having pstree not normally show them, but
> maybe have an option we add later to show them again would be beneficial.

Sure.

Robin, how many kernel threads per cpu are you seeing?

I don't know that we have a problem in this regard, but it feels like
you are seeing a lot more kernel threads per cpu than I would expect
from looking at small systems. I am wondering if somewhere we are
using per cpu kernel threads when we should not be...

Eric
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On Mon, Apr 09, 2007 at 01:06:43PM +0400, Oleg Nesterov wrote:
> On 04/08, Eric W. Biederman wrote:
> > There is a practical question how much we care about pstree being
> > confused (I assume it doesn't crash). If this is just a confusion
> > issue then I say go for it. PPID == 0 is a very legitimate way to say
> > the kernel is the parent process.
>
> No, it doesn't crash. It just doesn't show kernel threads (ps ax is OK).
> I didn't look into the sources, but I guess the reason is that pstree
> assumes that the "root" of the tree is "pid == 1" process.

I would say this is more a benefit than a problem. With a couple of
these systems we are testing, the number of kernel threads is far
greater than the number of user processes and having pstree not
normally show them, but maybe have an option we add later to show them
again would be beneficial.

Thanks,
Robin
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On 04/08, Eric W. Biederman wrote:
>
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
>
> > Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> > and forget about CLONE_KERNEL_THREAD.
>
> Please.

OK, will do tomorrow.

> > Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> > threads. And if ->parent == /sbin/init, we can't remove us from ->children
> > (unless we forbid sub-thread-of-init exec). So the only safe change is
> > set ->exit_state = -1.
>
> Yes. We certainly need ->exit_state = -1.
> Earlier I had forgotten about the second use of ->children to update
> the parent pointer of processes when their parent exits.
>
> There is a practical question how much we care about pstree being
> confused (I assume it doesn't crash). If this is just a confusion
> issue then I say go for it. PPID == 0 is a very legitimate way to say
> the kernel is the parent process.

No, it doesn't crash. It just doesn't show kernel threads (ps ax is OK).
I didn't look into the sources, but I guess the reason is that pstree
assumes that the "root" of the tree is the "pid == 1" process.

I personally think this is acceptable (and Roland seems to think the
same). Still, to be safe, I'll break this into 2 patches: the first one
sets ->exit_state, the second re-parents to swapper.

In fact, we can do some odd things to make pstree happy. We need
->parent only because /proc needs some ->parent fields. But I'd prefer
to avoid these hacks.

Still, it is sad that we can't have additional flags for
kernel_thread(). However, I agree with your and Roland's objections.

Oleg.
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
I concur with Eric's assessment. Adding new magic bits to the generic
clone path seems like a poor way to cope with kernel threads. I think
it's better if kernel thread setup gets less like normal user process
setup. I also agree with Eric that PPID of 0 is a very natural way for
kernel threads to be displayed. We need to know more about the nature
of the compatibility issue in procps to judge whether there is good
reason to avoid changing it.

Thanks,
Roland
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 04/08, Eric W. Biederman wrote:
>
>> If we are going to have kernel only flags please use an additional
>> argument to do_fork and copy_process.
>
> Yes, we can do this. But we have a number of architectures which use
> sys_clone() to implement kernel_thread(). It would be nice to have an
> architecture neutral kernel_thread() implementation as you proposed.
> We should change all of them if we want to add a new parameter to
> do_fork().
>
> Perhaps it is better to add reparent_kthread() (next patch) to kthread()
> and forget about CLONE_KERNEL_THREAD.

Please.

> Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
> threads. And if ->parent == /sbin/init, we can't remove us from ->children
> (unless we forbid sub-thread-of-init exec). So the only safe change is
> set ->exit_state = -1.

Yes. We certainly need ->exit_state = -1.
Earlier I had forgotten about the second use of ->children to update
the parent pointer of processes when their parent exits.

There is a practical question how much we care about pstree being
confused (I assume it doesn't crash). If this is just a confusion
issue then I say go for it. PPID == 0 is a very legitimate way to say
the kernel is the parent process. There are a few more cases where
we are likely to get PPID == 0 in the future and /sbin/init already
has that now. Plus there is a lot of historic precedent. The odd
part is PPID = 0 having multiple children.

If we decide maintaining a tree is important I would much rather put
init_task on the task_list so we can see it in /proc than go the other
way around.

I would like a confirmation that PPID == 0 is what is confusing
pstree just to make certain we haven't half filled in some field in
init_task and are thus giving incorrect /proc output. But that is all
the double checking I would do.

>> Your current scheme also has the bad side that if user space supplied
>> a kernel flag it is hard to detect it and return -EINVAL. Which
>> limits future expansion. Silently dropping clone flags is a real
>> pain, if you are trying to detect if a new flag has been implemented.
>
> Yes. But that is what we are doing now. copy_process() just ignores
> unknown flags.

Agreed. I fixed that in sys_unshare but I should really submit a
patch to do the same for sys_clone at some point. When known flags
aren't implemented we certainly return -EINVAL.

Given that this line of work looks to fix the race that allows a
threaded init to generate unkillable zombies I can probably find some
time in the next while to work on it.

Eric
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
On 04/08, Eric W. Biederman wrote:
> Oleg Nesterov <[EMAIL PROTECTED]> writes:
>
> > For review only.
> >
> > To implement for-in-kernel-use-only CLONE_ flags, we need to filter out them
> > in sys_clone().
>
> Nack
>
> The current clone_flags field is for user space consumption and we
> have proposed users for all or almost all of the remaining bits.

OK.

> If we are going to have kernel only flags please use an additional
> argument to do_fork and copy_process.

Yes, we can do this. But we have a number of architectures which use
sys_clone() to implement kernel_thread(). It would be nice to have an
architecture neutral kernel_thread() implementation as you proposed.
We should change all of them if we want to add a new parameter to
do_fork().

Perhaps it is better to add reparent_kthread() (next patch) to
kthread() and forget about CLONE_KERNEL_THREAD.

Anyway, re-parenting to swapper breaks pstree, it doesn't show kernel
threads. And if ->parent == /sbin/init, we can't remove us from
->children (unless we forbid sub-thread-of-init exec). So the only safe
change is set ->exit_state = -1.

> Your current scheme also has the bad side that if user space supplied
> a kernel flag it is hard to detect it and return -EINVAL. Which
> limits future expansion. Silently dropping clone flags is a real
> pain, if you are trying to detect if a new flag has been implemented.

Yes. But that is what we are doing now. copy_process() just ignores
unknown flags.

Oleg.
Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK
Oleg Nesterov <[EMAIL PROTECTED]> writes:

> For review only.
>
> To implement for-in-kernel-use-only CLONE_ flags, we need to filter out them
> in sys_clone().

Nack

The current clone_flags field is for user space consumption and we
have proposed users for all or almost all of the remaining bits.

If we are going to have kernel only flags please use an additional
argument to do_fork and copy_process.

Your current scheme also has the bad side that if user space supplied
a kernel flag it is hard to detect it and return -EINVAL, which
limits future expansion. Silently dropping clone flags is a real
pain if you are trying to detect whether a new flag has been
implemented.

Eric
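The difference between silently dropping flags and rejecting them with -EINVAL can be shown in a few lines. This is a user-space sketch under stated assumptions: the `DEMO_*` bits and both functions are hypothetical and are not the kernel's CLONE_* values or the code path of sys_clone().

```c
#include <errno.h>

/* Hypothetical flag bits for illustration; not the kernel's CLONE_* values. */
#define DEMO_FLAG_A      0x1UL
#define DEMO_FLAG_B      0x2UL
#define DEMO_KNOWN_FLAGS (DEMO_FLAG_A | DEMO_FLAG_B)

/* Silent filtering: unknown bits simply vanish, so the caller cannot
 * tell whether a newer flag was honoured or just dropped. */
unsigned long demo_filter_flags(unsigned long flags)
{
    return flags & DEMO_KNOWN_FLAGS;
}

/* Strict checking: any unknown bit is reported as -EINVAL, which lets
 * a caller probe whether a given flag is implemented. */
int demo_check_flags(unsigned long flags)
{
    if (flags & ~DEMO_KNOWN_FLAGS)
        return -EINVAL;
    return 0;
}
```

With the strict variant, a program that passes a flag the running kernel does not know gets an error it can act on; with the filtering variant the call appears to succeed and the flag has no effect.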