[PATCH] scheduler: fix the return of the first time_slice

2007-04-15 Thread Satoru Takeuchi
At Sat, 14 Apr 2007 01:31:12 +1000,
Con Kolivas wrote:
> 
> On Monday 09 April 2007 16:09, Andrew Morton wrote:
> > On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi 
> <[EMAIL PROTECTED]> wrote:
> > > When I was examining the following program ...
> > >
> > >   1. There are a large amount of small jobs takes several msecs,
> > >  and the number of job increases constantly.
> > >   2. The process creates a thread or a process per job (I examined both
> > >  the thread model and the process model).
> > >   3. Each child process/thread does the assigned job and exit
> > > immediately.
> > >
> > > ... I found that the thread model's latency is longer than proess
> > > model's one against my expectation. It's because of the current
> > > sched_fork()/sched_exit() implementation as follows:
> > >
> > >   a) On sched_fork, the creator share its timeslice with new process.
> > >   b) On sched_exit, if the exiting process didn't exhaust its first
> > >  timeslice yet, it gives its timeslice to the parent.
> > >
> > > It has no problem on the process model since the creator is the parent.
> > > However, on the thread model, the creator is not the parent, it is same
> > > as the creator's parent. Hence, on this kind of program, the creator
> > > can't retrieve shared timeslice and exausts its timeslice at a rate of
> > > knots. In addition, somehow, the parent (typically shell?) gets extra
> > > timeslice.
> > >
> > > I believe it's a bug and the exiting process should give its timeslice
> > > to the creator. Now I have some patch plan to fix this problem as follow:
> > >
> > >  a) Add the field for the creator to task_struct. It needs extra memory.
> > >  b) Doesn't add extra field and have thread's parent the creater, which
> > > is same as process creation. However it has many side effects, for
> > > example, we also need to change sys_getppid() implementation.
> > >
> > > What do you think? Any comments are welcome.
> >
> > This comes at an awkward time, because we might well merge the
> > staircase/deadline work into 2.6.22, and I think it rewrites the part of
> > the scheduler which is causing the problems you're observing.
> >
> > Has anyone verified that SD fixes this problem and the one at
> > http://lkml.org/lkml/2007/4/7/21 ?
> 
> No, SD does nothing different in this regard. There is no notion of who made 
> the thread and who the remaining timeslice should go to. As you say, some 
> decision on sched_exit should probably be used to decide where to send it. In 
> the absence of yet more fields in task_struct, the easiest thing to do would 
> be to check if the the thread id is equal to the pid and if not, send it to 
> the pid (or any parent of that if it no longer exists). Whether it's worth 
> adding yet another field to task_struct to track this or not I do not have 
> enough information to make an informed choice in this regard. Either way I'm 
> low on patch-creation cycles so please feel free to provide your solution.

I wrote the patch fixing this problem. It's for 2.6.21-rc6 and works well.
I also examined the following load tests:

 - Compile kernel with -j4 option continuously for 8 hours on i386 UP box.
 - Compile kernel with -j48 option continuously for 2 hours on ia64 12 CPU
   box.

Now I'm testing its 2.6.21-rc6-mm1(with SD) version and would submit it soon.

Thanks,

Satoru

---
Fix the return of the first time_slice

Currently exiting process takes back its first_time_slice to its
parent. If the task is created with CLONE_PARENT or CLONE_THREAD
flag, the parent is not the creator. Hence the creator can't
collect the time_slice and time_slice leak occurs.

To fix this problem, remove first_time_slice field of task_struct
and introduce time_slice_reaper field instead. The new field means
the task which exiting task should return its first_time_slice to,
and is NULL after exhausting its first time_slice.

Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6.tsfix/kernel/exit.c
===
--- linux-2.6.21-rc6.tsfix.orig/kernel/exit.c   2007-04-13 20:53:11.0 
+0900
+++ linux-2.6.21-rc6.tsfix/kernel/exit.c2007-04-14 13:58:29.0 
+0900
@@ -679,6 +679,8 @@
reaper = child_reaper(father);
break;
}
+   if (reaper->time_slice_reaper == father)
+   p->time_slice_reaper = NULL;
} while (reaper->exit_state);
 
/*
@@ -700,6 +702,8 @@
 
if (father == p->real_parent) {
/* reparent with a reaper, real father it's us */
+   if (p->time_slice_reaper == father)
+   p->time_slice_reaper = NULL;
choose_new_parent(p, reaper);
reparent_thread(p, father, 0);
} else {
@@ -721,6 +725,8 @@
}
list_for_each_safe(_p, _n, 

[PATCH] scheduler: fix the return of the first time_slice

2007-04-15 Thread Satoru Takeuchi
At Sat, 14 Apr 2007 01:31:12 +1000,
Con Kolivas wrote:
 
 On Monday 09 April 2007 16:09, Andrew Morton wrote:
  On Sat, 07 Apr 2007 16:31:39 +0900 Satoru Takeuchi 
 [EMAIL PROTECTED] wrote:
   When I was examining the following program ...
  
 1. There are a large amount of small jobs takes several msecs,
and the number of job increases constantly.
 2. The process creates a thread or a process per job (I examined both
the thread model and the process model).
 3. Each child process/thread does the assigned job and exit
   immediately.
  
   ... I found that the thread model's latency is longer than proess
   model's one against my expectation. It's because of the current
   sched_fork()/sched_exit() implementation as follows:
  
 a) On sched_fork, the creator share its timeslice with new process.
 b) On sched_exit, if the exiting process didn't exhaust its first
timeslice yet, it gives its timeslice to the parent.
  
   It has no problem on the process model since the creator is the parent.
   However, on the thread model, the creator is not the parent, it is same
   as the creator's parent. Hence, on this kind of program, the creator
   can't retrieve shared timeslice and exausts its timeslice at a rate of
   knots. In addition, somehow, the parent (typically shell?) gets extra
   timeslice.
  
   I believe it's a bug and the exiting process should give its timeslice
   to the creator. Now I have some patch plan to fix this problem as follow:
  
a) Add the field for the creator to task_struct. It needs extra memory.
b) Doesn't add extra field and have thread's parent the creater, which
   is same as process creation. However it has many side effects, for
   example, we also need to change sys_getppid() implementation.
  
   What do you think? Any comments are welcome.
 
  This comes at an awkward time, because we might well merge the
  staircase/deadline work into 2.6.22, and I think it rewrites the part of
  the scheduler which is causing the problems you're observing.
 
  Has anyone verified that SD fixes this problem and the one at
  http://lkml.org/lkml/2007/4/7/21 ?
 
 No, SD does nothing different in this regard. There is no notion of who made 
 the thread and who the remaining timeslice should go to. As you say, some 
 decision on sched_exit should probably be used to decide where to send it. In 
 the absence of yet more fields in task_struct, the easiest thing to do would 
 be to check if the the thread id is equal to the pid and if not, send it to 
 the pid (or any parent of that if it no longer exists). Whether it's worth 
 adding yet another field to task_struct to track this or not I do not have 
 enough information to make an informed choice in this regard. Either way I'm 
 low on patch-creation cycles so please feel free to provide your solution.

I wrote the patch fixing this problem. It's for 2.6.21-rc6 and works well.
I also examined the following load tests:

 - Compile kernel with -j4 option continuously for 8 hours on i386 UP box.
 - Compile kernel with -j48 option continuously for 2 hours on ia64 12 CPU
   box.

Now I'm testing its 2.6.21-rc6-mm1(with SD) version and would submit it soon.

Thanks,

Satoru

---
Fix the return of the first time_slice

Currently exiting process takes back its first_time_slice to its
parent. If the task is created with CLONE_PARENT or CLONE_THREAD
flag, the parent is not the creator. Hence the creator can't
collect the time_slice and time_slice leak occurs.

To fix this problem, remove first_time_slice field of task_struct
and introduce time_slice_reaper field instead. The new field means
the task which exiting task should return its first_time_slice to,
and is NULL after exhausting its first time_slice.

Signed-off-by: Satoru Takeuchi [EMAIL PROTECTED]

Index: linux-2.6.21-rc6.tsfix/kernel/exit.c
===
--- linux-2.6.21-rc6.tsfix.orig/kernel/exit.c   2007-04-13 20:53:11.0 
+0900
+++ linux-2.6.21-rc6.tsfix/kernel/exit.c2007-04-14 13:58:29.0 
+0900
@@ -679,6 +679,8 @@
reaper = child_reaper(father);
break;
}
+   if (reaper-time_slice_reaper == father)
+   p-time_slice_reaper = NULL;
} while (reaper-exit_state);
 
/*
@@ -700,6 +702,8 @@
 
if (father == p-real_parent) {
/* reparent with a reaper, real father it's us */
+   if (p-time_slice_reaper == father)
+   p-time_slice_reaper = NULL;
choose_new_parent(p, reaper);
reparent_thread(p, father, 0);
} else {
@@ -721,6 +725,8 @@
}
list_for_each_safe(_p, _n, father-ptrace_children) {
p = list_entry(_p, struct task_struct, ptrace_list);
+   if (p-time_slice_reaper == father)
+