Re: lowish-latency patch for 2.4.0-test9

2000-10-10 Thread Andrew Morton

Andi Kleen wrote:
> 
> On Fri, Oct 06, 2000 at 10:00:36PM +1100, Andrew Morton wrote:
> > The little-low-latency patch for test9 is at
> >
> >   http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch
> >
> > Notes:
> >
> > - It now passes Benno's tests with 50% headroom (thanks to
> >   Ingo's scheduler race fix).
> 
> What was that race exactly ?

Not completely sure.  I _think_ the problem was this: the kernel is
switching from your SCHED_FIFO process to some other process, an
interrupt occurs between the reenabling of interrupts and the
switch_to() in schedule(), and that interrupt tries to wake the
SCHED_FIFO process.  The wakeup isn't noticed until the next timeslice.
That was as far as I got when the problem magically disappeared, due to
this hunk:

        switch_to(prev, next, prev);
        __schedule_tail(prev);

 same_process:
        reacquire_kernel_lock(current);
+       if (current->need_resched)
+               goto tq_scheduler_back;

        return;
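
To make the suspected window concrete, here is a toy user-space model. It is only a sketch of the idea and not kernel code: struct toy_cpu, toy_wakeup_irq() and toy_schedule_pass() are hypothetical names, and a single flag stands in for current->need_resched. A wakeup that lands after the reschedule request has been consumed stays pending unless the flag is tested again before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one CPU's scheduler state; not kernel code. */
struct toy_cpu {
    bool need_resched;   /* stands in for current->need_resched */
    int  passes;         /* how many times the scheduler body ran */
};

/* Stand-in for an interrupt handler that wakes a SCHED_FIFO task. */
void toy_wakeup_irq(struct toy_cpu *cpu)
{
    cpu->need_resched = true;
}

/*
 * One call to the toy scheduler.  If irq is non-NULL it fires once, in the
 * window after the reschedule request has been consumed but before the
 * function returns.  With recheck set, the flag is tested again at the end,
 * which is the idea behind the two added lines in the hunk above.
 */
void toy_schedule_pass(struct toy_cpu *cpu, void (*irq)(struct toy_cpu *),
                       bool recheck)
{
    assert(cpu != NULL);
    do {
        cpu->passes++;
        cpu->need_resched = false;   /* request consumed */
        if (irq) {
            irq(cpu);                /* wakeup lands inside the window */
            irq = NULL;              /* fire only once */
        }
        /* ... the context switch would happen here ... */
    } while (recheck && cpu->need_resched);
}
```

Without the recheck, the function returns with need_resched still set, so nothing acts on the wakeup until the next scheduling point. With the recheck, the scheduler body runs a second time straight away.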


> There is a scheduler race which may also hurt (and is harder to fix):
> when the timer interrupt hits in syscall exit after the need_resched check
> was done, you may lose a time slice. The window can be quite long
> when signals are handled (after do_signal returns there is no reschedule
> check). Without signals the window is only a few instructions.
> 
> I have not checked if it really is a problem in practice though. With
> lots of signals it may be a problem.

Is it not a matter of:

a): checking need_resched after the call to do_signal() and

b): disabling local interrupts prior to the final need_resched
    check to make this test atomic wrt interrupts.  RESTORE_ALL
    will do the right thing and an intervening smp_send_reschedule()
    will be blocked until the return to user space.

Seems too simple...
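
A rough user-space sketch of (a) and (b) together, under stated assumptions: struct toy_exit_state and the toy_* functions are hypothetical stand-ins, interrupts are modelled as a plain flag, and the real exit path is assembly that this does not try to reproduce. The point is only the shape of the loop: every test runs with interrupts off, and the code goes back to the need_resched test after signal handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the state examined on syscall exit. */
struct toy_exit_state {
    bool irqs_enabled;
    bool need_resched;
    bool sigpending;
    bool tick_during_signal;  /* simulate a timer tick during signal handling */
    int  schedules;
    int  signals;
};

/* Stand-in for schedule(): consume the reschedule request. */
void toy_exit_schedule(struct toy_exit_state *s)
{
    s->need_resched = false;
    s->schedules++;
}

/* Stand-in for do_signal(); it runs with interrupts on, so a tick may land. */
void toy_exit_do_signal(struct toy_exit_state *s)
{
    s->sigpending = false;
    s->signals++;
    if (s->tick_during_signal) {
        s->need_resched = true;
        s->tick_during_signal = false;
    }
}

void toy_syscall_exit(struct toy_exit_state *s)
{
    for (;;) {
        s->irqs_enabled = false;      /* (b) the tests run with irqs off */
        if (s->need_resched) {
            s->irqs_enabled = true;
            toy_exit_schedule(s);
            continue;
        }
        if (s->sigpending) {
            s->irqs_enabled = true;
            toy_exit_do_signal(s);
            continue;                 /* (a) go back and re-test need_resched */
        }
        break;
    }
    /* Nothing is pending, and no interrupt can have slipped in since the test. */
    assert(!s->irqs_enabled && !s->need_resched && !s->sigpending);
    s->irqs_enabled = true;           /* models the return to user space */
}
```

In this model a tick that arrives while the signal is being handled is picked up by the next trip around the loop, so the time slice is not lost.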
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: lowish-latency patch for 2.4.0-test9

2000-10-06 Thread Rik van Riel

On Fri, 6 Oct 2000, Andrew Morton wrote:

> - Updated for the new VM.  (I'll have to ask Rik to take a
>   look at this part sometime).

I've taken a (very) quick look and it seems ok to me...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: lowish-latency patch for 2.4.0-test9

2000-10-06 Thread Andi Kleen

On Fri, Oct 06, 2000 at 10:00:36PM +1100, Andrew Morton wrote:
> The little-low-latency patch for test9 is at
> 
>   http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch
> 
> Notes:
> 
> - It now passes Benno's tests with 50% headroom (thanks to
>   Ingo's scheduler race fix).

What was that race exactly ?

There is a scheduler race which may also hurt (and is harder to fix):
when the timer interrupt hits in syscall exit after the need_resched check
was done, you may lose a time slice. The window can be quite long
when signals are handled (after do_signal returns there is no reschedule
check). Without signals the window is only a few instructions.

I have not checked if it really is a problem in practice though. With
lots of signals it may be a problem.



-Andi




lowish-latency patch for 2.4.0-test9

2000-10-06 Thread Andrew Morton

The little-low-latency patch for test9 is at

http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch

Notes:

- It now passes Benno's tests with 50% headroom (thanks to
  Ingo's scheduler race fix).

- Updated to follow the wandering ext2 truncate code.

- Updated for the new VM.  (I'll have to ask Rik to take a
  look at this part sometime).

- Set TASK_RUNNING in conditional_schedule().

  This is probably unnecessary - current->state appears to be
  always equal to TASK_RUNNING in the places I'm using it.

  So we test for this beforehand to avoid unnecessarily dirtying
  cache lines.

  (This optimisation should be done anyway, especially for SMP).

- net/ipv4/tcp_minisocks.c:tcp_twkill() can spend tens or even
  hundreds of milliseconds within a timer handler.  I have a
  fix for this, but Alexey agrees that this needs to be
  addressed independently of the low-latency patch.  So this file
  is untouched.

- This entire feature has been *disabled* for SMP.  This patch
  is now UP-only.

  It is completely stable on SMP and the scheduling latency is
  just grand, as long as you don't push things too hard.  It
  then comes unstuck.

  This is because of the following scenario:

  * CPU1 holds a long-lived spinlock such as dcache_lock
    in select_parent().

  * CPU0 is spinning on the same lock.

  * An interrupt occurs and the kernel tries to wake up
    your SCHED_FIFO task on CPU0.

  You lose.  Nothing happens until CPU1 releases the lock
  a week later.

  There are a number of ways of fixing this, but they're
  messy.

  One way is to identify those locks and to add a test for
  current->need_resched into the spin.  This gets nastier
  if the BKL is held at the same time.

  Another way is to write a fully-preemptible SMP kernel
  patch :)
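
Two of the notes above (the test-before-write in conditional_schedule() and the need_resched test inside the spin) can be sketched together in user-space C. This is an illustration under assumptions and not the patch's code: struct toy_task, toy_schedule(), toy_conditional_schedule() and toy_spin_lock_resched() are hypothetical names, C11 atomic_flag stands in for a kernel spinlock, and max_spins exists only so the sketch terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_TASK_RUNNING 0L

/* Hypothetical stand-in for the task_struct fields used here. */
struct toy_task {
    long state;
    bool need_resched;
    int  schedules;       /* counts toy_schedule() calls, for illustration */
};

/* Stand-in for schedule(): consume the reschedule request. */
void toy_schedule(struct toy_task *t)
{
    t->need_resched = false;
    t->schedules++;
}

/* Reschedule only if asked to; returns true if it scheduled. */
bool toy_conditional_schedule(struct toy_task *t)
{
    if (!t->need_resched)
        return false;
    /* Test before writing so an already-RUNNING task's cache line stays clean. */
    if (t->state != TOY_TASK_RUNNING)
        t->state = TOY_TASK_RUNNING;
    toy_schedule(t);
    return true;
}

/*
 * Spin on a lock, but honour need_resched while waiting.  A real spinlock
 * would spin until it gets the lock; max_spins bounds this sketch.
 * Returns true if the lock was acquired.
 */
bool toy_spin_lock_resched(atomic_flag *lock, struct toy_task *t, int max_spins)
{
    assert(max_spins > 0);
    for (int i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;              /* acquired */
        /* The waiter holds nothing yet in this toy, so it may reschedule. */
        toy_conditional_schedule(t);
    }
    return false;
}

void toy_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

The sketch assumes the waiter holds no other lock while it spins. As noted above, this gets nastier if the BKL is held at the same time, and nothing here addresses that case.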


