Re: lowish-latency patch for 2.4.0-test9
Andi Kleen wrote:
>
> On Fri, Oct 06, 2000 at 10:00:36PM +1100, Andrew Morton wrote:
> > The little-low-latency patch for test9 is at
> >
> >   http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch
> >
> > Notes:
> >
> > - It now passes Benno's tests with 50% headroom (thanks to
> >   Ingo's scheduler race fix).
>
> What was that race exactly ?

Not completely sure.  I _think_ the problem was that when the kernel was
switching from your SCHED_FIFO process to some other process, and an
interrupt occurred between the reenabling of interrupts and the
switch_to() in schedule(), and that interrupt tried to wake the
SCHED_FIFO process, it wasn't noticed until the next timeslice.  That
was as far as I got when the problem magically disappeared.  Due to this
hunk:

 	switch_to(prev, next, prev);
 	__schedule_tail(prev);

 same_process:
 	reacquire_kernel_lock(current);
+	if (current->need_resched)
+		goto tq_scheduler_back;
 	return;

> There is a scheduler race which may also hurt (and is harder to fix):
> when the timer interrupt hits in syscall exit after the need_resched
> check was done then you may lose a time slice.  The window can be
> quite long when signals are handled (after do_signal returned there
> is no reschedule check).  Without signals it is only a few
> instructions window.
>
> I have not checked if it really is a problem in practice though.
> With lots of signals it may be a problem.

Is it not a matter of:

a) checking need_resched after the call to do_signal(), and

b) disabling local interrupts prior to the final need_resched check, to
   make this test atomic wrt interrupts.  RESTORE_ALL will do the right
   thing, and an intervening smp_send_reschedule() will be blocked until
   the return to user space.

Seems too simple...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
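For illustration, the fix being proposed here has roughly the following
shape.  This is C-level pseudocode of the syscall exit path, not the
real code (which is i386 assembly in arch/i386/kernel/entry.S); the
label and control flow are simplified for this sketch.

	/* Illustrative pseudocode only.  (a) re-checks need_resched
	 * after do_signal(); (b) does the final check with local
	 * interrupts disabled, so a timer interrupt cannot slip in
	 * between the check and the return to user space.  */

	syscall_exit:
		if (current->sigpending)
			do_signal(regs);
		cli();				/* (b): make the test atomic */
		if (current->need_resched) {	/* (a): check after do_signal() */
			sti();
			schedule();
			goto syscall_exit;	/* re-test signals and resched */
		}
		RESTORE_ALL;	/* iret restores EFLAGS, reenabling interrupts */

The point of (b) is that RESTORE_ALL reenables interrupts as part of the
return itself, so any wakeup arriving after the cli() is delivered only
once the task is back in user space, where the next interrupt will see
need_resched normally.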
Re: lowish-latency patch for 2.4.0-test9
On Fri, 6 Oct 2000, Andrew Morton wrote:

> - Updated for the new VM.  (I'll have to ask Rik to take a
>   look at this part sometime).

I've taken a (very) quick look and it seems ok to me...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
	-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/
Re: lowish-latency patch for 2.4.0-test9
On Fri, Oct 06, 2000 at 10:00:36PM +1100, Andrew Morton wrote:
> The little-low-latency patch for test9 is at
>
>   http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch
>
> Notes:
>
> - It now passes Benno's tests with 50% headroom (thanks to
>   Ingo's scheduler race fix).

What was that race exactly ?

There is a scheduler race which may also hurt (and is harder to fix):
when the timer interrupt hits in syscall exit after the need_resched
check was done, you may lose a time slice.  The window can be quite long
when signals are handled (after do_signal returns there is no reschedule
check).  Without signals the window is only a few instructions long.

I have not checked whether it really is a problem in practice, though.
With lots of signals it may be.

-Andi
lowish-latency patch for 2.4.0-test9
The little-low-latency patch for test9 is at

  http://www.uow.edu.au/~andrewm/linux/2.4.0-test9-low-latency.patch

Notes:

- It now passes Benno's tests with 50% headroom (thanks to Ingo's
  scheduler race fix).

- Updated to follow the wandering ext2 truncate code.

- Updated for the new VM.  (I'll have to ask Rik to take a look at this
  part sometime).

- Set TASK_RUNNING in conditional_schedule().  This is probably
  unnecessary - current->state appears to be always equal to
  TASK_RUNNING in the places I'm using it.  So we test for this
  beforehand to avoid unnecessarily dirtying cache lines.  (This
  optimisation should be done anyway, especially for SMP).

- net/ipv4/tcp_minisocks.c:tcp_twkill() can spend tens or even hundreds
  of milliseconds within a timer handler.  I have a fix for this, but
  Alexey agrees that it needs to be addressed independently of the
  low-latency patch.  So this file is untouched.

- This entire feature has been *disabled* for SMP.  This patch is now
  UP-only.

  It is completely stable on SMP, and the scheduling latency is just
  grand - as long as you don't push things too hard.  It then comes
  unstuck.  This is because of the following scenario:

  * CPU1 holds a long-lived spinlock such as dcache_lock in
    select_parent().

  * CPU0 is spinning on the same lock.

  * An interrupt occurs, and the kernel tries to wake up your SCHED_FIFO
    task on CPU0.

  You lose.  Nothing happens until CPU1 releases the lock a week later.

  There are a number of ways of fixing this, but they're messy.  One way
  is to identify those locks and to add a test for current->need_resched
  into the spin.  This gets nastier if the BKL is held at the same time.
  Another way is to write a fully-preemptible SMP kernel patch :)
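The first option - testing need_resched inside the spin - might look
something like the sketch below.  This is illustrative pseudocode only:
spin_lock_resched() is a made-up name, not code from the patch, and the
real 2.4 spinlock internals are arch-specific assembly.

	/* Hypothetical need_resched-aware spin, sketched for the
	 * scenario above: instead of spinning blindly on a contended
	 * lock, poll current->need_resched and let the woken
	 * SCHED_FIFO task run.  */

	static inline void spin_lock_resched(spinlock_t *lock)
	{
		while (!spin_trylock(lock)) {
			/* A wakeup targeted at this CPU sets
			 * need_resched; don't starve it while waiting
			 * for CPU1 to release the lock.  */
			if (current->need_resched)
				schedule();
		}
	}

As noted above, this gets nastier once the BKL is held at the same time,
since schedule() has to juggle the kernel lock as well, and every
long-lived lock would need converting by hand.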