Re: Latency: allowing rescheduling while holding spin_locks

2001-01-15 Thread george anzinger

Roger Larsson wrote:
> 
> On Sunday 14 January 2001 01:06, george anzinger wrote:
> > Nigel Gamble wrote:
> > > On Sat, 13 Jan 2001, Roger Larsson wrote:
> > > > A rethinking of the rescheduling strategy...
> > >
> > > Actually, I think you have more-or-less described how successful
> > > preemptible kernels have already been developed, given that your
> > > "sleeping spin locks" are really just sleeping mutexes (or binary
> > > semaphores).
> > >
> > > 1.  Short critical regions are protected by spin_lock_irq().  The maximum
> > > value of "short" is therefore bounded by the maximum time we are happy
> > > to disable (local) interrupts - ideally ~100us.
> > >
> > > 2.  Longer regions are protected by sleeping mutexes.
> > >
> > > 3.  Algorithms are rearchitected until all of the highly contended locks
> > > are of type 1, and only low contention locks are of type 2.
> > >
> > > This approach has the advantage that we don't need to use a no-preempt
> > > count, and test it on exit from every spinlock to see if a preempting
> > > interrupt that has caused a need_resched has occurred, since we won't
> > > see the interrupt until it's safe to do the preemptive resched.
> >
> > I agree that this was true in days of yore.  But these days the irq
> > instructions introduce serialization points and, methinks, may be much
> > more time consuming than the "++, --, if (false)" that a preemption
> > count implementation introduces.  Could someone with a knowledge of the
> > hardware comment on this?
> >
> > I am not suggesting that the "++, --, if (false)" is faster than an
> > interrupt, but that it is faster than cli, sti.  Of course we are
> > assuming that there is stuff between the cli and the sti as there is
> > between the ++ and the -- if (false).
> >
> 
> The problem with the counting scheme is that you cannot schedule inside any
> spinlock - you have to split them up. Maybe you will have to do that anyway.
> But if your RT process never needs more memory, it should be quite safe.
> 
> The difference with a sleeping mutex is that it can be made lazier - keep it
> in the runlist; there should be very few...
> 
Nigel and I agree on the approach he has laid out, with the possible
exception of just how to handle the short spinlocks.  It is agreed that
we cannot preempt a task that holds a spinlock.  He suggests that the
overhead of maintaining a preempt count and testing it on exit from every
spinlock is higher than the cost of turning the interrupt system off and
on.  He may well be right, and surely was right 5 or 10 years ago.  Today
the cost of a cli or sti is much higher relative to the memory
references, especially if we don't need to make the result visible to
other processors (and we don't).  We only have to serialize WRT our own
interrupt system, but the interrupt itself will do this, and only when we
need it.
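
For concreteness, the counting scheme amounts to something like this (a
sketch only; preempt_count and preempt_schedule() are illustrative names
here, not 2.4 kernel API):

/* Sketch of the "++, --, if (false)" scheme. */
static inline void preempt_disable(void)
{
	current->preempt_count++;		/* the "++" */
}

static inline void preempt_enable(void)
{
	current->preempt_count--;		/* the "--" */
	/* almost always false, so the common path is two local
	 * memory references and an untaken branch - no cli/sti,
	 * no serializing instruction */
	if (current->preempt_count == 0 && current->need_resched)
		preempt_schedule();		/* a safe point to preempt */
}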

snip

WRT your patch, a big problem with simple sleeping mutexes is that of
priority inversion.  An example:

Given tasks L of low priority, M of medium, and H of high, and X a mutex:
if L is holding X when it is preempted by M, and
M wants to run a long time,
then when H preempts M and tries to get X it will have to wait while M
does its thing, just because L cannot get the cycles needed to get out
of X.

A priority inheritance mutex (pi_mutex) handles this by boosting the
priority of L (the holder of X) to H's priority when H tries to get X,
until L releases X.  At that point L reverts to its prior priority and H
continues, now having succeeded in getting X.  This is all complicated,
of course, by remembering that a task can hold several mutexes at a time
and each can have several waiters.
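
In outline (a sketch only: pi_mutex, block_on() and wake_highest_waiter()
are hypothetical, as are the prio/base_prio fields; a real version must
also handle nested boosting, several held mutexes, and SMP races):

struct pi_mutex {
	struct task_struct *owner;	/* NULL when free */
	/* ... priority-ordered queue of waiters ... */
};

void pi_mutex_lock(struct pi_mutex *m)
{
	while (m->owner != NULL) {
		/* boost the holder (L) so it can reach its unlock */
		if (m->owner->prio < current->prio)
			m->owner->prio = current->prio;
		block_on(m);		/* sleep on m's waiter queue */
	}
	m->owner = current;
}

void pi_mutex_unlock(struct pi_mutex *m)
{
	current->prio = current->base_prio;	/* drop inherited boost */
	m->owner = NULL;
	wake_highest_waiter(m);	/* wake waiters in priority order */
}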

From a real-time point of view, we would NEVER want to scan the task
list looking for someone to wake up.  We should know who to wake up from
the getgo.  Likewise, clutter in the run_list adds wasted cycles and
cache lines to the schedule process.
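
One way to know who to wake from the getgo is to keep the waiters in a
priority-sorted list, so that unlock just wakes the head.  A sketch (the
waiter struct is hypothetical; the list macros are the 2.4 ones):

struct waiter {
	struct list_head list;
	struct task_struct *task;
};

void add_waiter_prio(struct list_head *queue, struct waiter *w)
{
	struct list_head *pos;

	/* insert before the first waiter of lower priority */
	list_for_each(pos, queue) {
		struct waiter *q = list_entry(pos, struct waiter, list);
		if (q->task->rt_priority < w->task->rt_priority)
			break;
	}
	list_add_tail(&w->list, pos);	/* O(n) insert, O(1) wakeup */
}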

George



Re: Latency: allowing rescheduling while holding spin_locks

2001-01-15 Thread Roger Larsson

On Sunday 14 January 2001 01:06, george anzinger wrote:
> Nigel Gamble wrote:
> > On Sat, 13 Jan 2001, Roger Larsson wrote:
> > > A rethinking of the rescheduling strategy...
> >
> > Actually, I think you have more-or-less described how successful
> > preemptible kernels have already been developed, given that your
> > "sleeping spin locks" are really just sleeping mutexes (or binary
> > semaphores).
> >
> > 1.  Short critical regions are protected by spin_lock_irq().  The maximum
> > value of "short" is therefore bounded by the maximum time we are happy
> > to disable (local) interrupts - ideally ~100us.
> >
> > 2.  Longer regions are protected by sleeping mutexes.
> >
> > 3.  Algorithms are rearchitected until all of the highly contended locks
> > are of type 1, and only low contention locks are of type 2.
> >
> > This approach has the advantage that we don't need to use a no-preempt
> > count, and test it on exit from every spinlock to see if a preempting
> > interrupt that has caused a need_resched has occurred, since we won't
> > see the interrupt until it's safe to do the preemptive resched.
>
> I agree that this was true in days of yore.  But these days the irq
> instructions introduce serialization points and, methinks, may be much
> more time consuming than the "++, --, if (false)" that a preemption
> count implementation introduces.  Could someone with a knowledge of the
> hardware comment on this?
>
> I am not suggesting that the "++, --, if (false)" is faster than an
> interrupt, but that it is faster than cli, sti.  Of course we are
> assuming that there is stuff between the cli and the sti as there is
> between the ++ and the -- if (false).
>

The problem with the counting scheme is that you cannot schedule inside any
spinlock - you have to split them up. Maybe you will have to do that anyway.
But if your RT process never needs more memory, it should be quite safe.
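
A user-space RT process can arrange never to need more memory by locking
its address space up front.  A minimal example:

#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	/* pre-fault and pin all current and future pages, so the
	 * process never takes a page fault once it is running */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		perror("mlockall");
		return 1;
	}
	/* ... real-time work ... */
	return 0;
}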

The difference with a sleeping mutex is that it can be made lazier - keep it
in the runlist; there should be very few...

See first patch attempt.

(George, Nigel told me about your idea before I sent the previous mail,
so the major influence comes from you. But I do not think that this is
equivalent.)

/RogerL

Note: changed email...

--- ./linux/kernel/sched.c.orig	Sat Jan 13 19:19:20 2001
+++ ./linux/kernel/sched.c	Sat Jan 13 23:27:13 2001
@@ -144,7 +144,7 @@
 	 * Also, dont trigger a counter recalculation.
 	 */
 	weight = -1;
-	if (p->policy & SCHED_YIELD)
+	if (p->policy & (SCHED_YIELD | SCHED_SPINLOCK))
 		goto out;
 
 	/*
@@ -978,7 +978,7 @@
 	read_lock(&tasklist_lock);
 	p = find_process_by_pid(pid);
 	if (p)
-		retval = p->policy & ~SCHED_YIELD;
+		retval = p->policy & ~(SCHED_YIELD | SCHED_SPINLOCK);
 	read_unlock(&tasklist_lock);
 
 out_nounlock:
@@ -1267,3 +1267,54 @@
 	atomic_inc(&init_mm.mm_count);
 	enter_lazy_tlb(&init_mm, current, cpu);
 }
+
+void wakeup_spinlock_yielder(spinlock_t *lock)
+{
+	int need_resched = 0;
+	struct list_head *tmp;
+	struct task_struct *p;
+	unsigned long flags;
+
+	/* I do not like this part...
+	*   not SMP safe, the runqueue might change under us...
+	*   can not use spinlocks...
+	*   runlist might be long...
+	*/
+	local_irq_save(flags);
+	if (lock->spin) {
+		/* someone is "spinning" on it
+		 * it has to have higher prio than this
+		 * let go of ALL :-( spinning processes
+		 */
+		lock->spin = 0;
+
+		list_for_each(tmp, &runqueue_head) {
+			p = list_entry(tmp, struct task_struct, run_list);
+			if (p->policy & SCHED_SPINLOCK) {
+				p->policy &= ~SCHED_SPINLOCK;
+			}
+		}
+
+		need_resched = 1;
+	}
+	local_irq_restore(flags);
+
+	/* all higher prio will get a chance to run... */
+	if (need_resched)
+		schedule_running();
+}
+
+void schedule_spinlock(spinlock_t *lock)
+{
+	while (test_and_set(lock->lock)) {
+		/* note: owner can not race here, it has lower prio */
+
+		lock->spin = 1;
+		current->policy |= SCHED_SPINLOCK;
+		schedule_running();
+		/* will be released in priority order */
+	}
+}
+
+
+
+
--- ./linux/include/linux/sched.h.orig	Sat Jan 13 19:25:53 2001
+++ ./linux/include/linux/sched.h	Sat Jan 13 19:26:31 2001
@@ -119,6 +119,7 @@
  * yield the CPU for one re-schedule..
  */
 #define SCHED_YIELD		0x10
+#define SCHED_SPINLOCK  0x20
 
 struct sched_param {
 	int sched_priority;
--- ./linux/include/linux/spinlock.h.orig	Sat Jan 13 19:40:30 2001
+++ ./linux/include/linux/spinlock.h	Sat Jan 13 21:51:14 2001
@@ -66,16 +66,37 @@
 
 typedef struct {
 	volatile unsigned long lock;
+	??? queue;
 } spinlock_t;
 #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
 
+void wakeup_spinlock_yielder(spinlock_t *lock);
+void schedule_spinlock(spinlock_t *lock);
+
 #define spin_lock_init(x)	do { (x)->lock = 0; } while (0)
 #define spin_is_locked(lock)	(test_bit(0,(lock)))
-#define spin_trylock(lock)	(!test_and_set_bit(0,(lock)))
+#define spin_trylock(lock)	(!test_and_set_bit(0,(lock))) /* fail handled */
+
+#define spin_lock(x)		do { \
+	if (test_and_set((x)->lock)) \
+		schedule_spinlock(x); /* kind of yield, giving low goodness, sticky */ \
+} while (0)



Re: Latency: allowing rescheduling while holding spin_locks

2001-01-13 Thread george anzinger

Nigel Gamble wrote:
> 
> On Sat, 13 Jan 2001, Roger Larsson wrote:
> > A rethinking of the rescheduling strategy...
> 
> Actually, I think you have more-or-less described how successful
> preemptible kernels have already been developed, given that your
> "sleeping spin locks" are really just sleeping mutexes (or binary
> semaphores).
> 
> 1.  Short critical regions are protected by spin_lock_irq().  The maximum
> value of "short" is therefore bounded by the maximum time we are happy
> to disable (local) interrupts - ideally ~100us.
> 
> 2.  Longer regions are protected by sleeping mutexes.
> 
> 3.  Algorithms are rearchitected until all of the highly contended locks
> are of type 1, and only low contention locks are of type 2.
> 
> This approach has the advantage that we don't need to use a no-preempt
> count, and test it on exit from every spinlock to see if a preempting
> interrupt that has caused a need_resched has occurred, since we won't
> see the interrupt until it's safe to do the preemptive resched.

I agree that this was true in days of yore.  But these days the irq
instructions introduce serialization points and, methinks, may be much
more time consuming than the "++, --, if (false)" that a preemption
count implementation introduces.  Could someone with a knowledge of the
hardware comment on this?

I am not suggesting that the "++, --, if (false)" is faster than an
interrupt, but that it is faster than cli, sti.  Of course we are
assuming that there is stuff between the cli and the sti as there is
between the ++ and the -- if (false).

George

> 
> Nigel Gamble  [EMAIL PROTECTED]
> Mountain View, CA, USA. http://www.nrg.org/



Re: Latency: allowing rescheduling while holding spin_locks

2001-01-13 Thread Nigel Gamble

On Sat, 13 Jan 2001, Roger Larsson wrote:
> A rethinking of the rescheduling strategy...

Actually, I think you have more-or-less described how successful
preemptible kernels have already been developed, given that your
"sleeping spin locks" are really just sleeping mutexes (or binary
semaphores).

1.  Short critical regions are protected by spin_lock_irq().  The maximum
value of "short" is therefore bounded by the maximum time we are happy
to disable (local) interrupts - ideally ~100us.

2.  Longer regions are protected by sleeping mutexes.

3.  Algorithms are rearchitected until all of the highly contended locks
are of type 1, and only low contention locks are of type 2.

This approach has the advantage that we don't need to use a no-preempt
count, and test it on exit from every spinlock to see if a preempting
interrupt that has caused a need_resched has occurred, since we won't
see the interrupt until it's safe to do the preemptive resched.
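
In 2.4 terms the two types look like this in use (a sketch; the semaphore
stands in for the sleeping mutex):

#include <linux/spinlock.h>
#include <asm/semaphore.h>

static spinlock_t short_lock = SPIN_LOCK_UNLOCKED;
static DECLARE_MUTEX(long_sem);		/* semaphore initialized to 1 */

void short_region(void)
{
	spin_lock_irq(&short_lock);	/* type 1: irqs off, no preemption */
	/* ... at most ~100us of work ... */
	spin_unlock_irq(&short_lock);
}

void long_region(void)
{
	down(&long_sem);	/* type 2: may sleep, preemption stays enabled */
	/* ... longer, low-contention work, may block ... */
	up(&long_sem);
}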

Nigel Gamble  [EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/




Latency: allowing rescheduling while holding spin_locks

2001-01-13 Thread Roger Larsson

Hi,

A rethinking of the rescheduling strategy...

I have come to this conclusion.

A spinlock prevents other processes from entering that specific region.
But interrupts are still allowed, and they might delay execution of a
spin-locked region for an undefined (small, but nonzero) time.

Code with critical maximum times should use spin_lock_irq!

=> spin_locks are not about disallowing reschedules.


Prior to the introduction of spin locks it did not make sense to
allow reschedules in the kernel, since the big kernel lock was so big...
Any code that wanted to do anything but pure computation would
hit it very quickly.

Now with spin locks the situation is quite different...

[First assume UP kernel for simplicity]

Suppose you have two processes, one normal (P) and one high priority
(RTP).

P runs user code, makes a system call, enters a spin lock region.

Interrupt!

The interrupt service routine wakes up RTP, which marks P as need_resched,
and returns; on return from interrupt the kernel sees that a reschedule is
needed and does it, even though P is executing in the kernel and holding a
spin_lock.

RTP starts, and if it does not hit the same spin_lock, nothing special
happens until it goes to sleep again. But suppose it does!

RTP tries to get the spin_lock but fails. Since it is currently the
highest priority process and P holds the lock, it wants to reschedule to P
so that P can finish and RTP can get its own stuff done.

P runs the final part of its spin-locked region; upon spin_unlock it needs
to get RTP running.

Something like this:

spin_lock(lock)
{
	while (test_and_set(lock->lock)) {
		schedule_spinlock(lock); /* kind of yield, giving low goodness, sticky */
	}
}

spin_unlock(lock)
{
	clear(lock->lock);

	/* note: someone with higher prio than me,
	   might steal the lock from even higher prio waiters here */

	if (lock->queue)
		wakeup_spinlock_yielder(lock);
}


schedule_spinlock(lock)
{
	/* note: owner can not run here, it has lower prio */

	addqueue(lock->queue, current);

	current->policy |= SCHED_SPINLOCK;
	schedule();
}

wakeup_spinlock_yielder(lock)
{
	int need_resched = 0;
	int my_goodness = goodness(current);

	forall p in lock->queue {
		p->policy &= ~SCHED_SPINLOCK;
		if (goodness(p) > my_goodness)
			need_resched = 1;
	}

	if (need_resched)
		schedule();
}


A final note on spin_lock_irq: since it prevents IRQs, there will be no
requests to wake up any process during its locked region => no problems.

-- 
Home page:
  none currently