Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-22 Thread Paul McKenney


> > But if you are suppressing preemption in all read-side critical sections,
> > then wouldn't any already-preempted tasks be guaranteed to -not- be in
> > a read-side critical section, and therefore be guaranteed to be unaffected
> > by the update (in other words, wouldn't such tasks not need to be waited
> > for)?
>
> Ah, if you want to inc and dec all the time, yes.  But even if the
> performance isn't hurt, it's unneccessary, and something else people
> have to remember to do.

I must admit that free is a very good price.

> Simplicity is very nice.  And in the case of module unload, gives us
> the ability to avoid the distinction between "am I calling into a
> module?" and "is this fixed in the kernel?" at runtime.  A very good
> thing 8)

Is it also desirable to avoid the distinction between "the currently
executing code is in a module" and "the currently executing code is
fixed in the kernel"?

> Rusty.

  Thanx, Paul

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-17 Thread Rusty Russell

In message  you write:
> > Already preempted tasks.
> 
> But if you are suppressing preemption in all read-side critical sections,
> then wouldn't any already-preempted tasks be guaranteed to -not- be in
> a read-side critical section, and therefore be guaranteed to be unaffected
> by the update (in other words, wouldn't such tasks not need to be waited
> for)?

Ah, if you want to inc and dec all the time, yes.  But even if the
performance isn't hurt, it's unnecessary, and something else people
have to remember to do.

Simplicity is very nice.  And in the case of module unload, gives us
the ability to avoid the distinction between "am I calling into a
module?" and "is this fixed in the kernel?" at runtime.  A very good
thing 8)
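
As a rough sketch of the module-unload usage being discussed (the hook name
and call sequence below are made up for illustration, not code from any of
the patches):

    /* Illustrative only: tearing down a module whose entry points are
     * called with no read-side lock at all.  Names are made up.
     */
    extern void (*some_subsystem_hook)(void);   /* set by the module at init */
    extern void synchronize_kernel(void);

    static void unhook_module(void (*old_hook)(void))
    {
            if (some_subsystem_hook == old_hook)
                    some_subsystem_hook = NULL;  /* no new callers can enter */
            synchronize_kernel();                /* anyone already executing
                                                  * through old_hook has now
                                                  * finished */
            /* safe to unload the code old_hook pointed into */
    }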

Rusty.
--
Premature optmztion is rt of all evl. --DK

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-11 Thread Nigel Gamble

On Tue, 10 Apr 2001 [EMAIL PROTECTED] wrote:
> On Tue, Apr 10, 2001 at 09:08:16PM -0700, Paul McKenney wrote:
> > > Disabling preemption is a possible solution if the critical section is short
> > > - less than 100us - otherwise preemption latencies become a problem.
> > 
> > Seems like a reasonable restriction.  Of course, this same limit applies
> > to locks and interrupt disabling, right?
> 
> So supposing 1/2 us per update
>   lock process list
>   for every process update pgd
>   unlock process list
> 
> is ok if #processes <  200, but can cause some unspecified system failure
> due to a dependency on the 100us limit otherwise?

Only to a hard real-time system.

> And on a slower machine or with some heavy I/O possibilities 

I'm mostly interested in Linux in embedded systems, where we have a lot
of control over the overall system, such as how many processes are
running.  This makes it easier to control latencies than on a
general purpose computer.

> We have a tiny little kernel to worry about inRTLinux and it's quite 
> hard for us to keep track of all possible delays in such cases. How's this
> going to work for Linux?

The same way everything works for Linux:  with enough people around the
world interested in and working on these problems, they will be fixed.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-11 Thread Nigel Gamble

On Tue, 10 Apr 2001, Paul McKenney wrote:
> > Disabling preemption is a possible solution if the critical section is short
> > - less than 100us - otherwise preemption latencies become a problem.
> 
> Seems like a reasonable restriction.  Of course, this same limit
> applies to locks and interrupt disabling, right?

That's the goal I'd like to see us achieve in 2.5.  Interrupts are
already in this range (with a few notable exceptions), but there is
still the big kernel lock and a few other long held spin locks to deal
with.  So I want to make sure that any new locking scheme like the ones
under discussion play nicely with the efforts to achieve low-latency
Linux such as the preemptible kernel.

> > The implementation of synchronize_kernel() that Rusty and I
> > discussed earlier in this thread would work in other cases, such as
> > module unloading, where there was a concern that it was not
> > practical to have any sort of lock in the read-side code path and
> > the write side was not time critical.
> 
> True, but only if the synchronize_kernel() implementation is applied
> to UP kernels, also.

Yes, that is the idea.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-10 Thread yodaiken

On Tue, Apr 10, 2001 at 09:08:16PM -0700, Paul McKenney wrote:
> > Disabling preemption is a possible solution if the critical section is short
> > - less than 100us - otherwise preemption latencies become a problem.
> 
> Seems like a reasonable restriction.  Of course, this same limit applies
> to locks and interrupt disabling, right?

So supposing 1/2 us per update
lock process list
for every process update pgd
unlock process list

is ok if #processes <  200, but can cause some unspecified system failure
due to a dependency on the 100us limit otherwise?
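
Spelled out as kernel-style code (a sketch only -- update_pgd() here is
hypothetical, and the numbers are the ones above), the concern is roughly:

    /* Sketch: a preemption-suppressed walk over every task.  At ~0.5us per
     * iteration, 200 tasks already use up a 100us budget, and the hold time
     * keeps growing with the number of processes.
     */
    struct task_struct *p;

    read_lock(&tasklist_lock);       /* spinlock held => no preemption here */
    for_each_task(p)
            update_pgd(p);           /* hypothetical per-task update, ~0.5us */
    read_unlock(&tasklist_lock);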

And on a slower machine or with some heavy I/O possibilities 

We have a tiny little kernel to worry about in RTLinux and it's quite 
hard for us to keep track of all possible delays in such cases. How's this
going to work for Linux?


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-10 Thread Paul McKenney


> On Tue, 10 Apr 2001, Paul McKenney wrote:
> > The algorithms we have been looking at need to have absolute guarantees
> > that earlier activity has completed.  The most straightforward way to
> > guarantee this is to have the critical-section activity run with preemption
> > disabled.  Most of these code segments either take out locks or run
> > with interrupts disabled anyway, so there is little or no degradation of
> > latency in this case.  In fact, in many cases, latency would actually be
> > improved due to removal of explicit locking primitives.
> >
> > I believe that one of the issues that pushes in this direction is the
> > discovery that "synchronize_kernel()" could not be a nop in a UP kernel
> > unless the read-side critical sections disable preemption (either in
> > the natural course of events, or artificially if need be).  Andi or
> > Rusty can correct me if I missed something in the previous exchange...
> >
> > The read-side code segments are almost always quite short, and, again,
> > they would almost always otherwise need to be protected by a lock of
> > some sort, which would disable preemption in any event.
> >
> > Thoughts?
>
> Disabling preemption is a possible solution if the critical section is short
> - less than 100us - otherwise preemption latencies become a problem.

Seems like a reasonable restriction.  Of course, this same limit applies
to locks and interrupt disabling, right?

> The implementation of synchronize_kernel() that Rusty and I discussed
> earlier in this thread would work in other cases, such as module
> unloading, where there was a concern that it was not practical to have
> any sort of lock in the read-side code path and the write side was not
> time critical.

True, but only if the synchronize_kernel() implementation is applied to UP
kernels, also.

Thanx, Paul

> Nigel Gamble[EMAIL PROTECTED]
> Mountain View, CA, USA. http://www.nrg.org/
>
> MontaVista Software [EMAIL PROTECTED]

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-10 Thread Nigel Gamble

On Tue, 10 Apr 2001, Paul McKenney wrote:
> The algorithms we have been looking at need to have absolute guarantees
> that earlier activity has completed.  The most straightforward way to
> guarantee this is to have the critical-section activity run with preemption
> disabled.  Most of these code segments either take out locks or run
> with interrupts disabled anyway, so there is little or no degradation of
> latency in this case.  In fact, in many cases, latency would actually be
> improved due to removal of explicit locking primitives.
>
> I believe that one of the issues that pushes in this direction is the
> discovery that "synchronize_kernel()" could not be a nop in a UP kernel
> unless the read-side critical sections disable preemption (either in
> the natural course of events, or artificially if need be).  Andi or
> Rusty can correct me if I missed something in the previous exchange...
> 
> The read-side code segments are almost always quite short, and, again,
> they would almost always otherwise need to be protected by a lock of
> some sort, which would disable preemption in any event.
> 
> Thoughts?

Disabling preemption is a possible solution if the critical section is short
- less than 100us - otherwise preemption latencies become a problem.

The implementation of synchronize_kernel() that Rusty and I discussed
earlier in this thread would work in other cases, such as module
unloading, where there was a concern that it was not practical to have
any sort of lock in the read-side code path and the write side was not
time critical.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-10 Thread Paul McKenney


> > As you've observed, with the approach of waiting for all pre-empted tasks
> > to synchronize, the possibility of a task staying pre-empted for a long
> > time could affect the latency of an update/synchonize (though its hard for
> > me to judge how likely that is).
>
> It's very unlikely on a system that doesn't already have problems with
> CPU starvation because of runaway real-time tasks or interrupt handlers.

Agreed!

> First, preemption is a comparitively rare event with a mostly
> timesharing load, typically from 1% to 10% of all context switches.

Again, agreed!

> Second, the scheduler should not penalize the preempted task for being
> preempted, so that it should usually get to continue running as soon as
> the preempting task is descheduled, which is at most one timeslice for
> timesharing tasks.

The algorithms we have been looking at need to have absolute guarantees
that earlier activity has completed.  The most straightforward way to
guarantee this is to have the critical-section activity run with preemption
disabled.  Most of these code segments either take out locks or run
with interrupts disabled anyway, so there is little or no degradation of
latency in this case.  In fact, in many cases, latency would actually be
improved due to removal of explicit locking primitives.

I believe that one of the issues that pushes in this direction is the
discovery that "synchronize_kernel()" could not be a nop in a UP kernel
unless the read-side critical sections disable preemption (either in
the natural course of events, or artificially if need be).  Andi or
Rusty can correct me if I missed something in the previous exchange...
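
A minimal sketch of what that would allow (assuming every read-side critical
section runs with preemption suppressed; this is not code from any of the
patches under discussion):

    #ifdef CONFIG_SMP
    extern void synchronize_kernel(void);  /* wait for every CPU to pass
                                            * through a quiescent state */
    #else
    static inline void synchronize_kernel(void)
    {
            /* UP: the fact that we are running means no task is currently
             * inside a preemption-suppressed read-side critical section,
             * so there is nothing to wait for. */
    }
    #endif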

The read-side code segments are almost always quite short, and, again,
they would almost always otherwise need to be protected by a lock of
some sort, which would disable preemption in any event.

Thoughts?

  Thanx, Paul

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-09 Thread Nigel Gamble

On Mon, 9 Apr 2001 [EMAIL PROTECTED] wrote:
> As you've observed, with the approach of waiting for all pre-empted tasks
> to synchronize, the possibility of a task staying pre-empted for a long
> time could affect the latency of an update/synchonize (though its hard for
> me to judge how likely that is).

It's very unlikely on a system that doesn't already have problems with
CPU starvation because of runaway real-time tasks or interrupt handlers.

First, preemption is a comparatively rare event with a mostly
timesharing load, typically from 1% to 10% of all context switches.

Second, the scheduler should not penalize the preempted task for being
preempted, so that it should usually get to continue running as soon as
the preempting task is descheduled, which is at most one timeslice for
timesharing tasks.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/
MontaVista Software [EMAIL PROTECTED]

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-09 Thread bsuparna


>One question:
>isn't it the case that the alternative to using synchronize_kernel()
>is to protect the read side with explicit locks, which will themselves
>suppress preemption?  If so, why not just suppress preemption on the read
>side in preemptible kernels, and thus gain the simpler implementation
>of synchronize_kernel()?  You are not losing any preemption latency
>compared to a kernel that uses traditional locks, in fact, you should
>improve latency a bit since the lock operations are more expensive than
>are simple increments and decrements.  As usual, what am I missing
>here?  ;-)
>...
>...
>I still prefer suppressing preemption on the read side, though I
>suppose one could claim that this is only because I am -really-
>used to it.  ;-)

Since this point has come up, I just wanted to mention that it may still
be nice to be able to do without explicit locks on the read-side. This is
not so much for performance reasons (I agree with your assessment on that
point) as for convenience / flexibility in the kind of situations where
this concept (i.e. synchronize_kernel or read-copy-update) could be used.

For example, consider situations where it is an executable code block that
is being protected. The read side is essentially the execution of that code
block - i.e. every entry/exit into the code block.

This is perhaps the case with module unload races. Having to acquire a
read-lock explicitly before every entry point seems to reduce the
simplicity of the solution, doesn't it?

This is also the case with kernel code patching which, I agree, may appear
to be a rather unlikely application of this concept: handling races in
multi-byte code patching on a running kernel, a rather difficult problem
otherwise.  In this case, the read-side is totally unaware of the
possibility of an updater modifying the code, so it isn't even possible for
a read-lock to be acquired explicitly (if we wish to have the flexibility
of being able to patch any portion of the code).

Have been discussing this with Dipankar last week, so I realize that the
above situations were perhaps not what these locking mechanisms were
intended for, but just thought I'd bring up this perspective.

As you've observed, with the approach of waiting for all pre-empted tasks
to synchronize, the possibility of a task staying pre-empted for a long
time could affect the latency of an update/synchronize (though it's hard for
me to judge how likely that is). Besides, as Andi pointed out, there
probably are a lot of situations where the readers are not pre-emptible
anyway, so that waiting for all pre-empted tasks may be superfluous.

Given these possibilities, does it make sense to simply let the
updater/synchronize_kernel() specify an option indicating whether it would
wait for pre-empted tasks or not?
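
Purely as an illustration of that option (nothing like this exists in the
patches under discussion), the interface might look like:

    #define SYNC_WAIT_PREEMPTED  0x1   /* also wait for currently preempted tasks */

    void synchronize_kernel_flags(unsigned long flags);

    /* A caller whose readers are known never to run preemptible could pass 0,
     * while something like module unload might ask for the stronger guarantee
     * with SYNC_WAIT_PREEMPTED. */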

Regards
Suparna


  Suparna Bhattacharya
  IBM Software Lab, India
  E-mail : [EMAIL PROTECTED]
  Phone : 91-80-5267117, Extn : 2525


Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-07 Thread Paul McKenney


> > > > 2.   Isn't it possible to get in trouble even on a UP if a task
> > > >  is preempted in a critical region?  For example, suppose the
> > > >  preempting task does a synchronize_kernel()?
> > >
> > > Ugly. I guess one way to solve it would be to readd the 2.2 scheduler
> > > taskqueue, and just queue a scheduler callback in this case.
> >
> > Another approach would be to define a "really low" priority that noone
> > other than synchronize_kernel() was allowed to use.  Then the UP
> > implementation of synchronize_kernel() could drop its priority to
> > this level, yield the CPU, and know that all preempted tasks must
> > have obtained and voluntarily yielded the CPU before synchronize_kernel()
> > gets it back again.
>
> That just would allow nasty starvation, e.g. when someone runs a cpu intensive
> screensaver or a seti-at-home.

Good point!  I hereby withdraw my suggested use of ultra-low priorities
for UP implementations of synchronize_kernel().  ;-)

> > I still prefer suppressing preemption on the read side, though I
> > suppose one could claim that this is only because I am -really-
> > used to it.  ;-)
>
> For a lot of reader cases non-preemption by threads is guaranteed anyways --
> e.g.  anything that runs in interrupts, timers, tasklets and network softirq.
> I think that already covers a lot of interesting cases.

Good point again!  For example, this does cover most of the TCP/IP
cases, right?

  Thanx, Paul

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-07 Thread Paul McKenney


> > I see your point here, but need to think about it.  One question:
> > isn't it the case that the alternative to using synchronize_kernel()
> > is to protect the read side with explicit locks, which will themselves
> > suppress preemption?  If so, why not just suppress preemption on the read
> > side in preemptible kernels, and thus gain the simpler implementation
> > of synchronize_kernel()?  You are not losing any preemption latency
> > compared to a kernel that uses traditional locks, in fact, you should
> > improve latency a bit since the lock operations are more expensive than
> > are simple increments and decrements.  As usual, what am I missing
> > here?  ;-)
>
> Already preempted tasks.

But if you are suppressing preemption in all read-side critical sections,
then wouldn't any already-preempted tasks be guaranteed to -not- be in
a read-side critical section, and therefore be guaranteed to be unaffected
by the update (in other words, wouldn't such tasks not need to be waited
for)?
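
For illustration, such a preemption-suppressed read side might look roughly
like this (a sketch only: foo, global_foo and do_something_with() are made
up, and preempt_disable()/preempt_enable() stand in for whatever primitives
the preemptible-kernel patch provides):

    struct foo *global_foo;

    void reader(void)
    {
            struct foo *p;

            preempt_disable();        /* just a counter increment */
            p = global_foo;           /* pick up the current version */
            do_something_with(p);     /* must not block or yield here */
            preempt_enable();         /* counter decrement, maybe reschedule */
    }

A task that gets preempted can only be preempted outside such a region, so it
cannot be in the middle of a read-side critical section -- which is the point
above.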

> > Another approach would be to define a "really low" priority that noone
> > other than synchronize_kernel() was allowed to use.  Then the UP
> > implementation of synchronize_kernel() could drop its priority to
> > this level, yield the CPU, and know that all preempted tasks must
> > have obtained and voluntarily yielded the CPU before synchronize_kernel()
> > gets it back again.
>
> Or "never", because I'm running RC5 etc. 8(.

U...  Good point!  Never mind use of low priorities in UP kernels
for synchronize_kernel()...

   Thanx, Paul

Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel

2001-04-07 Thread Andi Kleen

On Fri, Apr 06, 2001 at 06:25:36PM -0700, Paul McKenney wrote:
> I see your point here, but need to think about it.  One question:
> isn't it the case that the alternative to using synchronize_kernel()
> is to protect the read side with explicit locks, which will themselves
> suppress preemption?  If so, why not just suppress preemption on the read
> side in preemptible kernels, and thus gain the simpler implementation
> of synchronize_kernel()?  You are not losing any preemption latency
> compared to a kernel that uses traditional locks, in fact, you should
> improve latency a bit since the lock operations are more expensive than
> are simple increments and decrements.  As usual, what am I missing
> here?  ;-)

You miss nothing I think. In fact it's already used (see below) 

> 
> > > 2.   Isn't it possible to get in trouble even on a UP if a task
> > >  is preempted in a critical region?  For example, suppose the
> > >  preempting task does a synchronize_kernel()?
> >
> > Ugly. I guess one way to solve it would be to readd the 2.2 scheduler
> > taskqueue, and just queue a scheduler callback in this case.
> 
> Another approach would be to define a "really low" priority that noone
> other than synchronize_kernel() was allowed to use.  Then the UP
> implementation of synchronize_kernel() could drop its priority to
> this level, yield the CPU, and know that all preempted tasks must
> have obtained and voluntarily yielded the CPU before synchronize_kernel()
> gets it back again.

That just would allow nasty starvation, e.g. when someone runs a cpu intensive
screensaver or a seti-at-home.

> 
> I still prefer suppressing preemption on the read side, though I
> suppose one could claim that this is only because I am -really-
> used to it.  ;-)

For a lot of reader cases non-preemption by threads is guaranteed anyways -- 
e.g.  anything that runs in interrupts, timers, tasklets and network softirq.  
I think that already covers a lot of interesting cases.
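
For instance, a reader that already runs in one of those contexts needs
nothing extra (sketch; struct foo, global_foo and do_something_with() are
made up):

    /* A tasklet or timer handler already runs with preemption excluded,
     * so it is a read-side critical section "for free".
     */
    static void my_tasklet_fn(unsigned long data)
    {
            struct foo *p = global_foo;   /* no explicit preempt_disable() needed */

            do_something_with(p);
    }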


-Andi


Re: [PATCH for 2.5] preemptible kernel

2001-04-07 Thread Rusty Russell

In message  you write:
> > Priority inversion is not handled in Linux kernel ATM BTW, there
> > are already situations where a realtime task can cause a deadlock
> > with some lower priority system thread (I believe there is at least
> > one case of this known with realtime ntpd on 2.4)
> 
> I see your point here, but need to think about it.  One question:
> isn't it the case that the alternative to using synchronize_kernel()
> is to protect the read side with explicit locks, which will themselves
> suppress preemption?  If so, why not just suppress preemption on the read
> side in preemptible kernels, and thus gain the simpler implementation
> of synchronize_kernel()?  You are not losing any preemption latency
> compared to a kernel that uses traditional locks, in fact, you should
> improve latency a bit since the lock operations are more expensive than
> are simple increments and decrements.  As usual, what am I missing
> here?  ;-)

Already preempted tasks.

> Another approach would be to define a "really low" priority that noone
> other than synchronize_kernel() was allowed to use.  Then the UP
> implementation of synchronize_kernel() could drop its priority to
> this level, yield the CPU, and know that all preempted tasks must
> have obtained and voluntarily yielded the CPU before synchronize_kernel()
> gets it back again.

Or "never", because I'm running RC5 etc. 8(.

Rusty.
--
Premature optmztion is rt of all evl. --DK

Re: [PATCH for 2.5] preemptible kernel

2001-04-07 Thread Paul McKenney


Andi, thank you for the background!  More comments interspersed...

> On Fri, Apr 06, 2001 at 04:52:25PM -0700, Paul McKenney wrote:
> > 1.   On a busy system, isn't it possible for a preempted task
> >  to stay preempted for a -long- time, especially if there are
> >  lots of real-time tasks in the mix?
>
> The problem you're describing is probably considered too hard to
> solve properly (bad answer, but that is how it is currently)
>
> Yes there is. You can also force a normal (non preemptive) kernel
> into complete livelock by just giving it enough network traffic
> to do, so that it always works in the high priority network
> softirq or doing the same with some other interrupt.
>
> Just when this happens a lot of basic things will stop working (like
> page cleaning, IO flushing etc.), so your callbacks or module unloads
> not running are probably the least of your worries.
>
> The same problem applies to a smaller scale to real time processes;
> kernel services normally do not run real-time, so they can be starved.
>
> Priority inversion is not handled in Linux kernel ATM BTW, there
> are already situations where a realtime task can cause a deadlock
> with some lower priority system thread (I believe there is at least
> one case of this known with realtime ntpd on 2.4)

I see your point here, but need to think about it.  One question:
isn't it the case that the alternative to using synchronize_kernel()
is to protect the read side with explicit locks, which will themselves
suppress preemption?  If so, why not just suppress preemption on the read
side in preemptible kernels, and thus gain the simpler implementation
of synchronize_kernel()?  You are not losing any preemption latency
compared to a kernel that uses traditional locks, in fact, you should
improve latency a bit since the lock operations are more expensive than
are simple increments and decrements.  As usual, what am I missing
here?  ;-)

> > 2.   Isn't it possible to get in trouble even on a UP if a task
> >  is preempted in a critical region?  For example, suppose the
> >  preempting task does a synchronize_kernel()?
>
> Ugly. I guess one way to solve it would be to readd the 2.2 scheduler
> taskqueue, and just queue a scheduler callback in this case.

Another approach would be to define a "really low" priority that no one
other than synchronize_kernel() was allowed to use.  Then the UP
implementation of synchronize_kernel() could drop its priority to
this level, yield the CPU, and know that all preempted tasks must
have obtained and voluntarily yielded the CPU before synchronize_kernel()
gets it back again.

I still prefer suppressing preemption on the read side, though I
suppose one could claim that this is only because I am -really-
used to it.  ;-)

  Thanx, Paul

Re: [PATCH for 2.5] preemptible kernel

2001-04-06 Thread Andi Kleen

Hallo, 

On Fri, Apr 06, 2001 at 04:52:25PM -0700, Paul McKenney wrote:
> 1.   On a busy system, isn't it possible for a preempted task
>  to stay preempted for a -long- time, especially if there are
>  lots of real-time tasks in the mix?

The problem you're describing is probably considered too hard to solve
properly (bad answer, but that is how it is currently)

Yes there is. You can also force a normal (non preemptive) kernel into
complete livelock by just giving it enough network traffic to do, so that it
always works in the high priority network softirq or doing the same with some
other interrupt.

Just when this happens a lot of basic things will stop working (like page cleaning, IO 
flushing etc.), so your callbacks or module unloads not running are probably the least
of your worries.  

The same problem applies on a smaller scale to real time processes; kernel
services normally do not run real-time, so they can be starved.

Priority inversion is not handled in Linux kernel ATM BTW, there are already situations
where a realtime task can cause a deadlock with some lower priority system thread
(I believe there is at least one case of this known with realtime ntpd on 2.4) 

> 2.   Isn't it possible to get in trouble even on a UP if a task
>  is preempted in a critical region?  For example, suppose the
>  preempting task does a synchronize_kernel()?

Ugly. I guess one way to solve it would be to readd the 2.2 scheduler taskqueue,
and just queue a scheduler callback in this case.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-06 Thread Paul McKenney

Please accept my apologies if I am missing something basic, but...

1.   On a busy system, isn't it possible for a preempted task
 to stay preempted for a -long- time, especially if there are
 lots of real-time tasks in the mix?

2.   Isn't it possible to get in trouble even on a UP if a task
 is preempted in a critical region?  For example, suppose the
 preempting task does a synchronize_kernel()?

If I understand the preemptible kernel patch, suppressing preemption
is quite cheap -- just increment preempt_count before and decrement
it after, with no locks, atomic operations, or disabling of interrupts
required.  I understand that disabling preemption for any long time
is a Bad Thing, however, Dipankar's patch is able to detect situations
where preemption has been suppressed for too long.  In addition, Rusty's
original synchronize_kernel() patch is much simpler than the two that
wait for preempted tasks to reach a voluntary context switch.
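
To make the cost concrete, here is a purely illustrative sketch of such a
read-side bracket.  The names are placeholders (ctx_sw_off()/ctx_sw_on() as
floated elsewhere in this thread), not code from any posted patch:

/*
 * Illustrative sketch only: the read side just brackets its access
 * with the per-task preemption counter -- no locks, no atomic ops,
 * no interrupt disabling.
 */
struct foo {
	int key;
	int data;
	struct foo *next;
};

static struct foo *foo_list;		/* replaced wholesale by updaters */

static int foo_lookup(int key, int *val)
{
	struct foo *f;
	int found = 0;

	ctx_sw_off();			/* preempt_count++, in effect */
	for (f = foo_list; f != NULL; f = f->next) {
		if (f->key == key) {
			*val = f->data;
			found = 1;
			break;
		}
	}
	ctx_sw_on();			/* preempt_count--, quiescent again */
	return found;
}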

What am I missing here?

   Thanx, Paul

> > Here is an attempt at a possible version of synchronize_kernel() that
> > should work on a preemptible kernel. I haven't tested it yet.
>
> It's close, but...
>
> Those who suggest that we don't do preemtion on SMP make this much
> easier (synchronize_kernel() is a NOP on UP), and I'm starting to
> agree with them. Anyway:
>
> > if (p->state == TASK_RUNNING ||
> > (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> > p->flags |= PF_SYNCING;
>
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> here I think. You could do it for TASK_PREEMPTED only, but you'd have
> to do the "unreal priority" part of synchronize_kernel() with some
> method to say "don't preempt anyone", but it will hurt latency.
> Hmmm...
>
> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock. This looks like (and I prefer wait queues, they have such nice
> helpers):
>
> static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
> static DECLARE_MUTEX(synchronize_kernel_mtx);
> static int sync_count = 0;
>
> schedule():
> if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> if (--sync_count == 0) wake_up(&syncing_task);
>
> synchronize_kernel():
> {
> struct list_head *tmp;
> struct task_struct *p;
>
> /* Guard against multiple calls to this function */
> down(&synchronize_kernel_mtx);
>
> /* Everyone running now or currently preempted must
>voluntarily schedule before we know we are safe. */
> spin_lock_irq(&runqueue_lock);
> list_for_each(tmp, &runqueue_head) {
> p = list_entry(tmp, struct task_struct, run_list);
> if (p->has_cpu || p->state == (TASK_RUNNING|TASK_PREEMPTED)) {
> p->syncing = 1;
> sync_count++;
> }
> }
> spin_unlock_irq(&runqueue_lock);
>
> /* Wait for them all */
> wait_event(syncing_task, sync_count == 0);
> up(&synchronize_kernel_mtx);
> }
>
> Also untested 8),
> Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-05 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> > Setting a running task's flags brings races, AFAICT, and checking
> > p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> > here I think.
> 
> My thought here was that if p->state is anything other than TASK_RUNNING
> or TASK_RUNNING|TASK_PREEMPTED, then that task is already at a
> synchronize point,

Right.  Theoretically possible to set p->state and not sleep, but it's
not a bad assumption.

> > schedule():
> > if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> > if (--sync_count == 0) wake_up(_task);
> 
> Don't forget to reset prev->syncing.

Right.

> I agree with you about wait queues, but didn't use them here because
> of the problem of avoiding deadlock on the runqueue lock, which the
> wait queues also use.

And right again.  Yeah, it has to be done manually.  Ack, can I
retract that Email?

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-05 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> On a higher level, I think the scanning of the run list to set flags and
> counters is a bit off.

[snip standard refcnt scheme]

For most things, refcnts are great.  I use them in connection
tracking.  But when writes can be insanely slow (eg. once per hour),
those two atomic ops just hurt scalability, and are invasive.

In particular, they suck HARD for modules, which is where my initial
quiescent wait implementation came from, only I didn't call it that
because Andi hadn't posted the Sequent paper URL yet and I hadn't
figured out that this was a generically useful scheme that can be
implemented in 20 lines...
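
For contrast, the refcnt scheme in question is just the usual get/put pair --
a generic sketch, not the conntrack code:

/* Two atomic ops per read; the refcount cacheline bounces between CPUs. */
struct thing {
	atomic_t refcnt;
	/* ... payload ... */
};

static inline struct thing *thing_get(struct thing *t)
{
	atomic_inc(&t->refcnt);
	return t;
}

static inline void thing_put(struct thing *t)
{
	if (atomic_dec_and_test(&t->refcnt))
		kfree(t);		/* last user frees */
}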

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-02 Thread george anzinger

Nigel Gamble wrote:
> 
> On Sat, 31 Mar 2001, george anzinger wrote:
> > I think this should be:
> > if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
> > to catch tasks that were preempted with other states.
> 
> But the other states are all part of the state change that happens at a
> non-preemptive schedule() point, aren't they, so those tasks are already
> safe to access the data we are protecting.
> 
If you're saying that the task's "thinking" about a state change is
sufficient, ok.  The point is that a task changes its state prior to
calling schedule() and then, sometimes, doesn't call schedule and just
changes its state back to running.  Preemption can happen at any of
these times, after all that is what the TASK_PREEMPTED flag is used for.

On a higher level, I think the scanning of the run list to set flags and
counters is a bit off.  If these things need to be counted and kept
track of, the tasks should be doing it "in the moment" not some other
task at some distant time.  For example if what is being protected is a
data structure, a counter could be put in the structure that keeps track
of the number of tasks that are interested in seeing it stay around.  As
I understand the objective of the method being explored, a writer wants
to change the structure, but old readers can continue to use the old
while new readers get the new structure.  The issue then is when to
"garbage collect" the no longer used structures.  It seems to me that
the pointer to the structure can be changed to point to the new one when
the writer has it set up.  Old users continue to use the old.  When they
are done, they decrement the use count.  When the use count goes to
zero, it is time to "garbage collect".  At this time, the "garbage man"
is called (one simple one would be to check if the structure is still
the one a "new" task would get).  Various methods exist for determing
how and if the "garbage man" should be called, but this sort of thing,
IMNSHO, does NOT belong in schedule().
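
A rough sketch of the sort of per-structure use count I mean (hypothetical
names; the writer is assumed to publish the new copy, mark the old one
obsolete and free it immediately if it already has no users, all under the
same lock that only covers this bookkeeping):

struct shared {
	int users;		/* tasks interested in this copy */
	int obsolete;		/* a newer copy has been published */
	/* ... data ... */
};

static struct shared *live;	/* the copy a "new" task would get */
static spinlock_t shared_lock = SPIN_LOCK_UNLOCKED;

static struct shared *shared_get(void)
{
	struct shared *s;

	spin_lock(&shared_lock);
	s = live;
	s->users++;
	spin_unlock(&shared_lock);
	return s;
}

static void shared_put(struct shared *s)
{
	int collect;

	spin_lock(&shared_lock);
	collect = (--s->users == 0 && s->obsolete);
	spin_unlock(&shared_lock);
	if (collect)
		kfree(s);	/* the "garbage man" */
}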

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-01 Thread Nigel Gamble

On Sat, 31 Mar 2001, Rusty Russell wrote:
> > if (p->state == TASK_RUNNING ||
> > (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> > p->flags |= PF_SYNCING;
> 
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> here I think.

My thought here was that if p->state is anything other than TASK_RUNNING
or TASK_RUNNING|TASK_PREEMPTED, then that task is already at a
synchronize point, so we don't need to wait for it to arrive at another
one - it will get a consistent view of the data we are protecting.
wait_event() qualifies as a synchronize point, doesn't it?  Or am I
missing something?

> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock.  This looks like (and I prefer wait queues, they have such nice
> helpers):
> 
>   static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
>   static DECLARE_MUTEX(synchronize_kernel_mtx);
>   static int sync_count = 0;
> 
> schedule():
>   if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
>   if (--sync_count == 0) wake_up(&syncing_task);

Don't forget to reset prev->syncing.  I agree with you about wait
queues, but didn't use them here because of the problem of avoiding
deadlock on the runqueue lock, which the wait queues also use.  The
above code in schedule needs the runqueue lock to protect sync_count.


Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-04-01 Thread Nigel Gamble

On Sat, 31 Mar 2001, george anzinger wrote:
> I think this should be:
> if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
> to catch tasks that were preempted with other states.

But the other states are all part of the state change that happens at a
non-preemptive schedule() point, aren't they, so those tasks are already
safe to access the data we are protecting.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-31 Thread george anzinger

Rusty Russell wrote:
> 
> In message <[EMAIL PROTECTED]> you write:
> > Here is an attempt at a possible version of synchronize_kernel() that
> > should work on a preemptible kernel.  I haven't tested it yet.
> 
> It's close, but...
> 
> Those who suggest that we don't do preemption on SMP make this much
> easier (synchronize_kernel() is a NOP on UP), and I'm starting to
> agree with them.  Anyway:
> 
> >   if (p->state == TASK_RUNNING ||
> >   (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> >   p->flags |= PF_SYNCING;
> 
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> here I think.  You could do it for TASK_PREEMPTED only, but you'd have
> to do the "unreal priority" part of synchronize_kernel() with some
> method to say "don't preempt anyone", but it will hurt latency.
> Hmmm...
> 
> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock.  This looks like (and I prefer wait queues, they have such nice
> helpers):
> 
> static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
> static DECLARE_MUTEX(synchronize_kernel_mtx);
> static int sync_count = 0;
> 
> schedule():
> if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> if (--sync_count == 0) wake_up(&syncing_task);
> 
> synchronize_kernel():
> {
> struct list_head *tmp;
> struct task_struct *p;
> 
> /* Guard against multiple calls to this function */
> down(&synchronize_kernel_mtx);
> 
> /* Everyone running now or currently preempted must
>voluntarily schedule before we know we are safe. */
> spin_lock_irq(&runqueue_lock);
> list_for_each(tmp, &runqueue_head) {
> p = list_entry(tmp, struct task_struct, run_list);
> if (p->has_cpu || p->state == (TASK_RUNNING|TASK_PREEMPTED)) {
I think this should be:
  if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
to catch tasks that were preempted with other states.  The lse Multi
Queue scheduler folks are going to love this.

George

> p->syncing = 1;
> sync_count++;
> }
> }
> spin_unlock_irq(&runqueue_lock);
> 
> /* Wait for them all */
> wait_event(syncing_task, sync_count == 0);
> up(&synchronize_kernel_mtx);
> }
> 
> Also untested 8),
> Rusty.
> --
> Premature optmztion is rt of all evl. --DK
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-30 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> Here is an attempt at a possible version of synchronize_kernel() that
> should work on a preemptible kernel.  I haven't tested it yet.

It's close, but...

Those who suggest that we don't do preemption on SMP make this much
easier (synchronize_kernel() is a NOP on UP), and I'm starting to
agree with them.  Anyway:

>   if (p->state == TASK_RUNNING ||
>   (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
>   p->flags |= PF_SYNCING;

Setting a running task's flags brings races, AFAICT, and checking
p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
here I think.  You could do it for TASK_PREEMPTED only, but you'd have
to do the "unreal priority" part of synchronize_kernel() with some
method to say "don't preempt anyone", but it will hurt latency.
Hmmm...

The only way I can see is to have a new element in "struct
task_struct" saying "syncing now", which is protected by the runqueue
lock.  This looks like (and I prefer wait queues, they have such nice
helpers):

static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
static DECLARE_MUTEX(synchronize_kernel_mtx);
static int sync_count = 0;

schedule():
if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
if (--sync_count == 0) wake_up(&syncing_task);

synchronize_kernel():
{
struct list_head *tmp;
struct task_struct *p;

/* Guard against multiple calls to this function */
down(&synchronize_kernel_mtx);

/* Everyone running now or currently preempted must
   voluntarily schedule before we know we are safe. */
spin_lock_irq(&runqueue_lock);
list_for_each(tmp, &runqueue_head) {
p = list_entry(tmp, struct task_struct, run_list);
if (p->has_cpu || p->state == (TASK_RUNNING|TASK_PREEMPTED)) {
p->syncing = 1;
sync_count++;
}
}
spin_unlock_irq(&runqueue_lock);

/* Wait for them all */
wait_event(syncing_task, sync_count == 0);
up(&synchronize_kernel_mtx);
}

Also untested 8),
Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-29 Thread Keith Owens

On Wed, 28 Mar 2001 12:51:02 -0800, 
george anzinger <[EMAIL PROTECTED]> wrote:
>Dipankar Sarma wrote:
>> 1. Disable pre-emption during the time when references to data
>> structures
>> updated using such Two-phase updates are held.
>
>Doesn't this fly in the face of the whole Two-phase system?  It seems to
>me that the point was to not require any locks.  Preemption disable IS a
>lock.  Not as strong as some, but a lock none the less.

The aim is to remove all module locks from the main kernel path and
move the penalty for module unloading into the task doing the unload.
Only the unload code needs to worry about preemption, not the main
kernel code paths, so main line code runs faster and scales better.  I
don't care how long (within reason) it takes to unload a module, it is
such a rare event.  I want module unload to be race free with zero
impact on the fast code paths, the current design penalizes the fast
code paths.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-29 Thread Nigel Gamble

On Tue, 20 Mar 2001, Nigel Gamble wrote:
> On Tue, 20 Mar 2001, Rusty Russell wrote:
> > Thoughts?
> 
> Perhaps synchronize_kernel() could take the run_queue lock, mark all the
> tasks on it and count them.  Any task marked when it calls schedule()
> voluntarily (but not if it is preempted) is unmarked and the count
> decremented.  synchronize_kernel() continues until the count is zero.

Hi Rusty,

Here is an attempt at a possible version of synchronize_kernel() that
should work on a preemptible kernel.  I haven't tested it yet.


static int sync_count = 0;
static struct task_struct *syncing_task = NULL;
static DECLARE_MUTEX(synchronize_kernel_mtx);

void
synchronize_kernel()
{
struct list_head *tmp;
struct task_struct *p;

/* Guard against multiple calls to this function */
down(&synchronize_kernel_mtx);

/* Mark all tasks on the runqueue */
spin_lock_irq(&runqueue_lock);
list_for_each(tmp, &runqueue_head) {
p = list_entry(tmp, struct task_struct, run_list);
if (p == current)
continue;
if (p->state == TASK_RUNNING ||
(p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
p->flags |= PF_SYNCING;
sync_count++;
}
}
if (sync_count == 0)
goto out;

syncing_task = current;
spin_unlock_irq(&runqueue_lock);

/*
 * Cause a schedule on every CPU, as for a non-preemptible
 * kernel
 */

/* Save current state */
cpus_allowed = current->cpus_allowed;
policy = current->policy;
rt_priority = current->rt_priority;

/* Create an unreal time task. */
current->policy = SCHED_FIFO;
current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

/* Make us schedulable on all CPUs. */
current->cpus_allowed = (1UL<<smp_num_cpus)-1;

/* Eliminate current cpu, reschedule */
while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
schedule();

/* Back to normal. */
current->cpus_allowed = cpus_allowed;
current->policy = policy;
current->rt_priority = rt_priority;

/*
 * Wait, if necessary, until all preempted tasks
 * have reached a sync point.
 */

spin_lock_irq(&runqueue_lock);
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (sync_count == 0)
break;
spin_unlock_irq(&runqueue_lock);
schedule();
spin_lock_irq(&runqueue_lock);
}
current->state = TASK_RUNNING;
syncing_task =  NULL;
out:
spin_unlock_irq(&runqueue_lock);

up(&synchronize_kernel_mtx);
}

And add this code to the beginning of schedule(), just after the
runqueue_lock is taken (the flags field is probably not be the right
place to put the synchronize mark; and the test should be optimized for
the fast path in the same way as the other tests in schedule(), but you
get the idea):

if ((prev->flags & PF_SYNCING) && !(prev->state & TASK_PREEMPTED)) {
prev->flags &= ~PF_SYNCING;
if (--sync_count == 0) {
syncing_task->state = TASK_RUNNING;
if (!task_on_runqueue(syncing_task))
add_to_runqueue(syncing_task);
syncing_task = NULL;
}

}
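
For reference, the updater that this primitive is meant to support would
look something like the following -- a generic sketch of the two-phase
update, not part of this patch:

struct foo;				/* whatever is being protected */

static struct foo *global_foo;		/* readers follow this pointer */
static spinlock_t foo_update_lock = SPIN_LOCK_UNLOCKED;

void foo_replace(struct foo *new)
{
	struct foo *old;

	spin_lock(&foo_update_lock);	/* writers still serialize */
	old = global_foo;
	global_foo = new;		/* phase 1: switch readers over */
	spin_unlock(&foo_update_lock);

	synchronize_kernel();		/* wait out all current readers */

	kfree(old);			/* phase 2: now safe to free */
}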

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-29 Thread Dipankar Sarma

On Wed, Mar 28, 2001 at 12:51:02PM -0800, george anzinger wrote:
> Dipankar Sarma wrote:
> > 
> > Also, a task could be preempted and then rescheduled on the same cpu
> > making
> > the depth counter 0 (right ?), but it could still be holding references
> > to data
> > structures to be updated using synchronize_kernel(). There seems to be
> > two
> > approaches to tackle preemption -
> > 
> > 1. Disable pre-emption during the time when references to data
> > structures
> > updated using such Two-phase updates are held.
> 
> Doesn't this fly in the face of the whole Two-phase system?  It seems to
> me that the point was to not require any locks.  Preemption disable IS a
> lock.  Not as strong as some, but a lock none the less.

The point is to avoid acquiring costly locks, so it is a question of
relative cost. Disabling preemption (by an atomic increment) for 
short critical sections may not be as bad as spin-waiting for 
highly contended locks or thrashing lock cachelines.


> > 
> > Pros: easy to implement using a flag (ctx_sw_off() ?)
> > Cons: not so easy to use since critical sections need to be clearly
> > identified and interfaces defined. also affects preemptive behavior.
> > 
> > 2. In synchronize_kernel(), distinguish between "natural" and preemptive
> > schedules() and ignore preemptive ones.
> > 
> > Pros: easy to use
> > Cons: Not so easy to implement. Also a low priority task that keeps
> > getting
> > preempted often can affect update side performance significantly.
> 
> Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
> state).  Don't you also have to have some sort of task flag that
> indicates that the task is one that needs to sync?  Something that gets
> set when it enters the area of interest and cleared when it hits the
> sync point?  

None of the two two-phase update implementations (synchronize_kernel())
by Rusty and read-copy update by us, monitor the tasks that require
sync for update. synchronize_kernel() forces a schedule on every
cpu and read-copy update waits until every cpu goes through
a quiescent state, before updating. Both approaches will require
significant special handling because they implicitly assume 
that tasks inside the kernel are bound to the current cpu until it
reaches a quiescent state (like a "normal"
context switch). Since preempted tasks can run later on any cpu, we
will have to keep track of sync points on a per-task basis and
that will probably require using a snapshot of the running
tasks from the global runqueue. That may not be a good thing
from performance standpoint, not to mention the complexity.

Also, in situations where read-to-write ratio is not heavily
skewed towards read or lots of updates happening, a very low
priority task can have a significant impact on performance
by getting preempted all the time.

Thanks
Dipankar
-- 
Dipankar Sarma  ([EMAIL PROTECTED])
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-28 Thread george anzinger

Dipankar Sarma wrote:
> 
> Nigel Gamble wrote:
> >
> > On Wed, 21 Mar 2001, Keith Owens wrote:
> > > I misread the code, but the idea is still correct.  Add a preemption
> > > depth counter to each cpu, when you schedule and the depth is zero then
> > > you know that the cpu is no longer holding any references to quiesced
> > > structures.
> >
> > A task that has been preempted is on the run queue and can be
> > rescheduled on a different CPU, so I can't see how a per-CPU counter
> > would work.  It seems to me that you would need a per run queue
> > counter, like the example I gave in a previous posting.
> 
> Also, a task could be preempted and then rescheduled on the same cpu
> making
> the depth counter 0 (right ?), but it could still be holding references
> to data
> structures to be updated using synchronize_kernel(). There seems to be
> two
> approaches to tackle preemption -
> 
> 1. Disable pre-emption during the time when references to data
> structures
> updated using such Two-phase updates are held.

Doesn't this fly in the face of the whole Two-phase system?  It seems to
me that the point was to not require any locks.  Preemption disable IS a
lock.  Not as strong as some, but a lock none the less.
> 
> Pros: easy to implement using a flag (ctx_sw_off() ?)
> Cons: not so easy to use since critical sections need to be clearly
> identified and interfaces defined. also affects preemptive behavior.
> 
> 2. In synchronize_kernel(), distinguish between "natural" and preemptive
> schedules() and ignore preemptive ones.
> 
> Pros: easy to use
> Cons: Not so easy to implement. Also a low priority task that keeps
> getting
> preempted often can affect update side performance significantly.

Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
state).  Don't you also have to have some sort of task flag that
indicates that the task is one that needs to sync?  Something that gets
set when it enters the area of interest and cleared when it hits the
sync point?  

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-28 Thread Dipankar Sarma

Hi George,

george anzinger wrote:
> 
> Exactly so.  The method does not depend on the sum of preemption being
> zip, but on each potential reader (writers take locks) passing thru a
> "sync point".  Your notion of waiting for each task to arrive
> "naturally" at schedule() would work.  It is, in fact, over kill as you
> could also add arrival at sys call exit as a (the) "sync point".  In
> fact, for module unload, isn't this the real "sync point"?  After all, a
> module can call schedule, or did I miss a usage counter somewhere?

It is certainly possible to implement synchronize_kernel() like
primitive for two phase update using "sync point". Waiting for
sys call exit will perhaps work in the module unloading case,
but there could be performance issues if a cpu spends most of
its time in idle task/interrupts. synchronize_kernel() provides
a simple generic way of implementing a two phase update without
serialization for reading.

I am working on a "sync point" based version of such an approach,
available at http://lse.sourceforge.net/locking/rclock.html. It
is based on the original DYNIX/ptx stuff that Paul McKenney
developed in the early 90s. This and synchronize_kernel() are
very similar in approach and each can be implemented using 
the other.

As for handling preemption, we can perhaps try 2 things -

1. The read side of the critical section is enclosed in
RC_RDPROTECT()/RC_RDUNPROTECT() which are currently nops.
We can disable/enable preemption using these (see the sketch below).

2. Avoid counting preemptive context switches. I am not sure
about this one though.
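
For (1), the change would be tiny -- something like this sketch, using the
ctx_sw_off()/ctx_sw_on() names mentioned elsewhere in this thread
(illustrative only, not the actual rclock code):

#define RC_RDPROTECT()		ctx_sw_off()	/* no preemption while reading */
#define RC_RDUNPROTECT()	ctx_sw_on()

so that the read side stays lock-free:

	RC_RDPROTECT();
	/* ... traverse the shared data structure ... */
	RC_RDUNPROTECT();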

> 
> By the way, there is a paper on this somewhere on the web.  Anyone
> remember where?

If you are talking about Paul's paper, the link is
http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf.

Thanks
Dipankar
-- 
Dipankar Sarma  ([EMAIL PROTECTED])
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-28 Thread Dipankar Sarma

Nigel Gamble wrote:
> 
> On Wed, 21 Mar 2001, Keith Owens wrote:
> > I misread the code, but the idea is still correct.  Add a preemption
> > depth counter to each cpu, when you schedule and the depth is zero then
> > you know that the cpu is no longer holding any references to quiesced
> > structures.
> 
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work.  It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Also, a task could be preempted and then rescheduled on the same cpu,
making the depth counter 0 (right ?), but it could still be holding
references to data structures to be updated using synchronize_kernel().
There seem to be two approaches to tackle preemption -

1. Disable pre-emption during the time when references to data
structures updated using such Two-phase updates are held.

Pros: easy to implement using a flag (ctx_sw_off() ?)
Cons: not so easy to use since critical sections need to be clearly
identified and interfaces defined. also affects preemptive behavior.

2. In synchronize_kernel(), distinguish between "natural" and preemptive
schedules() and ignore preemptive ones.

Pros: easy to use
Cons: Not so easy to implement. Also a low priority task that keeps
getting preempted often can affect update side performance significantly.

I intend to experiment with both to understand the impact.

Thanks
Dipankar
-- 
Dipankar Sarma  ([EMAIL PROTECTED])
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-28 Thread george anzinger

Dipankar Sarma wrote:
 
 Nigel Gamble wrote:
 
  On Wed, 21 Mar 2001, Keith Owens wrote:
   I misread the code, but the idea is still correct.  Add a preemption
   depth counter to each cpu, when you schedule and the depth is zero then
   you know that the cpu is no longer holding any references to quiesced
   structures.
 
  A task that has been preempted is on the run queue and can be
  rescheduled on a different CPU, so I can't see how a per-CPU counter
  would work.  It seems to me that you would need a per run queue
  counter, like the example I gave in a previous posting.
 
 Also, a task could be preempted and then rescheduled on the same cpu
 making
 the depth counter 0 (right ?), but it could still be holding references
 to data
 structures to be updated using synchronize_kernel(). There seems to be
 two
 approaches to tackle preemption -
 
 1. Disable pre-emption during the time when references to data
 structures
 updated using such Two-phase updates are held.

Doesn't this fly in the face of the whole Two-phase system?  It seems to
me that the point was to not require any locks.  Preemption disable IS a
lock.  Not as strong as some, but a lock none the less.
 
 Pros: easy to implement using a flag (ctx_sw_off() ?)
 Cons: not so easy to use since critical sections need to be clearly
 identified and interfaces defined. also affects preemptive behavior.
 
 2. In synchronize_kernel(), distinguish between "natural" and preemptive
 schedules() and ignore preemptive ones.
 
 Pros: easy to use
 Cons: Not so easy to implement. Also a low priority task that keeps
 getting
 preempted often can affect update side performance significantly.

Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
state).  Don't you also have to have some sort of task flag that
indicates that the task is one that needs to sync?  Something that gets
set when it enters the area of interest and cleared when it hits the
sync point?  
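
For concreteness, the sort of thing I have in mind (sketch only;
PF_NEEDS_SYNC and sync_outstanding are invented names, not from the
patch):

#include <linux/sched.h>
#include <asm/atomic.h>

#define PF_NEEDS_SYNC   0x01000000      /* invented: must pass a sync point */

static atomic_t sync_outstanding;       /* tasks yet to reach a sync point */

/* Called from schedule() just before switching away from prev. */
static inline void note_sync_point(struct task_struct *prev)
{
        /* A preemptive schedule is not a sync point. */
        if (prev->state & TASK_PREEMPTED)
                return;

        if (prev->flags & PF_NEEDS_SYNC) {
                prev->flags &= ~PF_NEEDS_SYNC;
                atomic_dec(&sync_outstanding);
        }
}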

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-23 Thread Nigel Gamble

On Thu, 22 Mar 2001, Rusty Russell wrote:
> Nigel's "traverse the run queue and mark the preempted" solution is
> actually pretty nice, and cheap.  Since the runqueue lock is grabbed,
> it doesn't require icky atomic ops, either.

You'd have to mark both the preempted tasks, and the tasks currently
running on each CPU (which could become preempted before reaching a
voluntary schedule point).

> Despite Nigel's initial belief that this technique is fragile, I
> believe it will become an increasingly fundamental method in the
> kernel, so (with documentation) it will become widely understood, as
> it offers scalability and efficiency.

Actually, I agree with you now that I've had a chance to think about
this some more.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-23 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> 
> Keith Owens writes:
>  > Or have I missed something?
> 
> Nope, it is a fundamental problem with such kernel pre-emption
> schemes.  As a result, it would also break our big-reader locks
> (see include/linux/brlock.h).

Good point: holding a brlock has to block preemption, as per spinlocks.

> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

When I audited the uses of smp_processor_id() for the hotplug cpu
stuff, there were surprisingly few calls to smp_processor_id(), and
most of these will be holding a brlock, so OK already.

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-23 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> Nigel Gamble wrote:
> > 
> > On Wed, 21 Mar 2001, Keith Owens wrote:
> > > I misread the code, but the idea is still correct.  Add a preemption
> > > depth counter to each cpu, when you schedule and the depth is zero then
> > > you know that the cpu is no longer holding any references to quiesced
> > > structures.
> > 
> > A task that has been preempted is on the run queue and can be
> > rescheduled on a different CPU, so I can't see how a per-CPU counter
> > would work.  It seems to me that you would need a per run queue
> > counter, like the example I gave in a previous posting.
> 
> Exactly so.  The method does not depend on the sum of preemption being
> zip, but on each potential reader (writers take locks) passing thru a
> "sync point".  Your notion of waiting for each task to arrive
> "naturally" at schedule() would work.  It is, in fact, over kill as you
> could also add arrival at sys call exit as a (the) "sync point".  In

Power off is also a sync point 8).  But you want it to happen in
bounded time: consider a daemon which opens a device every minute and
never exits.

Nigel's "traverse the run queue and mark the preempted" solution is
actually pretty nice, and cheap.  Since the runqueue lock is grabbed,
it doesn't require icky atomic ops, either.

Despite Nigel's initial belief that this technique is fragile, I
believe it will become an increasingly fundamental method in the
kernel, so (with documentation) it will become widely understood, as
it offers scalability and efficiency.

Rusty.
--
Premature optmztion is rt of all evl. --DK
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Nigel Gamble

On Wed, 21 Mar 2001, Andrew Morton wrote:
> It's a problem for uniprocessors as well.
> 
> Example:
> 
> #define current_cpu_data boot_cpu_data
> #define pgd_quicklist (current_cpu_data.pgd_quick)
> 
> extern __inline__ void free_pgd_fast(pgd_t *pgd)
> {
> *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
> pgd_quicklist = (unsigned long *) pgd;
> pgtable_cache_size++;
> }
> 
> Preemption could corrupt this list.

Thanks, Andrew, for pointing this out.  I've added fixes to the patch
for this problem and the others in pgalloc.h.  If you know of any other
similar problems on uniprocessors, please let me know.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Nigel Gamble

On Wed, 21 Mar 2001, David S. Miller wrote:
> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

Not necessarily.  Another solution for the smp_processor_id() case is
to ensure that the task can only be scheduled on the current CPU for the
duration that the value of smp_processor_id() is used.  Or, if the
critical region is very short, to disable interrupts on the local CPU.
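
For the short-critical-region case that would look something like this
(the per-cpu counter is just an example):

#include <linux/threads.h>      /* NR_CPUS */
#include <linux/smp.h>          /* smp_processor_id() */
#include <asm/system.h>         /* local_irq_save()/local_irq_restore() */

static int local_count[NR_CPUS];        /* example per-cpu data */

static void bump_local_count(void)
{
        unsigned long flags;

        /*
         * With local interrupts off, the preemption patch cannot kick
         * in (preemption is driven from interrupt return), so the cpu
         * returned by smp_processor_id() stays ours for the region.
         */
        local_irq_save(flags);
        local_count[smp_processor_id()]++;
        local_irq_restore(flags);
}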

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Andrea Arcangeli

On Wed, Mar 21, 2001 at 08:19:54PM +1100, Keith Owens wrote:
> Ouch.  What about all the per cpu structures in the kernel, how do you
> handle them if a preempted task can be rescheduled on another cpu?
> 
>  int count[NR_CPUS], *p;
>  p = count+smp_processor_id(); /* start on cpu 0, count[0] */
>  if (*p >= 1024) {
>/* preempt here, reschedule on cpu 1 */
>*p = 1;  /* update cpu 0 count from cpu 1, oops */
>   }
> 
> Unless you find every use of a per cpu structure and wrap a spin lock
> around it, migrating a task from one cpu to another is going to be a
> source of wierd and wonderful errors.  Since the main use of per cpu
> structures is to avoid locks, adding a spin lock to every structure
> will be a killer.  Or have I missed something?

That's why Linus suggested it for UP only.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread David S. Miller


george anzinger writes:
 > By the by, if a preemption lock is all that is needed the patch defines
 > it and it is rather fast (an inc going in and a dec & test comming
 > out).  A lot faster than a spin lock with its "LOCK" access.  A preempt
 > lock does not need to be "LOCK"ed because the only contender is the same
 > cpu.

So we would have to invoke this thing around every set of
smp_processor_id() references?

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Andrew Morton

"David S. Miller" wrote:
> 
> Keith Owens writes:
>  > Or have I missed something?
> 
> Nope, it is a fundamental problem with such kernel pre-emption
> schemes.  As a result, it would also break our big-reader locks
> (see include/linux/brlock.h).
> 
> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.
> 

It's a problem for uniprocessors as well.

Example:

#define current_cpu_data boot_cpu_data
#define pgd_quicklist (current_cpu_data.pgd_quick)

extern __inline__ void free_pgd_fast(pgd_t *pgd)
{
        *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
        pgd_quicklist = (unsigned long *) pgd;
        pgtable_cache_size++;
}

Preemption could corrupt this list.
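
The fix has to keep the quicklist and its count consistent with respect
to preemption; roughly (sketch only, assuming a ctx_sw_off()/ctx_sw_on()
pair as in the preemption patch - Nigel's actual fix may differ):

extern __inline__ void free_pgd_fast(pgd_t *pgd)
{
        ctx_sw_off();           /* no preemption while the list and its
                                   count are temporarily out of sync */
        *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
        pgd_quicklist = (unsigned long *) pgd;
        pgtable_cache_size++;
        ctx_sw_on();
}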

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread David S. Miller


Keith Owens writes:
 > Or have I missed something?

Nope, it is a fundamental problem with such kernel pre-emption
schemes.  As a result, it would also break our big-reader locks
(see include/linux/brlock.h).

Basically, anything which uses smp_processor_id() would need to
be holding some lock so as to not get pre-empted.

Later,
David S. Miller
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Keith Owens

Nigel Gamble wrote:
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work.  It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Ouch.  What about all the per cpu structures in the kernel, how do you
handle them if a preempted task can be rescheduled on another cpu?

 int count[NR_CPUS], *p;
 p = count+smp_processor_id(); /* start on cpu 0, count[0] */
 if (*p >= 1024) {
   /* preempt here, reschedule on cpu 1 */
   *p = 1;  /* update cpu 0 count from cpu 1, oops */
  }

Unless you find every use of a per cpu structure and wrap a spin lock
around it, migrating a task from one cpu to another is going to be a
source of weird and wonderful errors.  Since the main use of per cpu
structures is to avoid locks, adding a spin lock to every structure
will be a killer.  Or have I missed something?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread Keith Owens

On Wed, 21 Mar 2001 00:04:56 -0800, 
george anzinger <[EMAIL PROTECTED]> wrote:
>Exactly so.  The method does not depend on the sum of preemption being
>zip, but on each potential reader (writers take locks) passing thru a
>"sync point".  Your notion of waiting for each task to arrive
>"naturally" at schedule() would work.  It is, in fact, over kill as you
>could also add arrival at sys call exit as a (the) "sync point".  In
>fact, for module unload, isn't this the real "sync point"?  After all, a
>module can call schedule, or did I miss a usage counter somewhere?

A module can call schedule but it must do MOD_INC_USE_COUNT first.
Sleeping in module code without incrementing the module use count first
is a shooting offence.  It is so full of races that you may as well
call it Daytona.

>By the way, there is a paper on this somewhere on the web.  Anyone
>remember where?

http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-21 Thread george anzinger

Nigel Gamble wrote:
> 
> On Wed, 21 Mar 2001, Keith Owens wrote:
> > I misread the code, but the idea is still correct.  Add a preemption
> > depth counter to each cpu, when you schedule and the depth is zero then
> > you know that the cpu is no longer holding any references to quiesced
> > structures.
> 
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work.  It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Exactly so.  The method does not depend on the sum of preemption being
zip, but on each potential reader (writers take locks) passing thru a
"sync point".  Your notion of waiting for each task to arrive
"naturally" at schedule() would work.  It is, in fact, over kill as you
could also add arrival at sys call exit as a (the) "sync point".  In
fact, for module unload, isn't this the real "sync point"?  After all, a
module can call schedule, or did I miss a usage counter somewhere?

By the way, there is a paper on this somewhere on the web.  Anyone
remember where?

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Wed, 21 Mar 2001, Keith Owens wrote:
> I misread the code, but the idea is still correct.  Add a preemption
> depth counter to each cpu, when you schedule and the depth is zero then
> you know that the cpu is no longer holding any references to quiesced
> structures.

A task that has been preempted is on the run queue and can be
rescheduled on a different CPU, so I can't see how a per-CPU counter
would work.  It seems to me that you would need a per run queue
counter, like the example I gave in a previous posting.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Keith Owens

On Tue, 20 Mar 2001 16:48:01 -0800 (PST), 
Nigel Gamble <[EMAIL PROTECTED]> wrote:
>On Tue, 20 Mar 2001, Keith Owens wrote:
>> The preemption patch only allows preemption from interrupt and only for
>> a single level of preemption.  That coexists quite happily with
>> synchronize_kernel() which runs in user context.  Just count user
>> context schedules (preempt_count == 0), not preemptive schedules.
>
>If you're looking at preempt_schedule(), note the call to ctx_sw_off()
>only increments current->preempt_count for the preempted task - the
>higher priority preempting task that is about to be scheduled will have
>a preempt_count of 0.

I misread the code, but the idea is still correct.  Add a preemption
depth counter to each cpu, when you schedule and the depth is zero then
you know that the cpu is no longer holding any references to quiesced
structures.
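
I.e., something along these lines (names invented; locking and the
migration question ignored):

#include <linux/threads.h>      /* NR_CPUS */

static int preempt_depth[NR_CPUS];      /* preemptions outstanding per cpu */
static unsigned long quiescent[NR_CPUS];/* quiescent schedules seen per cpu */

/* on entering a preemptive schedule */
static inline void note_preempt(int cpu)
{
        preempt_depth[cpu]++;
}

/* when the preempted code is finally resumed */
static inline void note_preempt_return(int cpu)
{
        preempt_depth[cpu]--;
}

/* on a normal, voluntary call to schedule() */
static inline void note_voluntary_schedule(int cpu)
{
        /* only now can this cpu hold no references to quiesced data */
        if (preempt_depth[cpu] == 0)
                quiescent[cpu]++;
}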

>So, to make sure I understand this, the code to free a node would look
>like:
>
>   prev->next = node->next; /* assumed to be atomic */
>   synchronize_kernel();
>   free(node);
>
>So that any other CPU concurrently traversing the list would see a
>consistent state, either including or not including "node" before the
>call to synchronize_kernel(); but after synchronize_kernel() all other
>CPUs are guaranteed to see a list that no longer includes "node", so it
>is now safe to free it.
>
>It looks like there are also implicit assumptions to this approach, like
>no other CPU is trying to use the same approach simultaneously to free
>"prev".

Not quite.  The idea is that readers can traverse lists without locks;
code that changes the list needs to take a semaphore first.

Read
node = node->next;

Update
down(_sem);
prev->next = node->next;
synchronize_kernel();
free(node);
up(_sem);

Because the readers have no locks, other cpus can have references to
the node being freed.  The updating code needs to wait until all other
cpus have gone through at least one schedule to ensure that all
dangling references have been flushed.  Adding preemption complicates
this slightly: we have to distinguish between the bottom-level schedule
and higher-level schedules for preemptive code.  Only when all
preemptive code on a cpu has ended is it safe to say that there are no
dangling references left on that cpu.

This method is a win for high read, low update lists.  Instead of
penalizing the read code every time on the off chance that somebody
will update the data, speed up the common code and penalize the update
code.  The classic example is module code, it is rarely unloaded but
right now everything that *might* be entering a module has to grab the
module spin lock and update the module use count.  So main line code is
being slowed down all the time.
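
Very schematically (this is not the literal code anywhere; the object
and lock names are made up), the difference in the common path is:

#include <linux/module.h>
#include <linux/spinlock.h>

struct obj {                            /* made-up example object */
        struct module *owner;
        void (*op)(struct obj *);
};

static spinlock_t obj_ref_lock = SPIN_LOCK_UNLOCKED;   /* made-up lock */

/* today: every caller pays for the lock and the use count,
 * even though unload almost never happens */
static void call_locked(struct obj *o)
{
        spin_lock(&obj_ref_lock);
        __MOD_INC_USE_COUNT(o->owner);
        spin_unlock(&obj_ref_lock);

        o->op(o);

        __MOD_DEC_USE_COUNT(o->owner);
}

/* with a quiescent period: the reader just calls in; the unloader
 * unlinks o from whatever publishes it, does synchronize_kernel(),
 * and only then frees it and lets the module go */
static void call_lockfree(struct obj *o)
{
        o->op(o);
}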

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Tue, 20 Mar 2001, Keith Owens wrote:
> The preemption patch only allows preemption from interrupt and only for
> a single level of preemption.  That coexists quite happily with
> synchronize_kernel() which runs in user context.  Just count user
> context schedules (preempt_count == 0), not preemptive schedules.

I'm not sure what you mean by "only for a single level of preemption."
It's possible for a preempting process to be preempted itself by a
higher priority process, and for that process to be preempted by an even
higher priority one, limited only by the number of processes waiting for
interrupt handlers to make them runnable.  This isn't very likely in
practice (kernel preemptions tend to be rare compared to normal calls to
schedule()), but it could happen in theory.

If you're looking at preempt_schedule(), note the call to ctx_sw_off()
only increments current->preempt_count for the preempted task - the
higher priority preempting task that is about to be scheduled will have
a preempt_count of 0.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Tue, 20 Mar 2001, Rusty Russell wrote:
>   I can see three problems with this approach, only one of which
> is serious.
> 
> The first is code which is already SMP unsafe is now a problem for
> everyone, not just the 0.1% of SMP machines.  I consider this a good
> thing for 2.5 though.

So do I.

> The second is that there are "manual" locking schemes which are used
> in several places in the kernel which rely on non-preemptability;
> de-facto spinlocks if you will.  I consider all these uses flawed: (1)
> they are often subtly broken anyway, (2) they make reading those parts
> of the code much harder, and (3) they break when things like this are
> done.

Likewise.

> The third is that preemtivity conflicts with the naive
> quiescent-period approach proposed for module unloading in 2.5, and
> useful for several other things (eg. hotplugging CPUs).  This method
> relies on knowing that when a schedule() has occurred on every CPU, we
> know noone is holding certain references.  The simplest example is a
> single linked list: you can traverse without a lock as long as you
> don't sleep, and then someone can unlink a node, and wait for a
> schedule on every other CPU before freeing it.  The non-SMP case is a
> noop.  See synchonize_kernel() below.

So, to make sure I understand this, the code to free a node would look
like:

prev->next = node->next; /* assumed to be atomic */
synchronize_kernel();
free(node);

So that any other CPU concurrently traversing the list would see a
consistent state, either including or not including "node" before the
call to synchronize_kernel(); but after synchronize_kernel() all other
CPUs are guaranteed to see a list that no longer includes "node", so it
is now safe to free it.

It looks like there are also implicit assumptions to this approach, like
no other CPU is trying to use the same approach simultaneously to free
"prev".  So my initial reaction is that this approach is, like the
manual locking schemes you commented on above, open to being subtly
broken when people don't understand all the implicit assumptions and
subsequently invalidate them.

> This, too, is soluble, but it means that synchronize_kernel() must
> guarantee that each task which was running or preempted in kernel
> space when it was called, has been non-preemtively scheduled before
> synchronize_kernel() can exit.  Icky.

Yes, you're right.

> Thoughts?

Perhaps synchronize_kernel() could take the run_queue lock, mark all the
tasks on it and count them.  Any task marked when it calls schedule()
voluntarily (but not if it is preempted) is unmarked and the count
decremented.  synchronize_kernel() continues until the count is zero.
As you said, "Icky."
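
In outline (flag and counter names invented; locking shown very
loosely):

#include <linux/sched.h>
#include <asm/atomic.h>

#define PF_SYNC_PENDING 0x02000000      /* invented: task still to sync */

static atomic_t sync_count;             /* marked tasks still outstanding */

void synchronize_kernel(void)
{
        struct task_struct *p;

        spin_lock_irq(&runqueue_lock);
        for_each_task(p) {
                /* mark every runnable task (including ones currently
                 * running on other cpus) except ourselves */
                if (p != current && task_on_runqueue(p)) {
                        p->flags |= PF_SYNC_PENDING;
                        atomic_inc(&sync_count);
                }
        }
        spin_unlock_irq(&runqueue_lock);

        while (atomic_read(&sync_count) > 0)
                schedule();     /* or sleep and let the last task wake us */
}

/* in schedule(), only on a voluntary (non-preempted) pass: */
static inline void sync_point(struct task_struct *prev)
{
        if (!(prev->state & TASK_PREEMPTED) &&
            (prev->flags & PF_SYNC_PENDING)) {
                prev->flags &= ~PF_SYNC_PENDING;
                atomic_dec(&sync_count);
        }
}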

> /* We could keep a schedule count for each CPU and make idle tasks
>schedule (some don't unless need_resched), but this scales quite
>well (eg. 64 processors, average time to wait for first schedule =
>jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...
> 
>At 1024 cpus, this is about 7.5 jiffies.  And that assumes noone
>schedules early. --RR */
> void synchronize_kernel(void)
> {
>   unsigned long cpus_allowed, policy, rt_priority;
> 
>   /* Save current state */
>   cpus_allowed = current->cpus_allowed;
>   policy = current->policy;
>   rt_priority = current->rt_priority;
> 
>   /* Create an unreal time task. */
>   current->policy = SCHED_FIFO;
>   current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);
> 
>   /* Make us schedulable on all CPUs. */
>   current->cpus_allowed = (1UL<<smp_num_cpus)-1;
>   /* Eliminate current cpu, reschedule */
>   while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
>   schedule();
> 
>   /* Back to normal. */
>   current->cpus_allowed = cpus_allowed;
>   current->policy = policy;
>   current->rt_priority = rt_priority;
> }
> 

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread george anzinger

Nigel Gamble wrote:
> 
> On Tue, 20 Mar 2001, Roger Larsson wrote:
> > One little readability thing I found.
> > The prev->state TASK_ value is mostly used as a plain value
> > but the new TASK_PREEMPTED is or:ed together with whatever was there.
> > Later when we switch to check the state it is checked against TASK_PREEMPTED
> > only. Since TASK_RUNNING is 0 it works OK but...
> 
> Yes, you're right.  I had forgotten that TASK_RUNNING is 0 and I think I
> was assuming that there could be (rare) cases where a task was preempted
> while prev->state was in transition such that no other flags were set.
> This is, of course, impossible given that TASK_RUNNING is 0.  So your
> change makes the common case more obvious (to me, at least!)
> 
> > --- sched.c.nigel   Tue Mar 20 18:52:43 2001
> > +++ sched.c.roger   Tue Mar 20 19:03:28 2001
> > @@ -553,7 +553,7 @@
> >  #endif
> > del_from_runqueue(prev);
> >  #ifdef CONFIG_PREEMPT
> > -   case TASK_PREEMPTED:
> > +   case TASK_RUNNING | TASK_PREEMPTED:
> >  #endif
> > case TASK_RUNNING:
> > }
> >
> >
> > We could add all/(other common) combinations as cases
> >
> >   switch (prev->state) {
> >   case TASK_INTERRUPTIBLE:
> >   if (signal_pending(prev)) {
> >   prev->state = TASK_RUNNING;
> >   break;
> >   }
> >   default:
> > #ifdef CONFIG_PREEMPT
> >   if (prev->state & TASK_PREEMPTED)
> >   break;
> > #endif
> >   del_from_runqueue(prev);
> > #ifdef CONFIG_PREEMPT
> >   case TASK_RUNNING   | TASK_PREEMPTED:
> >   case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
> >   case TASK_UNINTERRUPTIBLE   | TASK_PREEMPTED:
> > #endif
> >   case TASK_RUNNING:
> >   }
> >
> >
> > Then the break in default case could almost be replaced with a BUG()...
> > (I have not checked the generated code)
> 
> The other cases are not very common, as they only happen if a task is
> preempted during the short time that it is running while in the process
> of changing state while going to sleep or waking up, so the default case
> is probably OK for them; and I'd be happier to leave the default case
> for reliability reasons anyway.

Especially since he forgot:

TASK_ZOMBIE
TASK_STOPPED
TASK_SWAPPING

I don't know about the last two but TASK_ZOMBIE must be handled
correctly or the task will never clear.

In general, a task must run till it gets to schedule() before the actual
state is "real", hence the need for TASK_PREEMPTED.

The actual code generated with what you propose should be the same (even
if TASK_RUNNING != 0, except for the constant).

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Tue, 20 Mar 2001, Roger Larsson wrote:
> One little readability thing I found.
> The prev->state TASK_ value is mostly used as a plain value
> but the new TASK_PREEMPTED is or:ed together with whatever was there.
> Later when we switch to check the state it is checked against TASK_PREEMPTED
> only. Since TASK_RUNNING is 0 it works OK but...

Yes, you're right.  I had forgotten that TASK_RUNNING is 0 and I think I
was assuming that there could be (rare) cases where a task was preempted
while prev->state was in transition such that no other flags were set.
This is, of course, impossible given that TASK_RUNNING is 0.  So your
change makes the common case more obvious (to me, at least!)

> --- sched.c.nigel   Tue Mar 20 18:52:43 2001
> +++ sched.c.roger   Tue Mar 20 19:03:28 2001
> @@ -553,7 +553,7 @@
>  #endif
> del_from_runqueue(prev);
>  #ifdef CONFIG_PREEMPT
> -   case TASK_PREEMPTED:
> +   case TASK_RUNNING | TASK_PREEMPTED:
>  #endif
> case TASK_RUNNING:
> }
> 
> 
> We could add all/(other common) combinations as cases 
> 
>   switch (prev->state) {
>   case TASK_INTERRUPTIBLE:
>   if (signal_pending(prev)) {
>   prev->state = TASK_RUNNING;
>   break;
>   }
>   default:
> #ifdef CONFIG_PREEMPT
>   if (prev->state & TASK_PREEMPTED)
>   break;
> #endif
>   del_from_runqueue(prev);
> #ifdef CONFIG_PREEMPT
>   case TASK_RUNNING   | TASK_PREEMPTED:
>   case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
>   case TASK_UNINTERRUPTIBLE   | TASK_PREEMPTED:
> #endif
>   case TASK_RUNNING:
>   }
> 
> 
> Then the break in default case could almost be replaced with a BUG()...
> (I have not checked the generated code)

The other cases are not very common, as they only happen if a task is
preempted during the short time that it is running while in the process
of changing state while going to sleep or waking up, so the default case
is probably OK for them; and I'd be happier to leave the default case
for reliability reasons anyway.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Roger Larsson

Hi,

One little readability thing I found.
The prev->state TASK_ value is mostly used as a plain value
but the new TASK_PREEMPTED is OR'ed together with whatever was there.
Later, in the switch statement that checks the state, it is compared
against TASK_PREEMPTED only. Since TASK_RUNNING is 0 it works OK but...

--- sched.c.nigel   Tue Mar 20 18:52:43 2001
+++ sched.c.roger   Tue Mar 20 19:03:28 2001
@@ -553,7 +553,7 @@
 #endif
del_from_runqueue(prev);
 #ifdef CONFIG_PREEMPT
-   case TASK_PREEMPTED:
+   case TASK_RUNNING | TASK_PREEMPTED:
 #endif
case TASK_RUNNING:
}


We could add all/(other common) combinations as cases 

switch (prev->state) {
case TASK_INTERRUPTIBLE:
if (signal_pending(prev)) {
prev->state = TASK_RUNNING;
break;
}
default:
#ifdef CONFIG_PREEMPT
if (prev->state & TASK_PREEMPTED)
break;
#endif
del_from_runqueue(prev);
#ifdef CONFIG_PREEMPT
case TASK_RUNNING   | TASK_PREEMPTED:
case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
case TASK_UNINTERRUPTIBLE   | TASK_PREEMPTED:
#endif
case TASK_RUNNING:
}


Then the break in default case could almost be replaced with a BUG()...
(I have not checked the generated code)

/RogerL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Keith Owens

On Tue, 20 Mar 2001 19:43:50 +1100, 
Rusty Russell <[EMAIL PROTECTED]> wrote:
>The third is that preemtivity conflicts with the naive
>quiescent-period approach proposed for module unloading in 2.5, and
>useful for several other things (eg. hotplugging CPUs).  This method
>relies on knowing that when a schedule() has occurred on every CPU, we
>know noone is holding certain references.
>
>This, too, is soluble, but it means that synchronize_kernel() must
>guarantee that each task which was running or preempted in kernel
>space when it was called, has been non-preemtively scheduled before
>synchronize_kernel() can exit.  Icky.

The preemption patch only allows preemption from interrupt and only for
a single level of preemption.  That coexists quite happily with
synchronize_kernel() which runs in user context.  Just count user
context schedules (preempt_count == 0), not preemptive schedules.
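
I.e. the quiescence bookkeeping in schedule() reduces to something like
(the per-cpu counter name is invented):

        if (current->preempt_count == 0)        /* not a preemptive pass */
                quiescent[smp_processor_id()]++;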

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Rusty Russell

In message <[EMAIL PROTECTED]> you write:
> Kernel preemption is not allowed while spinlocks are held, which means
> that this patch alone cannot guarantee low preemption latencies.  But
> as long held locks (in particular the BKL) are replaced by finer-grained
> locks, this patch will enable lower latencies as the kernel also becomes
> more scalable on large SMP systems.

Hi Nigel,

I can see three problems with this approach, only one of which
is serious.

The first is code which is already SMP unsafe is now a problem for
everyone, not just the 0.1% of SMP machines.  I consider this a good
thing for 2.5 though.

The second is that there are "manual" locking schemes which are used
in several places in the kernel which rely on non-preemptability;
de-facto spinlocks if you will.  I consider all these uses flawed: (1)
they are often subtly broken anyway, (2) they make reading those parts
of the code much harder, and (3) they break when things like this are
done.

The third is that preemptibility conflicts with the naive
quiescent-period approach proposed for module unloading in 2.5, and
useful for several other things (eg. hotplugging CPUs).  This method
relies on knowing that when a schedule() has occurred on every CPU, we
know no one is holding certain references.  The simplest example is a
singly linked list: you can traverse without a lock as long as you
don't sleep, and then someone can unlink a node, and wait for a
schedule on every other CPU before freeing it.  The non-SMP case is a
noop.  See synchronize_kernel() below.

This, too, is soluble, but it means that synchronize_kernel() must
guarantee that each task which was running or preempted in kernel
space when it was called, has been non-preemptively scheduled before
synchronize_kernel() can exit.  Icky.

Thoughts?
Rusty.
--
Premature optmztion is rt of all evl. --DK

/* We could keep a schedule count for each CPU and make idle tasks
   schedule (some don't unless need_resched), but this scales quite
   well (eg. 64 processors, average time to wait for first schedule =
   jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...

   At 1024 cpus, this is about 7.5 jiffies.  And that assumes noone
   schedules early. --RR */
void synchronize_kernel(void)
{
unsigned long cpus_allowed, policy, rt_priority;

/* Save current state */
cpus_allowed = current->cpus_allowed;
policy = current->policy;
rt_priority = current->rt_priority;

/* Create an unreal time task. */
current->policy = SCHED_FIFO;
current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

/* Make us schedulable on all CPUs. */
current->cpus_allowed = (1UL<<smp_num_cpus)-1;

/* Eliminate current cpu, reschedule */
while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
        schedule();

/* Back to normal. */
current->cpus_allowed = cpus_allowed;
current->policy = policy;
current->rt_priority = rt_priority;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Rusty Russell

In message [EMAIL PROTECTED] you write:
 Kernel preemption is not allowed while spinlocks are held, which means
 that this patch alone cannot guarantee low preemption latencies.  But
 as long held locks (in particular the BKL) are replaced by finer-grained
 locks, this patch will enable lower latencies as the kernel also becomes
 more scalable on large SMP systems.

Hi Nigel,

I can see three problems with this approach, only one of which
is serious.

The first is code which is already SMP unsafe is now a problem for
everyone, not just the 0.1% of SMP machines.  I consider this a good
thing for 2.5 though.

The second is that there are "manual" locking schemes which are used
in several places in the kernel which rely on non-preemptability;
de-facto spinlocks if you will.  I consider all these uses flawed: (1)
they are often subtly broken anyway, (2) they make reading those parts
of the code much harder, and (3) they break when things like this are
done.

The third is that preemtivity conflicts with the naive
quiescent-period approach proposed for module unloading in 2.5, and
useful for several other things (eg. hotplugging CPUs).  This method
relies on knowing that when a schedule() has occurred on every CPU, we
know noone is holding certain references.  The simplest example is a
single linked list: you can traverse without a lock as long as you
don't sleep, and then someone can unlink a node, and wait for a
schedule on every other CPU before freeing it.  The non-SMP case is a
noop.  See synchonize_kernel() below.

This, too, is soluble, but it means that synchronize_kernel() must
guarantee that each task which was running or preempted in kernel
space when it was called, has been non-preemtively scheduled before
synchronize_kernel() can exit.  Icky.

Thoughts?
Rusty.
--
Premature optmztion is rt of all evl. --DK

/* We could keep a schedule count for each CPU and make idle tasks
   schedule (some don't unless need_resched), but this scales quite
   well (eg. 64 processors, average time to wait for first schedule =
   jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...

   At 1024 cpus, this is about 7.5 jiffies.  And that assumes noone
   schedules early. --RR */
void synchronize_kernel(void)
{
unsigned long cpus_allowed, policy, rt_priority;

/* Save current state */
cpus_allowed = current-cpus_allowed;
policy = current-policy;
rt_priority = current-rt_priority;

/* Create an unreal time task. */
current-policy = SCHED_FIFO;
current-rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

/* Make us schedulable on all CPUs. */
current-cpus_allowed = (1ULsmp_num_cpus)-1;

/* Eliminate current cpu, reschedule */
while ((current-cpus_allowed = ~(1  smp_processor_id())) != 0)
schedule();

/* Back to normal. */
current-cpus_allowed = cpus_allowed;
current-policy = policy;
current-rt_priority = rt_priority;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Keith Owens

On Tue, 20 Mar 2001 19:43:50 +1100, 
Rusty Russell [EMAIL PROTECTED] wrote:
The third is that preemtivity conflicts with the naive
quiescent-period approach proposed for module unloading in 2.5, and
useful for several other things (eg. hotplugging CPUs).  This method
relies on knowing that when a schedule() has occurred on every CPU, we
know noone is holding certain references.

This, too, is soluble, but it means that synchronize_kernel() must
guarantee that each task which was running or preempted in kernel
space when it was called, has been non-preemtively scheduled before
synchronize_kernel() can exit.  Icky.

The preemption patch only allows preemption from interrupt and only for
a single level of preemption.  That coexists quite happily with
synchronize_kernel() which runs in user context.  Just count user
context schedules (preempt_count == 0), not preemptive schedules.
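
A rough sketch of what that counting might look like (purely illustrative;
quiescent_ctr is an invented name, and the check uses the TASK_PREEMPTED
state bit discussed elsewhere in this thread to spot preemptive passes):

static long quiescent_ctr[NR_CPUS];	/* per-cpu count of voluntary schedules */

/* In schedule(): only a voluntary pass, i.e. one not forced by
   preemption, counts as a quiescent point for this CPU. */
if (!(prev->state & TASK_PREEMPTED))
	quiescent_ctr[smp_processor_id()]++;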

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Roger Larsson

Hi,

One little readability thing I found.
The prev->state TASK_ value is mostly used as a plain value,
but the new TASK_PREEMPTED is OR'ed together with whatever was there.
Later, when we switch on the state, it is checked against TASK_PREEMPTED
alone.  Since TASK_RUNNING is 0 this works, but...

--- sched.c.nigel   Tue Mar 20 18:52:43 2001
+++ sched.c.roger   Tue Mar 20 19:03:28 2001
@@ -553,7 +553,7 @@
 #endif
del_from_runqueue(prev);
 #ifdef CONFIG_PREEMPT
-   case TASK_PREEMPTED:
+   case TASK_RUNNING | TASK_PREEMPTED:
 #endif
case TASK_RUNNING:
}


We could add all/(other common) combinations as cases 

switch (prev->state) {
case TASK_INTERRUPTIBLE:
if (signal_pending(prev)) {
prev->state = TASK_RUNNING;
break;
}
default:
#ifdef CONFIG_PREEMPT
if (prev->state & TASK_PREEMPTED)
break;
#endif
del_from_runqueue(prev);
#ifdef CONFIG_PREEMPT
case TASK_RUNNING   | TASK_PREEMPTED:
case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
case TASK_UNINTERRUPTIBLE   | TASK_PREEMPTED:
#endif
case TASK_RUNNING:
}


Then the break in the default case could almost be replaced with a BUG()...
(I have not checked the generated code)

/RogerL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread george anzinger

Nigel Gamble wrote:
 
 On Tue, 20 Mar 2001, Roger Larsson wrote:
  One little readability thing I found.
  The prev->state TASK_ value is mostly used as a plain value
  but the new TASK_PREEMPTED is or:ed together with whatever was there.
  Later when we switch to check the state it is checked against TASK_PREEMPTED
  only. Since TASK_RUNNING is 0 it works OK but...
 
 Yes, you're right.  I had forgotten that TASK_RUNNING is 0 and I think I
 was assuming that there could be (rare) cases where a task was preempted
 while prev-state was in transition such that no other flags were set.
 This is, of course, impossible given that TASK_RUNNING is 0.  So your
 change makes the common case more obvious (to me, at least!)
 
  --- sched.c.nigel   Tue Mar 20 18:52:43 2001
  +++ sched.c.roger   Tue Mar 20 19:03:28 2001
  @@ -553,7 +553,7 @@
   #endif
  del_from_runqueue(prev);
   #ifdef CONFIG_PREEMPT
  -   case TASK_PREEMPTED:
  +   case TASK_RUNNING | TASK_PREEMPTED:
   #endif
  case TASK_RUNNING:
  }
 
 
  We could add all/(other common) combinations as cases
 
switch (prev->state) {
case TASK_INTERRUPTIBLE:
if (signal_pending(prev)) {
prev->state = TASK_RUNNING;
break;
}
default:
  #ifdef CONFIG_PREEMPT
if (prev->state & TASK_PREEMPTED)
break;
  #endif
del_from_runqueue(prev);
  #ifdef CONFIG_PREEMPT
case TASK_RUNNING   | TASK_PREEMPTED:
case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
case TASK_UNINTERRUPTIBLE   | TASK_PREEMPTED:
  #endif
case TASK_RUNNING:
}
 
 
  Then the break in default case could almost be replaced with a BUG()...
  (I have not checked the generated code)
 
 The other cases are not very common, as they only happen if a task is
 preempted during the short time that it is running while in the process
 of changing state while going to sleep or waking up, so the default case
 is probably OK for them; and I'd be happier to leave the default case
 for reliability reasons anyway.

Especially since he forgot:

TASK_ZOMBIE
TASK_STOPPED
TASK_SWAPPING

I don't know about the last two but TASK_ZOMBIE must be handled
correctly or the task will never clear.

In general, a task must run until it gets to schedule() before the actual
state is "real", hence the need for TASK_PREEMPTED.

The actual code generated with what you propose should be the same (even
if TASK_RUNNING != 0, except for the constant).

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Tue, 20 Mar 2001, Rusty Russell wrote:
   I can see three problems with this approach, only one of which
 is serious.
 
 The first is code which is already SMP unsafe is now a problem for
 everyone, not just the 0.1% of SMP machines.  I consider this a good
 thing for 2.5 though.

So do I.

 The second is that there are "manual" locking schemes which are used
 in several places in the kernel which rely on non-preemptability;
 de-facto spinlocks if you will.  I consider all these uses flawed: (1)
 they are often subtly broken anyway, (2) they make reading those parts
 of the code much harder, and (3) they break when things like this are
 done.

Likewise.

 The third is that preemtivity conflicts with the naive
 quiescent-period approach proposed for module unloading in 2.5, and
 useful for several other things (eg. hotplugging CPUs).  This method
 relies on knowing that when a schedule() has occurred on every CPU, we
 know noone is holding certain references.  The simplest example is a
 single linked list: you can traverse without a lock as long as you
 don't sleep, and then someone can unlink a node, and wait for a
 schedule on every other CPU before freeing it.  The non-SMP case is a
 noop.  See synchonize_kernel() below.

So, to make sure I understand this, the code to free a node would look
like:

prev->next = node->next; /* assumed to be atomic */
synchronize_kernel();
free(node);

So that any other CPU concurrently traversing the list would see a
consistent state, either including or not including "node" before the
call to synchronize_kernel(); but after synchronize_kernel() all other
CPUs are guaranteed to see a list that no longer includes "node", so it
is now safe to free it.

It looks like there are also implicit assumptions to this approach, like
no other CPU is trying to use the same approach simultaneously to free
"prev".  So my initial reaction is that this approach is, like the
manual locking schemes you commented on above, open to being subtly
broken when people don't understand all the implicit assumptions and
subsequently invalidate them.

 This, too, is soluble, but it means that synchronize_kernel() must
 guarantee that each task which was running or preempted in kernel
 space when it was called, has been non-preemtively scheduled before
 synchronize_kernel() can exit.  Icky.

Yes, you're right.

 Thoughts?

Perhaps synchronize_kernel() could take the run_queue lock, mark all the
tasks on it and count them.  Any task marked when it calls schedule()
voluntarily (but not if it is preempted) is unmarked and the count
decremented.  synchronize_kernel() continues until the count is zero.
As you said, "Icky."
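
A rough sketch of that scheme (purely illustrative, not a tested patch;
PF_QUIESCE and sync_count are invented names, and for brevity it marks
runnable tasks under tasklist_lock rather than the run queue lock):

static atomic_t sync_count = ATOMIC_INIT(0);

void synchronize_kernel(void)
{
	struct task_struct *p;

	/* Mark every runnable task except ourselves. */
	read_lock(&tasklist_lock);
	for_each_task(p) {
		if (p != current && task_on_runqueue(p)) {
			p->flags |= PF_QUIESCE;
			atomic_inc(&sync_count);
		}
	}
	read_unlock(&tasklist_lock);

	/* Wait until every marked task has voluntarily scheduled. */
	while (atomic_read(&sync_count) != 0)
		schedule();
}

/* In schedule(), on a voluntary (non-preemptive) pass only: */
if (prev->flags & PF_QUIESCE) {
	prev->flags &= ~PF_QUIESCE;
	atomic_dec(&sync_count);
}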

 /* We could keep a schedule count for each CPU and make idle tasks
schedule (some don't unless need_resched), but this scales quite
well (eg. 64 processors, average time to wait for first schedule =
jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...
 
At 1024 cpus, this is about 7.5 jiffies.  And that assumes noone
schedules early. --RR */
 void synchronize_kernel(void)
 {
   unsigned long cpus_allowed, policy, rt_priority;
 
   /* Save current state */
   cpus_allowed = current->cpus_allowed;
   policy = current->policy;
   rt_priority = current->rt_priority;

   /* Create an unreal time task. */
   current->policy = SCHED_FIFO;
   current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

   /* Make us schedulable on all CPUs. */
   current->cpus_allowed = (1UL<<smp_num_cpus)-1;

   /* Eliminate current cpu, reschedule */
   while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
   schedule();

   /* Back to normal. */
   current->cpus_allowed = cpus_allowed;
   current->policy = policy;
   current->rt_priority = rt_priority;
 }
 

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Nigel Gamble

On Tue, 20 Mar 2001, Keith Owens wrote:
 The preemption patch only allows preemption from interrupt and only for
 a single level of preemption.  That coexists quite happily with
 synchronize_kernel() which runs in user context.  Just count user
 context schedules (preempt_count == 0), not preemptive schedules.

I'm not sure what you mean by "only for a single level of preemption."
It's possible for a preempting process to be preempted itself by a
higher priority process, and for that process to be preempted by an even
higher priority one, limited only by the number of processes waiting for
interrupt handlers to make them runnable.  This isn't very likely in
practice (kernel preemptions tend to be rare compared to normal calls to
schedule()), but it could happen in theory.

If you're looking at preempt_schedule(), note the call to ctx_sw_off()
only increments current->preempt_count for the preempted task - the
higher priority preempting task that is about to be scheduled will have
a preempt_count of 0.

Nigel Gamble[EMAIL PROTECTED]
Mountain View, CA, USA. http://www.nrg.org/

MontaVista Software [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Keith Owens

On Tue, 20 Mar 2001 16:48:01 -0800 (PST), 
Nigel Gamble [EMAIL PROTECTED] wrote:
On Tue, 20 Mar 2001, Keith Owens wrote:
 The preemption patch only allows preemption from interrupt and only for
 a single level of preemption.  That coexists quite happily with
 synchronize_kernel() which runs in user context.  Just count user
 context schedules (preempt_count == 0), not preemptive schedules.

If you're looking at preempt_schedule(), note the call to ctx_sw_off()
only increments current->preempt_count for the preempted task - the
higher priority preempting task that is about to be scheduled will have
a preempt_count of 0.

I misread the code, but the idea is still correct.  Add a preemption
depth counter to each cpu; when you schedule and the depth is zero, then
you know that the cpu is no longer holding any references to quiesced
structures.
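
A sketch of the bookkeeping that implies (all names invented, purely
illustrative; as Rusty notes, idle cpus that never call schedule() would
need extra handling):

static int preempt_depth[NR_CPUS];	/* ++ on preemptive entry, -- on exit */
static long quiescent_ctr[NR_CPUS];	/* bumped on quiescent schedules */

/* In schedule(): only a pass with no preemptive frames outstanding on
   this CPU counts as a quiescent point. */
if (preempt_depth[smp_processor_id()] == 0)
	quiescent_ctr[smp_processor_id()]++;

/* Updater side: wait until every CPU has passed a quiescent point since
   the snapshot was taken. */
void wait_for_quiescent_cpus(void)
{
	long snap[NR_CPUS];
	int i;

	for (i = 0; i < smp_num_cpus; i++)
		snap[i] = quiescent_ctr[i];

	for (i = 0; i < smp_num_cpus; i++)
		while (quiescent_ctr[i] == snap[i])
			schedule();
}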

So, to make sure I understand this, the code to free a node would look
like:

   prev->next = node->next; /* assumed to be atomic */
   synchronize_kernel();
   free(node);

So that any other CPU concurrently traversing the list would see a
consistent state, either including or not including "node" before the
call to synchronize_kernel(); but after synchronize_kernel() all other
CPUs are guaranteed to see a list that no longer includes "node", so it
is now safe to free it.

It looks like there are also implicit assumptions to this approach, like
no other CPU is trying to use the same approach simultaneously to free
"prev".

Not quite.  The idea is that readers can traverse lists without locks,
code that changes the list needs to take a semaphore first.

Read
node = node->next;

Update
down(&list_sem);
prev->next = node->next;
synchronize_kernel();
free(node);
up(&list_sem);

Because the readers have no locks, other cpus can have references to
the node being freed.  The updating code needs to wait until all other
cpus have gone through at least one schedule to ensure that all
dangling references have been flushed.  Adding preemption complicates
this slightly: we have to distinguish between the bottom-level schedule
and higher-level schedules for preemptive code.  Only when all
preemptive code on a cpu has ended is it safe to say that there are no
dangling references left on that cpu.

This method is a win for high read, low update lists.  Instead of
penalizing the read code every time on the off chance that somebody
will update the data, speed up the common code and penalize the update
code.  The classic example is module code, it is rarely unloaded but
right now everything that *might* be entering a module has to grab the
module spin lock and update the module use count.  So main line code is
being slowed down all the time.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



  1   2   >