Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> > But if you are suppressing preemption in all read-side critical sections,
> > then wouldn't any already-preempted tasks be guaranteed to -not- be in
> > a read-side critical section, and therefore be guaranteed to be unaffected
> > by the update (in other words, wouldn't such tasks not need to be waited
> > for)?
>
> Ah, if you want to inc and dec all the time, yes.  But even if the
> performance isn't hurt, it's unnecessary, and something else people
> have to remember to do.

I must admit that free is a very good price.

> Simplicity is very nice.  And in the case of module unload, gives us
> the ability to avoid the distinction between "am I calling into a
> module?" and "is this fixed in the kernel?" at runtime.  A very good
> thing 8)

Is it also desirable to avoid the distinction between "the currently
executing code is in a module" and "the currently executing code is
fixed in the kernel"?

> Rusty.

Thanx, Paul

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
In message <OF42269F5F.CDF56B0F-ON88256A27.0083566F@LocalDomain> you write:
> > Already preempted tasks.
>
> But if you are suppressing preemption in all read-side critical sections,
> then wouldn't any already-preempted tasks be guaranteed to -not- be in
> a read-side critical section, and therefore be guaranteed to be unaffected
> by the update (in other words, wouldn't such tasks not need to be waited
> for)?

Ah, if you want to inc and dec all the time, yes.  But even if the
performance isn't hurt, it's unnecessary, and something else people
have to remember to do.

Simplicity is very nice.  And in the case of module unload, gives us
the ability to avoid the distinction between "am I calling into a
module?" and "is this fixed in the kernel?" at runtime.  A very good
thing 8)

Rusty.
--
Premature optmztion is rt of all evl. --DK
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Tue, 10 Apr 2001 [EMAIL PROTECTED] wrote:
> On Tue, Apr 10, 2001 at 09:08:16PM -0700, Paul McKenney wrote:
> > > Disabling preemption is a possible solution if the critical section
> > > is short - less than 100us - otherwise preemption latencies become a
> > > problem.
> >
> > Seems like a reasonable restriction.  Of course, this same limit
> > applies to locks and interrupt disabling, right?
>
> So supposing 1/2 us per update,
>
>	lock process list
>	for every process update pgd
>	unlock process list
>
> is ok if #processes < 200, but can cause some unspecified system failure
> due to a dependency on the 100us limit otherwise?

Only to a hard real-time system.

> And on a slower machine or with some heavy I/O possibilities?

I'm mostly interested in Linux in embedded systems, where we have a lot
of control over the overall system, such as how many processes are
running.  This makes it easier to control latencies than on a general
purpose computer.

> We have a tiny little kernel to worry about in RTLinux and it's quite
> hard for us to keep track of all possible delays in such cases.  How's
> this going to work for Linux?

The same way everything works for Linux: with enough people around the
world interested in and working on these problems, they will be fixed.

Nigel Gamble                                    [EMAIL PROTECTED]
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Tue, 10 Apr 2001, Paul McKenney wrote:
> > Disabling preemption is a possible solution if the critical section
> > is short - less than 100us - otherwise preemption latencies become a
> > problem.
>
> Seems like a reasonable restriction.  Of course, this same limit
> applies to locks and interrupt disabling, right?

That's the goal I'd like to see us achieve in 2.5.  Interrupts are
already in this range (with a few notable exceptions), but there is
still the big kernel lock and a few other long-held spinlocks to deal
with.  So I want to make sure that any new locking scheme like the ones
under discussion plays nicely with the efforts to achieve low-latency
Linux, such as the preemptible kernel.

> > The implementation of synchronize_kernel() that Rusty and I
> > discussed earlier in this thread would work in other cases, such as
> > module unloading, where there was a concern that it was not
> > practical to have any sort of lock in the read-side code path and
> > the write side was not time critical.
>
> True, but only if the synchronize_kernel() implementation is applied
> to UP kernels, also.

Yes, that is the idea.

Nigel Gamble                                    [EMAIL PROTECTED]
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Tue, Apr 10, 2001 at 09:08:16PM -0700, Paul McKenney wrote:
> > Disabling preemption is a possible solution if the critical section
> > is short - less than 100us - otherwise preemption latencies become a
> > problem.
>
> Seems like a reasonable restriction.  Of course, this same limit applies
> to locks and interrupt disabling, right?

So supposing 1/2 us per update,

	lock process list
	for every process update pgd
	unlock process list

is ok if #processes < 200, but can cause some unspecified system failure
due to a dependency on the 100us limit otherwise?  And on a slower
machine, or with some heavy I/O possibilities?

We have a tiny little kernel to worry about in RTLinux and it's quite
hard for us to keep track of all possible delays in such cases.  How's
this going to work for Linux?

--
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com  www.rtlinux.com
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> On Tue, 10 Apr 2001, Paul McKenney wrote:
> > The algorithms we have been looking at need to have absolute guarantees
> > that earlier activity has completed.  The most straightforward way to
> > guarantee this is to have the critical-section activity run with
> > preemption disabled.  Most of these code segments either take out locks
> > or run with interrupts disabled anyway, so there is little or no
> > degradation of latency in this case.  In fact, in many cases, latency
> > would actually be improved due to removal of explicit locking
> > primitives.
> >
> > I believe that one of the issues that pushes in this direction is the
> > discovery that "synchronize_kernel()" could not be a nop in a UP kernel
> > unless the read-side critical sections disable preemption (either in
> > the natural course of events, or artificially if need be).  Andi or
> > Rusty can correct me if I missed something in the previous exchange...
> >
> > The read-side code segments are almost always quite short, and, again,
> > they would almost always otherwise need to be protected by a lock of
> > some sort, which would disable preemption in any event.
> >
> > Thoughts?
>
> Disabling preemption is a possible solution if the critical section is
> short - less than 100us - otherwise preemption latencies become a
> problem.

Seems like a reasonable restriction.  Of course, this same limit applies
to locks and interrupt disabling, right?

> The implementation of synchronize_kernel() that Rusty and I discussed
> earlier in this thread would work in other cases, such as module
> unloading, where there was a concern that it was not practical to have
> any sort of lock in the read-side code path and the write side was not
> time critical.

True, but only if the synchronize_kernel() implementation is applied to
UP kernels, also.

Thanx, Paul

> Nigel Gamble                                  [EMAIL PROTECTED]
> Mountain View, CA, USA.                       http://www.nrg.org/
>
> MontaVista Software                           [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Tue, 10 Apr 2001, Paul McKenney wrote:
> The algorithms we have been looking at need to have absolute guarantees
> that earlier activity has completed.  The most straightforward way to
> guarantee this is to have the critical-section activity run with
> preemption disabled.  Most of these code segments either take out locks
> or run with interrupts disabled anyway, so there is little or no
> degradation of latency in this case.  In fact, in many cases, latency
> would actually be improved due to removal of explicit locking
> primitives.
>
> I believe that one of the issues that pushes in this direction is the
> discovery that "synchronize_kernel()" could not be a nop in a UP kernel
> unless the read-side critical sections disable preemption (either in
> the natural course of events, or artificially if need be).  Andi or
> Rusty can correct me if I missed something in the previous exchange...
>
> The read-side code segments are almost always quite short, and, again,
> they would almost always otherwise need to be protected by a lock of
> some sort, which would disable preemption in any event.
>
> Thoughts?

Disabling preemption is a possible solution if the critical section is
short - less than 100us - otherwise preemption latencies become a
problem.

The implementation of synchronize_kernel() that Rusty and I discussed
earlier in this thread would work in other cases, such as module
unloading, where there was a concern that it was not practical to have
any sort of lock in the read-side code path and the write side was not
time critical.

Nigel Gamble                                    [EMAIL PROTECTED]
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> > As you've observed, with the approach of waiting for all pre-empted
> > tasks to synchronize, the possibility of a task staying pre-empted for
> > a long time could affect the latency of an update/synchronize (though
> > it's hard for me to judge how likely that is).
>
> It's very unlikely on a system that doesn't already have problems with
> CPU starvation because of runaway real-time tasks or interrupt handlers.

Agreed!

> First, preemption is a comparatively rare event with a mostly
> timesharing load, typically from 1% to 10% of all context switches.

Again, agreed!

> Second, the scheduler should not penalize the preempted task for being
> preempted, so that it should usually get to continue running as soon as
> the preempting task is descheduled, which is at most one timeslice for
> timesharing tasks.

The algorithms we have been looking at need to have absolute guarantees
that earlier activity has completed.  The most straightforward way to
guarantee this is to have the critical-section activity run with
preemption disabled.  Most of these code segments either take out locks
or run with interrupts disabled anyway, so there is little or no
degradation of latency in this case.  In fact, in many cases, latency
would actually be improved due to removal of explicit locking
primitives.

I believe that one of the issues that pushes in this direction is the
discovery that "synchronize_kernel()" could not be a nop in a UP kernel
unless the read-side critical sections disable preemption (either in
the natural course of events, or artificially if need be).  Andi or
Rusty can correct me if I missed something in the previous exchange...

The read-side code segments are almost always quite short, and, again,
they would almost always otherwise need to be protected by a lock of
some sort, which would disable preemption in any event.

Thoughts?

Thanx, Paul
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Mon, 9 Apr 2001 [EMAIL PROTECTED] wrote:
> As you've observed, with the approach of waiting for all pre-empted
> tasks to synchronize, the possibility of a task staying pre-empted for
> a long time could affect the latency of an update/synchronize (though
> it's hard for me to judge how likely that is).

It's very unlikely on a system that doesn't already have problems with
CPU starvation because of runaway real-time tasks or interrupt handlers.

First, preemption is a comparatively rare event with a mostly
timesharing load, typically from 1% to 10% of all context switches.

Second, the scheduler should not penalize the preempted task for being
preempted, so it should usually get to continue running as soon as the
preempting task is descheduled, which is at most one timeslice for
timesharing tasks.

Nigel Gamble                                    [EMAIL PROTECTED]
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> One question: isn't it the case that the alternative to using
> synchronize_kernel() is to protect the read side with explicit locks,
> which will themselves suppress preemption?  If so, why not just
> suppress preemption on the read side in preemptible kernels, and thus
> gain the simpler implementation of synchronize_kernel()?  You are not
> losing any preemption latency compared to a kernel that uses
> traditional locks; in fact, you should improve latency a bit, since
> the lock operations are more expensive than are simple increments and
> decrements.  As usual, what am I missing here?  ;-)
> ...
> I still prefer suppressing preemption on the read side, though I
> suppose one could claim that this is only because I am -really- used
> to it.  ;-)

Since this point has come up, I just wanted to mention that it may
still be nice to be able to do without explicit locks on the read side.
This is not so much for performance reasons (I agree with your
assessment on that point) as for convenience / flexibility in the kind
of situations where this concept (i.e. synchronize_kernel or
read-copy-update) could be used.

For example, consider situations where it is an executable code block
that is being protected.  The read side is essentially the execution of
that code block - i.e. every entry/exit into the code block.  This is
perhaps the case with module unload races.  Having to acquire a
read-lock explicitly before every entry point seems to reduce the
simplicity of the solution, doesn't it?

This is also the case with kernel code patching, which, I agree, may
appear to be a rather unlikely application of this concept - handling
races in multi-byte code patching on a running kernel, a rather
difficult problem otherwise.  In this case, the read side is totally
unaware of the possibility of an updater modifying the code, so it
isn't even possible for a read-lock to be acquired explicitly (if we
wish to have the flexibility of being able to patch any portion of the
code).

Have been discussing this with Dipankar last week, so I realize that
the above situations were perhaps not what these locking mechanisms
were intended for, but just thought I'd bring up this perspective.

As you've observed, with the approach of waiting for all pre-empted
tasks to synchronize, the possibility of a task staying pre-empted for
a long time could affect the latency of an update/synchronize (though
it's hard for me to judge how likely that is).  Besides, as Andi
pointed out, there probably are a lot of situations where the readers
are not pre-emptible anyway, so that waiting for all pre-empted tasks
may be superfluous.

Given these possibilities, does it make sense to simply let the
updater/synchronize_kernel specify an option indicating whether it
would wait for pre-empted tasks or not?

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : [EMAIL PROTECTED]
Phone : 91-80-5267117, Extn : 2525
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
One question: isn't it the case that the alternative to using synchronize_kernel() is to protect the read side with explicit locks, which will themselves suppress preemption? If so, why not just suppress preemption on the read side in preemptible kernels, and thus gain the simpler implementation of synchronize_kernel()? You are not losing any preemption latency compared to a kernel that uses traditional locks, in fact, you should improve latency a bit since the lock operations are more expensive than are simple increments and decrements. As usual, what am I missing here? ;-) ... ... I still prefer suppressing preemption on the read side, though I suppose one could claim that this is only because I am -really- used to it. ;-) Since this point has come up , I just wanted to mention that it may still be nice to be able to do without explicit locks on the read-side. This is not so much for performance reasons (I agree with your assessment on that point) as for convinience / flexibility in the kind of situations where this concept (i.e. synchronize_kernel or read-copy-update) could be used. For example, consider situations where it is an executable code block that is being protected. The read side is essentially the execution of that code block - i.e. every entry/exit into the code block. This is perhaps the case with module unload races. Having to acquire a read-lock explicitly before every entry point seems to reduce the simplicity of the solution, doesn't it ? This is also the case with kernel code patching, which I agree, may appear to be a rather unlikely application of this concept to handle races in multi-byte code patching on a running kernel, a rather difficult problem, otherwise. In this case, the read-side is totally unaware of the possibility of an updater modifying the code, so it isn't even possible for a read-lock to be acquired explicitly (if we wish to have the flexibility of being able to patch any portion of the code). 
Have been discussing this with Dipankar last week, so I realize that
the above situations were perhaps not what these locking mechanisms
were intended for, but just thought I'd bring up this perspective.

As you've observed, with the approach of waiting for all pre-empted
tasks to synchronize, the possibility of a task staying pre-empted for
a long time could affect the latency of an update/synchronize (though
it's hard for me to judge how likely that is). Besides, as Andi pointed
out, there probably are a lot of situations where the readers are not
pre-emptible anyway, so that waiting for all pre-empted tasks may be
superfluous. Given these possibilities, does it make sense to simply
let the updater/synchronize_kernel specify an option indicating whether
it would wait for pre-empted tasks or not?

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : [EMAIL PROTECTED]
Phone : 91-80-5267117, Extn : 2525

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Mon, 9 Apr 2001 [EMAIL PROTECTED] wrote:
> As you've observed, with the approach of waiting for all pre-empted
> tasks to synchronize, the possibility of a task staying pre-empted
> for a long time could affect the latency of an update/synchronize
> (though it's hard for me to judge how likely that is).

It's very unlikely on a system that doesn't already have problems with
CPU starvation because of runaway real-time tasks or interrupt
handlers. First, preemption is a comparatively rare event with a mostly
timesharing load, typically from 1% to 10% of all context switches.
Second, the scheduler should not penalize the preempted task for being
preempted, so it should usually get to continue running as soon as the
preempting task is descheduled, which is at most one timeslice for
timesharing tasks.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> > > > 2. Isn't it possible to get in trouble even on a UP if a task
> > > > is preempted in a critical region? For example, suppose the
> > > > preempting task does a synchronize_kernel()?
> > >
> > > Ugly. I guess one way to solve it would be to re-add the 2.2
> > > scheduler taskqueue, and just queue a scheduler callback in this
> > > case.
> >
> > Another approach would be to define a "really low" priority that no
> > one other than synchronize_kernel() was allowed to use. Then the UP
> > implementation of synchronize_kernel() could drop its priority to
> > this level, yield the CPU, and know that all preempted tasks must
> > have obtained and voluntarily yielded the CPU before
> > synchronize_kernel() gets it back again.
>
> That just would allow nasty starvation, e.g. when someone runs a
> cpu-intensive screensaver or a seti-at-home.

Good point! I hereby withdraw my suggested use of ultra-low priorities
for UP implementations of synchronize_kernel(). ;-)

> > I still prefer suppressing preemption on the read side, though I
> > suppose one could claim that this is only because I am -really-
> > used to it. ;-)
>
> For a lot of reader cases non-preemption by threads is guaranteed
> anyway -- e.g. anything that runs in interrupts, timers, tasklets and
> the network softirq. I think that already covers a lot of interesting
> cases.

Good point again! For example, this does cover most of the TCP/IP
cases, right?

Thanx, Paul
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
> > I see your point here, but need to think about it. One question:
> > isn't it the case that the alternative to using
> > synchronize_kernel() is to protect the read side with explicit
> > locks, which will themselves suppress preemption? If so, why not
> > just suppress preemption on the read side in preemptible kernels,
> > and thus gain the simpler implementation of synchronize_kernel()?
> > You are not losing any preemption latency compared to a kernel that
> > uses traditional locks; in fact, you should improve latency a bit,
> > since the lock operations are more expensive than simple increments
> > and decrements. As usual, what am I missing here? ;-)
>
> Already preempted tasks.

But if you are suppressing preemption in all read-side critical
sections, then wouldn't any already-preempted tasks be guaranteed to
-not- be in a read-side critical section, and therefore be guaranteed
to be unaffected by the update (in other words, wouldn't such tasks not
need to be waited for)?

> > Another approach would be to define a "really low" priority that no
> > one other than synchronize_kernel() was allowed to use. Then the UP
> > implementation of synchronize_kernel() could drop its priority to
> > this level, yield the CPU, and know that all preempted tasks must
> > have obtained and voluntarily yielded the CPU before
> > synchronize_kernel() gets it back again.
>
> Or "never", because I'm running RC5 etc. 8(.

U... Good point! Never mind use of low priorities in UP kernels for
synchronize_kernel()...

Thanx, Paul
Re: [Lse-tech] Re: [PATCH for 2.5] preemptible kernel
On Fri, Apr 06, 2001 at 06:25:36PM -0700, Paul McKenney wrote:
> I see your point here, but need to think about it. One question:
> isn't it the case that the alternative to using synchronize_kernel()
> is to protect the read side with explicit locks, which will
> themselves suppress preemption? If so, why not just suppress
> preemption on the read side in preemptible kernels, and thus gain the
> simpler implementation of synchronize_kernel()? You are not losing
> any preemption latency compared to a kernel that uses traditional
> locks; in fact, you should improve latency a bit, since the lock
> operations are more expensive than simple increments and decrements.
> As usual, what am I missing here? ;-)

You miss nothing, I think. In fact it's already used (see below).

> > > 2. Isn't it possible to get in trouble even on a UP if a task
> > > is preempted in a critical region? For example, suppose the
> > > preempting task does a synchronize_kernel()?
> >
> > Ugly. I guess one way to solve it would be to re-add the 2.2
> > scheduler taskqueue, and just queue a scheduler callback in this
> > case.
>
> Another approach would be to define a "really low" priority that no
> one other than synchronize_kernel() was allowed to use. Then the UP
> implementation of synchronize_kernel() could drop its priority to
> this level, yield the CPU, and know that all preempted tasks must
> have obtained and voluntarily yielded the CPU before
> synchronize_kernel() gets it back again.

That just would allow nasty starvation, e.g. when someone runs a
cpu-intensive screensaver or a seti-at-home.

> I still prefer suppressing preemption on the read side, though I
> suppose one could claim that this is only because I am -really- used
> to it. ;-)

For a lot of reader cases non-preemption by threads is guaranteed
anyway -- e.g. anything that runs in interrupts, timers, tasklets and
the network softirq. I think that already covers a lot of interesting
cases.
-Andi
Re: [PATCH for 2.5] preemptible kernel
In message OF37B0793C.6B15F182-ON88256A27.0007C3EF@LocalDomain you write:
> > Priority inversion is not handled in Linux kernel ATM BTW, there
> > are already situations where a realtime task can cause a deadlock
> > with some lower priority system thread (I believe there is at least
> > one case of this known with realtime ntpd on 2.4)
>
> I see your point here, but need to think about it. One question:
> isn't it the case that the alternative to using synchronize_kernel()
> is to protect the read side with explicit locks, which will
> themselves suppress preemption? If so, why not just suppress
> preemption on the read side in preemptible kernels, and thus gain the
> simpler implementation of synchronize_kernel()? You are not losing
> any preemption latency compared to a kernel that uses traditional
> locks; in fact, you should improve latency a bit, since the lock
> operations are more expensive than simple increments and decrements.
> As usual, what am I missing here? ;-)

Already preempted tasks.

> Another approach would be to define a "really low" priority that no
> one other than synchronize_kernel() was allowed to use. Then the UP
> implementation of synchronize_kernel() could drop its priority to
> this level, yield the CPU, and know that all preempted tasks must
> have obtained and voluntarily yielded the CPU before
> synchronize_kernel() gets it back again.

Or "never", because I'm running RC5 etc. 8(.

Rusty.
--
Premature optmztion is rt of all evl. --DK
Re: [PATCH for 2.5] preemptible kernel
Andi, thank you for the background! More comments interspersed...

> On Fri, Apr 06, 2001 at 04:52:25PM -0700, Paul McKenney wrote:
> > 1. On a busy system, isn't it possible for a preempted task
> > to stay preempted for a -long- time, especially if there are
> > lots of real-time tasks in the mix?
>
> The problem you're describing is probably considered too hard to
> solve properly (bad answer, but that is how it is currently)
>
> Yes there is. You can also force a normal (non-preemptive) kernel
> into complete livelock by just giving it enough network traffic to
> do, so that it always works in the high-priority network softirq, or
> doing the same with some other interrupt.
>
> Just when this happens a lot of basic things will stop working (like
> page cleaning, IO flushing etc.), so your callbacks or module unloads
> not running are probably the least of your worries.
>
> The same problem applies on a smaller scale to real-time processes;
> kernel services normally do not run real-time, so they can be
> starved.
>
> Priority inversion is not handled in the Linux kernel ATM BTW; there
> are already situations where a realtime task can cause a deadlock
> with some lower priority system thread (I believe there is at least
> one case of this known with realtime ntpd on 2.4)

I see your point here, but need to think about it. One question: isn't
it the case that the alternative to using synchronize_kernel() is to
protect the read side with explicit locks, which will themselves
suppress preemption? If so, why not just suppress preemption on the
read side in preemptible kernels, and thus gain the simpler
implementation of synchronize_kernel()? You are not losing any
preemption latency compared to a kernel that uses traditional locks;
in fact, you should improve latency a bit, since the lock operations
are more expensive than simple increments and decrements. As usual,
what am I missing here? ;-)

> > 2. Isn't it possible to get in trouble even on a UP if a task
> > is preempted in a critical region? For example, suppose the
> > preempting task does a synchronize_kernel()?
>
> Ugly. I guess one way to solve it would be to re-add the 2.2
> scheduler taskqueue, and just queue a scheduler callback in this
> case.

Another approach would be to define a "really low" priority that no one
other than synchronize_kernel() was allowed to use. Then the UP
implementation of synchronize_kernel() could drop its priority to this
level, yield the CPU, and know that all preempted tasks must have
obtained and voluntarily yielded the CPU before synchronize_kernel()
gets it back again.

I still prefer suppressing preemption on the read side, though I
suppose one could claim that this is only because I am -really- used
to it. ;-)

Thanx, Paul
Re: [PATCH for 2.5] preemptible kernel
Hallo,

On Fri, Apr 06, 2001 at 04:52:25PM -0700, Paul McKenney wrote:
> 1. On a busy system, isn't it possible for a preempted task
> to stay preempted for a -long- time, especially if there are
> lots of real-time tasks in the mix?

The problem you're describing is probably considered too hard to solve
properly (bad answer, but that is how it is currently).

Yes there is. You can also force a normal (non-preemptive) kernel into
complete livelock by just giving it enough network traffic to do, so
that it always works in the high-priority network softirq, or doing
the same with some other interrupt.

Just when this happens a lot of basic things will stop working (like
page cleaning, IO flushing etc.), so your callbacks or module unloads
not running are probably the least of your worries.

The same problem applies on a smaller scale to real-time processes;
kernel services normally do not run real-time, so they can be starved.

Priority inversion is not handled in the Linux kernel ATM BTW; there
are already situations where a realtime task can cause a deadlock with
some lower priority system thread (I believe there is at least one
case of this known with realtime ntpd on 2.4).

> 2. Isn't it possible to get in trouble even on a UP if a task
> is preempted in a critical region? For example, suppose the
> preempting task does a synchronize_kernel()?

Ugly. I guess one way to solve it would be to re-add the 2.2 scheduler
taskqueue, and just queue a scheduler callback in this case.

-Andi
Re: [PATCH for 2.5] preemptible kernel
Please accept my apologies if I am missing something basic, but...

1. On a busy system, isn't it possible for a preempted task
   to stay preempted for a -long- time, especially if there are
   lots of real-time tasks in the mix?

2. Isn't it possible to get in trouble even on a UP if a task
   is preempted in a critical region? For example, suppose the
   preempting task does a synchronize_kernel()?

If I understand the preemptible kernel patch, suppressing preemption is
quite cheap -- just increment preempt_count before and decrement it
after, with no locks, atomic operations, or disabling of interrupts
required. I understand that disabling preemption for any long time is a
Bad Thing; however, Dipankar's patch is able to detect situations where
preemption has been suppressed for too long. In addition, Rusty's
original synchronize_kernel() patch is much simpler than the two that
wait for preempted tasks to reach a voluntary context switch.

What am I missing here?

Thanx, Paul

> > Here is an attempt at a possible version of synchronize_kernel()
> > that should work on a preemptible kernel. I haven't tested it yet.
>
> It's close, but...
>
> Those who suggest that we don't do preemption on SMP make this much
> easier (synchronize_kernel() is a NOP on UP), and I'm starting to
> agree with them. Anyway:
>
> > if (p->state == TASK_RUNNING ||
> >     (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> >         p->flags |= PF_SYNCING;
>
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need
> p->has_cpu here I think. You could do it for TASK_PREEMPTED only, but
> you'd have to do the "unreal priority" part of synchronize_kernel()
> with some method to say "don't preempt anyone", but it will hurt
> latency. Hmmm...
>
> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock. This looks like (and I prefer wait queues, they have such nice
> helpers):
>
> static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
> static DECLARE_MUTEX(synchronize_kernel_mtx);
> static int sync_count = 0;
>
> schedule():
>         if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
>                 if (--sync_count == 0) wake_up(&syncing_task);
>
> synchronize_kernel():
> {
>         struct list_head *tmp;
>         struct task_struct *p;
>
>         /* Guard against multiple calls to this function */
>         down(&synchronize_kernel_mtx);
>
>         /* Everyone running now or currently preempted must
>            voluntarily schedule before we know we are safe. */
>         spin_lock_irq(&runqueue_lock);
>         list_for_each(tmp, &runqueue_head) {
>                 p = list_entry(tmp, struct task_struct, run_list);
>                 if (p->has_cpu ||
>                     p->state == (TASK_RUNNING|TASK_PREEMPTED)) {
>                         p->syncing = 1;
>                         sync_count++;
>                 }
>         }
>         spin_unlock_irq(&runqueue_lock);
>
>         /* Wait for them all */
>         wait_event(syncing_task, sync_count == 0);
>         up(&synchronize_kernel_mtx);
> }
>
> Also untested 8),
> Rusty.
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write:
> > Setting a running task's flags brings races, AFAICT, and checking
> > p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> > here I think.
>
> My thought here was that if p->state is anything other than TASK_RUNNING
> or TASK_RUNNING|TASK_PREEMPTED, then that task is already at a
> synchronize point,

Right. Theoretically possible to set p->state and not sleep, but it's not a bad assumption.

> > schedule():
> > 	if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> > 		if (--sync_count == 0) wake_up(&syncing_task);
>
> Don't forget to reset prev->syncing.

Right.

> I agree with you about wait queues, but didn't use them here because
> of the problem of avoiding deadlock on the runqueue lock, which the
> wait queues also use.

And right again. Yeah, it has to be done manually. Ack, can I retract that Email?

Rusty.
--
Premature optmztion is rt of all evl. --DK
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write: > On a higher level, I think the scanning of the run list to set flags and > counters is a bit off. [snip standard refcnt scheme] For most things, refcnts are great. I use them in connection tracking. But when writes can be insanely slow (eg. once per hour), those two atomic ops just hurt scalability, and are invasive. In particular, they suck HARD for modules, which is where my initial quiescent wait implementation came from, only I didn't call it that because Andi hadn't posted the Sequent paper URL yet and I hadn't figured out that this was a generically useful scheme that can be implemented in 20 lines... Rusty. -- Premature optmztion is rt of all evl. --DK - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote:
>
> On Sat, 31 Mar 2001, george anzinger wrote:
> > I think this should be:
> > 	if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
> > to catch tasks that were preempted with other states.
>
> But the other states are all part of the state change that happens at a
> non-preemptive schedule() point, aren't they, so those tasks are already
> safe to access the data we are protecting.

If you're saying that the task's "thinking" about a state change is sufficient, ok. The point is that a task changes its state prior to calling schedule() and then, sometimes, doesn't call schedule() and just changes its state back to running. Preemption can happen at any of these times; after all, that is what the TASK_PREEMPTED flag is used for.

On a higher level, I think the scanning of the run list to set flags and counters is a bit off. If these things need to be counted and kept track of, the tasks should be doing it "in the moment", not some other task at some distant time. For example, if what is being protected is a data structure, a counter could be put in the structure that keeps track of the number of tasks that are interested in seeing it stay around. As I understand the objective of the method being explored, a writer wants to change the structure, but old readers can continue to use the old while new readers get the new structure. The issue then is when to "garbage collect" the no-longer-used structures.

It seems to me that the pointer to the structure can be changed to point to the new one when the writer has it set up. Old users continue to use the old. When they are done, they decrement the use count. When the use count goes to zero, it is time to "garbage collect". At this time, the "garbage man" is called (one simple check would be whether the structure is still the one a "new" task would get). Various methods exist for determining how and if the "garbage man" should be called, but this sort of thing, IMNSHO, does NOT belong in schedule().

George
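[Editor's note] George's pointer-swap-plus-use-count scheme can be sketched in userspace C11. This is a hypothetical illustration, not kernel code — `struct config`, `get_config()` and the other names are invented — and it deliberately keeps the weakness the rest of the thread wrestles with: there is a window between loading the pointer and raising the use count during which the writer could free the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One version of the protected data, with George's per-structure
 * use count ("number of tasks interested in seeing it stay around"). */
struct config {
    atomic_int users;   /* readers currently holding a reference */
    int stale;          /* set once a newer version has replaced us */
    int value;
};

static struct config *_Atomic current_config;

/* Reader side: take a reference to whatever version is current. */
static struct config *get_config(void)
{
    struct config *c = atomic_load(&current_config);
    atomic_fetch_add(&c->users, 1);   /* window of unsafety is here */
    return c;
}

/* Reader side: drop the reference; the last reader out of a stale
 * version is the "garbage man". */
static void put_config(struct config *c)
{
    if (atomic_fetch_sub(&c->users, 1) == 1 && c->stale)
        free(c);
}

/* Writer side: publish a new version; old readers keep the old one. */
static void update_config(int value)
{
    struct config *new = malloc(sizeof(*new));
    struct config *old;

    atomic_init(&new->users, 0);
    new->stale = 0;
    new->value = value;
    old = atomic_exchange(&current_config, new);
    old->stale = 1;
    if (atomic_load(&old->users) == 0)  /* nobody was reading it */
        free(old);
}
```

The gap between the `atomic_load` and the `atomic_fetch_add` in `get_config()` is exactly why the thread keeps coming back to quiescent-state schemes instead: a quiescent wait needs no per-reader bookkeeping at all, at the price of the machinery in `synchronize_kernel()`.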
Re: [PATCH for 2.5] preemptible kernel
On Sat, 31 Mar 2001, Rusty Russell wrote:
> > 	if (p->state == TASK_RUNNING ||
> > 	    (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> > 		p->flags |= PF_SYNCING;
>
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> here I think.

My thought here was that if p->state is anything other than TASK_RUNNING or TASK_RUNNING|TASK_PREEMPTED, then that task is already at a synchronize point, so we don't need to wait for it to arrive at another one - it will get a consistent view of the data we are protecting. wait_event() qualifies as a synchronize point, doesn't it? Or am I missing something?

> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock. This looks like (and I prefer wait queues, they have such nice
> helpers):
>
> 	static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
> 	static DECLARE_MUTEX(synchronize_kernel_mtx);
> 	static int sync_count = 0;
>
> schedule():
> 	if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> 		if (--sync_count == 0) wake_up(&syncing_task);

Don't forget to reset prev->syncing.

I agree with you about wait queues, but didn't use them here because of the problem of avoiding deadlock on the runqueue lock, which the wait queues also use. The above code in schedule() needs the runqueue lock to protect sync_count.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Sat, 31 Mar 2001, george anzinger wrote:
> I think this should be:
> 	if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
> to catch tasks that were preempted with other states.

But the other states are all part of the state change that happens at a non-preemptive schedule() point, aren't they, so those tasks are already safe to access the data we are protecting.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
Rusty Russell wrote:
>
> In message <[EMAIL PROTECTED]> you write:
> > Here is an attempt at a possible version of synchronize_kernel() that
> > should work on a preemptible kernel. I haven't tested it yet.
>
> It's close, but...
>
> Those who suggest that we don't do preemption on SMP make this much
> easier (synchronize_kernel() is a NOP on UP), and I'm starting to
> agree with them. Anyway:
>
> > 	if (p->state == TASK_RUNNING ||
> > 	    (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> > 		p->flags |= PF_SYNCING;
>
> Setting a running task's flags brings races, AFAICT, and checking
> p->state is NOT sufficient, consider wait_event(): you need p->has_cpu
> here I think. You could do it for TASK_PREEMPTED only, but you'd have
> to do the "unreal priority" part of synchronize_kernel() with some
> method to say "don't preempt anyone", but it will hurt latency.
> Hmmm...
>
> The only way I can see is to have a new element in "struct
> task_struct" saying "syncing now", which is protected by the runqueue
> lock. This looks like (and I prefer wait queues, they have such nice
> helpers):
>
> static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
> static DECLARE_MUTEX(synchronize_kernel_mtx);
> static int sync_count = 0;
>
> schedule():
> 	if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
> 		if (--sync_count == 0) wake_up(&syncing_task);
>
> synchronize_kernel():
> {
> 	struct list_head *tmp;
> 	struct task_struct *p;
>
> 	/* Guard against multiple calls to this function */
> 	down(&synchronize_kernel_mtx);
>
> 	/* Everyone running now or currently preempted must
> 	   voluntarily schedule before we know we are safe. */
> 	spin_lock_irq(&runqueue_lock);
> 	list_for_each(tmp, &runqueue_head) {
> 		p = list_entry(tmp, struct task_struct, run_list);
> 		if (p->has_cpu || p->state == (TASK_RUNNING|TASK_PREEMPTED)) {

I think this should be:
	if (p->has_cpu || (p->state & TASK_PREEMPTED)) {
to catch tasks that were preempted with other states.

The lse Multi Queue scheduler folks are going to love this.

George

> 			p->syncing = 1;
> 			sync_count++;
> 		}
> 	}
> 	spin_unlock_irq(&runqueue_lock);
>
> 	/* Wait for them all */
> 	wait_event(syncing_task, sync_count == 0);
> 	up(&synchronize_kernel_mtx);
> }
>
> Also untested 8),
> Rusty.
> --
> Premature optmztion is rt of all evl. --DK
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write:
> Here is an attempt at a possible version of synchronize_kernel() that
> should work on a preemptible kernel. I haven't tested it yet.

It's close, but...

Those who suggest that we don't do preemption on SMP make this much easier (synchronize_kernel() is a NOP on UP), and I'm starting to agree with them. Anyway:

> 	if (p->state == TASK_RUNNING ||
> 	    (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
> 		p->flags |= PF_SYNCING;

Setting a running task's flags brings races, AFAICT, and checking p->state is NOT sufficient, consider wait_event(): you need p->has_cpu here I think. You could do it for TASK_PREEMPTED only, but you'd have to do the "unreal priority" part of synchronize_kernel() with some method to say "don't preempt anyone", but it will hurt latency. Hmmm...

The only way I can see is to have a new element in "struct task_struct" saying "syncing now", which is protected by the runqueue lock. This looks like (and I prefer wait queues, they have such nice helpers):

static DECLARE_WAIT_QUEUE_HEAD(syncing_task);
static DECLARE_MUTEX(synchronize_kernel_mtx);
static int sync_count = 0;

schedule():
	if (!(prev->state & TASK_PREEMPTED) && prev->syncing)
		if (--sync_count == 0) wake_up(&syncing_task);

synchronize_kernel():
{
	struct list_head *tmp;
	struct task_struct *p;

	/* Guard against multiple calls to this function */
	down(&synchronize_kernel_mtx);

	/* Everyone running now or currently preempted must
	   voluntarily schedule before we know we are safe. */
	spin_lock_irq(&runqueue_lock);
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (p->has_cpu ||
		    p->state == (TASK_RUNNING|TASK_PREEMPTED)) {
			p->syncing = 1;
			sync_count++;
		}
	}
	spin_unlock_irq(&runqueue_lock);

	/* Wait for them all */
	wait_event(syncing_task, sync_count == 0);
	up(&synchronize_kernel_mtx);
}

Also untested 8),
Rusty.
--
Premature optmztion is rt of all evl. --DK
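[Editor's note] The schedule()/synchronize_kernel() pair above can be mimicked in userspace with pthreads to make the control flow concrete. Everything here is a hypothetical analogue, not the kernel code: `schedule_point()` stands in for a voluntary call to schedule(), a plain mutex for the runqueue lock, and a condition variable for the wait queue.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  syncing_done = PTHREAD_COND_INITIALIZER;
static int sync_count;
static int syncing[MAX_THREADS];   /* per-thread "->syncing" flag */
static int active[MAX_THREADS];    /* thread is inside the "kernel" */

/* Called by each worker at a voluntary schedule point; the analogue
 * of the two-line check Rusty adds to schedule(). */
static void schedule_point(int tid)
{
    pthread_mutex_lock(&sync_lock);
    if (syncing[tid]) {
        syncing[tid] = 0;          /* reached a quiescent state */
        if (--sync_count == 0)
            pthread_cond_broadcast(&syncing_done);
    }
    pthread_mutex_unlock(&sync_lock);
}

/* The updater: mark everyone active right now, then wait until each
 * of them has passed through schedule_point() once. */
static void synchronize(void)
{
    int i;

    pthread_mutex_lock(&sync_lock);
    for (i = 0; i < MAX_THREADS; i++) {
        if (active[i]) {
            syncing[i] = 1;
            sync_count++;
        }
    }
    while (sync_count != 0)
        pthread_cond_wait(&syncing_done, &sync_lock);
    pthread_mutex_unlock(&sync_lock);
}
```

As in Rusty's sketch, a thread that is preempted rather than voluntarily scheduling simply never calls `schedule_point()`, so the updater keeps waiting — which is the behaviour the TASK_PREEMPTED test is there to enforce.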
Re: [PATCH for 2.5] preemptible kernel
On Wed, 28 Mar 2001 12:51:02 -0800, george anzinger <[EMAIL PROTECTED]> wrote:
> Dipankar Sarma wrote:
> > 1. Disable pre-emption during the time when references to data
> > structures updated using such Two-phase updates are held.
>
> Doesn't this fly in the face of the whole Two-phase system? It seems to
> me that the point was to not require any locks. Preemption disable IS a
> lock. Not as strong as some, but a lock none the less.

The aim is to remove all module locks from the main kernel path and move the penalty for module unloading into the task doing the unload. Only the unload code needs to worry about preemption, not the main kernel code paths, so main-line code runs faster and scales better.

I don't care how long (within reason) it takes to unload a module; it is such a rare event. I want module unload to be race free with zero impact on the fast code paths. The current design penalizes the fast code paths.
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Nigel Gamble wrote:
> On Tue, 20 Mar 2001, Rusty Russell wrote:
> > Thoughts?
>
> Perhaps synchronize_kernel() could take the run_queue lock, mark all the
> tasks on it and count them. Any task marked when it calls schedule()
> voluntarily (but not if it is preempted) is unmarked and the count
> decremented. synchronize_kernel() continues until the count is zero.

Hi Rusty,

Here is an attempt at a possible version of synchronize_kernel() that should work on a preemptible kernel. I haven't tested it yet.

static int sync_count = 0;
static struct task_struct *syncing_task = NULL;
static DECLARE_MUTEX(synchronize_kernel_mtx);

void synchronize_kernel()
{
	struct list_head *tmp;
	struct task_struct *p;
	unsigned long cpus_allowed;
	unsigned long policy;
	unsigned long rt_priority;

	/* Guard against multiple calls to this function */
	down(&synchronize_kernel_mtx);

	/* Mark all tasks on the runqueue */
	spin_lock_irq(&runqueue_lock);
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (p == current)
			continue;
		if (p->state == TASK_RUNNING ||
		    (p->state == (TASK_RUNNING|TASK_PREEMPTED))) {
			p->flags |= PF_SYNCING;
			sync_count++;
		}
	}
	if (sync_count == 0)
		goto out;

	syncing_task = current;
	spin_unlock_irq(&runqueue_lock);

	/*
	 * Cause a schedule on every CPU, as for a non-preemptible
	 * kernel
	 */

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL << smp_num_cpus) - 1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1UL << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;

	/*
	 * Wait, if necessary, until all preempted tasks
	 * have reached a sync point.
	 */
	spin_lock_irq(&runqueue_lock);
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (sync_count == 0)
			break;
		spin_unlock_irq(&runqueue_lock);
		schedule();
		spin_lock_irq(&runqueue_lock);
	}
	current->state = TASK_RUNNING;
	syncing_task = NULL;
out:
	spin_unlock_irq(&runqueue_lock);
	up(&synchronize_kernel_mtx);
}

And add this code to the beginning of schedule(), just after the runqueue_lock is taken (the flags field is probably not the right place to put the synchronize mark; and the test should be optimized for the fast path in the same way as the other tests in schedule(), but you get the idea):

	if ((prev->flags & PF_SYNCING) && !(prev->state & TASK_PREEMPTED)) {
		prev->flags &= ~PF_SYNCING;
		if (--sync_count == 0) {
			syncing_task->state = TASK_RUNNING;
			if (!task_on_runqueue(syncing_task))
				add_to_runqueue(syncing_task);
			syncing_task = NULL;
		}
	}

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, Mar 28, 2001 at 12:51:02PM -0800, george anzinger wrote:
> Dipankar Sarma wrote:
> >
> > Also, a task could be preempted and then rescheduled on the same cpu
> > making the depth counter 0 (right ?), but it could still be holding
> > references to data structures to be updated using
> > synchronize_kernel(). There seem to be two approaches to tackle
> > preemption -
> >
> > 1. Disable pre-emption during the time when references to data
> > structures updated using such Two-phase updates are held.
>
> Doesn't this fly in the face of the whole Two-phase system? It seems to
> me that the point was to not require any locks. Preemption disable IS a
> lock. Not as strong as some, but a lock none the less.

The point is to avoid acquiring costly locks, so it is a question of relative cost. Disabling preemption (by an atomic increment) for short critical sections may not be as bad as spin-waiting for highly contended locks or thrashing lock cachelines.

> > Pros: easy to implement using a flag (ctx_sw_off() ?)
> > Cons: not so easy to use since critical sections need to be clearly
> > identified and interfaces defined. also affects preemptive behavior.
> >
> > 2. In synchronize_kernel(), distinguish between "natural" and
> > preemptive schedules() and ignore preemptive ones.
> >
> > Pros: easy to use
> > Cons: Not so easy to implement. Also a low priority task that keeps
> > getting preempted often can affect update side performance
> > significantly.
>
> Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
> state). Don't you also have to have some sort of task flag that
> indicates that the task is one that needs to sync? Something that gets
> set when it enters the area of interest and cleared when it hits the
> sync point?

Neither of the two two-phase update implementations, Rusty's synchronize_kernel() and our read-copy update, monitors the tasks that require sync for update. synchronize_kernel() forces a schedule on every cpu, and read-copy update waits until every cpu goes through a quiescent state before updating.

Both approaches will require significant special handling because they implicitly assume that tasks inside the kernel are bound to the current cpu until it reaches a quiescent state (like a "normal" context switch). Since preempted tasks can run later on any cpu, we will have to keep track of sync points on a per-task basis, and that will probably require using a snapshot of the running tasks from the global runqueue. That may not be a good thing from a performance standpoint, not to mention the complexity.

Also, in situations where the read-to-write ratio is not heavily skewed towards read, or lots of updates are happening, a very low priority task can have a significant impact on performance by getting preempted all the time.

Thanks
Dipankar
--
Dipankar Sarma  ([EMAIL PROTECTED])
IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
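[Editor's note] Dipankar's option 1 — a flag consulted before preempting — amounts to a per-task nesting counter. A minimal userspace sketch of that counting discipline follows; the names `preempt_disable()`/`preempt_enable()` are used here for illustration (the email itself mentions `ctx_sw_off()` as the candidate interface), and a thread-local counter stands in for the task-struct field.

```c
#include <assert.h>

/* Thread-local nesting counter: nonzero means "do not preempt me".
 * A real scheduler would test this before an involuntary switch;
 * here we only model the counting discipline. */
static _Thread_local int preempt_count;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }
static int  preemptible(void)     { return preempt_count == 0; }

/* A read-side critical section: one increment in, one decrement out --
 * the cost Dipankar weighs against spin-waiting on contended locks. */
static int read_value(const int *shared)
{
    int v;
    preempt_disable();
    v = *shared;       /* updater's quiescent wait cannot pass us here */
    preempt_enable();
    return v;
}
```

Because sections may nest, the task becomes preemptible again only when the outermost `preempt_enable()` brings the counter back to zero — which is also why "critical sections need to be clearly identified", as the Cons above notes.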
Re: [PATCH for 2.5] preemptible kernel
On Wed, Mar 28, 2001 at 12:51:02PM -0800, george anzinger wrote: Dipankar Sarma wrote: Also, a task could be preempted and then rescheduled on the same cpu making the depth counter 0 (right ?), but it could still be holding references to data structures to be updated using synchronize_kernel(). There seems to be two approaches to tackle preemption - 1. Disable pre-emption during the time when references to data structures updated using such Two-phase updates are held. Doesn't this fly in the face of the whole Two-phase system? It seems to me that the point was to not require any locks. Preemption disable IS a lock. Not as strong as some, but a lock none the less. The point is to avoid acquring costly locks, so it is a question of relative cost. Disabling preemption (by an atomic increment) for short critical sections may not be as bad as spin-waiting for highly contended locks or thrashing lock cachelines. Pros: easy to implement using a flag (ctx_sw_off() ?) Cons: not so easy to use since critical sections need to be clearly identified and interfaces defined. also affects preemptive behavior. 2. In synchronize_kernel(), distinguish between "natural" and preemptive schedules() and ignore preemptive ones. Pros: easy to use Cons: Not so easy to implement. Also a low priority task that keeps getting preempted often can affect update side performance significantly. Actually is is fairly easy to distinguish the two (see TASK_PREEMPTED in state). Don't you also have to have some sort of task flag that indicates that the task is one that needs to sync? Something that gets set when it enters the area of interest and cleared when it hits the sync point? None of the two two-phase update implementations (synchronize_kernel()) by Rusty and read-copy update by us, monitor the tasks that require sync for update. synchronize_kernel() forces a schedule on every cpu and read-copy update waits until every cpu goes through a quiscent state, before updating. 
Both approaches will require significant special handling because they
implicitly assume that a task inside the kernel is bound to the current
cpu until it reaches a quiescent state (like a "normal" context switch).
Since preempted tasks can later run on any cpu, we would have to keep
track of sync points on a per-task basis, and that will probably require
taking a snapshot of the running tasks from the global runqueue. That
may not be a good thing from a performance standpoint, not to mention
the complexity. Also, in situations where the read-to-write ratio is not
heavily skewed towards reads, or lots of updates are happening, a very
low priority task can have a significant impact on performance by
getting preempted all the time.

Thanks
Dipankar
--
Dipankar Sarma ([EMAIL PROTECTED]) IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Nigel Gamble wrote:
> On Tue, 20 Mar 2001, Rusty Russell wrote:
> > Thoughts?
>
> Perhaps synchronize_kernel() could take the run_queue lock, mark all the
> tasks on it and count them.  Any task marked when it calls schedule()
> voluntarily (but not if it is preempted) is unmarked and the count
> decremented.  synchronize_kernel() continues until the count is zero.

Hi Rusty,

Here is an attempt at a possible version of synchronize_kernel() that
should work on a preemptible kernel.  I haven't tested it yet.

static int sync_count = 0;
static struct task_struct *syncing_task = NULL;
static DECLARE_MUTEX(synchronize_kernel_mtx);

void synchronize_kernel()
{
	struct list_head *tmp;
	struct task_struct *p;
	unsigned long cpus_allowed;
	int policy;
	unsigned long rt_priority;

	/* Guard against multiple calls to this function */
	down(&synchronize_kernel_mtx);

	/* Mark all tasks on the runqueue */
	spin_lock_irq(&runqueue_lock);
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (p == current)
			continue;
		if (p->state == TASK_RUNNING ||
		    p->state == (TASK_RUNNING | TASK_PREEMPTED)) {
			p->flags |= PF_SYNCING;
			sync_count++;
		}
	}
	if (sync_count == 0)
		goto out;

	syncing_task = current;
	spin_unlock_irq(&runqueue_lock);

	/*
	 * Cause a schedule on every CPU, as for a non-preemptible
	 * kernel
	 */

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL << smp_num_cpus) - 1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1UL << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;

	/*
	 * Wait, if necessary, until all preempted tasks
	 * have reached a sync point.
	 */
	spin_lock_irq(&runqueue_lock);
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (sync_count == 0)
			break;
		spin_unlock_irq(&runqueue_lock);
		schedule();
		spin_lock_irq(&runqueue_lock);
	}
	current->state = TASK_RUNNING;
	syncing_task = NULL;
out:
	spin_unlock_irq(&runqueue_lock);
	up(&synchronize_kernel_mtx);
}

And add this code to the beginning of schedule(), just after the
runqueue_lock is taken (the flags field is probably not the right place
to put the synchronize mark; and the test should be optimized for the
fast path in the same way as the other tests in schedule(), but you get
the idea):

	if ((prev->flags & PF_SYNCING) && !(prev->state & TASK_PREEMPTED)) {
		prev->flags &= ~PF_SYNCING;
		if (--sync_count == 0) {
			syncing_task->state = TASK_RUNNING;
			if (!task_on_runqueue(syncing_task))
				add_to_runqueue(syncing_task);
			syncing_task = NULL;
		}
	}

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, 28 Mar 2001 12:51:02 -0800, george anzinger <[EMAIL PROTECTED]> wrote:
> Dipankar Sarma wrote:
> > 1. Disable pre-emption during the time when references to data
> > structures updated using such Two-phase updates are held.
>
> Doesn't this fly in the face of the whole Two-phase system?  It seems to
> me that the point was to not require any locks.  Preemption disable IS a
> lock.  Not as strong as some, but a lock none the less.

The aim is to remove all module locks from the main kernel path and move
the penalty for module unloading into the task doing the unload.  Only
the unload code needs to worry about preemption, not the main kernel
code paths, so main line code runs faster and scales better.  I don't
care how long (within reason) it takes to unload a module, it is such a
rare event.  I want module unload to be race free with zero impact on
the fast code paths; the current design penalizes the fast code paths.
Re: [PATCH for 2.5] preemptible kernel
Dipankar Sarma wrote:
>
> Nigel Gamble wrote:
> >
> > On Wed, 21 Mar 2001, Keith Owens wrote:
> > > I misread the code, but the idea is still correct.  Add a preemption
> > > depth counter to each cpu, when you schedule and the depth is zero then
> > > you know that the cpu is no longer holding any references to quiesced
> > > structures.
> >
> > A task that has been preempted is on the run queue and can be
> > rescheduled on a different CPU, so I can't see how a per-CPU counter
> > would work.  It seems to me that you would need a per run queue
> > counter, like the example I gave in a previous posting.
>
> Also, a task could be preempted and then rescheduled on the same cpu
> making the depth counter 0 (right ?), but it could still be holding
> references to data structures to be updated using synchronize_kernel().
> There seems to be two approaches to tackle preemption -
>
> 1. Disable pre-emption during the time when references to data
> structures updated using such Two-phase updates are held.

Doesn't this fly in the face of the whole Two-phase system?  It seems to
me that the point was to not require any locks.  Preemption disable IS a
lock.  Not as strong as some, but a lock none the less.

> Pros: easy to implement using a flag (ctx_sw_off() ?)
> Cons: not so easy to use since critical sections need to be clearly
> identified and interfaces defined. also affects preemptive behavior.
>
> 2. In synchronize_kernel(), distinguish between "natural" and preemptive
> schedules() and ignore preemptive ones.
>
> Pros: easy to use
> Cons: Not so easy to implement. Also a low priority task that keeps
> getting preempted often can affect update side performance significantly.

Actually it is fairly easy to distinguish the two (see TASK_PREEMPTED in
state).  Don't you also have to have some sort of task flag that
indicates that the task is one that needs to sync?  Something that gets
set when it enters the area of interest and cleared when it hits the
sync point?
George
Re: [PATCH for 2.5] preemptible kernel
Hi George,

george anzinger wrote:
>
> Exactly so.  The method does not depend on the sum of preemption being
> zip, but on each potential reader (writers take locks) passing thru a
> "sync point".  Your notion of waiting for each task to arrive
> "naturally" at schedule() would work.  It is, in fact, over kill as you
> could also add arrival at sys call exit as a (the) "sync point".  In
> fact, for module unload, isn't this the real "sync point"?  After all, a
> module can call schedule, or did I miss a usage counter somewhere?

It is certainly possible to implement a synchronize_kernel() like
primitive for two-phase update using "sync points".  Waiting for sys
call exit will perhaps work in the module unloading case, but there
could be performance issues if a cpu spends most of its time in the
idle task/interrupts.

synchronize_kernel() provides a simple generic way of implementing a
two-phase update without serialization for reading.  I am working on a
"sync point" based version of such an approach, available at
http://lse.sourceforge.net/locking/rclock.html.  It is based on the
original DYNIX/ptx stuff that Paul McKenney developed in the early 90s.
This and synchronize_kernel() are very similar in approach, and each
can be implemented using the other.

As for handling preemption, we can perhaps try 2 things -

1. The read side of the critical section is enclosed in
RC_RDPROTECT()/RC_RDUNPROTECT() which are currently nops.  We can
disable/enable preemption using these.

2. Avoid counting preemptive context switches.  I am not sure about
this one though.

>
> By the way, there is a paper on this somewhere on the web.  Anyone
> remember where?

If you are talking about Paul's paper, the link is
http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf.

Thanks
Dipankar
--
Dipankar Sarma ([EMAIL PROTECTED]) IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote:
>
> On Wed, 21 Mar 2001, Keith Owens wrote:
> > I misread the code, but the idea is still correct.  Add a preemption
> > depth counter to each cpu, when you schedule and the depth is zero then
> > you know that the cpu is no longer holding any references to quiesced
> > structures.
>
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work.  It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Also, a task could be preempted and then rescheduled on the same cpu
making the depth counter 0 (right ?), but it could still be holding
references to data structures to be updated using synchronize_kernel().
There seems to be two approaches to tackle preemption -

1. Disable pre-emption during the time when references to data
structures updated using such Two-phase updates are held.

Pros: easy to implement using a flag (ctx_sw_off() ?)
Cons: not so easy to use since critical sections need to be clearly
identified and interfaces defined. also affects preemptive behavior.

2. In synchronize_kernel(), distinguish between "natural" and preemptive
schedules() and ignore preemptive ones.

Pros: easy to use
Cons: Not so easy to implement. Also a low priority task that keeps
getting preempted often can affect update side performance significantly.

I intend to experiment with both to understand the impact.

Thanks
Dipankar
--
Dipankar Sarma ([EMAIL PROTECTED]) IBM Linux Technology Center
IBM Software Lab, Bangalore, India.
Project Page: http://lse.sourceforge.net
Re: [PATCH for 2.5] preemptible kernel
On Thu, 22 Mar 2001, Rusty Russell wrote:
> Nigel's "traverse the run queue and mark the preempted" solution is
> actually pretty nice, and cheap.  Since the runqueue lock is grabbed,
> it doesn't require icky atomic ops, either.

You'd have to mark both the preempted tasks, and the tasks currently
running on each CPU (which could become preempted before reaching a
voluntary schedule point).

> Despite Nigel's initial belief that this technique is fragile, I
> believe it will become an increasingly fundamental method in the
> kernel, so (with documentation) it will become widely understood, as
> it offers scalability and efficiency.

Actually, I agree with you now that I've had a chance to think about
this some more.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write:
>
> Keith Owens writes:
> > Or have I missed something?
>
> Nope, it is a fundamental problem with such kernel pre-emption
> schemes.  As a result, it would also break our big-reader locks
> (see include/linux/brlock.h).

Good point: holding a brlock has to block preemption, as per spinlocks.

> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

When I audited the uses of smp_processor_id() for the hotplug cpu
stuff, there were surprisingly few calls to smp_processor_id(), and
most of these will be holding a brlock, so OK already.

Rusty.
--
Premature optmztion is rt of all evl. --DK
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write:
> Nigel Gamble wrote:
> >
> > On Wed, 21 Mar 2001, Keith Owens wrote:
> > > I misread the code, but the idea is still correct.  Add a preemption
> > > depth counter to each cpu, when you schedule and the depth is zero then
> > > you know that the cpu is no longer holding any references to quiesced
> > > structures.
> >
> > A task that has been preempted is on the run queue and can be
> > rescheduled on a different CPU, so I can't see how a per-CPU counter
> > would work.  It seems to me that you would need a per run queue
> > counter, like the example I gave in a previous posting.
>
> Exactly so.  The method does not depend on the sum of preemption being
> zip, but on each potential reader (writers take locks) passing thru a
> "sync point".  Your notion of waiting for each task to arrive
> "naturally" at schedule() would work.  It is, in fact, over kill as you
> could also add arrival at sys call exit as a (the) "sync point".  In

Power off is also a sync point 8).  But you want it to happen in
bounded time: consider a daemon which opens a device every minute and
never exits.

Nigel's "traverse the run queue and mark the preempted" solution is
actually pretty nice, and cheap.  Since the runqueue lock is grabbed,
it doesn't require icky atomic ops, either.

Despite Nigel's initial belief that this technique is fragile, I
believe it will become an increasingly fundamental method in the
kernel, so (with documentation) it will become widely understood, as
it offers scalability and efficiency.

Rusty.
--
Premature optmztion is rt of all evl. --DK
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001, Andrew Morton wrote:
> It's a problem for uniprocessors as well.
>
> Example:
>
> #define current_cpu_data boot_cpu_data
> #define pgd_quicklist (current_cpu_data.pgd_quick)
>
> extern __inline__ void free_pgd_fast(pgd_t *pgd)
> {
>         *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
>         pgd_quicklist = (unsigned long *) pgd;
>         pgtable_cache_size++;
> }
>
> Preemption could corrupt this list.

Thanks, Andrew, for pointing this out.  I've added fixes to the patch
for this problem and the others in pgalloc.h.  If you know of any other
similar problems on uniprocessors, please let me know.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001, David S. Miller wrote:
> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

Not necessarily.  Another solution for the smp_processor_id() case is
to ensure that the task can only be scheduled on the current CPU for
the duration that the value of smp_processor_id() is used.  Or, if the
critical region is very short, to disable interrupts on the local CPU.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/
MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, Mar 21, 2001 at 08:19:54PM +1100, Keith Owens wrote:
> Ouch.  What about all the per cpu structures in the kernel, how do you
> handle them if a preempted task can be rescheduled on another cpu?
>
>   int count[NR_CPUS], *p;
>   p = count+smp_processor_id();   /* start on cpu 0, &count[0] */
>   if (*p >= 1024) {
>     /* preempt here, reschedule on cpu 1 */
>     *p = 1;   /* update cpu 0 count from cpu 1, oops */
>   }
>
> Unless you find every use of a per cpu structure and wrap a spin lock
> around it, migrating a task from one cpu to another is going to be a
> source of wierd and wonderful errors.  Since the main use of per cpu
> structures is to avoid locks, adding a spin lock to every structure
> will be a killer.  Or have I missed something?

That's why Linus suggested it for UP only.

Andrea
Re: [PATCH for 2.5] preemptible kernel
george anzinger writes:
> By the by, if a preemption lock is all that is needed the patch defines
> it and it is rather fast (an inc going in and a dec & test coming
> out).  A lot faster than a spin lock with its "LOCK" access.  A preempt
> lock does not need to be "LOCK"ed because the only contender is the same
> cpu.

So we would have to invoke this thing around every set of
smp_processor_id() references?

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
"David S. Miller" wrote:
>
> Keith Owens writes:
> > Or have I missed something?
>
> Nope, it is a fundamental problem with such kernel pre-emption
> schemes.  As a result, it would also break our big-reader locks
> (see include/linux/brlock.h).
>
> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

It's a problem for uniprocessors as well.

Example:

#define current_cpu_data boot_cpu_data
#define pgd_quicklist (current_cpu_data.pgd_quick)

extern __inline__ void free_pgd_fast(pgd_t *pgd)
{
        *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
        pgd_quicklist = (unsigned long *) pgd;
        pgtable_cache_size++;
}

Preemption could corrupt this list.
Re: [PATCH for 2.5] preemptible kernel
Keith Owens writes:
> Or have I missed something?

Nope, it is a fundamental problem with such kernel pre-emption
schemes.  As a result, it would also break our big-reader locks
(see include/linux/brlock.h).

Basically, anything which uses smp_processor_id() would need to
be holding some lock so as to not get pre-empted.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote:
> A task that has been preempted is on the run queue and can be
> rescheduled on a different CPU, so I can't see how a per-CPU counter
> would work.  It seems to me that you would need a per run queue
> counter, like the example I gave in a previous posting.

Ouch.  What about all the per cpu structures in the kernel, how do you
handle them if a preempted task can be rescheduled on another cpu?

  int count[NR_CPUS], *p;
  p = count+smp_processor_id();   /* start on cpu 0, &count[0] */
  if (*p >= 1024) {
    /* preempt here, reschedule on cpu 1 */
    *p = 1;   /* update cpu 0 count from cpu 1, oops */
  }

Unless you find every use of a per cpu structure and wrap a spin lock
around it, migrating a task from one cpu to another is going to be a
source of weird and wonderful errors.  Since the main use of per cpu
structures is to avoid locks, adding a spin lock to every structure
will be a killer.  Or have I missed something?
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001 00:04:56 -0800, george anzinger <[EMAIL PROTECTED]> wrote: >Exactly so. The method does not depend on the sum of preemption being >zip, but on each potential reader (writers take locks) passing thru a >"sync point". Your notion of waiting for each task to arrive >"naturally" at schedule() would work. It is, in fact, over kill as you >could also add arrival at sys call exit as a (the) "sync point". In >fact, for module unload, isn't this the real "sync point"? After all, a >module can call schedule, or did I miss a usage counter somewhere? A module can call schedule but it must do MOD_INC_USE_COUNT first. Sleeping in module code without incrementing the module use count first is a shooting offence. It is so full of races that you may as well call it Daytona. >By the way, there is a paper on this somewhere on the web. Anyone >remember where? http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote: > > On Wed, 21 Mar 2001, Keith Owens wrote: > > I misread the code, but the idea is still correct. Add a preemption > > depth counter to each cpu, when you schedule and the depth is zero then > > you know that the cpu is no longer holding any references to quiesced > > structures. > > A task that has been preempted is on the run queue and can be > rescheduled on a different CPU, so I can't see how a per-CPU counter > would work. It seems to me that you would need a per run queue > counter, like the example I gave in a previous posting. Exactly so. The method does not depend on the sum of preemption being zip, but on each potential reader (writers take locks) passing thru a "sync point". Your notion of waiting for each task to arrive "naturally" at schedule() would work. It is, in fact, overkill, as you could also add arrival at sys call exit as a (the) "sync point". In fact, for module unload, isn't this the real "sync point"? After all, a module can call schedule, or did I miss a usage counter somewhere? By the way, there is a paper on this somewhere on the web. Anyone remember where? George
Re: [PATCH for 2.5] preemptible kernel
On Wed, Mar 21, 2001 at 08:19:54PM +1100, Keith Owens wrote:
> Ouch. What about all the per cpu structures in the kernel, how do you
> handle them if a preempted task can be rescheduled on another cpu?
>
>   int count[NR_CPUS], *p;
>   p = count + smp_processor_id();  /* start on cpu 0, count[0] */
>   if (*p >= 1024) {                /* preempt here, reschedule on cpu 1 */
>           *p = 1;                  /* update cpu 0 count from cpu 1, oops */
>   }
>
> Unless you find every use of a per cpu structure and wrap a spin lock
> around it, migrating a task from one cpu to another is going to be a
> source of weird and wonderful errors. Since the main use of per cpu
> structures is to avoid locks, adding a spin lock to every structure
> will be a killer. Or have I missed something?

That's why Linus suggested it for UP only. Andrea
Re: [PATCH for 2.5] preemptible kernel
george anzinger writes:
> By the by, if a preemption lock is all that is needed the patch
> defines it and it is rather fast (an inc going in and a dec test
> coming out). A lot faster than a spin lock with its "LOCK" access.
> A preempt lock does not need to be "LOCK"ed because the only
> contender is the same cpu.

So we would have to invoke this thing around every set of smp_processor_id() references? Later, David S. Miller [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001, Andrew Morton wrote:
> It's a problem for uniprocessors as well. Example:
>
> #define current_cpu_data boot_cpu_data
> #define pgd_quicklist (current_cpu_data.pgd_quick)
>
> extern __inline__ void free_pgd_fast(pgd_t *pgd)
> {
>         *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
>         pgd_quicklist = (unsigned long *) pgd;
>         pgtable_cache_size++;
> }
>
> Preemption could corrupt this list.

Thanks, Andrew, for pointing this out. I've added fixes to the patch for this problem and the others in pgalloc.h. If you know of any other similar problems on uniprocessors, please let me know. Nigel Gamble [EMAIL PROTECTED] Mountain View, CA, USA. http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001, David S. Miller wrote:
> Basically, anything which uses smp_processor_id() would need to
> be holding some lock so as to not get pre-empted.

Not necessarily. Another solution for the smp_processor_id() case is to ensure that the task can only be scheduled on the current CPU for the duration that the value of smp_processor_id() is used. Or, if the critical region is very short, to disable interrupts on the local CPU. Nigel Gamble [EMAIL PROTECTED] Mountain View, CA, USA. http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Wed, 21 Mar 2001, Keith Owens wrote: > I misread the code, but the idea is still correct. Add a preemption > depth counter to each cpu, when you schedule and the depth is zero then > you know that the cpu is no longer holding any references to quiesced > structures. A task that has been preempted is on the run queue and can be rescheduled on a different CPU, so I can't see how a per-CPU counter would work. It seems to me that you would need a per run queue counter, like the example I gave in a previous posting. Nigel Gamble [EMAIL PROTECTED] Mountain View, CA, USA. http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001 16:48:01 -0800 (PST), Nigel Gamble <[EMAIL PROTECTED]> wrote: >On Tue, 20 Mar 2001, Keith Owens wrote: >> The preemption patch only allows preemption from interrupt and only for >> a single level of preemption. That coexists quite happily with >> synchronize_kernel() which runs in user context. Just count user >> context schedules (preempt_count == 0), not preemptive schedules. > >If you're looking at preempt_schedule(), note the call to ctx_sw_off() >only increments current->preempt_count for the preempted task - the >higher priority preempting task that is about to be scheduled will have >a preempt_count of 0. I misread the code, but the idea is still correct. Add a preemption depth counter to each cpu, when you schedule and the depth is zero then you know that the cpu is no longer holding any references to quiesced structures. >So, to make sure I understand this, the code to free a node would look >like: > > prev->next = node->next; /* assumed to be atomic */ > synchronize_kernel(); > free(node); > >So that any other CPU concurrently traversing the list would see a >consistent state, either including or not including "node" before the >call to synchronize_kernel(); but after synchronize_kernel() all other >CPUs are guaranteed to see a list that no longer includes "node", so it >is now safe to free it. > >It looks like there are also implicit assumptions to this approach, like >no other CPU is trying to use the same approach simultaneously to free >"prev". Not quite. The idea is that readers can traverse lists without locks, code that changes the list needs to take a semaphore first.

  Read                        Update
  node = node->next;          down(&_sem);
                              prev->next = node->next;
                              synchronize_kernel();
                              free(node);
                              up(&_sem);

Because the readers have no locks, other cpus can have references to the node being freed. The updating code needs to wait until all other cpus have gone through at least one schedule to ensure that all dangling references have been flushed. 
Adding preemption complicates this slightly, we have to distinguish between the bottom level schedule and higher level schedules for preemptive code. Only when all preemptive code on a cpu has ended is it safe to say that there are no dangling references left on that cpu. This method is a win for high read, low update lists. Instead of penalizing the read code every time on the off chance that somebody will update the data, speed up the common code and penalize the update code. The classic example is module code, it is rarely unloaded but right now everything that *might* be entering a module has to grab the module spin lock and update the module use count. So main line code is being slowed down all the time.
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Keith Owens wrote: > The preemption patch only allows preemption from interrupt and only for > a single level of preemption. That coexists quite happily with > synchronize_kernel() which runs in user context. Just count user > context schedules (preempt_count == 0), not preemptive schedules. I'm not sure what you mean by "only for a single level of preemption." It's possible for a preempting process to be preempted itself by a higher priority process, and for that process to be preempted by an even higher priority one, limited only by the number of processes waiting for interrupt handlers to make them runnable. This isn't very likely in practice (kernel preemptions tend to be rare compared to normal calls to schedule()), but it could happen in theory. If you're looking at preempt_schedule(), note the call to ctx_sw_off() only increments current->preempt_count for the preempted task - the higher priority preempting task that is about to be scheduled will have a preempt_count of 0. Nigel Gamble [EMAIL PROTECTED] Mountain View, CA, USA. http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Rusty Russell wrote: > I can see three problems with this approach, only one of which > is serious. > > The first is that code which is already SMP unsafe is now a problem for > everyone, not just the 0.1% of SMP machines. I consider this a good > thing for 2.5 though. So do I. > The second is that there are "manual" locking schemes which are used > in several places in the kernel which rely on non-preemptability; > de-facto spinlocks if you will. I consider all these uses flawed: (1) > they are often subtly broken anyway, (2) they make reading those parts > of the code much harder, and (3) they break when things like this are > done. Likewise. > The third is that preemptibility conflicts with the naive > quiescent-period approach proposed for module unloading in 2.5, and > useful for several other things (eg. hotplugging CPUs). This method > relies on knowing that when a schedule() has occurred on every CPU, we > know no one is holding certain references. The simplest example is a > single linked list: you can traverse without a lock as long as you > don't sleep, and then someone can unlink a node, and wait for a > schedule on every other CPU before freeing it. The non-SMP case is a > noop. See synchronize_kernel() below. So, to make sure I understand this, the code to free a node would look like: prev->next = node->next; /* assumed to be atomic */ synchronize_kernel(); free(node); So that any other CPU concurrently traversing the list would see a consistent state, either including or not including "node" before the call to synchronize_kernel(); but after synchronize_kernel() all other CPUs are guaranteed to see a list that no longer includes "node", so it is now safe to free it. It looks like there are also implicit assumptions to this approach, like no other CPU is trying to use the same approach simultaneously to free "prev". 
So my initial reaction is that this approach is, like the manual locking schemes you commented on above, open to being subtly broken when people don't understand all the implicit assumptions and subsequently invalidate them. > This, too, is soluble, but it means that synchronize_kernel() must > guarantee that each task which was running or preempted in kernel > space when it was called, has been non-preemptively scheduled before > synchronize_kernel() can exit. Icky. Yes, you're right. > Thoughts? Perhaps synchronize_kernel() could take the run_queue lock, mark all the tasks on it and count them. Any task marked when it calls schedule() voluntarily (but not if it is preempted) is unmarked and the count decremented. synchronize_kernel() continues until the count is zero. As you said, "Icky." > /* We could keep a schedule count for each CPU and make idle tasks >schedule (some don't unless need_resched), but this scales quite well (eg. >64 processors, average time to wait for first schedule = >jiffie/64. Total time for all processors = jiffie/63 + jiffie/62... > >At 1024 cpus, this is about 7.5 jiffies. And that assumes no one >schedules early. --RR */ > void synchronize_kernel(void) > { > unsigned long cpus_allowed, policy, rt_priority; > > /* Save current state */ > cpus_allowed = current->cpus_allowed; > policy = current->policy; > rt_priority = current->rt_priority; > > /* Create an unreal time task. */ > current->policy = SCHED_FIFO; > current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO); > > /* Make us schedulable on all CPUs. */ > current->cpus_allowed = (1UL << smp_num_cpus) - 1; > > /* Eliminate current cpu, reschedule */ > while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0) > schedule(); > > /* Back to normal. */ > current->cpus_allowed = cpus_allowed; > current->policy = policy; > current->rt_priority = rt_priority; > } > Nigel Gamble [EMAIL PROTECTED] Mountain View, CA, USA. 
http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote: > > On Tue, 20 Mar 2001, Roger Larsson wrote: > > One little readability thing I found. > > The prev->state TASK_ value is mostly used as a plain value > > but the new TASK_PREEMPTED is or:ed together with whatever was there. > > Later when we switch to check the state it is checked against TASK_PREEMPTED > > only. Since TASK_RUNNING is 0 it works OK but... > > Yes, you're right. I had forgotten that TASK_RUNNING is 0 and I think I > was assuming that there could be (rare) cases where a task was preempted > while prev->state was in transition such that no other flags were set. > This is, of course, impossible given that TASK_RUNNING is 0. So your > change makes the common case more obvious (to me, at least!) > > > --- sched.c.nigel Tue Mar 20 18:52:43 2001 > > +++ sched.c.roger Tue Mar 20 19:03:28 2001 > > @@ -553,7 +553,7 @@ > > #endif > > del_from_runqueue(prev); > > #ifdef CONFIG_PREEMPT > > - case TASK_PREEMPTED: > > + case TASK_RUNNING | TASK_PREEMPTED: > > #endif > > case TASK_RUNNING: > > } > > > > > > We could add all/(other common) combinations as cases > > > > switch (prev->state) { > > case TASK_INTERRUPTIBLE: > > if (signal_pending(prev)) { > > prev->state = TASK_RUNNING; > > break; > > } > > default: > > #ifdef CONFIG_PREEMPT > > if (prev->state & TASK_PREEMPTED) > > break; > > #endif > > del_from_runqueue(prev); > > #ifdef CONFIG_PREEMPT > > case TASK_RUNNING | TASK_PREEMPTED: > > case TASK_INTERRUPTIBLE | TASK_PREEMPTED: > > case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED: > > #endif > > case TASK_RUNNING: > > } > > > > > > Then the break in default case could almost be replaced with a BUG()... 
> > (I have not checked the generated code) > > The other cases are not very common, as they only happen if a task is > preempted during the short time that it is running while in the process > of changing state while going to sleep or waking up, so the default case > is probably OK for them; and I'd be happier to leave the default case > for reliability reasons anyway. Especially since he forgot: TASK_ZOMBIE, TASK_STOPPED and TASK_SWAPPING. I don't know about the last two, but TASK_ZOMBIE must be handled correctly or the task will never clear. In general, a task must run till it gets to schedule() before the actual state is "real", hence the need for TASK_PREEMPTED. The actual code generated from what you propose should be the same (and would be even if TASK_RUNNING != 0, except for the constant). George
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Roger Larsson wrote: > One little readability thing I found. > The prev->state TASK_ value is mostly used as a plain value > but the new TASK_PREEMPTED is or:ed together with whatever was there. > Later when we switch to check the state it is checked against TASK_PREEMPTED > only. Since TASK_RUNNING is 0 it works OK but... Yes, you're right. I had forgotten that TASK_RUNNING is 0 and I think I was assuming that there could be (rare) cases where a task was preempted while prev->state was in transition such that no other flags were set. This is, of course, impossible given that TASK_RUNNING is 0. So your change makes the common case more obvious (to me, at least!) > --- sched.c.nigel Tue Mar 20 18:52:43 2001 > +++ sched.c.roger Tue Mar 20 19:03:28 2001 > @@ -553,7 +553,7 @@ > #endif > del_from_runqueue(prev); > #ifdef CONFIG_PREEMPT > - case TASK_PREEMPTED: > + case TASK_RUNNING | TASK_PREEMPTED: > #endif > case TASK_RUNNING: > } > > > We could add all/(other common) combinations as cases > > switch (prev->state) { > case TASK_INTERRUPTIBLE: > if (signal_pending(prev)) { > prev->state = TASK_RUNNING; > break; > } > default: > #ifdef CONFIG_PREEMPT > if (prev->state & TASK_PREEMPTED) > break; > #endif > del_from_runqueue(prev); > #ifdef CONFIG_PREEMPT > case TASK_RUNNING | TASK_PREEMPTED: > case TASK_INTERRUPTIBLE | TASK_PREEMPTED: > case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED: > #endif > case TASK_RUNNING: > } > > > Then the break in default case could almost be replaced with a BUG()... > (I have not checked the generated code) The other cases are not very common, as they only happen if a task is preempted during the short time that it is running while in the process of changing state while going to sleep or waking up, so the default case is probably OK for them; and I'd be happier to leave the default case for reliability reasons anyway. Nigel Gamble[EMAIL PROTECTED] Mountain View, CA, USA. 
http://www.nrg.org/ MontaVista Software [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
Hi,

One little readability thing I found. The prev->state TASK_ value is mostly used as a plain value but the new TASK_PREEMPTED is or:ed together with whatever was there. Later when we switch to check the state it is checked against TASK_PREEMPTED only. Since TASK_RUNNING is 0 it works OK but...

--- sched.c.nigel	Tue Mar 20 18:52:43 2001
+++ sched.c.roger	Tue Mar 20 19:03:28 2001
@@ -553,7 +553,7 @@
 #endif
 		del_from_runqueue(prev);
 #ifdef CONFIG_PREEMPT
-	case TASK_PREEMPTED:
+	case TASK_RUNNING | TASK_PREEMPTED:
 #endif
 	case TASK_RUNNING:
 	}

We could add all/(other common) combinations as cases

	switch (prev->state) {
	case TASK_INTERRUPTIBLE:
		if (signal_pending(prev)) {
			prev->state = TASK_RUNNING;
			break;
		}
	default:
#ifdef CONFIG_PREEMPT
		if (prev->state & TASK_PREEMPTED)
			break;
#endif
		del_from_runqueue(prev);
#ifdef CONFIG_PREEMPT
	case TASK_RUNNING | TASK_PREEMPTED:
	case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
	case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED:
#endif
	case TASK_RUNNING:
	}

Then the break in default case could almost be replaced with a BUG()... (I have not checked the generated code)

/RogerL
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001 19:43:50 +1100, Rusty Russell <[EMAIL PROTECTED]> wrote: >The third is that preemptibility conflicts with the naive >quiescent-period approach proposed for module unloading in 2.5, and >useful for several other things (eg. hotplugging CPUs). This method >relies on knowing that when a schedule() has occurred on every CPU, we >know no one is holding certain references. > >This, too, is soluble, but it means that synchronize_kernel() must >guarantee that each task which was running or preempted in kernel >space when it was called, has been non-preemptively scheduled before >synchronize_kernel() can exit. Icky. The preemption patch only allows preemption from interrupt and only for a single level of preemption. That coexists quite happily with synchronize_kernel() which runs in user context. Just count user context schedules (preempt_count == 0), not preemptive schedules.
Re: [PATCH for 2.5] preemptible kernel
In message <[EMAIL PROTECTED]> you write: > Kernel preemption is not allowed while spinlocks are held, which means > that this patch alone cannot guarantee low preemption latencies. But > as long held locks (in particular the BKL) are replaced by finer-grained > locks, this patch will enable lower latencies as the kernel also becomes > more scalable on large SMP systems. Hi Nigel, I can see three problems with this approach, only one of which is serious. The first is that code which is already SMP unsafe is now a problem for everyone, not just the 0.1% of SMP machines. I consider this a good thing for 2.5 though. The second is that there are "manual" locking schemes which are used in several places in the kernel which rely on non-preemptability; de-facto spinlocks if you will. I consider all these uses flawed: (1) they are often subtly broken anyway, (2) they make reading those parts of the code much harder, and (3) they break when things like this are done. The third is that preemptibility conflicts with the naive quiescent-period approach proposed for module unloading in 2.5, and useful for several other things (eg. hotplugging CPUs). This method relies on knowing that when a schedule() has occurred on every CPU, we know no one is holding certain references. The simplest example is a single linked list: you can traverse without a lock as long as you don't sleep, and then someone can unlink a node, and wait for a schedule on every other CPU before freeing it. The non-SMP case is a noop. See synchronize_kernel() below. This, too, is soluble, but it means that synchronize_kernel() must guarantee that each task which was running or preempted in kernel space when it was called, has been non-preemptively scheduled before synchronize_kernel() can exit. Icky. Thoughts? Rusty. -- Premature optmztion is rt of all evl. --DK /* We could keep a schedule count for each CPU and make idle tasks schedule (some don't unless need_resched), but this scales quite well (eg. 
64 processors, average time to wait for first schedule = jiffie/64. Total time for all processors = jiffie/63 + jiffie/62... At 1024 cpus, this is about 7.5 jiffies. And that assumes no one schedules early. --RR */

void synchronize_kernel(void)
{
	unsigned long cpus_allowed, policy, rt_priority;

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL << smp_num_cpus) - 1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1 << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;
}
Re: [PATCH for 2.5] preemptible kernel
In message [EMAIL PROTECTED] you write:

> Kernel preemption is not allowed while spinlocks are held, which means
> that this patch alone cannot guarantee low preemption latencies. But
> as long held locks (in particular the BKL) are replaced by
> finer-grained locks, this patch will enable lower latencies as the
> kernel also becomes more scalable on large SMP systems.

Hi Nigel,

I can see three problems with this approach, only one of which is serious.

The first is that code which is already SMP-unsafe is now a problem for everyone, not just the 0.1% of SMP machines. I consider this a good thing for 2.5, though.

The second is that there are "manual" locking schemes used in several places in the kernel which rely on non-preemptability; de facto spinlocks, if you will. I consider all these uses flawed: (1) they are often subtly broken anyway, (2) they make reading those parts of the code much harder, and (3) they break when things like this are done.

The third is that preemptibility conflicts with the naive quiescent-period approach proposed for module unloading in 2.5, which is also useful for several other things (eg. hotplugging CPUs). This method relies on the fact that once a schedule() has occurred on every CPU, we know no one is holding certain references. The simplest example is a singly linked list: you can traverse it without a lock as long as you don't sleep; then someone can unlink a node and wait for a schedule on every other CPU before freeing it. The non-SMP case is a noop. See synchronize_kernel() below.

This, too, is soluble, but it means that synchronize_kernel() must guarantee that each task which was running or preempted in kernel space when it was called has been non-preemptively scheduled before synchronize_kernel() can exit. Icky.

Thoughts?
Rusty.
--
Premature optmztion is rt of all evl. --DK

/* We could keep a schedule count for each CPU and make idle tasks
   schedule (some don't unless need_resched), but this scales quite
   well (eg. 64 processors, average time to wait for first schedule =
   jiffie/64.  Total time for all processors = jiffie/63 + jiffie/62...
   At 1024 cpus, this is about 7.5 jiffies.  And that assumes no one
   schedules early. --RR */
void synchronize_kernel(void)
{
	unsigned long cpus_allowed, policy, rt_priority;

	/* Save current state */
	cpus_allowed = current->cpus_allowed;
	policy = current->policy;
	rt_priority = current->rt_priority;

	/* Create an unreal time task. */
	current->policy = SCHED_FIFO;
	current->rt_priority = 1001 + sys_sched_get_priority_max(SCHED_FIFO);

	/* Make us schedulable on all CPUs. */
	current->cpus_allowed = (1UL << smp_num_cpus) - 1;

	/* Eliminate current cpu, reschedule */
	while ((current->cpus_allowed &= ~(1UL << smp_processor_id())) != 0)
		schedule();

	/* Back to normal. */
	current->cpus_allowed = cpus_allowed;
	current->policy = policy;
	current->rt_priority = rt_priority;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
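The scaling estimate in the comment above can be checked numerically: the total expected wait is a harmonic sum. Below is a back-of-the-envelope model, not kernel code; it only assumes that each remaining CPU schedules at a uniformly random point within the next jiffy.

```c
#include <assert.h>

/* If each remaining CPU schedules at a uniformly random point within
 * the next jiffy, the expected wait while k CPUs remain is roughly
 * jiffie/k, so the total expected wait is the harmonic number
 * H(ncpus) jiffies.  Illustrative model only. */
static double expected_wait_jiffies(int ncpus)
{
	double total = 0.0;
	int k;

	for (k = ncpus; k >= 1; k--)
		total += 1.0 / k;	/* jiffie/ncpus + ... + jiffie/1 */
	return total;
}
```

At 1024 CPUs this gives H(1024) ≈ 7.5 jiffies, matching the "about 7.5 jiffies" figure in the comment; at 64 CPUs it is about 4.7 jiffies.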
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001 19:43:50 +1100, Rusty Russell [EMAIL PROTECTED] wrote:

> The third is that preemptibility conflicts with the naive
> quiescent-period approach proposed for module unloading in 2.5, which
> is also useful for several other things (eg. hotplugging CPUs). This
> method relies on the fact that once a schedule() has occurred on every
> CPU, we know no one is holding certain references.
>
> This, too, is soluble, but it means that synchronize_kernel() must
> guarantee that each task which was running or preempted in kernel
> space when it was called has been non-preemptively scheduled before
> synchronize_kernel() can exit. Icky.

The preemption patch only allows preemption from interrupt and only for a single level of preemption. That coexists quite happily with synchronize_kernel(), which runs in user context. Just count user context schedules (preempt_count == 0), not preemptive schedules.
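Keith's counting rule can be sketched in a few lines. All names and values here are illustrative, not from the actual preemption patch:

```c
#include <assert.h>

#define NR_CPUS 4

/* Per-cpu count of voluntary (user-context) schedules -- a sketch of
 * Keith's suggestion, not code from the patch. */
static long voluntary_schedules[NR_CPUS];

/* Hook imagined at the top of schedule(): a schedule only marks a
 * quiescent state when it is not preemptive, i.e. when the task's
 * preempt_count is zero. */
static void note_schedule(int cpu, int preempt_count)
{
	if (preempt_count == 0)
		voluntary_schedules[cpu]++;
}
```

synchronize_kernel() would then wait until every CPU's counter has advanced past its value at entry, ignoring preemptive schedules entirely.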
Re: [PATCH for 2.5] preemptible kernel
Hi,

One little readability thing I found. The prev->state TASK_ value is mostly used as a plain value, but the new TASK_PREEMPTED is OR:ed together with whatever was there. Later, when we switch on the state, it is checked against TASK_PREEMPTED only. Since TASK_RUNNING is 0 it works OK, but...

--- sched.c.nigel	Tue Mar 20 18:52:43 2001
+++ sched.c.roger	Tue Mar 20 19:03:28 2001
@@ -553,7 +553,7 @@
 #endif
 		del_from_runqueue(prev);
 #ifdef CONFIG_PREEMPT
-	case TASK_PREEMPTED:
+	case TASK_RUNNING | TASK_PREEMPTED:
 #endif
 	case TASK_RUNNING:
 	}

We could add all/(other common) combinations as cases:

	switch (prev->state) {
	case TASK_INTERRUPTIBLE:
		if (signal_pending(prev)) {
			prev->state = TASK_RUNNING;
			break;
		}
	default:
#ifdef CONFIG_PREEMPT
		if (prev->state & TASK_PREEMPTED)
			break;
#endif
		del_from_runqueue(prev);
#ifdef CONFIG_PREEMPT
	case TASK_RUNNING | TASK_PREEMPTED:
	case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
	case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED:
#endif
	case TASK_RUNNING:
	}

Then the break in the default case could almost be replaced with a BUG()... (I have not checked the generated code.)

/RogerL
Re: [PATCH for 2.5] preemptible kernel
Nigel Gamble wrote:

> On Tue, 20 Mar 2001, Roger Larsson wrote:
> > One little readability thing I found. The prev->state TASK_ value is
> > mostly used as a plain value, but the new TASK_PREEMPTED is OR:ed
> > together with whatever was there. Later, when we switch on the
> > state, it is checked against TASK_PREEMPTED only. Since TASK_RUNNING
> > is 0 it works OK, but...
>
> Yes, you're right. I had forgotten that TASK_RUNNING is 0, and I think
> I was assuming that there could be (rare) cases where a task was
> preempted while prev->state was in transition such that no other flags
> were set. This is, of course, impossible given that TASK_RUNNING is 0.
> So your change makes the common case more obvious (to me, at least!)
>
> > We could add all/(other common) combinations as cases:
> >
> >	switch (prev->state) {
> >	case TASK_INTERRUPTIBLE:
> >		if (signal_pending(prev)) {
> >			prev->state = TASK_RUNNING;
> >			break;
> >		}
> >	default:
> > #ifdef CONFIG_PREEMPT
> >		if (prev->state & TASK_PREEMPTED)
> >			break;
> > #endif
> >		del_from_runqueue(prev);
> > #ifdef CONFIG_PREEMPT
> >	case TASK_RUNNING | TASK_PREEMPTED:
> >	case TASK_INTERRUPTIBLE | TASK_PREEMPTED:
> >	case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED:
> > #endif
> >	case TASK_RUNNING:
> >	}
> >
> > Then the break in the default case could almost be replaced with a
> > BUG()... (I have not checked the generated code.)
>
> The other cases are not very common, as they only happen if a task is
> preempted during the short time that it is running while in the
> process of changing state while going to sleep or waking up, so the
> default case is probably OK for them; and I'd be happier to leave the
> default case for reliability reasons anyway.

Especially since he forgot:

	TASK_ZOMBIE
	TASK_STOPPED
	TASK_SWAPPING

I don't know about the last two, but TASK_ZOMBIE must be handled correctly or the task will never be cleared.

In general, a task must run until it gets to schedule() before its actual state is "real", hence the need for TASK_PREEMPTED. The actual code generated with what you propose should be the same (even if TASK_RUNNING != 0, except for the constant).

George
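The point that TASK_RUNNING is 0 is easy to verify with flag values in the style of 2.4's sched.h. The exact constants below are illustrative (the real TASK_PREEMPTED bit is whatever the preemption patch picks); the only property that matters is TASK_RUNNING == 0:

```c
#include <assert.h>

/* Illustrative state values in the style of 2.4 sched.h; only the
 * TASK_RUNNING == 0 property is load-bearing for the argument. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define TASK_ZOMBIE           4
#define TASK_STOPPED          8
#define TASK_PREEMPTED       64	/* assumed bit added by the patch */
```

Because TASK_RUNNING is 0, the original `case TASK_PREEMPTED:` and Roger's more explicit `case TASK_RUNNING | TASK_PREEMPTED:` are the same case label; the combined value only differs from TASK_PREEMPTED when some non-running state bit was set at preemption time.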
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Rusty Russell wrote:

> I can see three problems with this approach, only one of which is
> serious. The first is that code which is already SMP-unsafe is now a
> problem for everyone, not just the 0.1% of SMP machines. I consider
> this a good thing for 2.5, though.

So do I.

> The second is that there are "manual" locking schemes used in several
> places in the kernel which rely on non-preemptability; de facto
> spinlocks, if you will. I consider all these uses flawed: (1) they are
> often subtly broken anyway, (2) they make reading those parts of the
> code much harder, and (3) they break when things like this are done.

Likewise.

> The third is that preemptibility conflicts with the naive
> quiescent-period approach proposed for module unloading in 2.5, which
> is also useful for several other things (eg. hotplugging CPUs). This
> method relies on the fact that once a schedule() has occurred on every
> CPU, we know no one is holding certain references. The simplest
> example is a singly linked list: you can traverse it without a lock as
> long as you don't sleep; then someone can unlink a node and wait for a
> schedule on every other CPU before freeing it. The non-SMP case is a
> noop. See synchronize_kernel() below.

So, to make sure I understand this, the code to free a node would look like:

	prev->next = node->next;	/* assumed to be atomic */
	synchronize_kernel();
	free(node);

So any other CPU concurrently traversing the list would see a consistent state, either including or not including "node", before the call to synchronize_kernel(); but after synchronize_kernel(), all other CPUs are guaranteed to see a list that no longer includes "node", so it is now safe to free it.

It looks like there are also implicit assumptions to this approach, like no other CPU trying to use the same approach simultaneously to free "prev". So my initial reaction is that this approach is, like the manual locking schemes you commented on above, open to being subtly broken when people don't understand all the implicit assumptions and subsequently invalidate them.

> This, too, is soluble, but it means that synchronize_kernel() must
> guarantee that each task which was running or preempted in kernel
> space when it was called has been non-preemptively scheduled before
> synchronize_kernel() can exit. Icky.

Yes, you're right.

> Thoughts?

Perhaps synchronize_kernel() could take the run_queue lock, mark all the tasks on it and count them. Any task marked when it calls schedule() voluntarily (but not if it is preempted) is unmarked and the count decremented. synchronize_kernel() continues until the count is zero. As you said, "Icky."

> [quoted synchronize_kernel() implementation snipped]

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/

MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001, Keith Owens wrote:

> The preemption patch only allows preemption from interrupt and only
> for a single level of preemption. That coexists quite happily with
> synchronize_kernel() which runs in user context. Just count user
> context schedules (preempt_count == 0), not preemptive schedules.

I'm not sure what you mean by "only for a single level of preemption." It's possible for a preempting process to be preempted itself by a higher priority process, and for that process to be preempted by an even higher priority one, limited only by the number of processes waiting for interrupt handlers to make them runnable. This isn't very likely in practice (kernel preemptions tend to be rare compared to normal calls to schedule()), but it could happen in theory.

If you're looking at preempt_schedule(), note that the call to ctx_sw_off() only increments current->preempt_count for the preempted task - the higher priority preempting task that is about to be scheduled will have a preempt_count of 0.

Nigel Gamble                    [EMAIL PROTECTED]
Mountain View, CA, USA.         http://www.nrg.org/

MontaVista Software             [EMAIL PROTECTED]
Re: [PATCH for 2.5] preemptible kernel
On Tue, 20 Mar 2001 16:48:01 -0800 (PST), Nigel Gamble [EMAIL PROTECTED] wrote:

> On Tue, 20 Mar 2001, Keith Owens wrote:
> > The preemption patch only allows preemption from interrupt and only
> > for a single level of preemption. That coexists quite happily with
> > synchronize_kernel() which runs in user context. Just count user
> > context schedules (preempt_count == 0), not preemptive schedules.
>
> If you're looking at preempt_schedule(), note that the call to
> ctx_sw_off() only increments current->preempt_count for the preempted
> task - the higher priority preempting task that is about to be
> scheduled will have a preempt_count of 0.

I misread the code, but the idea is still correct. Add a preemption depth counter to each cpu; when you schedule and the depth is zero, then you know that the cpu is no longer holding any references to quiesced structures.

> So, to make sure I understand this, the code to free a node would look
> like:
>
>	prev->next = node->next;	/* assumed to be atomic */
>	synchronize_kernel();
>	free(node);
>
> So any other CPU concurrently traversing the list would see a
> consistent state, either including or not including "node", before the
> call to synchronize_kernel(); but after synchronize_kernel(), all
> other CPUs are guaranteed to see a list that no longer includes
> "node", so it is now safe to free it. It looks like there are also
> implicit assumptions to this approach, like no other CPU trying to use
> the same approach simultaneously to free "prev".

Not quite. The idea is that readers can traverse lists without locks; code that changes the list needs to take a semaphore first.

	Read				Update
	node = node->next;		down(&list_sem);
					prev->next = node->next;
					synchronize_kernel();
					free(node);
					up(&list_sem);

Because the readers have no locks, other cpus can have references to the node being freed. The updating code needs to wait until all other cpus have gone through at least one schedule to ensure that all dangling references have been flushed.

Adding preemption complicates this slightly: we have to distinguish between the bottom-level schedule and higher-level schedules for preemptive code. Only when all preemptive code on a cpu has ended is it safe to say that there are no dangling references left on that cpu.

This method is a win for high-read, low-update lists. Instead of penalizing the read code every time on the off chance that somebody will update the data, we speed up the common code and penalize the update code. The classic example is module code: it is rarely unloaded, but right now everything that *might* be entering a module has to grab the module spin lock and update the module use count, so main line code is being slowed down all the time.
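The read/update discipline described above can be sketched as a toy, single-threaded C program. synchronize_kernel() is stubbed out here, the semaphore is assumed held by the caller, and every other name is invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>

struct node { int val; struct node *next; };

/* Stub: in the real proposal this waits until every cpu has passed
 * through a voluntary schedule(), so no reader still holds a
 * reference obtained before the unlink. */
static void synchronize_kernel(void) { }

/* Updater side (caller holds the list semaphore).  The unlink is a
 * single pointer store, so a lockless reader sees either the old or
 * the new list, never a half-updated one. */
static void unlink_and_free(struct node *prev, struct node *node)
{
	prev->next = node->next;  /* new readers can't reach node... */
	synchronize_kernel();     /* ...and old readers have drained */
	free(node);
}

/* Tiny demo: build 1 -> 2 -> 3, unlink and free the middle node,
 * return the value now following the head. */
static int demo(void)
{
	struct node c = { 3, NULL };
	struct node *b = malloc(sizeof *b);
	struct node a = { 1, b };

	b->val = 2;
	b->next = &c;
	unlink_and_free(&a, b);
	return a.next->val;	/* list is now 1 -> 3 */
}
```

The demo is sequential, so the grace period is trivially satisfied; the point it illustrates is only the ordering of unlink, wait, and free on the update side.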