Re: No 100 HZ timer!
On Mon, 23 Apr 2001, Ulrich Windl wrote: > IMHO the POSIX is doable to comply with POSIX. Probably not what many > of the RT freaks expect, but doable. I'm tuning the nanoseconds for a > while now... > > Ulrich thanks for calling me a freak :) yes, linux could have posix timers cobbled on today and comply with the POSIX spec. but, we would like to make it better, not just feature complete. 10ms timer resolutions, while completely in compliance w/ the posix spec, are completely useless for "RT freaks" remember that 95% of all computers in the world are embedded and of those most nead at least "soft" RT behavior. seems like a good growth market for linux to me. when a PPC G4 is capable of 30 nanosecond resolution, why would you want to settle for 10 millisecond (30 billionths of a second vs 10 thousandths for those of you who haven't had to familiarize yourselves with sub-second time units) > > On 17 Apr 2001, at 11:53, george anzinger wrote: > > > I was thinking that it might be good to remove the POSIX API for the > > kernel and allow a somewhat simplified interface. For example, the user > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
IMHO the POSIX is doable to comply with POSIX. Probably not what many of the RT freaks expect, but doable. I'm tuning the nanoseconds for a while now... Ulrich On 17 Apr 2001, at 11:53, george anzinger wrote: > I was thinking that it might be good to remove the POSIX API for the > kernel and allow a somewhat simplified interface. For example, the user - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
IMHO the POSIX is doable to comply with POSIX. Probably not what many of the RT freaks expect, but doable. I'm tuning the nanoseconds for a while now... Ulrich On 17 Apr 2001, at 11:53, george anzinger wrote: I was thinking that it might be good to remove the POSIX API for the kernel and allow a somewhat simplified interface. For example, the user - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: > > > a.) list insertion of an arbitrary timer, > > should be O(log(n)) at worst > > > > > b.) removal of canceled and expired timers, and > > easy to make O(1) > > I thought this was true also, but the priority heap structure that has > been discussed here has a O(log(n)) removal time. Several priority queue structures support removal in O(1) time. Perhaps you are thinking of the classic array-based heap, for which removal is O(log n) in the general case. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mark Salisbury wrote: > > > Functional Specification for the high-res-timers project. > > > > In addition we expect that we will provide a high resolution timer for > > kernel use (heck, we may provide several). > > what we do here determines what we can do for the user.. I was thinking that it might be good to remove the POSIX API for the kernel and allow a somewhat simplified interface. For example, the user gets to resolution by specifying a CLOCK, where we MAY want to allow the kernel call to directly specify the resolution. This has already been suggested. I suppose you could say the functional spec should define the kernel PI (KPI?) as well as the user API, but that is a bit fuzzy at this time as I think it will depend on how we actually code the user functionality. Another example: I am leaning toward using a two word uptime composed of jiffies (i.e. 1/HZ since boot) and a machine dependent sub jiffie unit. Each ARCH would then define this unit as well as the conversion routines to move it back and forth to nanoseconds, microseconds, and 1/HZ. I think this format would play well in task time accounting, as well as in the timer management. For calls to something like delay(), however, it sucks. I think these calls want a single word, possibly microsecond, time specification. This gives a 16 or 17 minutes max and 1 microsecond min. which probably covers 99.99% of all kernel delay needs. Another kernel internal interface should allow the user specified structures (timespec and timeval). The notion is to put all the conversion routines in the timer code to maintain the specified resolution, and (more importantly), to avoid conversion to a format that just needs an additional conversion. To summarize, I think there is a need for two classes of timer interfaces in the kernel: a.) For drivers and others that need "delay()" sorts of things, and b.) For system calls that handle user specified times. > > > We will provide several "clocks" as defined by the standard. In > > particular, the following capabilities will be attached to some clock, > > regardless of the actual clock "name" we end up using: > > > > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as > > linux today). > > > > CLOCK_HIGH_RES a wall clock supporting timers with the highest > > resolution the hardware supports. > > > > CLOCK_1US a wall clock supporting timers with 1 micro second resolution > > (if the hardware allows it). > > > > CLOCK_UPTIME a clock that give the system up time. (Does this clock > > need to support timers?) > > > > CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. > > > > Too many clocks. we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but > the others are just fluff. we should have 1 single clock mechanism for the > whole system with it's resolution and tick/tickless characteristics determined > at compile time. I think you already have let the nose of the camel into the tent :) Here is what I am thinking: Suppose an array of structures of type clock. Clock_id is an index into this array. Here is what is in the structure: struct clock{ int resolution; int *gettime(); int *settime(); int *convert_to_uptime(); int *convert_from_uptime(); ; }; Now the difference between CLOCK_UPTIME and CLOCK_REALTIME is surly in the gettime/settime and possibly in the resolution. But the difference between CLOCK_REALTIME and CLOCK_1US, CLOCK_HIGH_RES, CLOCK_10MS is JUST the resolution! In other words, all they cost is the table entry. Note that CLOCK_GMT, CLOCK_UST, and CLOCK_GPS, etc. all fit nicely into this same structure. We should also provide a way to "register" a new clock so the user can easily configure in additional clocks. There are ways of doing this that are really easy to use, e.g. the module_init() macro. > > also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a > convenience/compliance offset. If you mean by "true" that this clock can not be set, starts at 0 at boot time and can only be affected by rate adjustments to get it to pass a real second every second, I agree. > > > At the same time we will NOT support the following clocks: > > > > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or > > wall) of a given task. > > > > CLOCK_PROFILE a clock used to generate profiling events. > > > > CLOCK_??? any clock keyed to a task. > > we could do some KOOL things here but they are more related to process time > accounting and should be dealt with in that context and as a separate project. > > however our design should take these concepts into account and allow for easy > integration of these types of functionality. I agree. > > > > > (Note that this does not mean that the clock system will not support the > > virtual and profile clocks, but that they will not be accessible thru > > the POSIX timer interface.) > > I think that should sombody choose
Re: No 100 HZ timer!
> Functional Specification for the high-res-timers project. > > In addition we expect that we will provide a high resolution timer for > kernel use (heck, we may provide several). what we do here determines what we can do for the user.. > We will provide several "clocks" as defined by the standard. In > particular, the following capabilities will be attached to some clock, > regardless of the actual clock "name" we end up using: > > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as > linux today). > > CLOCK_HIGH_RES a wall clock supporting timers with the highest > resolution the hardware supports. > > CLOCK_1US a wall clock supporting timers with 1 micro second resolution > (if the hardware allows it). > > CLOCK_UPTIME a clock that give the system up time. (Does this clock > need to support timers?) > > CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. > Too many clocks. we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but the others are just fluff. we should have 1 single clock mechanism for the whole system with it's resolution and tick/tickless characteristics determined at compile time. also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a convenience/compliance offset. > At the same time we will NOT support the following clocks: > > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or > wall) of a given task. > > CLOCK_PROFILE a clock used to generate profiling events. > > CLOCK_??? any clock keyed to a task. we could do some KOOL things here but they are more related to process time accounting and should be dealt with in that context and as a separate project. however our design should take these concepts into account and allow for easy integration of these types of functionality. > > (Note that this does not mean that the clock system will not support the > virtual and profile clocks, but that they will not be accessible thru > the POSIX timer interface.) I think that should sombody choose to implement them, they should be available at least through the clock_xxx interfaces.. > > It would be NICE if we could provide a way to hook other time support > into the system. In particular a > > CLOCK_WWV or CLOCK_GPS > CLOCK_NNTP > might be nice. The problem with these sorts of clocks is that they > imply an array of function pointers for each clock and function pointers > slow the code down because of their non predictability. Never the less, > we will attempt to allow easy expansion in this direction. > > Implications on the current kernel: > > The high resolution timers will require a fast clock access with the > maximum supported resolution in order to convert relative times to > absolute times. This same fast clock will be used to support the > various user and system time requests. > > There are two ways to provide timers to the kernel. For lack of a > better term we will refer to them as "ticked" and "tick less". Until we > have performance information that implies that one or the other of these > methods is better in all cases we will provide both ticked and tick less > systems. The variety to be used will be selected at configure time. > > For tick less systems we will need to provide code to collect execution > times. For the ticked system the current method of collection these > times will be used. This project will NOT attempt to improve the > resolution of these timers, however, the high speed, high resolution > access to the current time will allow others to augment the system in > this area. huh? why not? > > For the tick less system the project will also provide a time slice > expiration interrupt. > > The timer list(s) (all pending timers) need to be organized so that the > following operations are fast: > > a.) list insertion of an arbitrary timer, should be O(log(n)) at worst > b.) removal of canceled and expired timers, and easy to make O(1) > c.) finding the timer for "NOW" and its immediate followers. head of list or top of tree or top of heap or whatever, O(1) > Times in the timer list will be absolute and related to system up time. > These times will be converted to wall time as needed. and ONLY when needed by users > > The POSIX interface provides for "absolute" timers relative to a given do you mean "relative" timers? > clock. When these timers are related to a "wall" clock they will need > adjusting when the wall clock time is adjusted. These adjustments are > done for "leap seconds" and the date command. (Note, we are keeping > timers relative to "uptime" which can not be adjusted. This means that > relative timers and absolute timers related to CLOCK_UPTIME are not > affected by the above wall time adjustments.) absolute timers will automatically fall out when you adjust CLOCK_UPTIME... i.e. an absolute time is an absolute time and since CLOCK_UPTIME is the ultimate arbiter of what absolute time it is, adjusting
Re: No 100 HZ timer!
On Mon, 16 Apr 2001, you wrote: > Mark Salisbury wrote: > > > > > Given a system speed, there is a repeating timer rate which will consume > > > 100% of the system in handling the timer interrupts. An attempt will > > > be made to detect this rate and adjust the timer to prevent system > > > lockup. This adjustment will look like timer overruns to the user > > > (i.e. we will take a percent of the interrupts and record the untaken > > > interrupts as overruns) > > > > just at first blush, there are some things in general but I need to read > > this again and more closely > > > > but, with POSIX timers, there is a nifty little restriction/protection built > > into the spec regarding the re-insertion of short interval repeating timers. > > that is: a repeating timer will not be re-inserted until AFTER the > > associated signal handler has been handled. > > Actually what it says is: "Only a single signal shall be queued to the > process for a given timer at any point in time. When a timer for which > a signal is still pending expires, no signal shall be queued, and a > timer overrun shall occur." > > It then goes on to talk about the overrun count and how it is to be > managed. > I guess I was confusing what the spec said with the way in which I chose to comply with the spec. I calculate overruns (I know when a timer went off, and how many of its intervals have passed since it went off by by time_since_expire/interval) and don't reinsert until return from the signal handler. this prevents a flood of high frequency interrupts inherently and allows processing of the signal handlers to proceed cleanly. -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Mon, 16 Apr 2001, you wrote: Mark Salisbury wrote: Given a system speed, there is a repeating timer rate which will consume 100% of the system in handling the timer interrupts. An attempt will be made to detect this rate and adjust the timer to prevent system lockup. This adjustment will look like timer overruns to the user (i.e. we will take a percent of the interrupts and record the untaken interrupts as overruns) just at first blush, there are some things in general but I need to read this again and more closely but, with POSIX timers, there is a nifty little restriction/protection built into the spec regarding the re-insertion of short interval repeating timers. that is: a repeating timer will not be re-inserted until AFTER the associated signal handler has been handled. Actually what it says is: "Only a single signal shall be queued to the process for a given timer at any point in time. When a timer for which a signal is still pending expires, no signal shall be queued, and a timer overrun shall occur." It then goes on to talk about the overrun count and how it is to be managed. I guess I was confusing what the spec said with the way in which I chose to comply with the spec. I calculate overruns (I know when a timer went off, and how many of its intervals have passed since it went off by by time_since_expire/interval) and don't reinsert until return from the signal handler. this prevents a flood of high frequency interrupts inherently and allows processing of the signal handlers to proceed cleanly. -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Functional Specification for the high-res-timers project. In addition we expect that we will provide a high resolution timer for kernel use (heck, we may provide several). what we do here determines what we can do for the user.. We will provide several "clocks" as defined by the standard. In particular, the following capabilities will be attached to some clock, regardless of the actual clock "name" we end up using: CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). CLOCK_HIGH_RES a wall clock supporting timers with the highest resolution the hardware supports. CLOCK_1US a wall clock supporting timers with 1 micro second resolution (if the hardware allows it). CLOCK_UPTIME a clock that give the system up time. (Does this clock need to support timers?) CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. Too many clocks. we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but the others are just fluff. we should have 1 single clock mechanism for the whole system with it's resolution and tick/tickless characteristics determined at compile time. also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a convenience/compliance offset. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. CLOCK_PROFILE a clock used to generate profiling events. CLOCK_??? any clock keyed to a task. we could do some KOOL things here but they are more related to process time accounting and should be dealt with in that context and as a separate project. however our design should take these concepts into account and allow for easy integration of these types of functionality. (Note that this does not mean that the clock system will not support the virtual and profile clocks, but that they will not be accessible thru the POSIX timer interface.) I think that should sombody choose to implement them, they should be available at least through the clock_xxx interfaces.. It would be NICE if we could provide a way to hook other time support into the system. In particular a CLOCK_WWV or CLOCK_GPS CLOCK_NNTP might be nice. The problem with these sorts of clocks is that they imply an array of function pointers for each clock and function pointers slow the code down because of their non predictability. Never the less, we will attempt to allow easy expansion in this direction. Implications on the current kernel: The high resolution timers will require a fast clock access with the maximum supported resolution in order to convert relative times to absolute times. This same fast clock will be used to support the various user and system time requests. There are two ways to provide timers to the kernel. For lack of a better term we will refer to them as "ticked" and "tick less". Until we have performance information that implies that one or the other of these methods is better in all cases we will provide both ticked and tick less systems. The variety to be used will be selected at configure time. For tick less systems we will need to provide code to collect execution times. For the ticked system the current method of collection these times will be used. This project will NOT attempt to improve the resolution of these timers, however, the high speed, high resolution access to the current time will allow others to augment the system in this area. huh? why not? For the tick less system the project will also provide a time slice expiration interrupt. The timer list(s) (all pending timers) need to be organized so that the following operations are fast: a.) list insertion of an arbitrary timer, should be O(log(n)) at worst b.) removal of canceled and expired timers, and easy to make O(1) c.) finding the timer for "NOW" and its immediate followers. head of list or top of tree or top of heap or whatever, O(1) Times in the timer list will be absolute and related to system up time. These times will be converted to wall time as needed. and ONLY when needed by users The POSIX interface provides for "absolute" timers relative to a given do you mean "relative" timers? clock. When these timers are related to a "wall" clock they will need adjusting when the wall clock time is adjusted. These adjustments are done for "leap seconds" and the date command. (Note, we are keeping timers relative to "uptime" which can not be adjusted. This means that relative timers and absolute timers related to CLOCK_UPTIME are not affected by the above wall time adjustments.) absolute timers will automatically fall out when you adjust CLOCK_UPTIME... i.e. an absolute time is an absolute time and since CLOCK_UPTIME is the ultimate arbiter of what absolute time it is, adjusting CLOCK_UPTIME will cause the absolute times in the timer list to be handled properly without
Re: No 100 HZ timer!
Mark Salisbury wrote: Functional Specification for the high-res-timers project. In addition we expect that we will provide a high resolution timer for kernel use (heck, we may provide several). what we do here determines what we can do for the user.. I was thinking that it might be good to remove the POSIX API for the kernel and allow a somewhat simplified interface. For example, the user gets to resolution by specifying a CLOCK, where we MAY want to allow the kernel call to directly specify the resolution. This has already been suggested. I suppose you could say the functional spec should define the kernel PI (KPI?) as well as the user API, but that is a bit fuzzy at this time as I think it will depend on how we actually code the user functionality. Another example: I am leaning toward using a two word uptime composed of jiffies (i.e. 1/HZ since boot) and a machine dependent sub jiffie unit. Each ARCH would then define this unit as well as the conversion routines to move it back and forth to nanoseconds, microseconds, and 1/HZ. I think this format would play well in task time accounting, as well as in the timer management. For calls to something like delay(), however, it sucks. I think these calls want a single word, possibly microsecond, time specification. This gives a 16 or 17 minutes max and 1 microsecond min. which probably covers 99.99% of all kernel delay needs. Another kernel internal interface should allow the user specified structures (timespec and timeval). The notion is to put all the conversion routines in the timer code to maintain the specified resolution, and (more importantly), to avoid conversion to a format that just needs an additional conversion. To summarize, I think there is a need for two classes of timer interfaces in the kernel: a.) For drivers and others that need "delay()" sorts of things, and b.) For system calls that handle user specified times. We will provide several "clocks" as defined by the standard. In particular, the following capabilities will be attached to some clock, regardless of the actual clock "name" we end up using: CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). CLOCK_HIGH_RES a wall clock supporting timers with the highest resolution the hardware supports. CLOCK_1US a wall clock supporting timers with 1 micro second resolution (if the hardware allows it). CLOCK_UPTIME a clock that give the system up time. (Does this clock need to support timers?) CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. Too many clocks. we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but the others are just fluff. we should have 1 single clock mechanism for the whole system with it's resolution and tick/tickless characteristics determined at compile time. I think you already have let the nose of the camel into the tent :) Here is what I am thinking: Suppose an array of structures of type clock. Clock_id is an index into this array. Here is what is in the structure: struct clock{ int resolution; int *gettime(); int *settime(); int *convert_to_uptime(); int *convert_from_uptime(); room for more bright ideas; }; Now the difference between CLOCK_UPTIME and CLOCK_REALTIME is surly in the gettime/settime and possibly in the resolution. But the difference between CLOCK_REALTIME and CLOCK_1US, CLOCK_HIGH_RES, CLOCK_10MS is JUST the resolution! In other words, all they cost is the table entry. Note that CLOCK_GMT, CLOCK_UST, and CLOCK_GPS, etc. all fit nicely into this same structure. We should also provide a way to "register" a new clock so the user can easily configure in additional clocks. There are ways of doing this that are really easy to use, e.g. the module_init() macro. also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a convenience/compliance offset. If you mean by "true" that this clock can not be set, starts at 0 at boot time and can only be affected by rate adjustments to get it to pass a real second every second, I agree. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. CLOCK_PROFILE a clock used to generate profiling events. CLOCK_??? any clock keyed to a task. we could do some KOOL things here but they are more related to process time accounting and should be dealt with in that context and as a separate project. however our design should take these concepts into account and allow for easy integration of these types of functionality. I agree. (Note that this does not mean that the clock system will not support the virtual and profile clocks, but that they will not be accessible thru the POSIX timer interface.) I think that should sombody choose to implement them, they should be available at least through
Re: No 100 HZ timer!
george anzinger wrote: a.) list insertion of an arbitrary timer, should be O(log(n)) at worst b.) removal of canceled and expired timers, and easy to make O(1) I thought this was true also, but the priority heap structure that has been discussed here has a O(log(n)) removal time. Several priority queue structures support removal in O(1) time. Perhaps you are thinking of the classic array-based heap, for which removal is O(log n) in the general case. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mark Salisbury wrote: > > > Given a system speed, there is a repeating timer rate which will consume > > 100% of the system in handling the timer interrupts. An attempt will > > be made to detect this rate and adjust the timer to prevent system > > lockup. This adjustment will look like timer overruns to the user > > (i.e. we will take a percent of the interrupts and record the untaken > > interrupts as overruns) > > just at first blush, there are some things in general but I need to read > this again and more closely > > but, with POSIX timers, there is a nifty little restriction/protection built > into the spec regarding the re-insertion of short interval repeating timers. > that is: a repeating timer will not be re-inserted until AFTER the > associated signal handler has been handled. Actually what it says is: "Only a single signal shall be queued to the process for a given timer at any point in time. When a timer for which a signal is still pending expires, no signal shall be queued, and a timer overrun shall occur." It then goes on to talk about the overrun count and how it is to be managed. What I am suggesting is that the system should detect when these interrupts would come so fast as to stall the system and just set up a percent of them while bumping the overrun count as if they had all occured. George > > this has some interesting consequences for signal handling and signal > delivery implementations, but importantly, it ensures that even a flood of > POSIX timers with very short repeat intervals will be handled cleanly. > > I will get more detailed comments to you tomorrow. > > Mark Salisbury > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
> Given a system speed, there is a repeating timer rate which will consume > 100% of the system in handling the timer interrupts. An attempt will > be made to detect this rate and adjust the timer to prevent system > lockup. This adjustment will look like timer overruns to the user > (i.e. we will take a percent of the interrupts and record the untaken > interrupts as overruns) just at first blush, there are some things in general but I need to read this again and more closely but, with POSIX timers, there is a nifty little restriction/protection built into the spec regarding the re-insertion of short interval repeating timers. that is: a repeating timer will not be re-inserted until AFTER the associated signal handler has been handled. this has some interesting consequences for signal handling and signal delivery implementations, but importantly, it ensures that even a flood of POSIX timers with very short repeat intervals will be handled cleanly. I will get more detailed comments to you tomorrow. Mark Salisbury - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
"Albert D. Cahalan" wrote: > > > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as > > linux today). > > Except on the Alpha, and on some ARM systems, etc. > The HZ constant varies from 10 to 1200. I suspect we will want to use 10 ms resolution for a clock named CLOCK_10MS :) On the other hand we could have a CLOCK_1_OVER_HZ... Actually with high-res-timers the actual HZ value becomes a bit less important. It would be "nice" to keep 1/HZ == jiffie, however. > > > At the same time we will NOT support the following clocks: > > > > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or > > wall) of a given task. > ... > > For tick less systems we will need to provide code to collect execution > > times. For the ticked system the current method of collection these > > times will be used. This project will NOT attempt to improve the > > resolution of these timers, however, the high speed, high resolution > > access to the current time will allow others to augment the system in > > this area. > ... > > This project will NOT provide higher resolution accounting (i.e. user > > and system execution times). > > It is nice to have accurate per-process user/system accounting. > Since you'd be touching the code anyway... Yeah sure... and will you pick up the ball on all the platform dependent code to get high-res-timers on all the other platforms? On second thought I am reminded of the corollary to the old saw: "The squeaking wheel get the grease." Which is: "He who complains most about the squeaking gets to do the greasing." Hint hint. > > > The POSIX interface provides for "absolute" timers relative to a given > > clock. When these timers are related to a "wall" clock they will need > > adjusting when the wall clock time is adjusted. These adjustments are > > done for "leap seconds" and the date command. > > This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely > defined POSIX time that is none of the above. This is a horrid mess, > even ignoring gravity and speed. :-) > > Can a second be 2 billion nanoseconds? > Can a nanosecond be twice as long as normal? > Can a second appear twice, with the nanoseconds getting reset? > Can a second never appear at all? > Can you compute times more than 6 months into the future? > How far does time deviate from solar time? Is this constrained? > > If you deal with leap seconds, you have to have a table of them. > This table grows with time, with adjustments being made with only > about 6 months notice. So the user upgrades after a year or two, > and the installer discovers that the user has been running a > system that is unaware of the most recent leap second. Arrrgh. > > Sure you want to touch this? The Austin group argued over it for > a very long time and never did find a really good solution. > Maybe you should just keep the code simple and fast, without any > concern for clock adjustments. There is a relatively simple way to handle this, at least from the high-res-timers point of view. First we convert all timers to absolute uptime. This is a nice well behaved time. At boot time we peg the wall clock to the uptime. Then at any given time, wall time is boot wall time + uptime. Then date, leap seconds, etc. affect the pegged value of boot wall time. Using the POSIX CLOCK id we allow the user to ask for either version of time. Now if we define an array of struc clock_id which contains pointers to such things as functions to return time, any algorithm you might want can be plugged in to bend time as needed. The only fly in this mess is the NTP rate adjustment stuff. This code is supposed to allow the system to adjust its ticker to produce accurate seconds and so gets at the root of the uptime counter be it in hardware or software or some combination of the two. But then that's what makes life interesting :) > > > In either a ticked or tick less system, it is expected that resolutions > > higher than 1/HZ will come with some additional overhead. For this > > reason, the CLOCK resolution will be used to round up times for each > > timer. When the CLOCK provides 1/HZ (or coarser) resolution, the > > project will attempt to meet or exceed the current systems timer > > performance. > > Within the kernel at least, it would be good to let drivers specify > desired resolution. Then a near-by value could be selected, perhaps > with some consideration for event type. (for cache reasons) This could be done, however, I would prefer to provide CLOCK_s to do this as per the standard. What does the community say? In either case you get different resolutions, but in the latter case the possible values are fixed at least at configure time. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as > linux today). Except on the Alpha, and on some ARM systems, etc. The HZ constant varies from 10 to 1200. > At the same time we will NOT support the following clocks: > > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or > wall) of a given task. ... > For tick less systems we will need to provide code to collect execution > times. For the ticked system the current method of collection these > times will be used. This project will NOT attempt to improve the > resolution of these timers, however, the high speed, high resolution > access to the current time will allow others to augment the system in > this area. ... > This project will NOT provide higher resolution accounting (i.e. user > and system execution times). It is nice to have accurate per-process user/system accounting. Since you'd be touching the code anyway... > The POSIX interface provides for "absolute" timers relative to a given > clock. When these timers are related to a "wall" clock they will need > adjusting when the wall clock time is adjusted. These adjustments are > done for "leap seconds" and the date command. This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely defined POSIX time that is none of the above. This is a horrid mess, even ignoring gravity and speed. :-) Can a second be 2 billion nanoseconds? Can a nanosecond be twice as long as normal? Can a second appear twice, with the nanoseconds getting reset? Can a second never appear at all? Can you compute times more than 6 months into the future? How far does time deviate from solar time? Is this constrained? If you deal with leap seconds, you have to have a table of them. This table grows with time, with adjustments being made with only about 6 months notice. So the user upgrades after a year or two, and the installer discovers that the user has been running a system that is unaware of the most recent leap second. Arrrgh. Sure you want to touch this? The Austin group argued over it for a very long time and never did find a really good solution. Maybe you should just keep the code simple and fast, without any concern for clock adjustments. > In either a ticked or tick less system, it is expected that resolutions > higher than 1/HZ will come with some additional overhead. For this > reason, the CLOCK resolution will be used to round up times for each > timer. When the CLOCK provides 1/HZ (or coarser) resolution, the > project will attempt to meet or exceed the current systems timer > performance. Within the kernel at least, it would be good to let drivers specify desired resolution. Then a near-by value could be selected, perhaps with some consideration for event type. (for cache reasons) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mark Salisbury wrote: > > all this talk about which data structure to use and how to allocate memory is > wy premature. > > there needs to be a clear definition of the requirements that we wish to meet, > including whether we are going to do ticked, tickless, or both > > a func spec, for lack of a better term needs to be developed > > then, when we go to design the thing, THEN is when we decide on the particular > flavor of list/tree/heap/array/dbase that we use. > > let's engineer this thing instead of hacking it. Absolutely, find first draft attached. Comments please. George ~snip~ Functional Specification for the high-res-timers project. http://sourceforge.net/projects/high-res-timers We are developing code to implement the POSIX clocks & timers as defined by IEEE Std 1003.1b-1993 Section 14. (For an on line reference see our home page: http://high-res-timers.sourceforge.net/ ) The API specifies the following functions (for details please see the spec): int clock_settime(clockid_t clock_id, const struct timespec *tp); int clock_gettime(clockid_t clock_id, struct timespec *tp); int clock_getres(clockid_t clock_id, struct timespec *res); int timer_creat(clockid_t clock_id, struct sigevent *evp, timer_t *timerid); int timer_delete(timer_t *timerid); int timer_settime(timer_t *timerid, int flags, const struct itimerspec *value, struct itimerspec *ovalue); int timer_gettime(timer_t *timerid, struct itimerspec *value); int timer_getoverrun(timer_t *timerid); int nanosleep( const struct timesped *rqtp, struct timespec *rmtp); In addition we expect that we will provide a high resolution timer for kernel use (heck, we may provide several). In all this code we will code to allow resolutions to 1 nanosecond second (the max provided by the timespec structure). The actual resolution of any given clock will be fixed at compile time and the code will do its work at a higher resolution to avoid round off errors as much as possible. We will provide several "clocks" as defined by the standard. In particular, the following capabilities will be attached to some clock, regardless of the actual clock "name" we end up using: CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). CLOCK_HIGH_RES a wall clock supporting timers with the highest resolution the hardware supports. CLOCK_1US a wall clock supporting timers with 1 micro second resolution (if the hardware allows it). CLOCK_UPTIME a clock that give the system up time. (Does this clock need to support timers?) CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. CLOCK_PROFILE a clock used to generate profiling events. CLOCK_??? any clock keyed to a task. (Note that this does not mean that the clock system will not support the virtual and profile clocks, but that they will not be accessible thru the POSIX timer interface.) It would be NICE if we could provide a way to hook other time support into the system. In particular a CLOCK_WWV or CLOCK_GPS might be nice. The problem with these sorts of clocks is that they imply an array of function pointers for each clock and function pointers slow the code down because of their non predictability. Never the less, we will attempt to allow easy expansion in this direction. Implications on the current kernel: The high resolution timers will require a fast clock access with the maximum supported resolution in order to convert relative times to absolute times. This same fast clock will be used to support the various user and system time requests. There are two ways to provide timers to the kernel. For lack of a better term we will refer to them as "ticked" and "tick less". Until we have performance information that implies that one or the other of these methods is better in all cases we will provide both ticked and tick less systems. The variety to be used will be selected at configure time. For tick less systems we will need to provide code to collect execution times. For the ticked system the current method of collection these times will be used. This project will NOT attempt to improve the resolution of these timers, however, the high speed, high resolution access to the current time will allow others to augment the system in this area. For the tick less system the project will also provide a time slice expiration interrupt. The timer list(s) (all pending timers) need to be organized so that the following operations are fast: a.) list insertion of an arbitrary timer, b.) removal of canceled and expired timers, and c.) finding the timer for "NOW" and its immediate followers. Times in the timer list will be absolute and related to system up time. These times will be converted to wall time as needed. The POSIX interface
Re: No 100 HZ timer!
all this talk about which data structure to use and how to allocate memory is wy premature. there needs to be a clear definition of the requirements that we wish to meet, including whether we are going to do ticked, tickless, or both a func spec, for lack of a better term needs to be developed then, when we go to design the thing, THEN is when we decide on the particular flavor of list/tree/heap/array/dbase that we use. let's engineer this thing instead of hacking it. On Sun, 15 Apr 2001, Jamie Lokier wrote: > Ben Greear wrote: > > > Note that jumping around the array thrashes no more cache than > > > traversing a tree (it touches the same number of elements). I prefer > > > heap-ordered trees though because fixed size is always a bad idea. > > > > With a tree, you will be allocating and de-allocating for every > > insert/delete right? > > No, there is no memory allocation. > > > On cache-coherency issues, wouldn't it be more likely to have a cache > > hit when you are accessing one contigious (ie the array) piece of > > memory? A 4-k page will hold a lot of indexes!! > > No, because when traversing an array-heap, you don't access contiguous > entries. You might get one or two more cache hits near the root of the > heap. > > > To get around the fixed size thing..could have > > the array grow itself when needed (and probably never shrink again). > > You could to that, but then you'd have to deal with memory allocation > failures and memory deadlocks, making add_timer rather complicated. > It's not acceptable for add_timer to fail or require kmalloc(). > > -- Jamie > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
all this talk about which data structure to use and how to allocate memory is wy premature. there needs to be a clear definition of the requirements that we wish to meet, including whether we are going to do ticked, tickless, or both a func spec, for lack of a better term needs to be developed then, when we go to design the thing, THEN is when we decide on the particular flavor of list/tree/heap/array/dbase that we use. let's engineer this thing instead of hacking it. On Sun, 15 Apr 2001, Jamie Lokier wrote: Ben Greear wrote: Note that jumping around the array thrashes no more cache than traversing a tree (it touches the same number of elements). I prefer heap-ordered trees though because fixed size is always a bad idea. With a tree, you will be allocating and de-allocating for every insert/delete right? No, there is no memory allocation. On cache-coherency issues, wouldn't it be more likely to have a cache hit when you are accessing one contigious (ie the array) piece of memory? A 4-k page will hold a lot of indexes!! No, because when traversing an array-heap, you don't access contiguous entries. You might get one or two more cache hits near the root of the heap. To get around the fixed size thing..could have the array grow itself when needed (and probably never shrink again). You could to that, but then you'd have to deal with memory allocation failures and memory deadlocks, making add_timer rather complicated. It's not acceptable for add_timer to fail or require kmalloc(). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mark Salisbury wrote: all this talk about which data structure to use and how to allocate memory is wy premature. there needs to be a clear definition of the requirements that we wish to meet, including whether we are going to do ticked, tickless, or both a func spec, for lack of a better term needs to be developed then, when we go to design the thing, THEN is when we decide on the particular flavor of list/tree/heap/array/dbase that we use. let's engineer this thing instead of hacking it. Absolutely, find first draft attached. Comments please. George ~snip~ Functional Specification for the high-res-timers project. http://sourceforge.net/projects/high-res-timers We are developing code to implement the POSIX clocks timers as defined by IEEE Std 1003.1b-1993 Section 14. (For an on line reference see our home page: http://high-res-timers.sourceforge.net/ ) The API specifies the following functions (for details please see the spec): int clock_settime(clockid_t clock_id, const struct timespec *tp); int clock_gettime(clockid_t clock_id, struct timespec *tp); int clock_getres(clockid_t clock_id, struct timespec *res); int timer_creat(clockid_t clock_id, struct sigevent *evp, timer_t *timerid); int timer_delete(timer_t *timerid); int timer_settime(timer_t *timerid, int flags, const struct itimerspec *value, struct itimerspec *ovalue); int timer_gettime(timer_t *timerid, struct itimerspec *value); int timer_getoverrun(timer_t *timerid); int nanosleep( const struct timesped *rqtp, struct timespec *rmtp); In addition we expect that we will provide a high resolution timer for kernel use (heck, we may provide several). In all this code we will code to allow resolutions to 1 nanosecond second (the max provided by the timespec structure). The actual resolution of any given clock will be fixed at compile time and the code will do its work at a higher resolution to avoid round off errors as much as possible. We will provide several "clocks" as defined by the standard. In particular, the following capabilities will be attached to some clock, regardless of the actual clock "name" we end up using: CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). CLOCK_HIGH_RES a wall clock supporting timers with the highest resolution the hardware supports. CLOCK_1US a wall clock supporting timers with 1 micro second resolution (if the hardware allows it). CLOCK_UPTIME a clock that give the system up time. (Does this clock need to support timers?) CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. CLOCK_PROFILE a clock used to generate profiling events. CLOCK_??? any clock keyed to a task. (Note that this does not mean that the clock system will not support the virtual and profile clocks, but that they will not be accessible thru the POSIX timer interface.) It would be NICE if we could provide a way to hook other time support into the system. In particular a CLOCK_WWV or CLOCK_GPS might be nice. The problem with these sorts of clocks is that they imply an array of function pointers for each clock and function pointers slow the code down because of their non predictability. Never the less, we will attempt to allow easy expansion in this direction. Implications on the current kernel: The high resolution timers will require a fast clock access with the maximum supported resolution in order to convert relative times to absolute times. This same fast clock will be used to support the various user and system time requests. There are two ways to provide timers to the kernel. For lack of a better term we will refer to them as "ticked" and "tick less". Until we have performance information that implies that one or the other of these methods is better in all cases we will provide both ticked and tick less systems. The variety to be used will be selected at configure time. For tick less systems we will need to provide code to collect execution times. For the ticked system the current method of collection these times will be used. This project will NOT attempt to improve the resolution of these timers, however, the high speed, high resolution access to the current time will allow others to augment the system in this area. For the tick less system the project will also provide a time slice expiration interrupt. The timer list(s) (all pending timers) need to be organized so that the following operations are fast: a.) list insertion of an arbitrary timer, b.) removal of canceled and expired timers, and c.) finding the timer for "NOW" and its immediate followers. Times in the timer list will be absolute and related to system up time. These times will be converted to wall time as needed. The POSIX interface provides for
Re: No 100 HZ timer!
CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). Except on the Alpha, and on some ARM systems, etc. The HZ constant varies from 10 to 1200. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. ... For tick less systems we will need to provide code to collect execution times. For the ticked system the current method of collection these times will be used. This project will NOT attempt to improve the resolution of these timers, however, the high speed, high resolution access to the current time will allow others to augment the system in this area. ... This project will NOT provide higher resolution accounting (i.e. user and system execution times). It is nice to have accurate per-process user/system accounting. Since you'd be touching the code anyway... The POSIX interface provides for "absolute" timers relative to a given clock. When these timers are related to a "wall" clock they will need adjusting when the wall clock time is adjusted. These adjustments are done for "leap seconds" and the date command. This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely defined POSIX time that is none of the above. This is a horrid mess, even ignoring gravity and speed. :-) Can a second be 2 billion nanoseconds? Can a nanosecond be twice as long as normal? Can a second appear twice, with the nanoseconds getting reset? Can a second never appear at all? Can you compute times more than 6 months into the future? How far does time deviate from solar time? Is this constrained? If you deal with leap seconds, you have to have a table of them. This table grows with time, with adjustments being made with only about 6 months notice. So the user upgrades after a year or two, and the installer discovers that the user has been running a system that is unaware of the most recent leap second. Arrrgh. Sure you want to touch this? The Austin group argued over it for a very long time and never did find a really good solution. Maybe you should just keep the code simple and fast, without any concern for clock adjustments. In either a ticked or tick less system, it is expected that resolutions higher than 1/HZ will come with some additional overhead. For this reason, the CLOCK resolution will be used to round up times for each timer. When the CLOCK provides 1/HZ (or coarser) resolution, the project will attempt to meet or exceed the current systems timer performance. Within the kernel at least, it would be good to let drivers specify desired resolution. Then a near-by value could be selected, perhaps with some consideration for event type. (for cache reasons) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
"Albert D. Cahalan" wrote: CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as linux today). Except on the Alpha, and on some ARM systems, etc. The HZ constant varies from 10 to 1200. I suspect we will want to use 10 ms resolution for a clock named CLOCK_10MS :) On the other hand we could have a CLOCK_1_OVER_HZ... Actually with high-res-timers the actual HZ value becomes a bit less important. It would be "nice" to keep 1/HZ == jiffie, however. At the same time we will NOT support the following clocks: CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or wall) of a given task. ... For tick less systems we will need to provide code to collect execution times. For the ticked system the current method of collection these times will be used. This project will NOT attempt to improve the resolution of these timers, however, the high speed, high resolution access to the current time will allow others to augment the system in this area. ... This project will NOT provide higher resolution accounting (i.e. user and system execution times). It is nice to have accurate per-process user/system accounting. Since you'd be touching the code anyway... Yeah sure... and will you pick up the ball on all the platform dependent code to get high-res-timers on all the other platforms? On second thought I am reminded of the corollary to the old saw: "The squeaking wheel get the grease." Which is: "He who complains most about the squeaking gets to do the greasing." Hint hint. The POSIX interface provides for "absolute" timers relative to a given clock. When these timers are related to a "wall" clock they will need adjusting when the wall clock time is adjusted. These adjustments are done for "leap seconds" and the date command. This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely defined POSIX time that is none of the above. This is a horrid mess, even ignoring gravity and speed. :-) Can a second be 2 billion nanoseconds? Can a nanosecond be twice as long as normal? Can a second appear twice, with the nanoseconds getting reset? Can a second never appear at all? Can you compute times more than 6 months into the future? How far does time deviate from solar time? Is this constrained? If you deal with leap seconds, you have to have a table of them. This table grows with time, with adjustments being made with only about 6 months notice. So the user upgrades after a year or two, and the installer discovers that the user has been running a system that is unaware of the most recent leap second. Arrrgh. Sure you want to touch this? The Austin group argued over it for a very long time and never did find a really good solution. Maybe you should just keep the code simple and fast, without any concern for clock adjustments. There is a relatively simple way to handle this, at least from the high-res-timers point of view. First we convert all timers to absolute uptime. This is a nice well behaved time. At boot time we peg the wall clock to the uptime. Then at any given time, wall time is boot wall time + uptime. Then date, leap seconds, etc. affect the pegged value of boot wall time. Using the POSIX CLOCK id we allow the user to ask for either version of time. Now if we define an array of struc clock_id which contains pointers to such things as functions to return time, any algorithm you might want can be plugged in to bend time as needed. The only fly in this mess is the NTP rate adjustment stuff. This code is supposed to allow the system to adjust its ticker to produce accurate seconds and so gets at the root of the uptime counter be it in hardware or software or some combination of the two. But then that's what makes life interesting :) In either a ticked or tick less system, it is expected that resolutions higher than 1/HZ will come with some additional overhead. For this reason, the CLOCK resolution will be used to round up times for each timer. When the CLOCK provides 1/HZ (or coarser) resolution, the project will attempt to meet or exceed the current systems timer performance. Within the kernel at least, it would be good to let drivers specify desired resolution. Then a near-by value could be selected, perhaps with some consideration for event type. (for cache reasons) This could be done, however, I would prefer to provide CLOCK_s to do this as per the standard. What does the community say? In either case you get different resolutions, but in the latter case the possible values are fixed at least at configure time. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Given a system speed, there is a repeating timer rate which will consume 100% of the system in handling the timer interrupts. An attempt will be made to detect this rate and adjust the timer to prevent system lockup. This adjustment will look like timer overruns to the user (i.e. we will take a percent of the interrupts and record the untaken interrupts as overruns) just at first blush, there are some things in general but I need to read this again and more closely but, with POSIX timers, there is a nifty little restriction/protection built into the spec regarding the re-insertion of short interval repeating timers. that is: a repeating timer will not be re-inserted until AFTER the associated signal handler has been handled. this has some interesting consequences for signal handling and signal delivery implementations, but importantly, it ensures that even a flood of POSIX timers with very short repeat intervals will be handled cleanly. I will get more detailed comments to you tomorrow. Mark Salisbury - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mark Salisbury wrote: Given a system speed, there is a repeating timer rate which will consume 100% of the system in handling the timer interrupts. An attempt will be made to detect this rate and adjust the timer to prevent system lockup. This adjustment will look like timer overruns to the user (i.e. we will take a percent of the interrupts and record the untaken interrupts as overruns) just at first blush, there are some things in general but I need to read this again and more closely but, with POSIX timers, there is a nifty little restriction/protection built into the spec regarding the re-insertion of short interval repeating timers. that is: a repeating timer will not be re-inserted until AFTER the associated signal handler has been handled. Actually what it says is: "Only a single signal shall be queued to the process for a given timer at any point in time. When a timer for which a signal is still pending expires, no signal shall be queued, and a timer overrun shall occur." It then goes on to talk about the overrun count and how it is to be managed. What I am suggesting is that the system should detect when these interrupts would come so fast as to stall the system and just set up a percent of them while bumping the overrun count as if they had all occured. George this has some interesting consequences for signal handling and signal delivery implementations, but importantly, it ensures that even a flood of POSIX timers with very short repeat intervals will be handled cleanly. I will get more detailed comments to you tomorrow. Mark Salisbury - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Jamie Lokier wrote: > > george anzinger wrote: > > Horst von Brand wrote: > > > > > > Ben Greear <[EMAIL PROTECTED]> said: > > > > > > [...] > > > > > > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > > > is log(n) and finding the one with the least time is O(1), ie pop off the > > > > front It can be implemented in an array which should help cache > > > > coherency and all those other things they talked about in school :) > > > > > > Insertion and deleting the first are both O(log N). Plus the array is fixed > > > size (bad idea) and the jumping around in the array thrashes the caches. > > > -- > > And your solution is? > > Note that jumping around the array thrashes no more cache than > traversing a tree (it touches the same number of elements). I prefer > heap-ordered trees though because fixed size is always a bad idea. With a tree, you will be allocating and de-allocating for every insert/delete right? That seems like a reasonable performance hit that an array-based approach might not have... On cache-coherency issues, wouldn't it be more likely to have a cache hit when you are accessing one contigious (ie the array) piece of memory? A 4-k page will hold a lot of indexes!! To get around the fixed size thing..could have the array grow itself when needed (and probably never shrink again). This would suck if you did it often, but I'm assuming that it would quickly grow to needed size and then stabalize... > > Insertion is O(1) if entries can be predicted to be near > enough some place in the list, be that the beginning, the end, or some > marked places in the middle. > > By the way, the current timer implementation only appears to be O(1) if > you ignore the overhead of having to do a check on every tick, and the > extra processing on table rollover. For random timer usage patterns, > that overhead adds up to O(log n), the same as a heap. > > However for skewed usage patterns (likely in the kernel), the current > table method avoids the O(log n) sorting overhead because long-delay > timers are often removed before percolating down to the smallest tables. > It is possible to produce a general purpose heap which also has this > property. > > -- Jamie > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: > > Horst von Brand wrote: > > > > Ben Greear <[EMAIL PROTECTED]> said: > > > > [...] > > > > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > > is log(n) and finding the one with the least time is O(1), ie pop off the > > > front It can be implemented in an array which should help cache > > > coherency and all those other things they talked about in school :) > > > > Insertion and deleting the first are both O(log N). Plus the array is fixed > > size (bad idea) and the jumping around in the array thrashes the caches. Jumping around an array can't be much worse than jumping around a linked list, can it? It does not have to be fixed length, though it wouldn't be log(n) to grow the array, it can still be done...and once you reach maximal size, you won't be growing it any more... I had forgotten about the log(n) to delete, though log(n) is still pretty good. As others have suggested, it might be good to have a linked list for very-soon-to-expire timers. However, they would have to be few enough that your linear insert was not worse than a log(n) instert and a log(n) delete... > > -- > And your solution is? > > George -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: Horst von Brand wrote: Ben Greear [EMAIL PROTECTED] said: [...] Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. Jumping around an array can't be much worse than jumping around a linked list, can it? It does not have to be fixed length, though it wouldn't be log(n) to grow the array, it can still be done...and once you reach maximal size, you won't be growing it any more... I had forgotten about the log(n) to delete, though log(n) is still pretty good. As others have suggested, it might be good to have a linked list for very-soon-to-expire timers. However, they would have to be few enough that your linear insert was not worse than a log(n) instert and a log(n) delete... -- And your solution is? George -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Jamie Lokier wrote: george anzinger wrote: Horst von Brand wrote: Ben Greear [EMAIL PROTECTED] said: [...] Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. -- And your solution is? Note that jumping around the array thrashes no more cache than traversing a tree (it touches the same number of elements). I prefer heap-ordered trees though because fixed size is always a bad idea. With a tree, you will be allocating and de-allocating for every insert/delete right? That seems like a reasonable performance hit that an array-based approach might not have... On cache-coherency issues, wouldn't it be more likely to have a cache hit when you are accessing one contigious (ie the array) piece of memory? A 4-k page will hold a lot of indexes!! To get around the fixed size thing..could have the array grow itself when needed (and probably never shrink again). This would suck if you did it often, but I'm assuming that it would quickly grow to needed size and then stabalize... Insertion is O(1) if entries can be predicted to be near enough some place in the list, be that the beginning, the end, or some marked places in the middle. By the way, the current timer implementation only appears to be O(1) if you ignore the overhead of having to do a check on every tick, and the extra processing on table rollover. For random timer usage patterns, that overhead adds up to O(log n), the same as a heap. However for skewed usage patterns (likely in the kernel), the current table method avoids the O(log n) sorting overhead because long-delay timers are often removed before percolating down to the smallest tables. It is possible to produce a general purpose heap which also has this property. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: > Horst von Brand wrote: > > > > Ben Greear <[EMAIL PROTECTED]> said: > > > > [...] > > > > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > > is log(n) and finding the one with the least time is O(1), ie pop off the > > > front It can be implemented in an array which should help cache > > > coherency and all those other things they talked about in school :) > > > > Insertion and deleting the first are both O(log N). Plus the array is fixed > > size (bad idea) and the jumping around in the array thrashes the caches. > > -- > And your solution is? Note that jumping around the array thrashes no more cache than traversing a tree (it touches the same number of elements). I prefer heap-ordered trees though because fixed size is always a bad idea. Insertion is O(1) if entries can be predicted to be near enough some place in the list, be that the beginning, the end, or some marked places in the middle. By the way, the current timer implementation only appears to be O(1) if you ignore the overhead of having to do a check on every tick, and the extra processing on table rollover. For random timer usage patterns, that overhead adds up to O(log n), the same as a heap. However for skewed usage patterns (likely in the kernel), the current table method avoids the O(log n) sorting overhead because long-delay timers are often removed before percolating down to the smallest tables. It is possible to produce a general purpose heap which also has this property. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: > If we are to do high-res-timers, I think we will always have to do some > sort of a sort on insert. Yes, that's called a priority queue... (And you don't actually sort on each insertion). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Horst von Brand wrote: > > Ben Greear <[EMAIL PROTECTED]> said: > > [...] > > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > is log(n) and finding the one with the least time is O(1), ie pop off the > > front It can be implemented in an array which should help cache > > coherency and all those other things they talked about in school :) > > Insertion and deleting the first are both O(log N). Plus the array is fixed > size (bad idea) and the jumping around in the array thrashes the caches. > -- And your solution is? George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Ben Greear <[EMAIL PROTECTED]> said: [...] > Wouldn't a heap be a good data structure for a list of timers? Insertion > is log(n) and finding the one with the least time is O(1), ie pop off the > front It can be implemented in an array which should help cache > coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. -- Horst von Brand [EMAIL PROTECTED] Casilla 9G, Vin~a del Mar, Chile +56 32 672616 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Jamie Lokier wrote: > > george anzinger wrote: > > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > > is log(n) and finding the one with the least time is O(1), ie pop off the > > > front It can be implemented in an array which should help cache > > > coherency and all those other things they talked about in school :) > > > > > I must be missing something here. You get log(n) from what? B-trees? > > How would you manage them to get the needed balance? Stopping the world > > to re-balance is worse than the cascade. I guess I need to read up on > > this stuff. A good pointer would be much appreciated. > > Look for "priority queues" and "heaps". In its simplest form, you use a > heap-ordered tree, which can be implemented using an array (that's > usually how it's presented), or having the objects in the heap point to > each other. > > A heap-ordered tree is not as strictly ordered as, well, an ordered tree > :-) The rule is: if A is the parent of B and C, then A expires earlier > than B, and A expires earlier than C. There is no constraint on the > relative expiry times of B and C. > > There is no "stop the world" to rebalance, which you may consider an > advantage over the present hierarchy of tables. On the other hand, each > insertion or deletion operation takes O(log n) time, where n is the > number of items in the queue. Although fairly fast, this bound can be > improved if you know the typical insertion/deletion patterns, to O(1) > for selected cases. Also you should know that not all priority queues > are based on heap-ordered trees. > > Linux' current hierarchy of tables is a good example of optimisation: it > is optimised for inserting and actually running short term timers, as > well as inserting and deleting (before running) long term timers. These > extremes take O(1) for insertion, removal and expiry, including the > "stop the world" time. This should be considered before and move to a > heap-based priority queue, which may turn out slower. > > > Meanwhile, I keep thinking of a simple doubly linked list in time > > order. To speed it up keep an array of pointers to the first N whole > > jiffie points and maybe pointers to coarser points beyond the first N. > > Make N, say 512. Build the pointers as needed. This should give > > something like O(n/N) insertion and O(1) removal. > > You've just described the same as the current implementation, but with > lists for longer term timers. I.e. slower. With your coarser points, > you have to sort the front elements of the coarse list into a fine one > from time to time. > > The idea of having jiffie-point pointers into a data structure for fast > insertion is a good one for speeding up any data structure for that > common case, though. > If we are to do high-res-timers, I think we will always have to do some sort of a sort on insert. If we can keep the sort to a VERY short list and only do it on sub jiffie timers, we will have something that is VERY close to what we have now. I think that the density of timers tends to increase as they get closer to "now". This should allow coarser pointers for times more removed from "now". Still if we sort on insert we will not have to resort later. The pointers into the list, however, are perishable and need to be refreshed as time passes. My though was to do this only when a needed pointer was found to be "expired" and then only for the needed pointer. On first need we would have a small overhead, but would remember for next time. Thanks for the good input George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: > > Wouldn't a heap be a good data structure for a list of timers? Insertion > > is log(n) and finding the one with the least time is O(1), ie pop off the > > front It can be implemented in an array which should help cache > > coherency and all those other things they talked about in school :) > > > I must be missing something here. You get log(n) from what? B-trees? > How would you manage them to get the needed balance? Stopping the world > to re-balance is worse than the cascade. I guess I need to read up on > this stuff. A good pointer would be much appreciated. Look for "priority queues" and "heaps". In its simplest form, you use a heap-ordered tree, which can be implemented using an array (that's usually how it's presented), or having the objects in the heap point to each other. A heap-ordered tree is not as strictly ordered as, well, an ordered tree :-) The rule is: if A is the parent of B and C, then A expires earlier than B, and A expires earlier than C. There is no constraint on the relative expiry times of B and C. There is no "stop the world" to rebalance, which you may consider an advantage over the present hierarchy of tables. On the other hand, each insertion or deletion operation takes O(log n) time, where n is the number of items in the queue. Although fairly fast, this bound can be improved if you know the typical insertion/deletion patterns, to O(1) for selected cases. Also you should know that not all priority queues are based on heap-ordered trees. Linux' current hierarchy of tables is a good example of optimisation: it is optimised for inserting and actually running short term timers, as well as inserting and deleting (before running) long term timers. These extremes take O(1) for insertion, removal and expiry, including the "stop the world" time. This should be considered before and move to a heap-based priority queue, which may turn out slower. > Meanwhile, I keep thinking of a simple doubly linked list in time > order. To speed it up keep an array of pointers to the first N whole > jiffie points and maybe pointers to coarser points beyond the first N. > Make N, say 512. Build the pointers as needed. This should give > something like O(n/N) insertion and O(1) removal. You've just described the same as the current implementation, but with lists for longer term timers. I.e. slower. With your coarser points, you have to sort the front elements of the coarse list into a fine one from time to time. The idea of having jiffie-point pointers into a data structure for fast insertion is a good one for speeding up any data structure for that common case, though. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Ben Greear wrote: > > Bret Indrelee wrote: > > > > On Thu, 12 Apr 2001, george anzinger wrote: > > > Bret Indrelee wrote: > > > > > > > > On Thu, 12 Apr 2001, george anzinger wrote: > > > > > Bret Indrelee wrote: > > > > > > Keep all timers in a sorted double-linked list. Do the insert > > > > > > intelligently, adding it from the back or front of the list depending on > > > > > > where it is in relation to existing entries. > > > > > > > > > > I think this is too slow, especially for a busy system, but there are > > > > > solutions... > > > > > > > > It is better than the current solution. > > > > > > Uh, where are we talking about. The current time list insert is real > > > close to O(1) and never more than O(5). > > > > I don't like the cost of the cascades every (as I recall) 256 > > interrupts. This is more work than is done in the rest of the interrupt > > processing, happens during the tick interrupt, and results in a rebuild of > > much of the table. Right, it needs to go, we need to eliminate the "lumps" in time :) > > > > -Bret > > Wouldn't a heap be a good data structure for a list of timers? Insertion > is log(n) and finding the one with the least time is O(1), ie pop off the > front It can be implemented in an array which should help cache > coherency and all those other things they talked about in school :) > I must be missing something here. You get log(n) from what? B-trees? How would you manage them to get the needed balance? Stopping the world to re-balance is worse than the cascade. I guess I need to read up on this stuff. A good pointer would be much appreciated. Meanwhile, I keep thinking of a simple doubly linked list in time order. To speed it up keep an array of pointers to the first N whole jiffie points and maybe pointers to coarser points beyond the first N. Make N, say 512. Build the pointers as needed. This should give something like O(n/N) insertion and O(1) removal. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Ben Greear wrote: Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... It is better than the current solution. Uh, where are we talking about. The current time list insert is real close to O(1) and never more than O(5). I don't like the cost of the cascades every (as I recall) 256 interrupts. This is more work than is done in the rest of the interrupt processing, happens during the tick interrupt, and results in a rebuild of much of the table. Right, it needs to go, we need to eliminate the "lumps" in time :) -Bret Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) I must be missing something here. You get log(n) from what? B-trees? How would you manage them to get the needed balance? Stopping the world to re-balance is worse than the cascade. I guess I need to read up on this stuff. A good pointer would be much appreciated. Meanwhile, I keep thinking of a simple doubly linked list in time order. To speed it up keep an array of pointers to the first N whole jiffie points and maybe pointers to coarser points beyond the first N. Make N, say 512. Build the pointers as needed. This should give something like O(n/N) insertion and O(1) removal. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) I must be missing something here. You get log(n) from what? B-trees? How would you manage them to get the needed balance? Stopping the world to re-balance is worse than the cascade. I guess I need to read up on this stuff. A good pointer would be much appreciated. Look for "priority queues" and "heaps". In its simplest form, you use a heap-ordered tree, which can be implemented using an array (that's usually how it's presented), or having the objects in the heap point to each other. A heap-ordered tree is not as strictly ordered as, well, an ordered tree :-) The rule is: if A is the parent of B and C, then A expires earlier than B, and A expires earlier than C. There is no constraint on the relative expiry times of B and C. There is no "stop the world" to rebalance, which you may consider an advantage over the present hierarchy of tables. On the other hand, each insertion or deletion operation takes O(log n) time, where n is the number of items in the queue. Although fairly fast, this bound can be improved if you know the typical insertion/deletion patterns, to O(1) for selected cases. Also you should know that not all priority queues are based on heap-ordered trees. Linux' current hierarchy of tables is a good example of optimisation: it is optimised for inserting and actually running short term timers, as well as inserting and deleting (before running) long term timers. These extremes take O(1) for insertion, removal and expiry, including the "stop the world" time. This should be considered before and move to a heap-based priority queue, which may turn out slower. Meanwhile, I keep thinking of a simple doubly linked list in time order. To speed it up keep an array of pointers to the first N whole jiffie points and maybe pointers to coarser points beyond the first N. Make N, say 512. Build the pointers as needed. This should give something like O(n/N) insertion and O(1) removal. You've just described the same as the current implementation, but with lists for longer term timers. I.e. slower. With your coarser points, you have to sort the front elements of the coarse list into a fine one from time to time. The idea of having jiffie-point pointers into a data structure for fast insertion is a good one for speeding up any data structure for that common case, though. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Jamie Lokier wrote: george anzinger wrote: Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) I must be missing something here. You get log(n) from what? B-trees? How would you manage them to get the needed balance? Stopping the world to re-balance is worse than the cascade. I guess I need to read up on this stuff. A good pointer would be much appreciated. Look for "priority queues" and "heaps". In its simplest form, you use a heap-ordered tree, which can be implemented using an array (that's usually how it's presented), or having the objects in the heap point to each other. A heap-ordered tree is not as strictly ordered as, well, an ordered tree :-) The rule is: if A is the parent of B and C, then A expires earlier than B, and A expires earlier than C. There is no constraint on the relative expiry times of B and C. There is no "stop the world" to rebalance, which you may consider an advantage over the present hierarchy of tables. On the other hand, each insertion or deletion operation takes O(log n) time, where n is the number of items in the queue. Although fairly fast, this bound can be improved if you know the typical insertion/deletion patterns, to O(1) for selected cases. Also you should know that not all priority queues are based on heap-ordered trees. Linux' current hierarchy of tables is a good example of optimisation: it is optimised for inserting and actually running short term timers, as well as inserting and deleting (before running) long term timers. These extremes take O(1) for insertion, removal and expiry, including the "stop the world" time. This should be considered before and move to a heap-based priority queue, which may turn out slower. Meanwhile, I keep thinking of a simple doubly linked list in time order. To speed it up keep an array of pointers to the first N whole jiffie points and maybe pointers to coarser points beyond the first N. Make N, say 512. Build the pointers as needed. This should give something like O(n/N) insertion and O(1) removal. You've just described the same as the current implementation, but with lists for longer term timers. I.e. slower. With your coarser points, you have to sort the front elements of the coarse list into a fine one from time to time. The idea of having jiffie-point pointers into a data structure for fast insertion is a good one for speeding up any data structure for that common case, though. If we are to do high-res-timers, I think we will always have to do some sort of a sort on insert. If we can keep the sort to a VERY short list and only do it on sub jiffie timers, we will have something that is VERY close to what we have now. I think that the density of timers tends to increase as they get closer to "now". This should allow coarser pointers for times more removed from "now". Still if we sort on insert we will not have to resort later. The pointers into the list, however, are perishable and need to be refreshed as time passes. My though was to do this only when a needed pointer was found to be "expired" and then only for the needed pointer. On first need we would have a small overhead, but would remember for next time. Thanks for the good input George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Ben Greear [EMAIL PROTECTED] said: [...] Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. -- Horst von Brand [EMAIL PROTECTED] Casilla 9G, Vin~a del Mar, Chile +56 32 672616 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Horst von Brand wrote: Ben Greear [EMAIL PROTECTED] said: [...] Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. -- And your solution is? George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: If we are to do high-res-timers, I think we will always have to do some sort of a sort on insert. Yes, that's called a priority queue... (And you don't actually sort on each insertion). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
george anzinger wrote: Horst von Brand wrote: Ben Greear [EMAIL PROTECTED] said: [...] Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Insertion and deleting the first are both O(log N). Plus the array is fixed size (bad idea) and the jumping around in the array thrashes the caches. -- And your solution is? Note that jumping around the array thrashes no more cache than traversing a tree (it touches the same number of elements). I prefer heap-ordered trees though because fixed size is always a bad idea. Insertion is O(1) if entries can be predicted to be near enough some place in the list, be that the beginning, the end, or some marked places in the middle. By the way, the current timer implementation only appears to be O(1) if you ignore the overhead of having to do a check on every tick, and the extra processing on table rollover. For random timer usage patterns, that overhead adds up to O(log n), the same as a heap. However for skewed usage patterns (likely in the kernel), the current table method avoids the O(log n) sorting overhead because long-delay timers are often removed before percolating down to the smallest tables. It is possible to produce a general purpose heap which also has this property. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: > > On Thu, 12 Apr 2001, george anzinger wrote: > > Bret Indrelee wrote: > > > > > > On Thu, 12 Apr 2001, george anzinger wrote: > > > > Bret Indrelee wrote: > > > > > Keep all timers in a sorted double-linked list. Do the insert > > > > > intelligently, adding it from the back or front of the list depending on > > > > > where it is in relation to existing entries. > > > > > > > > I think this is too slow, especially for a busy system, but there are > > > > solutions... > > > > > > It is better than the current solution. > > > > Uh, where are we talking about. The current time list insert is real > > close to O(1) and never more than O(5). > > I don't like the cost of the cascades every (as I recall) 256 > interrupts. This is more work than is done in the rest of the interrupt > processing, happens during the tick interrupt, and results in a rebuild of > much of the table. > > -Bret Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Ben > > -- > Bret Indrelee | Sometimes, to be deep, we must act shallow! > [EMAIL PROTECTED] | -Riff in The Quatrix > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, george anzinger wrote: > Bret Indrelee wrote: > > > > On Thu, 12 Apr 2001, george anzinger wrote: > > > Bret Indrelee wrote: > > > > Keep all timers in a sorted double-linked list. Do the insert > > > > intelligently, adding it from the back or front of the list depending on > > > > where it is in relation to existing entries. > > > > > > I think this is too slow, especially for a busy system, but there are > > > solutions... > > > > It is better than the current solution. > > Uh, where are we talking about. The current time list insert is real > close to O(1) and never more than O(5). I don't like the cost of the cascades every (as I recall) 256 interrupts. This is more work than is done in the rest of the interrupt processing, happens during the tick interrupt, and results in a rebuild of much of the table. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: > > On Thu, 12 Apr 2001, george anzinger wrote: > > Bret Indrelee wrote: > > > Keep all timers in a sorted double-linked list. Do the insert > > > intelligently, adding it from the back or front of the list depending on > > > where it is in relation to existing entries. > > > > I think this is too slow, especially for a busy system, but there are > > solutions... > > It is better than the current solution. Uh, where are we talking about. The current time list insert is real close to O(1) and never more than O(5). > > The insert takes the most time, having to scan through the list. If you > had to scan the whole list it would be O(n) with a simple linked list. If > you insert it from the end, it is almost always going to be less than > that. Right, but compared to the current O(5) max, this is just too long. > > The time to remove is O(1). > > Fetching the first element from the list is also O(1), but you may have to > fetch several items that have all expired. Here you could do something > clever. Just make sure it is O(1) to determine if the list is empty. > I would hope to move expired timers to another list or just process them. In any case they should not be a problem here. One of the posts that started all this mentioned a tick less system (on a 360 I think) that used the current time list. They had to scan forward in time to find the next event and easy 10 ms was a new list to look at. Conclusion: The current list structure is NOT organized for tick less time keeping. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, george anzinger wrote: > Bret Indrelee wrote: > > Keep all timers in a sorted double-linked list. Do the insert > > intelligently, adding it from the back or front of the list depending on > > where it is in relation to existing entries. > > I think this is too slow, especially for a busy system, but there are > solutions... It is better than the current solution. The insert takes the most time, having to scan through the list. If you had to scan the whole list it would be O(n) with a simple linked list. If you insert it from the end, it is almost always going to be less than that. The time to remove is O(1). Fetching the first element from the list is also O(1), but you may have to fetch several items that have all expired. Here you could do something clever. Just make sure it is O(1) to determine if the list is empty. [ snip ] > > The real trick is to do a lot less processing on every tick than is > > currently done. Current generation PCs can easily handle 1000s of > > interrupts a second if you keep the overhead small. > > I don't see the logic here. Having taken the interrupt, one would tend > to want to do as much as possible, rather than schedule another > interrupt to continue the processing. Rather, I think you are trying to > say that we can afford to take more interrupts for time keeping. Still, > I think what we are trying to get with tick less timers is a system that > takes FEWER interrupts, not more. The system should be CAPABLE of handling 1000s of interrupts without excessive system load. The actual number of interrupts would be reduced if we adjusted the interrupt interval based on the head of the list. Two different things. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: > > Mikulas Patocka ([EMAIL PROTECTED]) wrote: > > Adding and removing timers happens much more frequently than PIT tick, > > so > > comparing these times is pointless. > > > > If you have some device and timer protecting it from lockup on buggy > > hardware, you actually > > > > send request to device > > add timer > > > > receive interrupt and read reply > > remove timer > > > > With the curent timer semantics, the cost of add timer and del timer is > > nearly zero. If you had to reprogram the PIT on each request and reply, > > it > > would slow things down. > > > > Note that you call mod_timer also on each packet received - and in worst > > case (which may happen), you end up reprogramming the PIT on each > > packet. > > You can still have nearly zero cost for the normal case. Avoiding worst > case behaviour is also pretty easy. > > You only reprogram the PIT if you have to change the interval. > > Keep all timers in a sorted double-linked list. Do the insert > intelligently, adding it from the back or front of the list depending on > where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... > > You only need to reprogram the interval timer when: > 1. You've got a new entry at the head of the list > AND > 2. You've reprogrammed the interval to something larger than the time to > the new head of list. Uh, I think 1. IMPLIES 2. If it is at the head, it must be closer in than what the system is waiting for (unless, of course its time has already passed, but lets not consider that here). > > In the case of a device timeout, it is usually not going to be inserted at > the head of the list. It is very seldom going to actually timeout. Right, and if the system doesn't have many timers, thus putting this at the head, it has the time to do the extra work. > > Choose your interval wisely, only increasing it when you know it will pay > off. The best way of doing this would probably be to track some sort > of LCD for timeouts. I wonder about a more relaxed device timeout semantic that says something like: wait X + next timer interrupt. This would cause the timer insert code to find an entry at least X out and put this timer just after it. There are other ways to do this of course. The question here is: Would this be worth while? > > The real trick is to do a lot less processing on every tick than is > currently done. Current generation PCs can easily handle 1000s of > interrupts a second if you keep the overhead small. I don't see the logic here. Having taken the interrupt, one would tend to want to do as much as possible, rather than schedule another interrupt to continue the processing. Rather, I think you are trying to say that we can afford to take more interrupts for time keeping. Still, I think what we are trying to get with tick less timers is a system that takes FEWER interrupts, not more. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, Mark Salisbury wrote: > On Wed, 11 Apr 2001, Bret Indrelee wrote: > > Current generation PCs can easily handle 1000s of > > interrupts a second if you keep the overhead small. > > the PC centric implementation of the ticked system is one of its major flaws. > > there are architectures which the cost of a fixed interval is the same as a > variable interval, i.e. no reload register, so you must explicitely load a > value each interrupt anyway. and if you want accurate intervals, you must > perform a calculation each reload, and not just load a static value, because > you don't know how many cycles it took from the time the interrupt happened > till you get to the reload line because the int may have been masked or lower > pri than another interrupt. People were saying the cost of adjusting the PIT was high. On those archs where this is true, you would want to avoid changing the interval timer. On a more reasonable architechure, you would change it each time the head of the timer list changed. > also, why handle 1000's of interrupts if you only need to handle 10? There is no reason. I was pointing out that current hardware can easily handle this sort of load, not advocating that distributions change HZ to 1000. On Linux 2.2, if you change HZ to 400 your system is going to run at about 70% of speed. The thing is it is limited to this value, bumping it any higher doesn't cause a change. If you reprogram the RTC to generate a 1000 HZ interrupt pulse, as I recall there is about a 3% change in performance. This is with a constant rate interrupt, making it variable rate would reduce this. You can verify this yourself. Get whatever benchmark you prefer. Run it on a system. Rebuild after changing HZ to 400 and run it again. Restore HZ and use the RTC driver to produce an interval of 1024 HZ, run your benchmark again. Changing HZ had a huge performance impact in 2.2.13, I'm pretty sure it didn't change much in later 2.2 releases. Seems to me like there is a problem here. I'm pretty sure it is a combination of the overhead of the cascaded timers combined with the very high overhead of the scheduler that causes this. Someday this should be fixed. What I would like to see is support for a higher resolution timer in Linux, for those of us that need times down to about 1mSec. Those systems that can quickly reprogram the interval timer would do so each time a new value was at the head of list, others would have to do it more intelligently. Regardless of how it is done, forcing the system to run a constant 1000 HZ interrupt through the system should not impact system performance more than a few percent. If it does, something was done wrong. When there is need for the higher resolution timer, people will accept the overhead. On most systems it would not be used, so there would be no reason to run the higher rate timer. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001, Bret Indrelee wrote: > Current generation PCs can easily handle 1000s of > interrupts a second if you keep the overhead small. the PC centric implementation of the ticked system is one of its major flaws. there are architectures which the cost of a fixed interval is the same as a variable interval, i.e. no reload register, so you must explicitely load a value each interrupt anyway. and if you want accurate intervals, you must perform a calculation each reload, and not just load a static value, because you don't know how many cycles it took from the time the interrupt happened till you get to the reload line because the int may have been masked or lower pri than another interrupt. also, why handle 1000's of interrupts if you only need to handle 10? -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001, Jamie Lokier wrote: > Think of the original 64k and 256k VGA cards. I think some of those > didn't have an irq, but did have a way to read the progress of the > raster, which you could PLL to using timer interrupts. Some video games > still look fine at 320x200 :-) The *original* VGA, i.e. the PS/2 one, did have an IRQ, IIRC (according to docs -- I haven't ever seen one). Cheap clones might have lacked it, though. Then there is workstation (non-PC) hardware from early '90s which we run on and which uses an IRQ-driven interface to graphics adapters -- not only for vsync. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--+ +e-mail: [EMAIL PROTECTED], PGP key available+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
watermodem wrote: > As somebody who is now debating how to measure latencies in a > giga-bit ethernet environment with several components doing > L3 switching in much less than 10 micro-seconds ... (hardware) > I agree that some method is need to achieve higher resolutions. > (Sigh... I will likely need to buy something big and expensive) > {this is for a system to make use of L. Yarrow's little protocol} Use Alteon/Netgear cards, everyone else seems to be :-) We get measurements on the order of 100ns, if we are just measuring network latencies. (Data is not transferred over the PCI bus). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
John Alvord wrote: > I bumped into a funny non-optimization a few years ago. A system with > a timer queue like the above had been "optimized" by keeping old timer > elements... ready for new tasks to link onto and activate. The > granularity was 1 millsecond. Over time, all timer values from 0 to > roughly 10 minutes had been used. That resulted in 60,000 permanent > storage fragments laying about... a significant fragmentation problem. > The end was a forced recycle every month or so. This is the sort of thing that Linux does with slab, dentry, inode caches and so on. In theory the memory is reclaimed as required :-) It's not a big issue with timers, as the timer elements are fixed size structures that tend to be embedded in other structures. So the lifetime of the timer element is the same as the lifetime of the object associated with the timer. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
John Alvord wrote: I bumped into a funny non-optimization a few years ago. A system with a timer queue like the above had been "optimized" by keeping old timer elements... ready for new tasks to link onto and activate. The granularity was 1 millsecond. Over time, all timer values from 0 to roughly 10 minutes had been used. That resulted in 60,000 permanent storage fragments laying about... a significant fragmentation problem. The end was a forced recycle every month or so. This is the sort of thing that Linux does with slab, dentry, inode caches and so on. In theory the memory is reclaimed as required :-) It's not a big issue with timers, as the timer elements are fixed size structures that tend to be embedded in other structures. So the lifetime of the timer element is the same as the lifetime of the object associated with the timer. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
watermodem wrote: As somebody who is now debating how to measure latencies in a giga-bit ethernet environment with several components doing L3 switching in much less than 10 micro-seconds ... (hardware) I agree that some method is need to achieve higher resolutions. (Sigh... I will likely need to buy something big and expensive) {this is for a system to make use of L. Yarrow's little protocol} Use Alteon/Netgear cards, everyone else seems to be :-) We get measurements on the order of 100ns, if we are just measuring network latencies. (Data is not transferred over the PCI bus). -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001, Jamie Lokier wrote: Think of the original 64k and 256k VGA cards. I think some of those didn't have an irq, but did have a way to read the progress of the raster, which you could PLL to using timer interrupts. Some video games still look fine at 320x200 :-) The *original* VGA, i.e. the PS/2 one, did have an IRQ, IIRC (according to docs -- I haven't ever seen one). Cheap clones might have lacked it, though. Then there is workstation (non-PC) hardware from early '90s which we run on and which uses an IRQ-driven interface to graphics adapters -- not only for vsync. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--+ +e-mail: [EMAIL PROTECTED], PGP key available+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001, Bret Indrelee wrote: Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. the PC centric implementation of the ticked system is one of its major flaws. there are architectures which the cost of a fixed interval is the same as a variable interval, i.e. no reload register, so you must explicitely load a value each interrupt anyway. and if you want accurate intervals, you must perform a calculation each reload, and not just load a static value, because you don't know how many cycles it took from the time the interrupt happened till you get to the reload line because the int may have been masked or lower pri than another interrupt. also, why handle 1000's of interrupts if you only need to handle 10? -- /*** ** Mark Salisbury | Mercury Computer Systems** ** [EMAIL PROTECTED] | System OS - Kernel Team ** **** ** I will be riding in the Multiple Sclerosis** ** Great Mass Getaway, a 150 mile bike ride from ** ** Boston to Provincetown. Last year I raised ** ** over $1200. This year I would like to beat ** ** that. If you would like to contribute, ** ** please contact me.** ***/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, Mark Salisbury wrote: On Wed, 11 Apr 2001, Bret Indrelee wrote: Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. the PC centric implementation of the ticked system is one of its major flaws. there are architectures which the cost of a fixed interval is the same as a variable interval, i.e. no reload register, so you must explicitely load a value each interrupt anyway. and if you want accurate intervals, you must perform a calculation each reload, and not just load a static value, because you don't know how many cycles it took from the time the interrupt happened till you get to the reload line because the int may have been masked or lower pri than another interrupt. People were saying the cost of adjusting the PIT was high. On those archs where this is true, you would want to avoid changing the interval timer. On a more reasonable architechure, you would change it each time the head of the timer list changed. also, why handle 1000's of interrupts if you only need to handle 10? There is no reason. I was pointing out that current hardware can easily handle this sort of load, not advocating that distributions change HZ to 1000. On Linux 2.2, if you change HZ to 400 your system is going to run at about 70% of speed. The thing is it is limited to this value, bumping it any higher doesn't cause a change. If you reprogram the RTC to generate a 1000 HZ interrupt pulse, as I recall there is about a 3% change in performance. This is with a constant rate interrupt, making it variable rate would reduce this. You can verify this yourself. Get whatever benchmark you prefer. Run it on a system. Rebuild after changing HZ to 400 and run it again. Restore HZ and use the RTC driver to produce an interval of 1024 HZ, run your benchmark again. Changing HZ had a huge performance impact in 2.2.13, I'm pretty sure it didn't change much in later 2.2 releases. Seems to me like there is a problem here. I'm pretty sure it is a combination of the overhead of the cascaded timers combined with the very high overhead of the scheduler that causes this. Someday this should be fixed. What I would like to see is support for a higher resolution timer in Linux, for those of us that need times down to about 1mSec. Those systems that can quickly reprogram the interval timer would do so each time a new value was at the head of list, others would have to do it more intelligently. Regardless of how it is done, forcing the system to run a constant 1000 HZ interrupt through the system should not impact system performance more than a few percent. If it does, something was done wrong. When there is need for the higher resolution timer, people will accept the overhead. On most systems it would not be used, so there would be no reason to run the higher rate timer. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: Mikulas Patocka ([EMAIL PROTECTED]) wrote: Adding and removing timers happens much more frequently than PIT tick, so comparing these times is pointless. If you have some device and timer protecting it from lockup on buggy hardware, you actually send request to device add timer receive interrupt and read reply remove timer With the curent timer semantics, the cost of add timer and del timer is nearly zero. If you had to reprogram the PIT on each request and reply, it would slow things down. Note that you call mod_timer also on each packet received - and in worst case (which may happen), you end up reprogramming the PIT on each packet. You can still have nearly zero cost for the normal case. Avoiding worst case behaviour is also pretty easy. You only reprogram the PIT if you have to change the interval. Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... You only need to reprogram the interval timer when: 1. You've got a new entry at the head of the list AND 2. You've reprogrammed the interval to something larger than the time to the new head of list. Uh, I think 1. IMPLIES 2. If it is at the head, it must be closer in than what the system is waiting for (unless, of course its time has already passed, but lets not consider that here). In the case of a device timeout, it is usually not going to be inserted at the head of the list. It is very seldom going to actually timeout. Right, and if the system doesn't have many timers, thus putting this at the head, it has the time to do the extra work. Choose your interval wisely, only increasing it when you know it will pay off. The best way of doing this would probably be to track some sort of LCD for timeouts. I wonder about a more relaxed device timeout semantic that says something like: wait X + next timer interrupt. This would cause the timer insert code to find an entry at least X out and put this timer just after it. There are other ways to do this of course. The question here is: Would this be worth while? The real trick is to do a lot less processing on every tick than is currently done. Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. I don't see the logic here. Having taken the interrupt, one would tend to want to do as much as possible, rather than schedule another interrupt to continue the processing. Rather, I think you are trying to say that we can afford to take more interrupts for time keeping. Still, I think what we are trying to get with tick less timers is a system that takes FEWER interrupts, not more. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... It is better than the current solution. The insert takes the most time, having to scan through the list. If you had to scan the whole list it would be O(n) with a simple linked list. If you insert it from the end, it is almost always going to be less than that. The time to remove is O(1). Fetching the first element from the list is also O(1), but you may have to fetch several items that have all expired. Here you could do something clever. Just make sure it is O(1) to determine if the list is empty. [ snip ] The real trick is to do a lot less processing on every tick than is currently done. Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. I don't see the logic here. Having taken the interrupt, one would tend to want to do as much as possible, rather than schedule another interrupt to continue the processing. Rather, I think you are trying to say that we can afford to take more interrupts for time keeping. Still, I think what we are trying to get with tick less timers is a system that takes FEWER interrupts, not more. The system should be CAPABLE of handling 1000s of interrupts without excessive system load. The actual number of interrupts would be reduced if we adjusted the interrupt interval based on the head of the list. Two different things. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... It is better than the current solution. Uh, where are we talking about. The current time list insert is real close to O(1) and never more than O(5). The insert takes the most time, having to scan through the list. If you had to scan the whole list it would be O(n) with a simple linked list. If you insert it from the end, it is almost always going to be less than that. Right, but compared to the current O(5) max, this is just too long. The time to remove is O(1). Fetching the first element from the list is also O(1), but you may have to fetch several items that have all expired. Here you could do something clever. Just make sure it is O(1) to determine if the list is empty. I would hope to move expired timers to another list or just process them. In any case they should not be a problem here. One of the posts that started all this mentioned a tick less system (on a 360 I think) that used the current time list. They had to scan forward in time to find the next event and easy 10 ms was a new list to look at. Conclusion: The current list structure is NOT organized for tick less time keeping. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... It is better than the current solution. Uh, where are we talking about. The current time list insert is real close to O(1) and never more than O(5). I don't like the cost of the cascades every (as I recall) 256 interrupts. This is more work than is done in the rest of the interrupt processing, happens during the tick interrupt, and results in a rebuild of much of the table. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: On Thu, 12 Apr 2001, george anzinger wrote: Bret Indrelee wrote: Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. I think this is too slow, especially for a busy system, but there are solutions... It is better than the current solution. Uh, where are we talking about. The current time list insert is real close to O(1) and never more than O(5). I don't like the cost of the cascades every (as I recall) 256 interrupts. This is more work than is done in the rest of the interrupt processing, happens during the tick interrupt, and results in a rebuild of much of the table. -Bret Wouldn't a heap be a good data structure for a list of timers? Insertion is log(n) and finding the one with the least time is O(1), ie pop off the front It can be implemented in an array which should help cache coherency and all those other things they talked about in school :) Ben -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Alan Cox wrote: > > > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s > > and MIN_DELACK is 0.04s, TCP would hardly benefit from them. > > There are a considerable number of people who really do need 1Khz resolution. > Midi is one of the example cases. That doesn't mean we have to go to a 1KHz > timer, we may want to combine a 100Hz timer with a smart switch to 1Khz As somebody who is now debating how to measure latencies in a giga-bit ethernet environment with several components doing L3 switching in much less than 10 micro-seconds ... (hardware) I agree that some method is need to achieve higher resolutions. (Sigh... I will likely need to buy something big and expensive) {this is for a system to make use of L. Yarrow's little protocol} - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
> I suspect you might go for ticked if its overhead was less. The thing > that makes me think the overhead is high for tick less is the accounting > and time slice stuff. This has to be handled each system call and each > schedule call. These happen WAY more often than ticks... Contrary > wise, I would go for tick less if its overhead is the same or less under > a reasonable load. Of course, we would also want to check overload > sensitivity. as I said, i think the process time accounting is disjoint from the ticked/tickless issue and should therefore be considered seperately.. in my experience with the tickless method, after all pending timers have been serviced, then to determine the interval until the next interrupt should be generated all that is needed is one 64 bit subtraction and a register load (about 5 - 8 cycles) I think we should spend some serious time with some quick and dirty prototypes of both methods, both to characterize all the issues raised and to refine our ideas when it comes time to implement this. I also think that we should give some thought to implementing both an improved ticked system and a tickless system to be chosen as CONFIG options so that the developer putting together a system can make a choice based on their needs, and not our whims. I am more than willing to concede that there is a class of customer to whom a ticked system is appropriate, but I am quite sure that there is a class for whom the ticked model is completely unacceptable. > On a half way loaded system? Do you think it is that high? I sort of > think that, once the system is loaded, there would be a useful timer > tick most of the time. Might be useful to do some measurements of this. any non-useful tick takes away cycles that could be better used performing FFT's there are many kinds of loads, some which generate large numbers of timed events and some that generate none or few. I think we want to be able to provide good solutions for both cases even. we should do lots of measurements. Mark - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001 20:57:04 +0200, Jamie Lokier <[EMAIL PROTECTED]> wrote: >george anzinger wrote: >> > A pointer-based priority queue is really not a very complex thing, and >> > there are ways to optimise them for typical cases like the above. >> > >> Please do enlighten me. The big problem in my mind is that the >> pointers, pointing at points in time, are perishable. > >Um, I'm not sure what perishability has to do with anything. Timers at >the moment can be added, deleted and destroyed and it's no big deal. > >Look in an algorithms book under "priority queue". They usually don't >say how to use a heap-ordered tree -- usually it's an array -- but you >can use a tree if you want. In such a tree, each timer has a link to >_two_ next timers and one previous timer. (The previous timer link is >only needed if you can delete timers before they expire, which is >required for Linux). > >I'm not saying saying a heap-ordered tree is fastest. But it's ok, and >doesn't require any more storage than the `struct timer' objects >themselves. > >It's possible to optimise this kind of data structure rather a lot, >depending on how you want to use it. Linux' current timer algorithm is >a pretty good example of a priority queue optimised for timer ticks, >where you don't mind doing a small amount of work per tick. > >One notable complication with the kernel is that we don't want every >timer to run at its exactly allocated time, except for some special >timers. For example, if you have 100 TCP streams and their >retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc., >you'd much rather just process them all at once as is done at the moment >by the 100Hz timer. This is because cache locality is much more >important for TCP retransmits than transmit timing resolution. > >There's literature online about this class of problems: look up "event >set" and "simulation" and "fast priority queue". I bumped into a funny non-optimization a few years ago. A system with a timer queue like the above had been "optimized" by keeping old timer elements... ready for new tasks to link onto and activate. The granularity was 1 millsecond. Over time, all timer values from 0 to roughly 10 minutes had been used. That resulted in 60,000 permanent storage fragments laying about... a significant fragmentation problem. The end was a forced recycle every month or so. john alvord - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
george anzinger wrote: > > A pointer-based priority queue is really not a very complex thing, and > > there are ways to optimise them for typical cases like the above. > > > Please do enlighten me. The big problem in my mind is that the > pointers, pointing at points in time, are perishable. Um, I'm not sure what perishability has to do with anything. Timers at the moment can be added, deleted and destroyed and it's no big deal. Look in an algorithms book under "priority queue". They usually don't say how to use a heap-ordered tree -- usually it's an array -- but you can use a tree if you want. In such a tree, each timer has a link to _two_ next timers and one previous timer. (The previous timer link is only needed if you can delete timers before they expire, which is required for Linux). I'm not saying saying a heap-ordered tree is fastest. But it's ok, and doesn't require any more storage than the `struct timer' objects themselves. It's possible to optimise this kind of data structure rather a lot, depending on how you want to use it. Linux' current timer algorithm is a pretty good example of a priority queue optimised for timer ticks, where you don't mind doing a small amount of work per tick. One notable complication with the kernel is that we don't want every timer to run at its exactly allocated time, except for some special timers. For example, if you have 100 TCP streams and their retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc., you'd much rather just process them all at once as is done at the moment by the 100Hz timer. This is because cache locality is much more important for TCP retransmits than transmit timing resolution. There's literature online about this class of problems: look up "event set" and "simulation" and "fast priority queue". enjoy, -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mikulas Patocka ([EMAIL PROTECTED]) wrote: > Adding and removing timers happens much more frequently than PIT tick, > so > comparing these times is pointless. > > If you have some device and timer protecting it from lockup on buggy > hardware, you actually > > send request to device > add timer > > receive interrupt and read reply > remove timer > > With the curent timer semantics, the cost of add timer and del timer is > nearly zero. If you had to reprogram the PIT on each request and reply, > it > would slow things down. > > Note that you call mod_timer also on each packet received - and in worst > case (which may happen), you end up reprogramming the PIT on each > packet. You can still have nearly zero cost for the normal case. Avoiding worst case behaviour is also pretty easy. You only reprogram the PIT if you have to change the interval. Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. You only need to reprogram the interval timer when: 1. You've got a new entry at the head of the list AND 2. You've reprogrammed the interval to something larger than the time to the new head of list. In the case of a device timeout, it is usually not going to be inserted at the head of the list. It is very seldom going to actually timeout. Choose your interval wisely, only increasing it when you know it will pay off. The best way of doing this would probably be to track some sort of LCD for timeouts. The real trick is to do a lot less processing on every tick than is currently done. Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Mon, 9 Apr 2001, Alan Cox wrote: > > > Its worth doing even on the ancient x86 boards with the PIT. > > > > Note that programming the PIT is slw and doing it on every timer > > add_timer/del_timer would be a pain. > > You only have to do it occasionally. > > When you add a timer newer than the current one > (arguably newer by at least 1/2*HZ sec) That's only if we want to do no better than the current system. We'd want a new variable called timer_margin or something, which would be dependent on interrupt source and processor, and could be tuned up or down via sysctl. > When you finish running the timers at an interval and the new interval is > significantly larger than the current one. Make that larger or smaller. If we come out of a quiescent state (1 hz interrupts, say) and start getting 10ms timers, we want to respond to them right away. > Remember each tick we poke the PIT anyway We could also have a HZ_max tunable above which we would not try to reprogram the interval. On older systems, this could be set at 100-200HZ... -- "Love the dolphins," she advised him. "Write by W.A.S.T.E.." - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Jamie Locker wrote: > > Mark Salisbury wrote: > > > The complexity comes in when you want to maintain indexes into the list > > > for quick insertion of new timers. To get the current insert > > > performance, for example, you would need pointers to (at least) each of > > > the next 256 centasecond boundaries in the list. But the list ages, so > > > these pointers need to be continually updated. The thought I had was to > > > update needed pointers (and only those needed) each time an insert was > > > done and a needed pointer was found to be missing or stale. Still it > > > adds complexity that the static structure used now doesn't have. > > > > actually, I think a head and tail pointer would be sufficient for most > > cases. (most new timers are either going to be a new head of list or a new > > tail, i.e. long duration timeouts that will never be serviced or short > > duration timers that are going to go off "real soon now (tm)") the oddball > > cases would be mostly coming from user-space, i.e. nanosleep which a longerr > > list insertion disapears in the block/wakeup/context switch overhead > > A pointer-based priority queue is really not a very complex thing, and > there are ways to optimise them for typical cases like the above. > Please do enlighten me. The big problem in my mind is that the pointers, pointing at points in time, are perishable. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Maciej W. Rozycki wrote: > On Tue, 10 Apr 2001, Zdenek Kabelac wrote: > > > I think it would be wrong if we could not add new very usable features > > to the system just because some old hardware doesn't support it. > > s/old/crappy/ -- even old hardware can handle vsync IRQs fine. E.g. TVGA > 8900C. Think of the original 64k and 256k VGA cards. I think some of those didn't have an irq, but did have a way to read the progress of the raster, which you could PLL to using timer interrupts. Some video games still look fine at 320x200 :-) -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mark Salisbury wrote: > > The complexity comes in when you want to maintain indexes into the list > > for quick insertion of new timers. To get the current insert > > performance, for example, you would need pointers to (at least) each of > > the next 256 centasecond boundaries in the list. But the list ages, so > > these pointers need to be continually updated. The thought I had was to > > update needed pointers (and only those needed) each time an insert was > > done and a needed pointer was found to be missing or stale. Still it > > adds complexity that the static structure used now doesn't have. > > actually, I think a head and tail pointer would be sufficient for most > cases. (most new timers are either going to be a new head of list or a new > tail, i.e. long duration timeouts that will never be serviced or short > duration timers that are going to go off "real soon now (tm)") the oddball > cases would be mostly coming from user-space, i.e. nanosleep which a longerr > list insertion disapears in the block/wakeup/context switch overhead A pointer-based priority queue is really not a very complex thing, and there are ways to optimise them for typical cases like the above. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Tue, 10 Apr 2001, Zdenek Kabelac wrote: > I think it would be wrong if we could not add new very usable features > to the system just because some old hardware doesn't support it. s/old/crappy/ -- even old hardware can handle vsync IRQs fine. E.g. TVGA 8900C. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--+ +e-mail: [EMAIL PROTECTED], PGP key available+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
>f) As noted, the account timers (task user/system times) would be much >more accurate with the tick less approach. The cost is added code in >both the system call and the schedule path. > >Tentative conclusions: > >Currently we feel that the tick less approach is not acceptable due to >(f). We felt that this added code would NOT be welcome AND would, in a >reasonably active system, have much higher overhead than any savings in >not having a tick. Also (d) implies a list organization that will, at >the very least, be harder to understand. (We have some thoughts here, >but abandoned the effort because of (f).) We are, of course, open to >discussion on this issue and all others related to the project >objectives. f) might be true on Intel based systems. At least for s/390 the situation is a little bit different. Here is a extract from the s/390 part of the timer patch: .macro UPDATE_ENTER_TIME reload la %r14,thread+_TSS_UTIME(%r9) # pointer to utime tm SP_PSW+1(%r15),0x01 # interrupting from user ? jno 0f # yes -> add to user time la %r14,8(%r14) # no -> add to system time 0: lm %r0,%r1,0(%r14) # load user/system time sl %r1,__LC_LAST_MARK+4 # subtract last time mark bc 3,BASED(1f) # borrow ? sl %r0,BASED(.Lc1) 1: sl %r0,__LC_LAST_MARK stck__LC_LAST_MARK # make a new mark al %r1,__LC_LAST_MARK+4 # add new mark -> added delta bc 12,BASED(2f) # carry ? al %r0,BASED(.Lc1) 2: al %r0,__LC_LAST_MARK stm %r0,%r1,0(%r14) # store updated user/system time clc __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time jl 3f # passed for a jiffy update l %r1,BASED(.Ltime_warp) basr%r14,%r1 .if \reload # reload != 0 for system call lm %r2,%r6,SP_R2(%r15) # reload clobbered parameters .endif 3: .macro UPDATE_LEAVE_TIME l %r1,BASED(.Ltq_timer)# test if tq_timer list is empty x %r1,0(%r1) # tq_timer->next != tq_timer ? jz 0f l %r1,BASED(.Ltq_timer_active) icm %r0,15,0(%r1)# timer event already added ? jnz 0f l %r1,BASED(.Ltq_pending) basr%r14,%r1 0: lm %r0,%r1,thread+_TSS_STIME(%r9) # load system time sl %r1,__LC_LAST_MARK+4 # subtract last time mark bc 3,BASED(1f) # borrow ? sl %r0,BASED(.Lc1) 1: sl %r0,__LC_LAST_MARK stck__LC_LAST_MARK # make new mark al %r1,__LC_LAST_MARK+4 # add new mark -> added delta bc 12,BASED(2f) # carry ? al %r0,BASED(.Lc1) 2: al %r0,__LC_LAST_MARK stm %r0,%r1,thread+_TSS_STIME(%r9) # store system time .endm The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIMER are executed on every system entry/exit. In the case that no special work has to be done less then 31 instruction are executed in addition to the normal system entry/exit code. Special work has to be done if more time then 1/HZ has passed (call time_warp), or if tq_timer contains an element (call tq_pending). The accuracy of the timer events has not changed. It still is 1/HZ. The only thing this patch does is to avoid unneeded interruptions. I'd be happy if this could be combined with a new, more accurate timing method. blue skies, Martin Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247 E-Mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
f) As noted, the account timers (task user/system times) would be much more accurate with the tick less approach. The cost is added code in both the system call and the schedule path. Tentative conclusions: Currently we feel that the tick less approach is not acceptable due to (f). We felt that this added code would NOT be welcome AND would, in a reasonably active system, have much higher overhead than any savings in not having a tick. Also (d) implies a list organization that will, at the very least, be harder to understand. (We have some thoughts here, but abandoned the effort because of (f).) We are, of course, open to discussion on this issue and all others related to the project objectives. f) might be true on Intel based systems. At least for s/390 the situation is a little bit different. Here is a extract from the s/390 part of the timer patch: .macro UPDATE_ENTER_TIME reload la %r14,thread+_TSS_UTIME(%r9) # pointer to utime tm SP_PSW+1(%r15),0x01 # interrupting from user ? jno 0f # yes - add to user time la %r14,8(%r14) # no - add to system time 0: lm %r0,%r1,0(%r14) # load user/system time sl %r1,__LC_LAST_MARK+4 # subtract last time mark bc 3,BASED(1f) # borrow ? sl %r0,BASED(.Lc1) 1: sl %r0,__LC_LAST_MARK stck__LC_LAST_MARK # make a new mark al %r1,__LC_LAST_MARK+4 # add new mark - added delta bc 12,BASED(2f) # carry ? al %r0,BASED(.Lc1) 2: al %r0,__LC_LAST_MARK stm %r0,%r1,0(%r14) # store updated user/system time clc __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time jl 3f # passed for a jiffy update l %r1,BASED(.Ltime_warp) basr%r14,%r1 .if \reload # reload != 0 for system call lm %r2,%r6,SP_R2(%r15) # reload clobbered parameters .endif 3: .macro UPDATE_LEAVE_TIME l %r1,BASED(.Ltq_timer)# test if tq_timer list is empty x %r1,0(%r1) # tq_timer-next != tq_timer ? jz 0f l %r1,BASED(.Ltq_timer_active) icm %r0,15,0(%r1)# timer event already added ? jnz 0f l %r1,BASED(.Ltq_pending) basr%r14,%r1 0: lm %r0,%r1,thread+_TSS_STIME(%r9) # load system time sl %r1,__LC_LAST_MARK+4 # subtract last time mark bc 3,BASED(1f) # borrow ? sl %r0,BASED(.Lc1) 1: sl %r0,__LC_LAST_MARK stck__LC_LAST_MARK # make new mark al %r1,__LC_LAST_MARK+4 # add new mark - added delta bc 12,BASED(2f) # carry ? al %r0,BASED(.Lc1) 2: al %r0,__LC_LAST_MARK stm %r0,%r1,thread+_TSS_STIME(%r9) # store system time .endm The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIMER are executed on every system entry/exit. In the case that no special work has to be done less then 31 instruction are executed in addition to the normal system entry/exit code. Special work has to be done if more time then 1/HZ has passed (call time_warp), or if tq_timer contains an element (call tq_pending). The accuracy of the timer events has not changed. It still is 1/HZ. The only thing this patch does is to avoid unneeded interruptions. I'd be happy if this could be combined with a new, more accurate timing method. blue skies, Martin Linux/390 Design Development, IBM Deutschland Entwicklung GmbH Schnaicherstr. 220, D-71032 Bblingen, Telefon: 49 - (0)7031 - 16-2247 E-Mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Tue, 10 Apr 2001, Zdenek Kabelac wrote: I think it would be wrong if we could not add new very usable features to the system just because some old hardware doesn't support it. s/old/crappy/ -- even old hardware can handle vsync IRQs fine. E.g. TVGA 8900C. -- + Maciej W. Rozycki, Technical University of Gdansk, Poland + +--+ +e-mail: [EMAIL PROTECTED], PGP key available+ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mark Salisbury wrote: The complexity comes in when you want to maintain indexes into the list for quick insertion of new timers. To get the current insert performance, for example, you would need pointers to (at least) each of the next 256 centasecond boundaries in the list. But the list ages, so these pointers need to be continually updated. The thought I had was to update needed pointers (and only those needed) each time an insert was done and a needed pointer was found to be missing or stale. Still it adds complexity that the static structure used now doesn't have. actually, I think a head and tail pointer would be sufficient for most cases. (most new timers are either going to be a new head of list or a new tail, i.e. long duration timeouts that will never be serviced or short duration timers that are going to go off "real soon now (tm)") the oddball cases would be mostly coming from user-space, i.e. nanosleep which a longerr list insertion disapears in the block/wakeup/context switch overhead A pointer-based priority queue is really not a very complex thing, and there are ways to optimise them for typical cases like the above. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Maciej W. Rozycki wrote: On Tue, 10 Apr 2001, Zdenek Kabelac wrote: I think it would be wrong if we could not add new very usable features to the system just because some old hardware doesn't support it. s/old/crappy/ -- even old hardware can handle vsync IRQs fine. E.g. TVGA 8900C. Think of the original 64k and 256k VGA cards. I think some of those didn't have an irq, but did have a way to read the progress of the raster, which you could PLL to using timer interrupts. Some video games still look fine at 320x200 :-) -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Jamie Locker wrote: Mark Salisbury wrote: The complexity comes in when you want to maintain indexes into the list for quick insertion of new timers. To get the current insert performance, for example, you would need pointers to (at least) each of the next 256 centasecond boundaries in the list. But the list ages, so these pointers need to be continually updated. The thought I had was to update needed pointers (and only those needed) each time an insert was done and a needed pointer was found to be missing or stale. Still it adds complexity that the static structure used now doesn't have. actually, I think a head and tail pointer would be sufficient for most cases. (most new timers are either going to be a new head of list or a new tail, i.e. long duration timeouts that will never be serviced or short duration timers that are going to go off "real soon now (tm)") the oddball cases would be mostly coming from user-space, i.e. nanosleep which a longerr list insertion disapears in the block/wakeup/context switch overhead A pointer-based priority queue is really not a very complex thing, and there are ways to optimise them for typical cases like the above. Please do enlighten me. The big problem in my mind is that the pointers, pointing at points in time, are perishable. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Mon, 9 Apr 2001, Alan Cox wrote: Its worth doing even on the ancient x86 boards with the PIT. Note that programming the PIT is slw and doing it on every timer add_timer/del_timer would be a pain. You only have to do it occasionally. When you add a timer newer than the current one (arguably newer by at least 1/2*HZ sec) That's only if we want to do no better than the current system. We'd want a new variable called timer_margin or something, which would be dependent on interrupt source and processor, and could be tuned up or down via sysctl. When you finish running the timers at an interval and the new interval is significantly larger than the current one. Make that larger or smaller. If we come out of a quiescent state (1 hz interrupts, say) and start getting 10ms timers, we want to respond to them right away. Remember each tick we poke the PIT anyway We could also have a HZ_max tunable above which we would not try to reprogram the interval. On older systems, this could be set at 100-200HZ... -- "Love the dolphins," she advised him. "Write by W.A.S.T.E.." - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Wed, 11 Apr 2001 20:57:04 +0200, Jamie Lokier [EMAIL PROTECTED] wrote: george anzinger wrote: A pointer-based priority queue is really not a very complex thing, and there are ways to optimise them for typical cases like the above. Please do enlighten me. The big problem in my mind is that the pointers, pointing at points in time, are perishable. Um, I'm not sure what perishability has to do with anything. Timers at the moment can be added, deleted and destroyed and it's no big deal. Look in an algorithms book under "priority queue". They usually don't say how to use a heap-ordered tree -- usually it's an array -- but you can use a tree if you want. In such a tree, each timer has a link to _two_ next timers and one previous timer. (The previous timer link is only needed if you can delete timers before they expire, which is required for Linux). I'm not saying saying a heap-ordered tree is fastest. But it's ok, and doesn't require any more storage than the `struct timer' objects themselves. It's possible to optimise this kind of data structure rather a lot, depending on how you want to use it. Linux' current timer algorithm is a pretty good example of a priority queue optimised for timer ticks, where you don't mind doing a small amount of work per tick. One notable complication with the kernel is that we don't want every timer to run at its exactly allocated time, except for some special timers. For example, if you have 100 TCP streams and their retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc., you'd much rather just process them all at once as is done at the moment by the 100Hz timer. This is because cache locality is much more important for TCP retransmits than transmit timing resolution. There's literature online about this class of problems: look up "event set" and "simulation" and "fast priority queue". I bumped into a funny non-optimization a few years ago. A system with a timer queue like the above had been "optimized" by keeping old timer elements... ready for new tasks to link onto and activate. The granularity was 1 millsecond. Over time, all timer values from 0 to roughly 10 minutes had been used. That resulted in 60,000 permanent storage fragments laying about... a significant fragmentation problem. The end was a forced recycle every month or so. john alvord - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer!
Mikulas Patocka ([EMAIL PROTECTED]) wrote: Adding and removing timers happens much more frequently than PIT tick, so comparing these times is pointless. If you have some device and timer protecting it from lockup on buggy hardware, you actually send request to device add timer receive interrupt and read reply remove timer With the curent timer semantics, the cost of add timer and del timer is nearly zero. If you had to reprogram the PIT on each request and reply, it would slow things down. Note that you call mod_timer also on each packet received - and in worst case (which may happen), you end up reprogramming the PIT on each packet. You can still have nearly zero cost for the normal case. Avoiding worst case behaviour is also pretty easy. You only reprogram the PIT if you have to change the interval. Keep all timers in a sorted double-linked list. Do the insert intelligently, adding it from the back or front of the list depending on where it is in relation to existing entries. You only need to reprogram the interval timer when: 1. You've got a new entry at the head of the list AND 2. You've reprogrammed the interval to something larger than the time to the new head of list. In the case of a device timeout, it is usually not going to be inserted at the head of the list. It is very seldom going to actually timeout. Choose your interval wisely, only increasing it when you know it will pay off. The best way of doing this would probably be to track some sort of LCD for timeouts. The real trick is to do a lot less processing on every tick than is currently done. Current generation PCs can easily handle 1000s of interrupts a second if you keep the overhead small. -Bret -- Bret Indrelee | Sometimes, to be deep, we must act shallow! [EMAIL PROTECTED] | -Riff in The Quatrix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
I suspect you might go for ticked if its overhead was less. The thing that makes me think the overhead is high for tick less is the accounting and time slice stuff. This has to be handled each system call and each schedule call. These happen WAY more often than ticks... Contrary wise, I would go for tick less if its overhead is the same or less under a reasonable load. Of course, we would also want to check overload sensitivity. as I said, i think the process time accounting is disjoint from the ticked/tickless issue and should therefore be considered seperately.. in my experience with the tickless method, after all pending timers have been serviced, then to determine the interval until the next interrupt should be generated all that is needed is one 64 bit subtraction and a register load (about 5 - 8 cycles) I think we should spend some serious time with some quick and dirty prototypes of both methods, both to characterize all the issues raised and to refine our ideas when it comes time to implement this. I also think that we should give some thought to implementing both an improved ticked system and a tickless system to be chosen as CONFIG options so that the developer putting together a system can make a choice based on their needs, and not our whims. I am more than willing to concede that there is a class of customer to whom a ticked system is appropriate, but I am quite sure that there is a class for whom the ticked model is completely unacceptable. On a half way loaded system? Do you think it is that high? I sort of think that, once the system is loaded, there would be a useful timer tick most of the time. Might be useful to do some measurements of this. any non-useful tick takes away cycles that could be better used performing FFT's there are many kinds of loads, some which generate large numbers of timed events and some that generate none or few. I think we want to be able to provide good solutions for both cases even. we should do lots of measurements. Mark - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Alan Cox wrote: Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s and MIN_DELACK is 0.04s, TCP would hardly benefit from them. There are a considerable number of people who really do need 1Khz resolution. Midi is one of the example cases. That doesn't mean we have to go to a 1KHz timer, we may want to combine a 100Hz timer with a smart switch to 1Khz As somebody who is now debating how to measure latencies in a giga-bit ethernet environment with several components doing L3 switching in much less than 10 micro-seconds ... (hardware) I agree that some method is need to achieve higher resolutions. (Sigh... I will likely need to buy something big and expensive) {this is for a system to make use of L. Yarrow's little protocol} - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mark Salisbury wrote: > > It would probably be a good compile config option to allow fine or coarse > process time accounting, that leaves the choice to the person setting up the > system to make the choice based on their needs. > I suggested this a while ago during a discussion about performance measurement. This would be fairly easy to implement using the patch provided with the Linux Trace Toolkit since all entry points and exit points are known (and it already is available in post-mortem analysis). Implementing the measurement code within the kernel should be fairly easy to implement and it would be provided as part of the compile option. All in all, given the measurements I made, I'd place the overhead at around 1% for the computations. (The overhead is very likely to be negligeable when eventual fixes are taken into account.) === Karim Yaghmour [EMAIL PROTECTED] Embedded and Real-Time Linux Expert === - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mark Salisbury wrote: > > > mark salisbury wrote: > > > > > > george anzinger wrote: > > > > > > > f) As noted, the account timers (task user/system times) would be much > > > > more accurate with the tick less approach. The cost is added code in > > > > both the system call and the schedule path. > > > > > > > > Tentative conclusions: > > > > > > > > Currently we feel that the tick less approach is not acceptable due to > > > > (f). We felt that this added code would NOT be welcome AND would, in > a > > > > reasonably active system, have much higher overhead than any savings > in > > > > not having a tick. Also (d) implies a list organization that will, at > > > > the very least, be harder to understand. (We have some thoughts here, > > > > but abandoned the effort because of (f).) We are, of course, open to > > > > discussion on this issue and all others related to the project > > > > objectives. > > > > > > f does not imply tick-less is not acceptable, it implies that better > process time > > > accounting is not acceptable. > > > > My thinking is that a timer implementation that forced (f) would have > > problems gaining acceptance (even with me :). I think a tick less > > system does force this and this is why we have, at least for the moment, > > abandoned it. In no way does this preclude (f) as it is compatible with > > either ticks or tick less time keeping. On the other hand, the stated > > project objectives do not include (f) unless, of course we do a tick > > less time system. > > > > > > list organization is not complex, it is a sorted absolute time list. I > would > > > argue that this is a hell of a lot easier to understand that ticks + > offsets. > > > > The complexity comes in when you want to maintain indexes into the list > > for quick insertion of new timers. To get the current insert > > performance, for example, you would need pointers to (at least) each of > > the next 256 centasecond boundaries in the list. But the list ages, so > > these pointers need to be continually updated. The thought I had was to > > update needed pointers (and only those needed) each time an insert was > > done and a needed pointer was found to be missing or stale. Still it > > adds complexity that the static structure used now doesn't have. > > actually, I think a head and tail pointer would be sufficient for most > cases. (most new timers are either going to be a new head of list or a new > tail, i.e. long duration timeouts that will never be serviced or short > duration timers that are going to go off "real soon now (tm)") the oddball > cases would be mostly coming from user-space, i.e. nanosleep which a longerr > list insertion disapears in the block/wakeup/context switch overhead > > > > > > > still, better process time accounting should be a compile CONFIG option, > not > > > ignored and ruled out because some one thinks that is is to expensive in > the > > > general case. > > > > As I said above, we are not ruling it out, but rather, we are not > > requiring it by going tick less. > > As I said, it is not clear how you could get > > CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every > > jiffie. What did you have in mind? > > time accounting can be limited to the quantum expiration and voluntary > yields in the tickless/useless case. > > > For the most part, I agree. I am not sure that it makes a lot of sense > > to mix some of these options, however. I think it comes down to the > > question of benefit vs cost. If keeping an old version around that is > > not any faster or more efficient in any way would seem too costly to > > me. We would like to provide a system that is better in every way and > > thus eliminate the need to keep the old one around. We could leave it > > in as a compile option so folks would have a fall back, I suppose. > > I agree that some combinations don't make much sense _TO_ME_ but that > doesn't mean they don't meet sombody's needs. > > in my case (embedded, medium hard real time, massively parallel > multicomputer) the only choices that makes sense to my customers is > tickless/useless in deployment and tickless/useful in > development/profiling/optimization. I suspect you might go for ticked if its overhead was less. The thing that makes me think the overhead is high for tick less is the accounting and time slice stuff. This has to be handled each system call and each schedule call. These happen WAY more often than ticks... Contrary wise, I would go for tick less if its overhead is the same or less under a reasonable load. Of course, we would also want to check overload sensitivity. > > in other cases ticked/useless makes sense (dumb old slow chips) > > I have no idea who would want ticked/useful or hybrid. i suggested hybrid > as an alternative in case linus might be reluctant to gutting and replacing > the sysclock. > > > > > An Issue no one has raised is that the tick less system would need to > > start a timer each
Re: No 100 HZ timer !
> mark salisbury wrote: > > > > george anzinger wrote: > > > > > f) As noted, the account timers (task user/system times) would be much > > > more accurate with the tick less approach. The cost is added code in > > > both the system call and the schedule path. > > > > > > Tentative conclusions: > > > > > > Currently we feel that the tick less approach is not acceptable due to > > > (f). We felt that this added code would NOT be welcome AND would, in a > > > reasonably active system, have much higher overhead than any savings in > > > not having a tick. Also (d) implies a list organization that will, at > > > the very least, be harder to understand. (We have some thoughts here, > > > but abandoned the effort because of (f).) We are, of course, open to > > > discussion on this issue and all others related to the project > > > objectives. > > > > f does not imply tick-less is not acceptable, it implies that better process time > > accounting is not acceptable. > > My thinking is that a timer implementation that forced (f) would have > problems gaining acceptance (even with me :). I think a tick less > system does force this and this is why we have, at least for the moment, > abandoned it. In no way does this preclude (f) as it is compatible with > either ticks or tick less time keeping. On the other hand, the stated > project objectives do not include (f) unless, of course we do a tick > less time system. > > > > list organization is not complex, it is a sorted absolute time list. I would > > argue that this is a hell of a lot easier to understand that ticks + offsets. > > The complexity comes in when you want to maintain indexes into the list > for quick insertion of new timers. To get the current insert > performance, for example, you would need pointers to (at least) each of > the next 256 centasecond boundaries in the list. But the list ages, so > these pointers need to be continually updated. The thought I had was to > update needed pointers (and only those needed) each time an insert was > done and a needed pointer was found to be missing or stale. Still it > adds complexity that the static structure used now doesn't have. actually, I think a head and tail pointer would be sufficient for most cases. (most new timers are either going to be a new head of list or a new tail, i.e. long duration timeouts that will never be serviced or short duration timers that are going to go off "real soon now (tm)") the oddball cases would be mostly coming from user-space, i.e. nanosleep which a longerr list insertion disapears in the block/wakeup/context switch overhead > > > > still, better process time accounting should be a compile CONFIG option, not > > ignored and ruled out because some one thinks that is is to expensive in the > > general case. > > As I said above, we are not ruling it out, but rather, we are not > requiring it by going tick less. > As I said, it is not clear how you could get > CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every > jiffie. What did you have in mind? time accounting can be limited to the quantum expiration and voluntary yields in the tickless/useless case. > For the most part, I agree. I am not sure that it makes a lot of sense > to mix some of these options, however. I think it comes down to the > question of benefit vs cost. If keeping an old version around that is > not any faster or more efficient in any way would seem too costly to > me. We would like to provide a system that is better in every way and > thus eliminate the need to keep the old one around. We could leave it > in as a compile option so folks would have a fall back, I suppose. I agree that some combinations don't make much sense _TO_ME_ but that doesn't mean they don't meet sombody's needs. in my case (embedded, medium hard real time, massively parallel multicomputer) the only choices that makes sense to my customers is tickless/useless in deployment and tickless/useful in development/profiling/optimization. in other cases ticked/useless makes sense (dumb old slow chips) I have no idea who would want ticked/useful or hybrid. i suggested hybrid as an alternative in case linus might be reluctant to gutting and replacing the sysclock. > > An Issue no one has raised is that the tick less system would need to > start a timer each time it scheduled a task. This would lead to either > slow but very precise time slicing or about what we have today with more > schedule overhead. no. you would re-use the timer with a new expiration time and a re-insert. also re overhead. the cost of servicing 10 times as many interrupts as necessary for system function must cost a fair chunk. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
mark salisbury wrote: > > george anzinger wrote: > > > f) As noted, the account timers (task user/system times) would be much > > more accurate with the tick less approach. The cost is added code in > > both the system call and the schedule path. > > > > Tentative conclusions: > > > > Currently we feel that the tick less approach is not acceptable due to > > (f). We felt that this added code would NOT be welcome AND would, in a > > reasonably active system, have much higher overhead than any savings in > > not having a tick. Also (d) implies a list organization that will, at > > the very least, be harder to understand. (We have some thoughts here, > > but abandoned the effort because of (f).) We are, of course, open to > > discussion on this issue and all others related to the project > > objectives. > > f does not imply tick-less is not acceptable, it implies that better process time > accounting is not acceptable. My thinking is that a timer implementation that forced (f) would have problems gaining acceptance (even with me :). I think a tick less system does force this and this is why we have, at least for the moment, abandoned it. In no way does this preclude (f) as it is compatible with either ticks or tick less time keeping. On the other hand, the stated project objectives do not include (f) unless, of course we do a tick less time system. > > list organization is not complex, it is a sorted absolute time list. I would > argue that this is a hell of a lot easier to understand that ticks + offsets. The complexity comes in when you want to maintain indexes into the list for quick insertion of new timers. To get the current insert performance, for example, you would need pointers to (at least) each of the next 256 centasecond boundaries in the list. But the list ages, so these pointers need to be continually updated. The thought I had was to update needed pointers (and only those needed) each time an insert was done and a needed pointer was found to be missing or stale. Still it adds complexity that the static structure used now doesn't have. > > still, better process time accounting should be a compile CONFIG option, not > ignored and ruled out because some one thinks that is is to expensive in the > general case. As I said above, we are not ruling it out, but rather, we are not requiring it by going tick less. > > the whole point of linux and CONFIG options is to get you the kernel with the > features you want, not what someone else wants. > > there should be a whole range of config options associated with this issue: > > CONFIG_JIFFIES == old jiffies implementation > CONFIG_TICKLESS == tickless > CONFIG_HYBRID == old jiffies plus a tickless high-res timer system on > the side but not assoc w/ process and global > timekeeping > > CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg > CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting > > this way, users who want tickless and lousy time acctg can have it AND people who > want jiffies and good time acctg could have it. As I said, it is not clear how you could get CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every jiffie. What did you have in mind? > > these features are largely disjoint and easily seperable. it is also relatively > trivial to do this in such a way that drivers depending on the jiffie abstraction > can be supported without modification no matter what the configuration. > For the most part, I agree. I am not sure that it makes a lot of sense to mix some of these options, however. I think it comes down to the question of benefit vs cost. If keeping an old version around that is not any faster or more efficient in any way would seem too costly to me. We would like to provide a system that is better in every way and thus eliminate the need to keep the old one around. We could leave it in as a compile option so folks would have a fall back, I suppose. An Issue no one has raised is that the tick less system would need to start a timer each time it scheduled a task. This would lead to either slow but very precise time slicing or about what we have today with more schedule overhead. George > Mark Salisbury - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Tue, Apr 10, 2001 at 02:45:15PM -0400, Stephen D. Williams wrote: > When this is rewritten, I would strongly suggest that we find a way to > make 'gettimeofday' nearly free. Certain applications need to use this > frequently while still being portable. One solution when you do have > clock ticks is a read-only mapped Int. Another cheap solution is > library assembly that adds a cycle clock delta since last system call to > a 'gettimeofday' value set on every system call return. The upcoming x86-64 port supports vsyscalls for just that purpose. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
george anzinger wrote: > f) As noted, the account timers (task user/system times) would be much > more accurate with the tick less approach. The cost is added code in > both the system call and the schedule path. > > Tentative conclusions: > > Currently we feel that the tick less approach is not acceptable due to > (f). We felt that this added code would NOT be welcome AND would, in a > reasonably active system, have much higher overhead than any savings in > not having a tick. Also (d) implies a list organization that will, at > the very least, be harder to understand. (We have some thoughts here, > but abandoned the effort because of (f).) We are, of course, open to > discussion on this issue and all others related to the project > objectives. f does not imply tick-less is not acceptable, it implies that better process time accounting is not acceptable. list organization is not complex, it is a sorted absolute time list. I would argue that this is a hell of a lot easier to understand that ticks + offsets. still, better process time accounting should be a compile CONFIG option, not ignored and ruled out because some one thinks that is is to expensive in the general case. the whole point of linux and CONFIG options is to get you the kernel with the features you want, not what someone else wants. there should be a whole range of config options associated with this issue: CONFIG_JIFFIES == old jiffies implementation CONFIG_TICKLESS == tickless CONFIG_HYBRID == old jiffies plus a tickless high-res timer system on the side but not assoc w/ process and global timekeeping CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting this way, users who want tickless and lousy time acctg can have it AND people who want jiffies and good time acctg could have it. these features are largely disjoint and easily seperable. it is also relatively trivial to do this in such a way that drivers depending on the jiffie abstraction can be supported without modification no matter what the configuration. Mark Salisbury - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
When this is rewritten, I would strongly suggest that we find a way to make 'gettimeofday' nearly free. Certain applications need to use this frequently while still being portable. One solution when you do have clock ticks is a read-only mapped Int. Another cheap solution is library assembly that adds a cycle clock delta since last system call to a 'gettimeofday' value set on every system call return. sdw Andi Kleen wrote: > > On Tue, Apr 10, 2001 at 01:12:14PM +0100, Alan Cox wrote: > > Measure the number of clocks executing a timer interrupt. rdtsc is fast. Now > > consider the fact that out of this you get KHz or better scheduling > > resolution required for games and midi. I'd say it looks good. I agree > > And measure the number of cycles a gigahertz CPU can do between a 1ms timer. > And then check how often the typical application executes something like > gettimeofday. > ... sdw -- [EMAIL PROTECTED] http://sdw.st Stephen D. Williams 43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax Dec2000 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Just for your information we have a project going that is trying to come up with a good solution for all of this: http://sourceforge.net/projects/high-res-timers We have a mailing list there where we have discussed much of the same stuff. The mailing list archives are available at sourceforge. Lets separate this into findings and tentative conclusions :) Findings: a) The University of Kansas and others have done a lot of work here. b) High resolution timer events can be had with or without changing HZ. c) High resolution timer events can be had with or without eliminating the 1/HZ tick. d) The organization of the timer list should reflect the existence of the 1/HZ tick or not. The current structure is not optimal for a "tick less" implementation. Better would be strict expire order with indexes to "interesting times". e) The current organization of the timer list generates a hiccup every 2.56 seconds to handle "cascading". Hiccups are bad. f) As noted, the account timers (task user/system times) would be much more accurate with the tick less approach. The cost is added code in both the system call and the schedule path. Tentative conclusions: Currently we feel that the tick less approach is not acceptable due to (f). We felt that this added code would NOT be welcome AND would, in a reasonably active system, have much higher overhead than any savings in not having a tick. Also (d) implies a list organization that will, at the very least, be harder to understand. (We have some thoughts here, but abandoned the effort because of (f).) We are, of course, open to discussion on this issue and all others related to the project objectives. We would reorganize the current timer list structure to eliminate the cascade (e) and to add higher resolution entries. The higher resolution entries would carry an addition word which would be the fraction of a jiffie that needs to be added to the jiffie value for the timer. This fraction would be in units defined by the platform to best suit the sub jiffie interrupt generation code. Each of the timer lists would then be ordered by time based on this sub jiffie value. In addition, in order to eliminate the cascade, each timer list would carry all timers for times that expire on the (jiffie mod (size of list)). Thus, with the current 256 first order lists, all timers with the same (jiffies & 255) would be in the same list, again in expire order. We also think that the list size should be configurable to some power of two. Again we welcome discussion of these issues. George Alan Cox wrote: >> It's also all interrupts, not only syscalls, and also context switch if you >> want to be accurate. >We dont need to be that accurate. Our sample rate is currently so low the >data is worthless anyway - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Alan Cox wrote: > > > This is an X issue. I was talking with Jim Gettys about what is needed to > > > get the relevant existing X extensions for this working > > > > Last time I looked at XF86DGA (years ago), it seemed to have the right > > hooks in place. Just a matter of server implementation. My > > recollection is dusty though. > > There is also a timing extension for synchronizing events to happenings. The > stopper is the kernel interface for the vblank handling since the irq must > be handled and cleared in kernel context before we return to X. We now think > we know how to handle that cleanly Except for cards which don't generate an irq on vsync but do let you see how the raster is proceeding. (I vaguely recall these). For which a PLL and, wait for it high resolution timer is needed. Perhaps that's a 1990s problem that doesn't need a 2000s solution though :-) -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
> > This is an X issue. I was talking with Jim Gettys about what is needed to > > get the relevant existing X extensions for this working > > Last time I looked at XF86DGA (years ago), it seemed to have the right > hooks in place. Just a matter of server implementation. My > recollection is dusty though. There is also a timing extension for synchronizing events to happenings. The stopper is the kernel interface for the vblank handling since the irq must be handled and cleared in kernel context before we return to X. We now think we know how to handle that cleanly Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Tue, Apr 10, 2001 at 04:43:36AM -0700, David Schleef wrote: > However, on machines without a monotonically increasing counter, > i.e., the TSC, you have to use 8254 timer 0 as both the timebase > and the interval counter -- you end up slowly losing time because > of the race condition between reading the timer and writing a > new interval. RTAI's solution is to disable kd_mksound and > use timer 2 as a poor man's TSC. If either of those is too big > of a price, it may suffice to report that the timer granularity > on 486's is 10 ms. Just for the record, Michael Barabanov did this in RTLinux from before kd_mksound was a function pointer in 1995. Michael had an optimization attempt using channel 1 for a while, but on very slow machines this was not sufficient and he went back to channel 2. Of course, the fundamental problem is that board designers keep putting an 1920s part in machines built in 2001. Here's the comment from the RTLinux 0.5 patch -- all available on the archives on rtlinux.com. +/* The main procedure; resets the 8254 timer to generate an interrupt. The + * tricky part is to keep the global time while reprogramming it. We latch + * counters 0 and 2 atomically before and after reprogramming to figure it out. + */ -- - Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Alan Cox wrote: > > Games would like to be able to page flip at vertical refresh time -- > > <1ms accuracy please. Network traffic shaping benefits from better than > > This is an X issue. I was talking with Jim Gettys about what is needed to > get the relevant existing X extensions for this working Last time I looked at XF86DGA (years ago), it seemed to have the right hooks in place. Just a matter of server implementation. My recollection is dusty though. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
> Games would like to be able to page flip at vertical refresh time -- > <1ms accuracy please. Network traffic shaping benefits from better than This is an X issue. I was talking with Jim Gettys about what is needed to get the relevant existing X extensions for this working - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mikulas Patocka wrote: > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s > and MIN_DELACK is 0.04s, TCP would hardly benefit from them. Indeed, using precise timers for TCP would probably degrade performance -- you should process a group of timer events together, when resolution is not that important. There are plenty of apps that need higher resolution. Software modem comes to mind (guess why ;), though the device driver supplies the high resolution timed interrupts in that case. Games would like to be able to page flip at vertical refresh time -- <1ms accuracy please. Network traffic shaping benefits from better than 1ms timing. Video players want to display their frames preferably without 10ms jitter. Even that old classic game "snake" benefits from decent timing. I worked on an X multiplayer snake implementation which was very unpleasant and jerky at first. 1. Disable nagle for X connection :-) Better but still jerky. 2. Write delay loop like this: calculate next_event_time select (0, 0, 0, next_event_time - 20ms) while (gettimeofday() < next_event_time) /* Busy loop for last 20ms. */ It's no coincidence that I've had to write another very similar event loop recently. You can see, this sort of thing is a real waste of CPU. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
> Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s > and MIN_DELACK is 0.04s, TCP would hardly benefit from them. There are a considerable number of people who really do need 1Khz resolution. Midi is one of the example cases. That doesn't mean we have to go to a 1KHz timer, we may want to combine a 100Hz timer with a smart switch to 1Khz - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
>BTW. Why we need to redesign timers at all? The cost of timer interrupt >each 1/100 second is nearly zero (1000 instances on S/390 VM is not common >case - it is not reasonable to degradate performance of timers because of >this). The cost of the timer interrupts on a single image system is neglectable, true. As I already pointed out in the original proposal we are looking for a solution that will allow us to minimize the costs of the timer interrupts when we run many images. For us this case is not unusual and it is reasonable to degrade performance of a running system by a very small amount to get rid of the HZ timer. This proposal was never meant to be the perfect solution for every platform, that is why it is configuratable with the CONFIG_NO_HZ_TIMER option. blue skies, Martin Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247 E-Mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
Mikulas Patocka wrote: > BTW. Why we need to redesign timers at all? The cost of timer interrupt > each 1/100 second is nearly zero (1000 instances on S/390 VM is not common > case - it is not reasonable to degradate performance of timers because of > this). > > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s > and MIN_DELACK is 0.04s, TCP would hardly benefit from them. > well, I can think dozens of real time applications off the top of my head that need beter than 1ms timing resolution (think sensor fusion) 1000 clock interrupts/sec is wasteful when what you need is 1 very precisely timed interrupt. why do we redesign anything? to make it better. TCP is not the only thing in the system. if you are in love with the existing system, it shouldn't be hard to make it a config option. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: No 100 HZ timer !
On Tue, Apr 10, 2001 at 04:10:28PM +0200, Mikulas Patocka wrote: > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s > and MIN_DELACK is 0.04s, TCP would hardly benefit from them. On networking bandwidth shaping and traffic control generally need higher precision timers. There are also people who don't like the minimum 10ms delay for non-busy wait, it's apparently a problem for some apps. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/