http://www.tglx.de/ktimers.htmlktimersThis project is not longer active. It is superseeded by the hrtimer project .
ktimers - subsystem for high-precision kernel timersThis patch introduces a new subsystem for high-precision kernel timers. Why two timer subsystems? After a lot of back and forth trying to integrate high-precision and high-resolution features into the existing timer framework, and after testing various such high-resolution timer implementations in practice, we came to the conclusion that the timer wheel code is fundamentally not suitable for such an approach. We initially didnt believe this ('there must be a way to solve this'), and we spent a considerable effort trying to integrate things into the timer wheel, but we failed. There are several reasons why such integration is impossible:
The primary users of precision timers are user-space applications that utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel users like drivers and subsystems with a requirement for precise timed events can benefit from the availability of a seperate high-precision timer subsystem as well. The ktimer subsystem is easily extended with high-resolution capabilities, and patches for that exist and are maturing quickly. The increasing demand for realtime and multimedia applications along with other potential users for precise timers gives another reason to separate the "timeout" and "precise timer" subsystems. Another potential benefit is that such seperation allows for future optimizations of the existing timer wheel implementation for the low resolution and low precision use cases - once the precision-sensitive APIs are separated from the timer wheel and are migrated over to ktimers. E.g. we could decrease the frequency of the timeout subsystem from 250 Hz to 100 HZ (or even smaller). ktimer subsystem implementation detailsthe basic design considerations were:
>From our previous experience with various approaches of high-resolution timers another basic requirement was the immediate enqueueing and ordering of timers at activation time. After looking at several possible solutions such as radix trees and hashes, the red black tree was choosen as the basic data structure. Rbtrees are available as a library in the kernel and are used in various performance-critical areas of e.g. memory management and file systems. The rbtree is solely used for the time sorted ordering, while a seperate list is used to give the expiry code fast access to the queued timers, without having to walk the rbtree. (This seperate list is also useful for high-resolution timers where we need seperate pending and expired queues while keeping the time-order intact.) The time-ordered enqueueing is not purely for the purposes of the high-resolution timers extension though, it also simplifies the handling of absolute timers based on CLOCK_REALTIME. The existing implementation needed to keep an extra list of all armed absolute CLOCK_REALTIME timers along with complex locking. In case of settimeofday and NTP, all the timers (!) had to be dequeued, the time-changing code had to fix them up one by one, and all of them had to be enqueued again. The time-ordered enqueueing and the storage of the expiry time in absolute time units removes all this complex and poorly scaling code from the posix-timer implementation - the clock can simply be set without having to touch the rbtree. This also makes the handling of posix-timers simpler in general. The locking and per-CPU behavior of ktimers was mostly taken from the existing timer wheel code, as it is mature and well suited. Sharing code was not really a win, due to the different data structures. Also, the ktimer functions now have clearer behavior and clearer names - such as ktimer_try_to_cancel() and ktimer_cancel() [which are roughly equivalent to del_timer() and del_timer_sync()] - and there's no direct 1:1 mapping between them on the algorithmical level. The internal representation of time values (ktime_t) is implemented via macros and inline functions, and can be switched between a "hybrid union" type and a plain "scalar" 64bit nanoseconds representation (at compile time). The hybrid union type exists to optimize time conversions on 32bit CPUs. This build-time-selectable ktime_t storage format was implemented to avoid the performance impact of 64-bit multiplications and divisions on 32bit CPUs. Such operations are frequently necessary to convert between the storage formats provided by kernel and userspace interfaces and the internal time format. (See include/linux/ktime.h for further details.) ktimers - rounding of timer valuesWhy do we need rounding at all ? Firstly, the POSIX specification requires rounding to the resolution - whatever that means. The POSIX specification is quite imprecise on the details of rounding though, so a practical interpretation had to be found. The first question is which resolution value should be returned to the user by the clock_getres() interface. The simplest case is when the hardware is capable of 1 nsec resolution: in that case we can fulfill all wishes and there is no rounding :-) Another simple case is when the clock hardware has a limited resolution that the kernel wants to fully offer to user-space: in this case that limited resolution is returned to userspace. The hairy case is when the underlying hardware is capable of finer grained resolution, but the kernel is not willing to offer that resolution. Why would the kernel want to do that? Because e.g. the system could easily be DoS-ed with high-frequency timer interrupts. Or the kernel might want to cluster high-res timer interrupts into groups for performance reasons, so that extremely high interrupt rates are avoided. So the kernel needs some leeway in deciding the 'effective' resolution that it is willing to expose to userspace. In this case, the clock_getres() decision is easy: we want to return the 'effective' resolution, not the 'theoretical' resolution. Thus an application programmer gets correct information about what granularity and accuracy to expect from the system. What is much less obvious in both the 'hardware is low-res' and 'kernel wants to offer low-res' cases is the actual behavior of timers, and where and how to round time values to the 'effective' resolution of the clock. For this we first need to see what types of expiries there exist for ktimers, and how rounding affects them. Ktimers have the following variants:
Interval timers can be led back to one-shot timers: they are a series of one-shot timers with the same interval. Relative one-shot timers can be handled identically to absolute one-shot timers after adding the relative expiry time to the current time of the respective clock. We picked to handle two cases of rounding:
An alternative implementation would be to not round the interval and to implicitly round at every timer event, but it's not clear what the advantages would be from doing that. There are a couple of disadvantages:
(as a sidenote, current POSIX APIs could be extended with a method of periodic timers to have an 'average' frequency, where there is no rounding of the interval. No such API exists at the moment.) ktimers - testing and verificationWe used the high-resolution timer subsystem ontop of ktimers to verify the ktimer implementation details in praxis, and we also ran the posix timer tests in order to ensure specification compliance. The ktimer patch converts the following kernel functionality to use ktimers:
The conversion of nanosleep and posix-timers enabled the unification of nanosleep and clock_nanosleep.
The code was successfully compiled for the following platforms:
The code was run-tested on the following platforms: ktimers were also integrated into the -rt tree, along with a ktimers-based high-resolution timer implementation, so the ktimers code got a healthy amount of testing and use in practice. Thomas Gleixner, Ingo Molnar See also this excellent article on LWN for further information ktimers high resolution patchesA ktimers high resolution timer implemtation is available which
introduces a new abstraction layer for clock event sources to make high
resolution timers less intrusive.
|
