Hey Ed,

Yes, I actually experimented with moving the new interface to hrtime_t instead of clock_t, but that involved adding conversion statements to all previous consumers of cv_timedwait(), all of which had been designed with clock_t in mind.
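
As a rough illustration of what those conversion statements look like, here is a minimal userland sketch (not kernel code). The helper names ticks_to_nsec() and nsec_to_ticks() and the explicit hz parameter are assumptions made for this example; a real driver would use DDI routines such as drv_usectohz(9F) rather than open-coding the arithmetic.

```c
#include <stdint.h>

#define	NANOSEC	1000000000LL	/* nanoseconds per second */

/*
 * Hypothetical helper: convert a tick count (what a clock_t delta holds)
 * into nanoseconds (what an hrtime_t holds), given the tick rate hz.
 */
static int64_t
ticks_to_nsec(int64_t ticks, int64_t hz)
{
	return ((ticks * NANOSEC) / hz);
}

/*
 * Hypothetical helper: convert nanoseconds back into ticks, rounding up
 * so that a short, non-zero interval never becomes zero ticks.
 */
static int64_t
nsec_to_ticks(int64_t nsec, int64_t hz)
{
	return ((nsec * hz + NANOSEC - 1) / NANOSEC);
}
```

The same clock_t value means different amounts of time depending on hz: 50 ticks is half a second at hz = 100 but only 50 milliseconds at hz = 1000, which is why every existing consumer would need an explicit conversion if the argument became an hrtime_t.
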
If a new interface with such a characteristic is necessary or desired, I believe it should be implemented so that folks can gradually migrate their code to it. That means not just having someone change the immediate routine that consumes the interface, but doing so with a full understanding of how their code handles time.

Thanks,
Rafael

Edward Pilatowicz wrote:
> hey rafael,
>
> for the new cv_* functions the delta is still specified by a clock_t,
> which iirc is still subject to the value of "hz".  many driver writers
> incorrectly assume that "hz" is always 100.  hence when some
> brave/foolish person comes along and sets "hires_tick" or a custom "hz"
> value, things break.  getting to my point, if we're introducing new
> interfaces to allow driver writers to simplify their drivers, perhaps we
> could move away from having them deal with clock_t values altogether?
> did you consider creating the new interfaces to allow the caller to
> simply specify the requested delta directly in
> NANOSEC/MICROSEC/MILLISEC/SEC values?
>
> ed
>
> On Tue, Sep 01, 2009 at 02:22:45PM -0700, Jerry Gilliam wrote:
>> I am sponsoring the following fast-track on behalf of Rafael Vanoni,
>> with a time-out of 09/09/2009.  The project desires
>> minor/major binding, plus micro/patch binding for one
>> interface, as specified.
>>
>> -------------------------------------
>>
>> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
>> Copyright 2007 Sun Microsystems
>>
>> 1. Introduction
>>    1.1. Project/Component Working Name:
>>         Tickless Kernel Architecture / lbolt decoupling
>>
>>    1.2. Name of Document Author/Supplier:
>>         Rafael Vanoni Polanczyk (rafael.vanoni at sun.com)
>>
>>    1.3. Date of This Document:
>>         08/04/09
>>
>>    1.3.1. Date this project was conceived:
>>         07/01/09
>>
>>    1.4. Name of Major Document Customer(s)/Consumer(s):
>>    1.4.1. The PAC or CPT you expect to review your project:
>>         Solaris PAC
>>    1.4.2. The ARC(s) you expect to review your project:
>>    1.4.3. The Director/VP who is "Sponsoring" this project:
>>         Greg.Lavender at Sun.COM
>>    1.4.4. The name of your business unit:
>>         Systems
>>
>>    1.5. Email Aliases:
>>    1.5.1. Responsible Manager: darrin.johnson at sun.com
>>    1.5.2. Responsible Engineer: rafael.vanoni at sun.com
>>    1.5.3. Marketing Manager: mike.mulkey at sun.com
>>    1.5.4. Interest List: tickless-dev at opensolaris.org
>>
>>
>> 2. Project Summary
>>    2.1. Project Description:
>>    The tickless project aims at implementing the services provided by the
>>    clock cyclic in an event driven fashion. The first sub-project is the
>>    decoupling of the lbolt and lbolt64 variables from clock(). These two
>>    variables are incremented at each firing of the clock cyclic and
>>    provide a time reference to the system. They are being replaced by two
>>    routines that are backed by gethrtime(), the existing ddi_get_lbolt()
>>    and the new ddi_get_lbolt64(), introduced as a migration path for
>>    existing non-DDI compliant consumers.
>>
>>    This project also presents a solution to minimize the usage of the DDI
>>    lbolt routines through new interfaces, and a method to prevent any
>>    performance impact from migrating inexpensive references to variables
>>    into calls to routines. These are described in detail in section 4.1.
>>
>>
>> 4. Technical Description:
>>    4.1. Details:
>>    lbolt and lbolt64 variables will be replaced by two routines,
>>    ddi_get_lbolt() and ddi_get_lbolt64(), which are backed by a hardware
>>    counter to provide the same service in an event driven way.
>>
>>    Two of the major consumers of the lbolt service are the cv_timedwait()
>>    and cv_timedwait_sig() routines, which require lbolt to form one of
>>    their arguments (an absolute value of time) and once again internally
>>    to decompose it into a relative time. This project is introducing two
>>    new routines, cv_reltimedwait() and cv_reltimedwait_sig(), which will
>>    perform the same service as the previously mentioned routines but
>>    simply receive a relative time, not requiring lbolt at all. These new
>>    routines will also have a new argument of type time_res_t to inform
>>    the underlying timeout system as to how accurately the given timeout
>>    must expire. This will allow the kernel to anticipate or defer such
>>    timeouts when possible, allowing the system to stay idle for longer
>>    periods of time.
>>
>>    Some consumers of the lbolt and lbolt64 variables may have implicit
>>    dependencies on the cheapness of reading a memory position that will
>>    be exposed when migrated to a gethrtime() backed routine. In such cases
>>    migrating references to lbolt and lbolt64 to ddi_get_lbolt() and
>>    ddi_get_lbolt64() will have a negative performance impact. To address
>>    this case, our project will perform the lbolt service in a hybrid
>>    way, switching from event to cyclic driven when the DDI lbolt routines
>>    are being heavily used. This cyclic mode will reprogram a timer that
>>    will expire at each clock tick and increment an internal (lbolt like)
>>    variable and return its value to the consumer. This cyclic will only
>>    be activated during periods of heavy load, and will switch itself off
>>    when the activity subsides.
>>
>>    The decision to remove the lbolt and lbolt64 variables was made during
>>    design review, and a consensus was reached on the basis that, since
>>    we're reaching the end of a major release, this is the right moment to
>>    obsolete these. The side effects and cost of maintaining such symbols
>>    outweigh the benefits. However, this decision can be re-evaluated in
>>    case the negative impact on 3rd party modules during the development
>>    release is greater than expected. We're working with ISV and RPE to
>>    minimize the impact pro-actively.
>>
>>    4.2. Bug/RFE Number(s):
>>    6860030 tickless clock requires a clock() decoupled lbolt / lbolt64
>>
>>    4.5. Interfaces:
>>    This project is adding the following interfaces to the DDI:
>>
>>    int64_t ddi_get_lbolt64(void);
>>
>>    clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
>>        time_res_t res);
>>
>>    clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t
>>        delta, time_res_t res);
>>
>>    With time_res_t defined as
>>
>>    enum time_res {
>>        TR_NANOSEC,
>>        TR_MICROSEC,
>>        TR_MILLISEC,
>>        TR_SEC,
>>        TR_CLOCK_TICK,
>>        TR_COUNT
>>    };
>>
>>    typedef enum time_res time_res_t;
>>
>>    In addition to that, the lbolt and lbolt64 variables (which are
>>    *private* symbols known to be used by non-DDI compliant modules) are
>>    being removed. 3rd party modules that are not brought up to speed
>>    will fail to load.
>>
>>    In summary:
>>
>>    Interface                Commitment  Comments
>>    -----------------------------------------------------------------------
>>    ddi_get_lbolt64()        Public/DDI  return lbolt64
>>    cv_reltimedwait(9F)      Public/DDI  cv_timedwait(9F), relative time
>>    cv_reltimedwait_sig(9F)  Public/DDI  cv_timedwait_sig(9F), relative time
>>    lbolt                    Obsolete    commonly referenced kernel symbol
>>    lbolt64                  Obsolete    commonly referenced kernel symbol
>>
>>    We also plan on back porting the ddi_get_lbolt64() interface to Solaris
>>    10 Update 9 to extend the migration path for S10 users who would like
>>    to update their modules before moving to Solaris Nevada or the next
>>    version of Solaris. These users already have ddi_get_lbolt() but
>>    currently lack the 64-bit version of it. Such back port will have
>>    patch release binding.
>>
>>
>>    4.6. Doc Impact:
>>    6868417 updates for tickless kernel/lbolt decoupling (6860030)
>>
>>    Updates to the 'Writing Device Drivers' document are necessary; the
>>    project team is in contact with the documentation group to address
>>    these.
>>
>>
>> 5. Reference Documents:
>>    This project is being developed through OpenSolaris; our project pages
>>    and alias contain all the necessary information:
>>    http://opensolaris.org/os/project/tickless/
>>    http://opensolaris.org/os/project/tickless/tasks/lbolt/
>>    tickless-dev at opensolaris.org
>>
>>
>> 6. Resources and Schedule:
>>    6.5. ARC review type: Fast track
>>    6.6. ARC Exposure: open
>>
>>
>>
>> Updates to existing man pages:
>> ------------------------------
>>
>> drv_getparm.9f
>>
>> PARAMETERS
>>      ...
>>
>>      LBOLT   Read the value of lbolt. lbolt is a clock_t that        |
>>              represents the number of clock ticks since system       |
>>              boot. No special treatment is applied when              |
>>              this value overflows the maximum value of the
>>              signed integral type clock_t. When this occurs,
>>              its value will be negative, and its magnitude will
>>              be decreasing until it again passes zero. It can
>>              ...
>>
>>
>>
>> drv_hztousec.9f
>>
>> DESCRIPTION
>>      The drv_hztousec() function converts into microseconds the
>>      time expressed by hertz, which is in system clock ticks.
>>
>>      The length of time the system has been up since boot can be     |
>>      retrieved by calling ddi_get_lbolt(9F), which will return a     |
>>      value of type clock_t containing the number of clock ticks
>>      since boot. Drivers often use this value before and after an
>>      I/O request to measure the amount of time it took the device to
>>      process the request. The drv_hztousec() function can be used
>>      by the driver to convert the reading from clock ticks to a
>>      known unit of time.
>>
>>
>>
>> Intro.9f
>>
>> Kernel Functions for Drivers                             Intro(9F)
>>
>>      ddi_get_instance        Solaris DDI
>>      ddi_get_kt_did          Solaris DDI
>>      ddi_get_lbolt           Solaris DDI
>>      ddi_get_lbolt64         Solaris DDI                             +
>>      ddi_get_name            Solaris DDI
>>      ...
>>
>>
>> Updated ddi_get_lbolt.9f:
>> -------------------------
>>
>> Kernel Functions for Drivers                     ddi_get_lbolt(9F)
>>
>> NAME
>>      ddi_get_lbolt - returns the number of clock ticks since boot    |
>>
>> SYNOPSIS
>>      #include <sys/types.h>
>>      #include <sys/ddi.h>
>>      #include <sys/sunddi.h>
>>
>>      clock_t ddi_get_lbolt(void);
>>
>> INTERFACE LEVEL
>>      Solaris DDI specific (Solaris DDI).
>>
>> DESCRIPTION
>>      ddi_get_lbolt() returns a value that represents the number      |
>>      of clock ticks since the system booted. This value is           |
>>      used as a counter or timer inside the system kernel.
>>      The tick frequency can be determined by using drv_usectohz(9F)
>>      which converts microseconds into clock ticks.
>>
>>
>> RETURN VALUES
>>      ddi_get_lbolt() returns the number of clock ticks since boot    |
>>      in clock_t type.
>>
>> CONTEXT
>>      This routine can be called from any context.
>>
>> SEE ALSO
>>      ddi_get_lbolt64(9F), ddi_get_time(9F), drv_getparm(9F),
>>      drv_usectohz(9F)
>>
>>
>>
>> New man page for ddi_get_lbolt64():
>> -----------------------------------
>>
>> Kernel Functions for Drivers                   ddi_get_lbolt64(9F)
>>
>> NAME
>>      ddi_get_lbolt64 - returns the number of clock ticks since boot
>>      in int64_t type
>>
>> SYNOPSIS
>>      #include <sys/types.h>
>>      #include <sys/ddi.h>
>>      #include <sys/sunddi.h>
>>
>>      int64_t ddi_get_lbolt64(void);
>>
>> INTERFACE LEVEL
>>      Solaris DDI specific (Solaris DDI).
>>
>> DESCRIPTION
>>      ddi_get_lbolt64() returns a value that represents the number
>>      of clock ticks since the system booted. This value is
>>      used as a counter or timer inside the system kernel. It is
>>      essentially the same value returned by ddi_get_lbolt(9F), but in a
>>      longer data type that will not wrap for 2.9 billion years.
>>
>> RETURN VALUES
>>      ddi_get_lbolt64() returns the number of clock ticks since boot
>>      in int64_t type.
>>
>> CONTEXT
>>      This routine can be called from any context.
>>
>> SEE ALSO
>>      ddi_get_lbolt(9F), ddi_get_time(9F)
>>
>>      Writing Device Drivers
>>
>>      STREAMS Programming Guide
>>
>> SunOS 5.11          Last change: 29 Jul 2009                    1
>>
>>
>> Updates to condvar(9F):
>> -----------------------
>>
>> Kernel Functions for Drivers                           condvar(9F)
>>
>> NAME
>>      condvar, cv_init, cv_destroy, cv_wait, cv_signal,
>>      cv_broadcast, cv_wait_sig, cv_timedwait, cv_timedwait_sig,
>>      cv_reltimedwait, cv_reltimedwait_sig - condition variable
>>      routines
>>
>> SYNOPSIS
>>      #include <sys/ksynch.h>
>>
>>      void cv_init(kcondvar_t *cvp, char *name, kcv_type_t type, void *arg);
>>
>>      void cv_destroy(kcondvar_t *cvp);
>>
>>      void cv_wait(kcondvar_t *cvp, kmutex_t *mp);
>>
>>      void cv_signal(kcondvar_t *cvp);
>>
>>      void cv_broadcast(kcondvar_t *cvp);
>>
>>      int cv_wait_sig(kcondvar_t *cvp, kmutex_t *mp);
>>
>>      clock_t cv_timedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t timeout);
>>
>>      clock_t cv_timedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t
>>          timeout);
>>
>> |    clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
>> |        time_res_t resolution);
>>
>> |    clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
>> |        time_res_t resolution);
>>
>> INTERFACE LEVEL
>>      Solaris DDI specific (Solaris DDI).
>>
>> PARAMETERS
>>      cvp         A pointer to an abstract data type kcondvar_t.
>>
>>      mp          A pointer to a mutual exclusion lock (kmutex_t),
>>                  initialized by mutex_init(9F) and held by the
>>                  caller.
>>
>>      name        Descriptive string. This is obsolete and should
>>                  be NULL. (Non-NULL strings are legal, but they're
>>                  a waste of kernel memory.)
>>
>> SunOS 5.11          Last change: 02 Aug 2009                    1
>>
>>      type        The constant CV_DRIVER.
>>
>>      arg         A type-specific argument, drivers should pass arg
>>                  as NULL.
>>
>>      timeout     A time, in absolute ticks since boot, when
>>                  cv_timedwait() or cv_timedwait_sig() should
>>                  return.
>>
>> |    delta       A time, in relative ticks, when cv_reltimedwait()
>> |                or cv_reltimedwait_sig() should return.
>> |
>> |    resolution  A flag that specifies how accurate the relative
>> |                time interval should be. Possible values are
>> |                TR_NANOSEC, TR_MICROSEC, TR_MILLISEC, TR_SEC or
>> |                TR_CLOCK_TICK, the latter indicating that the interval
>> |                should be aligned to system clock ticks. This
>> |                information allows the system to anticipate or
>> |                defer the timeout expiration in order to batch process
>> |                similarly expiring events, allowing the system to
>> |                stay idle for longer periods of time and enhancing
>> |                its power efficiency.
>>
>>
>> DESCRIPTION
>>      Condition variables are a standard form of thread
>>      synchronization. They are designed to be used with mutual exclusion
>>      locks (mutexes). The associated mutex is used to ensure that
>>      a condition can be checked atomically and that the thread
>>      can block on the associated condition variable without
>>      missing either a change to the condition or a signal that the
>>      condition has changed. Condition variables must be
>>      initialized by calling cv_init(), and must be deallocated by
>>      calling cv_destroy().
>>
>>      The usual use of condition variables is to check a condition
>>      (for example, device state, data structure reference count,
>>      etc.) while holding a mutex which keeps other threads from
>>      changing the condition. If the condition is such that the
>>      thread should block, cv_wait() is called with a related
>>      condition variable and the mutex. At some later point in time,
>>      another thread would acquire the mutex, set the condition
>>      such that the previous thread can be unblocked, unblock the
>>      previous thread with cv_signal() or cv_broadcast(), and then
>>      release the mutex.
>>
>>      cv_wait() suspends the calling thread and exits the mutex
>>      atomically so that another thread which holds the mutex
>>      cannot signal on the condition variable until the blocking
>>      thread is blocked. Before returning, the mutex is
>>      reacquired.
>>
>>      cv_signal() signals the condition and wakes one blocked
>>      thread. All blocked threads can be unblocked by calling
>>      cv_broadcast(). cv_signal() and cv_broadcast() can be called
>>      by a thread even if it does not hold the mutex passed into
>>      cv_wait(), though holding the mutex is necessary to ensure
>>      predictable scheduling.
>>
>>      The function cv_wait_sig() is similar to cv_wait() but
>>      returns 0 if a signal (for example, by kill(2)) is sent to
>>      the thread. In any case, the mutex is reacquired before
>>      returning.
>>
>>      The function cv_timedwait() is similar to cv_wait(), except
>>      that it returns -1 without the condition being signaled
>>      after the timeout time has been reached.
>>
>>      The function cv_timedwait_sig() is similar to cv_timedwait()
>>      and cv_wait_sig(), except that it returns -1 without the
>>      condition being signaled after the timeout time has been
>>      reached, or 0 if a signal (for example, by kill(2)) is sent
>>      to the thread.
>>
>>      For both cv_timedwait() and cv_timedwait_sig(), time is in
>>      absolute clock ticks since the last system reboot. The
>>      current time may be found by calling ddi_get_lbolt(9F).
>>
>> |    The cv_reltimedwait() function is similar to cv_timedwait(),
>> |    except that it takes a relative time value as argument and
>> |    it also takes an additional argument to specify the accuracy
>> |    of such interval. cv_reltimedwait_sig() is analogous to
>> |    cv_timedwait_sig(), but takes the same arguments as
>> |    cv_reltimedwait().
>>
>> RETURN VALUES
>>      0     For cv_wait_sig(), cv_timedwait_sig() and
>>            cv_reltimedwait_sig() indicates
>>            that the condition was not necessarily signaled and
>>            the function returned because a signal (as in
>>            kill(2)) was pending.
>>
>> |    -1    For cv_timedwait(), cv_timedwait_sig(),
>> |          cv_reltimedwait() and cv_reltimedwait_sig() indicates
>>            that the condition was not necessarily signaled and
>>            the function returned because the timeout time was
>>            reached.
>>
>> |    >0    For cv_wait_sig(), cv_timedwait(), cv_timedwait_sig(),
>> |          cv_reltimedwait() or cv_reltimedwait_sig()
>> |          indicates that the condition was
>>            met and the function returned due to a call to
>>            cv_signal() or cv_broadcast(), or due to a
>>            premature wakeup (see NOTES).
>>
>> CONTEXT
>>      These functions can be called from user, kernel or interrupt
>>      context. In most cases, however, cv_wait(), cv_timedwait(),
>> |    cv_wait_sig(), cv_timedwait_sig(), cv_reltimedwait() and
>> |    cv_reltimedwait_sig()
>>      should not be called
>>      from interrupt context, and cannot be called from a
>>      high-level interrupt context.
>>
>>      If cv_wait(), cv_timedwait(), cv_wait_sig(),
>> |    cv_timedwait_sig(), cv_reltimedwait() or cv_reltimedwait_sig()
>> |    are used from interrupt context,
>>      lower-priority interrupts will not be serviced during the wait.
>>      This means that if the thread that will eventually perform
>>      the wakeup becomes blocked on anything that requires the
>>      lower-priority interrupt, the system will hang.
>>
>>      For example, the thread that will perform the wakeup may
>>      need to first allocate memory. This memory allocation may
>>      require waiting for paging I/O to complete, which may
>>      require a lower-priority disk or network interrupt to be
>>      serviced. In general, situations like this are hard to
>>      predict, so it is advisable to avoid waiting on condition
>>      variables or semaphores in an interrupt context.
>>
>> EXAMPLES
>>      Example 1 Waiting for a Flag Value in a Driver's Unit
>>
>>      Here the condition being waited for is a flag value in a
>>      driver's unit structure. The condition variable is also in
>>      the unit structure, and the flag word is protected by a
>>      mutex in the unit structure.
>>
>>          mutex_enter(&un->un_lock);
>>          while (un->un_flag & UNIT_BUSY)
>>                  cv_wait(&un->un_cv, &un->un_lock);
>>          un->un_flag |= UNIT_BUSY;
>>          mutex_exit(&un->un_lock);
>>
>>      Example 2 Unblocking Threads Blocked by the Code in Example 1
>>
>>      At some later point in time, another thread would execute
>>      the following to unblock any threads blocked by the above
>>      code.
>>
>>          mutex_enter(&un->un_lock);
>>          un->un_flag &= ~UNIT_BUSY;
>>          cv_broadcast(&un->un_cv);
>>          mutex_exit(&un->un_lock);
>>
>> NOTES
>> |    It is possible for cv_wait(), cv_wait_sig(), cv_timedwait(),
>> |    cv_timedwait_sig(), cv_reltimedwait() and cv_reltimedwait_sig()
>> |    to return prematurely, that is, not
>>      due to a call to cv_signal() or cv_broadcast(). This occurs
>>      most commonly in the case of cv_wait_sig(),
>> |    cv_timedwait_sig() and cv_reltimedwait_sig() when the thread
>> |    is stopped and restarted
>>      by job control signals or by a debugger, but can happen in
>>      other cases as well, even for cv_wait(). Code that calls
>>      these functions must always recheck the reason for blocking
>>      and call again if the reason for blocking is still true.
>>
>> |    If your driver needs to wait on behalf of processes that
>> |    have real-time constraints, use cv_timedwait() or cv_reltimedwait()
>> |    rather than
>>      delay(9F). The delay() function calls timeout(9F), which can
>>      be subject to priority inversions.
>>
>>      Not all threads can receive signals from user level
>>      processes. In cases where such reception is impossible (such
>>      as during execution of close(9E) due to exit(2)),
>>      cv_wait_sig() behaves as cv_wait(), cv_timedwait_sig()
>> |    behaves as cv_timedwait() and cv_reltimedwait_sig() behaves as
>> |    cv_reltimedwait().
>>      To avoid unkillable processes,
>>      users of these functions may need to protect against waiting
>>      indefinitely for events that might not occur. The
>>      ddi_can_receive_sig(9F) function is provided to detect when
>>      signal reception is possible.
>>
>> SEE ALSO
>>      kill(2), ddi_can_receive_sig(9F), ddi_get_lbolt(9F),
>> |    ddi_get_lbolt64(9F), mutex(9F), mutex_init(9F)
>>
>>      Writing Device Drivers
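
The resolution argument described in the condvar(9F) update above can be pictured with a small userland model. This is only a sketch under stated assumptions: the proposal does not say how the kernel aligns expirations, so res_to_nsec() and align_expiration() are hypothetical names, the explicit hz parameter is an assumption, and rounding up to the next multiple of the resolution is just one possible policy picked for illustration. Only the time_res enum is taken from the proposal.

```c
#include <stdint.h>

#define	NANOSEC	1000000000LL	/* nanoseconds per second */

/* Resolution values as listed in the proposal. */
enum time_res {
	TR_NANOSEC,
	TR_MICROSEC,
	TR_MILLISEC,
	TR_SEC,
	TR_CLOCK_TICK,
	TR_COUNT
};
typedef enum time_res time_res_t;

/*
 * Hypothetical helper: granularity in nanoseconds for a resolution.
 * TR_CLOCK_TICK depends on the tick rate hz. Returns -1 for values
 * that are not real resolutions (such as TR_COUNT).
 */
static int64_t
res_to_nsec(time_res_t res, int64_t hz)
{
	switch (res) {
	case TR_NANOSEC:
		return (1);
	case TR_MICROSEC:
		return (1000);
	case TR_MILLISEC:
		return (1000000);
	case TR_SEC:
		return (NANOSEC);
	case TR_CLOCK_TICK:
		return (NANOSEC / hz);
	default:
		return (-1);
	}
}

/*
 * Hypothetical policy: defer an expiration (in nanoseconds) to the next
 * multiple of the granularity, so nearby timeouts expire together.
 */
static int64_t
align_expiration(int64_t exp_nsec, int64_t gran_nsec)
{
	return (((exp_nsec + gran_nsec - 1) / gran_nsec) * gran_nsec);
}
```

With TR_MILLISEC, expirations of 12.1 ms and 12.9 ms both land on 13 ms under this model, so a single wakeup could serve both. That is the kind of batching the resolution hint is meant to allow.
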