Re: No 100 HZ timer!

2001-04-23 Thread Mark Salisbury

On Mon, 23 Apr 2001, Ulrich Windl wrote:
> IMHO the POSIX is doable to comply with POSIX. Probably not what many 
> of the RT freaks expect, but doable. I'm tuning the nanoseconds for a 
> while now...
> 
> Ulrich

thanks for calling me a freak :)

yes, linux could have posix timers cobbled on today and comply with the POSIX
spec.

but, we would like to make it better, not just feature complete.

10ms timer resolutions, while completely in compliance w/ the posix spec, are
completely useless for "RT freaks" 

remember that 95% of all computers in the world are embedded and of those most
nead at least "soft" RT behavior.  seems like a good growth market for linux to
me.

when a PPC G4 is capable of 30 nanosecond resolution, why would you want to
settle for 10 millisecond  (30 billionths of a second vs 10 thousandths for
those of you who haven't had to familiarize yourselves with sub-second time
units)

> 
> On 17 Apr 2001, at 11:53, george anzinger wrote:
> 
> > I was thinking that it might be good to remove the POSIX API for the
> > kernel and allow a somewhat simplified interface.  For example, the user
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-23 Thread Ulrich Windl

IMHO the POSIX is doable to comply with POSIX. Probably not what many 
of the RT freaks expect, but doable. I'm tuning the nanoseconds for a 
while now...

Ulrich

On 17 Apr 2001, at 11:53, george anzinger wrote:

> I was thinking that it might be good to remove the POSIX API for the
> kernel and allow a somewhat simplified interface.  For example, the user

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-23 Thread Ulrich Windl

IMHO the POSIX is doable to comply with POSIX. Probably not what many 
of the RT freaks expect, but doable. I'm tuning the nanoseconds for a 
while now...

Ulrich

On 17 Apr 2001, at 11:53, george anzinger wrote:

 I was thinking that it might be good to remove the POSIX API for the
 kernel and allow a somewhat simplified interface.  For example, the user

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-17 Thread Jamie Lokier

george anzinger wrote:
> > > a.) list insertion of an arbitrary timer,
> > should be O(log(n)) at worst
> > 
> > > b.) removal of canceled and expired timers, and
> > easy to make O(1)
> 
> I thought this was true also, but the priority heap structure that has
> been discussed here has a O(log(n)) removal time.

Several priority queue structures support removal in O(1) time.
Perhaps you are thinking of the classic array-based heap, for
which removal is O(log n) in the general case.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-17 Thread george anzinger

Mark Salisbury wrote:
> 
> > Functional Specification for the high-res-timers project.
> >
> > In addition we expect that we will provide a high resolution timer for
> > kernel use (heck, we may provide several).
> 
> what we do here determines what we can do for the user..

I was thinking that it might be good to remove the POSIX API for the
kernel and allow a somewhat simplified interface.  For example, the user
gets to resolution by specifying a CLOCK, where we MAY want to allow the
kernel call to directly specify the resolution.  This has already been
suggested.  I suppose you could say the functional spec should define
the kernel PI (KPI?) as well as the user API, but that is a bit fuzzy at
this time as I think it will depend on how we actually code the user
functionality.  Another example: I am leaning toward using a two word
uptime composed of jiffies (i.e. 1/HZ since boot) and a machine
dependent sub jiffie unit.  Each ARCH would then define this unit as
well as the conversion routines to move it back and forth to
nanoseconds, microseconds, and 1/HZ.  I think this format would play
well in task time accounting, as well as in the timer management.  

For calls to something like delay(), however, it sucks.  I think these
calls want a single word, possibly microsecond, time specification. 
This gives a 16 or 17 minutes max and 1 microsecond min. which probably
covers 99.99% of all kernel delay needs.  

Another kernel internal interface should allow the user specified
structures (timespec and timeval).  The notion is to put all the
conversion routines in the timer code to maintain the specified
resolution, and (more importantly), to avoid conversion to a format that
just needs an additional conversion.

To summarize,

I think there is a need for two classes of timer interfaces in the
kernel:

a.) For drivers and others that need "delay()" sorts of things, and
b.) For system calls that handle user specified times.
> 
> > We will provide several "clocks" as defined by the standard.  In
> > particular, the following capabilities will be attached to some clock,
> > regardless of the actual clock "name" we end up using:
> >
> > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> > linux today).
> >
> > CLOCK_HIGH_RES a wall clock supporting timers with the highest
> > resolution the hardware supports.
> >
> > CLOCK_1US a wall clock supporting timers with 1 micro second resolution
> > (if the hardware allows it).
> >
> > CLOCK_UPTIME a clock that give the system up time.  (Does this clock
> > need to support timers?)
> >
> > CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
> >
> 
> Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
> the others are just fluff.  we should have 1 single clock mechanism for the
> whole system with it's resolution and tick/tickless characteristics determined
> at compile time.

I think you already have let the nose of the camel into the tent :) 
Here is what I am thinking:

Suppose an array of structures of type clock.  Clock_id is an index into
this array.  Here is what is in the structure:

struct clock{
int resolution;
int *gettime();
int *settime();
int *convert_to_uptime();
int *convert_from_uptime();
;
};

Now the difference between CLOCK_UPTIME and CLOCK_REALTIME is surly in
the gettime/settime and possibly in the resolution.  But the difference
between CLOCK_REALTIME and CLOCK_1US, CLOCK_HIGH_RES, CLOCK_10MS is JUST
the resolution!  In other words, all they cost is the table entry.  Note
that CLOCK_GMT, CLOCK_UST, and CLOCK_GPS, etc. all fit nicely into this
same structure.

We should also provide a way to "register" a new clock so the user can
easily configure in additional clocks.  There are ways of doing this
that are really easy to use, e.g. the module_init() macro.
> 
> also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
> convenience/compliance offset.

If you mean by "true" that this clock can not be set, starts at 0 at
boot time and can only be affected by rate adjustments to get it to pass
a real second every second, I agree.
> 
> > At the same time we will NOT support the following clocks:
> >
> > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> > wall) of a given task.
> >
> > CLOCK_PROFILE a clock used to generate profiling events.
> >
> > CLOCK_???  any clock keyed to a task.
> 
> we could do some KOOL things here but they are more related to process time
> accounting and should be dealt with in that context and as a separate project.
> 
> however our design should take these concepts into account and allow for easy
> integration of these types of functionality.

I agree.
> 
> >
> > (Note that this does not mean that the clock system will not support the
> > virtual and profile clocks, but that they will not be accessible thru
> > the POSIX timer interface.)
> 
> I think that should sombody choose 

Re: No 100 HZ timer!

2001-04-17 Thread Mark Salisbury

> Functional Specification for the high-res-timers project.
> 
> In addition we expect that we will provide a high resolution timer for
> kernel use (heck, we may provide several).
 
what we do here determines what we can do for the user..

> We will provide several "clocks" as defined by the standard.  In
> particular, the following capabilities will be attached to some clock,
> regardless of the actual clock "name" we end up using:
> 
> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> linux today). 
> 
> CLOCK_HIGH_RES a wall clock supporting timers with the highest
> resolution the hardware supports.
> 
> CLOCK_1US a wall clock supporting timers with 1 micro second resolution
> (if the hardware allows it).
> 
> CLOCK_UPTIME a clock that give the system up time.  (Does this clock
> need to support timers?)
> 
> CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
> 

Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
the others are just fluff.  we should have 1 single clock mechanism for the
whole system with it's resolution and tick/tickless characteristics determined
at compile time.

also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
convenience/compliance offset.



> At the same time we will NOT support the following clocks:
> 
> CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> wall) of a given task.  
> 
> CLOCK_PROFILE a clock used to generate profiling events. 
> 
> CLOCK_???  any clock keyed to a task.

we could do some KOOL things here but they are more related to process time
accounting and should be dealt with in that context and as a separate project.

however our design should take these concepts into account and allow for easy
integration of these types of functionality.

> 
> (Note that this does not mean that the clock system will not support the
> virtual and profile clocks, but that they will not be accessible thru
> the POSIX timer interface.)

I think that should sombody choose to implement them, they should be available
at least through the clock_xxx interfaces..

> 
> It would be NICE if we could provide a way to hook other time support
> into the system.  In particular a
> 
> CLOCK_WWV or CLOCK_GPS
> 

CLOCK_NNTP

> might be nice.  The problem with these sorts of clocks is that they
> imply an array of function pointers for each clock and function pointers
> slow the code down because of their non predictability.  Never the less,
> we will attempt to allow easy expansion in this direction.
> 
> Implications on the current kernel:
> 
> The high resolution timers will require a fast clock access with the
> maximum supported resolution in order to convert relative times to
> absolute times.  This same fast clock will be used to support the
> various user and system time requests.
> 
> There are two ways to provide timers to the kernel.  For lack of a
> better term we will refer to them as "ticked" and "tick less".  Until we
> have performance information that implies that one or the other of these
> methods is better in all cases we will provide both ticked and tick less
> systems.  The variety to be used will be selected at configure time.
> 
> For tick less systems we will need to provide code to collect execution
> times.  For the ticked system the current method of collection these
> times will be used.  This project will NOT attempt to improve the
> resolution of these timers, however, the high speed, high resolution
> access to the current time will allow others to augment the system in
> this area.

huh? why not?

> 
> For the tick less system the project will also provide a time slice
> expiration interrupt.
> 
> The timer list(s) (all pending timers) need to be organized so that the
> following operations are fast:
> 
> a.) list insertion of an arbitrary timer,
should be O(log(n)) at worst

> b.) removal of canceled and expired timers, and
easy to make O(1)

> c.) finding the timer for "NOW" and its immediate followers.
head of list or top of tree or top of heap or whatever, O(1)

> Times in the timer list will be absolute and related to system up time.
> These times will be converted to wall time as needed.
 
and ONLY when needed by users


> 
> The POSIX interface provides for "absolute" timers relative to a given

do you mean "relative" timers?

> clock.  When these timers are related to a "wall" clock they will need
> adjusting when the wall clock time is adjusted.  These adjustments are
> done for "leap seconds" and the date command.  (Note, we are keeping
> timers relative to "uptime" which can not be adjusted.  This means that
> relative timers and absolute timers related to CLOCK_UPTIME are not
> affected by the above wall time adjustments.)

absolute timers will automatically fall out when you adjust CLOCK_UPTIME...
i.e.  an absolute time is an absolute time and since CLOCK_UPTIME is the
ultimate arbiter of what absolute time it is, adjusting 

Re: No 100 HZ timer!

2001-04-17 Thread Mark Salisbury

On Mon, 16 Apr 2001, you wrote:
> Mark Salisbury wrote:
> > 
> > > Given a system speed, there is a repeating timer rate which will consume
> > > 100% of the system in handling the timer interrupts.  An attempt will
> > > be made to detect this rate and adjust the timer to prevent system
> > > lockup.  This adjustment will look like timer overruns to the user
> > > (i.e. we will take a percent of the interrupts and record the untaken
> > > interrupts as overruns)
> > 
> > just at first blush, there are some things in general but I need to read
> > this again and more closely
> > 
> > but, with POSIX timers, there is a nifty little restriction/protection built
> > into the spec regarding the re-insertion of short interval repeating timers.
> > that is: a repeating timer will not be re-inserted until AFTER the
> > associated signal handler has been handled.
> 
> Actually what it says is: "Only a single signal shall be queued to the
> process for a given timer at any point in time.  When a timer for which
> a signal is still pending expires, no signal shall be queued, and a
> timer overrun shall occur."
> 
> It then goes on to talk about the overrun count and how it is to be
> managed.
> 
I guess I was confusing what the spec said with the way in which I chose to
comply with the spec.  I calculate overruns (I know when a timer went off, and
how many of its intervals have passed since it went off by by 
time_since_expire/interval) and don't reinsert until return from the signal
handler.  

this prevents a flood of high frequency interrupts inherently and allows
processing of the signal handlers to proceed cleanly.

-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-17 Thread Mark Salisbury

On Mon, 16 Apr 2001, you wrote:
 Mark Salisbury wrote:
  
   Given a system speed, there is a repeating timer rate which will consume
   100% of the system in handling the timer interrupts.  An attempt will
   be made to detect this rate and adjust the timer to prevent system
   lockup.  This adjustment will look like timer overruns to the user
   (i.e. we will take a percent of the interrupts and record the untaken
   interrupts as overruns)
  
  just at first blush, there are some things in general but I need to read
  this again and more closely
  
  but, with POSIX timers, there is a nifty little restriction/protection built
  into the spec regarding the re-insertion of short interval repeating timers.
  that is: a repeating timer will not be re-inserted until AFTER the
  associated signal handler has been handled.
 
 Actually what it says is: "Only a single signal shall be queued to the
 process for a given timer at any point in time.  When a timer for which
 a signal is still pending expires, no signal shall be queued, and a
 timer overrun shall occur."
 
 It then goes on to talk about the overrun count and how it is to be
 managed.
 
I guess I was confusing what the spec said with the way in which I chose to
comply with the spec.  I calculate overruns (I know when a timer went off, and
how many of its intervals have passed since it went off by by 
time_since_expire/interval) and don't reinsert until return from the signal
handler.  

this prevents a flood of high frequency interrupts inherently and allows
processing of the signal handlers to proceed cleanly.

-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-17 Thread Mark Salisbury

 Functional Specification for the high-res-timers project.
 
 In addition we expect that we will provide a high resolution timer for
 kernel use (heck, we may provide several).
 
what we do here determines what we can do for the user..

 We will provide several "clocks" as defined by the standard.  In
 particular, the following capabilities will be attached to some clock,
 regardless of the actual clock "name" we end up using:
 
 CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
 linux today). 
 
 CLOCK_HIGH_RES a wall clock supporting timers with the highest
 resolution the hardware supports.
 
 CLOCK_1US a wall clock supporting timers with 1 micro second resolution
 (if the hardware allows it).
 
 CLOCK_UPTIME a clock that give the system up time.  (Does this clock
 need to support timers?)
 
 CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
 

Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
the others are just fluff.  we should have 1 single clock mechanism for the
whole system with it's resolution and tick/tickless characteristics determined
at compile time.

also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
convenience/compliance offset.



 At the same time we will NOT support the following clocks:
 
 CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
 wall) of a given task.  
 
 CLOCK_PROFILE a clock used to generate profiling events. 
 
 CLOCK_???  any clock keyed to a task.

we could do some KOOL things here but they are more related to process time
accounting and should be dealt with in that context and as a separate project.

however our design should take these concepts into account and allow for easy
integration of these types of functionality.

 
 (Note that this does not mean that the clock system will not support the
 virtual and profile clocks, but that they will not be accessible thru
 the POSIX timer interface.)

I think that should sombody choose to implement them, they should be available
at least through the clock_xxx interfaces..

 
 It would be NICE if we could provide a way to hook other time support
 into the system.  In particular a
 
 CLOCK_WWV or CLOCK_GPS
 

CLOCK_NNTP

 might be nice.  The problem with these sorts of clocks is that they
 imply an array of function pointers for each clock and function pointers
 slow the code down because of their non predictability.  Never the less,
 we will attempt to allow easy expansion in this direction.
 
 Implications on the current kernel:
 
 The high resolution timers will require a fast clock access with the
 maximum supported resolution in order to convert relative times to
 absolute times.  This same fast clock will be used to support the
 various user and system time requests.
 
 There are two ways to provide timers to the kernel.  For lack of a
 better term we will refer to them as "ticked" and "tick less".  Until we
 have performance information that implies that one or the other of these
 methods is better in all cases we will provide both ticked and tick less
 systems.  The variety to be used will be selected at configure time.
 
 For tick less systems we will need to provide code to collect execution
 times.  For the ticked system the current method of collection these
 times will be used.  This project will NOT attempt to improve the
 resolution of these timers, however, the high speed, high resolution
 access to the current time will allow others to augment the system in
 this area.

huh? why not?

 
 For the tick less system the project will also provide a time slice
 expiration interrupt.
 
 The timer list(s) (all pending timers) need to be organized so that the
 following operations are fast:
 
 a.) list insertion of an arbitrary timer,
should be O(log(n)) at worst

 b.) removal of canceled and expired timers, and
easy to make O(1)

 c.) finding the timer for "NOW" and its immediate followers.
head of list or top of tree or top of heap or whatever, O(1)

 Times in the timer list will be absolute and related to system up time.
 These times will be converted to wall time as needed.
 
and ONLY when needed by users


 
 The POSIX interface provides for "absolute" timers relative to a given

do you mean "relative" timers?

 clock.  When these timers are related to a "wall" clock they will need
 adjusting when the wall clock time is adjusted.  These adjustments are
 done for "leap seconds" and the date command.  (Note, we are keeping
 timers relative to "uptime" which can not be adjusted.  This means that
 relative timers and absolute timers related to CLOCK_UPTIME are not
 affected by the above wall time adjustments.)

absolute timers will automatically fall out when you adjust CLOCK_UPTIME...
i.e.  an absolute time is an absolute time and since CLOCK_UPTIME is the
ultimate arbiter of what absolute time it is, adjusting CLOCK_UPTIME will cause
the absolute times in the timer list to be handled properly without 

Re: No 100 HZ timer!

2001-04-17 Thread george anzinger

Mark Salisbury wrote:
 
  Functional Specification for the high-res-timers project.
 
  In addition we expect that we will provide a high resolution timer for
  kernel use (heck, we may provide several).
 
 what we do here determines what we can do for the user..

I was thinking that it might be good to remove the POSIX API for the
kernel and allow a somewhat simplified interface.  For example, the user
gets to resolution by specifying a CLOCK, where we MAY want to allow the
kernel call to directly specify the resolution.  This has already been
suggested.  I suppose you could say the functional spec should define
the kernel PI (KPI?) as well as the user API, but that is a bit fuzzy at
this time as I think it will depend on how we actually code the user
functionality.  Another example: I am leaning toward using a two word
uptime composed of jiffies (i.e. 1/HZ since boot) and a machine
dependent sub jiffie unit.  Each ARCH would then define this unit as
well as the conversion routines to move it back and forth to
nanoseconds, microseconds, and 1/HZ.  I think this format would play
well in task time accounting, as well as in the timer management.  

For calls to something like delay(), however, it sucks.  I think these
calls want a single word, possibly microsecond, time specification. 
This gives a 16 or 17 minutes max and 1 microsecond min. which probably
covers 99.99% of all kernel delay needs.  

Another kernel internal interface should allow the user specified
structures (timespec and timeval).  The notion is to put all the
conversion routines in the timer code to maintain the specified
resolution, and (more importantly), to avoid conversion to a format that
just needs an additional conversion.

To summarize,

I think there is a need for two classes of timer interfaces in the
kernel:

a.) For drivers and others that need "delay()" sorts of things, and
b.) For system calls that handle user specified times.
 
  We will provide several "clocks" as defined by the standard.  In
  particular, the following capabilities will be attached to some clock,
  regardless of the actual clock "name" we end up using:
 
  CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
  linux today).
 
  CLOCK_HIGH_RES a wall clock supporting timers with the highest
  resolution the hardware supports.
 
  CLOCK_1US a wall clock supporting timers with 1 micro second resolution
  (if the hardware allows it).
 
  CLOCK_UPTIME a clock that give the system up time.  (Does this clock
  need to support timers?)
 
  CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
 
 
 Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
 the others are just fluff.  we should have 1 single clock mechanism for the
 whole system with it's resolution and tick/tickless characteristics determined
 at compile time.

I think you already have let the nose of the camel into the tent :) 
Here is what I am thinking:

Suppose an array of structures of type clock.  Clock_id is an index into
this array.  Here is what is in the structure:

struct clock{
int resolution;
int *gettime();
int *settime();
int *convert_to_uptime();
int *convert_from_uptime();
room for more bright ideas;
};

Now the difference between CLOCK_UPTIME and CLOCK_REALTIME is surly in
the gettime/settime and possibly in the resolution.  But the difference
between CLOCK_REALTIME and CLOCK_1US, CLOCK_HIGH_RES, CLOCK_10MS is JUST
the resolution!  In other words, all they cost is the table entry.  Note
that CLOCK_GMT, CLOCK_UST, and CLOCK_GPS, etc. all fit nicely into this
same structure.

We should also provide a way to "register" a new clock so the user can
easily configure in additional clocks.  There are ways of doing this
that are really easy to use, e.g. the module_init() macro.
 
 also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
 convenience/compliance offset.

If you mean by "true" that this clock can not be set, starts at 0 at
boot time and can only be affected by rate adjustments to get it to pass
a real second every second, I agree.
 
  At the same time we will NOT support the following clocks:
 
  CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
  wall) of a given task.
 
  CLOCK_PROFILE a clock used to generate profiling events.
 
  CLOCK_???  any clock keyed to a task.
 
 we could do some KOOL things here but they are more related to process time
 accounting and should be dealt with in that context and as a separate project.
 
 however our design should take these concepts into account and allow for easy
 integration of these types of functionality.

I agree.
 
 
  (Note that this does not mean that the clock system will not support the
  virtual and profile clocks, but that they will not be accessible thru
  the POSIX timer interface.)
 
 I think that should sombody choose to implement them, they should be available
 at least through 

Re: No 100 HZ timer!

2001-04-17 Thread Jamie Lokier

george anzinger wrote:
   a.) list insertion of an arbitrary timer,
  should be O(log(n)) at worst
  
   b.) removal of canceled and expired timers, and
  easy to make O(1)
 
 I thought this was true also, but the priority heap structure that has
 been discussed here has a O(log(n)) removal time.

Several priority queue structures support removal in O(1) time.
Perhaps you are thinking of the classic array-based heap, for
which removal is O(log n) in the general case.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

Mark Salisbury wrote:
> 
> > Given a system speed, there is a repeating timer rate which will consume
> > 100% of the system in handling the timer interrupts.  An attempt will
> > be made to detect this rate and adjust the timer to prevent system
> > lockup.  This adjustment will look like timer overruns to the user
> > (i.e. we will take a percent of the interrupts and record the untaken
> > interrupts as overruns)
> 
> just at first blush, there are some things in general but I need to read
> this again and more closely
> 
> but, with POSIX timers, there is a nifty little restriction/protection built
> into the spec regarding the re-insertion of short interval repeating timers.
> that is: a repeating timer will not be re-inserted until AFTER the
> associated signal handler has been handled.

Actually what it says is: "Only a single signal shall be queued to the
process for a given timer at any point in time.  When a timer for which
a signal is still pending expires, no signal shall be queued, and a
timer overrun shall occur."

It then goes on to talk about the overrun count and how it is to be
managed.

What I am suggesting is that the system should detect when these
interrupts would come so fast as to stall the system and just set up a
percent of them while bumping the overrun count as if they had all
occured.

George

> 
> this has some interesting consequences for signal handling and signal
> delivery implementations, but importantly, it ensures that even a flood of
> POSIX timers with very short repeat intervals will be handled cleanly.
> 
> I will get more detailed comments to you tomorrow.
> 
> Mark Salisbury
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread Mark Salisbury


> Given a system speed, there is a repeating timer rate which will consume
> 100% of the system in handling the timer interrupts.  An attempt will
> be made to detect this rate and adjust the timer to prevent system
> lockup.  This adjustment will look like timer overruns to the user
> (i.e. we will take a percent of the interrupts and record the untaken
> interrupts as overruns)

just at first blush, there are some things in general but I need to read
this again and more closely

but, with POSIX timers, there is a nifty little restriction/protection built
into the spec regarding the re-insertion of short interval repeating timers.
that is: a repeating timer will not be re-inserted until AFTER the
associated signal handler has been handled.

this has some interesting consequences for signal handling and signal
delivery implementations, but importantly, it ensures that even a flood of
POSIX timers with very short repeat intervals will be handled cleanly.

I will get more detailed comments to you tomorrow.

Mark Salisbury

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

"Albert D. Cahalan" wrote:
> 
> > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> > linux today).
> 
> Except on the Alpha, and on some ARM systems, etc.
> The HZ constant varies from 10 to 1200.

I suspect we will want to use 10 ms resolution for a clock named
CLOCK_10MS :)
On the other hand we could have a CLOCK_1_OVER_HZ...  Actually with
high-res-timers the actual HZ value becomes a bit less important.  It
would be "nice" to keep 1/HZ == jiffie, however.
> 
> > At the same time we will NOT support the following clocks:
> >
> > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> > wall) of a given task.
> ...
> > For tick less systems we will need to provide code to collect execution
> > times.  For the ticked system the current method of collection these
> > times will be used.  This project will NOT attempt to improve the
> > resolution of these timers, however, the high speed, high resolution
> > access to the current time will allow others to augment the system in
> > this area.
> ...
> > This project will NOT provide higher resolution accounting (i.e. user
> > and system execution times).
> 
> It is nice to have accurate per-process user/system accounting.
> Since you'd be touching the code anyway...

Yeah sure... and will you pick up the ball on all the platform dependent
code to get high-res-timers on all the other platforms?  On second
thought I am reminded of the corollary to the old saw:  "The squeaking
wheel get the grease."  Which is: "He who complains most about the
squeaking gets to do the greasing."  Hint  hint.
> 
> > The POSIX interface provides for "absolute" timers relative to a given
> > clock.  When these timers are related to a "wall" clock they will need
> > adjusting when the wall clock time is adjusted.  These adjustments are
> > done for "leap seconds" and the date command.
> 
> This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
> defined POSIX time that is none of the above. This is a horrid mess,
> even ignoring gravity and speed. :-)
> 
> Can a second be 2 billion nanoseconds?
> Can a nanosecond be twice as long as normal?
> Can a second appear twice, with the nanoseconds getting reset?
> Can a second never appear at all?
> Can you compute times more than 6 months into the future?
> How far does time deviate from solar time? Is this constrained?
> 
> If you deal with leap seconds, you have to have a table of them.
> This table grows with time, with adjustments being made with only
> about 6 months notice. So the user upgrades after a year or two,
> and the installer discovers that the user has been running a
> system that is unaware of the most recent leap second. Arrrgh.
> 
> Sure you want to touch this? The Austin group argued over it for
> a very long time and never did find a really good solution.
> Maybe you should just keep the code simple and fast, without any
> concern for clock adjustments.

There is a relatively simple way to handle this, at least from the
high-res-timers point of view.  First we convert all timers to absolute
uptime.  This is a nice well behaved time.  At boot time we peg the wall
clock to the uptime.  Then at any given time, wall time is boot wall
time + uptime.  Then date, leap seconds, etc. affect the pegged value of
boot wall time.  Using the POSIX CLOCK id we allow the user to ask for
either version of time.  Now if we define an array of struc clock_id
which contains pointers to such things as functions to return time, any
algorithm you might want can be plugged in to bend time as needed.  The
only fly in this mess is the NTP rate adjustment stuff.  This code is
supposed to allow the system to adjust its ticker to produce accurate
seconds and so gets at the root of the uptime counter be it in hardware
or software or some combination of the two.  But then that's what makes
life interesting :)
> 
> > In either a ticked or tick less system, it is expected that resolutions
> > higher than 1/HZ will come with some additional overhead.  For this
> > reason, the CLOCK resolution will be used to round up times for each
> > timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> > project will attempt to meet or exceed the current systems timer
> > performance.
> 
> Within the kernel at least, it would be good to let drivers specify
> desired resolution. Then a near-by value could be selected, perhaps
> with some consideration for event type. (for cache reasons)

This could be done, however, I would prefer to provide CLOCK_s to do
this as per the standard.  What does the community say?  In either case
you get different resolutions, but in the latter case the possible
values are fixed at least at configure time.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread Albert D. Cahalan

> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> linux today). 

Except on the Alpha, and on some ARM systems, etc.
The HZ constant varies from 10 to 1200.

> At the same time we will NOT support the following clocks:
> 
> CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> wall) of a given task.  
...
> For tick less systems we will need to provide code to collect execution
> times.  For the ticked system the current method of collection these
> times will be used.  This project will NOT attempt to improve the
> resolution of these timers, however, the high speed, high resolution
> access to the current time will allow others to augment the system in
> this area.
...
> This project will NOT provide higher resolution accounting (i.e. user
> and system execution times).

It is nice to have accurate per-process user/system accounting.
Since you'd be touching the code anyway...

> The POSIX interface provides for "absolute" timers relative to a given
> clock.  When these timers are related to a "wall" clock they will need
> adjusting when the wall clock time is adjusted.  These adjustments are
> done for "leap seconds" and the date command.

This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
defined POSIX time that is none of the above. This is a horrid mess,
even ignoring gravity and speed. :-)

Can a second be 2 billion nanoseconds?
Can a nanosecond be twice as long as normal?
Can a second appear twice, with the nanoseconds getting reset?
Can a second never appear at all?
Can you compute times more than 6 months into the future?
How far does time deviate from solar time? Is this constrained?

If you deal with leap seconds, you have to have a table of them.
This table grows with time, with adjustments being made with only
about 6 months notice. So the user upgrades after a year or two,
and the installer discovers that the user has been running a
system that is unaware of the most recent leap second. Arrrgh.

Sure you want to touch this? The Austin group argued over it for
a very long time and never did find a really good solution.
Maybe you should just keep the code simple and fast, without any
concern for clock adjustments.

> In either a ticked or tick less system, it is expected that resolutions
> higher than 1/HZ will come with some additional overhead.  For this
> reason, the CLOCK resolution will be used to round up times for each
> timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> project will attempt to meet or exceed the current systems timer
> performance.

Within the kernel at least, it would be good to let drivers specify
desired resolution. Then a near-by value could be selected, perhaps
with some consideration for event type. (for cache reasons)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

Mark Salisbury wrote:
> 
> all this talk about which data structure to use and how to allocate memory is
> wy premature.
> 
> there needs to be a clear definition of the requirements that we wish to meet,
> including whether we are going to do ticked, tickless, or both
> 
> a func spec, for lack of a better term needs to be developed
> 
> then, when we go to design the thing, THEN is when we decide on the particular
> flavor of list/tree/heap/array/dbase that we use.
> 
> let's engineer this thing instead of hacking it.

Absolutely, find first draft attached.

Comments please.

George


~snip~

Functional Specification for the high-res-timers project.

http://sourceforge.net/projects/high-res-timers

We are developing code to implement the POSIX clocks & timers as defined
by IEEE Std 1003.1b-1993 Section 14.  (For an on line reference see our
home page: http://high-res-timers.sourceforge.net/ )
  
The API specifies the following functions (for details please see the spec):

int clock_settime(clockid_t clock_id, const struct timespec *tp);
int clock_gettime(clockid_t clock_id, struct timespec *tp);
int clock_getres(clockid_t clock_id, struct timespec *res);

int timer_creat(clockid_t clock_id, struct sigevent *evp,
timer_t *timerid);
int timer_delete(timer_t *timerid);

int timer_settime(timer_t *timerid, int flags, 
  const struct itimerspec *value,
  struct itimerspec *ovalue);
int timer_gettime(timer_t *timerid, struct itimerspec *value);
int timer_getoverrun(timer_t *timerid);

int nanosleep( const struct timesped *rqtp, struct timespec *rmtp);

In addition we expect that we will provide a high resolution timer for
kernel use (heck, we may provide several).

In all this code we will code to allow resolutions to 1 nanosecond second (the
max provided by the timespec structure).  The actual resolution of
any given clock will be fixed at compile time and the code will do its
work at a higher resolution to avoid round off errors as much as
possible.

We will provide several "clocks" as defined by the standard.  In
particular, the following capabilities will be attached to some clock,
regardless of the actual clock "name" we end up using:

CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
linux today). 

CLOCK_HIGH_RES a wall clock supporting timers with the highest
resolution the hardware supports.

CLOCK_1US a wall clock supporting timers with 1 micro second resolution
(if the hardware allows it).

CLOCK_UPTIME a clock that give the system up time.  (Does this clock
need to support timers?)

CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.

At the same time we will NOT support the following clocks:

CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
wall) of a given task.  

CLOCK_PROFILE a clock used to generate profiling events. 

CLOCK_???  any clock keyed to a task.

(Note that this does not mean that the clock system will not support the
virtual and profile clocks, but that they will not be accessible thru
the POSIX timer interface.)

It would be NICE if we could provide a way to hook other time support
into the system.  In particular a

CLOCK_WWV or CLOCK_GPS

might be nice.  The problem with these sorts of clocks is that they
imply an array of function pointers for each clock and function pointers
slow the code down because of their non predictability.  Never the less,
we will attempt to allow easy expansion in this direction.

Implications on the current kernel:

The high resolution timers will require a fast clock access with the
maximum supported resolution in order to convert relative times to
absolute times.  This same fast clock will be used to support the
various user and system time requests.

There are two ways to provide timers to the kernel.  For lack of a
better term we will refer to them as "ticked" and "tick less".  Until we
have performance information that implies that one or the other of these
methods is better in all cases we will provide both ticked and tick less
systems.  The variety to be used will be selected at configure time.

For tick less systems we will need to provide code to collect execution
times.  For the ticked system the current method of collection these
times will be used.  This project will NOT attempt to improve the
resolution of these timers, however, the high speed, high resolution
access to the current time will allow others to augment the system in
this area.

For the tick less system the project will also provide a time slice
expiration interrupt.

The timer list(s) (all pending timers) need to be organized so that the
following operations are fast:

a.) list insertion of an arbitrary timer,
b.) removal of canceled and expired timers, and
c.) finding the timer for "NOW" and its immediate followers.

Times in the timer list will be absolute and related to system up time.
These times will be converted to wall time as needed.

The POSIX interface 

Re: No 100 HZ timer!

2001-04-16 Thread Mark Salisbury

all this talk about which data structure to use and how to allocate memory is
wy premature.

there needs to be a clear definition of the requirements that we wish to meet,
including whether we are going to do ticked, tickless, or both

a func spec, for lack of a better term needs to be developed

then, when we go to design the thing, THEN is when we decide on the particular
flavor of list/tree/heap/array/dbase that we use.

let's engineer this thing instead of hacking it.



On Sun, 15 Apr 2001, Jamie Lokier wrote:
> Ben Greear wrote:
> > > Note that jumping around the array thrashes no more cache than
> > > traversing a tree (it touches the same number of elements).  I prefer
> > > heap-ordered trees though because fixed size is always a bad idea.
> > 
> > With a tree, you will be allocating and de-allocating for every
> > insert/delete right?
> 
> No, there is no memory allocation.
> 
> > On cache-coherency issues, wouldn't it be more likely to have a cache
> > hit when you are accessing one contigious (ie the array) piece of
> > memory?  A 4-k page will hold a lot of indexes!!
> 
> No, because when traversing an array-heap, you don't access contiguous
> entries.  You might get one or two more cache hits near the root of the
> heap.
> 
> > To get around the fixed size thing..could have
> > the array grow itself when needed (and probably never shrink again).
> 
> You could to that, but then you'd have to deal with memory allocation
> failures and memory deadlocks, making add_timer rather complicated.
> It's not acceptable for add_timer to fail or require kmalloc().
> 
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread Mark Salisbury

all this talk about which data structure to use and how to allocate memory is
wy premature.

there needs to be a clear definition of the requirements that we wish to meet,
including whether we are going to do ticked, tickless, or both

a func spec, for lack of a better term needs to be developed

then, when we go to design the thing, THEN is when we decide on the particular
flavor of list/tree/heap/array/dbase that we use.

let's engineer this thing instead of hacking it.



On Sun, 15 Apr 2001, Jamie Lokier wrote:
 Ben Greear wrote:
   Note that jumping around the array thrashes no more cache than
   traversing a tree (it touches the same number of elements).  I prefer
   heap-ordered trees though because fixed size is always a bad idea.
  
  With a tree, you will be allocating and de-allocating for every
  insert/delete right?
 
 No, there is no memory allocation.
 
  On cache-coherency issues, wouldn't it be more likely to have a cache
  hit when you are accessing one contigious (ie the array) piece of
  memory?  A 4-k page will hold a lot of indexes!!
 
 No, because when traversing an array-heap, you don't access contiguous
 entries.  You might get one or two more cache hits near the root of the
 heap.
 
  To get around the fixed size thing..could have
  the array grow itself when needed (and probably never shrink again).
 
 You could to that, but then you'd have to deal with memory allocation
 failures and memory deadlocks, making add_timer rather complicated.
 It's not acceptable for add_timer to fail or require kmalloc().
 
 -- Jamie
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

Mark Salisbury wrote:
 
 all this talk about which data structure to use and how to allocate memory is
 wy premature.
 
 there needs to be a clear definition of the requirements that we wish to meet,
 including whether we are going to do ticked, tickless, or both
 
 a func spec, for lack of a better term needs to be developed
 
 then, when we go to design the thing, THEN is when we decide on the particular
 flavor of list/tree/heap/array/dbase that we use.
 
 let's engineer this thing instead of hacking it.

Absolutely, find first draft attached.

Comments please.

George


~snip~

Functional Specification for the high-res-timers project.

http://sourceforge.net/projects/high-res-timers

We are developing code to implement the POSIX clocks  timers as defined
by IEEE Std 1003.1b-1993 Section 14.  (For an on line reference see our
home page: http://high-res-timers.sourceforge.net/ )
  
The API specifies the following functions (for details please see the spec):

int clock_settime(clockid_t clock_id, const struct timespec *tp);
int clock_gettime(clockid_t clock_id, struct timespec *tp);
int clock_getres(clockid_t clock_id, struct timespec *res);

int timer_creat(clockid_t clock_id, struct sigevent *evp,
timer_t *timerid);
int timer_delete(timer_t *timerid);

int timer_settime(timer_t *timerid, int flags, 
  const struct itimerspec *value,
  struct itimerspec *ovalue);
int timer_gettime(timer_t *timerid, struct itimerspec *value);
int timer_getoverrun(timer_t *timerid);

int nanosleep( const struct timesped *rqtp, struct timespec *rmtp);

In addition we expect that we will provide a high resolution timer for
kernel use (heck, we may provide several).

In all this code we will code to allow resolutions to 1 nanosecond second (the
max provided by the timespec structure).  The actual resolution of
any given clock will be fixed at compile time and the code will do its
work at a higher resolution to avoid round off errors as much as
possible.

We will provide several "clocks" as defined by the standard.  In
particular, the following capabilities will be attached to some clock,
regardless of the actual clock "name" we end up using:

CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
linux today). 

CLOCK_HIGH_RES a wall clock supporting timers with the highest
resolution the hardware supports.

CLOCK_1US a wall clock supporting timers with 1 micro second resolution
(if the hardware allows it).

CLOCK_UPTIME a clock that give the system up time.  (Does this clock
need to support timers?)

CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.

At the same time we will NOT support the following clocks:

CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
wall) of a given task.  

CLOCK_PROFILE a clock used to generate profiling events. 

CLOCK_???  any clock keyed to a task.

(Note that this does not mean that the clock system will not support the
virtual and profile clocks, but that they will not be accessible thru
the POSIX timer interface.)

It would be NICE if we could provide a way to hook other time support
into the system.  In particular a

CLOCK_WWV or CLOCK_GPS

might be nice.  The problem with these sorts of clocks is that they
imply an array of function pointers for each clock and function pointers
slow the code down because of their non predictability.  Never the less,
we will attempt to allow easy expansion in this direction.

Implications on the current kernel:

The high resolution timers will require a fast clock access with the
maximum supported resolution in order to convert relative times to
absolute times.  This same fast clock will be used to support the
various user and system time requests.

There are two ways to provide timers to the kernel.  For lack of a
better term we will refer to them as "ticked" and "tick less".  Until we
have performance information that implies that one or the other of these
methods is better in all cases we will provide both ticked and tick less
systems.  The variety to be used will be selected at configure time.

For tick less systems we will need to provide code to collect execution
times.  For the ticked system the current method of collection these
times will be used.  This project will NOT attempt to improve the
resolution of these timers, however, the high speed, high resolution
access to the current time will allow others to augment the system in
this area.

For the tick less system the project will also provide a time slice
expiration interrupt.

The timer list(s) (all pending timers) need to be organized so that the
following operations are fast:

a.) list insertion of an arbitrary timer,
b.) removal of canceled and expired timers, and
c.) finding the timer for "NOW" and its immediate followers.

Times in the timer list will be absolute and related to system up time.
These times will be converted to wall time as needed.

The POSIX interface provides for 

Re: No 100 HZ timer!

2001-04-16 Thread Albert D. Cahalan

 CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
 linux today). 

Except on the Alpha, and on some ARM systems, etc.
The HZ constant varies from 10 to 1200.

 At the same time we will NOT support the following clocks:
 
 CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
 wall) of a given task.  
...
 For tick less systems we will need to provide code to collect execution
 times.  For the ticked system the current method of collection these
 times will be used.  This project will NOT attempt to improve the
 resolution of these timers, however, the high speed, high resolution
 access to the current time will allow others to augment the system in
 this area.
...
 This project will NOT provide higher resolution accounting (i.e. user
 and system execution times).

It is nice to have accurate per-process user/system accounting.
Since you'd be touching the code anyway...

 The POSIX interface provides for "absolute" timers relative to a given
 clock.  When these timers are related to a "wall" clock they will need
 adjusting when the wall clock time is adjusted.  These adjustments are
 done for "leap seconds" and the date command.

This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
defined POSIX time that is none of the above. This is a horrid mess,
even ignoring gravity and speed. :-)

Can a second be 2 billion nanoseconds?
Can a nanosecond be twice as long as normal?
Can a second appear twice, with the nanoseconds getting reset?
Can a second never appear at all?
Can you compute times more than 6 months into the future?
How far does time deviate from solar time? Is this constrained?

If you deal with leap seconds, you have to have a table of them.
This table grows with time, with adjustments being made with only
about 6 months notice. So the user upgrades after a year or two,
and the installer discovers that the user has been running a
system that is unaware of the most recent leap second. Arrrgh.

Sure you want to touch this? The Austin group argued over it for
a very long time and never did find a really good solution.
Maybe you should just keep the code simple and fast, without any
concern for clock adjustments.

 In either a ticked or tick less system, it is expected that resolutions
 higher than 1/HZ will come with some additional overhead.  For this
 reason, the CLOCK resolution will be used to round up times for each
 timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
 project will attempt to meet or exceed the current systems timer
 performance.

Within the kernel at least, it would be good to let drivers specify
desired resolution. Then a near-by value could be selected, perhaps
with some consideration for event type. (for cache reasons)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

"Albert D. Cahalan" wrote:
 
  CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
  linux today).
 
 Except on the Alpha, and on some ARM systems, etc.
 The HZ constant varies from 10 to 1200.

I suspect we will want to use 10 ms resolution for a clock named
CLOCK_10MS :)
On the other hand we could have a CLOCK_1_OVER_HZ...  Actually with
high-res-timers the actual HZ value becomes a bit less important.  It
would be "nice" to keep 1/HZ == jiffie, however.
 
  At the same time we will NOT support the following clocks:
 
  CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
  wall) of a given task.
 ...
  For tick less systems we will need to provide code to collect execution
  times.  For the ticked system the current method of collection these
  times will be used.  This project will NOT attempt to improve the
  resolution of these timers, however, the high speed, high resolution
  access to the current time will allow others to augment the system in
  this area.
 ...
  This project will NOT provide higher resolution accounting (i.e. user
  and system execution times).
 
 It is nice to have accurate per-process user/system accounting.
 Since you'd be touching the code anyway...

Yeah sure... and will you pick up the ball on all the platform dependent
code to get high-res-timers on all the other platforms?  On second
thought I am reminded of the corollary to the old saw:  "The squeaking
wheel get the grease."  Which is: "He who complains most about the
squeaking gets to do the greasing."  Hint  hint.
 
  The POSIX interface provides for "absolute" timers relative to a given
  clock.  When these timers are related to a "wall" clock they will need
  adjusting when the wall clock time is adjusted.  These adjustments are
  done for "leap seconds" and the date command.
 
 This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
 defined POSIX time that is none of the above. This is a horrid mess,
 even ignoring gravity and speed. :-)
 
 Can a second be 2 billion nanoseconds?
 Can a nanosecond be twice as long as normal?
 Can a second appear twice, with the nanoseconds getting reset?
 Can a second never appear at all?
 Can you compute times more than 6 months into the future?
 How far does time deviate from solar time? Is this constrained?
 
 If you deal with leap seconds, you have to have a table of them.
 This table grows with time, with adjustments being made with only
 about 6 months notice. So the user upgrades after a year or two,
 and the installer discovers that the user has been running a
 system that is unaware of the most recent leap second. Arrrgh.
 
 Sure you want to touch this? The Austin group argued over it for
 a very long time and never did find a really good solution.
 Maybe you should just keep the code simple and fast, without any
 concern for clock adjustments.

There is a relatively simple way to handle this, at least from the
high-res-timers point of view.  First we convert all timers to absolute
uptime.  This is a nice well behaved time.  At boot time we peg the wall
clock to the uptime.  Then at any given time, wall time is boot wall
time + uptime.  Then date, leap seconds, etc. affect the pegged value of
boot wall time.  Using the POSIX CLOCK id we allow the user to ask for
either version of time.  Now if we define an array of struc clock_id
which contains pointers to such things as functions to return time, any
algorithm you might want can be plugged in to bend time as needed.  The
only fly in this mess is the NTP rate adjustment stuff.  This code is
supposed to allow the system to adjust its ticker to produce accurate
seconds and so gets at the root of the uptime counter be it in hardware
or software or some combination of the two.  But then that's what makes
life interesting :)
 
  In either a ticked or tick less system, it is expected that resolutions
  higher than 1/HZ will come with some additional overhead.  For this
  reason, the CLOCK resolution will be used to round up times for each
  timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
  project will attempt to meet or exceed the current systems timer
  performance.
 
 Within the kernel at least, it would be good to let drivers specify
 desired resolution. Then a near-by value could be selected, perhaps
 with some consideration for event type. (for cache reasons)

This could be done, however, I would prefer to provide CLOCK_s to do
this as per the standard.  What does the community say?  In either case
you get different resolutions, but in the latter case the possible
values are fixed at least at configure time.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread Mark Salisbury


 Given a system speed, there is a repeating timer rate which will consume
 100% of the system in handling the timer interrupts.  An attempt will
 be made to detect this rate and adjust the timer to prevent system
 lockup.  This adjustment will look like timer overruns to the user
 (i.e. we will take a percent of the interrupts and record the untaken
 interrupts as overruns)

just at first blush, there are some things in general but I need to read
this again and more closely

but, with POSIX timers, there is a nifty little restriction/protection built
into the spec regarding the re-insertion of short interval repeating timers.
that is: a repeating timer will not be re-inserted until AFTER the
associated signal handler has been handled.

this has some interesting consequences for signal handling and signal
delivery implementations, but importantly, it ensures that even a flood of
POSIX timers with very short repeat intervals will be handled cleanly.

I will get more detailed comments to you tomorrow.

Mark Salisbury

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-16 Thread george anzinger

Mark Salisbury wrote:
 
  Given a system speed, there is a repeating timer rate which will consume
  100% of the system in handling the timer interrupts.  An attempt will
  be made to detect this rate and adjust the timer to prevent system
  lockup.  This adjustment will look like timer overruns to the user
  (i.e. we will take a percent of the interrupts and record the untaken
  interrupts as overruns)
 
 just at first blush, there are some things in general but I need to read
 this again and more closely
 
 but, with POSIX timers, there is a nifty little restriction/protection built
 into the spec regarding the re-insertion of short interval repeating timers.
 that is: a repeating timer will not be re-inserted until AFTER the
 associated signal handler has been handled.

Actually what it says is: "Only a single signal shall be queued to the
process for a given timer at any point in time.  When a timer for which
a signal is still pending expires, no signal shall be queued, and a
timer overrun shall occur."

It then goes on to talk about the overrun count and how it is to be
managed.

What I am suggesting is that the system should detect when these
interrupts would come so fast as to stall the system and just set up a
percent of them while bumping the overrun count as if they had all
occured.

George

 
 this has some interesting consequences for signal handling and signal
 delivery implementations, but importantly, it ensures that even a flood of
 POSIX timers with very short repeat intervals will be handled cleanly.
 
 I will get more detailed comments to you tomorrow.
 
 Mark Salisbury
 
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-15 Thread Ben Greear

Jamie Lokier wrote:
> 
> george anzinger wrote:
> > Horst von Brand wrote:
> > >
> > > Ben Greear <[EMAIL PROTECTED]> said:
> > >
> > > [...]
> > >
> > > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > > front  It can be implemented in an array which should help cache
> > > > coherency and all those other things they talked about in school :)
> > >
> > > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > > size (bad idea) and the jumping around in the array thrashes the caches.
> > > --
> > And your solution is?
> 
> Note that jumping around the array thrashes no more cache than
> traversing a tree (it touches the same number of elements).  I prefer
> heap-ordered trees though because fixed size is always a bad idea.

With a tree, you will be allocating and de-allocating for every
insert/delete right?  That seems like a reasonable performance
hit that an array-based approach might not have... 

On cache-coherency issues, wouldn't it be more likely to have a cache hit when you are
accessing one contigious (ie the array) piece of memory?  A 4-k page
will hold a lot of indexes!!

To get around the fixed size thing..could have
the array grow itself when needed (and probably never shrink again).
This would suck if you did it often, but I'm assuming that it would
quickly grow to needed size and then stabalize...

> 
> Insertion is O(1) if entries can be predicted to be near
> enough some place in the list, be that the beginning, the end, or some
> marked places in the middle.
> 
> By the way, the current timer implementation only appears to be O(1) if
> you ignore the overhead of having to do a check on every tick, and the
> extra processing on table rollover.  For random timer usage patterns,
> that overhead adds up to O(log n), the same as a heap.
> 
> However for skewed usage patterns (likely in the kernel), the current
> table method avoids the O(log n) sorting overhead because long-delay
> timers are often removed before percolating down to the smallest tables.
> It is possible to produce a general purpose heap which also has this
> property.
> 
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-15 Thread Ben Greear

george anzinger wrote:
> 
> Horst von Brand wrote:
> >
> > Ben Greear <[EMAIL PROTECTED]> said:
> >
> > [...]
> >
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> >
> > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > size (bad idea) and the jumping around in the array thrashes the caches.

Jumping around an array can't be much worse than jumping around
a linked list, can it?

It does not have to be fixed length, though it wouldn't be log(n) to
grow the array, it can still be done...and once you reach maximal
size, you won't be growing it any more...

I had forgotten about the log(n) to delete, though log(n) is
still pretty good.  As others have suggested, it might be good
to have a linked list for very-soon-to-expire timers.  However,
they would have to be few enough that your linear insert was
not worse than a log(n) instert and a log(n) delete...

> > --
> And your solution is?


> 
> George

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-15 Thread Ben Greear

george anzinger wrote:
 
 Horst von Brand wrote:
 
  Ben Greear [EMAIL PROTECTED] said:
 
  [...]
 
   Wouldn't a heap be a good data structure for a list of timers?  Insertion
   is log(n) and finding the one with the least time is O(1), ie pop off the
   front  It can be implemented in an array which should help cache
   coherency and all those other things they talked about in school :)
 
  Insertion and deleting the first are both O(log N). Plus the array is fixed
  size (bad idea) and the jumping around in the array thrashes the caches.

Jumping around an array can't be much worse than jumping around
a linked list, can it?

It does not have to be fixed length, though it wouldn't be log(n) to
grow the array, it can still be done...and once you reach maximal
size, you won't be growing it any more...

I had forgotten about the log(n) to delete, though log(n) is
still pretty good.  As others have suggested, it might be good
to have a linked list for very-soon-to-expire timers.  However,
they would have to be few enough that your linear insert was
not worse than a log(n) instert and a log(n) delete...

  --
 And your solution is?


 
 George

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-15 Thread Ben Greear

Jamie Lokier wrote:
 
 george anzinger wrote:
  Horst von Brand wrote:
  
   Ben Greear [EMAIL PROTECTED] said:
  
   [...]
  
Wouldn't a heap be a good data structure for a list of timers?  Insertion
is log(n) and finding the one with the least time is O(1), ie pop off the
front  It can be implemented in an array which should help cache
coherency and all those other things they talked about in school :)
  
   Insertion and deleting the first are both O(log N). Plus the array is fixed
   size (bad idea) and the jumping around in the array thrashes the caches.
   --
  And your solution is?
 
 Note that jumping around the array thrashes no more cache than
 traversing a tree (it touches the same number of elements).  I prefer
 heap-ordered trees though because fixed size is always a bad idea.

With a tree, you will be allocating and de-allocating for every
insert/delete right?  That seems like a reasonable performance
hit that an array-based approach might not have... 

On cache-coherency issues, wouldn't it be more likely to have a cache hit when you are
accessing one contigious (ie the array) piece of memory?  A 4-k page
will hold a lot of indexes!!

To get around the fixed size thing..could have
the array grow itself when needed (and probably never shrink again).
This would suck if you did it often, but I'm assuming that it would
quickly grow to needed size and then stabalize...

 
 Insertion is O(1) if entries can be predicted to be near
 enough some place in the list, be that the beginning, the end, or some
 marked places in the middle.
 
 By the way, the current timer implementation only appears to be O(1) if
 you ignore the overhead of having to do a check on every tick, and the
 extra processing on table rollover.  For random timer usage patterns,
 that overhead adds up to O(log n), the same as a heap.
 
 However for skewed usage patterns (likely in the kernel), the current
 table method avoids the O(log n) sorting overhead because long-delay
 timers are often removed before percolating down to the smallest tables.
 It is possible to produce a general purpose heap which also has this
 property.
 
 -- Jamie
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
> Horst von Brand wrote:
> > 
> > Ben Greear <[EMAIL PROTECTED]> said:
> > 
> > [...]
> > 
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> > 
> > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > size (bad idea) and the jumping around in the array thrashes the caches.
> > --
> And your solution is?

Note that jumping around the array thrashes no more cache than
traversing a tree (it touches the same number of elements).  I prefer
heap-ordered trees though because fixed size is always a bad idea.

Insertion is O(1) if entries can be predicted to be near
enough some place in the list, be that the beginning, the end, or some
marked places in the middle.

By the way, the current timer implementation only appears to be O(1) if
you ignore the overhead of having to do a check on every tick, and the
extra processing on table rollover.  For random timer usage patterns,
that overhead adds up to O(log n), the same as a heap.

However for skewed usage patterns (likely in the kernel), the current
table method avoids the O(log n) sorting overhead because long-delay
timers are often removed before percolating down to the smallest tables.
It is possible to produce a general purpose heap which also has this
property.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
> If we are to do high-res-timers, I think we will always have to do some
> sort of a sort on insert.

Yes, that's called a priority queue...  (And you don't actually sort on
each insertion).

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Horst von Brand wrote:
> 
> Ben Greear <[EMAIL PROTECTED]> said:
> 
> [...]
> 
> > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > is log(n) and finding the one with the least time is O(1), ie pop off the
> > front  It can be implemented in an array which should help cache
> > coherency and all those other things they talked about in school :)
> 
> Insertion and deleting the first are both O(log N). Plus the array is fixed
> size (bad idea) and the jumping around in the array thrashes the caches.
> --
And your solution is?

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Horst von Brand

Ben Greear <[EMAIL PROTECTED]> said:

[...]

> Wouldn't a heap be a good data structure for a list of timers?  Insertion
> is log(n) and finding the one with the least time is O(1), ie pop off the
> front  It can be implemented in an array which should help cache
> coherency and all those other things they talked about in school :)

Insertion and deleting the first are both O(log N). Plus the array is fixed
size (bad idea) and the jumping around in the array thrashes the caches.
-- 
Horst von Brand [EMAIL PROTECTED]
Casilla 9G, Vin~a del Mar, Chile   +56 32 672616
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Jamie Lokier wrote:
> 
> george anzinger wrote:
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> > >
> > I must be missing something here.  You get log(n) from what?  B-trees?
> > How would you manage them to get the needed balance?  Stopping the world
> > to re-balance is worse than the cascade.  I guess I need to read up on
> > this stuff.  A good pointer would be much appreciated.
> 
> Look for "priority queues" and "heaps".  In its simplest form, you use a
> heap-ordered tree, which can be implemented using an array (that's
> usually how it's presented), or having the objects in the heap point to
> each other.
> 
> A heap-ordered tree is not as strictly ordered as, well, an ordered tree
> :-)  The rule is: if A is the parent of B and C, then A expires earlier
> than B, and A expires earlier than C.  There is no constraint on the
> relative expiry times of B and C.
> 
> There is no "stop the world" to rebalance, which you may consider an
> advantage over the present hierarchy of tables.  On the other hand, each
> insertion or deletion operation takes O(log n) time, where n is the
> number of items in the queue.  Although fairly fast, this bound can be
> improved if you know the typical insertion/deletion patterns, to O(1)
> for selected cases.  Also you should know that not all priority queues
> are based on heap-ordered trees.
> 
> Linux' current hierarchy of tables is a good example of optimisation: it
> is optimised for inserting and actually running short term timers, as
> well as inserting and deleting (before running) long term timers.  These
> extremes take O(1) for insertion, removal and expiry, including the
> "stop the world" time.  This should be considered before and move to a
> heap-based priority queue, which may turn out slower.
> 
> > Meanwhile, I keep thinking of a simple doubly linked list in time
> > order.  To speed it up keep an array of pointers to the first N whole
> > jiffie points and maybe pointers to coarser points beyond the first N.
> > Make N, say 512.  Build the pointers as needed.  This should give
> > something like O(n/N) insertion and O(1) removal.
> 
> You've just described the same as the current implementation, but with
> lists for longer term timers.  I.e. slower.  With your coarser points,
> you have to sort the front elements of the coarse list into a fine one
> from time to time.
> 
> The idea of having jiffie-point pointers into a data structure for fast
> insertion is a good one for speeding up any data structure for that
> common case, though.
> 
If we are to do high-res-timers, I think we will always have to do some
sort of a sort on insert.  If we can keep the sort to a VERY short list
and only do it on sub jiffie timers, we will have something that is VERY
close to what we have now.

I think that the density of timers tends to increase as they get closer
to "now".  This should allow coarser pointers for times more removed
from "now".  Still if we sort on insert we will not have to resort
later.

The pointers into the list, however, are perishable and need to be
refreshed as time passes.  My though was to do this only when a needed
pointer was found to be "expired" and then only for the needed pointer. 
On first need we would have a small overhead, but would remember for
next time.

Thanks for the good input

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
> > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > is log(n) and finding the one with the least time is O(1), ie pop off the
> > front  It can be implemented in an array which should help cache
> > coherency and all those other things they talked about in school :)
> > 
> I must be missing something here.  You get log(n) from what?  B-trees? 
> How would you manage them to get the needed balance?  Stopping the world
> to re-balance is worse than the cascade.  I guess I need to read up on
> this stuff.  A good pointer would be much appreciated. 

Look for "priority queues" and "heaps".  In its simplest form, you use a
heap-ordered tree, which can be implemented using an array (that's
usually how it's presented), or having the objects in the heap point to
each other.

A heap-ordered tree is not as strictly ordered as, well, an ordered tree
:-)  The rule is: if A is the parent of B and C, then A expires earlier
than B, and A expires earlier than C.  There is no constraint on the
relative expiry times of B and C.

There is no "stop the world" to rebalance, which you may consider an
advantage over the present hierarchy of tables.  On the other hand, each
insertion or deletion operation takes O(log n) time, where n is the
number of items in the queue.  Although fairly fast, this bound can be
improved if you know the typical insertion/deletion patterns, to O(1)
for selected cases.  Also you should know that not all priority queues
are based on heap-ordered trees.

Linux' current hierarchy of tables is a good example of optimisation: it
is optimised for inserting and actually running short term timers, as
well as inserting and deleting (before running) long term timers.  These
extremes take O(1) for insertion, removal and expiry, including the
"stop the world" time.  This should be considered before and move to a
heap-based priority queue, which may turn out slower.

> Meanwhile, I keep thinking of a simple doubly linked list in time
> order.  To speed it up keep an array of pointers to the first N whole
> jiffie points and maybe pointers to coarser points beyond the first N. 
> Make N, say 512.  Build the pointers as needed.  This should give
> something like O(n/N) insertion and O(1) removal.

You've just described the same as the current implementation, but with
lists for longer term timers.  I.e. slower.  With your coarser points,
you have to sort the front elements of the coarse list into a fine one
from time to time.

The idea of having jiffie-point pointers into a data structure for fast
insertion is a good one for speeding up any data structure for that
common case, though.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Ben Greear wrote:
> 
> Bret Indrelee wrote:
> >
> > On Thu, 12 Apr 2001, george anzinger wrote:
> > > Bret Indrelee wrote:
> > > >
> > > > On Thu, 12 Apr 2001, george anzinger wrote:
> > > > > Bret Indrelee wrote:
> > > > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > > > intelligently, adding it from the back or front of the list depending on
> > > > > > where it is in relation to existing entries.
> > > > >
> > > > > I think this is too slow, especially for a busy system, but there are
> > > > > solutions...
> > > >
> > > > It is better than the current solution.
> > >
> > > Uh, where are we talking about.  The current time list insert is real
> > > close to O(1) and never more than O(5).
> >
> > I don't like the cost of the cascades every (as I recall) 256
> > interrupts. This is more work than is done in the rest of the interrupt
> > processing, happens during the tick interrupt, and results in a rebuild of
> > much of the table.

Right, it needs to go, we need to eliminate the "lumps" in time :)
> >
> > -Bret
> 
> Wouldn't a heap be a good data structure for a list of timers?  Insertion
> is log(n) and finding the one with the least time is O(1), ie pop off the
> front  It can be implemented in an array which should help cache
> coherency and all those other things they talked about in school :)
> 
I must be missing something here.  You get log(n) from what?  B-trees? 
How would you manage them to get the needed balance?  Stopping the world
to re-balance is worse than the cascade.  I guess I need to read up on
this stuff.  A good pointer would be much appreciated. 

Meanwhile, I keep thinking of a simple doubly linked list in time
order.  To speed it up keep an array of pointers to the first N whole
jiffie points and maybe pointers to coarser points beyond the first N. 
Make N, say 512.  Build the pointers as needed.  This should give
something like O(n/N) insertion and O(1) removal.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Ben Greear wrote:
 
 Bret Indrelee wrote:
 
  On Thu, 12 Apr 2001, george anzinger wrote:
   Bret Indrelee wrote:
   
On Thu, 12 Apr 2001, george anzinger wrote:
 Bret Indrelee wrote:
  Keep all timers in a sorted double-linked list. Do the insert
  intelligently, adding it from the back or front of the list depending on
  where it is in relation to existing entries.

 I think this is too slow, especially for a busy system, but there are
 solutions...
   
It is better than the current solution.
  
   Uh, where are we talking about.  The current time list insert is real
   close to O(1) and never more than O(5).
 
  I don't like the cost of the cascades every (as I recall) 256
  interrupts. This is more work than is done in the rest of the interrupt
  processing, happens during the tick interrupt, and results in a rebuild of
  much of the table.

Right, it needs to go, we need to eliminate the "lumps" in time :)
 
  -Bret
 
 Wouldn't a heap be a good data structure for a list of timers?  Insertion
 is log(n) and finding the one with the least time is O(1), ie pop off the
 front  It can be implemented in an array which should help cache
 coherency and all those other things they talked about in school :)
 
I must be missing something here.  You get log(n) from what?  B-trees? 
How would you manage them to get the needed balance?  Stopping the world
to re-balance is worse than the cascade.  I guess I need to read up on
this stuff.  A good pointer would be much appreciated. 

Meanwhile, I keep thinking of a simple doubly linked list in time
order.  To speed it up keep an array of pointers to the first N whole
jiffie points and maybe pointers to coarser points beyond the first N. 
Make N, say 512.  Build the pointers as needed.  This should give
something like O(n/N) insertion and O(1) removal.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
  Wouldn't a heap be a good data structure for a list of timers?  Insertion
  is log(n) and finding the one with the least time is O(1), ie pop off the
  front  It can be implemented in an array which should help cache
  coherency and all those other things they talked about in school :)
  
 I must be missing something here.  You get log(n) from what?  B-trees? 
 How would you manage them to get the needed balance?  Stopping the world
 to re-balance is worse than the cascade.  I guess I need to read up on
 this stuff.  A good pointer would be much appreciated. 

Look for "priority queues" and "heaps".  In its simplest form, you use a
heap-ordered tree, which can be implemented using an array (that's
usually how it's presented), or having the objects in the heap point to
each other.

A heap-ordered tree is not as strictly ordered as, well, an ordered tree
:-)  The rule is: if A is the parent of B and C, then A expires earlier
than B, and A expires earlier than C.  There is no constraint on the
relative expiry times of B and C.

There is no "stop the world" to rebalance, which you may consider an
advantage over the present hierarchy of tables.  On the other hand, each
insertion or deletion operation takes O(log n) time, where n is the
number of items in the queue.  Although fairly fast, this bound can be
improved if you know the typical insertion/deletion patterns, to O(1)
for selected cases.  Also you should know that not all priority queues
are based on heap-ordered trees.

Linux' current hierarchy of tables is a good example of optimisation: it
is optimised for inserting and actually running short term timers, as
well as inserting and deleting (before running) long term timers.  These
extremes take O(1) for insertion, removal and expiry, including the
"stop the world" time.  This should be considered before and move to a
heap-based priority queue, which may turn out slower.

 Meanwhile, I keep thinking of a simple doubly linked list in time
 order.  To speed it up keep an array of pointers to the first N whole
 jiffie points and maybe pointers to coarser points beyond the first N. 
 Make N, say 512.  Build the pointers as needed.  This should give
 something like O(n/N) insertion and O(1) removal.

You've just described the same as the current implementation, but with
lists for longer term timers.  I.e. slower.  With your coarser points,
you have to sort the front elements of the coarse list into a fine one
from time to time.

The idea of having jiffie-point pointers into a data structure for fast
insertion is a good one for speeding up any data structure for that
common case, though.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Jamie Lokier wrote:
 
 george anzinger wrote:
   Wouldn't a heap be a good data structure for a list of timers?  Insertion
   is log(n) and finding the one with the least time is O(1), ie pop off the
   front  It can be implemented in an array which should help cache
   coherency and all those other things they talked about in school :)
  
  I must be missing something here.  You get log(n) from what?  B-trees?
  How would you manage them to get the needed balance?  Stopping the world
  to re-balance is worse than the cascade.  I guess I need to read up on
  this stuff.  A good pointer would be much appreciated.
 
 Look for "priority queues" and "heaps".  In its simplest form, you use a
 heap-ordered tree, which can be implemented using an array (that's
 usually how it's presented), or having the objects in the heap point to
 each other.
 
 A heap-ordered tree is not as strictly ordered as, well, an ordered tree
 :-)  The rule is: if A is the parent of B and C, then A expires earlier
 than B, and A expires earlier than C.  There is no constraint on the
 relative expiry times of B and C.
 
 There is no "stop the world" to rebalance, which you may consider an
 advantage over the present hierarchy of tables.  On the other hand, each
 insertion or deletion operation takes O(log n) time, where n is the
 number of items in the queue.  Although fairly fast, this bound can be
 improved if you know the typical insertion/deletion patterns, to O(1)
 for selected cases.  Also you should know that not all priority queues
 are based on heap-ordered trees.
 
 Linux' current hierarchy of tables is a good example of optimisation: it
 is optimised for inserting and actually running short term timers, as
 well as inserting and deleting (before running) long term timers.  These
 extremes take O(1) for insertion, removal and expiry, including the
 "stop the world" time.  This should be considered before and move to a
 heap-based priority queue, which may turn out slower.
 
  Meanwhile, I keep thinking of a simple doubly linked list in time
  order.  To speed it up keep an array of pointers to the first N whole
  jiffie points and maybe pointers to coarser points beyond the first N.
  Make N, say 512.  Build the pointers as needed.  This should give
  something like O(n/N) insertion and O(1) removal.
 
 You've just described the same as the current implementation, but with
 lists for longer term timers.  I.e. slower.  With your coarser points,
 you have to sort the front elements of the coarse list into a fine one
 from time to time.
 
 The idea of having jiffie-point pointers into a data structure for fast
 insertion is a good one for speeding up any data structure for that
 common case, though.
 
If we are to do high-res-timers, I think we will always have to do some
sort of a sort on insert.  If we can keep the sort to a VERY short list
and only do it on sub jiffie timers, we will have something that is VERY
close to what we have now.

I think that the density of timers tends to increase as they get closer
to "now".  This should allow coarser pointers for times more removed
from "now".  Still if we sort on insert we will not have to resort
later.

The pointers into the list, however, are perishable and need to be
refreshed as time passes.  My though was to do this only when a needed
pointer was found to be "expired" and then only for the needed pointer. 
On first need we would have a small overhead, but would remember for
next time.

Thanks for the good input

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Horst von Brand

Ben Greear [EMAIL PROTECTED] said:

[...]

 Wouldn't a heap be a good data structure for a list of timers?  Insertion
 is log(n) and finding the one with the least time is O(1), ie pop off the
 front  It can be implemented in an array which should help cache
 coherency and all those other things they talked about in school :)

Insertion and deleting the first are both O(log N). Plus the array is fixed
size (bad idea) and the jumping around in the array thrashes the caches.
-- 
Horst von Brand [EMAIL PROTECTED]
Casilla 9G, Vin~a del Mar, Chile   +56 32 672616
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread george anzinger

Horst von Brand wrote:
 
 Ben Greear [EMAIL PROTECTED] said:
 
 [...]
 
  Wouldn't a heap be a good data structure for a list of timers?  Insertion
  is log(n) and finding the one with the least time is O(1), ie pop off the
  front  It can be implemented in an array which should help cache
  coherency and all those other things they talked about in school :)
 
 Insertion and deleting the first are both O(log N). Plus the array is fixed
 size (bad idea) and the jumping around in the array thrashes the caches.
 --
And your solution is?

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
 If we are to do high-res-timers, I think we will always have to do some
 sort of a sort on insert.

Yes, that's called a priority queue...  (And you don't actually sort on
each insertion).

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-13 Thread Jamie Lokier

george anzinger wrote:
 Horst von Brand wrote:
  
  Ben Greear [EMAIL PROTECTED] said:
  
  [...]
  
   Wouldn't a heap be a good data structure for a list of timers?  Insertion
   is log(n) and finding the one with the least time is O(1), ie pop off the
   front  It can be implemented in an array which should help cache
   coherency and all those other things they talked about in school :)
  
  Insertion and deleting the first are both O(log N). Plus the array is fixed
  size (bad idea) and the jumping around in the array thrashes the caches.
  --
 And your solution is?

Note that jumping around the array thrashes no more cache than
traversing a tree (it touches the same number of elements).  I prefer
heap-ordered trees though because fixed size is always a bad idea.

Insertion is O(1) if entries can be predicted to be near
enough some place in the list, be that the beginning, the end, or some
marked places in the middle.

By the way, the current timer implementation only appears to be O(1) if
you ignore the overhead of having to do a check on every tick, and the
extra processing on table rollover.  For random timer usage patterns,
that overhead adds up to O(log n), the same as a heap.

However for skewed usage patterns (likely in the kernel), the current
table method avoids the O(log n) sorting overhead because long-delay
timers are often removed before percolating down to the smallest tables.
It is possible to produce a general purpose heap which also has this
property.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Ben Greear

Bret Indrelee wrote:
> 
> On Thu, 12 Apr 2001, george anzinger wrote:
> > Bret Indrelee wrote:
> > >
> > > On Thu, 12 Apr 2001, george anzinger wrote:
> > > > Bret Indrelee wrote:
> > > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > > intelligently, adding it from the back or front of the list depending on
> > > > > where it is in relation to existing entries.
> > > >
> > > > I think this is too slow, especially for a busy system, but there are
> > > > solutions...
> > >
> > > It is better than the current solution.
> >
> > Uh, where are we talking about.  The current time list insert is real
> > close to O(1) and never more than O(5).
> 
> I don't like the cost of the cascades every (as I recall) 256
> interrupts. This is more work than is done in the rest of the interrupt
> processing, happens during the tick interrupt, and results in a rebuild of
> much of the table.
> 
> -Bret

Wouldn't a heap be a good data structure for a list of timers?  Insertion
is log(n) and finding the one with the least time is O(1), ie pop off the
front  It can be implemented in an array which should help cache
coherency and all those other things they talked about in school :)

Ben
> 
> --
> Bret Indrelee |  Sometimes, to be deep, we must act shallow!
> [EMAIL PROTECTED]   |  -Riff in The Quatrix
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, george anzinger wrote:
> Bret Indrelee wrote:
> > 
> > On Thu, 12 Apr 2001, george anzinger wrote:
> > > Bret Indrelee wrote:
> > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > intelligently, adding it from the back or front of the list depending on
> > > > where it is in relation to existing entries.
> > >
> > > I think this is too slow, especially for a busy system, but there are
> > > solutions...
> > 
> > It is better than the current solution.
> 
> Uh, where are we talking about.  The current time list insert is real
> close to O(1) and never more than O(5).

I don't like the cost of the cascades every (as I recall) 256
interrupts. This is more work than is done in the rest of the interrupt
processing, happens during the tick interrupt, and results in a rebuild of
much of the table.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread george anzinger

Bret Indrelee wrote:
> 
> On Thu, 12 Apr 2001, george anzinger wrote:
> > Bret Indrelee wrote:
> > > Keep all timers in a sorted double-linked list. Do the insert
> > > intelligently, adding it from the back or front of the list depending on
> > > where it is in relation to existing entries.
> >
> > I think this is too slow, especially for a busy system, but there are
> > solutions...
> 
> It is better than the current solution.

Uh, where are we talking about.  The current time list insert is real
close to O(1) and never more than O(5).
> 
> The insert takes the most time, having to scan through the list. If you
> had to scan the whole list it would be O(n) with a simple linked list. If
> you insert it from the end, it is almost always going to be less than
> that.

Right, but compared to the current O(5) max, this is just too long.
> 
> The time to remove is O(1).
> 
> Fetching the first element from the list is also O(1), but you may have to
> fetch several items that have all expired. Here you could do something
> clever. Just make sure it is O(1) to determine if the list is empty.
> 
I would hope to move expired timers to another list or just process
them.  In any case they should not be a problem here.

One of the posts that started all this mentioned a tick less system (on
a 360 I think) that used the current time list.  They had to scan
forward in time to find the next event and easy 10 ms was a new list to
look at.  Conclusion: The current list structure is NOT organized for
tick less time keeping.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, george anzinger wrote:
> Bret Indrelee wrote:
> > Keep all timers in a sorted double-linked list. Do the insert
> > intelligently, adding it from the back or front of the list depending on
> > where it is in relation to existing entries.
> 
> I think this is too slow, especially for a busy system, but there are
> solutions...

It is better than the current solution.

The insert takes the most time, having to scan through the list. If you
had to scan the whole list it would be O(n) with a simple linked list. If
you insert it from the end, it is almost always going to be less than
that.

The time to remove is O(1).

Fetching the first element from the list is also O(1), but you may have to
fetch several items that have all expired. Here you could do something
clever. Just make sure it is O(1) to determine if the list is empty.

[ snip ]
> > The real trick is to do a lot less processing on every tick than is
> > currently done. Current generation PCs can easily handle 1000s of
> > interrupts a second if you keep the overhead small.
> 
> I don't see the logic here.  Having taken the interrupt, one would tend
> to want to do as much as possible, rather than schedule another
> interrupt to continue the processing.  Rather, I think you are trying to
> say that we can afford to take more interrupts for time keeping.  Still,
> I think what we are trying to get with tick less timers is a system that
> takes FEWER interrupts, not more.

The system should be CAPABLE of handling 1000s of interrupts without
excessive system load.

The actual number of interrupts would be reduced if we adjusted the
interrupt interval based on the head of the list.

Two different things.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread george anzinger

Bret Indrelee wrote:
> 
> Mikulas Patocka ([EMAIL PROTECTED]) wrote:
> > Adding and removing timers happens much more frequently than PIT tick,
> > so
> > comparing these times is pointless.
> >
> > If you have some device and timer protecting it from lockup on buggy
> > hardware, you actually
> >
> > send request to device
> > add timer
> >
> > receive interrupt and read reply
> > remove timer
> >
> > With the curent timer semantics, the cost of add timer and del timer is
> > nearly zero. If you had to reprogram the PIT on each request and reply,
> > it
> > would slow things down.
> >
> > Note that you call mod_timer also on each packet received - and in worst
> > case (which may happen), you end up reprogramming the PIT on each
> > packet.
> 
> You can still have nearly zero cost for the normal case. Avoiding worst
> case behaviour is also pretty easy.
> 
> You only reprogram the PIT if you have to change the interval.
> 
> Keep all timers in a sorted double-linked list. Do the insert
> intelligently, adding it from the back or front of the list depending on
> where it is in relation to existing entries.

I think this is too slow, especially for a busy system, but there are
solutions...
> 
> You only need to reprogram the interval timer when:
> 1. You've got a new entry at the head of the list
> AND
> 2. You've reprogrammed the interval to something larger than the time to
> the new head of list.

Uh, I think 1. IMPLIES 2.  If it is at the head, it must be closer in
than what the system is waiting for (unless, of course its time has
already passed, but lets not consider that here).
> 
> In the case of a device timeout, it is usually not going to be inserted at
> the head of the list. It is very seldom going to actually timeout.

Right, and if the system doesn't have many timers, thus putting this at
the head, it has the time to do the extra work.
> 
> Choose your interval wisely, only increasing it when you know it will pay
> off. The best way of doing this would probably be to track some sort
> of LCD for timeouts.

I wonder about a more relaxed device timeout semantic that says
something like: wait X + next timer interrupt.  This would cause the
timer insert code to find an entry at least X out and put this timer
just after it.  There are other ways to do this of course.  The question
here is: Would this be worth while?
> 
> The real trick is to do a lot less processing on every tick than is
> currently done. Current generation PCs can easily handle 1000s of
> interrupts a second if you keep the overhead small.

I don't see the logic here.  Having taken the interrupt, one would tend
to want to do as much as possible, rather than schedule another
interrupt to continue the processing.  Rather, I think you are trying to
say that we can afford to take more interrupts for time keeping.  Still,
I think what we are trying to get with tick less timers is a system that
takes FEWER interrupts, not more.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, Mark Salisbury wrote:
> On Wed, 11 Apr 2001, Bret Indrelee wrote:
> > Current generation PCs can easily handle 1000s of
> > interrupts a second if you keep the overhead small.
> 
> the PC centric implementation of the ticked system is one of its major flaws.
> 
> there are architectures which the cost of a fixed interval is the same as a
> variable interval, i.e. no reload register, so you must explicitely load a
> value each interrupt anyway. and if you want accurate intervals, you must
> perform a calculation each reload, and not just load a static value, because
> you don't know how many cycles it took from the time the interrupt happened
> till you get to the reload line because the int may have been masked or lower
> pri than another interrupt.

People were saying the cost of adjusting the PIT was high.

On those archs where this is true, you would want to avoid changing the
interval timer.

On a more reasonable architechure, you would change it each time the head
of the timer list changed.

> also, why handle 1000's of interrupts if you only need to handle 10?

There is no reason.

I was pointing out that current hardware can easily handle this sort of
load, not advocating that distributions change HZ to 1000.


On Linux 2.2, if you change HZ to 400 your system is going to run at
about 70% of speed. The thing is it is limited to this value, bumping it
any higher doesn't cause a change.

If you reprogram the RTC to generate a 1000 HZ interrupt pulse, as I 
recall there is about a 3% change in performance. This is with a constant
rate interrupt, making it variable rate would reduce this.

You can verify this yourself. Get whatever benchmark you prefer. Run it on
a system. Rebuild after changing HZ to 400 and run it again. Restore HZ
and use the RTC driver to produce an interval of 1024 HZ, run your
benchmark again. Changing HZ had a huge performance impact in 2.2.13, I'm
pretty sure it didn't change much in later 2.2 releases.

Seems to me like there is a problem here. I'm pretty sure it is a
combination of the overhead of the cascaded timers combined with the
very high overhead of the scheduler that causes this. Someday this should
be fixed.


What I would like to see is support for a higher resolution timer in
Linux, for those of us that need times down to about 1mSec. Those systems
that can quickly reprogram the interval timer would do so each time a new
value was at the head of list, others would have to do it more
intelligently.

Regardless of how it is done, forcing the system to run a constant 1000 HZ
interrupt through the system should not impact system performance more
than a few percent. If it does, something was done wrong. When there is
need for the higher resolution timer, people will accept the overhead. On
most systems it would not be used, so there would be no reason to run the
higher rate timer.


-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Mark Salisbury


On Wed, 11 Apr 2001, Bret Indrelee wrote:
> Current generation PCs can easily handle 1000s of
> interrupts a second if you keep the overhead small.

the PC centric implementation of the ticked system is one of its major flaws.

there are architectures which the cost of a fixed interval is the same as a
variable interval, i.e. no reload register, so you must explicitely load a
value each interrupt anyway. and if you want accurate intervals, you must
perform a calculation each reload, and not just load a static value, because
you don't know how many cycles it took from the time the interrupt happened
till you get to the reload line because the int may have been masked or lower
pri than another interrupt.

also, why handle 1000's of interrupts if you only need to handle 10?

-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Maciej W. Rozycki

On Wed, 11 Apr 2001, Jamie Lokier wrote:

> Think of the original 64k and 256k VGA cards.  I think some of those
> didn't have an irq, but did have a way to read the progress of the
> raster, which you could PLL to using timer interrupts.  Some video games
> still look fine at 320x200 :-)

 The *original* VGA, i.e. the PS/2 one, did have an IRQ, IIRC (according
to docs -- I haven't ever seen one).  Cheap clones might have lacked it,
though.

 Then there is workstation (non-PC)  hardware from early '90s which we run
on and which uses an IRQ-driven interface to graphics adapters -- not only
for vsync. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Jamie Lokier

watermodem wrote:
> As somebody who is now debating how to measure latencies in a 
> giga-bit ethernet environment with several components doing 
> L3 switching in much less than 10 micro-seconds ... (hardware)
> I agree that some method is need to achieve higher resolutions.  
> (Sigh... I will likely need to buy something big and expensive)
> {this is for a system to make use of L. Yarrow's little protocol}

Use Alteon/Netgear cards, everyone else seems to be :-)  We get
measurements on the order of 100ns, if we are just measuring network
latencies.  (Data is not transferred over the PCI bus).

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Jamie Lokier

John Alvord wrote:
> I bumped into a funny non-optimization a few years ago. A system with
> a timer queue like the above had been "optimized" by keeping old timer
> elements... ready for new tasks to link onto and activate. The
> granularity was 1 millsecond. Over time, all timer values from 0 to
> roughly 10 minutes had been used. That resulted in 60,000 permanent
> storage fragments laying about... a significant fragmentation problem.
> The end was a forced recycle every month or so.

This is the sort of thing that Linux does with slab, dentry, inode
caches and so on.  In theory the memory is reclaimed as required :-)

It's not a big issue with timers, as the timer elements are fixed size
structures that tend to be embedded in other structures.  So the
lifetime of the timer element is the same as the lifetime of the object
associated with the timer.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Jamie Lokier

John Alvord wrote:
 I bumped into a funny non-optimization a few years ago. A system with
 a timer queue like the above had been "optimized" by keeping old timer
 elements... ready for new tasks to link onto and activate. The
 granularity was 1 millsecond. Over time, all timer values from 0 to
 roughly 10 minutes had been used. That resulted in 60,000 permanent
 storage fragments laying about... a significant fragmentation problem.
 The end was a forced recycle every month or so.

This is the sort of thing that Linux does with slab, dentry, inode
caches and so on.  In theory the memory is reclaimed as required :-)

It's not a big issue with timers, as the timer elements are fixed size
structures that tend to be embedded in other structures.  So the
lifetime of the timer element is the same as the lifetime of the object
associated with the timer.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Jamie Lokier

watermodem wrote:
 As somebody who is now debating how to measure latencies in a 
 giga-bit ethernet environment with several components doing 
 L3 switching in much less than 10 micro-seconds ... (hardware)
 I agree that some method is need to achieve higher resolutions.  
 (Sigh... I will likely need to buy something big and expensive)
 {this is for a system to make use of L. Yarrow's little protocol}

Use Alteon/Netgear cards, everyone else seems to be :-)  We get
measurements on the order of 100ns, if we are just measuring network
latencies.  (Data is not transferred over the PCI bus).

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Maciej W. Rozycki

On Wed, 11 Apr 2001, Jamie Lokier wrote:

 Think of the original 64k and 256k VGA cards.  I think some of those
 didn't have an irq, but did have a way to read the progress of the
 raster, which you could PLL to using timer interrupts.  Some video games
 still look fine at 320x200 :-)

 The *original* VGA, i.e. the PS/2 one, did have an IRQ, IIRC (according
to docs -- I haven't ever seen one).  Cheap clones might have lacked it,
though.

 Then there is workstation (non-PC)  hardware from early '90s which we run
on and which uses an IRQ-driven interface to graphics adapters -- not only
for vsync. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-12 Thread Mark Salisbury


On Wed, 11 Apr 2001, Bret Indrelee wrote:
 Current generation PCs can easily handle 1000s of
 interrupts a second if you keep the overhead small.

the PC centric implementation of the ticked system is one of its major flaws.

there are architectures which the cost of a fixed interval is the same as a
variable interval, i.e. no reload register, so you must explicitely load a
value each interrupt anyway. and if you want accurate intervals, you must
perform a calculation each reload, and not just load a static value, because
you don't know how many cycles it took from the time the interrupt happened
till you get to the reload line because the int may have been masked or lower
pri than another interrupt.

also, why handle 1000's of interrupts if you only need to handle 10?

-- 
/***
**   Mark Salisbury | Mercury Computer Systems**
**   [EMAIL PROTECTED] | System OS - Kernel Team **
****
**  I will be riding in the Multiple Sclerosis**
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,   **
**  please contact me.**
***/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, Mark Salisbury wrote:
 On Wed, 11 Apr 2001, Bret Indrelee wrote:
  Current generation PCs can easily handle 1000s of
  interrupts a second if you keep the overhead small.
 
 the PC centric implementation of the ticked system is one of its major flaws.
 
 there are architectures which the cost of a fixed interval is the same as a
 variable interval, i.e. no reload register, so you must explicitely load a
 value each interrupt anyway. and if you want accurate intervals, you must
 perform a calculation each reload, and not just load a static value, because
 you don't know how many cycles it took from the time the interrupt happened
 till you get to the reload line because the int may have been masked or lower
 pri than another interrupt.

People were saying the cost of adjusting the PIT was high.

On those archs where this is true, you would want to avoid changing the
interval timer.

On a more reasonable architechure, you would change it each time the head
of the timer list changed.

 also, why handle 1000's of interrupts if you only need to handle 10?

There is no reason.

I was pointing out that current hardware can easily handle this sort of
load, not advocating that distributions change HZ to 1000.


On Linux 2.2, if you change HZ to 400 your system is going to run at
about 70% of speed. The thing is it is limited to this value, bumping it
any higher doesn't cause a change.

If you reprogram the RTC to generate a 1000 HZ interrupt pulse, as I 
recall there is about a 3% change in performance. This is with a constant
rate interrupt, making it variable rate would reduce this.

You can verify this yourself. Get whatever benchmark you prefer. Run it on
a system. Rebuild after changing HZ to 400 and run it again. Restore HZ
and use the RTC driver to produce an interval of 1024 HZ, run your
benchmark again. Changing HZ had a huge performance impact in 2.2.13, I'm
pretty sure it didn't change much in later 2.2 releases.

Seems to me like there is a problem here. I'm pretty sure it is a
combination of the overhead of the cascaded timers combined with the
very high overhead of the scheduler that causes this. Someday this should
be fixed.


What I would like to see is support for a higher resolution timer in
Linux, for those of us that need times down to about 1mSec. Those systems
that can quickly reprogram the interval timer would do so each time a new
value was at the head of list, others would have to do it more
intelligently.

Regardless of how it is done, forcing the system to run a constant 1000 HZ
interrupt through the system should not impact system performance more
than a few percent. If it does, something was done wrong. When there is
need for the higher resolution timer, people will accept the overhead. On
most systems it would not be used, so there would be no reason to run the
higher rate timer.


-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread george anzinger

Bret Indrelee wrote:
 
 Mikulas Patocka ([EMAIL PROTECTED]) wrote:
  Adding and removing timers happens much more frequently than PIT tick,
  so
  comparing these times is pointless.
 
  If you have some device and timer protecting it from lockup on buggy
  hardware, you actually
 
  send request to device
  add timer
 
  receive interrupt and read reply
  remove timer
 
  With the curent timer semantics, the cost of add timer and del timer is
  nearly zero. If you had to reprogram the PIT on each request and reply,
  it
  would slow things down.
 
  Note that you call mod_timer also on each packet received - and in worst
  case (which may happen), you end up reprogramming the PIT on each
  packet.
 
 You can still have nearly zero cost for the normal case. Avoiding worst
 case behaviour is also pretty easy.
 
 You only reprogram the PIT if you have to change the interval.
 
 Keep all timers in a sorted double-linked list. Do the insert
 intelligently, adding it from the back or front of the list depending on
 where it is in relation to existing entries.

I think this is too slow, especially for a busy system, but there are
solutions...
 
 You only need to reprogram the interval timer when:
 1. You've got a new entry at the head of the list
 AND
 2. You've reprogrammed the interval to something larger than the time to
 the new head of list.

Uh, I think 1. IMPLIES 2.  If it is at the head, it must be closer in
than what the system is waiting for (unless, of course its time has
already passed, but lets not consider that here).
 
 In the case of a device timeout, it is usually not going to be inserted at
 the head of the list. It is very seldom going to actually timeout.

Right, and if the system doesn't have many timers, thus putting this at
the head, it has the time to do the extra work.
 
 Choose your interval wisely, only increasing it when you know it will pay
 off. The best way of doing this would probably be to track some sort
 of LCD for timeouts.

I wonder about a more relaxed device timeout semantic that says
something like: wait X + next timer interrupt.  This would cause the
timer insert code to find an entry at least X out and put this timer
just after it.  There are other ways to do this of course.  The question
here is: Would this be worth while?
 
 The real trick is to do a lot less processing on every tick than is
 currently done. Current generation PCs can easily handle 1000s of
 interrupts a second if you keep the overhead small.

I don't see the logic here.  Having taken the interrupt, one would tend
to want to do as much as possible, rather than schedule another
interrupt to continue the processing.  Rather, I think you are trying to
say that we can afford to take more interrupts for time keeping.  Still,
I think what we are trying to get with tick less timers is a system that
takes FEWER interrupts, not more.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, george anzinger wrote:
 Bret Indrelee wrote:
  Keep all timers in a sorted double-linked list. Do the insert
  intelligently, adding it from the back or front of the list depending on
  where it is in relation to existing entries.
 
 I think this is too slow, especially for a busy system, but there are
 solutions...

It is better than the current solution.

The insert takes the most time, having to scan through the list. If you
had to scan the whole list it would be O(n) with a simple linked list. If
you insert it from the end, it is almost always going to be less than
that.

The time to remove is O(1).

Fetching the first element from the list is also O(1), but you may have to
fetch several items that have all expired. Here you could do something
clever. Just make sure it is O(1) to determine if the list is empty.

[ snip ]
  The real trick is to do a lot less processing on every tick than is
  currently done. Current generation PCs can easily handle 1000s of
  interrupts a second if you keep the overhead small.
 
 I don't see the logic here.  Having taken the interrupt, one would tend
 to want to do as much as possible, rather than schedule another
 interrupt to continue the processing.  Rather, I think you are trying to
 say that we can afford to take more interrupts for time keeping.  Still,
 I think what we are trying to get with tick less timers is a system that
 takes FEWER interrupts, not more.

The system should be CAPABLE of handling 1000s of interrupts without
excessive system load.

The actual number of interrupts would be reduced if we adjusted the
interrupt interval based on the head of the list.

Two different things.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread george anzinger

Bret Indrelee wrote:
 
 On Thu, 12 Apr 2001, george anzinger wrote:
  Bret Indrelee wrote:
   Keep all timers in a sorted double-linked list. Do the insert
   intelligently, adding it from the back or front of the list depending on
   where it is in relation to existing entries.
 
  I think this is too slow, especially for a busy system, but there are
  solutions...
 
 It is better than the current solution.

Uh, where are we talking about.  The current time list insert is real
close to O(1) and never more than O(5).
 
 The insert takes the most time, having to scan through the list. If you
 had to scan the whole list it would be O(n) with a simple linked list. If
 you insert it from the end, it is almost always going to be less than
 that.

Right, but compared to the current O(5) max, this is just too long.
 
 The time to remove is O(1).
 
 Fetching the first element from the list is also O(1), but you may have to
 fetch several items that have all expired. Here you could do something
 clever. Just make sure it is O(1) to determine if the list is empty.
 
I would hope to move expired timers to another list or just process
them.  In any case they should not be a problem here.

One of the posts that started all this mentioned a tick less system (on
a 360 I think) that used the current time list.  They had to scan
forward in time to find the next event and easy 10 ms was a new list to
look at.  Conclusion: The current list structure is NOT organized for
tick less time keeping.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Bret Indrelee

On Thu, 12 Apr 2001, george anzinger wrote:
 Bret Indrelee wrote:
  
  On Thu, 12 Apr 2001, george anzinger wrote:
   Bret Indrelee wrote:
Keep all timers in a sorted double-linked list. Do the insert
intelligently, adding it from the back or front of the list depending on
where it is in relation to existing entries.
  
   I think this is too slow, especially for a busy system, but there are
   solutions...
  
  It is better than the current solution.
 
 Uh, where are we talking about.  The current time list insert is real
 close to O(1) and never more than O(5).

I don't like the cost of the cascades every (as I recall) 256
interrupts. This is more work than is done in the rest of the interrupt
processing, happens during the tick interrupt, and results in a rebuild of
much of the table.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-12 Thread Ben Greear

Bret Indrelee wrote:
 
 On Thu, 12 Apr 2001, george anzinger wrote:
  Bret Indrelee wrote:
  
   On Thu, 12 Apr 2001, george anzinger wrote:
Bret Indrelee wrote:
 Keep all timers in a sorted double-linked list. Do the insert
 intelligently, adding it from the back or front of the list depending on
 where it is in relation to existing entries.
   
I think this is too slow, especially for a busy system, but there are
solutions...
  
   It is better than the current solution.
 
  Uh, where are we talking about.  The current time list insert is real
  close to O(1) and never more than O(5).
 
 I don't like the cost of the cascades every (as I recall) 256
 interrupts. This is more work than is done in the rest of the interrupt
 processing, happens during the tick interrupt, and results in a rebuild of
 much of the table.
 
 -Bret

Wouldn't a heap be a good data structure for a list of timers?  Insertion
is log(n) and finding the one with the least time is O(1), ie pop off the
front  It can be implemented in an array which should help cache
coherency and all those other things they talked about in school :)

Ben
 
 --
 Bret Indrelee |  Sometimes, to be deep, we must act shallow!
 [EMAIL PROTECTED]   |  -Riff in The Quatrix
 
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
Ben Greear ([EMAIL PROTECTED])  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com (Released under GPL)
http://scry.wanfear.com   http://scry.wanfear.com/~greear
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread watermodem

Alan Cox wrote:
> 
> > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> > and MIN_DELACK is 0.04s, TCP would hardly benefit from them.
> 
> There are a considerable number of people who really do need 1Khz resolution.
> Midi is one of the example cases. That doesn't mean we have to go to a 1KHz
> timer, we may want to combine a 100Hz timer with a smart switch to 1Khz

As somebody who is now debating how to measure latencies in a 
giga-bit ethernet environment with several components doing 
L3 switching in much less than 10 micro-seconds ... (hardware)
I agree that some method is need to achieve higher resolutions.  
(Sigh... I will likely need to buy something big and expensive)
{this is for a system to make use of L. Yarrow's little protocol}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Mark Salisbury

> I suspect you might go for ticked if its overhead was less.  The thing
> that makes me think the overhead is high for tick less is the accounting
> and time slice stuff.  This has to be handled each system call and each
> schedule call.  These happen WAY more often than ticks...  Contrary
> wise, I would go for tick less if its overhead is the same or less under
> a reasonable load.  Of course, we would also want to check overload
> sensitivity.

as I said, i think the process time accounting is disjoint from the
ticked/tickless issue and should therefore be considered seperately..

in my experience with the tickless method, after all pending timers have
been serviced, then to determine the interval until the next interrupt
should be generated all that is needed is one 64 bit subtraction and a
register load (about 5 - 8 cycles)

I think we should spend some serious time with some quick and dirty
prototypes of both methods, both to characterize all the issues raised and
to refine our ideas when it comes time to implement this.

I also think that we should give some thought to implementing both an
improved ticked system and a tickless system to be chosen as CONFIG options
so that the developer putting together a system can make a choice based on
their needs, and not our whims.  I am more than willing to concede that
there is a class of customer to whom a ticked system is appropriate, but I
am quite sure that there is a class for whom the ticked model is completely
unacceptable.

> On a half way loaded system?  Do you think it is that high?  I sort of
> think that, once the system is loaded, there would be a useful timer
> tick most of the time.  Might be useful to do some measurements of this.

any non-useful tick takes away cycles that could be better used performing
FFT's

there are many kinds of loads, some which generate large numbers of timed
events and some that generate none or few.
I think we want to be able to provide good solutions for both cases even.

we should do lots of measurements.

Mark


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread John Alvord

On Wed, 11 Apr 2001 20:57:04 +0200, Jamie Lokier
<[EMAIL PROTECTED]> wrote:

>george anzinger wrote:
>> > A pointer-based priority queue is really not a very complex thing, and
>> > there are ways to optimise them for typical cases like the above.
>> > 
>> Please do enlighten me.  The big problem in my mind is that the
>> pointers, pointing at points in time, are perishable.
>
>Um, I'm not sure what perishability has to do with anything.  Timers at
>the moment can be added, deleted and destroyed and it's no big deal.
>
>Look in an algorithms book under "priority queue".  They usually don't
>say how to use a heap-ordered tree -- usually it's an array -- but you
>can use a tree if you want.  In such a tree, each timer has a link to
>_two_ next timers and one previous timer.  (The previous timer link is
>only needed if you can delete timers before they expire, which is
>required for Linux).
>
>I'm not saying saying a heap-ordered tree is fastest.  But it's ok, and
>doesn't require any more storage than the `struct timer' objects
>themselves.
>
>It's possible to optimise this kind of data structure rather a lot,
>depending on how you want to use it.  Linux' current timer algorithm is
>a pretty good example of a priority queue optimised for timer ticks,
>where you don't mind doing a small amount of work per tick.
>
>One notable complication with the kernel is that we don't want every
>timer to run at its exactly allocated time, except for some special
>timers.  For example, if you have 100 TCP streams and their
>retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc.,
>you'd much rather just process them all at once as is done at the moment
>by the 100Hz timer.  This is because cache locality is much more
>important for TCP retransmits than transmit timing resolution.
>
>There's literature online about this class of problems: look up "event
>set" and "simulation" and "fast priority queue".

I bumped into a funny non-optimization a few years ago. A system with
a timer queue like the above had been "optimized" by keeping old timer
elements... ready for new tasks to link onto and activate. The
granularity was 1 millsecond. Over time, all timer values from 0 to
roughly 10 minutes had been used. That resulted in 60,000 permanent
storage fragments laying about... a significant fragmentation problem.
The end was a forced recycle every month or so.

john alvord
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Jamie Lokier

george anzinger wrote:
> > A pointer-based priority queue is really not a very complex thing, and
> > there are ways to optimise them for typical cases like the above.
> > 
> Please do enlighten me.  The big problem in my mind is that the
> pointers, pointing at points in time, are perishable.

Um, I'm not sure what perishability has to do with anything.  Timers at
the moment can be added, deleted and destroyed and it's no big deal.

Look in an algorithms book under "priority queue".  They usually don't
say how to use a heap-ordered tree -- usually it's an array -- but you
can use a tree if you want.  In such a tree, each timer has a link to
_two_ next timers and one previous timer.  (The previous timer link is
only needed if you can delete timers before they expire, which is
required for Linux).

I'm not saying saying a heap-ordered tree is fastest.  But it's ok, and
doesn't require any more storage than the `struct timer' objects
themselves.

It's possible to optimise this kind of data structure rather a lot,
depending on how you want to use it.  Linux' current timer algorithm is
a pretty good example of a priority queue optimised for timer ticks,
where you don't mind doing a small amount of work per tick.

One notable complication with the kernel is that we don't want every
timer to run at its exactly allocated time, except for some special
timers.  For example, if you have 100 TCP streams and their
retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc.,
you'd much rather just process them all at once as is done at the moment
by the 100Hz timer.  This is because cache locality is much more
important for TCP retransmits than transmit timing resolution.

There's literature online about this class of problems: look up "event
set" and "simulation" and "fast priority queue".

enjoy,
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-11 Thread Bret Indrelee

Mikulas Patocka ([EMAIL PROTECTED]) wrote:
> Adding and removing timers happens much more frequently than PIT tick,
> so 
> comparing these times is pointless. 
>
> If you have some device and timer protecting it from lockup on buggy 
> hardware, you actually 
>
> send request to device 
> add timer 
>
> receive interrupt and read reply 
> remove timer 
>
> With the curent timer semantics, the cost of add timer and del timer is 
> nearly zero. If you had to reprogram the PIT on each request and reply,
> it 
> would slow things down. 
>
> Note that you call mod_timer also on each packet received - and in worst 
> case (which may happen), you end up reprogramming the PIT on each
> packet. 

You can still have nearly zero cost for the normal case. Avoiding worst
case behaviour is also pretty easy.

You only reprogram the PIT if you have to change the interval.

Keep all timers in a sorted double-linked list. Do the insert
intelligently, adding it from the back or front of the list depending on
where it is in relation to existing entries.

You only need to reprogram the interval timer when:
1. You've got a new entry at the head of the list
AND
2. You've reprogrammed the interval to something larger than the time to 
the new head of list.

In the case of a device timeout, it is usually not going to be inserted at
the head of the list. It is very seldom going to actually timeout.

Choose your interval wisely, only increasing it when you know it will pay
off. The best way of doing this would probably be to track some sort
of LCD for timeouts.

The real trick is to do a lot less processing on every tick than is
currently done. Current generation PCs can easily handle 1000s of
interrupts a second if you keep the overhead small.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Oliver Xymoron

On Mon, 9 Apr 2001, Alan Cox wrote:

> > > Its worth doing even on the ancient x86 boards with the PIT.
> >
> > Note that programming the PIT is slw and doing it on every timer
> > add_timer/del_timer would be a pain.
>
> You only have to do it occasionally.
>
> When you add a timer newer than the current one
>   (arguably newer by at least 1/2*HZ sec)

That's only if we want to do no better than the current system. We'd want
a new variable called timer_margin or something, which would be dependent
on interrupt source and processor, and could be tuned up or down via
sysctl.

> When you finish running the timers at an interval and the new interval is
> significantly larger than the current one.

Make that larger or smaller. If we come out of a quiescent state (1 hz
interrupts, say) and start getting 10ms timers, we want to respond to them
right away.

> Remember each tick we poke the PIT anyway

We could also have a HZ_max tunable above which we would not try to
reprogram the interval. On older systems, this could be set at
100-200HZ...

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread george anzinger

Jamie Locker wrote:
> 
> Mark Salisbury wrote:
> > > The complexity comes in when you want to maintain indexes into the list
> > > for quick insertion of new timers.  To get the current insert
> > > performance, for example, you would need pointers to (at least) each of
> > > the next 256 centasecond boundaries in the list.  But the list ages, so
> > > these pointers need to be continually updated.  The thought I had was to
> > > update needed pointers (and only those needed) each time an insert was
> > > done and a needed pointer was found to be missing or stale.  Still it
> > > adds complexity that the static structure used now doesn't have.
> >
> > actually, I think a head and tail pointer would be sufficient for most
> > cases. (most new timers are either going to be a new head of list or a new
> > tail, i.e. long duration timeouts that will never be serviced or short
> > duration timers that are going to go off "real soon now (tm)")  the oddball
> > cases would be mostly coming from user-space, i.e. nanosleep which a longerr
> > list insertion disapears in the block/wakeup/context switch overhead
> 
> A pointer-based priority queue is really not a very complex thing, and
> there are ways to optimise them for typical cases like the above.
> 
Please do enlighten me.  The big problem in my mind is that the
pointers, pointing at points in time, are perishable.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Jamie Lokier

Maciej W. Rozycki wrote:
> On Tue, 10 Apr 2001, Zdenek Kabelac wrote:
> 
> > I think it would be wrong if we could not add new very usable features
> > to the system just because some old hardware doesn't support it.
> 
>  s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
> 8900C. 

Think of the original 64k and 256k VGA cards.  I think some of those
didn't have an irq, but did have a way to read the progress of the
raster, which you could PLL to using timer interrupts.  Some video games
still look fine at 320x200 :-)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Jamie Lokier

Mark Salisbury wrote:
> > The complexity comes in when you want to maintain indexes into the list
> > for quick insertion of new timers.  To get the current insert
> > performance, for example, you would need pointers to (at least) each of
> > the next 256 centasecond boundaries in the list.  But the list ages, so
> > these pointers need to be continually updated.  The thought I had was to
> > update needed pointers (and only those needed) each time an insert was
> > done and a needed pointer was found to be missing or stale.  Still it
> > adds complexity that the static structure used now doesn't have.
> 
> actually, I think a head and tail pointer would be sufficient for most
> cases. (most new timers are either going to be a new head of list or a new
> tail, i.e. long duration timeouts that will never be serviced or short
> duration timers that are going to go off "real soon now (tm)")  the oddball
> cases would be mostly coming from user-space, i.e. nanosleep which a longerr
> list insertion disapears in the block/wakeup/context switch overhead

A pointer-based priority queue is really not a very complex thing, and
there are ways to optimise them for typical cases like the above.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Maciej W. Rozycki

On Tue, 10 Apr 2001, Zdenek Kabelac wrote:

> I think it would be wrong if we could not add new very usable features
> to the system just because some old hardware doesn't support it.

 s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
8900C. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread schwidefsky



>f) As noted, the account timers (task user/system times) would be much
>more accurate with the tick less approach.  The cost is added code in
>both the system call and the schedule path.
>
>Tentative conclusions:
>
>Currently we feel that the tick less approach is not acceptable due to
>(f).  We felt that this added code would NOT be welcome AND would, in a
>reasonably active system, have much higher overhead than any savings in
>not having a tick.  Also (d) implies a list organization that will, at
>the very least, be harder to understand.  (We have some thoughts here,
>but abandoned the effort because of (f).)  We are, of course, open to
>discussion on this issue and all others related to the project
>objectives.
f) might be true on Intel based systems. At least for s/390 the situation
is a little bit different. Here is a extract from the s/390 part of the
timer patch:

   .macro  UPDATE_ENTER_TIME reload
   la  %r14,thread+_TSS_UTIME(%r9) # pointer to utime
   tm  SP_PSW+1(%r15),0x01  # interrupting from user ?
   jno 0f   # yes -> add to user time
   la  %r14,8(%r14) # no -> add to system time
0: lm  %r0,%r1,0(%r14)  # load user/system time
   sl  %r1,__LC_LAST_MARK+4 # subtract last time mark
   bc  3,BASED(1f)  # borrow ?
   sl  %r0,BASED(.Lc1)
1: sl  %r0,__LC_LAST_MARK
   stck__LC_LAST_MARK   # make a new mark
   al  %r1,__LC_LAST_MARK+4 # add new mark -> added delta
   bc  12,BASED(2f) # carry ?
   al  %r0,BASED(.Lc1)
2: al  %r0,__LC_LAST_MARK
   stm %r0,%r1,0(%r14)  # store updated user/system time
   clc __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time
   jl  3f   # passed for a jiffy update
   l   %r1,BASED(.Ltime_warp)
   basr%r14,%r1
   .if \reload  # reload != 0 for system call
   lm  %r2,%r6,SP_R2(%r15)  # reload clobbered parameters
   .endif
3:

   .macro  UPDATE_LEAVE_TIME
   l   %r1,BASED(.Ltq_timer)# test if tq_timer list is empty
   x   %r1,0(%r1)   # tq_timer->next != tq_timer ?
   jz  0f
   l   %r1,BASED(.Ltq_timer_active)
   icm %r0,15,0(%r1)# timer event already added ?
   jnz 0f
   l   %r1,BASED(.Ltq_pending)
   basr%r14,%r1
0: lm  %r0,%r1,thread+_TSS_STIME(%r9) # load system time
   sl  %r1,__LC_LAST_MARK+4 # subtract last time mark
   bc  3,BASED(1f)  # borrow ?
   sl  %r0,BASED(.Lc1)
1: sl  %r0,__LC_LAST_MARK
   stck__LC_LAST_MARK   # make new mark
   al  %r1,__LC_LAST_MARK+4 # add new mark -> added delta
   bc  12,BASED(2f) # carry ?
   al  %r0,BASED(.Lc1)
2: al  %r0,__LC_LAST_MARK
   stm %r0,%r1,thread+_TSS_STIME(%r9) # store system time
   .endm

The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIMER are executed
on every system entry/exit. In the case that no special work has to
be done less then 31 instruction are executed in addition to the
normal system entry/exit code. Special work has to be done if more
time then 1/HZ has passed (call time_warp), or if tq_timer contains
an element (call tq_pending).
The accuracy of the timer events has not changed. It still is 1/HZ.
The only thing this patch does is to avoid unneeded interruptions.
I'd be happy if this could be combined with a new, more accurate
timing method.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread schwidefsky



f) As noted, the account timers (task user/system times) would be much
more accurate with the tick less approach.  The cost is added code in
both the system call and the schedule path.

Tentative conclusions:

Currently we feel that the tick less approach is not acceptable due to
(f).  We felt that this added code would NOT be welcome AND would, in a
reasonably active system, have much higher overhead than any savings in
not having a tick.  Also (d) implies a list organization that will, at
the very least, be harder to understand.  (We have some thoughts here,
but abandoned the effort because of (f).)  We are, of course, open to
discussion on this issue and all others related to the project
objectives.
f) might be true on Intel based systems. At least for s/390 the situation
is a little bit different. Here is a extract from the s/390 part of the
timer patch:

   .macro  UPDATE_ENTER_TIME reload
   la  %r14,thread+_TSS_UTIME(%r9) # pointer to utime
   tm  SP_PSW+1(%r15),0x01  # interrupting from user ?
   jno 0f   # yes - add to user time
   la  %r14,8(%r14) # no - add to system time
0: lm  %r0,%r1,0(%r14)  # load user/system time
   sl  %r1,__LC_LAST_MARK+4 # subtract last time mark
   bc  3,BASED(1f)  # borrow ?
   sl  %r0,BASED(.Lc1)
1: sl  %r0,__LC_LAST_MARK
   stck__LC_LAST_MARK   # make a new mark
   al  %r1,__LC_LAST_MARK+4 # add new mark - added delta
   bc  12,BASED(2f) # carry ?
   al  %r0,BASED(.Lc1)
2: al  %r0,__LC_LAST_MARK
   stm %r0,%r1,0(%r14)  # store updated user/system time
   clc __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time
   jl  3f   # passed for a jiffy update
   l   %r1,BASED(.Ltime_warp)
   basr%r14,%r1
   .if \reload  # reload != 0 for system call
   lm  %r2,%r6,SP_R2(%r15)  # reload clobbered parameters
   .endif
3:

   .macro  UPDATE_LEAVE_TIME
   l   %r1,BASED(.Ltq_timer)# test if tq_timer list is empty
   x   %r1,0(%r1)   # tq_timer-next != tq_timer ?
   jz  0f
   l   %r1,BASED(.Ltq_timer_active)
   icm %r0,15,0(%r1)# timer event already added ?
   jnz 0f
   l   %r1,BASED(.Ltq_pending)
   basr%r14,%r1
0: lm  %r0,%r1,thread+_TSS_STIME(%r9) # load system time
   sl  %r1,__LC_LAST_MARK+4 # subtract last time mark
   bc  3,BASED(1f)  # borrow ?
   sl  %r0,BASED(.Lc1)
1: sl  %r0,__LC_LAST_MARK
   stck__LC_LAST_MARK   # make new mark
   al  %r1,__LC_LAST_MARK+4 # add new mark - added delta
   bc  12,BASED(2f) # carry ?
   al  %r0,BASED(.Lc1)
2: al  %r0,__LC_LAST_MARK
   stm %r0,%r1,thread+_TSS_STIME(%r9) # store system time
   .endm

The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIMER are executed
on every system entry/exit. In the case that no special work has to
be done less then 31 instruction are executed in addition to the
normal system entry/exit code. Special work has to be done if more
time then 1/HZ has passed (call time_warp), or if tq_timer contains
an element (call tq_pending).
The accuracy of the timer events has not changed. It still is 1/HZ.
The only thing this patch does is to avoid unneeded interruptions.
I'd be happy if this could be combined with a new, more accurate
timing method.

blue skies,
   Martin

Linux/390 Design  Development, IBM Deutschland Entwicklung GmbH
Schnaicherstr. 220, D-71032 Bblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Maciej W. Rozycki

On Tue, 10 Apr 2001, Zdenek Kabelac wrote:

 I think it would be wrong if we could not add new very usable features
 to the system just because some old hardware doesn't support it.

 s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
8900C. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Jamie Lokier

Mark Salisbury wrote:
  The complexity comes in when you want to maintain indexes into the list
  for quick insertion of new timers.  To get the current insert
  performance, for example, you would need pointers to (at least) each of
  the next 256 centasecond boundaries in the list.  But the list ages, so
  these pointers need to be continually updated.  The thought I had was to
  update needed pointers (and only those needed) each time an insert was
  done and a needed pointer was found to be missing or stale.  Still it
  adds complexity that the static structure used now doesn't have.
 
 actually, I think a head and tail pointer would be sufficient for most
 cases. (most new timers are either going to be a new head of list or a new
 tail, i.e. long duration timeouts that will never be serviced or short
 duration timers that are going to go off "real soon now (tm)")  the oddball
 cases would be mostly coming from user-space, i.e. nanosleep which a longerr
 list insertion disapears in the block/wakeup/context switch overhead

A pointer-based priority queue is really not a very complex thing, and
there are ways to optimise them for typical cases like the above.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Jamie Lokier

Maciej W. Rozycki wrote:
 On Tue, 10 Apr 2001, Zdenek Kabelac wrote:
 
  I think it would be wrong if we could not add new very usable features
  to the system just because some old hardware doesn't support it.
 
  s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
 8900C. 

Think of the original 64k and 256k VGA cards.  I think some of those
didn't have an irq, but did have a way to read the progress of the
raster, which you could PLL to using timer interrupts.  Some video games
still look fine at 320x200 :-)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread george anzinger

Jamie Locker wrote:
 
 Mark Salisbury wrote:
   The complexity comes in when you want to maintain indexes into the list
   for quick insertion of new timers.  To get the current insert
   performance, for example, you would need pointers to (at least) each of
   the next 256 centasecond boundaries in the list.  But the list ages, so
   these pointers need to be continually updated.  The thought I had was to
   update needed pointers (and only those needed) each time an insert was
   done and a needed pointer was found to be missing or stale.  Still it
   adds complexity that the static structure used now doesn't have.
 
  actually, I think a head and tail pointer would be sufficient for most
  cases. (most new timers are either going to be a new head of list or a new
  tail, i.e. long duration timeouts that will never be serviced or short
  duration timers that are going to go off "real soon now (tm)")  the oddball
  cases would be mostly coming from user-space, i.e. nanosleep which a longerr
  list insertion disapears in the block/wakeup/context switch overhead
 
 A pointer-based priority queue is really not a very complex thing, and
 there are ways to optimise them for typical cases like the above.
 
Please do enlighten me.  The big problem in my mind is that the
pointers, pointing at points in time, are perishable.

George
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Oliver Xymoron

On Mon, 9 Apr 2001, Alan Cox wrote:

   Its worth doing even on the ancient x86 boards with the PIT.
 
  Note that programming the PIT is slw and doing it on every timer
  add_timer/del_timer would be a pain.

 You only have to do it occasionally.

 When you add a timer newer than the current one
   (arguably newer by at least 1/2*HZ sec)

That's only if we want to do no better than the current system. We'd want
a new variable called timer_margin or something, which would be dependent
on interrupt source and processor, and could be tuned up or down via
sysctl.

 When you finish running the timers at an interval and the new interval is
 significantly larger than the current one.

Make that larger or smaller. If we come out of a quiescent state (1 hz
interrupts, say) and start getting 10ms timers, we want to respond to them
right away.

 Remember each tick we poke the PIT anyway

We could also have a HZ_max tunable above which we would not try to
reprogram the interval. On older systems, this could be set at
100-200HZ...

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread John Alvord

On Wed, 11 Apr 2001 20:57:04 +0200, Jamie Lokier
[EMAIL PROTECTED] wrote:

george anzinger wrote:
  A pointer-based priority queue is really not a very complex thing, and
  there are ways to optimise them for typical cases like the above.
  
 Please do enlighten me.  The big problem in my mind is that the
 pointers, pointing at points in time, are perishable.

Um, I'm not sure what perishability has to do with anything.  Timers at
the moment can be added, deleted and destroyed and it's no big deal.

Look in an algorithms book under "priority queue".  They usually don't
say how to use a heap-ordered tree -- usually it's an array -- but you
can use a tree if you want.  In such a tree, each timer has a link to
_two_ next timers and one previous timer.  (The previous timer link is
only needed if you can delete timers before they expire, which is
required for Linux).

I'm not saying saying a heap-ordered tree is fastest.  But it's ok, and
doesn't require any more storage than the `struct timer' objects
themselves.

It's possible to optimise this kind of data structure rather a lot,
depending on how you want to use it.  Linux' current timer algorithm is
a pretty good example of a priority queue optimised for timer ticks,
where you don't mind doing a small amount of work per tick.

One notable complication with the kernel is that we don't want every
timer to run at its exactly allocated time, except for some special
timers.  For example, if you have 100 TCP streams and their
retransmission times are scheduled for 1.s, 1.0001s, 1.0002s, etc.,
you'd much rather just process them all at once as is done at the moment
by the 100Hz timer.  This is because cache locality is much more
important for TCP retransmits than transmit timing resolution.

There's literature online about this class of problems: look up "event
set" and "simulation" and "fast priority queue".

I bumped into a funny non-optimization a few years ago. A system with
a timer queue like the above had been "optimized" by keeping old timer
elements... ready for new tasks to link onto and activate. The
granularity was 1 millsecond. Over time, all timer values from 0 to
roughly 10 minutes had been used. That resulted in 60,000 permanent
storage fragments laying about... a significant fragmentation problem.
The end was a forced recycle every month or so.

john alvord
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer!

2001-04-11 Thread Bret Indrelee

Mikulas Patocka ([EMAIL PROTECTED]) wrote:
 Adding and removing timers happens much more frequently than PIT tick,
 so 
 comparing these times is pointless. 

 If you have some device and timer protecting it from lockup on buggy 
 hardware, you actually 

 send request to device 
 add timer 

 receive interrupt and read reply 
 remove timer 

 With the curent timer semantics, the cost of add timer and del timer is 
 nearly zero. If you had to reprogram the PIT on each request and reply,
 it 
 would slow things down. 

 Note that you call mod_timer also on each packet received - and in worst 
 case (which may happen), you end up reprogramming the PIT on each
 packet. 

You can still have nearly zero cost for the normal case. Avoiding worst
case behaviour is also pretty easy.

You only reprogram the PIT if you have to change the interval.

Keep all timers in a sorted double-linked list. Do the insert
intelligently, adding it from the back or front of the list depending on
where it is in relation to existing entries.

You only need to reprogram the interval timer when:
1. You've got a new entry at the head of the list
AND
2. You've reprogrammed the interval to something larger than the time to 
the new head of list.

In the case of a device timeout, it is usually not going to be inserted at
the head of the list. It is very seldom going to actually timeout.

Choose your interval wisely, only increasing it when you know it will pay
off. The best way of doing this would probably be to track some sort
of LCD for timeouts.

The real trick is to do a lot less processing on every tick than is
currently done. Current generation PCs can easily handle 1000s of
interrupts a second if you keep the overhead small.

-Bret

--
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
[EMAIL PROTECTED]   |  -Riff in The Quatrix


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread Mark Salisbury

 I suspect you might go for ticked if its overhead was less.  The thing
 that makes me think the overhead is high for tick less is the accounting
 and time slice stuff.  This has to be handled each system call and each
 schedule call.  These happen WAY more often than ticks...  Contrary
 wise, I would go for tick less if its overhead is the same or less under
 a reasonable load.  Of course, we would also want to check overload
 sensitivity.

as I said, i think the process time accounting is disjoint from the
ticked/tickless issue and should therefore be considered seperately..

in my experience with the tickless method, after all pending timers have
been serviced, then to determine the interval until the next interrupt
should be generated all that is needed is one 64 bit subtraction and a
register load (about 5 - 8 cycles)

I think we should spend some serious time with some quick and dirty
prototypes of both methods, both to characterize all the issues raised and
to refine our ideas when it comes time to implement this.

I also think that we should give some thought to implementing both an
improved ticked system and a tickless system to be chosen as CONFIG options
so that the developer putting together a system can make a choice based on
their needs, and not our whims.  I am more than willing to concede that
there is a class of customer to whom a ticked system is appropriate, but I
am quite sure that there is a class for whom the ticked model is completely
unacceptable.

 On a half way loaded system?  Do you think it is that high?  I sort of
 think that, once the system is loaded, there would be a useful timer
 tick most of the time.  Might be useful to do some measurements of this.

any non-useful tick takes away cycles that could be better used performing
FFT's

there are many kinds of loads, some which generate large numbers of timed
events and some that generate none or few.
I think we want to be able to provide good solutions for both cases even.

we should do lots of measurements.

Mark


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-11 Thread watermodem

Alan Cox wrote:
 
  Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
  and MIN_DELACK is 0.04s, TCP would hardly benefit from them.
 
 There are a considerable number of people who really do need 1Khz resolution.
 Midi is one of the example cases. That doesn't mean we have to go to a 1KHz
 timer, we may want to combine a 100Hz timer with a smart switch to 1Khz

As somebody who is now debating how to measure latencies in a 
giga-bit ethernet environment with several components doing 
L3 switching in much less than 10 micro-seconds ... (hardware)
I agree that some method is need to achieve higher resolutions.  
(Sigh... I will likely need to buy something big and expensive)
{this is for a system to make use of L. Yarrow's little protocol}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Karim Yaghmour

Mark Salisbury wrote:
> 
> It would probably be a good compile config option to allow fine or coarse
> process time accounting, that leaves the choice to the person setting up the
> system to make the choice based on their needs.
> 

I suggested this a while ago during a discussion about performance
measurement. This would be fairly easy to implement using the patch
provided with the Linux Trace Toolkit since all entry points and
exit points are known (and it already is available in post-mortem
analysis). Implementing the measurement code within the kernel should
be fairly easy to implement and it would be provided as part of the
compile option. All in all, given the measurements I made, I'd place
the overhead at around 1% for the computations. (The overhead is very
likely to be negligeable when eventual fixes are taken into account.)

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread george anzinger

Mark Salisbury wrote:
> 
> > mark salisbury wrote:
> > >
> > > george anzinger wrote:
> > >
> > > > f) As noted, the account timers (task user/system times) would be much
> > > > more accurate with the tick less approach.  The cost is added code in
> > > > both the system call and the schedule path.
> > > >
> > > > Tentative conclusions:
> > > >
> > > > Currently we feel that the tick less approach is not acceptable due to
> > > > (f).  We felt that this added code would NOT be welcome AND would, in
> a
> > > > reasonably active system, have much higher overhead than any savings
> in
> > > > not having a tick.  Also (d) implies a list organization that will, at
> > > > the very least, be harder to understand.  (We have some thoughts here,
> > > > but abandoned the effort because of (f).)  We are, of course, open to
> > > > discussion on this issue and all others related to the project
> > > > objectives.
> > >
> > > f does not imply tick-less is not acceptable, it implies that better
> process time
> > > accounting is not acceptable.
> >
> > My thinking is that a timer implementation that forced (f) would have
> > problems gaining acceptance (even with me :).  I think a tick less
> > system does force this and this is why we have, at least for the moment,
> > abandoned it.  In no way does this preclude (f) as it is compatible with
> > either ticks or tick less time keeping.  On the other hand, the stated
> > project objectives do not include (f) unless, of course we do a tick
> > less time system.
> > >
> > > list organization is not complex, it is a sorted absolute time list.  I
> would
> > > argue that this is a hell of a lot easier to understand that ticks +
> offsets.
> >
> > The complexity comes in when you want to maintain indexes into the list
> > for quick insertion of new timers.  To get the current insert
> > performance, for example, you would need pointers to (at least) each of
> > the next 256 centasecond boundaries in the list.  But the list ages, so
> > these pointers need to be continually updated.  The thought I had was to
> > update needed pointers (and only those needed) each time an insert was
> > done and a needed pointer was found to be missing or stale.  Still it
> > adds complexity that the static structure used now doesn't have.
> 
> actually, I think a head and tail pointer would be sufficient for most
> cases. (most new timers are either going to be a new head of list or a new
> tail, i.e. long duration timeouts that will never be serviced or short
> duration timers that are going to go off "real soon now (tm)")  the oddball
> cases would be mostly coming from user-space, i.e. nanosleep which a longerr
> list insertion disapears in the block/wakeup/context switch overhead
> 
> > >
> > > still, better process time accounting should be a compile CONFIG option,
> not
> > > ignored and ruled out because some one thinks that is is to expensive in
> the
> > > general case.
> >
> > As I said above, we are not ruling it out, but rather, we are not
> > requiring it by going tick less.
> > As I said, it is not clear how you could get
> > CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
> > jiffie.  What did you have in mind?
> 
> time accounting can be limited to the quantum expiration and voluntary
> yields in the tickless/useless case.
> 
> > For the most part, I agree.  I am not sure that it makes a lot of sense
> > to mix some of these options, however.  I think it comes down to the
> > question of benefit vs cost.  If keeping an old version around that is
> > not any faster or more efficient in any way would seem too costly to
> > me.  We would like to provide a system that is better in every way and
> > thus eliminate the need to keep the old one around.  We could leave it
> > in as a compile option so folks would have a fall back, I suppose.
> 
> I agree that some combinations don't make much sense _TO_ME_ but that
> doesn't mean they don't meet sombody's needs.
> 
> in my case (embedded, medium hard real time, massively parallel
> multicomputer)  the only choices that makes sense to my customers is
> tickless/useless in deployment and tickless/useful in
> development/profiling/optimization.

I suspect you might go for ticked if its overhead was less.  The thing
that makes me think the overhead is high for tick less is the accounting
and time slice stuff.  This has to be handled each system call and each
schedule call.  These happen WAY more often than ticks...  Contrary
wise, I would go for tick less if its overhead is the same or less under
a reasonable load.  Of course, we would also want to check overload
sensitivity.
> 
> in other cases ticked/useless makes sense (dumb old slow chips)
> 
> I have no idea who would want ticked/useful or hybrid.  i suggested hybrid
> as an alternative in case linus might be reluctant to gutting and replacing
> the sysclock.
> 
> >
> > An Issue no one has raised is that the tick less system would need to
> > start a timer each 

Re: No 100 HZ timer !

2001-04-10 Thread Mark Salisbury


> mark salisbury wrote:
> >
> > george anzinger wrote:
> >
> > > f) As noted, the account timers (task user/system times) would be much
> > > more accurate with the tick less approach.  The cost is added code in
> > > both the system call and the schedule path.
> > >
> > > Tentative conclusions:
> > >
> > > Currently we feel that the tick less approach is not acceptable due to
> > > (f).  We felt that this added code would NOT be welcome AND would, in
a
> > > reasonably active system, have much higher overhead than any savings
in
> > > not having a tick.  Also (d) implies a list organization that will, at
> > > the very least, be harder to understand.  (We have some thoughts here,
> > > but abandoned the effort because of (f).)  We are, of course, open to
> > > discussion on this issue and all others related to the project
> > > objectives.
> >
> > f does not imply tick-less is not acceptable, it implies that better
process time
> > accounting is not acceptable.
>
> My thinking is that a timer implementation that forced (f) would have
> problems gaining acceptance (even with me :).  I think a tick less
> system does force this and this is why we have, at least for the moment,
> abandoned it.  In no way does this preclude (f) as it is compatible with
> either ticks or tick less time keeping.  On the other hand, the stated
> project objectives do not include (f) unless, of course we do a tick
> less time system.
> >
> > list organization is not complex, it is a sorted absolute time list.  I
would
> > argue that this is a hell of a lot easier to understand that ticks +
offsets.
>
> The complexity comes in when you want to maintain indexes into the list
> for quick insertion of new timers.  To get the current insert
> performance, for example, you would need pointers to (at least) each of
> the next 256 centasecond boundaries in the list.  But the list ages, so
> these pointers need to be continually updated.  The thought I had was to
> update needed pointers (and only those needed) each time an insert was
> done and a needed pointer was found to be missing or stale.  Still it
> adds complexity that the static structure used now doesn't have.

actually, I think a head and tail pointer would be sufficient for most
cases. (most new timers are either going to be a new head of list or a new
tail, i.e. long duration timeouts that will never be serviced or short
duration timers that are going to go off "real soon now (tm)")  the oddball
cases would be mostly coming from user-space, i.e. nanosleep which a longerr
list insertion disapears in the block/wakeup/context switch overhead

> >
> > still, better process time accounting should be a compile CONFIG option,
not
> > ignored and ruled out because some one thinks that is is to expensive in
the
> > general case.
>
> As I said above, we are not ruling it out, but rather, we are not
> requiring it by going tick less.
> As I said, it is not clear how you could get
> CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
> jiffie.  What did you have in mind?

time accounting can be limited to the quantum expiration and voluntary
yields in the tickless/useless case.

> For the most part, I agree.  I am not sure that it makes a lot of sense
> to mix some of these options, however.  I think it comes down to the
> question of benefit vs cost.  If keeping an old version around that is
> not any faster or more efficient in any way would seem too costly to
> me.  We would like to provide a system that is better in every way and
> thus eliminate the need to keep the old one around.  We could leave it
> in as a compile option so folks would have a fall back, I suppose.

I agree that some combinations don't make much sense _TO_ME_ but that
doesn't mean they don't meet sombody's needs.

in my case (embedded, medium hard real time, massively parallel
multicomputer)  the only choices that makes sense to my customers is
tickless/useless in deployment and tickless/useful in
development/profiling/optimization.

in other cases ticked/useless makes sense (dumb old slow chips)

I have no idea who would want ticked/useful or hybrid.  i suggested hybrid
as an alternative in case linus might be reluctant to gutting and replacing
the sysclock.

>
> An Issue no one has raised is that the tick less system would need to
> start a timer each time it scheduled a task.  This would lead to either
> slow but very precise time slicing or about what we have today with more
> schedule overhead.

no.  you would re-use the timer with a new expiration time and a re-insert.

also re overhead. the cost of servicing 10 times as many interrupts as
necessary for system function must cost a fair chunk.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread george anzinger

mark salisbury wrote:
> 
> george anzinger wrote:
> 
> > f) As noted, the account timers (task user/system times) would be much
> > more accurate with the tick less approach.  The cost is added code in
> > both the system call and the schedule path.
> >
> > Tentative conclusions:
> >
> > Currently we feel that the tick less approach is not acceptable due to
> > (f).  We felt that this added code would NOT be welcome AND would, in a
> > reasonably active system, have much higher overhead than any savings in
> > not having a tick.  Also (d) implies a list organization that will, at
> > the very least, be harder to understand.  (We have some thoughts here,
> > but abandoned the effort because of (f).)  We are, of course, open to
> > discussion on this issue and all others related to the project
> > objectives.
> 
> f does not imply tick-less is not acceptable, it implies that better process time
> accounting is not acceptable.

My thinking is that a timer implementation that forced (f) would have
problems gaining acceptance (even with me :).  I think a tick less
system does force this and this is why we have, at least for the moment,
abandoned it.  In no way does this preclude (f) as it is compatible with
either ticks or tick less time keeping.  On the other hand, the stated
project objectives do not include (f) unless, of course we do a tick
less time system.
> 
> list organization is not complex, it is a sorted absolute time list.  I would
> argue that this is a hell of a lot easier to understand that ticks + offsets.

The complexity comes in when you want to maintain indexes into the list
for quick insertion of new timers.  To get the current insert
performance, for example, you would need pointers to (at least) each of
the next 256 centasecond boundaries in the list.  But the list ages, so
these pointers need to be continually updated.  The thought I had was to
update needed pointers (and only those needed) each time an insert was
done and a needed pointer was found to be missing or stale.  Still it
adds complexity that the static structure used now doesn't have.
> 
> still, better process time accounting should be a compile CONFIG option, not
> ignored and ruled out because some one thinks that is is to expensive in the
> general case.

As I said above, we are not ruling it out, but rather, we are not
requiring it by going tick less.
> 
> the whole point of linux and CONFIG options is to get you the kernel with the
> features you want, not what someone else wants.
> 
> there should be a whole range of config options associated with this issue:
> 
> CONFIG_JIFFIES   == old jiffies implementation
> CONFIG_TICKLESS  == tickless
> CONFIG_HYBRID  == old jiffies plus a tickless high-res timer system on
> the side but not assoc w/ process and global
> timekeeping
> 
> CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg
> CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting
> 
> this way, users who want tickless and lousy time acctg can have it AND people who
> want jiffies and good time acctg could have it.

As I said, it is not clear how you could get
CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
jiffie.  What did you have in mind?
> 
> these features are largely disjoint and easily seperable.  it is also relatively
> trivial to do this in such a way that drivers depending on the jiffie abstraction
> can be supported without modification no matter what the configuration.
> 
For the most part, I agree.  I am not sure that it makes a lot of sense
to mix some of these options, however.  I think it comes down to the
question of benefit vs cost.  If keeping an old version around that is
not any faster or more efficient in any way would seem too costly to
me.  We would like to provide a system that is better in every way and
thus eliminate the need to keep the old one around.  We could leave it
in as a compile option so folks would have a fall back, I suppose.

An Issue no one has raised is that the tick less system would need to
start a timer each time it scheduled a task.  This would lead to either
slow but very precise time slicing or about what we have today with more
schedule overhead.

George


> Mark Salisbury
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Andi Kleen

On Tue, Apr 10, 2001 at 02:45:15PM -0400, Stephen D. Williams wrote:
> When this is rewritten, I would strongly suggest that we find a way to
> make 'gettimeofday' nearly free.  Certain applications need to use this
> frequently while still being portable.  One solution when you do have
> clock ticks is a read-only mapped Int.  Another cheap solution is
> library assembly that adds a cycle clock delta since last system call to
> a 'gettimeofday' value set on every system call return.

The upcoming x86-64 port supports vsyscalls for just that purpose.



-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread mark salisbury

george anzinger wrote:

> f) As noted, the account timers (task user/system times) would be much
> more accurate with the tick less approach.  The cost is added code in
> both the system call and the schedule path.
>
> Tentative conclusions:
>
> Currently we feel that the tick less approach is not acceptable due to
> (f).  We felt that this added code would NOT be welcome AND would, in a
> reasonably active system, have much higher overhead than any savings in
> not having a tick.  Also (d) implies a list organization that will, at
> the very least, be harder to understand.  (We have some thoughts here,
> but abandoned the effort because of (f).)  We are, of course, open to
> discussion on this issue and all others related to the project
> objectives.

f does not imply tick-less is not acceptable, it implies that better process time
accounting is not acceptable.

list organization is not complex, it is a sorted absolute time list.  I would
argue that this is a hell of a lot easier to understand that ticks + offsets.

still, better process time accounting should be a compile CONFIG option, not
ignored and ruled out because some one thinks that is is to expensive in the
general case.

the whole point of linux and CONFIG options is to get you the kernel with the
features you want, not what someone else wants.

there should be a whole range of config options associated with this issue:

CONFIG_JIFFIES   == old jiffies implementation
CONFIG_TICKLESS  == tickless
CONFIG_HYBRID  == old jiffies plus a tickless high-res timer system on
the side but not assoc w/ process and global
timekeeping

CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg
CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting

this way, users who want tickless and lousy time acctg can have it AND people who
want jiffies and good time acctg could have it.

these features are largely disjoint and easily seperable.  it is also relatively
trivial to do this in such a way that drivers depending on the jiffie abstraction
can be supported without modification no matter what the configuration.

Mark Salisbury

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Stephen D. Williams

When this is rewritten, I would strongly suggest that we find a way to
make 'gettimeofday' nearly free.  Certain applications need to use this
frequently while still being portable.  One solution when you do have
clock ticks is a read-only mapped Int.  Another cheap solution is
library assembly that adds a cycle clock delta since last system call to
a 'gettimeofday' value set on every system call return.

sdw

Andi Kleen wrote:
> 
> On Tue, Apr 10, 2001 at 01:12:14PM +0100, Alan Cox wrote:
> > Measure the number of clocks executing a timer interrupt. rdtsc is fast. Now
> > consider the fact that out of this you get KHz or better scheduling
> > resolution required for games and midi. I'd say it looks good. I agree
> 
> And measure the number of cycles a gigahertz CPU can do between a 1ms timer.
> And then check how often the typical application executes something like
> gettimeofday.
> 
...

sdw
-- 
[EMAIL PROTECTED]  http://sdw.st
Stephen D. Williams
43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax 
Dec2000
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread george anzinger

Just for your information we have a project going that is trying to come
up with a good solution for all of this:

http://sourceforge.net/projects/high-res-timers

We have a mailing list there where we have discussed much of the same
stuff.  The mailing list archives are available at sourceforge.

Lets separate this into findings and tentative conclusions :)

Findings:

a) The University of Kansas and others have done a lot of work here.

b) High resolution timer events can be had with or without changing HZ.

c) High resolution timer events can be had with or without eliminating
the 1/HZ tick.

d) The organization of the timer list should reflect the existence of
the 1/HZ tick or not.  The current structure is not optimal for a "tick
less" implementation.  Better would be strict expire order with indexes
to "interesting times".

e) The current organization of the timer list generates a hiccup every
2.56 seconds to handle "cascading".  Hiccups are bad.

f) As noted, the account timers (task user/system times) would be much
more accurate with the tick less approach.  The cost is added code in
both the system call and the schedule path.  

Tentative conclusions:

Currently we feel that the tick less approach is not acceptable due to
(f).  We felt that this added code would NOT be welcome AND would, in a
reasonably active system, have much higher overhead than any savings in
not having a tick.  Also (d) implies a list organization that will, at
the very least, be harder to understand.  (We have some thoughts here,
but abandoned the effort because of (f).)  We are, of course, open to
discussion on this issue and all others related to the project
objectives.

We would reorganize the current timer list structure to eliminate the
cascade (e) and to add higher resolution entries.  The higher resolution
entries would carry an addition word which would be the fraction of a
jiffie that needs to be added to the jiffie value for the timer.  This
fraction would be in units defined by the platform to best suit the sub
jiffie interrupt generation code.  Each of the timer lists would then be
ordered by time based on this sub jiffie value.  In addition, in order
to eliminate the cascade, each timer list would carry all timers for
times that expire on the (jiffie mod (size of list)).  Thus, with the
current 256 first order lists, all timers with the same (jiffies & 255)
would be in the same list, again in expire order.  We also think that
the list size should be configurable to some power of two.  Again we
welcome discussion of these issues.

George

Alan Cox wrote:

>> It's also all interrupts, not only syscalls, and also context switch if you
>> want to be accurate.

>We dont need to be that accurate. Our sample rate is currently so low the
>data is worthless anyway
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Jamie Lokier

Alan Cox wrote:
> > > This is an X issue. I was talking with Jim Gettys about what is needed to
> > > get the relevant existing X extensions for this working
> > 
> > Last time I looked at XF86DGA (years ago), it seemed to have the right
> > hooks in place.  Just a matter of server implementation.  My
> > recollection is dusty though.
> 
> There is also a timing extension for synchronizing events to happenings. The
> stopper is the kernel interface for the vblank handling since the irq must
> be handled and cleared in kernel context before we return to X. We now think
> we know how to handle that cleanly

Except for cards which don't generate an irq on vsync but do let you see
how the raster is proceeding.  (I vaguely recall these).  For which a
PLL and, wait for it high resolution timer is needed.

Perhaps that's a 1990s problem that doesn't need a 2000s solution though :-)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Alan Cox

> > This is an X issue. I was talking with Jim Gettys about what is needed to
> > get the relevant existing X extensions for this working
> 
> Last time I looked at XF86DGA (years ago), it seemed to have the right
> hooks in place.  Just a matter of server implementation.  My
> recollection is dusty though.

There is also a timing extension for synchronizing events to happenings. The
stopper is the kernel interface for the vblank handling since the irq must
be handled and cleared in kernel context before we return to X. We now think
we know how to handle that cleanly

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread yodaiken

On Tue, Apr 10, 2001 at 04:43:36AM -0700, David Schleef wrote:
> However, on machines without a monotonically increasing counter,
> i.e., the TSC, you have to use 8254 timer 0 as both the timebase
> and the interval counter -- you end up slowly losing time because
> of the race condition between reading the timer and writing a
> new interval.  RTAI's solution is to disable kd_mksound and
> use timer 2 as a poor man's TSC.  If either of those is too big
> of a price, it may suffice to report that the timer granularity
> on 486's is 10 ms.

Just for the record, Michael Barabanov did this in RTLinux from before
kd_mksound was a function pointer in 1995. Michael had an optimization
attempt using channel 1 for a while, but on very slow machines this 
was not sufficient and he went back to channel 2. Of course, the 
fundamental problem is that board designers keep putting an 1920s
part in machines built in 2001. 

Here's the comment from the RTLinux 0.5 patch -- all available on the archives
on rtlinux.com.

+/* The main procedure; resets the 8254 timer to generate an interrupt.  The
+ * tricky part is to keep the global time while reprogramming it.  We latch
+ * counters 0 and 2 atomically before and after reprogramming to figure it out.
+ */


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Jamie Lokier

Alan Cox wrote:
> > Games would like to be able to page flip at vertical refresh time --
> > <1ms accuracy please.  Network traffic shaping benefits from better than
> 
> This is an X issue. I was talking with Jim Gettys about what is needed to
> get the relevant existing X extensions for this working

Last time I looked at XF86DGA (years ago), it seemed to have the right
hooks in place.  Just a matter of server implementation.  My
recollection is dusty though.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Alan Cox

> Games would like to be able to page flip at vertical refresh time --
> <1ms accuracy please.  Network traffic shaping benefits from better than

This is an X issue. I was talking with Jim Gettys about what is needed to
get the relevant existing X extensions for this working

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Jamie Lokier

Mikulas Patocka wrote:
> Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

Indeed, using precise timers for TCP would probably degrade performance
-- you should process a group of timer events together, when resolution
is not that important.

There are plenty of apps that need higher resolution.  Software modem
comes to mind (guess why ;), though the device driver supplies the high
resolution timed interrupts in that case.

Games would like to be able to page flip at vertical refresh time --
<1ms accuracy please.  Network traffic shaping benefits from better than
1ms timing.  Video players want to display their frames preferably
without 10ms jitter.

Even that old classic game "snake" benefits from decent timing.  I
worked on an X multiplayer snake implementation which was very
unpleasant and jerky at first.  1. Disable nagle for X connection :-)
Better but still jerky.  2. Write delay loop like this:

 calculate next_event_time
 select (0, 0, 0, next_event_time - 20ms)
 while (gettimeofday() < next_event_time)
   /* Busy loop for last 20ms. */

It's no coincidence that I've had to write another very similar event
loop recently.  You can see, this sort of thing is a real waste of CPU.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Alan Cox

> Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

There are a considerable number of people who really do need 1Khz resolution.
Midi is one of the example cases. That doesn't mean we have to go to a 1KHz
timer, we may want to combine a 100Hz timer with a smart switch to 1Khz

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread schwidefsky



>BTW. Why we need to redesign timers at all? The cost of timer interrupt
>each 1/100 second is nearly zero (1000 instances on S/390 VM is not common
>case - it is not reasonable to degradate performance of timers because of
>this).
The cost of the timer interrupts on a single image system is neglectable,
true. As I already pointed out in the original proposal we are looking
for a solution that will allow us to minimize the costs of the timer
interrupts when we run many images. For us this case is not unusual and
it is reasonable to degrade performance of a running system by a very
small amount to get rid of the HZ timer. This proposal was never meant
to be the perfect solution for every platform, that is why it is
configuratable with the CONFIG_NO_HZ_TIMER option.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread root

Mikulas Patocka wrote:

> BTW. Why we need to redesign timers at all? The cost of timer interrupt
> each 1/100 second is nearly zero (1000 instances on S/390 VM is not common
> case - it is not reasonable to degradate performance of timers because of
> this).
>
> Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.
>

well, I can think dozens of real time applications off the top of my head that
need beter than 1ms timing resolution (think sensor fusion)  1000 clock
interrupts/sec is wasteful when what you need is 1 very precisely timed
interrupt.

why do we redesign anything?  to make it better.  TCP is not the only thing in
the system.


if you are in love with the existing system, it shouldn't be hard to make it a
config option.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: No 100 HZ timer !

2001-04-10 Thread Andi Kleen

On Tue, Apr 10, 2001 at 04:10:28PM +0200, Mikulas Patocka wrote:
> Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

On networking bandwidth shaping and traffic control generally need higher precision 
timers.
There are also people who don't like the minimum 10ms delay for non-busy wait, it's
apparently a problem for some apps.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



  1   2   >