subject:"A question about timecounters"

RE: A question about timecounters

2002-02-06 Thread DOROVSKOY,IGOR (A-Portsmouth,ex1)


>be open source.  It's a simulated web client and web server, running
>inside the kernel.  It's good for load-testing and performance-testing
>many kinds of network devices.  With two 1-GHz PIII boxes (one acting
>as the client and the other acting as the server) it can generate
>around 5 (actually I think it's more than that) full web sessions
>per second.  Also, you can dial in any rate you want, and it will
>generate that rate very precisely.  Lots of fun!
... yea yea :-) It reminds me an 2 years old feature in our product (was
QARobot
originally, now it's a part of RouterTester) where we had an Session Storm
module generates HTTP traffic with kernel support too. I did remember
how it was killing any httpd server in secs even from one slave with
possible
up to 20 with thousands ssm's running on each cpu.
Unfortunately it's obsolete now (qa and development, but code and qbol
language support still be inside new product for free :). I may try to find
some
performance data tomorrow if it's interesting for you off cause...

Igor.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread M. Warner Losh


In message: <[EMAIL PROTECTED]>
John Polstra <[EMAIL PROTECTED]> writes:
: I'm testing that now.  But for how long would microuptime have to
: be interrupted to make this happen?  Surely not 7.81 seconds!  On
: this same machine I have a curses application running which is
: updating the screen once a second.  It never misses a beat, and
: userland is very responsive.

Silly hypothesis.  It isn't losing 7.81 seconds of time.  Rather it is
being interrupt at the wrong time and the wrap detection code works
badly when interrupted at that point.  So you aren't blocked for 7.81
seconds, but rather for 0.02 seconds since you are seeing the 7.79
jump.  You are basically catching the wrong edge of the phase of the
TSC.

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Terry Lambert

John Polstra wrote:
> After 25 minutes testing that with NTIMECOUNTER=5, I haven't
> gotten any microuptime messages.  So it appears that my problem was
> just that the current timecounter wrapped all the way around the ring
> while microuptime was interrupted, due to the high HZ value and the
> heavy interrupt load.  I'm sorry I didn't try this sooner, when you
> suggested it.
> 
> After lunch I'll try NTIMECOUNTER=HZ (1 in my case).  That sounds
> like a nice default value to me.

5 * HZ @ 10,000 HZ * sizeof(struct timecounter)
= 5 * 1 * 23*4
= 4,600,000
= 4.5M in timecounter structures

Even with your NTIMECOUNTER=HZ (assuming it even works), that
is 1M of memory burned on timecounter structures to prevent
wrap around.

If the interrupts are happening fast enough that wrap is this
bad a problem, then I suggest another counter that divides
the interupt frequency down to a (much smaller) update frequency
for the timecounter.

Is having this many of these things really worthwhile?  I
can't see how off the top of my head, but I can imagine a
situation where it miht be referenced with a higher than
wrap frequency... but the question is, what's the highest
access frequency?  That should limit the update frequency
to twice that, right, which would guarantee adequate update
resolution to satisfy that use?

If it is, could we maybe just count wraps, instead, and
add 7.7?

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Bakul Shah


> Is C a great language, or what? ;-)

Nah, just mediocre even when it comes to obfuscation!
Have you played with unlambda?!

> The way I always remember it is that you read the declaration
> inside-out: starting with the variable name and then heading toward
> the outside while obeying the precedence rules.  When you hit a "*",
> you say "pointer to"; when you hit "[]", you say "array of"; and when
> you hit "()" you say "function returning."  For example:

I remember something about switching declaration reading
direction when you hit a bracket; but why bother once you
have cdecl?

cdecl> declare f as array of pointer to function returning pointer to function 
returning int  
int (*(*f[])())()

It is not clear to me how to apply your rule.  It doesn't
matter though, it is gotten to the point where I can only
store ptrs to ptrs to information in my ever shrinking brain!

To the people who pointed out the cdecl port, I did look in
/usr/ports/devel but missed cdecl somehow.  Sigh... :-)

-- bakul

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Anthony Schneider


> PS: Chances are most people don't have cdecl any more.  You
> can get it like this:

cd /usr/ports/devel/cdecl && make install

:)

-Anthony.




msg31489/pgp0.pgp
Description: PGP signature

Re: A question about timecounters

2002-02-05 Thread Kutulu


On Tue, Feb 05, 2002 at 02:42:38PM -0800, Bakul Shah wrote:
> 
> PS: Chances are most people don't have cdecl any more.  You
> can get it like this:
> 

You can also get it like this:

cd /usr/ports/devel/cdecl ; make install

which I just went and did.  Pretty helpful utility :)

--K


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Bakul Shah  <[EMAIL PROTECTED]> wrote:

> [I see that jdp has answered your question but] cdecl is your friend!
> 
> $ cdecl
> Type `help' or `?' for help
> cdecl> explain volatile struct timecounter *timecounter
> declare timecounter as pointer to volatile struct timecounter
> cdecl> declare timecounter as volatile pointer to struct timecounter
> struct timecounter * volatile timecounter

Is C a great language, or what? ;-)

The way I always remember it is that you read the declaration
inside-out: starting with the variable name and then heading toward
the outside while obeying the precedence rules.  When you hit a "*",
you say "pointer to"; when you hit "[]", you say "array of"; and when
you hit "()" you say "function returning."  For example:

struct timecounter * volatile timecounter;
/* "Timecounter is a volatile pointer to a struct timecounter." */

volatile struct timecounter *timecounter;
/* "Timecounter is a pointer to a struct timecounter which is volatile." */

The reason for the awkward "which is" in that last one is just because
C lets you get sloppy with the ordering of the outermost keywords.
The pedantically correct way to declare a pointer to volatile struct
is like this:

struct timecounter volatile *timecounter;
/* "Timecounter is a pointer to a volatile struct timecounter." */

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Bakul Shah


> Btw, regarding the volatile thing:
> 
> If I do
>   extern volatile struct timecounter *timecounter;
> 
>   microtime()
>   {
>   struct timecounter *tc;
> 
>   tc = timecounter;
> 
> The compiler complains about loosing the volatile thing.
> 
> How do I tell it that it is the contents of the "timecounter" pointer which
> is volatile, but now what it points at ?  I don't want the "tc" pointer to
> be volatile because it obviously isn't.  Do I really need to cast it ?
> 
>   tc = (struct timecounter *)timecounter;

[I see that jdp has answered your question but] cdecl is your friend!

$ cdecl
Type `help' or `?' for help
cdecl> explain volatile struct timecounter *timecounter
declare timecounter as pointer to volatile struct timecounter
cdecl> declare timecounter as volatile pointer to struct timecounter
struct timecounter * volatile timecounter

-- bakul

PS: Chances are most people don't have cdecl any more.  You
can get it like this:

mkdir cdecl;cd cdecl
fetch 
ftp://gatekeeper.dec.com/pub/usenet/comp.sources.unix/volume14/cdecl2/part0{1,2}.Z
gzcat part01.Z | gunshar
gzcat part02.Z | gunshar
patch <<'EOF'
diff -ru ../cdecl-orig/cdecl.c ./cdecl.c
--- ../cdecl-orig/cdecl.c   Tue Feb  5 14:24:23 2002
+++ ./cdecl.c   Tue Feb  5 12:12:30 2002
@@ -57,6 +57,9 @@
 # include 
 # include 
 # include 
+#ifdef BSD
+#include 
+#endif
 #else
 # ifndef NOVARARGS
 #  include 
@@ -110,6 +113,9 @@
   void docast(char*, char*, char*, char*);
   void dodexplain(char*, char*, char*, char*);
   void docexplain(char*, char*, char*, char*);
+#ifdef __FreeBSD__
+#define setprogname _bad_bad_bad_FreeBSD
+#endif
   void setprogname(char *);
   int dotmpfile(int, char**), dofileargs(int, char**);
 #else
diff -ru ../cdecl-orig/makefile ./makefile
--- ../cdecl-orig/makefile  Tue Feb  5 14:24:19 2002
+++ ./makefile  Tue Feb  5 12:10:10 2002
@@ -13,7 +13,7 @@
 # add -DdodebugTo compile in debugging trace statements.
 # add -Ddoyydebug  To compile in yacc trace statements.
 
-CFLAGS= -g -Ddodebug -Ddoyydebug
+CFLAGS= -g -Ddodebug -Ddoyydebug -DBSD
 CC= cc
 ALLFILES= makefile cdgram.y cdlex.l cdecl.c cdecl.1 testset testset++
 BIN= /usr/lubin
EOF
make
# as root:
make install BIN=/usr/local/bin

No idea if c++decl is valid any more!

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> 
> Well, either way I will commit the volatile and this NTIMECOUNTER to
> -current now, it's certainly better than what is there now.

Great, thanks.

> Thanks for the help, I owe you one at BSDcon!

I'll look forward to it!

> Ohh, and btw: do I need to say that I'm dying to know what the heck
> you are doing with that box ?  :-)

Making the room a good bit warmer, that's for sure. :-)

First I should mention for the benefit of those listening in that it
isn't something I can release publicly, and it will probably never
be open source.  It's a simulated web client and web server, running
inside the kernel.  It's good for load-testing and performance-testing
many kinds of network devices.  With two 1-GHz PIII boxes (one acting
as the client and the other acting as the server) it can generate
around 5 (actually I think it's more than that) full web sessions
per second.  Also, you can dial in any rate you want, and it will
generate that rate very precisely.  Lots of fun!

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:

>After 25 minutes testing that with NTIMECOUNTER=5, I haven't
>gotten any microuptime messages.  So it appears that my problem was
>just that the current timecounter wrapped all the way around the ring
>while microuptime was interrupted, due to the high HZ value and the
>heavy interrupt load.  I'm sorry I didn't try this sooner, when you
>suggested it.
>
>After lunch I'll try NTIMECOUNTER=HZ (1 in my case).  That sounds
>like a nice default value to me.

Yup, if we preempt things for more than a second we have other problems
too I think.

My fault for not letting it depend in HZ in the first place.

>It would be interesting to see whether this same fix also works in
>-current.  I'm not sure yet if I can test that here or not.  I haven't
>tried building my netgraph module under -current yet.

Well, either way I will commit the volatile and this NTIMECOUNTER to
-current now, it's certainly better than what is there now.

Thanks for the help, I owe you one at BSDcon!

Poul-Henning

Ohh, and btw: do I need to say that I'm dying to know what the heck
you are doing with that box ?  :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> Could you try this combination:
> 
>   NTIMECOUNTER = HZ  (or even 5 * HZ)
>   tco_method = 0
>   no splhigh protection for microuptime() ?

After 25 minutes testing that with NTIMECOUNTER=5, I haven't
gotten any microuptime messages.  So it appears that my problem was
just that the current timecounter wrapped all the way around the ring
while microuptime was interrupted, due to the high HZ value and the
heavy interrupt load.  I'm sorry I didn't try this sooner, when you
suggested it.

After lunch I'll try NTIMECOUNTER=HZ (1 in my case).  That sounds
like a nice default value to me.

It would be interesting to see whether this same fix also works in
-current.  I'm not sure yet if I can test that here or not.  I haven't
tried building my netgraph module under -current yet.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> >I don't follow that.  As I read the code, the "current" timecounter
> >is only advanced every second -- not every 1/HZ seconds.  Why should
> >more of them be needed when HZ is large?
> 
> No, only if you have set tco_method to one, if tco_method is zero (default)
> we update the timecounter every HZ.

Oh, you're right.  Sorry, I misread the code.  My tco_method is 0,
so it will do the update every HZ.

> Could you try this combination:
> 
>   NTIMECOUNTER = HZ  (or even 5 * HZ)
>   tco_method = 0
>   no splhigh protection for microuptime() ?

Yep, I'll try that.

> If I do
>   extern volatile struct timecounter *timecounter;
> 
>   microtime()
>   {
>   struct timecounter *tc;
> 
>   tc = timecounter;
> 
> The compiler complains about loosing the volatile thing.
> 
> How do I tell it that it is the contents of the "timecounter" pointer which
> is volatile, but now what it points at ?

You want:

extern struct timecounter *volatile timecounter;

and also change the definition of the variable in the .c file.  You
won't get any warnings from that.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, John Polstra writes:
>OK, adding the splhigh() around the body of microuptime seems to have
>solved the problem.  After 45 minutes of running the same test as
>before, I haven't gotten a single message.  If I get one later, I'll
>let you know.

Ok, so we know where the bogotism happens, now to identify it...

>I don't follow that.  As I read the code, the "current" timecounter
>is only advanced every second -- not every 1/HZ seconds.  Why should
>more of them be needed when HZ is large?

No, only if you have set tco_method to one, if tco_method is zero (default)
we update the timecounter every HZ.

>> You didn't say if you ran with standard NTIMECOUNTER right now,
>> but 5 would be awfully short time at HZ=1: 500 usec...
>
>Well, microseconds aren't what they used to be ... :-) But isn't it
>true that the current timecounter only advances every second?  I think
>I have 5 seconds, not 5/HZ seconds.

Depends on your tco_method...

Could you try this combination:

NTIMECOUNTER = HZ  (or even 5 * HZ)
tco_method = 0
no splhigh protection for microuptime() ?

The reason why tco_method=1 isn't nice is that we loose the ability to use
1/hz precision cached timestamps in the get*() funtions.

Btw, regarding the volatile thing:

If I do
extern volatile struct timecounter *timecounter;

microtime()
{
struct timecounter *tc;

tc = timecounter;

The compiler complains about loosing the volatile thing.

How do I tell it that it is the contents of the "timecounter" pointer which
is volatile, but now what it points at ?  I don't want the "tc" pointer to
be volatile because it obviously isn't.  Do I really need to cast it ?

tc = (struct timecounter *)timecounter;

That looks silly to me...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


OK, adding the splhigh() around the body of microuptime seems to have
solved the problem.  After 45 minutes of running the same test as
before, I haven't gotten a single message.  If I get one later, I'll
let you know.

> >I'm testing that now.  But for how long would microuptime have to
> >be interrupted to make this happen?  Surely not 7.81 seconds!  On
> >this same machine I have a curses application running which is
> >updating the screen once a second.  It never misses a beat, and
> >userland is very responsive.
> 
> Well, that is what I don't understand yet either :-)
> 
> The fact that the delta is not exactly 2^32 * cpu clock is probably
> blindingly obviously indicative of "why", but I havn't solved the
> puzzle yet...

I've been staring at the code, but I haven't gotten it yet either.

> Since you are running with a 1 HZ, NTIMECOUNTER should probably
> be considerably increased.  (Actually it might be a good idea to
> simply set NTIMECOUNTER == hz ... hmmm...)

I don't follow that.  As I read the code, the "current" timecounter
is only advanced every second -- not every 1/HZ seconds.  Why should
more of them be needed when HZ is large?

> You didn't say if you ran with standard NTIMECOUNTER right now,
> but 5 would be awfully short time at HZ=1: 500 usec...

Well, microseconds aren't what they used to be ... :-) But isn't it
true that the current timecounter only advances every second?  I think
I have 5 seconds, not 5/HZ seconds.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Nate Williams


> >How are issues (1) and (3) above different?
> >
> >ps. I'm just trying to understand, and am *NOT* trying to start a
> >flame-war. :) :) :)
> 
> If the starvation happens to hardclock() or rather tc_windup() the effect
> will be cummulative and show up in permanent jumps in the output of date
> for instance.  In stable hardclock() is spl-protected so this would be
> _really_ bad news.
> 
> If the starvation happens in any of {micro|nano}[up]time() (but not the
> "get&" variants!) the it will result in a single spurious reading.

Ok, the bulb is starting to grow from dim to bright. :)

> The premise for avoiding locking in the access functions to timecounters
> where precisely that we could trust them to not be pre-empted for long
> enough for the hardware to roll over, if this is not the case we loose
> because the overflow in the hardware counter means that the timecounter
> we calculate from is not valid for the delta we get from the hardware.
> 
> I'm not sure this answers your question, if not it is not bad will, just
> me not understanding the question :-)

*grin*

I think I understand the problem.  Let me try to rephrase to make sure.

1) If we have an interrupt lockout (*NOT* due to time-counting code),
   then we'd have a problem since the hardclock would never get run.

2) If however, the locking done to protect the timecounter code happens
   to make getting/setting the timecounter take too long, we'd get
   similar results, but for *completely* different reasons.

Let me be more precise.

(1)
  cli();
   /* Take a really long time doing something */
  sti();

(2)
  /* Do something */
  gettime();  /* Takes a really long time to complete */

The first is harder to track down/fix, simply because you don't know
*who* the offender is.  The latter is essentially the same problem to
fix, but *may* be easier to fix since the offending code *IS* the
timecounter code.

Am I even close to understanding?



Nate

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,
>Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
>> In message <[EMAIL PROTECTED]>, John Polstra writes:
>> >Yes, I think you're onto something now.  It's a 550 MHz. machine, so
>> >the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
>> >seconds. :-)
>> 
>> In that case I'm almost willing to put an AnchorSteam on microuptime()
>> being interrupted for more than good is in which case the splhigh() should
>> cure it.
>
>I'm testing that now.  But for how long would microuptime have to
>be interrupted to make this happen?  Surely not 7.81 seconds!  On
>this same machine I have a curses application running which is
>updating the screen once a second.  It never misses a beat, and
>userland is very responsive.

Well, that is what I don't understand yet either :-)

The fact that the delta is not exactly 2^32 * cpu clock is probably
blindingly obviously indicative of "why", but I havn't solved the
puzzle yet...

Since you are running with a 1 HZ, NTIMECOUNTER should probably
be considerably increased.  (Actually it might be a good idea to
simply set NTIMECOUNTER == hz ... hmmm...)

We could be seing a situation where a process is preempted in
microuptime() and the timecounter ring being recycled *and*
the hardware counter overflowing before it completes, that might
give this problem.

You didn't say if you ran with standard NTIMECOUNTER right now,
but 5 would be awfully short time at HZ=1: 500 usec...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, Nate Williams writes:

>How are issues (1) and (3) above different?
>
>ps. I'm just trying to understand, and am *NOT* trying to start a
>flame-war. :) :) :)

If the starvation happens to hardclock() or rather tc_windup() the effect
will be cummulative and show up in permanent jumps in the output of date
for instance.  In stable hardclock() is spl-protected so this would be
_really_ bad news.

If the starvation happens in any of {micro|nano}[up]time() (but not the
"get&" variants!) the it will result in a single spurious reading.

The premise for avoiding locking in the access functions to timecounters
where precisely that we could trust them to not be pre-empted for long
enough for the hardware to roll over, if this is not the case we loose
because the overflow in the hardware counter means that the timecounter
we calculate from is not valid for the delta we get from the hardware.

I'm not sure this answers your question, if not it is not bad will, just
me not understanding the question :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> >Yes, I think you're onto something now.  It's a 550 MHz. machine, so
> >the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
> >seconds. :-)
> 
> In that case I'm almost willing to put an AnchorSteam on microuptime()
> being interrupted for more than good is in which case the splhigh() should
> cure it.

I'm testing that now.  But for how long would microuptime have to
be interrupted to make this happen?  Surely not 7.81 seconds!  On
this same machine I have a curses application running which is
updating the screen once a second.  It never misses a beat, and
userland is very responsive.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Nate Williams


> >> >> Can you try to MFC rev 1.111 and see if that changes anything ?
> >> >
> >> >That produced some interesting results.  I am still testing under
> >> >very heavy network interrupt load.  With the change from 1.111, I
> >> >still get the microuptime messages about as often.  But look how
> >> >much larger the reported backwards jumps are:
> >> >
> >> >microuptime() went backwards (896.225603 -> 888.463636)
> >> >microuptime() went backwards (896.225603 -> 888.494440)
> >> >microuptime() went backwards (896.225603 -> 888.500875)
> >> >microuptime() went backwards (1184.392277 -> 1176.603001)
> >> >microuptime() went backwards (1184.392277 -> 1176.603749)
> >> 
> >> (Ok, I'll MFC 1.111)
> >
> >Huh?  It appears that 1.111 makes things worse, not better (larger
> >jumps).
> 
> No, 1.111 makes the jumps report more correctly I think. 

Now, if that ain't a glowing reason to MFC it, I don't know one (I
think). :) :)

> They will maybe save your meal in less bad cases than yours, but in
> yours they just make sure that we don't get invalid number of
> microseconds in a timeval, and consequently we get more honest output.

How can you verify that this is the case?

> >> We now have three options left:
> >>hardclock interrupt starvation 
> >
> >This is Bruce's hypothesis, right?
> 
> Also mine for that matter.
> 
> >>scheduling related anomaly wrt to the use of microuptime().
> >>arithmetic overflow because the call to microuptime() gets
> >>interrupted for too long.
> >
> >'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
> >a long interrupt blockage?  (I'm trying to understand here.)
> 
> See my previous email, I just explained it there.

I still didn't understand, hence the reason for the question.  (The
explanation was in the email I originall responded to).

I understand the 'overflow' issue, but it would seem to my naive
thinking that it would occur only when interrupts are blocked for a
period of time, which is the same as hardclock interrupt starvation in
my mind.

How are issues (1) and (3) above different?



Nate

ps. I'm just trying to understand, and am *NOT* trying to start a
flame-war. :) :) :)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,
>Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
>> In message <[EMAIL PROTECTED]>, John Polstra writes:
>> >In article <[EMAIL PROTECTED]>,
>> >John Polstra  <[EMAIL PROTECTED]> wrote:
>> >
>> >Another interesting thing is that the jumps are always 7.7x seconds
>> >back -- usually 7.79 seconds.  This is even true with more data points
>> >from two different machines.
>> 
>> Yes, I noticed, but didn't dare draw conclusions based on two data points.
>
>It's pretty consistent -- always 7.7somthing.
>
>> This points to an arithmetic overflow (ie: point 3 in my previous email)
>
>Yes, I think you're onto something now.  It's a 550 MHz. machine, so
>the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
>seconds. :-)

In that case I'm almost willing to put an AnchorSteam on microuptime()
being interrupted for more than good is in which case the splhigh() should
cure it.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, Nate Williams writes:
>> >> Can you try to MFC rev 1.111 and see if that changes anything ?
>> >
>> >That produced some interesting results.  I am still testing under
>> >very heavy network interrupt load.  With the change from 1.111, I
>> >still get the microuptime messages about as often.  But look how
>> >much larger the reported backwards jumps are:
>> >
>> >microuptime() went backwards (896.225603 -> 888.463636)
>> >microuptime() went backwards (896.225603 -> 888.494440)
>> >microuptime() went backwards (896.225603 -> 888.500875)
>> >microuptime() went backwards (1184.392277 -> 1176.603001)
>> >microuptime() went backwards (1184.392277 -> 1176.603749)
>> 
>> (Ok, I'll MFC 1.111)
>
>Huh?  It appears that 1.111 makes things worse, not better (larger
>jumps).

No, 1.111 makes the jumps report more correctly I think.  They will
maybe save your meal in less bad cases than yours, but in yours they
just make sure that we don't get invalid number of microseconds in
a timeval, and consequently we get more honest output.

>> We now have three options left:
>>  hardclock interrupt starvation 
>
>This is Bruce's hypothesis, right?

Also mine for that matter.

>>  scheduling related anomaly wrt to the use of microuptime().
>>  arithmetic overflow because the call to microuptime() gets
>>  interrupted for too long.
>
>'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
>a long interrupt blockage?  (I'm trying to understand here.)

See my previous email, I just explained it there.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,

>> This may be a problem, I have yet to see GCC make different code for
>> that but I should probably have committed the "volatile" anyway.
>
>It should be committed, but it is not causing the problem in this
>case.  I changed it and then compared MD5s of the object files.  The
>only changes that resulted were unimportant.

Will do.

>> >I also noticed this in tco_forward():
>> >
>> >tco = timecounter;
>> >tc = sync_other_counter();
>> >[...]
>> >if (tco->tc_poll_pps)
>> 
>> This code is actually correct, the tc_poll_pps needs to be done on
>> the "old" timecounter, because that would be the reference for any
>> captured hardware timestamps, if I did it on the new timecounter I
>> might get negative deltas which would complicate things.  Also the
>> new timecounter may have a changed frequency/offset (tickadj/ntpd
>> and all that).
>
>I don't think I follow your reasoning here.  If the call to
>sync_other_counter were inlined, we'd have something like this:
>
>tco = timecounter;
>tco_in_sync_other_counter = timecounter;
>[...]
>if (tco->tc_poll_pps)
>
>Obviously tco and tco_in_sync_other_counter will have the same value
>almost all of the time, so the code can't be relying on them being
>different.

It doesn't rely on them being different, they are used for two different
purposes.  sync_other_counter() sets up the next counter in the ring
to be used for timekeeping, tco/tc_poll_pps is used to calculate
any hardware captured timestamps which occured _before_ now.

sync_other_counter() returns the next timecounter after fiddling it
but we cannot use that for tc_poll_pps() purposes, we need the old
(or if you will: still present one) for that.

Trust me, this is not your problem because this is used only with
the xrpu.c timecounter code anyway and you don't have the hardware
for that :-)

>Many of the systems where I see this problem are using the TSC as
>the timecounter.  They don't have APM in the kernel, and they aren't
>running ntpd.  I.e., it's not only the i8254 that's the problem.  The
>fastest of these systems is a 1.13 GHz PIII, and it would take the 32
>bits of the TSC which are actually used 3.8 seconds to wrap around.

It would be a problem if the kernel is punched hard enough that code
doesn't complete in the time it takes for the used timecounter to wrap,
ie:  if you can have something like:

call microuptime()
tc = timecounter;
(interrupt and do other stuff for several seconds)
... overflow in arithmetic 

But you would have to pummel your kernel pretty bad for that.  On the
other hand, between a 1 Hz and the network load you describe, that
may be exactly what is happening...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> 
> Sanity-check: this is NOT a multi-CPU system, right ?

Right.  These are all single-CPU systems with non-SMP -stable
kernels.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Nate Williams


> >> Can you try to MFC rev 1.111 and see if that changes anything ?
> >
> >That produced some interesting results.  I am still testing under
> >very heavy network interrupt load.  With the change from 1.111, I
> >still get the microuptime messages about as often.  But look how
> >much larger the reported backwards jumps are:
> >
> >microuptime() went backwards (896.225603 -> 888.463636)
> >microuptime() went backwards (896.225603 -> 888.494440)
> >microuptime() went backwards (896.225603 -> 888.500875)
> >microuptime() went backwards (1184.392277 -> 1176.603001)
> >microuptime() went backwards (1184.392277 -> 1176.603749)
> 
> (Ok, I'll MFC 1.111)

Huh?  It appears that 1.111 makes things worse, not better (larger
jumps).

Can you explain why you think this is a good things, since it seems to
be a bad thing to me.

> Sanity-check: this is NOT a multi-CPU system, right ?

As stated before, both are > 1Ghz single-CPU systems running -stable,
although I'm sure John is capable of a answering this on his own. :)

> We now have three options left:
>   hardclock interrupt starvation 

This is Bruce's hypothesis, right?

>   scheduling related anomaly wrt to the use of microuptime().
>   arithmetic overflow because the call to microuptime() gets
>   interrupted for too long.

'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
a long interrupt blockage?  (I'm trying to understand here.)



Nate

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> >In article <[EMAIL PROTECTED]>,
> >John Polstra  <[EMAIL PROTECTED]> wrote:
> >
> >Another interesting thing is that the jumps are always 7.7x seconds
> >back -- usually 7.79 seconds.  This is even true with more data points
> >from two different machines.
> 
> Yes, I noticed, but didn't dare draw conclusions based on two data points.

It's pretty consistent -- always 7.7somthing.

> This points to an arithmetic overflow (ie: point 3 in my previous email)

Yes, I think you're onto something now.  It's a 550 MHz. machine, so
the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
seconds. :-)

I'll try the things you suggested in your other mail.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,
>John Polstra  <[EMAIL PROTECTED]> wrote:
>> In article <[EMAIL PROTECTED]>,
>> Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
>> > In message <[EMAIL PROTECTED]>, John Polstra writes:
>> > 
>> > Can you try to MFC rev 1.111 and see if that changes anything ?
>> 
>> That produced some interesting results.  I am still testing under
>> very heavy network interrupt load.  With the change from 1.111, I
>> still get the microuptime messages about as often.  But look how
>> much larger the reported backwards jumps are:
>> 
>> microuptime() went backwards (896.225603 -> 888.463636)
>> microuptime() went backwards (896.225603 -> 888.494440)
>> microuptime() went backwards (896.225603 -> 888.500875)
>> microuptime() went backwards (1184.392277 -> 1176.603001)
>> microuptime() went backwards (1184.392277 -> 1176.603749)
>
>Another interesting thing is that the jumps are always 7.7x seconds
>back -- usually 7.79 seconds.  This is even true with more data points
>from two different machines.

Yes, I noticed, but didn't dare draw conclusions based on two data points.

This points to an arithmetic overflow (ie: point 3 in my previous email)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,
>Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
>> In message <[EMAIL PROTECTED]>, John Polstra writes:
>> 
>> Can you try to MFC rev 1.111 and see if that changes anything ?
>
>That produced some interesting results.  I am still testing under
>very heavy network interrupt load.  With the change from 1.111, I
>still get the microuptime messages about as often.  But look how
>much larger the reported backwards jumps are:
>
>microuptime() went backwards (896.225603 -> 888.463636)
>microuptime() went backwards (896.225603 -> 888.494440)
>microuptime() went backwards (896.225603 -> 888.500875)
>microuptime() went backwards (1184.392277 -> 1176.603001)
>microuptime() went backwards (1184.392277 -> 1176.603749)

(Ok, I'll MFC 1.111)

Sanity-check: this is NOT a multi-CPU system, right ?

We now have three options left:
hardclock interrupt starvation 
scheduling related anomaly wrt to the use of microuptime().
arithmetic overflow because the call to microuptime() gets
interrupted for too long.

Looking for the first, add this to hardclock:

static u_int64_t  lasttsc;
u_int64_t delta, now;

now = rdtsc();
delta = now - lasttsc;
lasttsc = now;
if (delta > (tsc_freq * 3) / (2 * hz)) {
printf("delta = %llu \n", delta);
}

In theory, you get no printfs.  If you do, hopefully they have small
delta values.

Looking for the third:

Add splhigh() protection around the stuff in microuptime();

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
John Polstra  <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]>,
> Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> > In message <[EMAIL PROTECTED]>, John Polstra writes:
> > 
> > Can you try to MFC rev 1.111 and see if that changes anything ?
> 
> That produced some interesting results.  I am still testing under
> very heavy network interrupt load.  With the change from 1.111, I
> still get the microuptime messages about as often.  But look how
> much larger the reported backwards jumps are:
> 
> microuptime() went backwards (896.225603 -> 888.463636)
> microuptime() went backwards (896.225603 -> 888.494440)
> microuptime() went backwards (896.225603 -> 888.500875)
> microuptime() went backwards (1184.392277 -> 1176.603001)
> microuptime() went backwards (1184.392277 -> 1176.603749)

Another interesting thing is that the jumps are always 7.7x seconds
back -- usually 7.79 seconds.  This is even true with more data points
from two different machines.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> 
> Can you try to MFC rev 1.111 and see if that changes anything ?

That produced some interesting results.  I am still testing under
very heavy network interrupt load.  With the change from 1.111, I
still get the microuptime messages about as often.  But look how
much larger the reported backwards jumps are:

microuptime() went backwards (896.225603 -> 888.463636)
microuptime() went backwards (896.225603 -> 888.494440)
microuptime() went backwards (896.225603 -> 888.500875)
microuptime() went backwards (1184.392277 -> 1176.603001)
microuptime() went backwards (1184.392277 -> 1176.603749)

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> 
> >Agreed.  But in the cases I'm worrying about right now, the
> >timecounter is the TSC.
> 
> Now, *that* is very interesting, how reproducible is it ?

I can reproduce it pretty easily.  This is from a PIII/550:

web1# uptime
 5:23PM  up 37 mins, 2 users, load averages: 0.08, 0.05, 0.01
web1# dmesg | grep microuptime
microuptime() went backwards (1992.7347381 -> 1991.587620)
microuptime() went backwards (1992.7347381 -> 1991.620385)
microuptime() went backwards (1992.7347381 -> 1991.621582)
microuptime() went backwards (2016.7063298 -> 2015.272466)
microuptime() went backwards (2286.7346482 -> 2285.587587)
microuptime() went backwards (2286.7346482 -> 2285.593646)
microuptime() went backwards (2286.7346482 -> 2285.595103)
microuptime() went backwards (2286.7346482 -> 2285.617023)

Now, two things are unusual about this system.  First, I have it
pumping packets from inside the kernel (with a special netgraph node)
at a rate which is keeping the system about 95% busy processing
interrupts.  (That's about 330K packets/second.)  That definitely
makes it happen more often.  But I also see it occasionally when the
system is sitting idle.  Second, I have HZ set to 1.  I think that
also contributes to the problem.  It shouldn't, though.  Not even a
PII/400 breathes hard at HZ=1.

On another system which is mostly idle, I see it about 4 times a
day.  That system also has HZ set to 1.  It's a PIII/1.13 GHz:

microuptime() went backwards (200671.582473 -> 200671.582472)
microuptime() went backwards (222622.358518 -> 222622.358517)
microuptime() went backwards (228934.897839 -> 228934.897838)
microuptime() went backwards (246166.386301 -> 246166.386300)
microuptime() went backwards (263484.488575 -> 263484.488574)

Notice the difference on this system that's fairly idle.  On the
other system, it was the seconds going backward.  Here it's the
microseconds.  These may be two entirely different problems.

> Can you try to MFC rev 1.111 and see if that changes anything ?

Sure.  I'll let you know.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:

>Agreed.  But in the cases I'm worrying about right now, the
>timecounter is the TSC.

Now, *that* is very interesting, how reproducible is it ?

Can you try to MFC rev 1.111 and see if that changes anything ?


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> >In article <[EMAIL PROTECTED]>,
> >John Baldwin  <[EMAIL PROTECTED]> wrote:
> >> 
> >> > like, "If X is never locked out for longer than Y, this problem
> >> > cannot happen."  I'm looking for definitions of X and Y.  X might be
> >> > hardclock() or softclock() or non-interrupt kernel processing.  Y
> >> > would be some measure of time, probably a function of HZ and/or the
> >> > timecounter frequency.
> >> 
> >> X is hardclock I think, since hardclock() calls tc_windup().
> >
> >That makes sense, but on the other hand hardclock seems unlikely to be
> >delayed by much.  The only thing that can block hardclock is another
> >hardclock, an splclock, or an splhigh.  And, maybe, splstatclock.  I'm
> >talking about -stable here, which is where I'm doing my experiments.
> 
> Try swapping so you use the RTC for hardclock & statclock.
> 
> Let the i8254 run with 65536 divisor and do only timecounter service.
> 
> That would be a very interresting experiment.

Agreed.  But in the cases I'm worrying about right now, the
timecounter is the TSC.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Poul-Henning Kamp  <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, John Polstra writes:
> >That's the global variable named "timecounter", right?  I did notice
> >one potential problem: that variable is not declared volatile.  So
> >in this part ...
> 
> This may be a problem, I have yet to see GCC make different code for
> that but I should probably have committed the "volatile" anyway.

It should be committed, but it is not causing the problem in this
case.  I changed it and then compared MD5s of the object files.  The
only changes that resulted were unimportant.

> >I also noticed this in tco_forward():
> >
> >tco = timecounter;
> >tc = sync_other_counter();
> > [...]
> >if (tco->tc_poll_pps)
> >
> >But sync_other_counter() loads its own copy of "timecounter",
> >and there's no guarantee it hasn't changed from the value that
> >tco_forward() saved in its local variable.  I'm not sure yet if
> >that's a potential problem.  It could corrected by passing "tco" as
> >an argument to sync_other_counter.  I'll try that too.
> 
> This code is actually correct, the tc_poll_pps needs to be done on
> the "old" timecounter, because that would be the reference for any
> captured hardware timestamps, if I did it on the new timecounter I
> might get negative deltas which would complicate things.  Also the
> new timecounter may have a changed frequency/offset (tickadj/ntpd
> and all that).

I don't think I follow your reasoning here.  If the call to
sync_other_counter were inlined, we'd have something like this:

tco = timecounter;
tco_in_sync_other_counter = timecounter;
[...]
if (tco->tc_poll_pps)

Obviously tco and tco_in_sync_other_counter will have the same value
almost all of the time, so the code can't be relying on them being
different.

Anyway, I realize now that this also isn't the problem, because
tco_forward is only ever called at splclock.  It can't be
interrupted or re-entered, at least not on the uniprocessor -stable
systems I'm looking at.

> There is one more failure mode which you have overlooked:  The individual
> timecounters maintain a binary counter of a certain width, if interrupt
> latency gets too bad, this may overflow.
> 
> This is a non-issue for the TSC, which is 64bit wide in hardware.

Many of the systems where I see this problem are using the TSC as
the timecounter.  They don't have APM in the kernel, and they aren't
running ntpd.  I.e., it's not only the i8254 that's the problem.  The
fastest of these systems is a 1.13 GHz PIII, and it would take the 32
bits of the TSC which are actually used 3.8 seconds to wrap around.

> Hope this helps...

Yep, thanks.  I have some ideas of other things to try.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, John Polstra writes:
>In article <[EMAIL PROTECTED]>,
>John Baldwin  <[EMAIL PROTECTED]> wrote:
>> 
>> > like, "If X is never locked out for longer than Y, this problem
>> > cannot happen."  I'm looking for definitions of X and Y.  X might be
>> > hardclock() or softclock() or non-interrupt kernel processing.  Y
>> > would be some measure of time, probably a function of HZ and/or the
>> > timecounter frequency.
>> 
>> X is hardclock I think, since hardclock() calls tc_windup().
>
>That makes sense, but on the other hand hardclock seems unlikely to be
>delayed by much.  The only thing that can block hardclock is another
>hardclock, an splclock, or an splhigh.  And, maybe, splstatclock.  I'm
>talking about -stable here, which is where I'm doing my experiments.

Try swapping so you use the RTC for hardclock & statclock.

Let the i8254 run with 65536 divisor and do only timecounter service.

That would be a very interresting experiment.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread John Polstra


In article <[EMAIL PROTECTED]>,
John Baldwin  <[EMAIL PROTECTED]> wrote:
> 
> > like, "If X is never locked out for longer than Y, this problem
> > cannot happen."  I'm looking for definitions of X and Y.  X might be
> > hardclock() or softclock() or non-interrupt kernel processing.  Y
> > would be some measure of time, probably a function of HZ and/or the
> > timecounter frequency.
> 
> X is hardclock I think, since hardclock() calls tc_windup().

That makes sense, but on the other hand hardclock seems unlikely to be
delayed by much.  The only thing that can block hardclock is another
hardclock, an splclock, or an splhigh.  And, maybe, splstatclock.  I'm
talking about -stable here, which is where I'm doing my experiments.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp


In message <[EMAIL PROTECTED]>, "M. Warner Losh" writes:
>In message: <[EMAIL PROTECTED]>
>Poul-Henning Kamp <[EMAIL PROTECTED]> writes:
>: But the i8254 is a piece of shit in this context, and due to
>: circumstances (apm being enabled0 most machines end up using the
>: i8254 by default.
>: 
>: My (and I belive Bruce's) diagnosis so far is that most problems
>: come from the i8254 timecounter.
>
>We measured pps interrupts with the i8254 timecounter in a fast
>interrupt handler via the parallel port (yes, we hacked it to give us
>a fast interrupt).  We found lots of outliers on the order of a few
>milliseconds in the data that we had to discard because they were
>obviously bogus.  We don't know if this is because of interrupt
>latency or because of bugs in the 8254 timecounter code/hardware.  At
>the time, it wasn't important enough to do a detailed numerology on to
>see if more data couldn't be mined from it or not.  And the data that
>we saw the outliers in was somewhat processed from the original
>data...

I have not tried to measure the i8254 against my hardware solution,
but by now I belive that certain bogus chipsets may have bummed
the 'latch' command or more than that maybe.

Anyway, some, but not all of the i8254 issues could be eased up a
bit if we lost the pcaudio crap and used the RTC's 128 Hz signal
for Hz and let the i8254 run at a 65536 count all the time.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread M. Warner Losh


In message: <[EMAIL PROTECTED]>
Poul-Henning Kamp <[EMAIL PROTECTED]> writes:
: But the i8254 is a piece of shit in this context, and due to
: circumstances (apm being enabled0 most machines end up using the
: i8254 by default.
: 
: My (and I belive Bruce's) diagnosis so far is that most problems
: come from the i8254 timecounter.

We measured pps interrupts with the i8254 timecounter in a fast
interrupt handler via the parallel port (yes, we hacked it to give us
a fast interrupt).  We found lots of outliers on the order of a few
milliseconds in the data that we had to discard because they were
obviously bogus.  We don't know if this is because of interrupt
latency or because of bugs in the 8254 timecounter code/hardware.  At
the time, it wasn't important enough to do a detailed numerology on to
see if more data couldn't be mined from it or not.  And the data that
we saw the outliers in was somewhat processed from the original
data...

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message <[EMAIL PROTECTED]>, John Polstra writes:
>Mike Smith  <[EMAIL PROTECTED]> wrote:
>> 
>> It's not necessarily caused by interrupt latency.  Here's the assumption 
>> that's being made.
>[...]
>
>Thanks for the superb explanation!  I appreciate it.

My apologies for never getting the timecounter paper finished and
published, I should really do that...

>That's the global variable named "timecounter", right?  I did notice
>one potential problem: that variable is not declared volatile.  So
>in this part ...

This may be a problem, I have yet to see GCC make different code for
that but I should probably have committed the "volatile" anyway.

>I also noticed this in tco_forward():
>
>tco = timecounter;
>tc = sync_other_counter();
>   [...]
>if (tco->tc_poll_pps)
>
>But sync_other_counter() loads its own copy of "timecounter",
>and there's no guarantee it hasn't changed from the value that
>tco_forward() saved in its local variable.  I'm not sure yet if
>that's a potential problem.  It could corrected by passing "tco" as
>an argument to sync_other_counter.  I'll try that too.

This code is actually correct, the tc_poll_pps needs to be done on
the "old" timecounter, because that would be the reference for any
captured hardware timestamps, if I did it on the new timecounter I
might get negative deltas which would complicate things.  Also the
new timecounter may have a changed frequency/offset (tickadj/ntpd
and all that).

>> There are a couple of possible problems with this mechanism.
>> 
>> One is that the ring "catches up" with your saved copy of the
>> "current" pointer, ie. inbetween fetching the pointer and reading the
>> timecounter contents, the "next" pointer passes over you again in such
>> a fashion that you get garbage out of the structure.

This is unlikely.

>> Another is that there is a race between multiple updaters of the
>> timecounter; if two parties are both updating the "next" timecounter
>> along with another party trying to get the "current" time, this could
>> cause corruption.

I have worries in this respect with SMPng, but have yet to see a
corrupted timecounter ring.

There is one more failure mode which you have overlooked:  The individual
timecounters maintain a binary counter of a certain width, if interrupt
latency gets too bad, this may overflow.

This is a non-issue for the TSC, which is 64bit wide in hardware.

It should also be a non-issue for the PIIX which is at least 24 bits
in hardware.

But the i8254 is a piece of shit in this context, and due to
circumstances (apm being enabled0 most machines end up using the
i8254 by default.

My (and I belive Bruce's) diagnosis so far is that most problems
come from the i8254 timecounter.

I made a commit recently which made the core-code more robust to
bad interrupt jitter/latency, basically it would return timestamps
with too many microseconds or nanoseconds because it only tried to
roll over to seconds ones.  Now it while()'s over that loop.

Hope this helps...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-04 Thread John Baldwin



On 04-Feb-02 John Polstra wrote:
> In article <[EMAIL PROTECTED]>,
> Dominic Marks  <[EMAIL PROTECTED]> wrote:
>> On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
>> > I'm trying to understand the timecounter code, and in particular the
>> > reason for the "microuptime went backwards" messages which I see on
>> > just about every machine I have, whether running -stable or -current.
>> 
>> I see them everywhere with -CURRENT, but not at all with -STABLE. This is
>> with two seperate machines. Perhaps that may add clues.
> 
> I'm looking for something less empirical than that.  When somebody
> says this problem is caused by too much interrupt latency, I assume
> they have a mental model of what is going wrong when this excessive
> latency occurs.  Given that, it should be possible to make a statement
> like, "If X is never locked out for longer than Y, this problem
> cannot happen."  I'm looking for definitions of X and Y.  X might be
> hardclock() or softclock() or non-interrupt kernel processing.  Y
> would be some measure of time, probably a function of HZ and/or the
> timecounter frequency.

X is hardclock I think, since hardclock() calls tc_windup().  I'm not sure what
Y is except that it is indeed a known value.  phk should know as he is Mr.
Timecounter.

> John
> -- 
>   John Polstra
>   John D. Polstra & Co., Inc.Seattle, Washington USA
>   "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

-- 

John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-04 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Mike Smith  <[EMAIL PROTECTED]> wrote:
> 
> It's not necessarily caused by interrupt latency.  Here's the assumption 
> that's being made.
[...]

Thanks for the superb explanation!  I appreciate it.

> There is a ring of timecounter structures, of some size.  In testing,
> I've used sizes of a thousand or more, but still seen this problem.
> 
> There is a pointer to the "current" timecounter structure.

That's the global variable named "timecounter", right?  I did notice
one potential problem: that variable is not declared volatile.  So
in this part ...

> When one wishes to read the current time, one proceeds as follows:
> 
>  - Get the "current" pointer and save it locally.
>  - Read the timecounter structure via the local "current" pointer.

... the compiler is perfectly free to reread the global multiple
times in the function rather than using the saved local copy.  If the
"current" pointer has moved in that time, we'll an inconsistent view
of the timecounter.  In looking at the generated code I haven't found
any actual instances of that.  But I'll try making it volatile just to
make sure.  Even if it doesn't cause any problems currently, I think
we should change it to volatile since it could start to cause problems
some day.

I also noticed this in tco_forward():

tco = timecounter;
tc = sync_other_counter();
[...]
if (tco->tc_poll_pps)

But sync_other_counter() loads its own copy of "timecounter",
and there's no guarantee it hasn't changed from the value that
tco_forward() saved in its local variable.  I'm not sure yet if
that's a potential problem.  It could corrected by passing "tco" as
an argument to sync_other_counter.  I'll try that too.

> There are a couple of possible problems with this mechanism.
> 
> One is that the ring "catches up" with your saved copy of the
> "current" pointer, ie. inbetween fetching the pointer and reading the
> timecounter contents, the "next" pointer passes over you again in such
> a fashion that you get garbage out of the structure.

As you mentioned, with a large enough ring this should be impossible.
If I read the code correctly, the "current" pointer is only moved
once per second.  So in the current ring of 4 counters (number 0 is
special), it would take 4 seconds to wrap around the ring.  That's a
pretty long time.

> Another is that there is a race between multiple updaters of the
> timecounter; if two parties are both updating the "next" timecounter
> along with another party trying to get the "current" time, this could
> cause corruption.
> 
> All that interrupt latency will do is make the updates late; I can't
> actually see how it could cause corruption.  Corruption has to be
> caused by mishandling of the timecounter ring in some fashion.

I agree.

> Note that you can probably eliminate the ring loop theory by
> allocating a very large number of entries in the ring by setting
> NTIMECOUNTER (kern/kern_tc.c) higher.  The structures are small; try
> 100,000 or so.

OK, but even the thousand you tried should give a cushion of more
than 16 minutes.

> If you can reproduce under these circumstances, try adding some checks
> to make sure the "current" timecounter pointer is behaving
> monotonically; just save the last timecounter pointer in microtime()
> et. al.
> 
> Another test worth performing is to look at the tco_delta function for
> the timecounter and make sure that it returns a sane value, and one
> that doesn't behave out of synch with the interrupt handler that updates
> the timecounter proper.  If you save the delta value in the timecounter 
> and zero it when it's updated, you can catch this.
> 
> You can rule this out by using getmicroptime() rather than
> microuptime(); it may return the same value twice, which isn't
> desirable, but that would be better than nothing.
> 
> Hope this helps a bit.

Yep, thanks again.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-04 Thread Mike Smith

> In article <[EMAIL PROTECTED]>,
> Dominic Marks  <[EMAIL PROTECTED]> wrote:
> > On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
> > > I'm trying to understand the timecounter code, and in particular the
> > > reason for the "microuptime went backwards" messages which I see on
> > > just about every machine I have, whether running -stable or -current.
> > 
> > I see them everywhere with -CURRENT, but not at all with -STABLE. This is
> > with two seperate machines. Perhaps that may add clues.
> 
> I'm looking for something less empirical than that.  When somebody
> says this problem is caused by too much interrupt latency, I assume
> they have a mental model of what is going wrong when this excessive
> latency occurs.

It's not necessarily caused by interrupt latency.  Here's the assumption 
that's being made.

There is a ring of timecounter structures, of some size.  In testing,
I've used sizes of a thousand or more, but still seen this problem.

There is a pointer to the "current" timecounter structure.

When the "current" time is updated, the following procedure is followed:

 - Find the "next" timecounter in the ring.
 - Update its contents with the new current time.
 - Move the "current" pointer.

When one wishes to read the current time, one proceeds as follows:

 - Get the "current" pointer and save it locally.
 - Read the timecounter structure via the local "current" pointer.

Since the operations on the "current" pointer are atomic, there is no 
need to lock the structure.

There are a couple of possible problems with this mechanism.

One is that the ring "catches up" with your saved copy of the
"current" pointer, ie. inbetween fetching the pointer and reading the
timecounter contents, the "next" pointer passes over you again in such
a fashion that you get garbage out of the structure.

Another is that there is a race between multiple updaters of the
timecounter; if two parties are both updating the "next" timecounter
along with another party trying to get the "current" time, this could
cause corruption.

All that interrupt latency will do is make the updates late; I can't
actually see how it could cause corruption.  Corruption has to be
caused by mishandling of the timecounter ring in some fashion.

Note that you can probably eliminate the ring loop theory by
allocating a very large number of entries in the ring by setting
NTIMECOUNTER (kern/kern_tc.c) higher.  The structures are small; try
100,000 or so.

If you can reproduce under these circumstances, try adding some checks
to make sure the "current" timecounter pointer is behaving
monotonically; just save the last timecounter pointer in microtime()
et. al.

Another test worth performing is to look at the tco_delta function for
the timecounter and make sure that it returns a sane value, and one
that doesn't behave out of synch with the interrupt handler that updates
the timecounter proper.  If you save the delta value in the timecounter 
and zero it when it's updated, you can catch this.

You can rule this out by using getmicroptime() rather than
microuptime(); it may return the same value twice, which isn't
desirable, but that would be better than nothing.

Hope this helps a bit.

Regards,
Mike

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-04 Thread John Polstra

In article <[EMAIL PROTECTED]>,
Dominic Marks  <[EMAIL PROTECTED]> wrote:
> On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
> > I'm trying to understand the timecounter code, and in particular the
> > reason for the "microuptime went backwards" messages which I see on
> > just about every machine I have, whether running -stable or -current.
> 
> I see them everywhere with -CURRENT, but not at all with -STABLE. This is
> with two seperate machines. Perhaps that may add clues.

I'm looking for something less empirical than that.  When somebody
says this problem is caused by too much interrupt latency, I assume
they have a mental model of what is going wrong when this excessive
latency occurs.  Given that, it should be possible to make a statement
like, "If X is never locked out for longer than Y, this problem
cannot happen."  I'm looking for definitions of X and Y.  X might be
hardclock() or softclock() or non-interrupt kernel processing.  Y
would be some measure of time, probably a function of HZ and/or the
timecounter frequency.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: A question about timecounters

2002-02-04 Thread Dominic Marks


On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
> I'm trying to understand the timecounter code, and in particular the
> reason for the "microuptime went backwards" messages which I see on
> just about every machine I have, whether running -stable or -current.

I see them everywhere with -CURRENT, but not at all with -STABLE. This is
with two seperate machines. Perhaps that may add clues.

> This problem is usually attributed to too much interrupt latency.  My
> question is, how much latency is "too much"?  Which interrupt has to
> be locked out for how long in order to see these messages?
> 
> John
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-hackers" in the body of the message

-- 
Dominic

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

A question about timecounters

2002-02-04 Thread John Polstra


I'm trying to understand the timecounter code, and in particular the
reason for the "microuptime went backwards" messages which I see on
just about every machine I have, whether running -stable or -current.
This problem is usually attributed to too much interrupt latency.  My
question is, how much latency is "too much"?  Which interrupt has to
be locked out for how long in order to see these messages?

John

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

44 matches

Mail list logo