RE: A question about timecounters

2002-02-06 Thread DOROVSKOY,IGOR (A-Portsmouth,ex1)

be open source.  It's a simulated web client and web server, running
inside the kernel.  It's good for load-testing and performance-testing
many kinds of network devices.  With two 1-GHz PIII boxes (one acting
as the client and the other acting as the server) it can generate
around 5 (actually I think it's more than that) full web sessions
per second.  Also, you can dial in any rate you want, and it will
generate that rate very precisely.  Lots of fun!
... yeah yeah :-) It reminds me of a two-year-old feature in our product
(originally QARobot, now part of RouterTester), where we had a Session Storm
module that generates HTTP traffic, also with kernel support. I remember
how it would kill any httpd server in seconds, even from one slave, with
up to 20 possible and thousands of ssm's running on each cpu.
Unfortunately it's obsolete now (qa and development, but the code and qbol
language support are still inside the new product for free :). I may try to
find some performance data tomorrow if it's of interest to you, of course...

Igor.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
Mike Smith  [EMAIL PROTECTED] wrote:
 
 It's not necessarily caused by interrupt latency.  Here's the assumption 
 that's being made.
[...]

Thanks for the superb explanation!  I appreciate it.

My apologies for never getting the timecounter paper finished and
published, I should really do that...

That's the global variable named timecounter, right?  I did notice
one potential problem: that variable is not declared volatile.  So
in this part ...

This may be a problem, I have yet to see GCC make different code for
that but I should probably have committed the volatile anyway.

I also noticed this in tco_forward():

tco = timecounter;
tc = sync_other_counter();
   [...]
if (tco->tc_poll_pps)

But sync_other_counter() loads its own copy of timecounter,
and there's no guarantee it hasn't changed from the value that
tco_forward() saved in its local variable.  I'm not sure yet if
that's a potential problem.  It could be corrected by passing tco as
an argument to sync_other_counter.  I'll try that too.

This code is actually correct, the tc_poll_pps needs to be done on
the old timecounter, because that would be the reference for any
captured hardware timestamps, if I did it on the new timecounter I
might get negative deltas which would complicate things.  Also the
new timecounter may have a changed frequency/offset (tickadj/ntpd
and all that).

 There are a couple of possible problems with this mechanism.
 
 One is that the ring catches up with your saved copy of the
 current pointer, ie. inbetween fetching the pointer and reading the
 timecounter contents, the next pointer passes over you again in such
 a fashion that you get garbage out of the structure.

This is unlikely.

 Another is that there is a race between multiple updaters of the
 timecounter; if two parties are both updating the next timecounter
 along with another party trying to get the current time, this could
 cause corruption.

I have worries in this respect with SMPng, but have yet to see a
corrupted timecounter ring.

There is one more failure mode which you have overlooked:  The individual
timecounters maintain a binary counter of a certain width, if interrupt
latency gets too bad, this may overflow.

This is a non-issue for the TSC, which is 64bit wide in hardware.

It should also be a non-issue for the PIIX which is at least 24 bits
in hardware.

But the i8254 is a piece of shit in this context, and due to
circumstances (apm being enabled) most machines end up using the
i8254 by default.

My (and I believe Bruce's) diagnosis so far is that most problems
come from the i8254 timecounter.


I made a commit recently which made the core code more robust against
bad interrupt jitter/latency.  Previously it could return timestamps
with too many microseconds or nanoseconds, because it only tried to
roll over into seconds once.  Now it while()'s over that rollover.
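
To illustrate the rollover change described above, here is a minimal
userland sketch.  It is not the committed kernel code: the struct and the
function names are made up for illustration.  The point is only that a
single "if" carries at most one second out of the microsecond field, while
a "while" keeps carrying until the field is back in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a timeval; not the kernel's structure. */
struct tv_sketch {
	int64_t sec;
	int64_t usec;
};

/* Behaviour as described for the old code: carry at most one second. */
void
normalize_once(struct tv_sketch *tv)
{
	assert(tv->usec >= 0);
	if (tv->usec >= 1000000) {
		tv->usec -= 1000000;
		tv->sec++;
	}
}

/* Behaviour as described for the new code: carry until usec is in range. */
void
normalize_loop(struct tv_sketch *tv)
{
	assert(tv->usec >= 0);
	while (tv->usec >= 1000000) {
		tv->usec -= 1000000;
		tv->sec++;
	}
}
```

Starting from sec = 1991 and usec = 8347381, normalize_once() leaves
1992 seconds and 7347381 microseconds (an out-of-range field), whereas
normalize_loop() yields 1999 seconds and 347381 microseconds.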

Hope this helps...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
Poul-Henning Kamp [EMAIL PROTECTED] writes:
: But the i8254 is a piece of shit in this context, and due to
: circumstances (apm being enabled) most machines end up using the
: i8254 by default.
: 
: My (and I believe Bruce's) diagnosis so far is that most problems
: come from the i8254 timecounter.

We measured pps interrupts with the i8254 timecounter in a fast
interrupt handler via the parallel port (yes, we hacked it to give us
a fast interrupt).  We found lots of outliers on the order of a few
milliseconds in the data that we had to discard because they were
obviously bogus.  We don't know if this is because of interrupt
latency or because of bugs in the 8254 timecounter code/hardware.  At
the time, it wasn't important enough to do a detailed numerology on to
see if more data couldn't be mined from it or not.  And the data that
we saw the outliers in was somewhat processed from the original
data...

Warner




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], M. Warner Losh writes:
In message: [EMAIL PROTECTED]
Poul-Henning Kamp [EMAIL PROTECTED] writes:
: But the i8254 is a piece of shit in this context, and due to
: circumstances (apm being enabled) most machines end up using the
: i8254 by default.
: 
: My (and I believe Bruce's) diagnosis so far is that most problems
: come from the i8254 timecounter.

We measured pps interrupts with the i8254 timecounter in a fast
interrupt handler via the parallel port (yes, we hacked it to give us
a fast interrupt).  We found lots of outliers on the order of a few
milliseconds in the data that we had to discard because they were
obviously bogus.  We don't know if this is because of interrupt
latency or because of bugs in the 8254 timecounter code/hardware.  At
the time, it wasn't important enough to do a detailed numerology on to
see if more data couldn't be mined from it or not.  And the data that
we saw the outliers in was somewhat processed from the original
data...

I have not tried to measure the i8254 against my hardware solution,
but by now I believe that certain bogus chipsets may have bummed
the 'latch' command or more than that maybe.

Anyway, some, but not all of the i8254 issues could be eased up a
bit if we lost the pcaudio crap and used the RTC's 128 Hz signal
for Hz and let the i8254 run at a 65536 count all the time.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
John Baldwin  [EMAIL PROTECTED] wrote:
 
  like, If X is never locked out for longer than Y, this problem
  cannot happen.  I'm looking for definitions of X and Y.  X might be
  hardclock() or softclock() or non-interrupt kernel processing.  Y
  would be some measure of time, probably a function of HZ and/or the
  timecounter frequency.
 
 X is hardclock I think, since hardclock() calls tc_windup().

That makes sense, but on the other hand hardclock seems unlikely to be
delayed by much.  The only thing that can block hardclock is another
hardclock, an splclock, or an splhigh.  And, maybe, splstatclock.  I'm
talking about -stable here, which is where I'm doing my experiments.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 That's the global variable named timecounter, right?  I did notice
 one potential problem: that variable is not declared volatile.  So
 in this part ...
 
 This may be a problem, I have yet to see GCC make different code for
 that but I should probably have committed the volatile anyway.

It should be committed, but it is not causing the problem in this
case.  I changed it and then compared MD5s of the object files.  The
only changes that resulted were unimportant.

 I also noticed this in tco_forward():
 
 tco = timecounter;
 tc = sync_other_counter();
  [...]
 if (tco->tc_poll_pps)
 
 But sync_other_counter() loads its own copy of timecounter,
 and there's no guarantee it hasn't changed from the value that
 tco_forward() saved in its local variable.  I'm not sure yet if
 that's a potential problem.  It could be corrected by passing tco as
 an argument to sync_other_counter.  I'll try that too.
 
 This code is actually correct, the tc_poll_pps needs to be done on
 the old timecounter, because that would be the reference for any
 captured hardware timestamps, if I did it on the new timecounter I
 might get negative deltas which would complicate things.  Also the
 new timecounter may have a changed frequency/offset (tickadj/ntpd
 and all that).

I don't think I follow your reasoning here.  If the call to
sync_other_counter were inlined, we'd have something like this:

tco = timecounter;
tco_in_sync_other_counter = timecounter;
[...]
if (tco->tc_poll_pps)

Obviously tco and tco_in_sync_other_counter will have the same value
almost all of the time, so the code can't be relying on them being
different.

Anyway, I realize now that this also isn't the problem, because
tco_forward is only ever called at splclock.  It can't be
interrupted or re-entered, at least not on the uniprocessor -stable
systems I'm looking at.

 There is one more failure mode which you have overlooked:  The individual
 timecounters maintain a binary counter of a certain width, if interrupt
 latency gets too bad, this may overflow.
 
 This is a non-issue for the TSC, which is 64bit wide in hardware.

Many of the systems where I see this problem are using the TSC as
the timecounter.  They don't have APM in the kernel, and they aren't
running ntpd.  I.e., it's not only the i8254 that's the problem.  The
fastest of these systems is a 1.13 GHz PIII, and it would take the 32
bits of the TSC which are actually used 3.8 seconds to wrap around.

 Hope this helps...

Yep, thanks.  I have some ideas of other things to try.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 In article [EMAIL PROTECTED],
 John Baldwin  [EMAIL PROTECTED] wrote:
  
   like, If X is never locked out for longer than Y, this problem
   cannot happen.  I'm looking for definitions of X and Y.  X might be
   hardclock() or softclock() or non-interrupt kernel processing.  Y
   would be some measure of time, probably a function of HZ and/or the
   timecounter frequency.
  
  X is hardclock I think, since hardclock() calls tc_windup().
 
 That makes sense, but on the other hand hardclock seems unlikely to be
 delayed by much.  The only thing that can block hardclock is another
 hardclock, an splclock, or an splhigh.  And, maybe, splstatclock.  I'm
 talking about -stable here, which is where I'm doing my experiments.
 
 Try swapping so you use the RTC for hardclock & statclock.
 
 Let the i8254 run with 65536 divisor and do only timecounter service.
 
 That would be a very interesting experiment.

Agreed.  But in the cases I'm worrying about right now, the
timecounter is the TSC.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:

Agreed.  But in the cases I'm worrying about right now, the
timecounter is the TSC.

Now, *that* is very interesting, how reproducible is it ?

Can you try to MFC rev 1.111 and see if that changes anything ?


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 
 Agreed.  But in the cases I'm worrying about right now, the
 timecounter is the TSC.
 
 Now, *that* is very interesting, how reproducible is it ?

I can reproduce it pretty easily.  This is from a PIII/550:

web1# uptime
 5:23PM  up 37 mins, 2 users, load averages: 0.08, 0.05, 0.01
web1# dmesg | grep microuptime
microuptime() went backwards (1992.7347381 -> 1991.587620)
microuptime() went backwards (1992.7347381 -> 1991.620385)
microuptime() went backwards (1992.7347381 -> 1991.621582)
microuptime() went backwards (2016.7063298 -> 2015.272466)
microuptime() went backwards (2286.7346482 -> 2285.587587)
microuptime() went backwards (2286.7346482 -> 2285.593646)
microuptime() went backwards (2286.7346482 -> 2285.595103)
microuptime() went backwards (2286.7346482 -> 2285.617023)

Now, two things are unusual about this system.  First, I have it
pumping packets from inside the kernel (with a special netgraph node)
at a rate which is keeping the system about 95% busy processing
interrupts.  (That's about 330K packets/second.)  That definitely
makes it happen more often.  But I also see it occasionally when the
system is sitting idle.  Second, I have HZ set to 1.  I think that
also contributes to the problem.  It shouldn't, though.  Not even a
PII/400 breathes hard at HZ=1.

On another system which is mostly idle, I see it about 4 times a
day.  That system also has HZ set to 1.  It's a PIII/1.13 GHz:

microuptime() went backwards (200671.582473 -> 200671.582472)
microuptime() went backwards (222622.358518 -> 222622.358517)
microuptime() went backwards (228934.897839 -> 228934.897838)
microuptime() went backwards (246166.386301 -> 246166.386300)
microuptime() went backwards (263484.488575 -> 263484.488574)

Notice the difference on this system that's fairly idle.  On the
other system, it was the seconds going backward.  Here it's the
microseconds.  These may be two entirely different problems.

 Can you try to MFC rev 1.111 and see if that changes anything ?

Sure.  I'll let you know.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 
 Can you try to MFC rev 1.111 and see if that changes anything ?

That produced some interesting results.  I am still testing under
very heavy network interrupt load.  With the change from 1.111, I
still get the microuptime messages about as often.  But look how
much larger the reported backwards jumps are:

microuptime() went backwards (896.225603 -> 888.463636)
microuptime() went backwards (896.225603 -> 888.494440)
microuptime() went backwards (896.225603 -> 888.500875)
microuptime() went backwards (1184.392277 -> 1176.603001)
microuptime() went backwards (1184.392277 -> 1176.603749)

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
John Polstra  [EMAIL PROTECTED] wrote:
 In article [EMAIL PROTECTED],
 Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
  In message [EMAIL PROTECTED], John Polstra writes:
  
  Can you try to MFC rev 1.111 and see if that changes anything ?
 
 That produced some interesting results.  I am still testing under
 very heavy network interrupt load.  With the change from 1.111, I
 still get the microuptime messages about as often.  But look how
 much larger the reported backwards jumps are:
 
 microuptime() went backwards (896.225603 -> 888.463636)
 microuptime() went backwards (896.225603 -> 888.494440)
 microuptime() went backwards (896.225603 -> 888.500875)
 microuptime() went backwards (1184.392277 -> 1176.603001)
 microuptime() went backwards (1184.392277 -> 1176.603749)

Another interesting thing is that the jumps are always 7.7x seconds
back -- usually 7.79 seconds.  This is even true with more data points
from two different machines.
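
The size of each jump can be checked by subtracting the two timestamps in
a report.  The following is a small sketch for doing that in integer
microseconds; the function names are hypothetical and it only works for
reports whose microsecond fields are valid (six digits).

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds represented by a "sec.usec" pair taken from a report. */
int64_t
stamp_usec(int64_t sec, int64_t usec)
{
	assert(usec >= 0 && usec < 1000000);
	return sec * 1000000 + usec;
}

/* Size of a reported backwards jump, in microseconds. */
int64_t
jump_usec(int64_t sec_before, int64_t usec_before,
    int64_t sec_after, int64_t usec_after)
{
	return stamp_usec(sec_before, usec_before) -
	    stamp_usec(sec_after, usec_after);
}
```

Applied to the five reports quoted above, this gives 7.761967, 7.731163,
7.724728, 7.789276 and 7.788528 seconds.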

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 
 Can you try to MFC rev 1.111 and see if that changes anything ?

That produced some interesting results.  I am still testing under
very heavy network interrupt load.  With the change from 1.111, I
still get the microuptime messages about as often.  But look how
much larger the reported backwards jumps are:

microuptime() went backwards (896.225603 -> 888.463636)
microuptime() went backwards (896.225603 -> 888.494440)
microuptime() went backwards (896.225603 -> 888.500875)
microuptime() went backwards (1184.392277 -> 1176.603001)
microuptime() went backwards (1184.392277 -> 1176.603749)

(Ok, I'll MFC 1.111)

Sanity-check: this is NOT a multi-CPU system, right ?

We now have three options left:
	1. hardclock interrupt starvation
	2. scheduling related anomaly wrt the use of microuptime().
	3. arithmetic overflow because the call to microuptime() gets
	   interrupted for too long.

Looking for the first, add this to hardclock:

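	static u_int64_t lasttsc;	/* TSC value at the previous call */
	u_int64_t delta, now;

	now = rdtsc();
	delta = now - lasttsc;
	lasttsc = now;
	/* Complain if more than 1.5 tick periods passed since the last call. */
	if (delta > (tsc_freq * 3) / (2 * hz)) {
		printf("delta = %llu\n", delta);
	}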

In theory, you get no printfs.  If you do, hopefully they have small
delta values.
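
For reference, the threshold in that check works out to one and a half
hardclock periods expressed in TSC cycles.  A userland sketch of the same
arithmetic follows; the function names are made up, and the hz value of
100 used in the example is an assumption, not a setting taken from this
thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 1.5 hardclock periods in TSC cycles.  tsc_freq is the TSC frequency
 * in Hz and hz is the hardclock rate.
 */
uint64_t
late_threshold(uint64_t tsc_freq, uint64_t hz)
{
	assert(hz > 0);
	return (tsc_freq * 3) / (2 * hz);
}

/* Nonzero if delta TSC cycles between two hardclock calls is too many. */
int
hardclock_was_late(uint64_t delta, uint64_t tsc_freq, uint64_t hz)
{
	return delta > late_threshold(tsc_freq, hz);
}
```

With a 550 MHz TSC and hz = 100, one tick is 5,500,000 cycles and the
threshold is 8,250,000 cycles, so a single missed tick (11,000,000
cycles) would already trigger the printf.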

Looking for the third:

Add splhigh() protection around the stuff in microuptime();

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
In article [EMAIL PROTECTED],
John Polstra  [EMAIL PROTECTED] wrote:
 In article [EMAIL PROTECTED],
 Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
  In message [EMAIL PROTECTED], John Polstra writes:
  
  Can you try to MFC rev 1.111 and see if that changes anything ?
 
 That produced some interesting results.  I am still testing under
 very heavy network interrupt load.  With the change from 1.111, I
 still get the microuptime messages about as often.  But look how
 much larger the reported backwards jumps are:
 
 microuptime() went backwards (896.225603 -> 888.463636)
 microuptime() went backwards (896.225603 -> 888.494440)
 microuptime() went backwards (896.225603 -> 888.500875)
 microuptime() went backwards (1184.392277 -> 1176.603001)
 microuptime() went backwards (1184.392277 -> 1176.603749)

Another interesting thing is that the jumps are always 7.7x seconds
back -- usually 7.79 seconds.  This is even true with more data points
from two different machines.

Yes, I noticed, but didn't dare draw conclusions based on two data points.

This points to an arithmetic overflow (ie: point 3 in my previous email)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 In article [EMAIL PROTECTED],
 John Polstra  [EMAIL PROTECTED] wrote:
 
 Another interesting thing is that the jumps are always 7.7x seconds
 back -- usually 7.79 seconds.  This is even true with more data points
 from two different machines.
 
 Yes, I noticed, but didn't dare draw conclusions based on two data points.

It's pretty consistent -- always 7.7something.

 This points to an arithmetic overflow (ie: point 3 in my previous email)

Yes, I think you're onto something now.  It's a 550 MHz. machine, so
the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
seconds. :-)
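
The wraparound figure is easy to reproduce.  Below is a minimal sketch of
the arithmetic; the function name is hypothetical and it assumes the
counter ticks at exactly the nominal CPU frequency.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until the low `bits` bits of a counter at freq_hz wrap around. */
double
wrap_seconds(double freq_hz, unsigned bits)
{
	assert(bits < 64 && freq_hz > 0);
	return (double)((uint64_t)1 << bits) / freq_hz;
}
```

wrap_seconds(550e6, 32) comes out at about 7.81 seconds, matching the
figure above, and wrap_seconds(1.13e9, 32) at about 3.80 seconds, which
matches the 3.8 seconds quoted elsewhere in this thread for the 1.13 GHz
machine.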

I'll try the things you suggested in your other mail.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Nate Williams

  Can you try to MFC rev 1.111 and see if that changes anything ?
 
 That produced some interesting results.  I am still testing under
 very heavy network interrupt load.  With the change from 1.111, I
 still get the microuptime messages about as often.  But look how
 much larger the reported backwards jumps are:
 
 microuptime() went backwards (896.225603 -> 888.463636)
 microuptime() went backwards (896.225603 -> 888.494440)
 microuptime() went backwards (896.225603 -> 888.500875)
 microuptime() went backwards (1184.392277 -> 1176.603001)
 microuptime() went backwards (1184.392277 -> 1176.603749)
 
 (Ok, I'll MFC 1.111)

Huh?  It appears that 1.111 makes things worse, not better (larger
jumps).

Can you explain why you think this is a good thing?  It seems like
a bad thing to me.

 Sanity-check: this is NOT a multi-CPU system, right ?

As stated before, both are  1Ghz single-CPU systems running -stable,
although I'm sure John is capable of answering this on his own. :)

 We now have three options left:
   hardclock interrupt starvation 

This is Bruce's hypothesis, right?

   scheduling related anomaly wrt to the use of microuptime().
   arithmetic overflow because the call to microuptime() gets
   interrupted for too long.

'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
a long interrupt blockage?  (I'm trying to understand here.)



Nate




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 
 Sanity-check: this is NOT a multi-CPU system, right ?

Right.  These are all single-CPU systems with non-SMP -stable
kernels.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
In article [EMAIL PROTECTED],

 This may be a problem, I have yet to see GCC make different code for
 that but I should probably have committed the volatile anyway.

It should be committed, but it is not causing the problem in this
case.  I changed it and then compared MD5s of the object files.  The
only changes that resulted were unimportant.

Will do.

 I also noticed this in tco_forward():
 
 tco = timecounter;
 tc = sync_other_counter();
 [...]
 if (tco->tc_poll_pps)
 
 This code is actually correct, the tc_poll_pps needs to be done on
 the old timecounter, because that would be the reference for any
 captured hardware timestamps, if I did it on the new timecounter I
 might get negative deltas which would complicate things.  Also the
 new timecounter may have a changed frequency/offset (tickadj/ntpd
 and all that).

I don't think I follow your reasoning here.  If the call to
sync_other_counter were inlined, we'd have something like this:

tco = timecounter;
tco_in_sync_other_counter = timecounter;
[...]
if (tco->tc_poll_pps)

Obviously tco and tco_in_sync_other_counter will have the same value
almost all of the time, so the code can't be relying on them being
different.

It doesn't rely on them being different; they are used for two different
purposes.  sync_other_counter() sets up the next counter in the ring
to be used for timekeeping, while tco/tc_poll_pps is used to calculate
any hardware-captured timestamps which occurred _before_ now.

sync_other_counter() returns the next timecounter after fiddling it
but we cannot use that for tc_poll_pps() purposes, we need the old
(or if you will: still present one) for that.

Trust me, this is not your problem because this is used only with
the xrpu.c timecounter code anyway and you don't have the hardware
for that :-)

Many of the systems where I see this problem are using the TSC as
the timecounter.  They don't have APM in the kernel, and they aren't
running ntpd.  I.e., it's not only the i8254 that's the problem.  The
fastest of these systems is a 1.13 GHz PIII, and it would take the 32
bits of the TSC which are actually used 3.8 seconds to wrap around.

It would be a problem if the kernel is punched hard enough that code
doesn't complete in the time it takes for the used timecounter to wrap,
ie:  if you can have something like:

call microuptime()
tc = timecounter;
(interrupt and do other stuff for several seconds)
... overflow in arithmetic 

But you would have to pummel your kernel pretty bad for that.  On the
other hand, between a 1 Hz and the network load you describe, that
may be exactly what is happening...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Nate Williams writes:
  Can you try to MFC rev 1.111 and see if that changes anything ?
 
 That produced some interesting results.  I am still testing under
 very heavy network interrupt load.  With the change from 1.111, I
 still get the microuptime messages about as often.  But look how
 much larger the reported backwards jumps are:
 
 microuptime() went backwards (896.225603 -> 888.463636)
 microuptime() went backwards (896.225603 -> 888.494440)
 microuptime() went backwards (896.225603 -> 888.500875)
 microuptime() went backwards (1184.392277 -> 1176.603001)
 microuptime() went backwards (1184.392277 -> 1176.603749)
 
 (Ok, I'll MFC 1.111)

Huh?  It appears that 1.111 makes things worse, not better (larger
jumps).

No, 1.111 makes the jumps get reported more correctly, I think.  The
changes may save your meal in cases less bad than yours, but in yours they
just make sure that we don't get an invalid number of microseconds in
a timeval, and consequently we get more honest output.

 We now have three options left:
  hardclock interrupt starvation 

This is Bruce's hypothesis, right?

Also mine for that matter.

  scheduling related anomaly wrt to the use of microuptime().
  arithmetic overflow because the call to microuptime() gets
  interrupted for too long.

'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
a long interrupt blockage?  (I'm trying to understand here.)

See my previous email, I just explained it there.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 In article [EMAIL PROTECTED],
 John Polstra  [EMAIL PROTECTED] wrote:
 
 Another interesting thing is that the jumps are always 7.7x seconds
 back -- usually 7.79 seconds.  This is even true with more data points
 from two different machines.
 
 Yes, I noticed, but didn't dare draw conclusions based on two data points.

It's pretty consistent -- always 7.7something.

 This points to an arithmetic overflow (ie: point 3 in my previous email)

Yes, I think you're onto something now.  It's a 550 MHz. machine, so
the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
seconds. :-)

In that case I'm almost willing to put an AnchorSteam on microuptime()
being interrupted for longer than is good, in which case the splhigh()
should cure it.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 Yes, I think you're onto something now.  It's a 550 MHz. machine, so
 the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
 seconds. :-)
 
 In that case I'm almost willing to put an AnchorSteam on microuptime()
 being interrupted for longer than is good, in which case the splhigh()
 should cure it.

I'm testing that now.  But for how long would microuptime have to
be interrupted to make this happen?  Surely not 7.81 seconds!  On
this same machine I have a curses application running which is
updating the screen once a second.  It never misses a beat, and
userland is very responsive.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Nate Williams writes:

How are issues (1) and (3) above different?

ps. I'm just trying to understand, and am *NOT* trying to start a
flame-war. :) :) :)

If the starvation happens to hardclock() or rather tc_windup() the effect
will be cumulative and show up in permanent jumps in the output of date
for instance.  In stable hardclock() is spl-protected so this would be
_really_ bad news.

If the starvation happens in any of {micro|nano}[up]time() (but not the
get variants!) then it will result in a single spurious reading.

The premise for avoiding locking in the access functions to timecounters
was precisely that we could trust them to not be pre-empted for long
enough for the hardware to roll over.  If this is not the case we lose,
because the overflow in the hardware counter means that the timecounter
we calculate from is not valid for the delta we get from the hardware.
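
A minimal sketch of why the rollover matters, assuming a 32-bit hardware counter and a reader that adds a wrapped delta to a saved base.  The struct and function names are hypothetical, not the kernel's.  The subtraction is modulo 2^32, so a reader that was delayed by a full counter period computes the same delta as one that was not delayed at all:

```c
#include <stdint.h>

/* Hypothetical snapshot of one timecounter generation. */
struct tc_snapshot {
	uint32_t offset_count;	/* hardware count when the snapshot was taken */
	uint64_t base_ticks;	/* ticks accumulated up to that moment */
};

/* Lock-free style read: base plus however far the hardware has advanced. */
static uint64_t
read_ticks(const struct tc_snapshot *tc, uint32_t hw_now)
{
	uint32_t delta = hw_now - tc->offset_count;	/* wraps modulo 2^32 */

	return tc->base_ticks + delta;
}
```

With offset_count = 1000 and base_ticks = 5000, a hardware reading of 1100 yields 5100 whether 100 ticks or 2^32 + 100 ticks have really elapsed; the reader cannot tell the two apart.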

I'm not sure this answers your question, if not it is not bad will, just
me not understanding the question :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:
In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 Yes, I think you're onto something now.  It's a 550 MHz. machine, so
 the TSC increments every 1.82 nsec.  And 1.82 nsec * 2^32 is 7.81
 seconds. :-)
 
 In that case I'm almost willing to put an AnchorSteam on microuptime()
 being interrupted for more than is good, in which case the splhigh() should
 cure it.

I'm testing that now.  But for how long would microuptime have to
be interrupted to make this happen?  Surely not 7.81 seconds!  On
this same machine I have a curses application running which is
updating the screen once a second.  It never misses a beat, and
userland is very responsive.

Well, that is what I don't understand yet either :-)

The fact that the delta is not exactly 2^32 * cpu clock is probably
blindingly obviously indicative of why, but I haven't solved the
puzzle yet...

Since you are running with HZ=10000, NTIMECOUNTER should probably
be considerably increased.  (Actually it might be a good idea to
simply set NTIMECOUNTER == hz ... hmmm...)

We could be seeing a situation where a process is preempted in
microuptime() and the timecounter ring being recycled *and*
the hardware counter overflowing before it completes, that might
give this problem.

You didn't say if you ran with standard NTIMECOUNTER right now,
but 5 would be awfully short time at HZ=10000: 500 usec...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread Nate Williams

 How are issues (1) and (3) above different?
 
 ps. I'm just trying to understand, and am *NOT* trying to start a
 flame-war. :) :) :)
 
 If the starvation happens to hardclock() or rather tc_windup() the effect
 will be cumulative and show up in permanent jumps in the output of date
 for instance.  In stable hardclock() is spl-protected so this would be
 _really_ bad news.
 
 If the starvation happens in any of {micro|nano}[up]time() (but not the
 get variants!) then it will result in a single spurious reading.

Ok, the bulb is starting to grow from dim to bright. :)

 The premise for avoiding locking in the access functions to timecounters
 was precisely that we could trust them to not be pre-empted for long
 enough for the hardware to roll over.  If this is not the case we lose,
 because the overflow in the hardware counter means that the timecounter
 we calculate from is not valid for the delta we get from the hardware.
 
 I'm not sure this answers your question, if not it is not bad will, just
 me not understanding the question :-)

*grin*

I think I understand the problem.  Let me try to rephrase to make sure.

1) If we have an interrupt lockout (*NOT* due to time-counting code),
   then we'd have a problem since the hardclock would never get run.

2) If however, the locking done to protect the timecounter code happens
   to make getting/setting the timecounter take too long, we'd get
   similar results, but for *completely* different reasons.

Let me be more precise.

(1)
  cli();
   /* Take a really long time doing something */
  sti();

(2)
  /* Do something */
  gettime();  /* Takes a really long time to complete */

The first is harder to track down/fix, simply because you don't know
*who* the offender is.  The latter is essentially the same problem to
fix, but *may* be easier to fix since the offending code *IS* the
timecounter code.

Am I even close to understanding?



Nate




Re: A question about timecounters

2002-02-05 Thread Nate Williams

   Can you try to MFC rev 1.111 and see if that changes anything ?
  
  That produced some interesting results.  I am still testing under
  very heavy network interrupt load.  With the change from 1.111, I
  still get the microuptime messages about as often.  But look how
  much larger the reported backwards jumps are:
  
  microuptime() went backwards (896.225603 -> 888.463636)
  microuptime() went backwards (896.225603 -> 888.494440)
  microuptime() went backwards (896.225603 -> 888.500875)
  microuptime() went backwards (1184.392277 -> 1176.603001)
  microuptime() went backwards (1184.392277 -> 1176.603749)
  
  (Ok, I'll MFC 1.111)
 
 Huh?  It appears that 1.111 makes things worse, not better (larger
 jumps).
 
 No, 1.111 makes the jumps report more correctly I think. 

Now, if that ain't a glowing reason to MFC it, I don't know one (I
think). :) :)

 They will maybe save your meal in less bad cases than yours, but in
 yours they just make sure that we don't get invalid number of
 microseconds in a timeval, and consequently we get more honest output.

How can you verify that this is the case?

  We now have three options left:
 hardclock interrupt starvation 
 
 This is Bruce's hypothesis, right?
 
 Also mine for that matter.
 
 scheduling related anomaly wrt to the use of microuptime().
 arithmetic overflow because the call to microuptime() gets
 interrupted for too long.
 
 'Interrupted for too long'.  Do you mean 'not interrupted enough', aka
 a long interrupt blockage?  (I'm trying to understand here.)
 
 See my previous email, I just explained it there.

I still didn't understand, hence the reason for the question.  (The
explanation was in the email I originally responded to).

I understand the 'overflow' issue, but it would seem to my naive
thinking that it would occur only when interrupts are blocked for a
period of time, which is the same as hardclock interrupt starvation in
my mind.

How are issues (1) and (3) above different?



Nate

ps. I'm just trying to understand, and am *NOT* trying to start a
flame-war. :) :) :)




Re: A question about timecounters

2002-02-05 Thread John Polstra

OK, adding the splhigh() around the body of microuptime seems to have
solved the problem.  After 45 minutes of running the same test as
before, I haven't gotten a single message.  If I get one later, I'll
let you know.

 I'm testing that now.  But for how long would microuptime have to
 be interrupted to make this happen?  Surely not 7.81 seconds!  On
 this same machine I have a curses application running which is
 updating the screen once a second.  It never misses a beat, and
 userland is very responsive.
 
 Well, that is what I don't understand yet either :-)
 
 The fact that the delta is not exactly 2^32 * cpu clock is probably
 blindingly obviously indicative of why, but I haven't solved the
 puzzle yet...

I've been staring at the code, but I haven't gotten it yet either.

 Since you are running with HZ=10000, NTIMECOUNTER should probably
 be considerably increased.  (Actually it might be a good idea to
 simply set NTIMECOUNTER == hz ... hmmm...)

I don't follow that.  As I read the code, the current timecounter
is only advanced every second -- not every 1/HZ seconds.  Why should
more of them be needed when HZ is large?

 You didn't say if you ran with standard NTIMECOUNTER right now,
 but 5 would be awfully short time at HZ=10000: 500 usec...

Well, microseconds aren't what they used to be ... :-) But isn't it
true that the current timecounter only advances every second?  I think
I have 5 seconds, not 5/HZ seconds.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 I don't follow that.  As I read the code, the current timecounter
 is only advanced every second -- not every 1/HZ seconds.  Why should
 more of them be needed when HZ is large?
 
 No, only if you have set tco_method to one, if tco_method is zero (default)
 we update the timecounter every HZ.

Oh, you're right.  Sorry, I misread the code.  My tco_method is 0,
so it will do the update every HZ.

 Could you try this combination:
 
   NTIMECOUNTER = HZ  (or even 5 * HZ)
   tco_method = 0
   no splhigh protection for microuptime() ?

Yep, I'll try that.

 If I do
   extern volatile struct timecounter *timecounter;
 
   microtime()
   {
   struct timecounter *tc;
 
   tc = timecounter;
 
 The compiler complains about losing the volatile thing.
 
 How do I tell it that it is the contents of the timecounter pointer which
 is volatile, but not what it points at ?

You want:

extern struct timecounter *volatile timecounter;

and also change the definition of the variable in the .c file.  You
won't get any warnings from that.
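
A small self-contained sketch of that declaration in use.  The struct here is a one-field stand-in rather than the real struct timecounter, and read_offset() is a hypothetical reader.  Because only the pointer is volatile-qualified, copying it into a plain local pointer discards no qualifier, and the assignment is a single read of the global:

```c
struct timecounter {		/* stand-in with one illustrative field */
	unsigned tc_offset_count;
};

static struct timecounter ring[2] = { { 10 }, { 20 } };

/* The pointer itself is volatile; the structure it points at is not. */
static struct timecounter *volatile timecounter = &ring[0];

static unsigned
read_offset(void)
{
	struct timecounter *tc;

	tc = timecounter;	/* one read of the global, no cast needed */
	return tc->tc_offset_count;
}
```

With the qualifier on the other side of the * (volatile struct timecounter *timecounter), the same assignment would draw the "discards qualifier" complaint described above.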

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 In message [EMAIL PROTECTED], John Polstra writes:
 Could you try this combination:
 
   NTIMECOUNTER = HZ  (or even 5 * HZ)
   tco_method = 0
   no splhigh protection for microuptime() ?

After 25 minutes testing that with NTIMECOUNTER=50000, I haven't
gotten any microuptime messages.  So it appears that my problem was
just that the current timecounter wrapped all the way around the ring
while microuptime was interrupted, due to the high HZ value and the
heavy interrupt load.  I'm sorry I didn't try this sooner, when you
suggested it.

After lunch I'll try NTIMECOUNTER=HZ (10000 in my case).  That sounds
like a nice default value to me.

It would be interesting to see whether this same fix also works in
-current.  I'm not sure yet if I can test that here or not.  I haven't
tried building my netgraph module under -current yet.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], John Polstra writes:

After 25 minutes testing that with NTIMECOUNTER=50000, I haven't
gotten any microuptime messages.  So it appears that my problem was
just that the current timecounter wrapped all the way around the ring
while microuptime was interrupted, due to the high HZ value and the
heavy interrupt load.  I'm sorry I didn't try this sooner, when you
suggested it.

After lunch I'll try NTIMECOUNTER=HZ (10000 in my case).  That sounds
like a nice default value to me.

Yup, if we preempt things for more than a second we have other problems
too I think.

My fault for not letting it depend on HZ in the first place.

It would be interesting to see whether this same fix also works in
-current.  I'm not sure yet if I can test that here or not.  I haven't
tried building my netgraph module under -current yet.

Well, either way I will commit the volatile and this NTIMECOUNTER to
-current now, it's certainly better than what is there now.

Thanks for the help, I owe you one at BSDcon!

Poul-Henning

Ohh, and btw: do I need to say that I'm dying to know what the heck
you are doing with that box ?  :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Poul-Henning Kamp  [EMAIL PROTECTED] wrote:
 
 Well, either way I will commit the volatile and this NTIMECOUNTER to
 -current now, it's certainly better than what is there now.

Great, thanks.

 Thanks for the help, I owe you one at BSDcon!

I'll look forward to it!

 Ohh, and btw: do I need to say that I'm dying to know what the heck
 you are doing with that box ?  :-)

Making the room a good bit warmer, that's for sure. :-)

First I should mention for the benefit of those listening in that it
isn't something I can release publicly, and it will probably never
be open source.  It's a simulated web client and web server, running
inside the kernel.  It's good for load-testing and performance-testing
many kinds of network devices.  With two 1-GHz PIII boxes (one acting
as the client and the other acting as the server) it can generate
around 5 (actually I think it's more than that) full web sessions
per second.  Also, you can dial in any rate you want, and it will
generate that rate very precisely.  Lots of fun!

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Bakul Shah

 Btw, regarding the volatile thing:
 
 If I do
   extern volatile struct timecounter *timecounter;
 
   microtime()
   {
   struct timecounter *tc;
 
   tc = timecounter;
 
 The compiler complains about losing the volatile thing.
 
 How do I tell it that it is the contents of the timecounter pointer which
 is volatile, but not what it points at ?  I don't want the tc pointer to
 be volatile because it obviously isn't.  Do I really need to cast it ?
 
   tc = (struct timecounter *)timecounter;

[I see that jdp has answered your question but] cdecl is your friend!

$ cdecl
Type `help' or `?' for help
cdecl> explain volatile struct timecounter *timecounter
declare timecounter as pointer to volatile struct timecounter
cdecl> declare timecounter as volatile pointer to struct timecounter
struct timecounter * volatile timecounter

-- bakul

PS: Chances are most people don't have cdecl any more.  You
can get it like this:

mkdir cdecl;cd cdecl
fetch ftp://gatekeeper.dec.com/pub/usenet/comp.sources.unix/volume14/cdecl2/part0{1,2}.Z
gzcat part01.Z | gunshar
gzcat part02.Z | gunshar
patch <<'EOF'
diff -ru ../cdecl-orig/cdecl.c ./cdecl.c
--- ../cdecl-orig/cdecl.c   Tue Feb  5 14:24:23 2002
+++ ./cdecl.c   Tue Feb  5 12:12:30 2002
@@ -57,6 +57,9 @@
 # include <stddef.h>
 # include <string.h>
 # include <stdarg.h>
+#ifdef BSD
+#include <errno.h>
+#endif
 #else
 # ifndef NOVARARGS
 #  include <varargs.h>
@@ -110,6 +113,9 @@
   void docast(char*, char*, char*, char*);
   void dodexplain(char*, char*, char*, char*);
   void docexplain(char*, char*, char*, char*);
+#ifdef __FreeBSD__
+#define setprogname _bad_bad_bad_FreeBSD
+#endif
   void setprogname(char *);
   int dotmpfile(int, char**), dofileargs(int, char**);
 #else
diff -ru ../cdecl-orig/makefile ./makefile
--- ../cdecl-orig/makefile  Tue Feb  5 14:24:19 2002
+++ ./makefile  Tue Feb  5 12:10:10 2002
@@ -13,7 +13,7 @@
 # add -Ddodebug	To compile in debugging trace statements.
 # add -Ddoyydebug  To compile in yacc trace statements.
 
-CFLAGS= -g -Ddodebug -Ddoyydebug
+CFLAGS= -g -Ddodebug -Ddoyydebug -DBSD
 CC= cc
 ALLFILES= makefile cdgram.y cdlex.l cdecl.c cdecl.1 testset testset++
 BIN= /usr/lubin
EOF
make
# as root:
make install BIN=/usr/local/bin

No idea if c++decl is valid any more!




Re: A question about timecounters

2002-02-05 Thread John Polstra

In article [EMAIL PROTECTED],
Bakul Shah  [EMAIL PROTECTED] wrote:

 [I see that jdp has answered your question but] cdecl is your friend!
 
 $ cdecl
 Type `help' or `?' for help
 cdecl> explain volatile struct timecounter *timecounter
 declare timecounter as pointer to volatile struct timecounter
 cdecl> declare timecounter as volatile pointer to struct timecounter
 struct timecounter * volatile timecounter

Is C a great language, or what? ;-)

The way I always remember it is that you read the declaration
inside-out: starting with the variable name and then heading toward
the outside while obeying the precedence rules.  When you hit a *,
you say "pointer to"; when you hit [], you say "array of"; and when
you hit () you say "function returning".  For example:

struct timecounter * volatile timecounter;
/* Timecounter is a volatile pointer to a struct timecounter. */

volatile struct timecounter *timecounter;
/* Timecounter is a pointer to a struct timecounter which is volatile. */

The reason for the awkward "which is" in that last one is just because
C lets you get sloppy with the ordering of the outermost keywords.
The pedantically correct way to declare a pointer to volatile struct
is like this:

struct timecounter volatile *timecounter;
/* Timecounter is a pointer to a volatile struct timecounter. */

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-05 Thread Kutulu

On Tue, Feb 05, 2002 at 02:42:38PM -0800, Bakul Shah wrote:
 
 PS: Chances are most people don't have cdecl any more.  You
 can get it like this:
 

You can also get it like this:

cd /usr/ports/devel/cdecl ; make install

which I just went and did.  Pretty helpful utility :)

--K





Re: A question about timecounters

2002-02-05 Thread Anthony Schneider

 PS: Chances are most people don't have cdecl any more.  You
 can get it like this:

cd /usr/ports/devel/cdecl && make install

:)

-Anthony.






Re: A question about timecounters

2002-02-05 Thread Bakul Shah

 Is C a great language, or what? ;-)

Nah, just mediocre even when it comes to obfuscation!
Have you played with unlambda?!

 The way I always remember it is that you read the declaration
 inside-out: starting with the variable name and then heading toward
 the outside while obeying the precedence rules.  When you hit a *,
 you say pointer to; when you hit [], you say array of; and when
 you hit () you say function returning.  For example:

I remember something about switching declaration reading
direction when you hit a bracket; but why bother once you
have cdecl?

cdecl> declare f as array of pointer to function returning pointer to function returning int
int (*(*f[])())()

It is not clear to me how to apply your rule.  It doesn't
matter though, it has gotten to the point where I can only
store ptrs to ptrs to information in my ever shrinking brain!
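
One way to see the inside-out rule at work on that declaration is to build the same type up with typedefs.  This is only a sketch: the typedef and function names are invented for illustration, and it uses (void) prototypes where cdecl printed empty parentheses.  Reading outward from f: [] binds tighter than *, so "array of"; then * gives "pointer to"; leaving the parentheses, () gives "function returning"; the next * gives "pointer to"; the outer () gives "function returning"; and finally int.

```c
/* Innermost result: a function returning int. */
typedef int inner_fn(void);

/* A function returning a pointer to inner_fn. */
typedef inner_fn *outer_fn(void);

static int forty_two(void) { return 42; }
static inner_fn *get_inner(void) { return forty_two; }

/* The raw spelling, as a declaration only (size given so the type is complete). */
static int (*(*f[1])(void))(void);

/* The typedef spelling defines the same object; the two types are compatible. */
static outer_fn *f[1] = { get_inner };
```

The expression f[0]()() then calls get_inner() through the array element and calls the function it returns, yielding the int.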

To the people who pointed out the cdecl port, I did look in
/usr/ports/devel but missed cdecl somehow.  Sigh... :-)

-- bakul




Re: A question about timecounters

2002-02-05 Thread Terry Lambert

John Polstra wrote:
 After 25 minutes testing that with NTIMECOUNTER=50000, I haven't
 gotten any microuptime messages.  So it appears that my problem was
 just that the current timecounter wrapped all the way around the ring
 while microuptime was interrupted, due to the high HZ value and the
 heavy interrupt load.  I'm sorry I didn't try this sooner, when you
 suggested it.
 
 After lunch I'll try NTIMECOUNTER=HZ (10000 in my case).  That sounds
 like a nice default value to me.

5 * HZ @ 10,000 HZ * sizeof(struct timecounter)
= 5 * 10000 * 23*4
= 4,600,000
= 4.5M in timecounter structures

Even with your NTIMECOUNTER=HZ (assuming it even works), that
is 1M of memory burned on timecounter structures to prevent
wrap around.

If the interrupts are happening fast enough that wrap is this
bad a problem, then I suggest another counter that divides
the interrupt frequency down to a (much smaller) update frequency
for the timecounter.
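
A rough sketch of what such a divider could look like.  Every name here is hypothetical and the divisor of 100 is an arbitrary choice for illustration; it is not how the kernel does it, only the shape of the idea: most clock interrupts just bump a small counter, and the timecounter ring is advanced on every Nth one.

```c
/* Hypothetical names throughout; a sketch of dividing HZ down. */
#define TICKS_PER_WINDUP 100	/* e.g. HZ=10000 gives 100 ring updates per second */

static unsigned tick_divider;
static unsigned long windups;	/* stands in for advancing the timecounter ring */

static void
clock_tick(void)
{
	if (++tick_divider < TICKS_PER_WINDUP)
		return;		/* most interrupts only bump the divider */
	tick_divider = 0;
	windups++;		/* here the real code would fill in the next timecounter */
}
```

At HZ=10000 this would advance the ring 100 times a second, so a ring of 5 entries would take 50 msec to recycle instead of 500 usec.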

Is having this many of these things really worthwhile?  I
can't see how off the top of my head, but I can imagine a
situation where it might be referenced with a higher than
wrap frequency... but the question is, what's the highest
access frequency?  That should limit the update frequency
to twice that, right, which would guarantee adequate update
resolution to satisfy that use?

If it is, could we maybe just count wraps, instead, and
add 7.7?

-- Terry




Re: A question about timecounters

2002-02-05 Thread M. Warner Losh

In message: [EMAIL PROTECTED]
John Polstra [EMAIL PROTECTED] writes:
: I'm testing that now.  But for how long would microuptime have to
: be interrupted to make this happen?  Surely not 7.81 seconds!  On
: this same machine I have a curses application running which is
: updating the screen once a second.  It never misses a beat, and
: userland is very responsive.

Silly hypothesis.  It isn't losing 7.81 seconds of time.  Rather it is
being interrupted at the wrong time and the wrap detection code works
badly when interrupted at that point.  So you aren't blocked for 7.81
seconds, but rather for 0.02 seconds since you are seeing the 7.79
jump.  You are basically catching the wrong edge of the phase of the
TSC.

Warner




A question about timecounters

2002-02-04 Thread John Polstra

I'm trying to understand the timecounter code, and in particular the
reason for the "microuptime went backwards" messages which I see on
just about every machine I have, whether running -stable or -current.
This problem is usually attributed to too much interrupt latency.  My
question is, how much latency is too much?  Which interrupt has to
be locked out for how long in order to see these messages?

John




Re: A question about timecounters

2002-02-04 Thread Dominic Marks

On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
 I'm trying to understand the timecounter code, and in particular the
 reason for the microuptime went backwards messages which I see on
 just about every machine I have, whether running -stable or -current.

I see them everywhere with -CURRENT, but not at all with -STABLE. This is
with two separate machines. Perhaps that may add clues.

 This problem is usually attributed to too much interrupt latency.  My
 question is, how much latency is too much?  Which interrupt has to
 be locked out for how long in order to see these messages?
 
 John
 

-- 
Dominic




Re: A question about timecounters

2002-02-04 Thread John Polstra

In article [EMAIL PROTECTED],
Dominic Marks  [EMAIL PROTECTED] wrote:
 On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
  I'm trying to understand the timecounter code, and in particular the
  reason for the microuptime went backwards messages which I see on
  just about every machine I have, whether running -stable or -current.
 
 I see them everywhere with -CURRENT, but not at all with -STABLE. This is
 with two separate machines. Perhaps that may add clues.

I'm looking for something less empirical than that.  When somebody
says this problem is caused by too much interrupt latency, I assume
they have a mental model of what is going wrong when this excessive
latency occurs.  Given that, it should be possible to make a statement
like, "If X is never locked out for longer than Y, this problem
cannot happen."  I'm looking for definitions of X and Y.  X might be
hardclock() or softclock() or non-interrupt kernel processing.  Y
would be some measure of time, probably a function of HZ and/or the
timecounter frequency.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-04 Thread Mike Smith


 In article [EMAIL PROTECTED],
 Dominic Marks  [EMAIL PROTECTED] wrote:
  On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
   I'm trying to understand the timecounter code, and in particular the
   reason for the microuptime went backwards messages which I see on
   just about every machine I have, whether running -stable or -current.
  
  I see them everywhere with -CURRENT, but not at all with -STABLE. This is
  with two separate machines. Perhaps that may add clues.
 
 I'm looking for something less empirical than that.  When somebody
 says this problem is caused by too much interrupt latency, I assume
 they have a mental model of what is going wrong when this excessive
 latency occurs.

It's not necessarily caused by interrupt latency.  Here's the assumption 
that's being made.

There is a ring of timecounter structures, of some size.  In testing,
I've used sizes of a thousand or more, but still seen this problem.

There is a pointer to the current timecounter structure.

When the current time is updated, the following procedure is followed:

 - Find the next timecounter in the ring.
 - Update its contents with the new current time.
 - Move the current pointer.

When one wishes to read the current time, one proceeds as follows:

 - Get the current pointer and save it locally.
 - Read the timecounter structure via the local current pointer.

Since the operations on the current pointer are atomic, there is no 
need to lock the structure.
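
The two procedures above can be sketched as follows.  This is a simplified, single-CPU illustration with invented names and fields (tc_entry, ring_update, ring_read, a ring of 5); it ignores memory-ordering concerns on SMP and is not the kernel's code.

```c
#include <stdint.h>

#define NRING 5			/* illustrative ring size */

struct tc_entry {		/* hypothetical fields */
	uint64_t seconds;
	uint32_t offset_count;
};

static struct tc_entry ring[NRING];
static struct tc_entry *volatile current = &ring[0];

/* Updater: fill in the next entry completely, then publish it with one store. */
static void
ring_update(uint64_t seconds, uint32_t offset_count)
{
	struct tc_entry *next = &ring[(current - ring + 1) % NRING];

	next->seconds = seconds;
	next->offset_count = offset_count;
	current = next;
}

/* Reader: snapshot the pointer once, then read only through the snapshot. */
static struct tc_entry
ring_read(void)
{
	struct tc_entry *tc = current;

	return *tc;
}
```

The reader never touches the entry being filled in, as long as the updater does not get all the way around the ring while the reader is still looking at its snapshot.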

There are a couple of possible problems with this mechanism.

One is that the ring catches up with your saved copy of the
current pointer, i.e. in between fetching the pointer and reading the
timecounter contents, the next pointer passes over you again in such
a fashion that you get garbage out of the structure.

Another is that there is a race between multiple updaters of the
timecounter; if two parties are both updating the next timecounter
along with another party trying to get the current time, this could
cause corruption.

All that interrupt latency will do is make the updates late; I can't
actually see how it could cause corruption.  Corruption has to be
caused by mishandling of the timecounter ring in some fashion.

Note that you can probably eliminate the ring loop theory by
allocating a very large number of entries in the ring by setting
NTIMECOUNTER (kern/kern_tc.c) higher.  The structures are small; try
100,000 or so.

If you can reproduce under these circumstances, try adding some checks
to make sure the current timecounter pointer is behaving
monotonically; just save the last timecounter pointer in microtime()
et al.

Another test worth performing is to look at the tco_delta function for
the timecounter and make sure that it returns a sane value, and one
that doesn't behave out of synch with the interrupt handler that updates
the timecounter proper.  If you save the delta value in the timecounter 
and zero it when it's updated, you can catch this.

You can rule this out by using getmicrouptime() rather than
microuptime(); it may return the same value twice, which isn't
desirable, but that would be better than nothing.

Hope this helps a bit.

Regards,
Mike




Re: A question about timecounters

2002-02-04 Thread John Polstra

In article [EMAIL PROTECTED],
Mike Smith  [EMAIL PROTECTED] wrote:
 
 It's not necessarily caused by interrupt latency.  Here's the assumption 
 that's being made.
[...]

Thanks for the superb explanation!  I appreciate it.

 There is a ring of timecounter structures, of some size.  In testing,
 I've used sizes of a thousand or more, but still seen this problem.
 
 There is a pointer to the current timecounter structure.

That's the global variable named timecounter, right?  I did notice
one potential problem: that variable is not declared volatile.  So
in this part ...

 When one wishes to read the current time, one proceeds as follows:
 
  - Get the current pointer and save it locally.
  - Read the timecounter structure via the local current pointer.

... the compiler is perfectly free to reread the global multiple
times in the function rather than using the saved local copy.  If the
current pointer has moved in that time, we'll get an inconsistent view
of the timecounter.  In looking at the generated code I haven't found
any actual instances of that.  But I'll try making it volatile just to
make sure.  Even if it doesn't cause any problems currently, I think
we should change it to volatile since it could start to cause problems
some day.

I also noticed this in tco_forward():

tco = timecounter;
tc = sync_other_counter();
[...]
if (tco->tc_poll_pps)

But sync_other_counter() loads its own copy of timecounter,
and there's no guarantee it hasn't changed from the value that
tco_forward() saved in its local variable.  I'm not sure yet if
that's a potential problem.  It could be corrected by passing tco as
an argument to sync_other_counter.  I'll try that too.

 There are a couple of possible problems with this mechanism.
 
 One is that the ring catches up with your saved copy of the
 current pointer, i.e. in between fetching the pointer and reading the
 timecounter contents, the next pointer passes over you again in such
 a fashion that you get garbage out of the structure.

As you mentioned, with a large enough ring this should be impossible.
If I read the code correctly, the current pointer is only moved
once per second.  So in the current ring of 4 counters (number 0 is
special), it would take 4 seconds to wrap around the ring.  That's a
pretty long time.

 Another is that there is a race between multiple updaters of the
 timecounter; if two parties are both updating the next timecounter
 along with another party trying to get the current time, this could
 cause corruption.
 
 All that interrupt latency will do is make the updates late; I can't
 actually see how it could cause corruption.  Corruption has to be
 caused by mishandling of the timecounter ring in some fashion.

I agree.

 Note that you can probably eliminate the ring loop theory by
 allocating a very large number of entries in the ring by setting
 NTIMECOUNTER (kern/kern_tc.c) higher.  The structures are small; try
 100,000 or so.

OK, but even the thousand you tried should give a cushion of more
than 16 minutes.
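
The arithmetic behind these figures, as a small sketch. It assumes, as
read from the code above, that the current pointer advances at a fixed
rate. ring_wrap_seconds() is a hypothetical helper, not a kernel
function.

```c
#include <assert.h>

/*
 * Whole seconds until the current pointer comes back around to a slot,
 * given the ring size and how often the pointer advances.
 */
static unsigned long
ring_wrap_seconds(unsigned long nslots, unsigned long advances_per_sec)
{
	assert(advances_per_sec > 0);
	return nslots / advances_per_sec;
}
```

With one advance per second, 4 slots give 4 seconds and 1000 slots give
1000 seconds, which is 16 minutes 40 seconds.  If the pointer were
instead advanced on every clock tick at, say, 100 ticks per second, the
same 1000 slots would last only 10 seconds.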

 If you can reproduce under these circumstances, try adding some checks
 to make sure the current timecounter pointer is behaving
 monotonically; just save the last timecounter pointer in microtime()
 et al.
 
 Another test worth performing is to look at the tco_delta function for
 the timecounter and make sure that it returns a sane value, and one
 that doesn't behave out of synch with the interrupt handler that updates
 the timecounter proper.  If you save the delta value in the timecounter 
 and zero it when it's updated, you can catch this.
 
 You can rule this out by using getmicrouptime() rather than
 microuptime(); it may return the same value twice, which isn't
 desirable, but that would be better than nothing.
 
 Hope this helps a bit.
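
One possible shape for the check suggested above: remember the last
pointer value and the last time read, and when time goes backwards,
report whether the current pointer had moved in between.  Everything
here (tc_sample, tc_check_sample, the verdict names) is hypothetical,
and the static state is not protected against concurrent callers, so
treat it as a debugging sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* One time read: which ring slot was current, and the result obtained. */
struct tc_sample {
	const void *slot;	/* value of the current pointer when read */
	long        sec;
	long        usec;
};

enum tc_verdict { TC_OK, TC_BACKWARDS_SAME_SLOT, TC_BACKWARDS_NEW_SLOT };

static struct tc_sample last_sample;
static int have_last;

/* Compare a new sample with the previous one, then remember it. */
static enum tc_verdict
tc_check_sample(const struct tc_sample *now)
{
	enum tc_verdict v = TC_OK;

	assert(now != NULL);
	if (have_last &&
	    (now->sec < last_sample.sec ||
	    (now->sec == last_sample.sec && now->usec < last_sample.usec)))
		v = (now->slot == last_sample.slot) ?
		    TC_BACKWARDS_SAME_SLOT : TC_BACKWARDS_NEW_SLOT;
	last_sample = *now;
	have_last = 1;
	return v;
}
```

A backwards step within the same slot would point at the delta
calculation; one that coincides with a slot change would point at the
handling of the ring.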

Yep, thanks again.

John
-- 
  John Polstra
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





Re: A question about timecounters

2002-02-04 Thread John Baldwin


On 04-Feb-02 John Polstra wrote:
 In article [EMAIL PROTECTED],
 Dominic Marks  [EMAIL PROTECTED] wrote:
 On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
  I'm trying to understand the timecounter code, and in particular the
  reason for the "microuptime went backwards" messages which I see on
  just about every machine I have, whether running -stable or -current.
 
 I see them everywhere with -CURRENT, but not at all with -STABLE. This is
 with two separate machines. Perhaps that may add clues.
 
 I'm looking for something less empirical than that.  When somebody
 says this problem is caused by too much interrupt latency, I assume
 they have a mental model of what is going wrong when this excessive
 latency occurs.  Given that, it should be possible to make a statement
 like, "If X is never locked out for longer than Y, this problem
 cannot happen."  I'm looking for definitions of X and Y.  X might be
 hardclock() or softclock() or non-interrupt kernel processing.  Y
 would be some measure of time, probably a function of HZ and/or the
 timecounter frequency.

X is hardclock I think, since hardclock() calls tc_windup().  I'm not sure what
Y is except that it is indeed a known value.  phk should know as he is Mr.
Timecounter.

 John
 -- 
   John Polstra
   John D. Polstra & Co., Inc.        Seattle, Washington USA
   Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/
