Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-29 Thread Kurt Roeckx
On Fri, Nov 28, 2003 at 08:22:30PM -0500, Tom Lane wrote:
> 
> One variable I didn't think to ask about is whether you are running
> NTP.  In my experience an ntp daemon that has achieved lock will never
> step the clock back by even 1 usec (it's supposed to use much more
> subtle methods than that to manage the clock ;-)) but maybe under
> unstable conditions such things could happen.  The machines I have
> tested here all run NTP.

ntpd will slew the clock in case the difference is small enough.
Which means it will tell the kernel to slow down time a little.

In case the offset is too large (0.128 seconds) it will tell the
kernel to step the clock.

ntpd also has an option to "always slew".  But it's broken on
atleast Linux 2.4.

Note that the kernel uses values from different places to
calculate what the current time is, and they're not all running
on the same frequency.  I wouldn't worry about a 1 us step back.


Kurt


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-29 Thread Gavin Sherry
I ran Tom's second script for two hours on an AMD Duron on Linux 2.4.18.
No problems.

Gavin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Neil Conway
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> What kinda processor are you running?

>From dmesg:

Timecounter "i8254"  frequency 1193182 Hz
CPU: Intel Celeron (631.29-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
  
Features=0x383f9ff

-Neil


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Neil Conway wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > On Fri, 28 Nov 2003, Neil Conway wrote:
> >> $ uname -a
> >> FreeBSD home.samurai.com 4.9-RELEASE [...]
> >
> > what is kern.timecounter.method set to?
>
> $ sysctl kern.timecounter.method
> kern.timecounter.method: 0
>
> FYI, I ran the test program for another 30 minutes (an hour in total)
> without any output.

Great, I love consistent results :)  Some have 5.x that do it, some don't
... 4.x does it on all my machines, but not yours ...

What kinda processor are you running?  Intel or AMD?  Mine are all Intel
...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Neil Conway
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> On Fri, 28 Nov 2003, Neil Conway wrote:
>> $ uname -a
>> FreeBSD home.samurai.com 4.9-RELEASE [...]
>
> what is kern.timecounter.method set to?

$ sysctl kern.timecounter.method 
kern.timecounter.method: 0

FYI, I ran the test program for another 30 minutes (an hour in total)
without any output.

-Neil


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Neil Conway wrote:

> $ uname -a
> FreeBSD home.samurai.com 4.9-RELEASE FreeBSD 4.9-RELEASE #0: Thu Nov  6 21:25:41 EST 
> 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SAMURAI  i386
>
> This machine produces no output after running the test app for about
> 30 minutes.

what is kern.timecounter.method set to?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>> 15 minutes of error is acceptable?

> Apparently *shrug*  That was just the first answer back though, so we'll
> see what happens ...

BTW, it seems from this thread that the BIND guys have seen even more
bogus behavior than this from gettimeofday() on BSD.  Their attitude is
clearly "BSD is broken":

http://www.isc.org/ml-archives/bind-users/2001/01/msg01246.html


regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Tom Lane wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > Just got my first response on this, and, in fact, this is 'acceptable
> > behaviour' to a certain extent ... there is a kernel tuneable called
> > kern.timecounter.method, that is either 1 or 0:
>
> 15 minutes of error is acceptable?

Apparently *shrug*  That was just the first answer back though, so we'll
see what happens ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Marc G. Fournier wrote:

> On Fri, 28 Nov 2003, Tom Lane wrote:
>
> > "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > > I'm suspecting that the issue is amplified by load on the server itself
> >
> > That would fit right in with my idea that the failure occurs when the
> > process loses its timeslice partway through gettimeofday().  Heavier
> > load would presumably make it more likely to lose the CPU.
>
> I'm posting to the freebsd lists about it, see if they have anything to
> say ...

Just got my first response on this, and, in fact, this is 'acceptable
behaviour' to a certain extent ... there is a kernel tuneable called
kern.timecounter.method, that is either 1 or 0:

 0sacrifice precision in favor of faster code execution

 1return the more precise time (the same as calling microtime() or
  nanotime()).

 The intent of the getmicrotime() and getnanotime() functions is to
 enforce the user's preference for timer accuracy versus execution time.

Now, reading the man page, there seems to be a 'microtime' vs
'getmicrotime' function ... these might be preferable to using
gettimeofday on FreeBSD?

 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> Just got my first response on this, and, in fact, this is 'acceptable
> behaviour' to a certain extent ... there is a kernel tuneable called
> kern.timecounter.method, that is either 1 or 0:

15 minutes of error is acceptable?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Nigel J. Andrews
On Fri, 28 Nov 2003, Tom Lane wrote:

> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> 
> > $ time ./a.out 2>&1 | tee a.txt
> > out of order tv_sec: 1070066197 273140, prev 1070066195 721010
> > out of order tv_usec: 1070066197 273140, prev 1070066195 721010
> > out of order tv_sec: 1070067322 116061, prev 1070067320 440490
> > out of order tv_usec: 1070067322 116061, prev 1070067320 440490
> > out of order tv_sec: 1070067833 514969, prev 1070067831 755019
> > out of order tv_usec: 1070067833 514969, prev 1070067831 755019
> > ^C
> 
> AFAICT the above is a legal trace, indicating only that the test program
> sometimes lost control for more than a second at a time.  The "revised"
> version of the test will not complain about this.

Doh! I was looking for time going backwards so that's what I saw, even though I
already new the same complaint would be made if time skipped forward a second
or so.

> One variable I didn't think to ask about is whether you are running
> NTP.  In my experience an ntp daemon that has achieved lock will never
> step the clock back by even 1 usec (it's supposed to use much more
> subtle methods than that to manage the clock ;-)) but maybe under
> unstable conditions such things could happen.  The machines I have
> tested here all run NTP.

Yes, all NTPed. The FreeBSD system, the one that doesn't skip back *smacks
forehead*, is the local time server but is generally lightly loaded. Not to say
that there's not going to be the odd job it has to run which could make the
system busy and time time away from this test. It's just difficult to say that
every so often there's some task that makes the system unresponsive for a
couple of seconds, which is what this is saying.

Anyway, I like your suggestion as the timeslicing being a possible
cause. Seems a little strange though, presumably all the numbers going into the
timeofday calc are gathered by the one system call at a place past where the
thread could have had it's time sliced away and before the next. However, I
have no idea how the BSD kernel operates so can't do more than assume and
guess.


--
Nigel


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Tom Lane wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > I'm suspecting that the issue is amplified by load on the server itself
>
> That would fit right in with my idea that the failure occurs when the
> process loses its timeslice partway through gettimeofday().  Heavier
> load would presumably make it more likely to lose the CPU.

I'm posting to the freebsd lists about it, see if they have anything to
say ...

>
> BTW, I can't get top(1) to work on either pgsql74 or svr1, any idea why?

top is one of those things that do not work within a VM ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> I'm suspecting that the issue is amplified by load on the server itself

That would fit right in with my idea that the failure occurs when the
process loses its timeslice partway through gettimeofday().  Heavier
load would presumably make it more likely to lose the CPU.

BTW, I can't get top(1) to work on either pgsql74 or svr1, any idea why?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Tom Lane wrote:

> Could it be a hardware problem on Marc's box?  Or specific to some other
> aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for
> example?)

All the servers are SMP ... pgsql74 is on a Dual-Xeon ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> Ah, I have made a mistake. It's only a 2.2.18 kernal. Dual SMP P-III, perhaps
> that's the issue there.

Hm.  I seem to recall there were still lots of SMP bugs in the 2.2.*
kernels.

> And on the FreeBSD system I've got this: 

> $ time ./a.out 2>&1 | tee a.txt
> out of order tv_sec: 1070066197 273140, prev 1070066195 721010
> out of order tv_usec: 1070066197 273140, prev 1070066195 721010
> out of order tv_sec: 1070067322 116061, prev 1070067320 440490
> out of order tv_usec: 1070067322 116061, prev 1070067320 440490
> out of order tv_sec: 1070067833 514969, prev 1070067831 755019
> out of order tv_usec: 1070067833 514969, prev 1070067831 755019
> ^C

AFAICT the above is a legal trace, indicating only that the test program
sometimes lost control for more than a second at a time.  The "revised"
version of the test will not complain about this.

Time going backwards by even one microsecond, however, is generally
considered Bad News, unless you're actively manipulating the system
clock setting.

One variable I didn't think to ask about is whether you are running
NTP.  In my experience an ntp daemon that has achieved lock will never
step the clock back by even 1 usec (it's supposed to use much more
subtle methods than that to manage the clock ;-)) but maybe under
unstable conditions such things could happen.  The machines I have
tested here all run NTP.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Marc G. Fournier
On Fri, 28 Nov 2003, Darcy Buskermolen wrote:

> I've been running this code on a pair of FreeBSD (i386) boxen, for some time
> now, one of which is a 4.8-STABLE, the other is a 5.2-BETA.
>
> In 10 minutes of execution the 4.8 box has only shown the following:
>
> out of order tv_sec: 1070063393 115226, prev 1070063391 381087
> out of order tv_usec: 1070063393 115226, prev 1070063391 381087

I'm suspecting that the issue is amplified by load on the server itself
...

> the 5.2 box, hasn't outputed anything.  I'm goignt o run this test on
> some other boxen I have and see what/ifany results are happening.

Similar results here with a 5.2 box ...



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Nigel J. Andrews
On Fri, 28 Nov 2003, Tom Lane wrote:

> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> > On Fri, 28 Nov 2003, Tom Lane wrote:
> >> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they
> > are all just a microsecond.
> >> 
> >> What do you mean by "just a microsecond"?
> 
> > I mean it's always a "out of order tv_usec..." line and the difference is
> > 1us. That is a.out gives:
> 
> > out of order tv_usec: 1070065862 374978, prev 1070065862 374979
> > out of order tv_usec: 1070065867 814300, prev 1070065867 814301
> 
> Fascinating.  I'd call that a bug too, but evidently one with a
> different mechanism than the BSD issue we are chasing.
> 
> FWIW, I have not seen any failures in a fair amount of runtime on
> a 2.4.18 (Red Hat 8.0) kernel here, running on a Dell P4.  What is
> the hardware platform you're using?

Ah, I have made a mistake. It's only a 2.2.18 kernal. Dual SMP P-III, perhaps
that's the issue there.

And on the FreeBSD system I've got this: 

$ time ./a.out 2>&1 | tee a.txt
out of order tv_sec: 1070066197 273140, prev 1070066195 721010
out of order tv_usec: 1070066197 273140, prev 1070066195 721010
out of order tv_sec: 1070067322 116061, prev 1070067320 440490
out of order tv_usec: 1070067322 116061, prev 1070067320 440490
out of order tv_sec: 1070067833 514969, prev 1070067831 755019
out of order tv_usec: 1070067833 514969, prev 1070067831 755019
^C

real38m53.026s
user6m13.953s
sys 32m6.589s

So not very often there.


--
Nigel



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Neil Conway
Tom Lane <[EMAIL PROTECTED]> writes:
> Whoa.  Try the following test program.

I tried this on two uniprocessor x86 machines:

(1)

$ uname -a
Linux tokyo 2.6.0-test10 #3 Mon Nov 24 13:43:54 EST 2003 i686 GNU/Linux

This machine produces this output, after running the test app for
about 30 minutes:

out of order tv_sec: 1070067959 929846, prev 1070067957 813066
out of order tv_usec: 1070068017 80145, prev 1070068017 80384
out of order tv_usec: 1070068017 104138, prev 1070068017 104368

(2)

$ uname -a
FreeBSD home.samurai.com 4.9-RELEASE FreeBSD 4.9-RELEASE #0: Thu Nov  6 21:25:41 EST 
2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SAMURAI  i386

This machine produces no output after running the test app for about
30 minutes.

-Neil


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> On Fri, 28 Nov 2003, Tom Lane wrote:
>> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they
> are all just a microsecond.
>> 
>> What do you mean by "just a microsecond"?

> I mean it's always a "out of order tv_usec..." line and the difference is
> 1us. That is a.out gives:

> out of order tv_usec: 1070065862 374978, prev 1070065862 374979
> out of order tv_usec: 1070065867 814300, prev 1070065867 814301

Fascinating.  I'd call that a bug too, but evidently one with a
different mechanism than the BSD issue we are chasing.

FWIW, I have not seen any failures in a fair amount of runtime on
a 2.4.18 (Red Hat 8.0) kernel here, running on a Dell P4.  What is
the hardware platform you're using?

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> After 30+ minutes I'm now starting to see the same problems on my 5.2 testbed
> as well. I'm going to fire this test up on my FreeBSD 3.x and 2.x boxen and 
> see if' it's there too. I can confirm that this not a SMP issue as that it's 
> happeing on both UP and SMP boxen for me.

Good, that's one variable eliminated.

Looking at my own data, I notice that when the error happens, the
elapsed time shown between the immediately preceding and following
okay-looking timestamps is always significant (at least 100 msec
and often a second or more).  That is kind of a lot for a tight loop
containing one simple kernel call, no?  I am suspicious that the failure
occurs when gettimeofday() is called just as the process is losing
control of the CPU (due to using up its timeslice or whatever).  When
control eventually returns, the process gets a reading that is neither
pre-loss-of-CPU nor post-regain-of-CPU, but some unholy combination that
nets out as a time about 15 min in the past.  Just a theory, but it fits
some of the available facts.

Can anyone think of a significant interpretation for the number 695
seconds?  That's got to be an important clue ...

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Nigel J. Andrews
On Fri, 28 Nov 2003, Tom Lane wrote:

> "Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they
> > are all just a microsecond.
> 
> What do you mean by "just a microsecond"?

I mean it's always a "out of order tv_usec..." line and the difference is
1us. That is a.out gives:

out of order tv_usec: 1070065862 374978, prev 1070065862 374979
out of order tv_usec: 1070065867 814300, prev 1070065867 814301
out of order tv_usec: 1070065868 794176, prev 1070065868 794177
out of order tv_usec: 1070065871 553831, prev 1070065871 553832

etc.


> 
> Attached is a tightened-up test program that will only complain if the
> value of gettimeofday goes backward (at all) or forward by more than
> 10 seconds (adjustable as MAX_SKIP).  This should be suitable to run on
> moderately loaded machines where the test program might occasionally not
> get dispatched for a few seconds.

I don't think that would show any for me. I've only got two skips showing from
a longer run on my FreeBSD 3.3 and those differ in the seconds counter by only
2. Time between the occurances is about 1100s.


--
Nigel



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
"Nigel J. Andrews" <[EMAIL PROTECTED]> writes:
> On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they
> are all just a microsecond.

What do you mean by "just a microsecond"?

Attached is a tightened-up test program that will only complain if the
value of gettimeofday goes backward (at all) or forward by more than
10 seconds (adjustable as MAX_SKIP).  This should be suitable to run on
moderately loaded machines where the test program might occasionally not
get dispatched for a few seconds.

I still see no failures on my own (non-BSD) machines, fairly frequent
failures on pgsql74.hub.org:

> ./a.out
out of order tv_sec: 1070066031 262048, prev 1070066726 688240
out of order tv_sec: 1070066726 785019, prev 1070066031 262048
out of order tv_sec: 1070066062 62622, prev 1070066757 588814
out of order tv_sec: 1070066757 771848, prev 1070066062 62622
out of order tv_sec: 1070066093 262974, prev 1070066788 689167
out of order tv_sec: 1070066788 777486, prev 1070066093 262974
out of order tv_sec: 1070066114 113410, prev 1070066809 589602
out of order tv_sec: 1070066809 724663, prev 1070066114 113410
out of order tv_sec: 1070066145 113899, prev 1070066840 590097
out of order tv_sec: 1070066840 726558, prev 1070066145 113899
out of order tv_sec: 1070066155 263911, prev 1070066850 690103
out of order tv_sec: 1070066850 781343, prev 1070066155 263911
out of order tv_sec: 1070066176 164307, prev 1070066871 590505
out of order tv_sec: 1070066871 643350, prev 1070066176 164307
out of order tv_sec: 1070066217 264846, prev 1070066912 691039
out of order tv_sec: 1070066912 775989, prev 1070066217 264846
out of order tv_sec: 1070066248 65394, prev 1070066943 591592
out of order tv_sec: 1070066943 773822, prev 1070066248 65394
^C

and rarer failures on cvs.postgresql.org:

> ./a.out
out of order tv_sec: 1070066099 389427, prev 1070066794 865617
out of order tv_sec: 1070066795 17855, prev 1070066099 389427
out of order tv_sec: 1070066252 541729, prev 1070066947 967921
out of order tv_sec: 1070066948 38715, prev 1070066252 541729
out of order tv_sec: 1070066371 393525, prev 1070067066 869715
out of order tv_sec: 1070067068 24754, prev 1070066371 393525
^C

It seems consistent that the error is 695 seconds and change
when it happens, and that the very next read gives a correct
(or at least plausible) value again.

regards, tom lane


#include 
#include 

#define MAX_SKIP  10

int
main()
{
struct timeval prevtime;
struct timeval curtime;

gettimeofday(&prevtime, NULL);

for (;;)
{
gettimeofday(&curtime, NULL);

if (curtime.tv_usec < 0 || curtime.tv_usec >= 100)
fprintf(stderr, "bogus tv_usec: %ld %ld, prev %ld %ld\n",
(long int) curtime.tv_sec,
(long int) curtime.tv_usec,
(long int) prevtime.tv_sec,
(long int) prevtime.tv_usec);
else if (curtime.tv_sec < prevtime.tv_sec ||
 curtime.tv_sec > prevtime.tv_sec + MAX_SKIP)
fprintf(stderr, "out of order tv_sec: %ld %ld, prev %ld %ld\n",
(long int) curtime.tv_sec,
(long int) curtime.tv_usec,
(long int) prevtime.tv_sec,
(long int) prevtime.tv_usec);
else if (curtime.tv_usec < prevtime.tv_usec &&
 curtime.tv_sec == prevtime.tv_sec)
fprintf(stderr, "out of order tv_usec: %ld %ld, prev %ld 
%ld\n",
(long int) curtime.tv_sec,
(long int) curtime.tv_usec,
(long int) prevtime.tv_sec,
(long int) prevtime.tv_usec);

prevtime = curtime;
}

return 0;
}

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Darcy Buskermolen
On November 28, 2003 04:19 pm, Tom Lane wrote:
> I said:
> > The failure is definitely quite repeatable on pgsql74.hub.org.  I don't
> > see it on svr1.postgresql.org, though, which seems to be running almost
> > the same kernel.
>
> After looking more closely, I take that back: svr1 is failing too,
>

After 30+ minutes I'm now starting to see the same problems on my 5.2 testbed 
as well. I'm going to fire this test up on my FreeBSD 3.x and 2.x boxen and 
see if' it's there too. I can confirm that this not a SMP issue as that it's 
happeing on both UP and SMP boxen for me.



> though not as often:
> > uname -a
>
> FreeBSD svr1.postgresql.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #4: Sat
> Sep 20 14:41:58 ADT 2003
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel  i386
>
> > ./a.out
>
> out of order tv_sec: 1070064182 410497, prev 1070064877 836691
> <--- out of order tv_usec: 1070064182 410497, prev 1070064877
> 836691
> out of order tv_sec: 1070064877 920579, prev 1070064182 410497
> out of order tv_sec: 1070064901 126624, prev 1070064899 897160
> out of order tv_usec: 1070064901 126624, prev 1070064899 897160
> out of order tv_sec: 1070064907 306286, prev 1070064905 481423
> out of order tv_usec: 1070064907 306286, prev 1070064905 481423
> out of order tv_sec: 1070064218 861047, prev 1070064914 337241
> <--- out of order tv_sec: 1070064914 570717, prev 1070064218 861047
> out of order tv_usec: 1070064914 570717, prev 1070064218 861047
> out of order tv_sec: 1070064241 411391, prev 1070064936 837585
> <--- out of order tv_usec: 1070064241 411391, prev 1070064936
> 837585
> out of order tv_sec: 1070064937 497925, prev 1070064241 411391
> out of order tv_sec: 1070064251 811548, prev 1070064947 337739
> <--- out of order tv_sec: 1070064947 508364, prev 1070064251 811548
> out of order tv_usec: 1070064947 508364, prev 1070064251 811548
>
> Maybe it's a 4.9-PRERELEASE bug?
>
>   regards, tom lane

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Nigel J. Andrews
On Fri, 28 Nov 2003, Tom Lane wrote:

> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> > On November 28, 2003 12:33 pm, Tom Lane wrote:
> >> Whoa.  Try the following test program.  Then send it in to your friendly
> >> local BSD hackers 
> 
> > I've been running this code on a pair of FreeBSD (i386) boxen, for some time 
> > now, one of which is a 4.8-STABLE, the other is a 5.2-BETA.
> 
> Could it be a hardware problem on Marc's box?  Or specific to some other
> aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for
> example?)
> 
> The failure is definitely quite repeatable on pgsql74.hub.org.  I don't
> see it on svr1.postgresql.org, though, which seems to be running almost
> the same kernel.

On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they are all
just a microsecond.

On an Intel FreeBSD 3.3 I had one just after starting the program that was 2
secs behind. Then I stopped to restart with a leading under time and I've not
seen any since. (4'50" it was running)  .529 .2

user/sys time = 0.53 on the Linux system
user/sys time = 0.2 on the FreeBSD one

Make of that what you will.


--
Nigel Andrews



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Darcy Buskermolen
On November 28, 2003 12:33 pm, Tom Lane wrote:
> I said:
> > I'm wondering if there's an actual bug in gettimeofday() in this
> > platform, such that once in a while it returns a value that's off
> > a minute or so ...
>
> Whoa.  Try the following test program.  Then send it in to your friendly
> local BSD hackers 
>
> > uname -a
>
> FreeBSD pgsql74.hub.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #3: Sat Sep
> 20 14:16:48 ADT 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel 
> i386
>
> > ./a.out
8<-- snip
Further to my last email, after 30 minutes of running I was able to get 
simular behavior off the 5.2 test system.

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
I said:
> The failure is definitely quite repeatable on pgsql74.hub.org.  I don't
> see it on svr1.postgresql.org, though, which seems to be running almost
> the same kernel.

After looking more closely, I take that back: svr1 is failing too,
though not as often:

> uname -a
FreeBSD svr1.postgresql.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #4: Sat Sep 20 
14:41:58 ADT 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel  i386
> ./a.out
out of order tv_sec: 1070064182 410497, prev 1070064877 836691 <---
out of order tv_usec: 1070064182 410497, prev 1070064877 836691
out of order tv_sec: 1070064877 920579, prev 1070064182 410497
out of order tv_sec: 1070064901 126624, prev 1070064899 897160
out of order tv_usec: 1070064901 126624, prev 1070064899 897160
out of order tv_sec: 1070064907 306286, prev 1070064905 481423
out of order tv_usec: 1070064907 306286, prev 1070064905 481423
out of order tv_sec: 1070064218 861047, prev 1070064914 337241 <---
out of order tv_sec: 1070064914 570717, prev 1070064218 861047
out of order tv_usec: 1070064914 570717, prev 1070064218 861047
out of order tv_sec: 1070064241 411391, prev 1070064936 837585 <---
out of order tv_usec: 1070064241 411391, prev 1070064936 837585
out of order tv_sec: 1070064937 497925, prev 1070064241 411391
out of order tv_sec: 1070064251 811548, prev 1070064947 337739 <---
out of order tv_sec: 1070064947 508364, prev 1070064251 811548
out of order tv_usec: 1070064947 508364, prev 1070064251 811548

Maybe it's a 4.9-PRERELEASE bug?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Tom Lane
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> On November 28, 2003 12:33 pm, Tom Lane wrote:
>> Whoa.  Try the following test program.  Then send it in to your friendly
>> local BSD hackers 

> I've been running this code on a pair of FreeBSD (i386) boxen, for some time 
> now, one of which is a 4.8-STABLE, the other is a 5.2-BETA.

Could it be a hardware problem on Marc's box?  Or specific to some other
aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for
example?)

The failure is definitely quite repeatable on pgsql74.hub.org.  I don't
see it on svr1.postgresql.org, though, which seems to be running almost
the same kernel.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] gettimeofday() goes backwards on FreeBSD 4.9

2003-11-28 Thread Darcy Buskermolen
On November 28, 2003 12:33 pm, Tom Lane wrote:
> I said:
> > I'm wondering if there's an actual bug in gettimeofday() in this
> > platform, such that once in a while it returns a value that's off
> > a minute or so ...
>
> Whoa.  Try the following test program.  Then send it in to your friendly
> local BSD hackers 
>
> > uname -a
>
> FreeBSD pgsql74.hub.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #3: Sat Sep
> 20 14:16:48 ADT 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel 
> i386
>
> > ./a.out
>
> out of order tv_sec: 1070050646 939970, prev 1070051342 516165
> out of order tv_sec: 1070051342 789127, prev 1070050646 939970
> out of order tv_usec: 1070051342 789127, prev 1070050646 939970
> out of order tv_sec: 1070050654 690255, prev 1070051350 216448
> out of order tv_sec: 1070051350 402412, prev 1070050654 690255
> out of order tv_usec: 1070051350 402412, prev 1070050654 690255
> out of order tv_sec: 1070050672 990359, prev 1070051368 416546
> out of order tv_sec: 1070051368 501648, prev 1070050672 990359
> out of order tv_usec: 1070051368 501648, prev 1070050672 990359
> out of order tv_sec: 1070050672 990363, prev 1070051368 516557
> out of order tv_sec: 1070051368 700227, prev 1070050672 990363
> out of order tv_usec: 1070051368 700227, prev 1070050672 990363
> out of order tv_sec: 1070050675 740524, prev 1070051371 216718
> out of order tv_sec: 1070051371 346521, prev 1070050675 740524
> out of order tv_usec: 1070051371 346521, prev 1070050675 740524
> out of order tv_sec: 1070050676 440415, prev 1070051371 916605
> out of order tv_usec: 1070050676 440415, prev 1070051371 916605
> out of order tv_sec: 1070051372 238348, prev 1070050676 440415
> out of order tv_usec: 1070051372 238348, prev 1070050676 440415
> ^C
>


I've been running this code on a pair of FreeBSD (i386) boxen, for some time 
now, one of which is a 4.8-STABLE, the other is a 5.2-BETA.

In 10 minutes of execution the 4.8 box has only shown the following:

out of order tv_sec: 1070063393 115226, prev 1070063391 381087
out of order tv_usec: 1070063393 115226, prev 1070063391 381087

the 5.2 box, hasn't outputed anything.  I'm goignt o run this test on some 
other boxen  I have and see what/ifany results are happening.

>
>   regards, tom lane
>
>
> #include 
> #include 
>
> int
> main()
> {
>   struct timeval prevtime;
>   struct timeval curtime;
>
>   gettimeofday(&prevtime, NULL);
>
>   for (;;)
>   {
>   gettimeofday(&curtime, NULL);
>
>   if (curtime.tv_usec < 0 || curtime.tv_usec >= 100)
>   fprintf(stderr, "bogus tv_usec: %ld %ld, prev %ld %ld\n",
>   (long int) curtime.tv_sec,
>   (long int) curtime.tv_usec,
>   (long int) prevtime.tv_sec,
>   (long int) prevtime.tv_usec);
>
>   if (curtime.tv_sec < prevtime.tv_sec ||
>   curtime.tv_sec > prevtime.tv_sec + 1)
>   fprintf(stderr, "out of order tv_sec: %ld %ld, prev %ld %ld\n",
>   (long int) curtime.tv_sec,
>   (long int) curtime.tv_usec,
>   (long int) prevtime.tv_sec,
>   (long int) prevtime.tv_usec);
>
>   if (curtime.tv_usec < prevtime.tv_usec &&
>   curtime.tv_sec != prevtime.tv_sec + 1)
>   fprintf(stderr, "out of order tv_usec: %ld %ld, prev %ld 
> %ld\n",
>   (long int) curtime.tv_sec,
>   (long int) curtime.tv_usec,
>   (long int) prevtime.tv_sec,
>   (long int) prevtime.tv_usec);
>
>   prevtime = curtime;
>   }
>
>   return 0;
> }
>
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])