Re: time keeps on slipping... slipping...

2012-10-11 Thread Alexander Motin

On 11.10.2012 09:30, John-Mark Gurney wrote:

Alexander Motin wrote this message on Thu, Oct 11, 2012 at 01:43 +0300:

On 08.10.2012 07:02, John-Mark Gurney wrote:

I recently put together a new machine w/ a SuperMicro H8SCM and an
AMD Opteron 4228 HE...  I've having an issue where the clock on the
machine skips around...  The wierd part is that it's very sudden when
it happens...  ntp sometimes brings it back, but it can't when the clock
gets too far ahread (1000 seconds), ntp dies...

In order to catch it happening, I ran a sleep 60 loop fetching time

>from another server that keeps time correctly via:

while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com;
done

here are some snippits:
h2:Sun Oct  7 17:12:54 2012^M
Sun Oct  7 17:12:54 PDT 2012
  7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:13:48 2012^M
Sun Oct  7 17:20:21 PDT 2012
  7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting

but then ntp brings it back in sync:
h2:Sun Oct  7 17:28:49 2012^M
Sun Oct  7 17:35:21 PDT 2012
  7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:29:49 2012^M
Sun Oct  7 17:29:49 PDT 2012
  7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting

It happens pretty often:
Oct  7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
Oct  7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
Oct  7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
Oct  7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
Oct  7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
Oct  7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
Oct  7 18:04:37 gold ntpd[3721]: time reset -785.346987 s

but as you can see above, the time slip happens abruptly.. looks like
a rounding error or something...

I'm now reducing the sleep to 5 seconds... but as you can see the sleep
ends a few seconds early and local time suddenly jumped forward 6
minutes 33 seconds...

$ sysctl kern.timecounter
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) HPET(950) i8254(0)
dummy(-100)
kern.timecounter.hardware: TSC-low
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 11598
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 3257069245
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-safe.mask: 16777215
kern.timecounter.tc.ACPI-safe.counter: 4219134510
kern.timecounter.tc.ACPI-safe.frequency: 3579545
kern.timecounter.tc.ACPI-safe.quality: 850
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.TSC-low.counter: 2854866610
kern.timecounter.tc.TSC-low.frequency: 10937740
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
$ sysctl kern.eventtimer
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 12217
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2

I have switched my timecounter to HPET to see if things are different...

Any clues?


Mentioned switching to HPET could tell a lot about the problem.
Switching event timer also may be interesting.


Since I switch to HPET, it hasn't happened at all in the last 3 days..


That is probably tells about some problems with TSC timecounter. What is 
strange to me is time jump size of 5 minutes. TSC timecounter should 
overflow each few seconds, so single jump should be just that big.



Should I try switching back to TSC and switching event timer? do you
need any other info, or want me to try anything else?


You may try to do it to be sure eventtimers are not related to the case.


Oh, forgot to include the specific processor info in my previous
email:
CPU: AMD Opteron(tm) Processor 4228 HE   (2800.05-MHz K8-class CPU)
   Origin = "AuthenticAMD"  Id = 0x600f12  Family = 0x15  Model = 0x1  Stepping 
= 2
   
Features=0x178bfbff
   
Features2=0x1e98220b
   AMD Features=0x2e500800
   AMD 
Features2=0x1c9bfff,>
   TSC: P-state invariant, performance statistics


Unfortunately, I don't know AMD processors specifics. May be jkim@ or 
avg@ may remember something. As far as I know, kernel should block enter 
sleep states on AMD CPUs when LAPIC eventtimer is used (by default). In 
such case I guess TSC should also work fine. But I don't know what other 
possible sources of asynchronicity may be there.


--
Alexander Motin

Re: time keeps on slipping... slipping...

2012-10-11 Thread Peter Jeremy
On 2012-Oct-10 23:30:30 -0700, John-Mark Gurney  wrote:
>> >kern.timecounter.tc.TSC-low.mask: 4294967295
>> >kern.timecounter.tc.TSC-low.counter: 2854866610
>> >kern.timecounter.tc.TSC-low.frequency: 10937740
>> >kern.timecounter.tc.TSC-low.quality: 1000
...
>Since I switch to HPET, it hasn't happened at all in the last 3 days..

That suggests that there's something peculiar about your TSC.  There
are a variety of possibilities...  Does your CPU support multiple
Cx states and are you using them ("sysctl dev.cpu | grep cx_")?

-- 
Peter Jeremy


pgpeGw8Kk7Rhc.pgp
Description: PGP signature


Re: time keeps on slipping... slipping...

2012-10-10 Thread John-Mark Gurney
Alexander Motin wrote this message on Thu, Oct 11, 2012 at 01:43 +0300:
> On 08.10.2012 07:02, John-Mark Gurney wrote:
> >I recently put together a new machine w/ a SuperMicro H8SCM and an
> >AMD Opteron 4228 HE...  I've having an issue where the clock on the
> >machine skips around...  The wierd part is that it's very sudden when
> >it happens...  ntp sometimes brings it back, but it can't when the clock
> >gets too far ahread (1000 seconds), ntp dies...
> >
> >In order to catch it happening, I ran a sleep 60 loop fetching time
> >from another server that keeps time correctly via:
> >while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com; 
> >done
> >
> >here are some snippits:
> >h2:Sun Oct  7 17:12:54 2012^M
> >Sun Oct  7 17:12:54 PDT 2012
> >  7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
> >h2:Sun Oct  7 17:13:48 2012^M
> >Sun Oct  7 17:20:21 PDT 2012
> >  7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting
> >
> >but then ntp brings it back in sync:
> >h2:Sun Oct  7 17:28:49 2012^M
> >Sun Oct  7 17:35:21 PDT 2012
> >  7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
> >h2:Sun Oct  7 17:29:49 2012^M
> >Sun Oct  7 17:29:49 PDT 2012
> >  7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting
> >
> >It happens pretty often:
> >Oct  7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
> >Oct  7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
> >Oct  7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
> >Oct  7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
> >Oct  7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
> >Oct  7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
> >Oct  7 18:04:37 gold ntpd[3721]: time reset -785.346987 s
> >
> >but as you can see above, the time slip happens abruptly.. looks like
> >a rounding error or something...
> >
> >I'm now reducing the sleep to 5 seconds... but as you can see the sleep
> >ends a few seconds early and local time suddenly jumped forward 6
> >minutes 33 seconds...
> >
> >$ sysctl kern.timecounter
> >kern.timecounter.fast_gettime: 1
> >kern.timecounter.tick: 1
> >kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) HPET(950) i8254(0) 
> >dummy(-100)
> >kern.timecounter.hardware: TSC-low
> >kern.timecounter.stepwarnings: 0
> >kern.timecounter.tc.i8254.mask: 65535
> >kern.timecounter.tc.i8254.counter: 11598
> >kern.timecounter.tc.i8254.frequency: 1193182
> >kern.timecounter.tc.i8254.quality: 0
> >kern.timecounter.tc.HPET.mask: 4294967295
> >kern.timecounter.tc.HPET.counter: 3257069245
> >kern.timecounter.tc.HPET.frequency: 14318180
> >kern.timecounter.tc.HPET.quality: 950
> >kern.timecounter.tc.ACPI-safe.mask: 16777215
> >kern.timecounter.tc.ACPI-safe.counter: 4219134510
> >kern.timecounter.tc.ACPI-safe.frequency: 3579545
> >kern.timecounter.tc.ACPI-safe.quality: 850
> >kern.timecounter.tc.TSC-low.mask: 4294967295
> >kern.timecounter.tc.TSC-low.counter: 2854866610
> >kern.timecounter.tc.TSC-low.frequency: 10937740
> >kern.timecounter.tc.TSC-low.quality: 1000
> >kern.timecounter.smp_tsc: 1
> >kern.timecounter.invariant_tsc: 1
> >$ sysctl kern.eventtimer
> >kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
> >kern.eventtimer.et.LAPIC.flags: 15
> >kern.eventtimer.et.LAPIC.frequency: 12217
> >kern.eventtimer.et.LAPIC.quality: 400
> >kern.eventtimer.et.i8254.flags: 1
> >kern.eventtimer.et.i8254.frequency: 1193182
> >kern.eventtimer.et.i8254.quality: 100
> >kern.eventtimer.et.RTC.flags: 17
> >kern.eventtimer.et.RTC.frequency: 32768
> >kern.eventtimer.et.RTC.quality: 0
> >kern.eventtimer.periodic: 0
> >kern.eventtimer.timer: LAPIC
> >kern.eventtimer.activetick: 1
> >kern.eventtimer.idletick: 0
> >kern.eventtimer.singlemul: 2
> >
> >I have switched my timecounter to HPET to see if things are different...
> >
> >Any clues?
> 
> Mentioned switching to HPET could tell a lot about the problem. 
> Switching event timer also may be interesting.

Since I switch to HPET, it hasn't happened at all in the last 3 days..

Should I try switching back to TSC and switching event timer? do you
need any other info, or want me to try anything else?

Oh, forgot to include the specific processor info in my previous
email:
CPU: AMD Opteron(tm) Processor 4228 HE   (2800.05-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x600f12  Family = 0x15  Model = 0x1  Stepping 
= 2
  
Features=0x178bfbff
  
Features2=0x1e98220b
  AMD Features=0x2e500800
  AMD 
Features2=0x1c9bfff,>
  TSC: P-state invariant, performance statistics

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: time keeps on slipping... slipping...

2012-10-10 Thread Alexander Motin

On 08.10.2012 07:02, John-Mark Gurney wrote:

I recently put together a new machine w/ a SuperMicro H8SCM and an
AMD Opteron 4228 HE...  I've having an issue where the clock on the
machine skips around...  The wierd part is that it's very sudden when
it happens...  ntp sometimes brings it back, but it can't when the clock
gets too far ahread (1000 seconds), ntp dies...

In order to catch it happening, I ran a sleep 60 loop fetching time
from another server that keeps time correctly via:
while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com; done

here are some snippits:
h2:Sun Oct  7 17:12:54 2012^M
Sun Oct  7 17:12:54 PDT 2012
  7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:13:48 2012^M
Sun Oct  7 17:20:21 PDT 2012
  7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting

but then ntp brings it back in sync:
h2:Sun Oct  7 17:28:49 2012^M
Sun Oct  7 17:35:21 PDT 2012
  7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:29:49 2012^M
Sun Oct  7 17:29:49 PDT 2012
  7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting

It happens pretty often:
Oct  7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
Oct  7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
Oct  7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
Oct  7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
Oct  7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
Oct  7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
Oct  7 18:04:37 gold ntpd[3721]: time reset -785.346987 s

but as you can see above, the time slip happens abruptly.. looks like
a rounding error or something...

I'm now reducing the sleep to 5 seconds... but as you can see the sleep
ends a few seconds early and local time suddenly jumped forward 6
minutes 33 seconds...

$ sysctl kern.timecounter
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) HPET(950) i8254(0) 
dummy(-100)
kern.timecounter.hardware: TSC-low
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 11598
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 3257069245
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-safe.mask: 16777215
kern.timecounter.tc.ACPI-safe.counter: 4219134510
kern.timecounter.tc.ACPI-safe.frequency: 3579545
kern.timecounter.tc.ACPI-safe.quality: 850
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.TSC-low.counter: 2854866610
kern.timecounter.tc.TSC-low.frequency: 10937740
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
$ sysctl kern.eventtimer
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 12217
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2

I have switched my timecounter to HPET to see if things are different...

Any clues?


Mentioned switching to HPET could tell a lot about the problem. 
Switching event timer also may be interesting.


--
Alexander Motin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: time keeps on slipping... slipping...

2012-10-08 Thread Chris Nehren
On Sun, Oct 07, 2012 at 21:02:39 -0700 , John-Mark Gurney wrote:
> I recently put together a new machine w/ a SuperMicro H8SCM and an
> AMD Opteron 4228 HE...  I've having an issue where the clock on the
> machine skips around...  The wierd part is that it's very sudden when
> it happens...  ntp sometimes brings it back, but it can't when the clock
> gets too far ahread (1000 seconds), ntp dies...
> 
> In order to catch it happening, I ran a sleep 60 loop fetching time
> from another server that keeps time correctly via:
> while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com; done
> 
> here are some snippits:
> h2:Sun Oct  7 17:12:54 2012^M
> Sun Oct  7 17:12:54 PDT 2012
>  7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
> h2:Sun Oct  7 17:13:48 2012^M
> Sun Oct  7 17:20:21 PDT 2012
>  7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting
> 
> but then ntp brings it back in sync:
> h2:Sun Oct  7 17:28:49 2012^M
> Sun Oct  7 17:35:21 PDT 2012
>  7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
> h2:Sun Oct  7 17:29:49 2012^M
> Sun Oct  7 17:29:49 PDT 2012
>  7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting
> 
> I have switched my timecounter to HPET to see if things are different...
> 
> Any clues?

Looks like faulty hardware to me. You shouldn't need to work that hard
to keep the clock in sync. Can you boot a livecd of e.g. Linux or
OpenIndiana and reproduce the issue?

-- 
Thanks and best regards,
Chris Nehren


pgppOHFTirbxg.pgp
Description: PGP signature