Re: logging

2019-04-13 Thread Udo van den Heuvel via devel
On 13-04-19 18:28, Achim Gratz via devel wrote:
>> !?
>> What /was/ the problem in that case?
> 
> The same that you still see: the system clock gets stopped and two
> consecutive reads of the clock read the same (or maybe time even goes
> backwards when looked at with higher resolution).

Timetravel possibility?
After switching to tsc the logging continues.
Also the PLL goes up and offsets rise. (just like before)

Udo
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Achim Gratz via devel
Udo van den Heuvel via devel writes:
> On 13-04-19 15:29, Achim Gratz via devel wrote:
>> Udo van den Heuvel via devel writes:
>>> Fedora 29, kernel.org 4.19.30. (my compile)
>> 
>> Is this a "tickless" kernel perhaps?  If so, try "nohz=off" on the
>> kernel command line in the boot manager and see if that helps.
>
> # grep HZ .config|grep -v ^#
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_IDLE=y
> CONFIG_HZ_250=y
> CONFIG_HZ=250
>
> tickless does not go with PPS support...

How about you boot with "nohz=off"?

> My kernel is from kernel.org and compiled by me.

What keeps you from using a newer one, then?

> I need faster memory for later BIOS. Perhaps later this year.

That's the first time I ever hear about a BIOS update requiring a higher
memory speed (earlier BIOS versions not supporting _hugher_ memory
speeds, yes -- but not the other way around).

> !?
> What /was/ the problem in that case?

The same that you still see: the system clock gets stopped and two
consecutive reads of the clock read the same (or maybe time even goes
backwards when looked at with higher resolution).

> With tsc there was no real sync to pps.
> With hpet we do have sync to pps.

The only thing switching to HPET does is reducing clock resolution _a
lot_, so if your clock is funky with TSC, it still is with HPET.  It has
more chance of averaging out over the longer interval between clock
ticks, that's all.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Udo van den Heuvel via devel
On 13-04-19 14:02, Hal Murray wrote:
> Plese give it a quick try to see if ntpsec is the problem.  How about just 
> trying the 1.1.3 release?

ntpsec 1.1.3's ntpd from
ftp://ftp.ntpsec.org/pub/releases/ntpsec-1.1.3.tar.gz gives me after
startup:

Apr 13 15:53:50 bla ntpd[12382]: CLOCK: ts_prev 1555163630 s + 594156272
ns, ts_min 1555163630 s + 594155713 ns
Apr 13 15:53:50 bla ntpd[12382]: CLOCK: ts 1555163630 s + 594156272 ns
Apr 13 15:53:50 bla ntpd[12382]: CLOCK: sys_fuzz 2374 nsec, prior fuzz
0.02254
Apr 13 15:53:50 bla ntpd[12382]: CLOCK: this fuzz -0.01891
Apr 13 15:53:50 bla ntpd[12382]: CLOCK: prev get_systime
0xe05c686e.981a94b6 is 0.01211 later than 0xe05c686e.981a8064



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Udo van den Heuvel via devel
On 13-04-19 15:29, Achim Gratz via devel wrote:
> Udo van den Heuvel via devel writes:
>> Fedora 29, kernel.org 4.19.30. (my compile)
> 
> Is this a "tickless" kernel perhaps?  If so, try "nohz=off" on the
> kernel command line in the boot manager and see if that helps.

# grep HZ .config|grep -v ^#
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HZ_250=y
CONFIG_HZ=250

tickless does not go with PPS support...

>> AMD Ryzen 5 2400G with Radeon Vega Graphics
> 
> While Raven Ridge should fully work with 4.19, there were mutiple fixes
> for covering all sorts of things in later kernels.  I'm not sure how
> many of these got backported by the Fedora folks, 

My kernel is from kernel.org and compiled by me.

> the latest BIOS installed.

I need faster memory for later BIOS. Perhaps later this year.

>> clocksource is fixed at hpet since the previous situations where clock
>> sync was weird/gone/etc.
> 
> That likely wasn't your real problem then, and switching to HPET just
> has made it less easy to detect.

!?
What /was/ the problem in that case?
With tsc there was no real sync to pps.
With hpet we do have sync to pps.


Udo



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Achim Gratz via devel
Udo van den Heuvel via devel writes:
> Fedora 29, kernel.org 4.19.30. (my compile)

Is this a "tickless" kernel perhaps?  If so, try "nohz=off" on the
kernel command line in the boot manager and see if that helps.

> AMD Ryzen 5 2400G with Radeon Vega Graphics

While Raven Ridge should fully work with 4.19, there were mutiple fixes
for covering all sorts of things in later kernels.  I'm not sure how
many of these got backported by the Fedora folks, but you might want to
check a distribution with a more modern kernel (an openSUSE Tumbleweed
live imgae for instance).  

> on Gigabyte X470 AORUS ULTRA GAMING motherboard.

Raven Ridge on Linux is also well known to have interactions with
certain mobo and/or BIOS features.  There are a number of functions that
can be controlled essentially from outside the CPU and if the OS doesn't
know about them (like missing or wrong ACPI entries), "interesting"
things might occur.  So again, check and maybe switch off anything in
the BIOS aimed at performance/overclocking, preferrably after getting
the latest BIOS installed.

> clocksource is fixed at hpet since the previous situations where clock
> sync was weird/gone/etc.

That likely wasn't your real problem then, and switching to HPET just
has made it less easy to detect.

-- 

Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Hal Murray via devel
> I know about bisect but it is quite a task.

HPET works for me.

So far, you have the only test case.

Plese give it a quick try to see if ntpsec is the problem.  How about just 
trying the 1.1.3 release?

-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Udo van den Heuvel via devel
On 13-04-19 12:21, Hal Murray wrote:
>> I never ever saw these before.
> 
> Something changed.  All we have to do is figure out what/when.
> 
> Was the switch to using HPET recent?

hpet must have been the default until it wasn't or at least tsc stopped
being stable.

> Did you do a recent git pull?  Do you know how to drive git bisect?

I know about bisect but it is quite a task.

Kind regards,
Udo

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Hal Murray via devel


> clocksource is fixed at hpet since the previous situations where clock sync
> was weird/gone/etc.

> I never ever saw these before.

Something changed.  All we have to do is figure out what/when.

Was the switch to using HPET recent?

Did you do a recent git pull?  Do you know how to drive git bisect?


-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Udo van den Heuvel via devel
On 13-04-19 10:09, Hal Murray via devel wrote:
> Here is a typical batch of the confusing CLOCK printout:
> 
> Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: ts_prev 1555072655 s +
> 712154322 ns, ts_min 1555072655 s + 712153764 ns
> Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: ts 1555072655 s + 712154322 ns
> Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: sys_fuzz 1816 nsec, prior
> fuzz 0.01594
> Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: this fuzz -0.01722
> Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: prev get_systime
> 0xe05b050f.b64fb1ce is 0.00942 later than 0xe05b050f.b64fa201
> 
> It looks like something is screwed up with your system clock.

Hmm... ? :-/

>> Fedora 29 on x86_64 with Garmin gps18x on rs232.
> 
> x86_64 covers a lot of ground.  Anything interesting about that system?

Fedora 29, kernel.org 4.19.30. (my compile)
AMD Ryzen 5 2400G with Radeon Vega Graphics on Gigabyte X470 AORUS ULTRA
GAMING motherboard.
ntpsec from git.


>  Did 
> it work correctly a while ago?  Does it work without any refclocks?  Did you 
> update the kernel recently?  ...

I did not see these messages before.

> What do we know about the system timekeeping?  I'm looking for two things.  
> The first is from ntpd, probably in your syslog, before switching logging to 
> the typical log file.  It should be something like this:
>   Apr 12 22:51:29 hgm ntpd[724]: INIT: precision = 0.157 usec (-23)

Apr 12 14:15:23 bla ntpd[2109]: INIT: precision = 1.816 usec (-19)

> The other comes from the Linux kernel at boot time.  I don't know a simple 
> grep expression to get what I want.  Grep for clocksource is a good start, 
> but 
> poke around to see if there is anything interesting nearby.  Grep for MHz may 
> be interesting.

clocksource is fixed at hpet since the previous situations where clock
sync was weird/gone/etc.

# dmesg|grep clocksource
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.19.30 ro
root=/dev/myvg/rootlv noexec=on noexec32=on vga=0xF06
SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
acpi_enforce_resources=lax fbcon=font:VGA8x16 radeon.pcie_gen2=1
cgroup_disable=memory threadirqs plymouth.enable=0 rd.plymouth=0
mce=dont_log_ce panic=0 rd.lvm.vg=myvg radeon.dpm=1 zswap.enabled=1
rd.auto=1 audit=0 systemd.log_level=warning  net.ifnames=0 acpi=force
amdgpu.gttsize=8192 clocksource=hpet initrd=/initramfs-4.19.30.img
[0.00] clocksource: refined-jiffies: mask: 0x
max_cycles: 0x, max_idle_ns: 7645519600211568 ns
[0.00] Kernel command line: BOOT_IMAGE=/vmlinuz-4.19.30 ro
root=/dev/myvg/rootlv noexec=on noexec32=on vga=0xF06
SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
acpi_enforce_resources=lax fbcon=font:VGA8x16 radeon.pcie_gen2=1
cgroup_disable=memory threadirqs plymouth.enable=0 rd.plymouth=0
mce=dont_log_ce panic=0 rd.lvm.vg=myvg radeon.dpm=1 zswap.enabled=1
rd.auto=1 audit=0 systemd.log_level=warning net.ifnames=0 acpi=force
amdgpu.gttsize=8192 clocksource=hpet initrd=/initramfs-4.19.30.img
[0.00] clocksource: hpet: mask: 0x max_cycles:
0x, max_idle_ns: 133484873504 ns
[0.05] clocksource: tsc-early: mask: 0x
max_cycles: 0x33e4791d180, max_idle_ns: 440795290210 ns
[0.288707] clocksource: jiffies: mask: 0x max_cycles:
0x, max_idle_ns: 764504178510 ns
[1.160577] clocksource: Switched to clocksource hpet
[1.174849] clocksource: acpi_pm: mask: 0xff max_cycles:
0xff, max_idle_ns: 2085701024 ns
[2.433537] tsc: Refined TSC clocksource calibration: 3603.134 MHz
[2.433787] clocksource: tsc: mask: 0x max_cycles:
0x33efe3ec317, max_idle_ns: 440795248122 ns

> You need to be careful with the arithmetic.  The ns per tick variable gets 
> adjusted by the clock drift.

I never ever saw these before.

Kind regards,
Udo
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Achim Gratz via devel
Hal Murray via devel writes:
> The general idea is that if your system clock goes tick, tick, tick, in great 
> big steps, you want to fill in the bottom bits with randomness.  The code 
> does 
> that by assuming that the tick size is the time it takes to read the clock - 
> difference in times between 2 back-to-back readings.  That's not right, but 
> doesn't normally cause any troubles.  Maybe it skips samples that don't 
> change.

No, that description only holds for what are called "coarse" clocks.

> That made sense a long time ago when the system clock was updated on a 10 ms 
> clock interrupt.

Yes.

> The error messages say that filling in the low bits made clock go backwards.

Yes, but the real problem is that you got to identical timestamps from
the system for two different reads of the realtime clock.  ON the system
he uses the timestamps have nanosecond resolution, but the clock
probably updates in larger steps, depending on what the system timer is.
System fuzz was determined to be around 1µs, so that would indicate HPET
or something similarly low-frequency.

> I'd really like to rip out all that stuff, but I have at least one Raspberry 
> Pi where I can read the clock faster than it ticks.

Don't.  This is a lot more complicated than you seem to think.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Samples for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldSamplesExtra

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Hal Murray via devel
Here is a typical batch of the confusing CLOCK printout:

Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: ts_prev 1555072655 s +
712154322 ns, ts_min 1555072655 s + 712153764 ns
Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: ts 1555072655 s + 712154322 ns
Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: sys_fuzz 1816 nsec, prior
fuzz 0.01594
Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: this fuzz -0.01722
Apr 12 14:37:35 box.local ntpd[2109]: CLOCK: prev get_systime
0xe05b050f.b64fb1ce is 0.00942 later than 0xe05b050f.b64fa201

It looks like something is screwed up with your system clock.

This comes from an area of ntpd that I don't really understand.  I don't 
remember investigating anything like this before.

The general idea is that if your system clock goes tick, tick, tick, in great 
big steps, you want to fill in the bottom bits with randomness.  The code does 
that by assuming that the tick size is the time it takes to read the clock - 
difference in times between 2 back-to-back readings.  That's not right, but 
doesn't normally cause any troubles.  Maybe it skips samples that don't change.

That made sense a long time ago when the system clock was updated on a 10 ms 
clock interrupt.

The error messages say that filling in the low bits made clock go backwards.

I'd really like to rip out all that stuff, but I have at least one Raspberry 
Pi where I can read the clock faster than it ticks.


> Fedora 29 on x86_64 with Garmin gps18x on rs232.

x86_64 covers a lot of ground.  Anything interesting about that system?  Did 
it work correctly a while ago?  Does it work without any refclocks?  Did you 
update the kernel recently?  ...

What do we know about the system timekeeping?  I'm looking for two things.  
The first is from ntpd, probably in your syslog, before switching logging to 
the typical log file.  It should be something like this:
  Apr 12 22:51:29 hgm ntpd[724]: INIT: precision = 0.157 usec (-23)

The other comes from the Linux kernel at boot time.  I don't know a simple 
grep expression to get what I want.  Grep for clocksource is a good start, but 
poke around to see if there is anything interesting nearby.  Grep for MHz may 
be interesting.

Here is what is in my system:
Apr 12 22:50:01 hgm kernel: tsc: Refined TSC clocksource calibration: 3292.520 
MHz
Apr 12 22:50:01 hgm kernel: clocksource: tsc: mask: 0x 
max_cycles: 0x2f75b1eac7a, max_idle_ns: 440795210882 ns
Apr 12 22:50:01 hgm kernel: clocksource: Switched to clocksource tsc

That says it's using the TSC on a 3.3 GHz system.

TSC is Intel's TimeStampCounter.  It's a register that counts CPU clock 
cycles.  (or maybe it runs off a pseudo clock so timekeeping works when the 
CPU speed is adjusted to save power)

The basic idea is that you read and save the TSC when you set the clock.  
Then, when you want to know the time, read the TSC again, subtract to get the 
ticks since the last known time, multiply by the time per tick (inverse of 
clock frequency) and add that the the last known time.

You need to be careful with the arithmetic.  The ns per tick variable gets 
adjusted by the clock drift.



-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: logging

2019-04-13 Thread Achim Gratz via devel
Udo van den Heuvel via devel writes:
> On another box:

[please add the log as an attachment or tell your MUA not to break the lines]

> Apr 12 14:32:16 box.local ntpd[2109]: CLOCK: ts_prev 1555072336 s + 595028760 
> ns, ts_min 1555072336 s + 595027293 ns
> Apr 12 14:32:16 box.local ntpd[2109]: CLOCK: ts 1555072336 s + 595028760 ns
> Apr 12 14:32:16 box.local ntpd[2109]: CLOCK: sys_fuzz 1816 nsec, prior fuzz 
> 0.01663
> Apr 12 14:32:16 box.local ntpd[2109]: CLOCK: this fuzz -0.01626
> Apr 12 14:32:16 box.local ntpd[2109]: CLOCK: prev get_systime 
> 0xe05b03d0.9853b2dc is 0.6 later than 0xe05b03d0.9853b2c2

Your clock seems to stop at times as far as ntpd is concerned: the
previous and the current timestamp are identical.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Samples for the Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#BlofeldSamplesExtra

___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel