(Replying from digest, hopefully this doesn't break message numbering)
Date: Tue, 20 Apr 2021 13:52:45 +0200
From: Diego García Prieto <gprie...@unican.es>
I am running ptp4l and phc2sys with values lower than 100ns, that is, my
nodes are synchronized to the GM. It is fine until I run "stress-ng" in
my nodes (4-core systems) with 4 CPU threads at 25% of load each. ptp4l
and phc2sys have priority fifo 99 and the stress-ng process fifo 50.
Tried in Linux kernels: 4.13.0-36-generic and Preempt-RT
4.9.18-rt14-rt14. Both work badly.
I looked back through your previous messages and did not see that you
ever indicated what platform you are running.
An earlier message did report that the only clock sources your platform
has available are tsc and acpi_pm, which I take to mean you are on an x86
platform. However, I have not seen an x86 platform in a long time which
does not also have an HPET timer available as a clock source. According
to the Wikipedia article on HPET, it has been in x86 chipsets since 2005:
https://en.wikipedia.org/wiki/High_Precision_Event_Timer
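To double-check which clock sources the kernel actually offers, something
like the following Python sketch could be used. It assumes the usual Linux
sysfs location (/sys/devices/system/clocksource/clocksource0); the function
names are just mine for illustration.

```python
from pathlib import Path

# Usual sysfs location for the kernel's clocksource selection on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text):
    """Split the space-separated sysfs list into clock source names."""
    return text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available), or None if the sysfs files cannot be read."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = parse_clocksources((base / "available_clocksource").read_text())
    except OSError:
        return None
    return current, available


if __name__ == "__main__":
    result = read_clocksources()
    if result is None:
        print("clocksource sysfs files not found (not Linux, or sysfs not mounted?)")
    else:
        current, available = result
        print("current:  ", current)
        print("available:", ", ".join(available))
        if "hpet" not in available:
            print("hpet is not offered by the kernel on this machine")
```

If hpet is missing from the available list, it may be disabled in the BIOS
or on the kernel command line, but that is a guess until you check.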
When I run the load, I see a loss of sync. The system periodically tries
to recover, but it always loses sync again over time. This is the log of
ptp4l and phc2sys together:
ptp4l[706.965]: clockcheck: clock jumped forward or running faster than expected!
...
ptp4l[710.980]: clockcheck: clock jumped backward or running slower than expected!
...
ptp4l[711.093]: clockcheck: clock jumped forward or running faster than expected!
Have you looked at the description of TSC on Wikipedia?
https://en.wikipedia.org/wiki/Time_Stamp_Counter
"There is no promise that the timestamp counters of multiple CPUs on a
single motherboard will be synchronized. Therefore, a program can get
reliable results only by limiting itself to run on one specific CPU.
Even then, the CPU speed may change because of power-saving measures
taken by the OS or BIOS, or the system may be hibernated and later
resumed, resetting the TSC."
Does /proc/cpuinfo contain the constant_tsc flag? If not, the processor
clock is probably being changed, which changes the rate at which the TSC
ticks.
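A quick way to check for the flag, as a minimal Python sketch. It assumes
the x86 /proc/cpuinfo layout, where each processor entry has a "flags" line;
has_flag is a helper name I made up.

```python
def has_flag(cpuinfo_text, flag):
    """Return True if `flag` appears on the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return flag in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
    else:
        print("constant_tsc present:", has_flag(text, "constant_tsc"))
```

Only the first processor entry is inspected, on the assumption that all
cores in the package report the same flags.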
Even if your platform does have constant_tsc, I am not sure what happens
if the processor goes into thermal throttling.
Can you monitor core temperatures to see what happens when you start the
load program? Maybe the processor clock first increases because of the
higher load, then decreases as the processor gets hot. If that seems to
be the case, you could try running the processor at a constant clock,
locked to a frequency lower than the maximum (using the userspace cpufreq
governor, for example).
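If you do not have a monitoring tool at hand, a rough Python sketch like
this one could log temperatures and core frequencies while the load starts.
It assumes the common Linux sysfs files (thermal_zone*/temp in millidegrees
Celsius, cpufreq/scaling_cur_freq in kHz). Which thermal zones correspond to
CPU cores varies by platform, and the function names are only illustrative.

```python
import glob
import time
from pathlib import Path

# Assumed sysfs locations; not every platform exposes both.
THERMAL_GLOB = "/sys/class/thermal/thermal_zone*/temp"
FREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_int(path):
    """Read an integer from a sysfs file, or return None if that fails."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample(thermal_glob=THERMAL_GLOB, freq_glob=FREQ_GLOB):
    """Return (temperatures in degrees C, frequencies in MHz) as two lists."""
    temps = [read_int(p) for p in sorted(glob.glob(thermal_glob))]
    freqs = [read_int(p) for p in sorted(glob.glob(freq_glob))]
    temps_c = [t / 1000.0 for t in temps if t is not None]    # millidegrees -> degrees
    freqs_mhz = [f / 1000.0 for f in freqs if f is not None]  # kHz -> MHz
    return temps_c, freqs_mhz


def monitor(samples=10, interval=1.0):
    """Print one timestamped line per sample."""
    for _ in range(samples):
        temps_c, freqs_mhz = sample()
        print(time.strftime("%H:%M:%S"), "temp C:", temps_c, "freq MHz:", freqs_mhz)
        time.sleep(interval)
```

Calling monitor(samples=60, interval=1.0) in one terminal and starting
stress-ng in another should show whether the frequency rises and then drops
as the temperature climbs.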
I should note that my workstation uses TSC as its clock source, not HPET,
so I would guess that is the common choice if the processor supports
constant_tsc.
According to the Red Hat documentation the TSC clock is preferred, but
again I would assume that is on platforms which have constant_tsc, so
definitely verify that first.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/chap-timestamping
That Red Hat doc has a lot of good information. You could read through it
and compare the options available on your platform to how the doc
describes the behavior of the various options.
--
Chris Caudle