(Replying from digest, hopefully this doesn't break message numbering)

Date: Tue, 20 Apr 2021 13:52:45 +0200
From: Diego García Prieto <gprie...@unican.es>
I am running ptp4l and phc2sys with offsets lower than 100 ns, that is,
my nodes are synchronized to the GM. It is fine until I run "stress-ng"
on my nodes (4-core systems) with 4 CPU threads at 25% load each. ptp4l
and phc2sys have priority FIFO 99 and the stress-ng process FIFO 50.
Tried on Linux kernels 4.13.0-36-generic and PREEMPT-RT
4.9.18-rt14-rt14. Both work badly.

I looked back through your previous messages and did not see where you said what platform you are running on. An earlier message reported that the only clock sources available on your platform are tsc and acpi_pm, which I take to mean you are on x86, but it has been a long time since I have seen an x86 platform that does not also offer the HPET as a clock source. According to the Wikipedia article on HPET, it has been in x86 chipsets since 2005:
https://en.wikipedia.org/wiki/High_Precision_Event_Timer
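
To double-check which clock sources the kernel actually offers, something like the following might help. It is only a minimal sketch, assuming the usual sysfs location /sys/devices/system/clocksource/clocksource0 on a Linux system; the function names are made up for illustration.

```python
from pathlib import Path

# Usual sysfs location of the clocksource controls on Linux (assumed here).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text):
    """Split the space-separated list found in available_clocksource."""
    return text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources as reported by the kernel."""
    base = Path(base)
    available = parse_clocksources((base / "available_clocksource").read_text())
    current = (base / "current_clocksource").read_text().strip()
    return current, available
```

Calling read_clocksources() on the node and printing the result shows whether hpet is really missing from the list or just not selected.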

When I run the load, I see a loss of sync, with periodic attempts to
recover it, but sync is always lost again over time. This is the log of
ptp4l and phc2sys together:
ptp4l[706.965]: clockcheck: clock jumped forward or running faster than expected!
...
ptp4l[710.980]: clockcheck: clock jumped backward or running slower than expected!
...
ptp4l[711.093]: clockcheck: clock jumped forward or running faster than expected!

Have you looked at the description of the TSC on Wikipedia?
https://en.wikipedia.org/wiki/Time_Stamp_Counter

"There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. Therefore, a program can get reliable results only by limiting itself to run on one specific CPU. Even then, the CPU speed may change because of power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed, resetting the TSC."

Does /proc/cpuinfo contain the constant_tsc flag? If not, then the processor clock is probably being changed, which changes the rate at which the TSC ticks. Even if your platform does have constant_tsc, I am not positive what happens when the processor goes into thermal throttling. Can you monitor core temperatures to see what happens when you start the load program? Maybe the processor clock first increases because of the higher load, then decreases as the processor gets hot. If that seems to be the case, you could try running the processor at a constant clock, locked to a frequency lower than the maximum (using the cpufreq userspace governor, for example).
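
For watching temperature and core clock while the load runs, a rough sketch along these lines could be used. It assumes the platform exposes /sys/class/thermal/thermal_zone*/temp (millidegrees Celsius) and /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq (kHz); not every platform or driver provides both, and the function names are only illustrative.

```python
from pathlib import Path


def millidegrees_to_celsius(text):
    """Convert a thermal_zone 'temp' reading (millidegrees C) to degrees C."""
    return int(text.strip()) / 1000.0


def khz_to_mhz(text):
    """Convert a cpufreq 'scaling_cur_freq' reading (kHz) to MHz."""
    return int(text.strip()) / 1000.0


def snapshot(sys_root="/sys"):
    """Return ({zone: degrees C}, {cpu: MHz}) for whatever sysfs exposes."""
    root = Path(sys_root)
    temps, freqs = {}, {}
    for p in sorted(root.glob("class/thermal/thermal_zone*/temp")):
        try:
            temps[p.parent.name] = millidegrees_to_celsius(p.read_text())
        except (OSError, ValueError):
            pass  # some zones refuse reads or return nothing useful
    for p in sorted(root.glob("devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            freqs[p.parent.parent.name] = khz_to_mhz(p.read_text())
        except (OSError, ValueError):
            pass  # cpufreq may be absent for a given core
    return temps, freqs
```

Printing snapshot() once a second, starting before stress-ng is launched, and lining the output up against the ptp4l timestamps should show whether the clockcheck messages coincide with frequency changes.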

I should note that my workstation uses the TSC as its clock source, not the HPET, so I would guess that is the common choice when the processor supports constant_tsc. According to the Red Hat documentation the TSC clock is preferred, but again I would assume that applies to platforms which have constant_tsc, so definitely verify that first.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/chap-timestamping
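
Verifying the flag can be done by eye, or with a small sketch like this one. It assumes the x86 /proc/cpuinfo layout with "flags" lines; nonstop_tsc is a second flag I added to the check because it is commonly reported next to constant_tsc, and the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the union of all 'flags' entries found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def tsc_report(cpuinfo_text):
    """Report whether the TSC-related flags of interest are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("constant_tsc", "nonstop_tsc")}


def read_tsc_report(path="/proc/cpuinfo"):
    """Convenience wrapper that reads the real file on a Linux system."""
    with open(path) as f:
        return tsc_report(f.read())
```

If read_tsc_report() shows constant_tsc as False on the nodes, the TSC rate follows the core clock and the clockcheck messages under load would not be surprising.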

That Red Hat doc has a lot of good information. You could read through it and compare the options available on your platform with how the doc describes the behavior of each.

--
Chris Caudle

