On Wed, Apr 05, 2017 at 02:34:14PM +0000, Ian Thompson wrote:
> Why is the time that gets put into the PTP registers in the STM MAC, Unix
> time rather than PTP time?
See below.
To your question from the other thread:
On Tue, Apr 04, 2017 at 03:45:16PM +0000, Ian Thompson wrote:
> Possibly following on from David’s post.
>
> We have a system with 18 boards in a rack, each board has a Altera SoC with
> the STM Ethernet MAC connected via gigabit Ethernet to an Arista ptp-aware
> switch and then a Spectracom GrandMaster.
> The boards are running Linux kernel 3.15.0.
That HW puts the time stamps into the buffer descriptor, and so in
theory it should never miss a time stamp. This is most likely a
driver bug. Looking at the git log I see:
v4.11-rc1~124^2~171^2~12 deeb637 net: stmmac: remove freesoftware address
v4.9-rc7~33^2~33^2~1 ba1ffd7 stmmac: fix PTP support for GMAC4
v4.9-rc7~33^2~33^2~2 d204205 stmmac: update the PTP header file
v4.9-rc4~28^2~68 c30a70d stmmac: fix and review the ptp registration.
v4.9-rc4~28^2~96 50756eb stmmac: fix an error code in stmmac_ptp_register()
v4.9-rc1~28^2~10 7086605 stmmac: fix error check when init ptp
v4.9-rc1~127^2~108 efee95f ptp_clock: future-proofing drivers against PTP subsystem becoming optional
v4.1-rc1~128^2~100^2~5 e7ea55b ptp: stmmac: use helpers for converting ns to timespec.
v4.1-rc1~128^2~119^2~6 3f6c465 ptp: stmmac: convert to the 64 bit get/set time methods.
v3.17-rc5~41^2~38 5566401 stmmac: ptp: fix the reference clock
v3.17-rc5~41^2~50 f95f404 stmmac: set ptp_clock to NULL while unregister
v3.15-rc1~113^2~108^2~5 4986b4f0 ptp: drivers: set the number of programmable pins.
v3.13-rc7~13^2 7cd0139 stmmac: Fix incorrect spinlock release and PTP cap detection.
v3.10-rc1~66^2~195 32ceabc stmmac: improve/review and fix kernel-doc
v3.10-rc1~66^2~327^2~1 92ba688 stmmac: add the support for PTP hw clock driver
v3.10-rc1~66^2~327^2~2 891434b stmmac: add IEEE PTPv1 and PTPv2 support.
Especially ba1ffd7 looks suspicious.
> Apr 4 13:42:04 localhost user.info ptp4l: [537.164] rms 123 max 599 freq
> +255 +/- 39 delay 7362 +/- 48
> Apr 4 13:42:29 localhost user.err ptp4l: [561.387] timed out while
> polling for tx timestamp
> Apr 4 13:42:29 localhost user.err ptp4l: [561.387] increasing
> tx_timestamp_timeout may correct this issue, but it is likely caused by a
> driver bug
> Apr 4 13:42:29 localhost user.err ptp4l: [561.387] port 1: send delay
> request failed
> Apr 4 13:42:29 localhost user.notice ptp4l: [561.387] port 1: SLAVE to
> FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> Apr 4 13:42:45 localhost user.notice ptp4l: [577.388] port 1: FAULTY to
> LISTENING on FAULT_CLEARED
> Apr 4 13:42:45 localhost user.warn ptp4l: [577.414] clockcheck: clock
> jumped backward or running slower than expected!
> Apr 4 13:42:45 localhost user.notice ptp4l: [577.414] port 1: new foreign
> master 000cec.fffe.0a085d-1
> Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] selected best master
> clock 000cec.fffe.0a085d
> Apr 4 13:42:47 localhost user.notice ptp4l: [579.414] port 1: LISTENING to
> UNCALIBRATED on RS_SLAVE
> Apr 4 13:42:54 localhost user.notice ptp4l: [587.164] port 1: UNCALIBRATED
> to SLAVE on MASTER_CLOCK_SELECTED
> Apr 4 13:46:46 localhost user.info ptp4l: [818.414] rms 2312500092 max
> 37000001557 freq +246 +/- 250 delay 7358 +/- 46
> Apr 4 13:51:02 localhost user.info ptp4l: [1074.413] rms 116 max 681
> freq +256 +/- 48 delay 7373 +/- 88
>
> Does this imply that one lost delay request can do this, or is there a retry
> mechanism?
One lost delay request shouldn't introduce such a large error. This
is a driver bug. Notice that the time error is 37 seconds, or the
UTC/TAI offset.
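
As a quick sanity check of that number, here is a small sketch in
Python. It assumes the TAI-UTC offset of 37 seconds that has been in
effect since 1 January 2017; the helper name split_offset is made up
for illustration and is not part of linuxptp.

```python
NS_PER_SEC = 1_000_000_000
TAI_UTC_OFFSET_S = 37  # assumed: TAI - UTC in effect since 2017-01-01


def split_offset(offset_ns):
    """Split an offset in nanoseconds into (whole seconds, leftover ns)."""
    return divmod(offset_ns, NS_PER_SEC)


# "max" value reported by ptp4l at [818.414] in the log quoted above
max_offset_ns = 37000001557

seconds, remainder_ns = split_offset(max_offset_ns)
print(seconds, remainder_ns)  # 37 1557
```

The whole-second part equals the UTC/TAI offset, and the leftover
1557 ns is in the range one would expect from ordinary
synchronization noise.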
When resetting the fault, ptp4l re-initializes HW time stamping.
The function, stmmac_hwtstamp_ioctl(), in
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
programs the system time (UTC) into the PHC every time HW time
stamping is enabled. It shouldn't do that.
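
To illustrate the effect, here is a toy model in Python. It is not the
driver code: ToyPhc, enable_hw_timestamping(), and the reset_clock flag
are hypothetical names invented for this sketch. It assumes the PHC was
already synchronized to the grandmaster's PTP (TAI) time, that the
system clock holds correct UTC, and that TAI-UTC is 37 seconds.

```python
NS_PER_SEC = 1_000_000_000
TAI_UTC_OFFSET_NS = 37 * NS_PER_SEC  # assumed: TAI - UTC since 2017-01-01


class ToyPhc:
    """Stand-in for the MAC's PTP hardware clock (not the real driver)."""

    def __init__(self, time_ns):
        self.time_ns = time_ns


def enable_hw_timestamping(phc, system_utc_ns, reset_clock):
    """Model enabling HW time stamping.

    reset_clock=True mimics the behaviour described above: the PHC is
    overwritten with the system (UTC) time on every enable.
    reset_clock=False leaves the PHC alone.
    """
    if reset_clock:
        phc.time_ns = system_utc_ns


def offset_from_master_ns(phc, master_tai_ns):
    """Offset of the PHC relative to the master, in nanoseconds."""
    return phc.time_ns - master_tai_ns


system_utc_ns = 1_491_313_349 * NS_PER_SEC  # arbitrary example instant
master_tai_ns = system_utc_ns + TAI_UTC_OFFSET_NS

# PHC starts out synchronized to the master in both cases.
buggy = ToyPhc(master_tai_ns)
enable_hw_timestamping(buggy, system_utc_ns, reset_clock=True)

fixed = ToyPhc(master_tai_ns)
enable_hw_timestamping(fixed, system_utc_ns, reset_clock=False)

print(offset_from_master_ns(buggy, master_tai_ns))  # -37000000000
print(offset_from_master_ns(fixed, master_tai_ns))  # 0
```

In this model the overwritten clock ends up 37 seconds behind the
master, which is consistent with both the "clock jumped backward"
warning and the 37 second maximum offset in the log.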
> We have a lot of traffic leaving the boards but only PTP traffic
> coming in. As we increase the off board transfer rates the problem
> seems to occur more often.
That could indicate a driver or a HW issue, or both.
HTH,
Richard
_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users