On 27 February 2015 22:06, quoth Richard Hacker:
> > I have a question regarding support for cable redundancy in the
> > stable-1.5 branch.
> >
> > I know that it has options for enabling a "backup" network port on the
> > PC and connecting the end of a single chain to this port. Presumably
> > this is mostly transparent to the application code (although it can
> > query for status)?
> >
> > Does it also support redundant tree links similarly?
>
> In principle it should work, although I have not tested it. The trick with
> redundancy is, that the number of visible slaves and the order of packet
> traversal must not change when a single link is destroyed.
>
> You are quite correct in the assumption that redundancy is transparent to the
> application. The status is only required to report a redundant state or not,
> otherwise redundancy would be useless to the user. The state is not required
> by the application to select another source/destination of data.

Yes, that's all I was thinking of, to display some sort of warning to the
user that their network might have issues.

On a related note though, I've been testing basic redundancy (a single loop
without internal subloops) recently and I've noticed some things that seem
odd to me:

1. On a two slave network with the break between the two (so one slave on
   each master link), the log messages identify both slaves as "0-0",
   making it hard to see what's going on. I've already written a patch to
   improve this, which I'll include in the patch bundle that I've been
   threatening to send to the dev list for a few months now. ;)

2. There appear to be a few things that only seem to work on the main link,
   not the backup link (unless I'm missing something). Register requests
   (maybe only some types?) seem to be one of them, and I'm dubious about
   the DC sync behaviour as well -- I don't think the RMW broadcast sync to
   the refclock is really going to work on a link that doesn't contain the
   refclock. The transmission delay measurements seem incorrect too.

3. Whenever the etherlab master service is started (with the network
   initially in "good" state), the first time that the network breaks and
   redundancy is activated takes about 2 seconds to resolve (which seems to
   be a standard network link-up delay). If the break is then fixed, future
   breaks in the same spot resolve almost instantly. (I haven't yet tested
   with a large enough network to check breaks in different places.)

The below is an example of the syslog output when the slave0 -> slave1 link
is broken and the slave1 <- backup link needs to pick up the slack.

[ 1368.157824] e1000e: ecb0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 1368.157829] ec_e1000e 0000:01:00.1: (unregistered net_device): 10/100 speed: disabling TSO
[ 1368.157831] EtherCAT 0: Link state of ecb0 changed to UP.
[ 1368.157960] EtherCAT WARNING 0: Domain 0: Redundant link in use!

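
For comparing longer captures, it can help to pull out just the link-state
changes and the redundancy warnings. What follows is only a rough Python
sketch, not part of the master: the name scan_log and both patterns are my
own, derived from the two message formats in the excerpt above, and I'm
assuming that link states other than UP are logged in the same format.

```python
import re

# Pattern for lines such as:
#   [ 1368.157831] EtherCAT 0: Link state of ecb0 changed to UP.
# Assumption: other states (e.g. DOWN) use the same wording.
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+EtherCAT (?P<master>\d+): "
    r"Link state of (?P<dev>\S+) changed to (?P<state>\w+)\."
)

# Pattern for lines such as:
#   [ 1368.157960] EtherCAT WARNING 0: Domain 0: Redundant link in use!
REDUNDANT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+EtherCAT WARNING (?P<master>\d+): "
    r"Domain (?P<domain>\d+): Redundant link in use!"
)


def scan_log(lines):
    """Collect link-state changes and redundancy warnings from log lines.

    Returns two lists:
      link_events:       (timestamp, device name, state) tuples
      redundancy_events: (timestamp, domain index) tuples
    Lines that match neither pattern are ignored.
    """
    link_events = []
    redundancy_events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            link_events.append(
                (float(m.group("ts")), m.group("dev"), m.group("state"))
            )
            continue
        m = REDUNDANT_RE.search(line)
        if m:
            redundancy_events.append(
                (float(m.group("ts")), int(m.group("domain")))
            )
    return link_events, redundancy_events


if __name__ == "__main__":
    sample = [
        "[ 1368.157824] e1000e: ecb0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None",
        "[ 1368.157831] EtherCAT 0: Link state of ecb0 changed to UP.",
        "[ 1368.157960] EtherCAT WARNING 0: Domain 0: Redundant link in use!",
    ]
    links, redundant = scan_log(sample)
    for ts, dev, state in links:
        print(f"{ts:.6f}  {dev} -> {state}")
    for ts, domain in redundant:
        print(f"{ts:.6f}  domain {domain}: redundant link in use")
```

Feeding it the output of dmesg makes it easy to see whether a device ever
announced a link change before the first redundancy warning appears.
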
On both master and slave the LINK/ACT lights are lit on the redundant ports
both before and after this event (it's a two-port adapter, in case that
makes a difference), so I'm not sure why the driver is announcing a link-up
at this time instead of earlier.

In case it helps, this is the initial output when the master is loaded:

[ 3620.561200] EtherCAT: 1 master waiting for devices.
[ 3635.431476] ec_e1000e: EtherCAT-capable Intel(R) PRO/1000 Network Driver - 1.5.1-k-EtherCAT
[ 3635.431479] ec_e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[ 3635.431501] ec_e1000e 0000:01:00.0: Disabling ASPM L1
[ 3635.431520] ec_e1000e 0000:01:00.0: setting latency timer to 64
[ 3635.431606] ec_e1000e 0000:01:00.0: irq 41 for MSI/MSI-X
[ 3635.604415] EtherCAT: Accepting 68:05:CA:0A:99:18 as main device for master 0.
[ 3635.748669] ec_e1000e 0000:01:00.0: irq 41 for MSI/MSI-X
[ 3635.804370] ec_e1000e 0000:01:00.0: (unregistered net_device): MSI interrupt test failed, using legacy interrupt.
[ 3635.804398] ec_e1000e 0000:01:00.0: (unregistered net_device): (PCI Express:2.5GT/s:Width x4) 68:05:ca:0a:99:18
[ 3635.804401] ec_e1000e 0000:01:00.0: (unregistered net_device): Intel(R) PRO/1000 Network Connection
[ 3635.804476] ec_e1000e 0000:01:00.0: (unregistered net_device): MAC: 0, PHY: 4, PBA No: D50868-008
[ 3635.804487] ec_e1000e 0000:01:00.1: Disabling ASPM L1
[ 3635.804500] ec_e1000e 0000:01:00.1: setting latency timer to 64
[ 3635.804581] ec_e1000e 0000:01:00.1: irq 41 for MSI/MSI-X
[ 3635.980331] EtherCAT: Accepting 68:05:CA:0A:99:19 as backup device for master 0.
[ 3636.124622] ec_e1000e 0000:01:00.1: irq 41 for MSI/MSI-X
[ 3636.180287] ec_e1000e 0000:01:00.1: (unregistered net_device): MSI interrupt test failed, using legacy interrupt.
[ 3636.180315] EtherCAT DEBUG 0: ORPHANED -> IDLE.
[ 3636.180316] EtherCAT 0: Starting EtherCAT-IDLE thread.
[ 3636.180363] ec_e1000e 0000:01:00.1: (unregistered net_device): (PCI Express:2.5GT/s:Width x4) 68:05:ca:0a:99:19
[ 3636.180366] EtherCAT DEBUG 0: Idle thread running with send interval = 4000 us, max data size=45000
[ 3636.180369] ec_e1000e 0000:01:00.1: (unregistered net_device): Intel(R) PRO/1000 Network Connection
[ 3636.180446] ec_e1000e 0000:01:00.1: (unregistered net_device): MAC: 0, PHY: 4, PBA No: D50868-008
[ 3637.806692] e1000e: ecm0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
[ 3637.806696] ec_e1000e 0000:01:00.0: (unregistered net_device): 10/100 speed: disabling TSO
[ 3637.806699] EtherCAT 0: Link state of ecm0 changed to UP.
[ 3637.814759] EtherCAT 0: 2 slave(s) responding on main device.
[ 3637.814762] EtherCAT 0: Slave states on main device: INIT, SAFEOP + ERROR.
[ 3637.818835] EtherCAT DEBUG 0: Sending broadcast-write to measure transmission delays on main link.
[ 3637.818887] EtherCAT DEBUG 0: 2 slaves responded to delay measuring on main link.
[ 3637.818888] EtherCAT 0: Scanning bus.
[ 3637.818890] EtherCAT DEBUG 0: Scanning slave 0 on main link.

I'm expecting it to say that the ecb0 link is also up at this time, despite
not needing to talk to any slaves via that link yet since the main link is
sufficient. Instead this doesn't happen until a network break actually
occurs, which is too late if I want a smooth transition. ("ethercat slaves
-v" reports that the last slave thinks the backup link is up as well.)

Also possibly of interest is that if I disconnect/reconnect the backup link
while the main link is still working normally (even after the first fault),
then the link LEDs change as you'd expect but there is no syslog output in
either case.

Any hints where in the code I should be looking to resolve this?
I've had a look around but can't see anything obvious -- it looks like it
should be checking the link whenever e1000_watchdog_task is called, which
should be whenever ec_poll is called, which should be whenever
ecrt_master_receive is called, which should be all the time. Unless there's
some quirk about it being a dual-port board? It does work once the main
link breaks somewhere though.

_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users