Hi,

I’m off from work for the weekend, but I did view the contents of 0x808 (size 
8) before and after writing a zero, and all the values stayed the same. 
SM1 is configured with its default values, so I’m assuming that writing a zero 
resets it to its defaults and the contents therefore remain the same. Needs 
more testing, but I ran out of time this week.

I got to the point of writing a zero to SM1 after noticing that the rx 
communications resumed when I manually changed the state of the slave from OP 
to PREOP. So I followed the steps in the state machine, and that was the 
command that did it. Other things I tried, like rescans or unplugging the 
coupler (keeping power on the EoE slave), didn’t work.

I don’t know of any slaves that might require OP mode to run, as I’m new to EoE 
too. From my memory of the code, on initial startup it remains in PREOP and it’s 
only on a master deactivate that it remains in OP (and it does so by putting 
EoE slaves back into OP after all slaves have been set to PREOP).

Time for more testing next week. Yes, I will be wiresharking the EtherCAT 
frames. I’ve been checking the Ethernet side so far. Will need to grab another 
computer to check the EtherCAT side.

Thanks,
Graeme.

________________________________
From: Gavin Lambert <gav...@compacsort.com>
Sent: Friday, 16 February 2018 7:47:17 PM
To: Graeme Foot; etherlab-dev@etherlab.org
Subject: RE: EoE patchs and questions

Those sound like great changes to have.

I suspect the EoE-OP thing came from an assumption that the slave had to be in 
OP to transfer EoE frames; there was previously a similar assumption regarding 
the DC reference clock that was fixed in 
[559f2f]<https://sourceforge.net/p/etherlabmaster/code/ci/559f2f9c5b08700f2e4722f498799236a2c9f78a/>.
  I don’t have any experience with EoE myself but a quick glance through the 
manual for EL6614 does suggest that it will happily do EoE in PREOP and above.  
Do you think there could be any older slaves that might need OP for that?

The register write to 0x808 as a recovery from that condition seems a bit 
peculiar – most of those registers are read-only while SM1 is enabled – though 
you’re writing 0 to 0x80E, which should disable the SM, which then ought to 
stop it working entirely, unless something reconfigures it.
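
For reference, the eight bytes starting at 0x808 are the SM1 register block, and 0x80E is its activate byte. Below is a minimal C sketch of how a 64-bit little-endian value read from 0x808 could be split into those fields. The struct and function names (sm_regs, sm_decode, sm_enabled, sm_mailbox_full) are made up for illustration and are not part of the EtherLab master. The bit meanings used are the ones I understand from the ESC register description (activate bit 0 = enable, status bit 3 = mailbox full), so check them against the slave controller datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 8-byte SyncManager register block (SM1 starts at 0x808).
 * Names are hypothetical; offsets are relative to the block start. */
struct sm_regs {
    uint16_t start;     /* +0: physical start address of the buffer */
    uint16_t length;    /* +2: buffer length in bytes */
    uint8_t  control;   /* +4: control register */
    uint8_t  status;    /* +5: status register */
    uint8_t  activate;  /* +6: activate register (0x80E for SM1) */
    uint8_t  pdi_ctrl;  /* +7: PDI control register */
};

/* Split a little-endian uint64 read from the block start into its fields. */
static struct sm_regs sm_decode(uint64_t raw)
{
    struct sm_regs sm;

    sm.start    = (uint16_t)(raw & 0xFFFFu);
    sm.length   = (uint16_t)((raw >> 16) & 0xFFFFu);
    sm.control  = (uint8_t)((raw >> 32) & 0xFFu);
    sm.status   = (uint8_t)((raw >> 40) & 0xFFu);
    sm.activate = (uint8_t)((raw >> 48) & 0xFFu);
    sm.pdi_ctrl = (uint8_t)((raw >> 56) & 0xFFu);
    return sm;
}

/* Assumed bit meaning: activate bit 0 set means the SM is enabled. */
static bool sm_enabled(const struct sm_regs *sm)
{
    return (sm->activate & 0x01u) != 0;
}

/* Assumed bit meaning: status bit 3 set means the mailbox is full. */
static bool sm_mailbox_full(const struct sm_regs *sm)
{
    return (sm->status & 0x08u) != 0;
}
```

Comparing the decoded fields before and after the write, and while the mailbox is stalled, would show whether the activate byte or the mailbox-full bit actually changes.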

Perhaps inspecting other SM registers might be interesting?  Or see if there’s 
anything noticeable around that time in a Wireshark trace (if you have some way 
to detect exactly when it stops)?  Does the problem still happen with fewer 
patches applied?

From: Graeme Foot
Sent: Friday, 16 February 2018 19:01
To: etherlab-dev@etherlab.org
Subject: [etherlab-dev] EoE patchs and questions

Hi,

I've been setting up my system to use EoE (Ethernet over EtherCAT) with an RTAI 
user space application.

I've updated my master to revision 33b922ec1871 (default branch) and applied 
the gavinl (Gavin Lambert) patch set 20171108.
Linux 2.6.32.11
RTAI 3.8.1


Firstly I have a bit of a different use case for my EoE.  The current 
implementation auto creates and removes the eoe interfaces as the EoE capable 
slaves are configured and removed.  This means the interface is not available 
until the slave is scanned, and is not available if it is removed.  The eoe 
interface is also temporarily destroyed on a bus rescan.  In my use case I want 
to bridge the eoe interface to a real Ethernet interface.  So I want the eoe 
interface to always exists whether the slave is plugged in or not.

So the first patch does a few things:
1) adds explicit eoe_addif and eoe_delif tool functions so that you can 
manually add/remove an eoe iface without the slave existing
2) no longer deletes an eoe iface if the slave disappears
3) will relink a slave to an eoe iface when it is configured
4) will let you configure eoe ifaces via the sysconfig/ethercat config file
5) will let you turn off auto creation of eoe ifaces via the sysconfig/ethercat 
config file
6) no longer keeps slaves with EoE capability in OP mode when the master is 
deactivated

The above is made possible by using the netif_carrier_on() and 
netif_carrier_off() functions of the iface.  (The same as having a normal 
network interface up, but not plugged in.)

The other thing the patch does is fix a race condition bug in the eoe iface 
code.  The current implementation uses a struct list_head queue with a 
semaphore to protect it between the iface tx callback and the ethercat thread.  
Sleeps are not allowed in the iface's tx callback as it is in an interrupt 
context.  To fix this I have changed the queue to a ring buffer so that it no 
longer needs a lock.
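
To illustrate the idea (this is not the code from the patch), here is a minimal sketch of a single-producer / single-consumer ring buffer that needs no lock. It assumes exactly one writer (the tx callback) and one reader (the EtherCAT thread). It uses C11 atomics so it can be compiled in user space; a kernel version would use the kernel's own barrier primitives instead. All names (tx_ring, tx_ring_push, tx_ring_pop, TX_RING_SIZE) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8u  /* hypothetical capacity; must be a power of two */

/* Single-producer / single-consumer ring of frame pointers. */
struct tx_ring {
    void *slot[TX_RING_SIZE];
    atomic_uint head;  /* next slot to write; advanced only by the producer */
    atomic_uint tail;  /* next slot to read; advanced only by the consumer */
};

static void tx_ring_init(struct tx_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side (e.g. a tx callback). Never sleeps; returns false when full. */
static bool tx_ring_push(struct tx_ring *r, void *frame)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == TX_RING_SIZE)
        return false;  /* full: the caller has to drop or defer the frame */

    r->slot[head % TX_RING_SIZE] = frame;
    /* Publish the slot contents before the consumer can see the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the cyclic thread). Returns NULL when empty. */
static void *tx_ring_pop(struct tx_ring *r)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return NULL;  /* empty */

    void *frame = r->slot[tail % TX_RING_SIZE];
    /* Release the slot back to the producer only after reading it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return frame;
}
```

Because each index is written by only one side, neither side ever has to wait for the other, which is why the interrupt-context producer can no longer end up sleeping.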

FYI, when the race condition occurred I was getting:
BUG: scheduling while atomic
Call Trace:
[<c0146aa2>] ? ktime_get_real+0x0/0x29
[<c0146987>] ? ktime_get+0x0/0x88

Florian you may be interested in this patch, especially the bug fix part.


The second patch is so that I can run the EoE pump without callbacks.  As I am 
using a user space RTAI application I cannot use callbacks as they would need 
to call back from a kernel context to the user space context.  Instead I am 
running a thread in my application that makes calls into EtherCAT in a similar 
fashion to the master's EoE thread.  I have created two functions 
(ecrt_master_eoe_is_open() and ecrt_master_eoe_process()) to call without 
application locks as the locks only need to be around the ecrt_master_receive() 
and ecrt_master_send_ext() calls.


Now for the question.  I have been hammering my test rig pretty hard with 
various communications (pings with multiple fragments multiple times a second 
from both directions, SDO calls to the EoE slave without a pause approx. 100 
per second).  Every now and then (after around 10 to 30 minutes with the above 
tests) the receive mailbox (SM1) of the EoE slave stops responding (slave to 
master).  CoE reads to the slave also fail.  The transmit mailbox still 
continues to function.  The RX SM1 status register continually returns a zero 
value.  I have found that if I send the command below the receive mailbox 
starts to function again (until it doesn't):

  ethercat reg_write -p3 0x808 -tuint64 0

Has anyone else come across this?  At the moment I am suspecting a slave firmware 
bug (EL6614).  Does anyone have any other ideas?


Regards,
Graeme Foot.



