Re: [etherlab-users] Error reassigning removed PDO

Gavin Lambert Thu, 29 May 2014 02:25:27 -0700

It’s mostly a master problem, I think, although some of the worst misbehaviour 
requires particular functionality in the slave (which may be rarer).

The main problem that I’ve personally run into recently (and coded my own 
workaround for, just a few minutes ago) was from this scenario:

1. Master starts up, starts doing slave scanning.

2. Application starts up, calls ecrt_request_master, which waits for
slave scanning to complete before returning.

3. Application sets up basic configuration and calls ecrt_master_activate.

4. Slaves wind their way up to OP.

5. Meanwhile in the background the master starts reading the CoE
dictionary and getting entry descriptions to fill in the names.  (This takes
quite a long time.)

6. Application decides something is screwy while this is still happening
and calls ecrt_master_release and unloads the master module.

7. Since the master stops dead when this happens, occasionally it has
just sent a CoE Info request to a slave but abandoned waiting for the response.
The response is still sitting there in the slave’s mailbox.  The slaves have
dropped back to SAFEOP+ERROR because they’re no longer receiving data.

8. The master service and application are reloaded.

9. The initial scan sees the slaves at >= PREOP so merely acknowledges
the error and leaves them at SAFEOP, then starts to read SM+PDOs.

10. When it gets to the slave that had a stale SDO Info response in its
mailbox (which is still there, because the slave was never sent back to INIT),
it gets confused because the message waiting there isn’t the SDO 0x1C12 data
response it was expecting (it had just sent the request); it aborts the
request and assumes 0 PDOs in that SM.  Hilarity ensues, as I’ve already
outlined below.

(This can also occur if the network is disconnected but not unpowered at any 
time during the CoE dictionary scan, then reconnected later.)
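A toy model of the check that trips in step 10 may make the failure easier to see. This is a sketch, not the master's code: `struct coe_msg` and `naive_read_pdo_count` are hypothetical, and only the CoE service numbers (1 = emergency, 3 = SDO response, 8 = SDO Information) are taken from the CoE protocol. Any message that is not the expected SDO response is treated as a failed upload, and that failure collapses into "0 PDOs".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CoE service codes carried in the top 4 bits of the 16-bit CoE header. */
enum coe_service {
    COE_EMERGENCY    = 0x1,
    COE_SDO_REQUEST  = 0x2,
    COE_SDO_RESPONSE = 0x3,
    COE_SDO_INFO     = 0x8
};

/* Simplified, hypothetical view of one message read back from a slave's
   output mailbox (a real frame also has a mailbox header, SDO command
   byte, and so on). */
struct coe_msg {
    enum coe_service service;
    uint16_t index;    /* object index the message refers to */
    uint8_t subindex;
    uint8_t value;     /* payload, e.g. 0x1C12:0 = number of assigned PDOs */
};

/* Naive matching: the first message in the mailbox is assumed to be the
   answer to the request just sent. A stale message from an earlier,
   abandoned transfer fails the check, the upload is aborted, and the
   caller carries on as if 0 PDOs were assigned. */
static unsigned naive_read_pdo_count(const struct coe_msg *received,
                                     uint16_t expected_index)
{
    assert(received != NULL);
    if (received->service != COE_SDO_RESPONSE ||
        received->index != expected_index || received->subindex != 0) {
        fprintf(stderr, "unexpected mailbox response, aborting upload of 0x%04X:0\n",
                (unsigned)expected_index);
        return 0; /* error is logged, but scanning proceeds with 0 PDOs */
    }
    return received->value;
}
```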

Note that it’s reasonable for the scan to not reset to INIT, because rescans 
can occur during operation (although having said that, I haven’t looked too 
closely at whether this disrupts anything).  But I think it’s definitely a 
master-side bug that it can’t cope with stale responses – that’s just something 
you always have to expect with mailboxes, especially when there are timeouts 
involved as well.

My workaround was to change the CoE FSM to check for and discard any stale data 
in the mailbox prior to beginning any CoE operation.  It seemed to resolve the 
above issue in a very basic test, but I’ll hopefully know more after a more 
thorough one tomorrow.
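The workaround might look roughly like the following, assuming a toy mailbox model. `struct slave_mbox`, `mbox_read` and `coe_discard_stale` are hypothetical names, not IgH master functions. Before a CoE request is sent, anything already waiting in the slave's output mailbox is read and dropped, with a bound on the number of reads so a chatty slave cannot stall the state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MBOX_QUEUE_CAP 4

/* Toy model of a slave's output mailbox: each read returns one message,
   and the slave may have another queued behind it. */
struct slave_mbox {
    int queue[MBOX_QUEUE_CAP]; /* hypothetical message ids, oldest first */
    size_t count;
};

static bool mbox_full(const struct slave_mbox *m)
{
    return m->count > 0;
}

/* Read (and thereby remove) the oldest waiting message. */
static int mbox_read(struct slave_mbox *m)
{
    assert(m->count > 0);
    int msg = m->queue[0];
    for (size_t i = 1; i < m->count; i++)
        m->queue[i - 1] = m->queue[i];
    m->count--;
    return msg;
}

/* Hypothetical pre-step for every CoE transfer: drain whatever is already
   waiting before the new request goes out. Returns how many messages were
   discarded; stops after max_reads even if more are pending. */
static size_t coe_discard_stale(struct slave_mbox *m, size_t max_reads)
{
    size_t discarded = 0;
    while (mbox_full(m) && discarded < max_reads) {
        int msg = mbox_read(m);
        fprintf(stderr, "discarding stale mailbox message %d\n", msg);
        discarded++;
    }
    return discarded;
}
```

One consequence of draining blindly is that any asynchronous message waiting at that moment, such as a CoE emergency, is thrown away along with the stale response.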

It’s not an ideal solution, of course; the underlying problem (which I hinted 
at below, and posted in more detail about several months ago) is that the 
Etherlab code assumes that only one thing is going on in the mailboxes at a 
time, and so only checks them when it’s expecting a response and throws its 
virtual hands up when it finds something other than what it wanted.  This is 
particularly noticeable if a slave sends asynchronous notifications, or can 
process multiple mailbox protocols in parallel (both of which are allowed in 
the standards).  The most common types of these are CoE emergencies and EoE.  
And woe betide you if the master happens to be handling an FoE request when an 
emergency arrives, or a CoE request when an EoE packet arrives, etc.
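Telling these cases apart only needs the headers. The sketch below assumes the standard layout: a 6-byte mailbox header whose byte 5 carries the mailbox type in its low nibble (2 = EoE, 3 = CoE, 4 = FoE), followed for CoE by a 16-bit little-endian CoE header whose top 4 bits are the service (1 = emergency, 3 = SDO response, 8 = SDO Information). The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mailbox types (low nibble of byte 5 of the 6-byte mailbox header). */
enum mbox_type { MBOX_EOE = 0x2, MBOX_COE = 0x3, MBOX_FOE = 0x4 };

enum msg_kind {
    MSG_INVALID,
    MSG_COE_EMERGENCY,
    MSG_COE_SDO_RESPONSE,
    MSG_COE_SDO_INFO,
    MSG_COE_OTHER,
    MSG_EOE,
    MSG_FOE,
    MSG_OTHER
};

#define MBOX_HEADER_SIZE 6
#define COE_HEADER_SIZE 2

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Work out what a frame read from a slave's mailbox actually is, instead
   of assuming it is the reply to the last request sent. */
static enum msg_kind classify_mbox_frame(const uint8_t *frame, size_t len)
{
    if (len < MBOX_HEADER_SIZE)
        return MSG_INVALID;
    switch (frame[5] & 0x0F) {
    case MBOX_EOE:
        return MSG_EOE;
    case MBOX_FOE:
        return MSG_FOE;
    case MBOX_COE:
        break;
    default:
        return MSG_OTHER;
    }
    if (len < MBOX_HEADER_SIZE + COE_HEADER_SIZE)
        return MSG_INVALID;
    /* CoE service = top 4 bits of the little-endian CoE header. */
    switch (read_u16_le(frame + MBOX_HEADER_SIZE) >> 12) {
    case 0x1: return MSG_COE_EMERGENCY;
    case 0x3: return MSG_COE_SDO_RESPONSE;
    case 0x8: return MSG_COE_SDO_INFO;
    default:  return MSG_COE_OTHER;
    }
}
```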

Ideally the master should have some sort of central dispatcher which is 
constantly watching mailboxes and handing off incoming data to the protocol 
state machines as they arrive.  Often this can even be done for “free” – many 
slaves provide a dedicated “MBoxState” FMMU that can be used to watch for new 
mailbox messages as part of the regular process datagram, avoiding the need to 
individually poll the slaves.
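Such a dispatcher could be sketched as a table of per-protocol handlers keyed by mailbox type. Everything here is hypothetical (the handler names, `mbox_dispatch`, `mbox_poll`, the reader callback); the `mbox_full` array stands in for the bits an MBoxState-style FMMU would deliver in the cyclic process data, and the handlers merely count deliveries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_HEADER_SIZE 6
#define MBOX_TYPE_EOE 0x2
#define MBOX_TYPE_COE 0x3
#define MBOX_TYPE_FOE 0x4

/* Each protocol state machine registers one handler; the dispatcher, not
   the protocol code, decides who receives an incoming frame. */
typedef void (*mbox_handler)(size_t slave, const uint8_t *payload, size_t len);

/* Hypothetical handlers that just count deliveries. */
static unsigned coe_rx, eoe_rx, foe_rx, dropped;
static void on_coe(size_t s, const uint8_t *p, size_t n) { (void)s; (void)p; (void)n; coe_rx++; }
static void on_eoe(size_t s, const uint8_t *p, size_t n) { (void)s; (void)p; (void)n; eoe_rx++; }
static void on_foe(size_t s, const uint8_t *p, size_t n) { (void)s; (void)p; (void)n; foe_rx++; }

static const mbox_handler handlers[16] = {
    [MBOX_TYPE_EOE] = on_eoe,
    [MBOX_TYPE_COE] = on_coe,
    [MBOX_TYPE_FOE] = on_foe,
};

/* Route one frame by its mailbox type (low nibble of header byte 5). */
void mbox_dispatch(size_t slave, const uint8_t *frame, size_t len)
{
    if (len < MBOX_HEADER_SIZE) {
        dropped++;
        return;
    }
    mbox_handler h = handlers[frame[5] & 0x0F];
    if (h == NULL) {
        dropped++; /* no state machine owns this protocol */
        return;
    }
    h(slave, frame + MBOX_HEADER_SIZE, len - MBOX_HEADER_SIZE);
}

/* Hypothetical callback that reads one slave's output mailbox into buf
   and returns the number of bytes read. */
typedef size_t (*mbox_reader)(size_t slave, uint8_t *buf, size_t cap);

/* Once per cycle: only slaves whose "mailbox full" bit is set are read,
   so no per-slave polling datagrams are needed otherwise. */
void mbox_poll(const bool *mbox_full, size_t n_slaves, mbox_reader read_mbox)
{
    uint8_t buf[256];
    for (size_t s = 0; s < n_slaves; s++) {
        if (mbox_full[s])
            mbox_dispatch(s, buf, read_mbox(s, buf, sizeof buf));
    }
}
```

With this arrangement an EoE frame arriving during an FoE transfer goes to the EoE handler instead of derailing the FoE state machine. Within CoE, the handler would still have to separate emergencies from SDO traffic by service.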

From: Jun Yuan [mailto:j.y...@rtleaders.com] 
Sent: Thursday, 29 May 2014 20:40
To: Gavin Lambert
Cc: etherlab-users@etherlab.org
Subject: Re: [etherlab-users] Error reassigning removed PDO

Hello Gavin,

For that specific part of the CoE transfer problem you mentioned, I may have 
observed the same problem, and I did some analysis on it. This is actually a 
big problem; it makes the master quite unreliable for me. I have a temporary fix 
for it, but I don't know who should be responsible for this CoE mailbox bug. Is 
it the master? Is it the slave? Or is it a design error in the EtherCAT 
standard for the mailbox? I'll write another email to elaborate on the problem 
with the flaky CoE mailbox.

Regards,
Jun

On 29 May 2014 09:37, Gavin Lambert <gav...@compacsort.com> wrote:

Last month, I wrote:
> TLDR: when reassigning PDOs, why doesn't the master read mappings from
> the slave via CoE?
[...]
> Shouldn't this scenario work?  The PDO is always specified in the SII,
> even if not presently in PDO Assign, so the master ought to know that it
> exists.
> And failing that, it could just try to read the mappings directly from
> the slave (if CoE is available) when unable to load default mapping from
> its cache.  (I think part of the problem is that the CoE data appears to
> be replacing the SII data in the master's PDO cache.)
>
> I'm also a little puzzled as to why (if it wants to have a cache of PDO
> mappings) it seems to limit itself to reading only the currently
> assigned PDOs during the initial scan, instead of fetching all of them.
> They shouldn't be hard to find -- they can be identified purely by their
> index.
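Finding every PDO by index is straightforward because of the CoE object conventions (shared with CANopen, CiA 301): the assignment objects 0x1C12 and 0x1C13 list the PDO indices for SM2 and SM3 (subindex 0 is the count), RxPDO mapping objects live at 0x1600 to 0x17FF and TxPDO mapping objects at 0x1A00 to 0x1BFF, and each mapping subindex packs index, subindex and bit length into 32 bits. A minimal decoding sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One mapped object, unpacked from the 32-bit value stored in a PDO
   mapping subindex: bits 31..16 index, 15..8 subindex, 7..0 bit length. */
struct pdo_entry {
    uint16_t index;
    uint8_t subindex;
    uint8_t bit_length;
};

static struct pdo_entry decode_mapping_entry(uint32_t raw)
{
    struct pdo_entry e;
    e.index = (uint16_t)(raw >> 16);
    e.subindex = (uint8_t)(raw >> 8);
    e.bit_length = (uint8_t)raw;
    return e;
}

/* Mapping objects occupy fixed index ranges, so every candidate PDO can
   be located by index alone, whether or not it is currently assigned. */
static bool is_rxpdo_mapping(uint16_t index)
{
    return index >= 0x1600 && index <= 0x17FF;
}

static bool is_txpdo_mapping(uint16_t index)
{
    return index >= 0x1A00 && index <= 0x1BFF;
}
```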

There's a further problem with this that I've since discovered: if, during
the master's scan of the PDO assignment registers, something goes wrong with
the CoE transfer of 0x1C1x:0, then the master will log an error but proceed
anyway under the assumption that the slave has 0 PDOs assigned in that SM.
If this is not contradicted by the application using ecrt_slave_config_pdos
(including both assigns and mappings, because it read no default mappings),
then the master will *write 0 back* to the PDO assignment register (if
writable) on activate.
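The distinction that seems to be missing is between "the upload of 0x1C1x:0 failed" and "the slave reported 0". A minimal sketch of keeping the two apart, with hypothetical names throughout (only `ecrt_slave_config_pdos` is a real application call); it illustrates the idea rather than what the master does today:

```c
#include <assert.h>
#include <stdbool.h>

/* Result of scanning one SM's assignment object (0x1C1x:0). Keeping
   "unknown" separate from "known to be 0" is the point of the sketch. */
enum assign_state { ASSIGN_UNKNOWN, ASSIGN_KNOWN };

struct sm_assignment {
    enum assign_state state;
    unsigned n_pdos; /* only meaningful when state == ASSIGN_KNOWN */
};

/* Hypothetical: record the outcome of the CoE upload of 0x1C1x:0. */
static struct sm_assignment record_assign_upload(bool upload_ok, unsigned value)
{
    struct sm_assignment a = { ASSIGN_UNKNOWN, 0 };
    if (upload_ok) {
        a.state = ASSIGN_KNOWN;
        a.n_pdos = value;
    }
    return a;
}

/* Decide whether activation may write the assignment object back.
   app_configured: the application supplied this SM via
   ecrt_slave_config_pdos. Without that, only a value actually read from
   the slave is safe to write back; an unknown one is left untouched. */
static bool may_write_assignment(struct sm_assignment scanned, bool app_configured)
{
    if (app_configured)
        return true;
    return scanned.state == ASSIGN_KNOWN;
}
```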

This guarantees that the next scan will not find any PDOs, unless the slave
reloads the default assignments during INIT (and with my "slave author" hat
on, all advice I can find says that slaves should not do that, although I
couldn't find official word).

So basically it all seems to point to applications being unreliable (at
least for flexible-assignment slaves) unless they use ecrt_slave_config_pdos
to configure *everything* (including mappings, even for fixed-mapping
slaves).  Which makes me wonder why it bothers scanning for PDO assignments
at all.  Doesn't that just waste time if apps have to use
ecrt_slave_config_pdos anyway?

Given how flaky mailbox handling is in general (as previously mentioned),
I'm surprised this hasn't come up more often.

_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users
