Thank you so much. After reading your mail, I finally understand why some slaves go to the SAFEOP+ERROR state under these circumstances. Yes, I had exactly the same problem.

On 29 May 2014 11:24, Gavin Lambert <gav...@compacsort.com> wrote:
> It’s mostly a master problem I think, although some of the worst
> misbehaviour requires particular functionality in the slave (which may be
> rarer).
>
> The main problem that I’ve personally run into recently (and coded my own
> workaround for, just a few minutes ago) was from this scenario:
>
> 1. Master starts up, starts doing slave scanning.
> 2. Application starts up, calls ecrt_request_master, which waits for
>    slave scanning to complete before returning.
> 3. Application sets up basic configuration and calls
>    ecrt_master_activate.
> 4. Slaves wind their way up to OP.
> 5. Meanwhile in the background the master starts reading the CoE
>    dictionary and getting entry descriptions to fill in the names. (This
>    takes quite a long time.)
> 6. Application decides something is screwy while this is still happening
>    and calls ecrt_master_release and unloads the master module.
> 7. Since the master stops dead when this happens, occasionally it has
>    just sent a CoE Info request to a slave but abandoned waiting for the
>    response. The response is still sitting there in the slave’s mailbox.
>    The slaves have dropped back to SAFEOP+ERROR because they’re no longer
>    receiving data.
> 8. The master service and application are reloaded.
> 9. The initial scan sees the slaves at >= PREOP so merely acknowledges
>    the error and leaves them at SAFEOP, then starts to read SM+PDOs.
> 10. When it gets to the slave that had a stale SDO Info response in its
>     mailbox (which is still there, because the slave was never sent back
>     to INIT), it gets confused because it wasn’t the SDO 0x1C12 data
>     response it was expecting (because it had just sent the request); it
>     aborts the request and assumes 0 PDOs in that SM. Hilarity ensues, as
>     I’ve already outlined below.
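
Step 10 above comes down to the master taking whatever sits in the mailbox as the answer to the request it has just sent. A minimal sketch in C of the kind of check that would tell the two apart is given here. It assumes the usual CoE framing: a little-endian 16-bit CoE header whose top four bits carry the service type (0x3 for an SDO response, 0x8 for SDO Information), followed by the SDO command byte, index and subindex. The function names are made up for illustration and are not taken from the EtherLab master source.

```c
#include <stddef.h>
#include <stdint.h>

/* CoE service types (assumed values, see the paragraph above). */
enum {
    COE_SDO_RESPONSE = 0x3,
    COE_SDO_INFO     = 0x8
};

/* Read a little-endian 16-bit value. */
uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Service type: top four bits of the 16-bit CoE header. */
unsigned coe_service(const uint8_t *coe, size_t len)
{
    if (len < 2)
        return 0; /* too short to hold a CoE header */
    return (unsigned)(read_u16_le(coe) >> 12);
}

/*
 * Return 1 only if `coe` (the mailbox payload after the mailbox header)
 * is an SDO upload response for the requested index/subindex.
 * Return 0 for anything else, e.g. a stale SDO Information reply.
 */
int is_expected_sdo_response(const uint8_t *coe, size_t len,
                             uint16_t index, uint8_t subindex)
{
    if (len < 6)
        return 0;
    if (coe_service(coe, len) != COE_SDO_RESPONSE)
        return 0;
    if ((coe[2] >> 5) != 0x2) /* SDO command specifier: upload response */
        return 0;
    return read_u16_le(coe + 3) == index && coe[5] == subindex;
}
```

With a check like this, a leftover SDO Information reply would be recognised as "not mine" and could be discarded or handed elsewhere, instead of aborting the 0x1C12 read.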
>
> (This can also occur if the network is disconnected but not unpowered at
> any time during the CoE dictionary scan, then reconnected later.)
>
> Note that it’s reasonable for the scan to not reset to INIT, because
> rescans can occur during operation (although having said that, I haven’t
> looked too closely at whether this disrupts anything). But I think it’s
> definitely a master-side bug that it can’t cope with stale responses -
> that’s just something you always have to expect with mailboxes, especially
> when there are timeouts involved as well.
>
> My workaround was to change the CoE FSM to check for and discard any stale
> data in the mailbox prior to beginning any CoE operation. It seemed to
> resolve the above issue in a very basic test, but I’ll hopefully know more
> after a more thorough one tomorrow.
>
> It’s not an ideal solution, of course; the underlying problem (which I
> hinted at below, and posted in more detail about several months ago) is
> that the Etherlab code assumes that only one thing is going on in the
> mailboxes at a time, and so only checks them when it’s expecting a response
> and throws its virtual hands up when it finds something other than what it
> wanted. This is particularly noticeable if a slave sends asynchronous
> notifications, or can process multiple mailbox protocols in parallel (both
> of which are allowed in the standards). The most common types of these are
> CoE emergencies and EoE. And woe betide you if the master happens to be
> handling a FoE request when an emergency arrives, or a CoE request when an
> EoE packet arrives, etc.
>
> Ideally the master should have some sort of central dispatcher which is
> constantly watching mailboxes and handing off incoming data to the protocol
> state machines as they arrive.
> Often this can even be done for “free” - many slaves provide a dedicated
> “MBoxState” FMMU that can be used to watch for new mailbox messages as
> part of the regular process datagram, avoiding the need to individually
> poll the slaves.
>
> *From:* Jun Yuan [mailto:j.y...@rtleaders.com]
> *Sent:* Thursday, 29 May 2014 20:40
> *To:* Gavin Lambert
> *Cc:* etherlab-users@etherlab.org
> *Subject:* Re: [etherlab-users] Error reassigning removed PDO
>
> Hello Gavin,
>
> for that specific part of the CoE transfer problem you mentioned, I may
> have observed the same problem, and I did some analysis on it. This is
> actually a big problem, makes the master quite unreliable for me. I have a
> temporary fix for it. But I don't know who should be responsible for this
> CoE mailbox bug. Is it the master? Is it the slave? or is it a design error
> in the EtherCAT standard for the mailbox? I'll write another email to
> elaborate the problem with the flaky CoE mailbox.
>
> Regards,
> Jun
>
> On 29 May 2014 09:37, Gavin Lambert <gav...@compacsort.com> wrote:
>
> Last month, I wrote:
> > TLDR: when reassigning PDOs, why doesn't the master read mappings from
> > the slave via CoE?
> [...]
> > Shouldn't this scenario work? The PDO is always specified in the SII,
> > even if not presently in PDO Assign, so the master ought to know that it
> > exists.
> > And failing that, it could just try to read the mappings directly from
> > the slave (if CoE is available) when unable to load default mapping from
> > its cache. (I think part of the problem is that the CoE data appears to
> > be replacing the SII data in the master's PDO cache.)
> >
> > I'm also a little puzzled as to why (if it wants to have a cache of PDO
> > mappings) it seems to limit itself to reading only the currently
> > assigned PDOs during the initial scan, instead of fetching all of them.
> > They shouldn't be hard to find -- they can be identified purely by their
> > index.
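
To make "identified purely by their index" concrete: in the CoE object dictionary, RxPDO mapping objects conventionally sit at 0x1600-0x17FF and TxPDO mapping objects at 0x1A00-0x1BFF. A small sketch in C follows; the names `pdo_kind` and `classify_pdo_index` are hypothetical and not taken from the master's source.

```c
#include <stdint.h>

/* Kind of PDO mapping object an index refers to, if any. */
typedef enum { PDO_NONE, PDO_RX, PDO_TX } pdo_kind;

/* Classify an object dictionary index by the conventional CoE ranges. */
pdo_kind classify_pdo_index(uint16_t index)
{
    if (index >= 0x1600 && index <= 0x17FF)
        return PDO_RX; /* RxPDO mapping (master -> slave) */
    if (index >= 0x1A00 && index <= 0x1BFF)
        return PDO_TX; /* TxPDO mapping (slave -> master) */
    return PDO_NONE;   /* e.g. 0x1C12/0x1C13 are assignment objects */
}
```

A scan that wanted every mapping, assigned or not, could run each index from the SII or the CoE dictionary list through a filter like this.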
>
> There's a further problem with this that I've since discovered: if, during
> the master's scan of the PDO assignment registers, something goes wrong with
> the CoE transfer of 0x1C1x:0, then the master will log an error but proceed
> anyway under the assumption that the slave has 0 PDOs assigned in that SM.
> If this is not contradicted by the application using ecrt_slave_config_pdos
> (including both assigns and mappings, because it read no default mappings),
> then the master will *write 0 back* to the PDO assignment register (if
> writable) on activate.
>
> This guarantees that the next scan will not find any PDOs, unless the slave
> reloads the default assignments during INIT (and with my "slave author" hat
> on, all advice I can find says that slaves should not do that, although I
> couldn't find official word).
>
> So basically it all seems to point to applications being unreliable (at
> least for flexible-assignment slaves) unless they use ecrt_slave_config_pdos
> to configure *everything* (including mappings, even for fixed-mapping
> slaves). Which makes me wonder why it bothers scanning for PDO assignments
> at all. Doesn't that just waste time if apps have to use
> ecrt_slave_config_pdos anyway?
>
> Given how flaky mailbox handling is in general (as previously mentioned),
> I'm surprised this hasn't come up more often.
>
> _______________________________________________
> etherlab-users mailing list
> etherlab-users@etherlab.org
> http://lists.etherlab.org/mailman/listinfo/etherlab-users

--
Jun Yuan
[Pronunciation: Djün Üän]

Robotics Technology Leaders GmbH
Am Loferfeld 58, D-81249 München
Tel: +49 89 189 0465 24
Fax: +49 89 189 0465 11
mailto: j.y...@rtleaders.com

Umlaut rule in the Chinese Pinyin romanization: after the initials y, j, q,
and x, u is pronounced as ü, e.g. yu => ü, ju => dschü, qu => tschü, xu => schü.