Re: [etherlab-users] Error reassigning removed PDO
Hi Jun,

While that patch looks like an improvement, it will still have the same trouble if the master service is restarted between runs, or if an application wants to include a PDO that is not assigned by default.

I think having ecrt_slave_config_pdos (or actually ec_slave_config_load_default_mapping) upload the mapping from the slave is actually the better solution and not ugly at all, in theory. Bear in mind that this should only happen if the slave is online and if the mapping was not already found in the application-supplied mappings or the previously-read cache. (Though note that the current code structure would do it regardless of whether the application supplied mappings or not, as an unfortunate consequence of the API structure. But it will meet the other two conditions.)

And it would only be one SDO upload if the slave supports Complete Access, which the master should already know at that point. (Although that’s an optimisation missing from the current PDO configuration code as well.)

Regards,
Gavin Lambert

From: Jun Yuan [mailto:j.y...@rtleaders.com]
Sent: Saturday, 31 May 2014 04:24
To: Gavin Lambert
Cc: etherlab-users@etherlab.org
Subject: Re: [etherlab-users] Error reassigning removed PDO

Hi Gavin,

I have a gift for you. The attached patch should make your scenario with different PDOs of interest in different apps work.

The problem was that the master always treats the last PDO assign in the SyncManager as the default PDO assign, and it doesn't remember any older PDO assignment. I made a patch that gives the master a memory for the PDO mappings: it always merges the new PDO mapping list into the old list instead of throwing the old list away. It remembers things.

It is still not as smart as what you suggested, which is to fetch the PDO assign by index via CoE automatically. I don't know if it's a good idea for the master to do it blindly. The question is when the master should fetch it.
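The lookup order proposed in the reply above (application-supplied mappings first, then the previously-read cache, and only then an upload from an online slave) can be sketched roughly as follows. This is a toy model, not master code: `slave_view_t`, `resolve_mapping` and the `uploads` counter are hypothetical names invented for illustration, and the "upload" is simulated by a lookup in a list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where a PDO mapping was eventually found. */
typedef enum { SRC_NONE, SRC_APP, SRC_CACHE, SRC_SLAVE } mapping_source_t;

/* Hypothetical, simplified view of what is known about one slave. */
typedef struct {
    const uint16_t *app_pdos;    size_t n_app;    /* mappings supplied by the app */
    const uint16_t *cached_pdos; size_t n_cached; /* mappings read in an earlier scan */
    const uint16_t *slave_pdos;  size_t n_slave;  /* what a CoE upload would return */
    bool online;                                  /* is the slave reachable now? */
    unsigned uploads;                             /* simulated SDO uploads performed */
} slave_view_t;

static bool contains(const uint16_t *list, size_t n, uint16_t index)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == index)
            return true;
    return false;
}

/* Resolve a mapping, touching the slave only as a last resort. */
mapping_source_t resolve_mapping(slave_view_t *s, uint16_t pdo_index)
{
    if (contains(s->app_pdos, s->n_app, pdo_index))
        return SRC_APP;
    if (contains(s->cached_pdos, s->n_cached, pdo_index))
        return SRC_CACHE;
    if (s->online) {
        s->uploads++; /* stands in for one SDO upload of the mapping object */
        if (contains(s->slave_pdos, s->n_slave, pdo_index))
            return SRC_SLAVE;
    }
    return SRC_NONE;
}
```

With this ordering an offline slave never triggers an upload, and a mapping already in the cache costs no mailbox traffic, which are the two conditions the reply asks for.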
Re: [etherlab-users] Error reassigning removed PDO
Hi Gavin,

I have a gift for you. The attached patch should make your scenario with different PDOs of interest in different apps work.

The problem was that the master always treats the last PDO assign in the SyncManager as the default PDO assign, and it doesn't remember any older PDO assignment. I made a patch that gives the master a memory for the PDO mappings: it always merges the new PDO mapping list into the old list instead of throwing the old list away. It remembers things.

It is still not as smart as what you suggested, which is to fetch the PDO assign by index via CoE automatically. I don't know if it's a good idea for the master to do it blindly. The question is when the master should fetch it. If the master fetches all the PDO mappings during the bus scan, isn't that a waste of time, because most of the time we don't need all of them? If it fetches them when needed, the master needs to make several ecrt_master_sdo_upload() calls inside ecrt_slave_config_pdos() to fetch the mapping, which makes the code quite ugly. And actually the app can do it itself, and then provide the correct default PDO mapping to the master.

Hope you enjoy it!

Regards,
Jun

On 22 April 2014 09:33, Gavin Lambert gav...@compacsort.com wrote:

Hi all,

TLDR: when reassigning PDOs, why doesn't the master read mappings from the slave via CoE?

I have a (custom) slave that provides a number of different PDOs. I have a couple of different master applications which are interested in different subsets of these PDOs. As an example, let's say that the slave has an RxPDO at 0x1600 that points to 0x7000:0x00:0x20, and one app wants to use this value and the other doesn't.

If the master apps just use ecrt_domain_reg_pdo_entry_list to register the PDOs of interest, then they both work (assuming that the slave has all the required PDOs assigned by default), but it wastes space in the packet as the whole SM is transferred even if some of the data is not of interest to that particular master app. (And in the case of outputs, it forces the master app to write something even when it doesn't want to, lest the slave get uninitialized data and think it needs to do something with it.)

If the master apps use ecrt_slave_config_pdos to select the PDOs of interest, then things get troublesome.

If the master apps specify the full mappings explicitly, then again things work, but as the slave does not support remapping (just reassignment) this generates warnings, and it just seems ugly to me to have to specify all this data that the slave already knows. (And it makes things more brittle, as if the mapping is changed in a future version of the slave it will generate an error instead of just working, as it would if it had loaded the slave's current mappings.)

If the master apps don't specify the full mappings, however (just the sync manager - PDO assignments, which seems like it's a supported scenario given the docs and examples), then results are mixed. If the slave is rebooted prior to running either master app, it works. If not, then the master app that wants the extra PDO will fail to run.

The problem case seems to be:
- slave boots, has all PDOs in SII and CoE PDO assign.
- first app runs, specifies PDO Assign to not include 0x1600.
- runs successfully.
- PDO Assign is updated in the actual slave.
- second app runs, specifies PDO Assign to include 0x1600.
- fails at ecrt_domain_reg_pdo_entry_list as it cannot find a mapping for 0x7000:0x00.
- problem is Loading default mapping for PDO 0x1600.
- No default mapping found.
- PDO Assign of the actual slave is never actually updated in this case as it fails before it activates the slave configs.
- ethercat rescan / ethercat pdos at this point does not show 0x1600.
- it requires rebooting the slave, or manually updating PDO Assign (and rescanning) before the master will admit that it exists again.

Shouldn't this scenario work? The PDO is always specified in the SII, even if not presently in PDO Assign, so the master ought to know that it exists. And failing that, it could just try to read the mappings directly from the slave (if CoE is available) when unable to load default mapping from its cache. (I think part of the problem is that the CoE data appears to be replacing the SII data in the master's PDO cache.)

I'm also a little puzzled as to why (if it wants to have a cache of PDO mappings) it seems to limit itself to reading only the currently assigned PDOs during the initial scan, instead of fetching all of them. They shouldn't be hard to find -- they can be identified purely by their index.

It shouldn't be all that uncommon to have a slave that provides PDOs that aren't in the default PDO Assign, or to provide more information than needed for particular master apps. Is it just expected that master apps always hard-code the full mappings, instead of fetching the mappings from the slave? Or is this something missing from the
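To make the failure sequence above concrete, here is a small self-contained model of a PDO cache that is either replaced by the latest PDO assignment (the behaviour described in the problem case) or merged with it (the approach Jun's patch is described as taking). It is only a sketch: `pdo_cache_t`, `cache_replace` and `cache_merge` are hypothetical names and do not correspond to the master's internal data structures, and the second PDO index 0x1601 is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PDOS 8

/* Toy cache: just the set of PDO indices the master has a mapping for. */
typedef struct {
    uint16_t index[MAX_PDOS];
    size_t count;
} pdo_cache_t;

bool cache_has(const pdo_cache_t *c, uint16_t idx)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->index[i] == idx)
            return true;
    return false;
}

/* Replace: the newest assignment overwrites everything known before. */
void cache_replace(pdo_cache_t *c, const uint16_t *assigned, size_t n)
{
    c->count = 0;
    for (size_t i = 0; i < n && c->count < MAX_PDOS; i++)
        c->index[c->count++] = assigned[i];
}

/* Merge: keep older entries and add only the ones not yet known. */
void cache_merge(pdo_cache_t *c, const uint16_t *assigned, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!cache_has(c, assigned[i]) && c->count < MAX_PDOS)
            c->index[c->count++] = assigned[i];
}
```

Feeding both caches the boot-time assignment {0x1600, 0x1601} followed by the first app's assignment {0x1601} leaves 0x1600 unknown in the replacing cache but still known in the merging one. As noted elsewhere in the thread, merging does not help once the master service is restarted, because the in-memory cache is lost.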
Re: [etherlab-users] Error reassigning removed PDO
Hello Gavin,

for that specific part of the CoE transfer problem you mentioned, I may have observed the same problem, and I did some analysis on it. This is actually a big problem and makes the master quite unreliable for me. I have a temporary fix for it. But I don't know who should be responsible for this CoE mailbox bug. Is it the master? Is it the slave? Or is it a design error in the EtherCAT standard for the mailbox?

I'll write another email to elaborate on the problem with the flaky CoE mailbox.

Regards,
Jun

On 29 May 2014 09:37, Gavin Lambert gav...@compacsort.com wrote:

Last month, I wrote:

TLDR: when reassigning PDOs, why doesn't the master read mappings from the slave via CoE?

[...]

Shouldn't this scenario work? The PDO is always specified in the SII, even if not presently in PDO Assign, so the master ought to know that it exists. And failing that, it could just try to read the mappings directly from the slave (if CoE is available) when unable to load default mapping from its cache. (I think part of the problem is that the CoE data appears to be replacing the SII data in the master's PDO cache.)

I'm also a little puzzled as to why (if it wants to have a cache of PDO mappings) it seems to limit itself to reading only the currently assigned PDOs during the initial scan, instead of fetching all of them. They shouldn't be hard to find -- they can be identified purely by their index.

There's a further problem with this that I've since discovered: if, during the master's scan of the PDO assignment registers, something goes wrong with the CoE transfer of 0x1C1x:0, then the master will log an error but proceed anyway under the assumption that the slave has 0 PDOs assigned in that SM. If this is not contradicted by the application using ecrt_slave_config_pdos (including both assigns and mappings, because it read no default mappings), then the master will *write 0 back* to the PDO assignment register (if writable) on activate. This guarantees that the next scan will not find any PDOs, unless the slave reloads the default assignments during INIT (and with my slave author hat on, all advice I can find says that slaves should not do that, although I couldn't find official word).

So basically it all seems to point to applications being unreliable (at least for flexible-assignment slaves) unless they use ecrt_slave_config_pdos to configure *everything* (including mappings, even for fixed-mapping slaves). Which makes me wonder why it bothers scanning for PDO assignments at all. Doesn't that just waste time if apps have to use ecrt_slave_config_pdos anyway?

Given how flaky mailbox handling is in general (as previously mentioned), I'm surprised this hasn't come up more often.

___
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users
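The write-back hazard described above comes from treating "the read of 0x1C1x:0 failed" the same as "the slave reported zero PDOs". One way to keep the two apart is sketched below. This is an assumption about how such a guard could look, not the master's actual logic: `assign_scan_t` and `may_write_assignment` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of reading the PDO assignment count for one sync manager. */
typedef enum { READ_OK, READ_FAILED } read_status_t;

typedef struct {
    read_status_t status;
    uint8_t pdo_count; /* meaningful only when status == READ_OK */
} assign_scan_t;

/*
 * Decide whether activation may write a PDO assignment to the slave.
 * An explicit application configuration is always written. Without one,
 * only a successfully scanned assignment may be written back, so a failed
 * read can never turn into "0 PDOs assigned" on the slave.
 */
bool may_write_assignment(const assign_scan_t *scan, bool app_configured)
{
    if (app_configured)
        return true;
    return scan->status == READ_OK;
}
```

The point of the separate status field is that a count of 0 with `READ_OK` is a legitimate result, while a count of 0 after `READ_FAILED` is only a placeholder and should not be propagated.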
Re: [etherlab-users] Error reassigning removed PDO
It’s mostly a master problem I think, although some of the worst misbehaviour requires particular functionality in the slave (which may be rarer).

The main problem that I’ve personally run into recently (and coded my own workaround for, just a few minutes ago) was from this scenario:

1. Master starts up, starts doing slave scanning.
2. Application starts up, calls ecrt_request_master, which waits for slave scanning to complete before returning.
3. Application sets up basic configuration and calls ecrt_master_activate.
4. Slaves wind their way up to OP.
5. Meanwhile in the background the master starts reading the CoE dictionary and getting entry descriptions to fill in the names. (This takes quite a long time.)
6. Application decides something is screwy while this is still happening and calls ecrt_release_master and unloads the master module.
7. Since the master stops dead when this happens, occasionally it has just sent a CoE Info request to a slave but abandoned waiting for the response. The response is still sitting there in the slave’s mailbox. The slaves have dropped back to SAFEOP+ERROR because they’re no longer receiving data.
8. The master service and application are reloaded.
9. The initial scan sees the slaves at >= PREOP so merely acknowledges the error and leaves them at SAFEOP, then starts to read SM+PDOs.
10. When it gets to the slave that had a stale SDO Info response in its mailbox (which is still there, because the slave was never sent back to INIT), it gets confused because it wasn’t the SDO 0x1C12 data response it was expecting (because it had just sent the request); it aborts the request and assumes 0 PDOs in that SM. Hilarity ensues, as I’ve already outlined below.

(This can also occur if the network is disconnected but not unpowered at any time during the CoE dictionary scan, then reconnected later.)

Note that it’s reasonable for the scan to not reset to INIT, because rescans can occur during operation (although having said that, I haven’t looked too closely at whether this disrupts anything). But I think it’s definitely a master-side bug that it can’t cope with stale responses; that’s just something you always have to expect with mailboxes, especially when there are timeouts involved as well.

My workaround was to change the CoE FSM to check for and discard any stale data in the mailbox prior to beginning any CoE operation. It seemed to resolve the above issue in a very basic test, but I’ll hopefully know more after a more thorough one tomorrow.

It’s not an ideal solution, of course; the underlying problem (which I hinted at below, and posted in more detail about several months ago) is that the Etherlab code assumes that only one thing is going on in the mailboxes at a time, and so only checks them when it’s expecting a response and throws its virtual hands up when it finds something other than what it wanted. This is particularly noticeable if a slave sends asynchronous notifications, or can process multiple mailbox protocols in parallel (both of which are allowed in the standards). The most common types of these are CoE emergencies and EoE. And woe betide you if the master happens to be handling a FoE request when an emergency arrives, or a CoE request when an EoE packet arrives, etc.

Ideally the master should have some sort of central dispatcher which is constantly watching mailboxes and handing off incoming data to the protocol state machines as they arrive. Often this can even be done for “free”: many slaves provide a dedicated “MBoxState” FMMU that can be used to watch for new mailbox messages as part of the regular process datagram, avoiding the need to individually poll the slaves.
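The workaround described above (discard any stale data in the mailbox before beginning a CoE operation) can be illustrated with a toy mailbox model. Everything here is hypothetical and simplified for illustration: a real slave mailbox is read through a sync manager rather than a C array, and `mailbox_t`, `mailbox_discard_stale` and `is_expected_response` are invented names, not functions of the Etherlab master.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_SLOTS 4

typedef enum { MSG_SDO_RESPONSE, MSG_SDO_INFO_RESPONSE } msg_kind_t;

typedef struct {
    msg_kind_t kind;
    uint16_t index; /* object index the message refers to */
} mbox_msg_t;

/* Toy FIFO standing in for whatever the slave still has waiting to be read. */
typedef struct {
    mbox_msg_t slot[MBOX_SLOTS];
    size_t count;
} mailbox_t;

bool mailbox_push(mailbox_t *mb, mbox_msg_t msg)
{
    if (mb->count == MBOX_SLOTS)
        return false;
    mb->slot[mb->count++] = msg;
    return true;
}

bool mailbox_pop(mailbox_t *mb, mbox_msg_t *out)
{
    if (mb->count == 0)
        return false;
    *out = mb->slot[0];
    for (size_t i = 1; i < mb->count; i++)
        mb->slot[i - 1] = mb->slot[i];
    mb->count--;
    return true;
}

/* Drain everything that is pending; returns how many messages were dropped. */
size_t mailbox_discard_stale(mailbox_t *mb)
{
    size_t dropped = mb->count;
    mb->count = 0;
    return dropped;
}

/* True if msg is the SDO upload response for the requested index. */
bool is_expected_response(const mbox_msg_t *msg, uint16_t index)
{
    return msg->kind == MSG_SDO_RESPONSE && msg->index == index;
}
```

If a stale SDO Info response is still queued when the request for 0x1C12 goes out, the first message popped fails `is_expected_response`; calling `mailbox_discard_stale` before sending the request means the next message read is the one that was asked for. As the message itself says, this only papers over the deeper issue that unrelated mailbox traffic can arrive at any time.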
From: Jun Yuan [mailto:j.y...@rtleaders.com]
Sent: Thursday, 29 May 2014 20:40
To: Gavin Lambert
Cc: etherlab-users@etherlab.org
Subject: Re: [etherlab-users] Error reassigning removed PDO

Hello Gavin,

for that specific part of the CoE transfer problem you mentioned, I may have observed the same problem, and I did some analysis on it. This is actually a big problem and makes the master quite unreliable for me. I have a temporary fix for it. But I don't know who should be responsible for this CoE mailbox bug. Is it the master? Is it the slave? Or is it a design error in the EtherCAT standard for the mailbox?

I'll write another email to elaborate on the problem with the flaky CoE mailbox.

Regards,
Jun
Re: [etherlab-users] Error reassigning removed PDO
Thank you so much. After reading your mail, I finally understand why some slaves go to the SAFEOP+ERROR state under these circumstances. Yes, I had exactly the same problem.

On 29 May 2014 11:24, Gavin Lambert gav...@compacsort.com wrote:

It’s mostly a master problem I think, although some of the worst misbehaviour requires particular functionality in the slave (which may be rarer).

[...]
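The central dispatcher idea raised earlier in this thread (one component watches the mailbox and hands each incoming message to the state machine for its protocol) might look roughly like the sketch below. It is an illustration under stated assumptions, not a design taken from the Etherlab code: `dispatcher_t`, `dispatcher_feed` and the handler signature are hypothetical, and a message is reduced to a protocol tag plus a payload word.

```c
#include <stddef.h>
#include <stdint.h>

/* Mailbox protocols named in the thread; PROTO_COUNT is just the table size. */
typedef enum { PROTO_COE, PROTO_EOE, PROTO_FOE, PROTO_COUNT } proto_t;

typedef struct {
    proto_t proto;
    uint16_t payload; /* placeholder for the real message body */
} mbox_msg_t;

typedef void (*handler_t)(const mbox_msg_t *msg, void *ctx);

typedef struct {
    handler_t handler[PROTO_COUNT];
    void *ctx[PROTO_COUNT];
    unsigned unhandled; /* messages that had no registered handler */
} dispatcher_t;

void dispatcher_init(dispatcher_t *d)
{
    for (size_t i = 0; i < PROTO_COUNT; i++) {
        d->handler[i] = NULL;
        d->ctx[i] = NULL;
    }
    d->unhandled = 0;
}

void dispatcher_register(dispatcher_t *d, proto_t p, handler_t h, void *ctx)
{
    d->handler[p] = h;
    d->ctx[p] = ctx;
}

/* Route one incoming message to the state machine for its protocol. */
void dispatcher_feed(dispatcher_t *d, const mbox_msg_t *msg)
{
    if ((unsigned)msg->proto < PROTO_COUNT && d->handler[msg->proto] != NULL)
        d->handler[msg->proto](msg, d->ctx[msg->proto]);
    else
        d->unhandled++;
}

/* Example handler: counts the messages it receives. */
void count_handler(const mbox_msg_t *msg, void *ctx)
{
    (void)msg;
    (*(unsigned *)ctx)++;
}
```

With handlers registered for CoE and EoE, an EoE packet arriving while a CoE request is outstanding is simply routed to the EoE handler instead of being mistaken for the CoE response, and anything without a handler is counted rather than aborting the current operation.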
Re: [etherlab-users] Error reassigning removed PDO
On 04/22/2014 09:33 AM, Gavin Lambert wrote:

Hi all,

TLDR: when reassigning PDOs, why doesn't the master read mappings from the slave via CoE?

For the very simple reason that the application can start without any slaves being attached to the network! In order to be able to do that, the master must be informed of the network topology _and_ the SyncManager configuration of every slave.

The slaves themselves have different levels of intelligence. The simplest of them all don't support any reconfiguration, some support reconfiguring SyncManagers with different predefined and fixed PDOs, and yet others even support reconfiguring the PDOs themselves. Some slaves don't even know their configuration until they have booted and been configured by the master, as in the case of completely dynamic slaves like bus converters, e.g. EtherCAT - ProfiBus converters. These slaves are completely dependent on the master telling the slave what its configuration looks like, in terms of SyncManagers, PDOs and even PDO Entries!

Whether SyncManagers and PDOs are fixed or even mandatory should be documented in the ESI xml file.

Just by the way, SII does not necessarily contain valid information, but it might. SII is a sort of online data storage where the slave manufacturer can store some information. Here is a typical example of a single point of truth flaw: the information in SII should reflect the slave, but if it doesn't, the slave still works but you have been misled! Even though there is a certification test to check that the SII information is correct, slaves exist where this is not the case.

TLDR: RTFM of the slave and tell the master how a slave is to be configured ;) It is not a waste of space.

- Richard

I have a (custom) slave that provides a number of different PDOs. I have a couple of different master applications which are interested in different subsets of these PDOs.
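Richard's advice to tell the master exactly how a slave is to be configured amounts to writing the PDO assignment and mapping down as data in the application. The sketch below uses small stand-in structs so that it compiles on its own; in a real application the corresponding tables would be built from the `ec_pdo_entry_info_t`, `ec_pdo_info_t` and `ec_sync_info_t` types in `ecrt.h` and passed to `ecrt_slave_config_pdos`. The entry 0x7000:0x00 with 32 bits comes from the example in this thread; the second PDO (0x1601 mapping 0x7010:0x01, 16 bits) is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the application-side description of a mapping. */
typedef struct {
    uint16_t index;
    uint8_t subindex;
    uint8_t bit_length;
} entry_t;

typedef struct {
    uint16_t index;
    size_t n_entries;
    const entry_t *entries;
} pdo_t;

/* RxPDO 0x1600 from the thread: maps 0x7000:0x00, 32 bits. */
static const entry_t pdo_1600_entries[] = { { 0x7000, 0x00, 32 } };
/* Hypothetical second RxPDO, present only for the size comparison. */
static const entry_t pdo_1601_entries[] = { { 0x7010, 0x01, 16 } };

/* App A wants everything; app B leaves 0x1600 out of the assignment. */
static const pdo_t all_pdos[] = {
    { 0x1600, 1, pdo_1600_entries },
    { 0x1601, 1, pdo_1601_entries },
};
static const pdo_t subset_pdos[] = {
    { 0x1601, 1, pdo_1601_entries },
};

/* Total number of mapped bits for a given PDO assignment. */
size_t assigned_bits(const pdo_t *pdos, size_t n_pdos)
{
    size_t bits = 0;
    for (size_t i = 0; i < n_pdos; i++)
        for (size_t j = 0; j < pdos[i].n_entries; j++)
            bits += pdos[i].entries[j].bit_length;
    return bits;
}

/* Process data size in bytes, rounded up to whole bytes. */
size_t assigned_bytes(const pdo_t *pdos, size_t n_pdos)
{
    return (assigned_bits(pdos, n_pdos) + 7) / 8;
}
```

In this made-up layout the full assignment occupies 6 bytes of process data and the subset 2 bytes, which is the space saving the original question was after. The cost, as discussed in the thread, is that the application now has to track the slave's mapping itself.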