Hi,

Is it expected that after ecrt_request_master(), all online slaves are in PREOP (or possibly stuck in INIT with error_flag=1)? Or is an application expected to explicitly verify the state of all slaves before trying to do anything?
In the continuing saga of fun with mailbox SDOs, I've found that even with Frank's or Knud's patches to reduce mailbox contention, there are still some issues that stem from the slave state not being as expected.

On startup, my application requests the master and then uses ecrt_master_sdo_upload() to fetch certain information from slaves (e.g. profile, version), both for diagnostics and to help ensure the config is sane. While this normally works fine, there can be problems if it occurs too soon after the master service is started or after the master was last released.

In particular, when the master is deactivated or released it will internally schedule a transition back to PREOP for all slaves. If the master is re-requested too quickly, this transition may not even have started yet. Since SDO requests are disallowed (and the request state machines not processed) during slave reconfiguration, the master can end up doing two consecutive writes: first the upload request from the application, then a retry or occasionally something involving 0x1C12 and 0x1C13.

Firstly, this can cause the second request to fail due to an unexpected response, and consequently fail the entire slave configuration (unless retried as in Frank's patches). Secondly, the application request will time out, because the request state machine is paused in a state where it has just sent the request and is then resumed thinking that it only needs to wait for the reply, but in the meantime the mailbox has been reset out from under it.

And of course this also means that currently, when ecrt_request_master() returns, some slaves may still be in a non-PREOP state pending transition to PREOP, so it is not possible to rely on accessing SDOs that are "preop only" - although this probably isn't a big problem, as most of those will probably be used with ecrt_slave_config_sdo* instead, which is safer.
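To make the application-side alternative concrete, here is a minimal sketch of polling until every slave reports PREOP before issuing any SDO uploads. It deliberately does not call the master library: slave_snapshot_t, slave_poll_fn, check_all_preop(), wait_all_preop() and the sim_* poller are all hypothetical names made up for illustration. In a real application the poll callback would, I assume, wrap ecrt_master_slave() and copy the al_state and error_flag fields out of ec_slave_info_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EtherCAT AL state codes (low nibble of the AL status). */
enum { AL_INIT = 0x01, AL_PREOP = 0x02, AL_SAFEOP = 0x04, AL_OP = 0x08 };

/* Hypothetical snapshot of one slave's state. */
typedef struct {
    uint8_t al_state;
    bool error_flag;
} slave_snapshot_t;

/* Hypothetical callback: fill *out for the slave at `position`.
 * Returns 0 on success, nonzero if the slave could not be queried. */
typedef int (*slave_poll_fn)(void *ctx, uint16_t position,
                             slave_snapshot_t *out);

typedef enum { WAIT_READY, WAIT_PENDING, WAIT_ERROR } wait_result_t;

/* One pass over all slaves. An error on any slave wins over "pending". */
wait_result_t check_all_preop(slave_poll_fn poll, void *ctx,
                              uint16_t slave_count)
{
    bool pending = false;
    assert(poll != NULL);
    for (uint16_t pos = 0; pos < slave_count; pos++) {
        slave_snapshot_t s;
        if (poll(ctx, pos, &s) != 0 || s.error_flag)
            return WAIT_ERROR;
        if (s.al_state != AL_PREOP)
            pending = true;
    }
    return pending ? WAIT_PENDING : WAIT_READY;
}

/* Repeat the pass up to max_polls times; `delay` (may be NULL) is
 * called between passes, e.g. to sleep for a few milliseconds. */
wait_result_t wait_all_preop(slave_poll_fn poll, void *ctx,
                             uint16_t slave_count, unsigned max_polls,
                             void (*delay)(void))
{
    wait_result_t r = WAIT_PENDING;
    for (unsigned i = 0; i < max_polls && r == WAIT_PENDING; i++) {
        if (i > 0 && delay != NULL)
            delay();
        r = check_all_preop(poll, ctx, slave_count);
    }
    return r;
}

/* Demonstration-only poller: every slave reports INIT until
 * `calls_until_preop` queries have been made, then PREOP. */
typedef struct {
    unsigned calls;
    unsigned calls_until_preop;
    int faulty_position; /* slave with error_flag set, or -1 for none */
} sim_bus_t;

int sim_poll(void *ctx, uint16_t position, slave_snapshot_t *out)
{
    sim_bus_t *bus = ctx;
    bus->calls++;
    out->al_state = (bus->calls > bus->calls_until_preop) ? AL_PREOP : AL_INIT;
    out->error_flag = ((int)position == bus->faulty_position);
    return 0;
}
```

The application would call wait_all_preop() right after requesting the master and only start its uploads on WAIT_READY. Treating error_flag as fatal is a choice made for the sketch; an application might prefer to skip the affected slave instead.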
Another interesting quirk that I noticed along the way is that ecrt_request_master() will internally wait on master->config_busy - but this is toggled (and the waitqueue released) in between each slave. So even if slave configuration has started, ecrt_request_master() will block only until the master finishes configuring the current slave, and then return to the application while configuration continues in the background; this seems of dubious usefulness to me. ("Slave configuration" here refers to returning the slaves to PREOP.)

I'm happy to look at writing some patches to resolve this behaviour, but before I do that it seemed like a good idea to ask which behaviour is more correct (in the view of the community):

1. Everything is working as expected (no patches are required), and it's the application's responsibility to wait for the slaves to return to PREOP before using ecrt_master_sdo_{down,up}load.

2. ecrt_request_master() should block until all slaves finish returning to PREOP, not just whichever one slave happens to be in progress at the time. (Sub-decision: should it be the open or the reserve that blocks? Currently it's only the latter.)

3. ecrt_master_deactivate() (and consequently ecrt_release_master() too) should block until all slaves finish returning to PREOP. (This won't help with initial startup happening too early.)

4. Don't allow configuration to start while a request is still in progress, but then do the configuration before starting the *next* request. (This won't help with ensuring the slave is in PREOP before requesting, but will prevent the mailbox mixup and timeout.)

5. Something else that I did not think of.

(Note that where I say "return to PREOP" above, this also applies to the initial change to PREOP if the application is started too soon after the master module is loaded.)

Thoughts?
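To illustrate the ordering that option 4 implies, here is a toy model of the per-slave decision. This is only a sketch and not the master's actual state machine code: slave_sched_t, action_t, next_action() and apply_action() are hypothetical names, and each action is assumed to complete in a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    ACT_IDLE,             /* nothing to do */
    ACT_CONTINUE_REQUEST, /* keep servicing the request already sent */
    ACT_CONFIGURE,        /* run the pending (re)configuration */
    ACT_START_REQUEST     /* begin the next queued request */
} action_t;

/* Hypothetical per-slave scheduling state. */
typedef struct {
    bool request_in_flight;   /* request sent, reply not yet handled */
    bool config_pending;      /* configuration (return to PREOP) scheduled */
    unsigned queued_requests; /* requests waiting to be started */
} slave_sched_t;

/* Option 4 ordering: never interrupt an in-flight request, but let a
 * pending configuration run before any further request is started. */
action_t next_action(const slave_sched_t *s)
{
    assert(s != NULL);
    if (s->request_in_flight)
        return ACT_CONTINUE_REQUEST;
    if (s->config_pending)
        return ACT_CONFIGURE;
    if (s->queued_requests > 0)
        return ACT_START_REQUEST;
    return ACT_IDLE;
}

/* Apply an action to the model, assuming it completes immediately. */
void apply_action(slave_sched_t *s, action_t a)
{
    assert(s != NULL);
    switch (a) {
    case ACT_CONTINUE_REQUEST:
        s->request_in_flight = false; /* reply received and handled */
        break;
    case ACT_CONFIGURE:
        s->config_pending = false;
        break;
    case ACT_START_REQUEST:
        s->queued_requests--;
        s->request_in_flight = true;
        break;
    case ACT_IDLE:
        break;
    }
}
```

Starting from a slave with a request in flight, a configuration pending and one more request queued, the model yields: finish the request, configure, then start the queued request. The mailbox is therefore never reset underneath a request that is waiting for its reply.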
(Hopefully this doesn't bias the responses too much, but I'm slightly leaning towards #4, as this would apply uniformly to all types of requests from all sources [command-line, blocking API, async API], and is likely to be a step closer to structural improvement of the state machines. It's a little weaker in not assuring PREOP, but SDOs are *usually* readable regardless of state, and the write-in-PREOP-only SDOs should be handled via ecrt_slave_config_sdo* as noted above. The main problem with this [and why one of the other options might be better] is that it could still try [and fail] to transfer while the slave is in INIT, in the case where the app is started too soon after the master, so #1 or #2 may be needed anyway.)

Regards,
Gavin Lambert