On Mon, Oct 20, 2014 at 07:53:24PM +1100, Frank Tkalcevic wrote:
> I run axis on the local machine, then remote to it using keystick.

It also *typically* works when using e.g. axis + halui or axis + linuxcncrsh. However, sometimes it doesn't. This leads to reports like 328 (axis + halui, using halui sometimes axis is unresponsive for a few seconds) and 395 (axis + linuxcncrsh, linuxcncrsh becomes totally unresponsive).

The problem crops up when the UI wants to do one of two things: wait to be certain its command was received by task; or wait to be certain its command was fully acted on by task. All the current UIs have an implementation similar to this one (from shcom):

    int emcCommandWaitReceived(int serial_number)
    {
        double end = 0.0;

        while (emcTimeout <= 0.0 || end < emcTimeout) {
            updateStatus();

            if (emcStatus->echo_serial_number == serial_number) {
                return 0;
            }

            esleep(EMC_COMMAND_DELAY);
            end += EMC_COMMAND_DELAY;
        }

        return -1;
    }

In this implementation, the UI waits for up to emcTimeout seconds (or forever, if emcTimeout <= 0) for the stat buffer to hold a certain serial number in echo_serial_number. (Problems also arise in emcCommandWaitDone, which in shcom calls out to emcCommandWaitReceived as a first step.)

Here's one sequence of operations which causes this algorithm to go wrong:

    UI 1                  UI 2                  Task
    send SN 1
                                                receive SN 1
                                                echo SN 1
                          send SN 1001
                                                receive SN 1001
                                                echo SN 1001
    poll status buffer
    until echo SN = 1
    (never finishes)

... and this sort of situation is easy to trigger. In bug 395, it is easy to trigger because when linuxcncrsh is waiting for "SET MODE MANUAL", AXIS automatically sends another command when it reads the stat buffer and sees the mode has changed to manual. (UI 1 = linuxcncrsh, UI 2 = axis)

I am aware that there must be some differences in behavior when using the different client-name arguments to RCS_CMD_BUFFER / RCS_STAT_CHANNEL etc., but it doesn't seem to affect the way echo_serial_number behaves. To confirm this belief I had, I ran keystick (uses client name string "keystick" as you point out) and linuxcnctop (uses client name string "xemc", I assume).
As I issued commands in linuxcnctop, I saw changing echo_serial_number values in linuxcnctop. This is why I said in my original message "the serial number method ... simply does not work".

In my analysis, this bad behavior of multiple UIs is in no way a bug in libnml. It's a bug in the way "wait for command to be received / completed" was implemented on top of NML.

The combo keystick + axis probably works better than many because keystick and axis both have finite timeouts, while linuxcncrsh apparently defaults to an infinite timeout, so it readily exhibits very bad behavior when it triggers this bug.

I'm certainly open to solving this bug properly while retaining NML as the IPC method of LinuxCNC, because even if *this* project gets done on the fastest likely schedule (new API in 2.8, new backend in 2.9), *and* we try to adopt twice-a-year releases, it's still ~18 months to 2.9 and a fix for this class of bug.

Perhaps it is worth returning to the solution suggested in bug 328, and ignoring the derail that happened right away (amusingly enough, by somebody else who wanted to replace NML). That patch uses an NML queue, which makes message reception reliable, and implements a globally increasing serial number. This makes it possible to wait for echo_serial_number >= serial_number (instead of ==), so it's OK if another UI sends a command around the same time you do.

(However, this means you can't reliably determine whether your command succeeded [RCS_DONE] or failed [RCS_ERROR], because you're likely to see the status of some command issued after your own. Mostly, though, UIs don't actually indicate this success/failure result, and instead rely on an operator message being shown when there's an error.)

Jeff
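
To make the difference between the `==` wait and the `>=` wait concrete, here is a minimal single-threaded sketch. It is not code from shcom or from the bug 328 patch. The stat buffer is reduced to one integer, and every name in it (`task_echo`, `wait_received_exact`, `wait_received_at_least`, the two `scenario_*` functions) is hypothetical. It also assumes the status holds only the most recent echo, and that in the second scenario serial numbers are handed out in globally increasing order.

```c
/* Hypothetical stand-in for the stat buffer: it keeps only the most recent
 * echo, so an earlier echo is lost once task handles another command. */
static int stat_echo_serial_number = 0;

/* Hypothetical: task acknowledges a command by overwriting the echo field. */
static void task_echo(int serial_number)
{
    stat_echo_serial_number = serial_number;
}

/* Same shape as the shcom wait: exact match. A bounded poll count stands in
 * for emcTimeout. Nothing changes between polls in this sketch, because the
 * competing command has already been echoed before the wait begins. */
static int wait_received_exact(int serial_number, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (stat_echo_serial_number == serial_number) {
            return 0;
        }
    }
    return -1;
}

/* Variant that is only valid if serial numbers increase globally across all
 * UIs: an echo at or past ours means task has already received ours. */
static int wait_received_at_least(int serial_number, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (stat_echo_serial_number >= serial_number) {
            return 0;
        }
    }
    return -1;
}

/* Per-UI numbering (1 vs. 1001), as in the sequence from the message:
 * both echoes land before UI 1 starts polling. */
int scenario_per_ui_serials(void)
{
    task_echo(1);       /* UI 1's command */
    task_echo(1001);    /* UI 2's command overwrites the echo */
    return wait_received_exact(1, 100);
}

/* Assumed global numbering: UI 1 is given 1, UI 2 is given 2. */
int scenario_global_serials(void)
{
    task_echo(1);
    task_echo(2);
    return wait_received_at_least(1, 100);
}
```

In this sketch `scenario_per_ui_serials()` runs out of polls and returns -1, while `scenario_global_serials()` returns 0. As noted in the message, the `>=` form only tells UI 1 that its command was received; the status visible at that point may belong to UI 2's later command.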