A year ago I wrote about how ARM's memory model is not strong enough to reliably transport double-precision floats across HAL pins:
http://mid.gmane.org/20140702141237.GB65254%40unpythonic.net
Today, after looking at rare ARM-only buildbot failures, some of us researched the ARM memory model a bit more and found some unfortunate assumptions in our code that hold up on x86 but not on ARM.

You can find the lengthy PDF document "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition" with your favorite search engine. Down in appendix G.2.2 is a nice section that explains the observed failures. All of them seemed to happen only on ARM, and the impact is that halsampler sometimes prints 0 instead of an expected value.

    Weakly-ordered message passing problem

    P1:
        STR R5, [R1]     ; set new data
        STR R0, [R2]     ; send flag indicating data ready
    P2:
        WAIT([R2]==1)    ; wait on flag
        LDR R5, [R1]     ; read new data

    In the absence of barriers, an end result of P2: R5=0 is permissible

The fix is to use "barrier instructions": "DMB [ST]" on the writer side, between the two stores, and "DMB" on the reader side, between the wait on the flag and the load of the data. ("DMB [ST]" seems to mean 'either "DMB" or "DMB ST"'; "DMB" is the full barrier, "DMB ST" is a weaker barrier that only orders stores.)

It appears that the gcc built-in function __sync_synchronize will generate the required instruction on ARM. On x86 it generates the odd instruction 'lock orl $0, (%esp)', and on x86_64 (or x86 with -march=pentium4) the 'mfence' instruction, which will cause a small performance hit and, as far as I know, is not necessary there. In particular, it's not required in this case according to this summary of the Intel SDM in http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf :

    Example 8-1 Stores are not reordered with other stores

    Proc 0              Proc 1
    MOV [x] <- 1        mov EAX <- [y]
    MOV [y] <- 1        mov EBX <- [x]

    Forbidden final state: Proc 1:EAX=1 and Proc 1:EBX=0

We identified some locations in LinuxCNC where these barriers definitely need to be added:

    streamer/sampler
    halscope
    task/motion
    nml shared memory regions
    mutex operations (may already be right)

but probably we will not immediately identify all such places and fix them all.
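To make the barrier placement concrete, here is a minimal sketch in C of the message-passing pattern from the ARM manual, with __sync_synchronize on both sides. The struct and function names (mailbox, mailbox_send, mailbox_receive) are made up for illustration and are not LinuxCNC's actual code; the sketch assumes one writer, one reader, and a single one-shot message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-shot mailbox; the names are illustrative only. */
struct mailbox {
    volatile double data;  /* payload, corresponds to [R1] in the manual */
    volatile int ready;    /* flag, corresponds to [R2] in the manual */
};

/* Writer side (P1): store the data, then the flag, with a barrier between. */
static void mailbox_send(struct mailbox *m, double value)
{
    assert(m != NULL);
    m->data = value;
    /* Full barrier: the data store must be visible before the flag store. */
    __sync_synchronize();
    m->ready = 1;
}

/* Reader side (P2): wait on the flag, then load the data, with a barrier
 * between. Spins until the writer has set the flag. */
static double mailbox_receive(struct mailbox *m)
{
    assert(m != NULL);
    while (!m->ready) {
        /* busy-wait; volatile forces a fresh load on every iteration */
    }
    /* Full barrier: the data load must not be satisfied before the flag load. */
    __sync_synchronize();
    return m->data;
}
```

Without the two __sync_synchronize calls, this is exactly the case where the manual says the reader may see the flag set and still read stale data. A real fix would also have to cover the reader handing the slot back to the writer, which this sketch leaves out.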
I hope to work up a branch this weekend for further testing, particularly if I can reproduce the behavior on my ARM board (an odroid u3, the same as in the buildbot farm). If it's not too invasive, I'll propose it for inclusion in 2.7, but I'm likely to make it cumulative with my rework of streamer/sampler, since that rework centralizes what were 2 distinct sets of code before.

... since I first tried to send this message, sourceforge had their little meltdown and I did further research.

First, I did targeted testing on my ARM board and found that with a new test I coded up, the sampler bug showed up on average more than once per minute, and that with the addition of barriers it went way down to zero, or at least to less than once per 16 hours.

I have not placed this work on a tree on git.linuxcnc.org yet, but I plan to rework the basic fix for streamer/sampler *not* on top of the experimental library-ized streamer but in a way that is suitable for 2.7.

I also now believe that nml shared memory regions are safe, due to their use of OS mutexes, which should already contain the required barriers.

Jeff