Hi Jeff, I'm continuing from our chat yesterday here because it might be of wider interest: http://linuxcnc.mah.priv.at/irc/%23linuxcnc-devel/2012-12-13.html#22:57:03
To recap:

- the current 'sim' uses GNU Pth threads
- the rt-preempt merge candidate code uses Pthreads
- to reduce code duplication, we folded the rt-preempt code into the old 'sim' code and dropped Pth
- runtests is fine on real machines, including threads.0
- runtests of threads.0 fails on a VirtualBox machine, which has exceptionally bad scheduling behaviour due to the underlying hypervisor scheme and the second OS 'underneath'.

Looking at the threads.0 test, it seems to imply that there is a relative ordering of task execution, namely:

- there are two threads, a fast one and a slow one with 10 times the period of the fast one
- the assumption seems to be that the fast one executes 10 times before the slow one, resulting in a certain output pattern in the 'result' file, namely '1..10', '1..10', which says the fast thread ran 10 times, then the slow one got scheduled.

The only plausible explanation we have arrived at so far lies in the semantics of the threading libraries:

- GNU Pth uses N:1 threading (http://en.wikipedia.org/wiki/Thread_%28computing%29#N:1_.28User-level_threading.29), so all threads are scheduled cooperatively inside a single kernel thread
- Pthreads on Linux (NPTL) uses 1:1 threading, so the kernel scheduler decides when each thread runs

If that is the case, then threads.0 really verifies the implementation of GNU Pth, but not some expected behaviour of HAL threads. At least I couldn't find a spec or comment which says 'even in sim mode, relative scheduling counts must remain fixed', which seems to happen by accident with Pth but not with Pthreads in my VirtualBox setup.

The reason why this test succeeds in usermode RT schemes and on real machines seems to be that scheduling in these cases is precise enough to fit within the expected time windows, whereas Pthreads scheduling on VirtualBox is so massively off the scale that it violates the expected behaviour.

--

The question now is what to do with the result. First, does it indicate a fatal error situation? I don't think so, because in 'sim' mode all bets are off wrt relative scheduling anyway. Second, what does that test actually say?
If all it does is say 'well, GNU Pth just behaves _so_', then I'm unsure what we are testing against here. Of course, intuitively one would _assume_ that a thread 10x as fast gets to run 10x as often, but that is not how it is implemented.

If we wanted to guarantee a fixed relative scheduling rate, then the HAL/RTAPI threading code would have to ensure that, for instance by explicitly scheduling a slow thread after N invocations of the fast thread, where N is the period ratio. However, AFAICT the code does not do that, and I am not at all sure the OS guarantees it under RT scheduling either: just because it's precise enough on average doesn't mean the behaviour isn't actually stochastic.

I guess the proper answer to all this is to firm up the HAL/RTAPI threading specification by explicitly stating what the relative periods of HAL threads mean for expected invocation counts. I can only infer this from the code, but I would think the answer is:

If several threads are used, the relative timing suggests, *but does not guarantee*, a certain ratio of thread invocations.

If we can agree on that, then threads.0 realistically is not an acceptance test, but we can make it a standalone measurement of relative scheduling count probabilities, or drop it altogether.

- Michael
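
To make the two readings of "relative periods" concrete, here is a toy Python model. It is only a sketch under stated assumptions, not LinuxCNC or RTAPI code: the function names are made up for illustration, wakeup latency is assumed to be uniformly distributed, and the recorded counts do not reproduce the actual format of the threads.0 'result' file. The first function dispatches the slow thread explicitly after N fast invocations; the second lets two independent timers release the threads, each with some random latency.

```python
import random


def fixed_ratio_counts(ratio, slow_cycles):
    """Explicit dispatch: run the fast function `ratio` times, then the slow one."""
    counts = []
    for _ in range(slow_cycles):
        fast_runs = 0
        for _ in range(ratio):
            fast_runs += 1          # stands in for one fast-thread invocation
        counts.append(fast_runs)    # the slow thread records what it observed
    return counts


def timer_driven_counts(ratio, slow_cycles, jitter, seed=0):
    """Independent timers: every wakeup is late by a random amount in [0, jitter].

    Time is measured in units of the fast period (an assumption of this model).
    """
    rng = random.Random(seed)
    fast_wakeups = [k + rng.uniform(0, jitter)
                    for k in range(1, ratio * slow_cycles + 1)]
    slow_wakeups = [k * ratio + rng.uniform(0, jitter)
                    for k in range(1, slow_cycles + 1)]
    counts, previous = [], 0.0
    for slow_time in slow_wakeups:
        # Fast invocations after the previous slow run and up to this one.
        counts.append(sum(previous < t <= slow_time for t in fast_wakeups))
        previous = slow_time
    return counts


if __name__ == "__main__":
    print(fixed_ratio_counts(10, 2))            # always [10, 10]
    print(timer_driven_counts(10, 2, 0.0))      # no latency: [10, 10]
    print(timer_driven_counts(10, 5, 0.5))      # with latency: counts may vary
```

With explicit dispatch the ratio is fixed by construction. In the timer-driven model the ratio comes out right only as long as the latency is small compared with the fast period, which matches the "suggests, but does not guarantee" wording.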
