Just noting where we are on various ARM builds - ARM because it is the only 
non-x86 platform I'm looking into, so I have nothing to contribute 
otherwise.

Getting a new board working with LinuxCNC has several aspects: 

1. getting a realtime kernel going for the board
2. getting LinuxCNC to compile and run
3. getting drivers to work
4. making it work as fast as you can
5. determining if the result is actually usable

Let's look at these in turn.


1) is a bit of a hit-and-miss game. 

Historically, ARM support in Linux has suffered from the enormous range of 
offerings available, and only recently, with the Linaro effort, has the ARM 
ecosystem started to get its act together in terms of build support and 
integration with Linux mainline.

As for RT, the only real options at this point are Xenomai and RT_PREEMPT. 
Generally one cannot hope for a stock kernel of this kind; it means building 
one. The main difference is that Xenomai ports are pegged to very few kernel 
versions as a starting point - realistically two, maybe three for the 
adventurous - whereas RT_PREEMPT has been available for many major kernel 
versions so far and will likely be the first RT option to land in Linux 
mainline; Xenomai might follow later when their 'Xenomai 3' strategy pans out. 
Still, the key flow remains 'find a working kernel for that board; find a 
matching RT patch for that kernel version; progress, or abandon otherwise'.

Xenomai can be built off the 2.6.38.8 and 3.2.21 kernel versions based on a 
patch, and some of that patch is hardware dependent, in particular the high 
resolution timer support. Without that, one needs to look no further, because 
if the timer support is low resolution, any latency measurements - let alone 
usable results with LinuxCNC and a fast base thread - are useless. The porting 
and adaptation process is well documented and not that huge in lines of code, 
but it requires intimate knowledge of the hardware. This means reading 
processor/SoC datasheets and very low level work.
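
As a rough first check for the timer resolution point above, the sketch below 
asks the kernel what resolution it reports for the monotonic clock and measures 
how late a plain user-space sleep loop wakes up. It is Python for brevity and 
runs without any RT patch. The function names and the 1 us threshold are 
assumptions made for illustration, and this is in no way a substitute for the 
latency tests that come with Xenomai or LinuxCNC.

```python
import time


def timer_resolution_ns(clock_id=time.CLOCK_MONOTONIC):
    """Resolution the kernel reports for a clock, in nanoseconds."""
    return round(time.clock_getres(clock_id) * 1e9)


def looks_high_resolution(resolution_ns, threshold_ns=1000):
    """Assumed rule of thumb: anything coarser than 1 us is 'low resolution'."""
    return resolution_ns <= threshold_ns


def worst_wakeup_error_ns(period_ns=1_000_000, cycles=200):
    """Sleep in fixed periods and return the worst observed wakeup lateness."""
    worst = 0
    next_wake = time.monotonic_ns() + period_ns
    for _ in range(cycles):
        delay = next_wake - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)
        late = time.monotonic_ns() - next_wake
        worst = max(worst, late)
        next_wake += period_ns
    return worst


if __name__ == "__main__":
    res = timer_resolution_ns()
    print("reported resolution:", res, "ns")
    print("high resolution timers likely active:", looks_high_resolution(res))
    print("worst wakeup lateness:", worst_wakeup_error_ns(), "ns")
```

If the reported resolution is in the millisecond range (one scheduler tick), 
the kernel is typically running without high resolution timers and there is 
little point in measuring anything else on it.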

There is a 3.6-based Xenomai patch but I don't think it has seen much exposure 
yet.

Luckily I found a quite usable patch for the Raspberry board and Xenomai works 
well on it, giving latency on the order of 40 us; that is in my repo. 
Currently I have no binary packages available but it looks doable. I have not 
found an RT_PREEMPT patch which matches a usable Raspberry kernel version 
closely enough to give it a try. Usually minor kernel revisions vary mostly 
with respect to driver support, so there is a chance one can fast forward over 
some minor kernel versions and still have something usable.
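
The 'find a kernel, find a matching RT patch, maybe fast forward over minor 
revisions' flow can be sketched as a small matching helper. Everything here is 
hypothetical: the function names are made up, 'same series' is simply taken to 
mean 'all version components but the last are equal', and the board kernel 
version in the usage note is an invented example.

```python
def parse_version(text):
    """'3.2.21' -> (3, 2, 21)"""
    return tuple(int(part) for part in text.split("."))


def series(version):
    """Everything but the last component, e.g. '3.2.21' -> (3, 2)."""
    return parse_version(version)[:-1]


def candidate_pairs(board_kernels, patch_kernels):
    """List (board_kernel, patch_kernel, exact_match) worth trying.

    Exact matches come first, then pairs from the same series ordered by
    how far the last version component is apart.
    """
    pairs = []
    for board in board_kernels:
        for patch in patch_kernels:
            if series(board) == series(patch):
                pairs.append((board, patch, board == patch))
    pairs.sort(key=lambda p: (not p[2],
                              abs(parse_version(p[0])[-1] - parse_version(p[1])[-1])))
    return pairs
```

For example, `candidate_pairs(["3.2.27"], ["2.6.38.8", "3.2.21"])` yields one 
inexact candidate, the 3.2.21 patch. An empty result is the 'abandon' branch.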

The other board I have and am dabbling with, the BeagleBone, has several 
options - there's an RT_PREEMPT patched kernel source readily available for 
2.6.8 (building right now) and there are several reports and patches for a 
3.2.21 based Xenomai kernel which I'll try next. I have several starting points 
and 'just' need to determine which one works well. 'Just' should be read as 'a 
kernel build for such platforms should be started before you go to sleep, and 
checked in the morning'.


2) means build support - that's package availability and configure support. 
Configure support can be fixed, but massive special-purpose package builds are 
out of scope for me, so I try to pick a base which has a decent package stream; 
sometimes there are several options (sometimes there are too many).

Raspberry: has a very decent Ubuntu-like package stream, so most of the moving 
parts are in place. Adapting configure involved all the ARM dependencies, so it 
was more work initially but will be much less for other boards. While all of 
LinuxCNC builds, I have yet to see an Axis screen, but this is very likely a 
local setup problem. HAL/RTAPI/GladeVCP run fine.

BeagleBone: this comes with the Angstrom distribution installed, and that is 
useless for LinuxCNC purposes - too many packages missing. I switched to an 
Ubuntu Precise-based setup and that behaves pretty much like the x86 
environment. There were minimal configure changes after the initial round of 
changes for the Raspberry. No surprises, and Axis actually runs (see 5) below).

Building master: the current rtos-integration-preview1 code is based on 2.5. I 
have test-merged it into master with minimal touchups. Due to the use of 
boost::python and its memory requirements during compilation, swap space is 
needed during a master build; a USB flash stick is fine (both of my boards 
sport 256M of memory).


3) Drivers
We are out in no-parport, no-PCI land. Candidate boards usually have GPIO pins, 
some of which can be overlaid with other functions like I2C or SPI. There is 
usually driver support through sysfs to configure and wiggle pins, and to drive 
I2C and SPI peripherals, but that is not a high-speed option as it requires 
system calls, and those are generally not a good idea in realtime code. I have 
made a minimal attempt with the Raspberry hal_gpio module; minimal insofar as 
it works but isn't flexible in terms of configuration and isn't optimized for 
speed.
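
To illustrate why the sysfs route is slow, here is a minimal sketch of pin 
wiggling through the legacy /sys/class/gpio interface. The helper names are 
made up and the `root` parameter exists only so the code can be tried against 
a scratch directory; this is not how hal_gpio works. Note that every single 
pin change costs an open, a write and a close.

```python
import os

SYSFS_GPIO = "/sys/class/gpio"


def export_pin(pin, root=SYSFS_GPIO):
    """Ask the kernel to expose gpio<pin> unless it is already there."""
    if not os.path.isdir(os.path.join(root, f"gpio{pin}")):
        with open(os.path.join(root, "export"), "w") as f:
            f.write(str(pin))


def set_direction(pin, direction, root=SYSFS_GPIO):
    """direction is 'in' or 'out'."""
    with open(os.path.join(root, f"gpio{pin}", "direction"), "w") as f:
        f.write(direction)


def write_pin(pin, value, root=SYSFS_GPIO):
    """Drive the pin high or low: three system calls per edge."""
    with open(os.path.join(root, f"gpio{pin}", "value"), "w") as f:
        f.write("1" if value else "0")


def read_pin(pin, root=SYSFS_GPIO):
    with open(os.path.join(root, f"gpio{pin}", "value")) as f:
        return int(f.read().strip())
```

On a real board this needs root privileges and a kernel with the sysfs GPIO 
interface enabled; the pin number to use depends on the board.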

The good news is: once you have one, you have them all (more or less); all ARM 
I/O is memory mapped and very similar from platform to platform. So getting 
from A to B is quite straightforward; slightly different setup, different macros 
for memory location, but that is about it. hal_gpio is really just a starting 
point, but once you get some pin to wiggle with a simple C test program, you're 
almost there.
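
A pin-wiggling test along those lines, here in Python rather than C to keep it 
short, might look like the sketch below. The base address and register offsets 
are my reading of the BCM2835 (original Raspberry Pi) peripheral documentation 
and should be treated as assumptions to verify against the datasheet; the 
helper names are made up. Porting to another SoC would mean swapping those 
constants and the function-select logic, which is the point made above.

```python
import mmap
import os
import struct

# Assumed BCM2835 values (original Raspberry Pi); verify against the datasheet.
GPIO_BASE = 0x20200000
GPFSEL0 = 0x00   # function select, 3 bits per pin, 10 pins per register
GPSET0 = 0x1C    # writing a 1 bit drives the pin high
GPCLR0 = 0x28    # writing a 1 bit drives the pin low


def map_gpio(base=GPIO_BASE, length=4096):
    """Map the GPIO register page from /dev/mem (needs root)."""
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)


def read32(regs, offset):
    return struct.unpack_from("<I", regs, offset)[0]


def write32(regs, offset, value):
    struct.pack_into("<I", regs, offset, value & 0xFFFFFFFF)


def set_output(regs, pin):
    """Set the 3-bit function field of the pin to 001 (output)."""
    offset = GPFSEL0 + 4 * (pin // 10)
    shift = 3 * (pin % 10)
    value = read32(regs, offset)
    write32(regs, offset, (value & ~(0b111 << shift)) | (0b001 << shift))


def set_pin(regs, pin):
    write32(regs, GPSET0 + 4 * (pin // 32), 1 << (pin % 32))


def clear_pin(regs, pin):
    write32(regs, GPCLR0 + 4 * (pin // 32), 1 << (pin % 32))


def wiggle(regs, pin, times):
    """Toggle the pin without any system call per edge."""
    set_output(regs, pin)
    for _ in range(times):
        set_pin(regs, pin)
        clear_pin(regs, pin)
```

On the board itself, something like `wiggle(map_gpio(), 17, 1000)` run as root 
should show up on a scope; the register helpers also work on a plain 
`bytearray(4096)` for dry runs without hardware.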

4) Making it fast
That's really two questions and they are not the same: is the latency OK, and 
does the overall system have enough oomph to run the whole of LinuxCNC? My 
answer right now would be 'yes and no' to that. That however doesn't make it a 
moot effort for me - my primary aim is _not_ to find a $50 PC replacement 
but to arrive at a realtime outboard solution where the rest of LinuxCNC runs 
on some other non-RT platform. Maybe an iPad, or an Android tablet, who knows.

Latency with x86/RTAI is still best; Xenomai second, RT_PREEMPT third. Is it 
good enough? Well, for servo it is, but who would run servo-only on such a 
low-end board, which is likely hooked up to steppers to start with? This isn't 
making any impression on the RepRap crowd either, which still thinks Arduino. 
So that turns attention to base thread performance. Can you move steppers with 
Pi/Beagleboard/Xenomai/hal_gpio? Yes you can, but not very fast. Similar to 
RTAI/x86, just slower. So the standard solution to that would be 'glue on some 
extra hardware for higher speeds' - an FPGA-based board for example, and then 
we are leaving low-cost, single-board land, at least for now.

That however is not the end of the story since we are not the first to have 
discovered the issue.

There is more than one option to deal with that issue, and some of them promise 
to yield significantly better results than we currently have with a 
LinuxCNC-optimized, hand-massaged RTAI-driven junkyard PC with a parport. 
Sergey already has shown how to use a low-overhead hardware feature to improve 
stepping performance with miniemc2.

I find the features of the TI AM335x Sitara Omap processors used in the 
BeagleBone board particularly promising. Let me give a minimal rundown of what 
is special here:

- besides the main ARM CPU, there are two processors called 'Programmable 
Realtime Units' (PRUs) on-chip.
- these run at 200 MHz, are 32-bit integer CPUs and can drive the relevant 
peripherals like GPIO at speeds exceeding 50 MHz while the main CPU is doing 
nothing (this isn't marketing baloney, I have scoped it myself)
- programming these PRUs is done in assembly (not C, limited fun, but fairly 
straightforward)
- integrating these PRUs can be done with reasonably low effort, see 
hal/components/hal_pru.c here: 
http://git.mah.priv.at/gitweb/emc2-dev.git/blob/refs/heads/arm335x-hal-pru-module:/src/hal/components/hal_pru.c
- the assembler for the PRUs is provided as open source by TI
- the whole interaction is through shared memory, including stepping, halt, 
run, inspect registers etc., which makes debugging straightforward (determined 
low-level hackers can stop and inspect the PRUs by fiddling bits in /dev/mem)
- there is a time stamp counter running at 200 MHz

I find it entirely feasible to recode the RT thread functions of stepgen, 
encoder, freqgen etc. for these PRUs and it is not rocket science. OK, it means 
reading manuals and trying this and that, but it is definitely for mortals. 
What I _think_ could be the result of such an effort is a LinuxCNC 
HAL/RTAPI/component system which runs a base-thread lookalike in maybe 5-10 us 
cycles, and quite deterministically at that.
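
To put rough numbers on that estimate, the sketch below works out how many 
200 MHz PRU clock cycles fit into one such period, and what step rate a given 
period would allow. The step rate formula assumes a step pulse needs at least 
one period high and one period low, as with a software step generator; that 
assumption and the function names are mine, not measured results.

```python
PRU_CLOCK_HZ = 200_000_000  # PRU clock as stated above


def pru_cycles_per_period(period_ns, clock_hz=PRU_CLOCK_HZ):
    """Clock cycles available to the PRU loop within one period."""
    return period_ns * clock_hz // 1_000_000_000


def max_step_rate_hz(period_ns):
    """Assumes one period high plus one period low per step pulse."""
    return 1_000_000_000 / (2 * period_ns)


if __name__ == "__main__":
    for period_ns in (5_000, 10_000):
        print(period_ns, "ns period:",
              pru_cycles_per_period(period_ns), "PRU cycles,",
              max_step_rate_hz(period_ns), "steps/s at most")
```

Under these assumptions a 5 us cycle leaves 1000 PRU clock cycles per pass and 
caps the step rate at 100 kHz; a 10 us cycle gives 2000 cycles and 50 kHz.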

Is it portable? No. Fast? Yes. Cheap? Yes. That is about as much as one can 
expect for an $85 board. And we will _never_ have a solution with all three 
boxes ticked. Not going to happen.

For anybody who is researching that option I would suggest studying Bas' work, 
which is the most advanced use of the PRU scheme I could find: 
https://github.com/modmaker/BeBoPr . Since Bas started early on that platform, 
some of the PRU handling is much easier nowadays since support code from TI has 
become available.

On the cultural/community side it seems TI 'gets it' - they have made great 
strides to appeal to, and actively support, the open source and hacker 
communities, and have committed manpower and marketing money to the cause.

There are other options in the pipe - processors with a bit of FPGA inside or 
glued on, a Beaglebone FPGA 'cape' plugin is in the works, so we're going to 
see more here.

As for the Raspberry: that has very limited potential - the platform is about 
half as fast as the BeagleBone, has fairly minimal GPIO, and it is wed to the 
Broadcom chipset, which I would rate as a company which still doesn't get it. A 
mean voice said about the chipset used on the Pi: 'half of it isn't documented, 
and the other half doesn't work'. That's not entirely true but there is 
something to that comment.

5) Is the result usable? 
If you're pinning your hopes on connecting a screen and keyboard to any of the 
current boards, firing up Axis and finding it as fast as the current PC/RTAI 
option, you are likely in for a bad surprise. Both Sergey and I found that the 
current code maxes out the main CPU to the extent that maybe 10-20% of the CPU 
is left, and that isn't good news. Note that neither of us has seriously looked 
into profiling and removing any glaring bottlenecks, so this is still to be 
verified.

The hardware offerings will improve, but not within the lifetime of the boards 
I reviewed.

If one follows the idea of spinning out RTAPI+HAL+drivers onto such a board, I 
think the prospect is excellent and potentially as good as or better than the 
current best-of-breed soft stepping solution (NB I am explicitly excluding 
Mesanet/Pico type solutions here). That is 'potential only' - we still have 
such inflexible interfacing in LinuxCNC that such a setup isn't possible with 
today's code structure.

- Michael