[Emc-developers] on the ARM builds

Michael Haberler Sun, 02 Dec 2012 20:24:29 -0800

Just noting where we are on the various ARM builds. ARM, really, because it is
the only non-x86 platform I'm looking into, so I have nothing to contribute
otherwise.

Getting a new board working with LinuxCNC has several aspects:

1. getting a realtime kernel going for the board
2. getting LinuxCNC to compile and run
3. getting drivers to work
4. making it as fast as possible
5. determining whether the result is actually usable

Let's look at these in turn.

1) is a bit of a hit-and-miss game.

Historically, ARM support in Linux has suffered from the enormous range of
offerings available; only recently, with the Linaro effort, has the ARM
ecosystem started to get its act together in terms of build support and
integration with Linux mainline.

As for RT, the only real options at this point are Xenomai and RT_PREEMPT.
Generally one cannot count on a stock kernel of either kind; it means building
one. The main difference is that Xenomai ports are pegged to very few kernel
versions as a starting point (realistically two, maybe three for the
adventurous), whereas RT_PREEMPT has been available for many major kernel
versions so far and will likely be the first RT option to make it into Linux
mainline; Xenomai might follow later when its 'Xenomai 3' strategy pans out.
Still, the key flow remains: 'find a working kernel for that board; find a
matching RT patch for that kernel version; progress, or abandon otherwise'.
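Once a patched kernel boots, it is worth confirming which RT flavor is actually
running before measuring anything. Below is a minimal C sketch, assuming the
usual markers: RT_PREEMPT kernels show `PREEMPT RT` (newer ones `PREEMPT_RT`)
in the `uname` version string, and Xenomai 2.x kernels expose `/proc/xenomai`.
Those markers should be checked against the specific patch in use; the type and
function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical classification of the running kernel's realtime flavor. */
typedef enum { RT_NONE, RT_PREEMPT, RT_XENOMAI } rt_flavor;

/* Pure classifier, so it can be tested without a special kernel.
   Assumed markers: "PREEMPT RT"/"PREEMPT_RT" in the version string for
   RT_PREEMPT, and the presence of /proc/xenomai for Xenomai 2.x. */
rt_flavor classify_kernel(const char *uname_version, int has_proc_xenomai)
{
    if (has_proc_xenomai)
        return RT_XENOMAI;
    if (uname_version != NULL &&
        (strstr(uname_version, "PREEMPT RT") != NULL ||
         strstr(uname_version, "PREEMPT_RT") != NULL))
        return RT_PREEMPT;
    return RT_NONE;
}

/* Probe the running system and classify it. */
rt_flavor detect_kernel(void)
{
    struct utsname u;
    const char *version = (uname(&u) == 0) ? u.version : NULL;
    int has_xeno = (access("/proc/xenomai", F_OK) == 0);
    return classify_kernel(version, has_xeno);
}
```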

Xenomai can be built on the 2.6.38.8 and 3.2.21 kernel versions using a patch,
and part of that patch is hardware dependent, in particular high-resolution
timer support. Without that, one need look no further: if the timer support is
low resolution, any latency measurements (let alone usable results with
LinuxCNC and a fast base thread) are useless. The porting and adaptation
process is well documented and not large in lines of code, but it requires
intimate knowledge of the hardware. This means reading processor/SoC
datasheets and doing very low-level work.
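To see whether timer support is good enough, a rough periodic wakeup test can
be sketched in C with POSIX clocks: sleep to absolute deadlines with
`clock_nanosleep` and record how late each wakeup is. This only illustrates the
idea; a real measurement would use a proper tool such as `cyclictest`
(RT_PREEMPT) or Xenomai's own latency test, run at SCHED_FIFO priority with
memory locked and the system under load. The `measure_latency` name and
`latency_stats` type are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    long samples;       /* number of wakeups measured */
    int64_t min_ns;     /* smallest observed lateness */
    int64_t max_ns;     /* largest observed lateness */
    long timer_res_ns;  /* reported resolution of CLOCK_MONOTONIC */
} latency_stats;

static int64_t ts_to_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * 1000000000LL + t->tv_nsec;
}

/* Wake up at absolute deadlines period_ns apart and record how late each
   wakeup is. Without high-resolution timers the reported resolution is
   typically a whole jiffy rather than about 1 ns. */
latency_stats measure_latency(long period_ns, long iterations)
{
    latency_stats s = { 0, INT64_MAX, 0, 0 };
    struct timespec res, next, now;

    clock_getres(CLOCK_MONOTONIC, &res);
    s.timer_res_ns = (long)(res.tv_sec * 1000000000L + res.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (long i = 0; i < iterations; i++) {
        /* advance the absolute deadline by one period */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        /* clock_nanosleep returns the error code; retry if interrupted */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late < s.min_ns) s.min_ns = late;
        if (late > s.max_ns) s.max_ns = late;
        s.samples++;
    }
    return s;
}
```

If `timer_res_ns` comes back as a whole jiffy (e.g. 10 ms at HZ=100), the
kernel is running without high-resolution timers and the latency figures mean
nothing.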

There is a 3.6-based Xenomai patch, but I don't think it has seen much
exposure yet.

Luckily, I found a quite usable patch for the Raspberry board, and Xenomai
works well on it, giving latencies on the order of 40 µs; that is in my repo.
Currently I have no binary packages available, but that looks doable. I have
not found an RT_PREEMPT patch which matches a usable Raspberry kernel version
closely enough to give it a try. Kernel minor revisions usually differ mostly
in driver support, so there is a chance one can fast-forward over a few minor
kernel versions and still end up with something usable.

The other board I'm dabbling with, the BeagleBone, has several options: there's
an RT_PREEMPT-patched kernel source readily available for 2.6.8 (building right
now), and there are several reports and patches for a 3.2.21-based Xenomai
kernel, which I'll try next. So I have several starting points and 'just' need
to determine which one works well. 'Just' should be read as 'a kernel build for
such platforms should be started before you go to sleep and checked in the
morning'.

2) means build support - that's package availability and configure support.
Configure support can be fixed, but massive special-purpose package builds are
out of scope for me, so I try to pick a base which has a decent package stream;
sometimes there are several options (sometimes there are too many).

Raspberry: has a very decent Ubuntu-like package stream, so most of the moving
parts are in place. Adapting configure meant handling all the ARM dependencies,
so the initial effort was larger, but it will be much less for other boards.
While all of LinuxCNC builds, I have yet to see an Axis screen, but this is
very likely a local setup problem. HAL/RTAPI/GladeVCP run fine.

BeagleBone: this comes with the Angstrom distribution installed, which is
useless for LinuxCNC purposes (too many packages missing). I switched to an
Ubuntu Precise based setup, and that behaves pretty much like the x86
environment. Only minimal configure changes were needed after the initial round
of Raspberry changes. No surprises, and Axis actually runs (see 5) below).

Building master: the current rtos-integration-preview1 code is based on 2.5. I
have test-merged it into master with minimal touchups. Due to the use of
boost::python and its memory requirements during compilation, swap space is
needed for a master build; a USB flash stick is fine (both of my boards sport
256 MB of memory).

3) Drivers
We are out in no-parport, no-PCI land. Candidate boards usually have GPIO pins,
some of which can be overlaid with other functions like I2C or SPI. There is
usually driver support through sysfs to configure and wiggle pins and to drive
I2C and SPI peripherals, but that is not a high-speed option: it requires
system calls, which are generally not a good idea in realtime code. I have made
a minimal attempt with the Raspberry hal_gpio module; minimal insofar as it
works but isn't flexible in terms of configuration and isn't optimized for
speed.
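For comparison with memory-mapped access, here is roughly what the sysfs path
costs, as a minimal C sketch against the legacy `/sys/class/gpio` interface
(`export`, `gpioN/direction`, `gpioN/value`). The helper names are hypothetical
and this is not the hal_gpio code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to a sysfs attribute file. Every call costs an
   open(), a write() and a close() system call. */
int sysfs_write(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(value);
    ssize_t n = write(fd, value, (size_t)len);
    close(fd);
    return (n == len) ? 0 : -1;
}

/* Export a pin and make it an output. `root` is normally "/sys/class/gpio". */
int gpio_sysfs_setup_output(const char *root, int pin)
{
    char path[128], num[16];
    snprintf(path, sizeof path, "%s/export", root);
    snprintf(num, sizeof num, "%d", pin);
    sysfs_write(path, num);  /* fails harmlessly if already exported */
    snprintf(path, sizeof path, "%s/gpio%d/direction", root, pin);
    return sysfs_write(path, "out");
}

/* Drive an exported output pin high or low. */
int gpio_sysfs_set(const char *root, int pin, int level)
{
    char path[128];
    snprintf(path, sizeof path, "%s/gpio%d/value", root, pin);
    return sysfs_write(path, level ? "1" : "0");
}
```

Every level change above is an `open()`, a `write()` and a `close()`. Keeping
the value file open would cut that to one `write()`, but it is still a system
call per edge.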

The good news is: once you have one, you have them all (more or less); all ARM
I/O is memory mapped and very similar from platform to platform. So getting
from A to B is quite straightforward: a slightly different setup and different
macros for memory locations, but that is about it. hal_gpio is really just a
starting point, but once you get some pin to wiggle with a simple C test
program, you're almost there.
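A minimal sketch of such a pin-wiggling test program in C, assuming the BCM2835
on the original Raspberry Pi: GPIO block at physical address 0x20200000, with
the GPFSEL registers at offset 0x00, GPSET0 at 0x1C, GPCLR0 at 0x28 and GPLEV0
at 0x34, per Broadcom's BCM2835 peripherals documentation. Those constants are
the per-platform macros that change from board to board and must be checked
against the datasheet for any other SoC; the function names are hypothetical,
not hal_gpio's.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* BCM2835 GPIO register layout, as 32-bit word indices from the base. */
#define BCM2835_GPIO_PHYS 0x20200000u
#define GPFSEL0 0   /* function select: 10 pins per register, 3 bits each */
#define GPSET0  7   /* offset 0x1C: writing 1 drives the pin high */
#define GPCLR0  10  /* offset 0x28: writing 1 drives the pin low */
#define GPLEV0  13  /* offset 0x34: current pin levels */

/* Map the GPIO block through /dev/mem (needs root). Returns NULL on error. */
volatile uint32_t *gpio_map(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   (off_t)BCM2835_GPIO_PHYS);
    close(fd);  /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Configure a pin (0..53) as output: function code 001 in its 3-bit field. */
void gpio_set_output(volatile uint32_t *gpio, unsigned pin)
{
    volatile uint32_t *fsel = &gpio[GPFSEL0 + pin / 10];
    unsigned shift = (pin % 10) * 3;
    *fsel = (*fsel & ~(7u << shift)) | (1u << shift);
}

/* Set or clear a pin with a single store: no system call involved. */
void gpio_write(volatile uint32_t *gpio, unsigned pin, int level)
{
    gpio[(level ? GPSET0 : GPCLR0) + pin / 32] = 1u << (pin % 32);
}

/* Read back a pin's current level (0 or 1). */
int gpio_read(volatile uint32_t *gpio, unsigned pin)
{
    return (int)((gpio[GPLEV0 + pin / 32] >> (pin % 32)) & 1u);
}
```

A test program would call `gpio_map()` as root, then `gpio_set_output(gpio, 17)`,
then alternate `gpio_write(gpio, 17, 1)` and `gpio_write(gpio, 17, 0)` in a loop
while watching the pin with a scope. Each edge is one store to memory.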

4) Making it fast
That's really two questions, and they are not the same: is the latency OK, and
does the overall system have enough oomph to run the whole of LinuxCNC? My
answer right now would be 'yes and no'. That, however, doesn't make it a moot
effort for me; my primary goal is _not_ to find a $50 PC replacement but to
arrive at a realtime outboard solution where the rest of LinuxCNC runs on some
other non-RT platform. Maybe an iPad, or an Android tablet, who knows.

Latency with x86/RTAI is still best, Xenomai second, RT_PREEMPT third. Is it
good enough? Well, for servo it is, but who would run servo-only on such a
low-end board, which is likely hooked up to steppers to start with? It isn't
making any impression on the RepRap crowd either, which still thinks Arduino.
So that turns attention to base thread performance. Can you move steppers with
Pi/BeagleBoard/Xenomai/hal_gpio? Yes you can, but not very fast: similar to
RTAI/x86, just slower. So the standard solution would be 'glue on some extra
hardware for higher speeds', an FPGA-based board for example, and then we are
leaving low-cost, single-board land, at least for now.
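To make 'not very fast' concrete, here is a simplified software step generator
in C of the kind a base thread runs: a fixed-point position accumulator
advanced by the commanded velocity every period, emitting a step whenever it
crosses a whole step. It is a sketch of the general technique, not LinuxCNC's
stepgen component; the names and the Q16 scaling are assumptions.

```c
#include <stdint.h>

/* Velocity in Q16 fixed point: STEP_ONE == one step per base period. */
#define STEP_ONE ((int64_t)1 << 16)

typedef struct {
    int64_t accum;  /* commanded position, in 1/65536 steps */
    int64_t steps;  /* steps actually emitted so far */
} stepgen_t;

/* Called once per base-thread period. Returns +1 or -1 when a step in that
   direction should be emitted this period, 0 otherwise. At most one step
   per period is possible, so |vel| is clamped to STEP_ONE. */
int stepgen_tick(stepgen_t *sg, int32_t vel)
{
    if (vel > STEP_ONE)
        vel = (int32_t)STEP_ONE;
    if (vel < -STEP_ONE)
        vel = (int32_t)-STEP_ONE;
    sg->accum += vel;
    if (sg->accum >= (sg->steps + 1) * STEP_ONE) {
        sg->steps++;
        return 1;
    }
    if (sg->accum <= (sg->steps - 1) * STEP_ONE) {
        sg->steps--;
        return -1;
    }
    return 0;
}
```

The ceiling comes from the period: at most one step per call, and a real driver
also has to honor step length and step space, so a pulse typically needs at
least two base periods. With a 20 µs base period (an example figure) that caps
out at 1 / (2 × 20 µs) = 25 kHz.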

That, however, is not the end of the story, since we are not the first to have
discovered the issue.

There is more than one option to deal with that issue, and some of them promise
to yield significantly better results than we currently get with a
LinuxCNC-optimized, hand-massaged, RTAI-driven junkyard PC with a parport.
Sergey has already shown how to use a low-overhead hardware feature to improve
stepping performance with miniemc2.

I find the features of the TI AM335x Sitara OMAP processors used in the
BeagleBone board particularly promising. Let me give a minimal rundown of what
is special here:

- besides the main ARM CPU, there are two on-chip processors called
'Programmable Realtime Units' (PRUs).
- these run at 200 MHz, are 32-bit integer CPUs, and can drive the relevant
peripherals like GPIO at speeds exceeding 50 MHz while the main CPU is doing
nothing (this isn't marketing baloney, I have scoped it myself)
- programming these PRUs is done in assembly (not C; limited fun, but fairly
straightforward), and that assembler is provided as open source by TI
- integrating these PRUs can be done with reasonably low effort, see
hal/components/hal_pru.c here:
http://git.mah.priv.at/gitweb/emc2-dev.git/blob/refs/heads/arm335x-hal-pru-module:/src/hal/components/hal_pru.c
- the whole interaction is through shared memory, including stepping, halt,
run, inspect registers etc., which makes debugging straightforward (determined
low-level hackers can stop and inspect the PRUs by fiddling bits in /dev/mem)
- there is a time stamp counter running at 200 MHz
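The shared-memory style of interaction can be illustrated from the host side
with a small mailbox in C. The layout and names below are invented for
illustration (not TI's format and not what hal_pru.c does), and PRU firmware
would have to implement the other half; in practice `mb` would point into PRU
data RAM mapped through /dev/mem or TI's support code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical command mailbox at the start of PRU data RAM. */
typedef struct {
    volatile uint32_t cmd_seq;  /* host bumps this to post a command */
    volatile uint32_t ack_seq;  /* PRU copies cmd_seq here when done */
    volatile uint32_t cmd;      /* command code (hypothetical) */
    volatile uint32_t arg;      /* command argument */
} pru_mailbox_t;

/* Host side: write the payload first, then publish it by bumping the
   sequence number, so the PRU never acts on a half-written command. */
void mbox_post(pru_mailbox_t *mb, uint32_t cmd, uint32_t arg)
{
    mb->cmd = cmd;
    mb->arg = arg;
    atomic_thread_fence(memory_order_seq_cst);  /* payload before seq */
    mb->cmd_seq = mb->cmd_seq + 1;
}

/* Host side: non-blocking check whether the last command was handled. */
int mbox_done(const pru_mailbox_t *mb)
{
    return mb->ack_seq == mb->cmd_seq;
}
```

Publishing by sequence number means the host never has to block: a realtime
thread can post a command and check `mbox_done()` on its next invocation.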

I find it entirely feasible to recode the RT thread functions of stepgen,
encoder, freqgen etc. for these PRUs, and it is not rocket science. OK, it
means reading manuals and trying this and that, but it is definitely for
mortals. What I _think_ could be the result of such an effort is a LinuxCNC
HAL/RTAPI/component system which runs a base-thread lookalike in maybe 5-10 µs
cycles, and quite deterministically at that.
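To put the 5-10 µs figure into numbers: at 200 MHz a PRU cycle is 5 ns, so a
5 µs period is a budget of 1000 cycles. Since most PRU instructions complete in
a single cycle (worth confirming in TI's PRU reference for the instructions
actually used), that is on the order of a thousand instructions per period for
stepgen, encoder and friends. The helpers below just do that arithmetic; the
step-rate one assumes one period high and one period low per step.

```c
#include <stdint.h>

/* PRU clock cycles available in one base-thread period. */
uint64_t cycles_per_period(uint64_t clock_hz, uint64_t period_ns)
{
    return clock_hz * period_ns / 1000000000ull;
}

/* Highest step rate if every step needs one period high and one period
   low (an assumption; real step/space timing may need more periods). */
uint64_t max_step_rate_hz(uint64_t period_ns)
{
    return 1000000000ull / (2 * period_ns);
}
```

That gives 100 kHz at 5 µs and 50 kHz at 10 µs.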

Is it portable? No. Fast? Yes. Cheap? Yes. That is about as much as one can
expect for an $85 board. And we will _never_ have a solution with all three
boxes ticked. Not going to happen.

For anybody researching that option, I would suggest studying Bas' work, which
is the most advanced use of the PRU scheme I could find:
https://github.com/modmaker/BeBoPr . Since Bas started early on that platform,
note that some of the PRU handling is much easier nowadays, as support code
from TI has become available.

On the cultural/community side, it seems TI 'gets it': they have made great
strides to appeal to and actively support the open source and hacker
communities, and have committed manpower and marketing money to the cause.

There are other options in the pipeline: processors with a bit of FPGA inside
or glued on, and a BeagleBone FPGA 'cape' plugin is in the works, so we're
going to see more here.

As for the Raspberry: it has very limited potential. The platform is about
half as fast as the BeagleBone, has fairly minimal GPIO, and is wed to the
Broadcom chipset, and I would rate Broadcom as a company which still doesn't
get it. A mean voice said about the chipset used on the Pi: 'half of it isn't
documented, and the other half doesn't work'. That's not entirely true, but
there is something to that comment.

5) Is the result usable?
If you're pinning hopes on connecting a screen and keyboard to any of the
current boards, firing up Axis and finding it as fast as the current PC/RTAI
option: you are likely in for a bad surprise. Both Sergey and I found that the
current code maxes out the main CPU to the extent that maybe 10-20% CPU is
left, and that isn't good news. Note we both haven't seriously looked into
profiling and removing any glaring bottlenecks; this is still to be verified.

The hardware offerings will improve, but not within the lifetime of the boards
I reviewed.

If one follows the idea of spinning out RTAPI+HAL+drivers onto such a board, I
think the prospect is excellent and potentially the same as or better than the
current best-of-breed soft stepping solution (NB I am explicitly excluding
Mesanet/Pico type solutions here). That is 'potential only': we still have
such inflexible interfacing in LinuxCNC that such a setup isn't possible with
today's code structure.

- Michael

_______________________________________________
Emc-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/emc-developers
