Re: [beagleboard] Equivalent of PRU on main CPU

Charles Steinkuehler Wed, 05 Aug 2015 15:01:04 -0700

On 8/5/2015 3:12 PM, Lenny wrote:
> 
> @Charles: Thanks for the warning :) I'm still a noob when it comes to 
> processor architecture. The application I have in mind (FIR filter) is 
> computationally intensive, but does not need a huge data throughput (few 
> MSps would be enough, which I know I can delegate to the PRU if necessary). 
> I found the idea of using the main processor appealing as I read somewhere 
> about its SIMD capability (doing 16 or 32 multiplications and accumulates 
> simultaneously, which would theoretically allow something like 16-32Gflops, 
> right?), and floating-point arithmetic. 
> 
> So if you confirm that all those advantages are lost somewhere in the 
> communication between core and dedicated modules, that would be a pity but 
> indeed save me a lot of time :)


The Cortex-A8 core should be great at running a FIR filter,
particularly if you can use the NEON SIMD instructions.  The problem
with the application style processors (and the optimizations that make
them fast) is you create uncertainty and variable delay in responding
to a real-world event (like an interrupt for a new chunk of data).

For the BBB, around 75 us worst-case latency is a good estimate.
If you have a mechanism to DMA (or use the PRU to collect
and write) samples into main memory, the ARM should be fine at running
the FIR filter, but you should bunch samples together and only fire an
interrupt for processing every N samples (or you're wasting a *LOT* of
time in IRQ overhead).
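
As a rough sketch of the batching idea above (not from the original
message): the first function estimates how many samples must be buffered to
ride out the worst-case latency, and the second runs a plain direct-form FIR
over one batch. The names min_batch_samples and fir_block, the 2 MS/s rate in
the usage note, and the buffer layout are illustrative assumptions. A
compiler may be able to vectorize the inner loop with NEON, but that is not
verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest number of samples that must be buffered so that a worst-case
 * wake-up delay of latency_us does not overrun the buffer at rate_hz.
 * Integer arithmetic, rounded up. */
size_t min_batch_samples(uint64_t rate_hz, uint64_t latency_us)
{
    return (size_t)((rate_hz * latency_us + 999999u) / 1000000u);
}

/* Direct-form FIR over one batch (hypothetical layout):
 *   in   : ntaps-1 history samples followed by n new samples
 *   out  : n output samples
 *   taps : ntaps coefficients, taps[0] applies to the newest sample */
void fir_block(const float *in, float *out, size_t n,
               const float *taps, size_t ntaps)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps; k++)
            acc += taps[k] * in[i + ntaps - 1 - k];
        out[i] = acc;
    }
}
```

For example, at an assumed 2 MS/s and the 75 us estimate,
min_batch_samples(2000000, 75) gives 150, so each interrupt should cover at
least that many samples, and in practice a good deal more, to keep the IRQ
overhead small.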

> And for curiosity/ease of later implementation/number of available 
> input-output ports: What delay and number of necessary instructions can I 
> expect for exchanging one or multiple bits between the main processor and a 
> GPIO port? More than 10 cycles?

The ARM core should see about the same latency as the PRU when talking
to the GPIO.  Writes will typically be posted and won't "cost" time on
the CPU as long as you don't write so fast you saturate the
interconnect.  Reads will generally stall the CPU and should take on
the order of a couple hundred nanoseconds:

https://github.com/machinekit/machinekit/blob/master/src/hal/drivers/hal_pru_generic/pru_generic.p#L137-L165

The interconnect may be somewhat faster for the ARM core, but talking
to the GPIO is going to be *WAY* slower than talking to main memory,
which is itself much slower than the CPU core frequency.
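
To make the read/write distinction concrete, here is a minimal sketch of
register-level GPIO access from the ARM side (not from the original
message). It assumes the GPIO bank has already been mapped into the process,
for example by mmap() on /dev/mem at the bank's physical base address from
the AM335x TRM. The register offsets are quoted from memory of that manual
and should be checked against it; the helper names are hypothetical.

```c
#include <stdint.h>

/* Byte offsets within one AM335x GPIO bank (assumed; verify in the TRM). */
#define GPIO_DATAIN        0x138u
#define GPIO_CLEARDATAOUT  0x190u
#define GPIO_SETDATAOUT    0x194u

/* Drive one pin high. A single store, which the interconnect can post. */
static inline void gpio_set(volatile uint32_t *bank, unsigned bit)
{
    bank[GPIO_SETDATAOUT / 4] = UINT32_C(1) << bit;
}

/* Drive one pin low. Also a single posted store. */
static inline void gpio_clear(volatile uint32_t *bank, unsigned bit)
{
    bank[GPIO_CLEARDATAOUT / 4] = UINT32_C(1) << bit;
}

/* Sample one pin. The load has to wait for the peripheral to answer,
 * so this is the access that stalls the CPU. */
static inline unsigned gpio_read(volatile uint32_t *bank, unsigned bit)
{
    return (unsigned)((bank[GPIO_DATAIN / 4] >> bit) & 1u);
}
```

Using the separate set and clear registers avoids a read-modify-write of the
output register, so toggling a pin never needs the slow read path.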

-- 
Charles Steinkuehler
char...@steinkuehler.net
