On 4/18/2017 6:38 AM, letterdotandnum...@gmail.com wrote:
> Hello,
> I've encountered a problem with very slow read speed from memory allocated
> by the uio_pruss PRU kernel driver, compared to reading from an ordinary
> address space.  Here are the results of performance tests on my BeagleBone
> Black:
> 
> Average memcpy from the PRU DDR start address to an application virtual
> address (300 kB of data): 10.4781 ms
> Average cv::Mat.copyTo (300 kB of data): 11.0681 ms
> Average memcpy from one virtual address to another (300 kB of data): 0.510001 ms
> 
> Kernel version is 4.4.12-bone11
> 
> Can somebody explain the issue?  Maybe I should have used the new PRU
> rpmsg/rproc driver instead?

Like William said, we can't really answer your question without more
detail, but I'll take a guess.  The DRAM that's shared with the PRU is
mapped as non-cacheable memory, since the PRU can modify it behind the
CPU's back.  That means that in a typical memory-copy loop, *EACH* word
read from that DRAM turns into a full round-trip CPU-to-DRAM-to-CPU read
latency, rather than the first read triggering a cache-line fill that
satisfies the next several reads.
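
A quick (untested) way to see the effect is to time the same memcpy()
against the uio_pruss mapping and against a plain heap buffer.  The
sketch below assumes the external DDR pool is map 2 of /dev/uio0 and is
at least 300 kB -- check /sys/class/uio/uio0/maps/ for the real index
and size on your board:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define COPY_SZ    (300 * 1024)  /* same 300 kB as the tests above */
#define EXTRAM_MAP 2             /* assumed index of the DDR pool; verify in sysfs */

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1;

    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open /dev/uio0"); return 1; }

    /* UIO selects which region to map by offset = map index * page size. */
    uint8_t *pru_ddr = mmap(NULL, COPY_SZ, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, EXTRAM_MAP * sysconf(_SC_PAGESIZE));
    if (pru_ddr == MAP_FAILED) { perror("mmap"); return 1; }

    uint8_t *dst = malloc(COPY_SZ);
    uint8_t *src = malloc(COPY_SZ);
    memset(dst, 0, COPY_SZ);     /* touch the pages so we time copies, not faults */
    memset(src, 0, COPY_SZ);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, pru_ddr, COPY_SZ);               /* uncached PRU DDR -> heap */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("PRU DDR -> heap: %.3f ms\n", elapsed_ms(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, COPY_SZ);                   /* ordinary heap -> heap */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("heap -> heap:    %.3f ms\n", elapsed_ms(t0, t1));

    munmap(pru_ddr, COPY_SZ);
    close(fd);
    free(dst);
    free(src);
    return 0;
}

The gap between the two numbers is basically the cache (or the lack of
it) at work.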

You probably want a memory-copy routine that uses a bunch more registers
and does burst reads from the PRU memory region (as big a burst as you
can manage for performance, but at least a cache line long).  The ARM
folks themselves provide several useful routines, along with the
benefits and drawbacks of each:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html
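
If you'd rather stay in C than drop to assembly, the same idea looks
roughly like this (an illustrative sketch only, not one of the routines
from that page): pull a burst's worth of words into locals before
storing any of them, so the compiler at -O2 can usually turn the loads
into ldm/ldrd (or, with vectorization enabled, NEON) bursts.  It assumes
4-byte-aligned buffers and leaves the tail to memcpy():

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy using 32-byte bursts: read eight words into registers, then write
 * them out.  On an uncached source this issues back-to-back reads instead
 * of one full round-trip per word.  Buffers are assumed 4-byte aligned. */
static void burst_copy(void *dst, const void *src, size_t len)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    while (len >= 32) {
        uint32_t t0 = s[0], t1 = s[1], t2 = s[2], t3 = s[3];
        uint32_t t4 = s[4], t5 = s[5], t6 = s[6], t7 = s[7];
        d[0] = t0; d[1] = t1; d[2] = t2; d[3] = t3;
        d[4] = t4; d[5] = t5; d[6] = t6; d[7] = t7;
        s += 8;
        d += 8;
        len -= 32;
    }
    memcpy(d, s, len);  /* leftover bytes */
}

Whether that or one of the NEON routines on the page above wins on your
data is worth a quick measurement; the important part is that the reads
from the PRU region happen in bursts.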

-- 
Charles Steinkuehler
char...@steinkuehler.net
