On 4/18/2017 6:38 AM, letterdotandnum...@gmail.com wrote:
> Hello,
> I've encountered a problem with very slow read speeds from memory allocated by
> the uio_pruss PRU kernel driver compared to reading from ordinary address
> spaces. Here are some performance tests on my BeagleBone Black:
>
> Average memcpy from the PRU DDR start address to an application virtual
> address (300 kB of data): 10.4781 ms
> Average cv::Mat.copyTo (300 kB of data): 11.0681 ms
> Average memcpy from one virtual address to another (300 kB of data): 0.510001 ms
>
> Kernel version is 4.4.12-bone11
>
> Can somebody explain the issue? Maybe I should have used the new PRU
> rpmsg/remoteproc driver?
Like William said, we can't really answer your question without more detail,
but I'll take a guess.

The DRAM that's shared with the PRU is marked as non-cacheable memory, since
the PRU can modify it. That means that for a typical memory-copy loop, *EACH*
word read from DRAM turns into a full round-trip CPU-to-DRAM-to-CPU read
latency, rather than the first read triggering a cache-line fill.

You probably want a memory copy that uses a bunch more registers and does burst
reads from the PRU memory region (as big as you can for performance, but at
least a cache line long). There are several useful routines from the ARM folks
themselves:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html

...along with the benefits and drawbacks of each.

--
Charles Steinkuehler
char...@steinkuehler.net
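P.S. A rough (untested) sketch of what I mean, assuming you've already mmap()'d
the uio_pruss DDR region into your process and that it's mapped as normal,
non-cacheable memory; the names here are just placeholders. The idea is to
issue a cache-line's worth of loads back-to-back before storing, so the reads
of the non-cacheable region can pipeline instead of paying full DRAM latency
one word at a time. Build with something like: gcc -O2 -mfpu=neon

    /* Burst-copy from the non-cacheable PRU shared DDR region.
     * Reads 64 bytes (one Cortex-A8 cache line) per iteration using four
     * 16-byte NEON loads, then writes them out with four stores. */
    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    void burst_copy_from_pru(void *dst, const void *src, size_t len)
    {
        uint8_t *d = (uint8_t *)dst;
        const uint8_t *s = (const uint8_t *)src;

        /* Main loop: group the reads so the memory system can pipeline them. */
        while (len >= 64) {
            uint8x16_t q0 = vld1q_u8(s);
            uint8x16_t q1 = vld1q_u8(s + 16);
            uint8x16_t q2 = vld1q_u8(s + 32);
            uint8x16_t q3 = vld1q_u8(s + 48);
            vst1q_u8(d,      q0);
            vst1q_u8(d + 16, q1);
            vst1q_u8(d + 32, q2);
            vst1q_u8(d + 48, q3);
            s += 64;
            d += 64;
            len -= 64;
        }

        /* Copy any remaining tail one byte at a time. */
        while (len--)
            *d++ = *s++;
    }

The same idea works with plain LDM/STM in assembly, as in the ARM app note
above; the key point is issuing several loads in a burst before the stores.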