Re: [beagleboard] PRU to DMA DDR caching issues

2016-01-03 Thread John Syne
This training material from Free Electrons explains cache effects and how to
deal with them. Start at slide 440:

http://free-electrons.com/doc/training/linux-kernel/linux-kernel-slides.pdf 


Regards,
John




> On Jan 3, 2016, at 8:58 AM, Thomas Köhler  wrote:
> 
> Hello and thank you for the fast help. Here are my answers to your comments:
> 
> > Unless you carefully write kernel code to treat your DDR memory buffer
> > as DMA memory, you are almost certainly encountering caching effects.
> 
> I thought I had done this by getting the memory from dma_alloc_coherent().
> I will research whether more is needed to disable caching, but my
> understanding was that DMA-flagged memory is never cached, because the
> ARM core cannot know whether something has changed.
> 
> > I recommend instead of using a buffer in DDR memory, use the PRU data
> > memories.
> 
> In my tests, a write from the PRU to DDR memory takes only 3 cycles on
> the PRU, and never more over a few million attempts (L3 fast
> interconnect). However, I do not know how long it takes for the data to
> become visible to the ARM...
> 
> I cannot do all the work in the PRU code because I need to tag the rising
> edge with the Linux kernel time. Therefore I need to find the most
> deterministic way to get the counter value into the kernel.
> 
> I will do further tests. Maybe someone here has gone down the same road.
> I think tagging an event with the PRU (5 ns) and relating it to Linux
> kernel time without losing too many nanoseconds should be one of the
> great benefits of the PRU.
> 
> Thanks all

-- 
For more options, visit http://beagleboard.org/discuss
--- 
You received this message because you are subscribed to the Google Groups 
"BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to beagleboard+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [beagleboard] PRU to DMA DDR caching issues

2016-01-03 Thread Charles Steinkuehler
On 1/3/2016 7:53 AM, Thomas Köhler wrote:
> 
> Any help? Many thanks... I hope the problem is clear.

Unless you carefully write kernel code to treat your DDR memory buffer
as DMA memory, you are almost certainly encountering caching effects.
 The ARM core reads the memory location once, and will not do so again
as long as the data remains in the cache.  The more often you read the
DDR memory location, the more likely the data is to stay in the cache.

I recommend instead of using a buffer in DDR memory, use the PRU data
memories.  They are accessible by both the ARM and PRU cores, and have
the proper memory flags set up so the ARM core will not cache reads.
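The suggestion above leaves open how the two cores hand a sample over. A minimal sketch, assuming a hypothetical two-word mailbox at the start of a PRU data memory (the layout, the names, and the sequence-counter protocol are illustrative, not something from this thread): the PRU bumps `seq` to an odd value, writes the sample, then bumps `seq` to an even value, and the ARM side accepts a sample only when it sees the same even `seq` before and after reading it. `mailbox_publish()` stands in for what the PRU firmware would do; obtaining the pointer to the mapped PRU memory is not shown. The sketch also assumes the accesses are not reordered by the hardware, which should hold for an uncached device mapping like the one described above; on ordinary cached memory explicit barriers would be needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox layout in PRU data memory (illustrative only). */
struct edge_mailbox {
    volatile uint32_t seq;    /* odd while the PRU is writing, even when done */
    volatile uint32_t cycles; /* PRU cycle count captured at the rising edge */
};

/* Writer side: what the PRU firmware would do, written in C for illustration. */
static void mailbox_publish(struct edge_mailbox *mb, uint32_t cycles)
{
    mb->seq = mb->seq + 1u;  /* odd: update in progress */
    mb->cycles = cycles;
    mb->seq = mb->seq + 1u;  /* even: sample complete */
}

/* Reader side (ARM). Returns true and stores a new sample if one was
 * published since *last_seq; returns false if nothing new is there. */
static bool mailbox_poll(const struct edge_mailbox *mb, uint32_t *last_seq,
                         uint32_t *cycles_out)
{
    for (;;) {
        uint32_t seq = mb->seq;
        if (seq & 1u)            /* writer is mid-update: read again */
            continue;
        if (seq == *last_seq)    /* nothing new since the previous call */
            return false;
        uint32_t cycles = mb->cycles;
        if (mb->seq != seq)      /* a new update started while we were reading */
            continue;
        *cycles_out = cycles;
        *last_seq = seq;
        return true;
    }
}
```

Usage: keep `last_seq` between calls (starting at 0, the value the mailbox is assumed to be cleared to), call `mailbox_poll()` from the interrupt handler or a polling loop, and act only when it returns true.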

-- 
Charles Steinkuehler
char...@steinkuehler.net



Re: [beagleboard] PRU to DMA DDR caching issues

2016-01-03 Thread Thomas Köhler
I narrowed the issue down a bit more:

If I insert something between two reads of the DDR memory in my kernel 
module (I used a pr_info("test")), the second read returns the updated value.

Maybe there is a way to invalidate the cache? I will investigate 
further.
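A call such as pr_info() is also a point where the compiler must assume memory may have changed, so it reloads values afterwards. The same symptom can therefore come from the compiler reusing the first read held in a register, not only from the CPU cache; this is an assumption worth ruling out, not a diagnosis made in this thread. In a kernel module the usual tool for such loads is READ_ONCE(). The userspace sketch below uses a volatile-qualified pointer to get the same effect; `read_u32_once()` and `wait_for_change()` are hypothetical helper names.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): the volatile qualifier
 * forces the compiler to emit a fresh load on every call instead of
 * reusing a value already held in a register. It does NOT bypass or
 * invalidate the CPU data cache; the mapping itself must be uncached. */
static inline uint32_t read_u32_once(const volatile uint32_t *p)
{
    return *p;
}

/* Spin until *p differs from old, giving up after max_spins loads.
 * Returns the last value read (equal to old on timeout). */
static uint32_t wait_for_change(const volatile uint32_t *p, uint32_t old,
                                unsigned long max_spins)
{
    uint32_t v = old;
    for (unsigned long i = 0; i < max_spins; i++) {
        v = read_u32_once(p);
        if (v != old)
            break;
    }
    return v;
}
```

If the value still goes stale with loads like these, the cause is more likely in the memory mapping than in the compiler.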

Thanks



Re: [beagleboard] PRU to DMA DDR caching issues

2016-01-03 Thread Thomas Köhler
Hello and thank you for the fast help. Here are my answers to your comments:


> Unless you carefully write kernel code to treat your DDR memory buffer 
> as DMA memory, you are almost certainly encountering caching effects. 
>

I thought I had done this by getting the memory from 
dma_alloc_coherent(). 
I will research whether more is needed to disable caching, but my 
understanding was that DMA-flagged memory is never cached, because the 
ARM core cannot know whether something has changed.

> I recommend instead of using a buffer in DDR memory, use the PRU data 
> memories.

In my tests, a write from the PRU to DDR memory takes only 3 cycles on 
the PRU, and never more over a few million attempts (L3 fast 
interconnect). However, I do not know how long it takes for the data to 
become visible to the ARM...

I cannot do all the work in the PRU code because I need to tag the rising 
edge with the Linux kernel time. Therefore I need to find the most 
deterministic way to get the counter value into the kernel.

I will do further tests. Maybe someone here has gone down the same road. 
I think tagging an event with the PRU (5 ns) and relating it to Linux 
kernel time without losing too many nanoseconds should be one of the 
great benefits of the PRU.

Thanks all




Re: [beagleboard] PRU to DMA DDR caching issues

2016-01-03 Thread Charles Steinkuehler
On 1/3/2016 10:58 AM, Thomas Köhler wrote:
> Hello and thank you for the fast help. Here are my answers to your comments:
> 
>> Unless you carefully write kernel code to treat your DDR memory buffer 
>> as DMA memory, you are almost certainly encountering caching effects. 
> 
> I thought I had done this by getting the memory from 
> dma_alloc_coherent(). 
> I will research whether more is needed to disable caching, but my 
> understanding was that DMA-flagged memory is never cached, because the 
> ARM core cannot know whether something has changed.

There's more to successfully using memory for DMA than just allocating
it (and DMA is basically what's happening here: the PRU is an
independent mechanism that modifies memory outside the context of the
ARM core).  This is very nontrivial to implement correctly, and
drastic overkill for what you need to do unless you're moving very
large amounts of data (more than will fit in the PRU data memories).

>> I recommend instead of using a buffer in DDR memory, use the PRU data 
>> memories.
> 
> In my tests, a write from the PRU to DDR memory takes only 3 cycles on 
> the PRU, and never more over a few million attempts (L3 fast 
> interconnect). However, I do not know how long it takes for the data to 
> become visible to the ARM...

Several hundred ns at the very least.  It looks like the transaction
only takes three cycles on the PRU because the write is posted: the
PRU continues as soon as the interconnect accepts the write, without
waiting for it to reach DDR.  To get a better idea of the actual
transaction time, try doing a read on the PRU side!
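One way to act on this suggestion is for the PRU firmware to bracket a blocking read from DDR with two reads of its cycle counter and store each (before, after) pair where the ARM can fetch it. A sketch of the ARM-side arithmetic, assuming hypothetical `before`/`after` arrays of such samples and the 200 MHz PRU clock implied by the 5 ns figure in this thread. The result includes the small overhead of reading the counter itself, and with `n == 0` the minimum stays at `UINT64_MAX`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed PRU clock of 200 MHz, i.e. 5 ns per cycle. */
#define PRU_NS_PER_CYCLE 5u

struct latency_stats {
    uint64_t min_ns;
    uint64_t max_ns;
};

/* before[i]/after[i]: hypothetical pairs of PRU cycle-counter samples taken
 * immediately before and after one blocking read from DDR. */
static struct latency_stats read_latency(const uint32_t *before,
                                         const uint32_t *after, size_t n)
{
    struct latency_stats s = { UINT64_MAX, 0 };
    for (size_t i = 0; i < n; i++) {
        uint32_t cycles = after[i] - before[i];          /* unsigned, modulo 2^32 */
        uint64_t ns = (uint64_t)cycles * PRU_NS_PER_CYCLE;
        if (ns < s.min_ns)
            s.min_ns = ns;
        if (ns > s.max_ns)
            s.max_ns = ns;
    }
    return s;
}
```

For example, deltas of 60, 90 and 70 cycles give a minimum of 300 ns and a maximum of 450 ns; a spread like that is what the 3-cycle posted write hides.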

> I cannot do all the work in the PRU code because I need to tag the rising 
> edge with the Linux kernel time. Therefore I need to find the most 
> deterministic way to get the counter value into the kernel.
> 
> I will do further tests. Maybe someone here has gone down the same road. 
> I think tagging an event with the PRU (5 ns) and relating it to Linux 
> kernel time without losing too many nanoseconds should be one of the 
> great benefits of the PRU.

Have the PRU sample the pin and record the data you want into the
shared data memory.  Then have the PRU send an interrupt to the ARM
core indicating the data is available.
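To relate the PRU-timed edge to Linux kernel time, one approach (an assumption, not something described in this thread) is for the ARM interrupt handler to take a kernel timestamp, for example with ktime_get_ns(), and sample the PRU cycle counter as close together as possible, then subtract the PRU cycles that elapsed since the edge. A sketch of that arithmetic, assuming the 200 MHz PRU clock and hypothetical names for the inputs. Its accuracy is limited by the gap between the two back-to-back samples, and it ignores drift between the PRU clock and the kernel clocksource.

```c
#include <stdint.h>

/* Assumed PRU clock of 200 MHz, i.e. 5 ns per cycle. */
#define PRU_NS_PER_CYCLE 5u

/* Hypothetical inputs:
 *   edge_cycles - PRU cycle count the PRU stored when it saw the edge
 *   now_cycles  - PRU cycle count sampled by the ARM in the IRQ handler
 *   now_ns      - kernel timestamp taken right next to now_cycles
 * Returns the estimated kernel time of the edge, in ns. */
static uint64_t edge_kernel_ns(uint32_t edge_cycles, uint32_t now_cycles,
                               uint64_t now_ns)
{
    uint32_t elapsed = now_cycles - edge_cycles;        /* unsigned, modulo 2^32 */
    return now_ns - (uint64_t)elapsed * PRU_NS_PER_CYCLE;
}
```

For example, an edge 200 PRU cycles before a handler timestamp of 5,000,000 ns maps to 4,999,000 ns.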

An alternate method that should produce similar quality results
without needing the PRU is to use the capture timers.  You can
configure the hardware timers to capture on the rising and/or falling
edge of a signal, and have the timer interrupt the ARM core.  This
provides cycle-level accuracy for timing, and you should be able to
implement everything using the existing Linux kernel drivers for the
timer hardware.
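With the capture-timer approach, the remaining arithmetic is turning the difference between two captured counter values into time. A sketch, assuming a free-running 32-bit up-counter that wraps from 0xFFFFFFFF to 0 and a counter clock of `clk_hz`. The 24 MHz used in the example is the BeagleBone Black's main oscillator frequency; check how your timer is actually clocked. At 24 MHz one tick is about 41.7 ns, which bounds the resolution. `capture_delta_ns()` is a hypothetical helper, not a kernel API.

```c
#include <stdint.h>

/* Converts the difference between two timer capture values to ns.
 * Assumes a free-running 32-bit up-counter that wraps to 0, clocked at
 * clk_hz (must be nonzero). */
static uint64_t capture_delta_ns(uint32_t first, uint32_t second,
                                 uint32_t clk_hz)
{
    uint32_t ticks = second - first;                  /* unsigned, modulo 2^32 */
    return (uint64_t)ticks * 1000000000ull / clk_hz;  /* fits: ticks < 2^32 */
}
```

For example, `capture_delta_ns(1000, 1024, 24000000)` is 24 ticks, i.e. 1000 ns, and the unsigned subtraction gives the same answer when the counter wraps between the two captures.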

-- 
Charles Steinkuehler
char...@steinkuehler.net
