Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-23 Thread Warner Losh
Hi Svatopluk,

That looks very interesting.

You may be interested in the efforts of various people to bring up the armv6 
multi-core boards.

You can check out the source from http://svn.freebsd.org/base/projects/armv6 to 
see where we are in that effort.  I believe that many of these issues have been 
addressed.  Perhaps you could take a look and contribute to any areas that are 
incomplete, rather than starting from scratch?

Hope you are doing well!  We need more people who truly understand the ARM 
cache issues.

Warner


On May 23, 2012, at 7:13 AM, Svatopluk Kraus wrote:

> ...

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-23 Thread Svatopluk Kraus
Hi,

with respect to your replies and among other things, the following
summary could be made:

There are three kinds of DMA buffers according to their origin:

1. driver buffers
As Alexander wrote, these buffers should be allocated by
bus_dmamem_alloc(). The function should be implemented to allocate the
buffers correctly aligned with the help of the bus_dma_tag_t. For these
buffers, we can avoid bouncing entirely just by correct driver
implementation. For badly implemented drivers, the bouncing penalty is
paid for unaligned buffers. For BUS_DMA_COHERENT allocations,
as Mark wrote, an allocation pool of coherent pages is a good
optimization.
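As a userland sketch of that allocation contract (buffer start and size both whole multiples of the cache line, so no line is shared with foreign data), assuming a 64-byte line and using posix_memalign() only as a stand-in for the kernel allocator:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64   /* assumed line size; the kernel would query the CPU */

/*
 * Stand-in for what bus_dmamem_alloc() should guarantee here: the
 * buffer starts on a cache-line boundary and its size is rounded up
 * to whole lines, so no cache line is shared with unrelated data.
 */
static void *
dma_safe_alloc(size_t size, size_t *real_size)
{
	void *p;
	size_t rounded = (size + CACHE_LINE_SIZE - 1) &
	    ~(size_t)(CACHE_LINE_SIZE - 1);

	if (posix_memalign(&p, CACHE_LINE_SIZE, rounded) != 0)
		return (NULL);
	if (real_size != NULL)
		*real_size = rounded;
	return (p);
}
```

The names and the fixed 64-byte constant are illustrative; a real implementation takes the alignment from the tag.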

2. well-known system buffers
Mbufs and vfs buffers. These buffers should be aligned on
CACHE_LINE_SIZE (start and size).
That should be enough for vfs buffers, as they carry data only and
only whole buffers should be accessed by DMA. The mbuf is a structure,
and data can be carried in three possible locations. The first one,
the external buffer, should be aligned on CACHE_LINE_SIZE. The other
two locations, which are parts of the mbuf structure itself, can be
unaligned in general. If we assume that no one else writes to any
part of the mbuf during DMA access, we can set a BUS_DMA_UNALIGNED_SAFE
flag in the mbuf load functions, i.e., we don't bounce unaligned buffers
if the flag is set in the dmamap. A tunable can be implemented to suppress
the flag for debugging purposes.

3. other buffers
As we know nothing about these buffers, we must always bounce unaligned ones.

Just two more notes. The DMA buffer should not be accessed by anyone
(except the DMA itself) after PRESYNC and before POSTSYNC. For DMA
descriptors (for example), using bus_dmamem_alloc() with the
BUS_DMA_COHERENT flag could be inevitable.
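The PRESYNC/POSTSYNC ownership rule can be made explicit with a toy model (the enum and helpers are hypothetical; the kernel does not track this at runtime, the model just states the rule):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Between PRESYNC and POSTSYNC the device owns the buffer and any CPU
 * access is a bug; sync operations hand ownership back and forth.
 */
enum dma_owner { OWNER_CPU, OWNER_DEVICE };

static enum dma_owner owner = OWNER_CPU;

static void
presync(void)	/* models bus_dmamap_sync(..., BUS_DMASYNC_PRE*) */
{
	assert(owner == OWNER_CPU);
	owner = OWNER_DEVICE;
}

static void
postsync(void)	/* models bus_dmamap_sync(..., BUS_DMASYNC_POST*) */
{
	assert(owner == OWNER_DEVICE);
	owner = OWNER_CPU;
}

/* The CPU must check this (conceptually) before touching the buffer. */
static bool
cpu_may_access(void)
{
	return (owner == OWNER_CPU);
}
```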

As I'm implementing bus dma for the ARM11 MPCore, I'm doing it with the following assumptions:
1. ARMv6k and higher
2. PIPT data cache
3. SMP ready

Svata


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-23 Thread Svatopluk Kraus
On Mon, May 21, 2012 at 6:20 PM, Ian Lepore
 wrote:
>> ...
>> Some more notes.
>>
>> SMP makes things worse, and the ARM11 MPCore is about SMP too. For example,
>> another thread could be opened about how to flush caches (exclusive
>> L1 caches) in the SMP case.
>>
>> I'm not sure how to correctly change memory attributes on a page which
>> is in use. Making a new temporary mapping with different attributes is
>> wrong and does not help at all. It's a question of how to do TLB and cache
>> flushes on two or more processors and be sure that everything is OK.
>> It could be slow, and maybe changing memory attributes on the fly is
>> not a good idea at all.
>>
>
> My suggestion of making a temporary writable mapping was the answer to
> how to correctly change memory attributes on a page which is in use, at
> least in the existing code, which is for a single processor.
>
> You don't need, and won't even use, the temporary mapping.  You would
> make it just because doing so invokes logic in arm/arm/pmap.c which will
> find all existing virtual mappings of the given physical pages, and
> disable caching in each of those existing mappings.  In effect, it makes
> all existing mappings of the physical pages DMA_COHERENT.  When you
> later free the temporary mapping, all other existing mappings are
> changed back to being cacheable (as long as no more than one of the
> mappings that remain is writable).
>
> I don't know that making a temporary mapping just for its side effect of
> changing other existing mappings is a good idea; it's just a quick and
> easy thing to do if you want to try changing all existing mappings to
> non-cacheable.  It could be that a better way would be to have the
> busdma_machdep code call directly to lower-level routines in pmap.c to
> change existing mappings without making a new temporary mapping in the
> kernel pmap.  The actual changes to the existing mappings are made by
> pmap_fix_cache() but that routine isn't directly callable right now.
>

Thanks for the explanation. In fact, I know only a little about the current
ARM pmap implementation in the FreeBSD tree. I took the i386 pmap
implementation and modified it for the ARM11 MPCore.

> Also, as far as I know all of this automatic disabling of cache for
> multiple writable mappings applies only to VIVT cache architectures.
> I'm not sure how the pmap code is going to change to support VIPT and
> PIPT caches, but it may no longer be true that making a second writable
> mapping of a page will lead to changing all existing mappings to
> non-cacheable.
>
> -- Ian


Svata


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-22 Thread Alexander Kabaev
On Tue, 22 May 2012 07:56:42 +0200
Hans Petter Selasky  wrote:

> On Tuesday 22 May 2012 01:35:48 Alexander Kabaev wrote:
> > On Thu, 17 May 2012 11:01:34 -0500
> > 
> > Mark Tinguely  wrote:
> > > ...
> > 
> > Drivers that do DMA from memory that was not allocated by proper
> > busdma methods or load buffers for DMA using not properly
> > constrained busdma tags are broken drivers. We did not have a
> > busdma tag inheritance from parent bus to child devices before, but
> > now we should just take advantage of that and just make cache line
> > alignment a requirement for the platform. USB is firmly in that
> > 'broken' category btw and is currently being worked around by the
> > USB_HOST_ALIGN hack on MIPS, which suffers from the very same cache
> > coherency issues you describe.
> 
> Hi,
> 
> Drivers do not always use the same buffer format. That mean two
> entities exchanging data using different buffer allocations must
> either:
> 
> 1) Copy the data
> 2) Negotiate parameters for zero copy
> 
> Many USB protocols have headers which are designed without any
> thought about ARM's and CACHE alignment. That means byte access 

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Hans Petter Selasky
On Tuesday 22 May 2012 01:35:48 Alexander Kabaev wrote:
> ...
> 
> Drivers that do DMA from memory that was not allocated by proper busdma
> methods, or that load buffers for DMA using improperly constrained busdma
> tags, are broken drivers. We did not have busdma tag inheritance from
> parent bus to child devices before, but now we should take
> advantage of it and make cache line alignment a requirement for
> the platform. USB is firmly in that 'broken' category btw and is
> currently being worked around by the USB_HOST_ALIGN hack on MIPS, which
> suffers from the very same cache coherency issues you describe.

Hi,

Drivers do not always use the same buffer format. That means two entities 
exchanging data using different buffer allocations must either:

1) Copy the data
2) Negotiate parameters for zero copy

Many USB protocols have headers which were designed without any thought about 
ARM or cache alignment. That means byte access via DMA must be supported, 
or else you end up having to copy the data en masse.

The USB_HOST_ALIGN is not a hack. It is coherently implemented across EHCI, 
OHCI, UHCI and XHCI drivers, which are currently the only USB drivers using 
DMA.

BUSDMA must instruct use of bounce buffers for case 1) for such CP

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Alexander Kabaev
On Thu, 17 May 2012 11:01:34 -0500
Mark Tinguely  wrote:

> On Thu, May 17, 2012 at 8:20 AM, Svatopluk Kraus 
> wrote:
> > Hi,
> >
> > I'm working on a DMA bus implementation for the ARM11 MPCore platform. I've
> > looked at the implementation in the ARM tree, but IMHO it only works under
> > some assumptions. There is a problem with DMA on a memory block which
> > is not aligned on CACHE_LINE_SIZE (start and end) if memory is not
> > coherent.
> >
> > Let's have a buffer for DMA which is not aligned on CACHE_LINE_SIZE.
> > Then the first cache line associated with the buffer can be divided into
> > two parts, A and B, where A is memory we know nothing about
> > and B is buffer memory. The same holds for the last cache line
> > associated with the buffer. We have no problem if the memory is
> > coherent. Otherwise it depends on memory attributes.
> >
> > 1. [no cache] attribute
> > No problem, as memory is coherent.
> >
> > 2. [write through] attribute
> > Part A can be invalidated without loss of any data. It's not a
> > problem either.
> >
> > 3. [write back] attribute
> > In general, there is no way to keep both parts consistent. At
> > the start of a DMA transaction, the cache line is written back and
> > invalidated. However, as we know nothing about the memory associated
> > with part A of the cache line, the cache line can be filled again
> > at any time, messing up the DMA transaction if it is flushed. Even if the
> > cache line is only filled but not flushed during the DMA transaction,
> > we must make it coherent with memory after that. There is a trick
> > in the current ARM (MIPS) implementation: saving part A of the line
> > into a temporary buffer, invalidating the line, and restoring
> > part A. However, if somebody is writing to the memory
> > associated with part A of the line during this trick, part A
> > will be messed up. Moreover, part A can be part of another DMA
> > transaction.
> >
> > To safely use DMA with non-coherent memory, memory with the [no cache]
> > or [write through] attributes can be used without problems. Memory
> > with the [write back] attribute must be aligned on
> > CACHE_LINE_SIZE.
> >
> > However, as with the mbuf for example, a buffer for DMA can be part of a
> > structure which may be aligned on CACHE_LINE_SIZE, but not the
> > buffer itself. We may know that nobody will write to the structure
> > during the DMA transaction, so it's safe to use the buffer even if
> > it's not aligned on CACHE_LINE_SIZE.
> >
> > So, in practice, if a DMA buffer is not aligned on CACHE_LINE_SIZE and
> > we want to avoid bounce-page overhead, we must provide additional
> > information with the DMA transaction. It should be easy to provide the
> > information for drivers' data buffers. However, what about OS data
> > buffers like the mentioned mbufs?
> >
> > The question is the following: can it be guaranteed, for all or at
> > least the well-known OS data buffers which can be part of DMA access,
> > that buffers not aligned on CACHE_LINE_SIZE are surrounded by data
> > which belongs to the same object as the buffer, and that this data is
> > not written by the OS while the buffer is given to a driver?
> >
> > Any answer is appreciated. However, 'bounce pages' is not an answer.
> >
> > Thanks, Svata
> 
> Sigh. Several ideas from several people, but a good answer has not
> been created yet. SMP will make this worse.
> 
> To make things worse, there are drivers that use memory from the
> stack as DMA buffers.
> 
> I was hoping that mbufs are pretty well self-contained, unless you
> expect to modify them while DMA is happening.
> 
> This is on my to-do list.
> 
> --Mark.

Drivers that do DMA from memory that was not allocated by proper busdma
methods, or that load buffers for DMA using improperly constrained busdma
tags, are broken drivers. We did not have busdma tag inheritance from
parent bus to child devices before, but now we should take
advantage of it and make cache line alignment a requirement for
the platform. USB is firmly in that 'broken' category btw and is
currently being worked around by the USB_HOST_ALIGN hack on MIPS, which
suffers from the very same cache coherency issues you describe.


-- 
Alexander Kabaev




Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Mark Tinguely
On Mon, May 21, 2012 at 11:20 AM, Ian Lepore
 wrote:
> On Fri, 2012-05-18 at 16:13 +0200, Svatopluk Kraus wrote:
>> On Thu, May 17, 2012 at 10:07 PM, Ian Lepore
>>  wrote:
>> > On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
>> >> Hi,
>> >>
>> I'm working on a DMA bus implementation for the ARM11 MPCore platform. I've
>> looked at the implementation in the ARM tree, but IMHO it only works under
>> some assumptions. There is a problem with DMA on a memory block which is not
>> aligned on CACHE_LINE_SIZE (start and end) if memory is not coherent.
>>
>> Let's have a buffer for DMA which is not aligned on CACHE_LINE_SIZE.
>> Then the first cache line associated with the buffer can be divided into
>> two parts, A and B, where A is memory we know nothing about and B
>> is buffer memory. The same holds for the last cache line associated with
>> the buffer. We have no problem if the memory is coherent. Otherwise it
>> depends on memory attributes.
...

> My suggestion of making a temporary writable mapping was the answer to
> how to correctly change memory attributes on a page which is in use, at
> least in the existing code, which is for a single processor.
>
> You don't need, and won't even use, the temporary mapping.  You would
> make it just because doing so invokes logic in arm/arm/pmap.c which will
> find all existing virtual mappings of the given physical pages, and
> disable caching in each of those existing mappings.  In effect, it makes
> all existing mappings of the physical pages DMA_COHERENT.  When you
> later free the temporary mapping, all other existing mappings are
> changed back to being cacheable (as long as no more than one of the
> mappings that remain is writable).
>
> I don't know that making a temporary mapping just for its side effect of
> changing other existing mappings is a good idea; it's just a quick and
> easy thing to do if you want to try changing all existing mappings to
> non-cacheable.  It could be that a better way would be to have the
> busdma_machdep code call directly to lower-level routines in pmap.c to
> change existing mappings without making a new temporary mapping in the
> kernel pmap.  The actual changes to the existing mappings are made by
> pmap_fix_cache() but that routine isn't directly callable right now.
>
> Also, as far as I know all of this automatic disabling of cache for
> multiple writable mappings applies only to VIVT cache architectures.
> I'm not sure how the pmap code is going to change to support VIPT and
> PIPT caches, but it may no longer be true that making a second writable
> mapping of a page will lead to changing all existing mappings to
> non-cacheable.
>
> -- Ian

We don't want to carry the VIVT cache fixing code to VIPT/PIPT.

I like the x86 approach of marking the page with a cache type
(default/device/uncached/etc). The page mapping routines (for example
pmap_qenter() on a clustered write) will honor that cache type in all
future mappings. It is easy to implement. Other allocations, such as
page tables, can benefit from an attributed allocation too.

I also like having a separate allocator for the sub-page
bus_dmamem_alloc() requests that want uncached buffers. These entries
can stick around for a long time. If we just malloced the entries,
then the other threads that happen to allocate data from the same page
are penalized with uncached buffers too.
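A minimal userland model of such a sub-page allocator: one "uncached" page carved into cache-line-sized chunks tracked by a bitmap, so small bus_dmamem_alloc() requests don't drag unrelated malloc users onto uncached memory. The 4 KiB page, 64-byte chunk size, single-page pool, and all names are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define CHUNK 64                      /* assumed cache line size */
#define NCHUNKS (PAGE_SIZE / CHUNK)   /* 64 chunks per page */

/*
 * Backing store for the pool; in the kernel this page would be mapped
 * uncached once and reused for many small allocations.
 */
static uint8_t pool[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
static uint64_t used;                 /* bit i set => chunk i handed out */

static void *
subpage_alloc(void)
{
	for (int i = 0; i < NCHUNKS; i++) {
		if ((used & (1ULL << i)) == 0) {
			used |= 1ULL << i;
			return (&pool[i * CHUNK]);
		}
	}
	return (NULL);		/* a real allocator would grab another page */
}

static void
subpage_free(void *p)
{
	int i = (int)(((uint8_t *)p - pool) / CHUNK);

	used &= ~(1ULL << i);
}
```

Because the whole pool page shares one mapping attribute, entries can stick around for a long time without penalizing other allocations.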

--Mark.


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Daan Vreeken
Hi Ian (and list),

just commenting on the mbuf part :

On Monday 21 May 2012 18:20:21 Ian Lepore wrote:
> On Fri, 2012-05-18 at 16:13 +0200, Svatopluk Kraus wrote:
> > On Thu, May 17, 2012 at 10:07 PM, Ian Lepore
> >  wrote:
> > > On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
> > >> Hi,
...
> > What I can do in a driver it's not so simple in case of OS buffers
> > like mbufs. I can check how mbufs are used in current implementation
> > and say, yes, it's safe to use them unaligned. However, it can be
> > changed in next release if anybody won't take care of it. It would be
> > nice to have a maintained list of OS buffers which are DMA safe in
> > respect of CACHE_LINE_SIZE. Is the flag and the list interesting for
> > somebody else?
>
> I don't have a definitive answer for this, but my assumption has always
> been that once an mbuf is handed to a driver (for example, when it's
> added to an interface's send queue), the driver then "owns" that mbuf
> and nothing else in the system will touch it until the driver makes a
> call to hand it off or free it.  If that assumption is true, a driver
> could make good use of a BUS_DMA_UNALIGNED_SAFE flag with mbufs.
>
> The part that scares me about my assumption is the m_ext.ref_cnt field
> of the mbuf.  Its existance seems to imply that multiple entities
> concurrently have an interest in the data.  On the other hand, the lack
> of any built in provisions for locking seems to imply that concurrent
> access isn't happening, or perhaps it implies that any required
> synchronization is temporal rather than lock-based.
>
> I've never found anything in writing that explains mbuf usage
> conventions at this level of detail.

This assumption isn't always true. 'man 9 mbuf' mentions this, but not in one 
place. M_WRITABLE() can be used to tell whether or not you're allowed to 
modify an mbuf. If it returns false, you can create a writable copy of the 
mbuf and alter the copy instead of the original.
A writable copy of an mbuf can be made using m_dup().
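That convention can be modeled with a toy copy-on-write buffer; `cowbuf` and its helpers are hypothetical and only analogous to the M_WRITABLE()/m_dup() pair, not the real mbuf code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy copy-on-write buffer: shared (refs > 1) means read-only;
 * a writer must take a private duplicate first.
 */
struct cowbuf {
	int	refs;
	char	data[64];
};

static bool
cow_writable(const struct cowbuf *b)	/* analogous to M_WRITABLE() */
{
	return (b->refs == 1);
}

static struct cowbuf *
cow_share(struct cowbuf *b)		/* analogous to taking an mbuf reference */
{
	b->refs++;
	return (b);
}

static struct cowbuf *
cow_dup(struct cowbuf *b)		/* analogous to m_dup(): private copy */
{
	struct cowbuf *n = malloc(sizeof(*n));

	if (n == NULL)
		return (NULL);
	memcpy(n->data, b->data, sizeof(n->data));
	n->refs = 1;
	b->refs--;			/* caller drops its share of the original */
	return (n);
}
```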

Writing to non-writable mbufs will cause data corruption in e.g. BPF and TCP 
retransmits (even in the non-SMP case).


Regards,
-- 
Daan Vreeken
Vitsch Electronics
http://Vitsch.nl
tel: +31-(0)40-7113051
KvK nr: 17174380


Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-21 Thread Ian Lepore
On Fri, 2012-05-18 at 16:13 +0200, Svatopluk Kraus wrote:
> On Thu, May 17, 2012 at 10:07 PM, Ian Lepore
>  wrote:
> > On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
> >> ...
> >
> > I'm adding freebsd-arm@ to the CC list; that's where this has been
> > discussed before.
> >
> > Your analysis is correct... to the degree that it works at all right
> > now, it's working by accident.  At work we've been making the good
> > accident a bit more likely by setting the minimum allocation size to
> > arm_dcache_align in kern_malloc.c.  This makes it somewhat less likely
> > that unrelated objects in the kernel are sharing a cache line, but it
> > also reduces the effectiveness of the cache somewhat.
> >
> > Another factor, not mentioned in your analysis, is the size of the IO
> > operation.  Even if the beginning of the DMA buffer is cache-aligned, if
> > the size isn't exactly a multiple of the cache line size you still have
> > the partial flush situation and all of its problems.
> >
> > It's not guaranteed that data surrounding a DMA buffer will be untouched
> > during the DMA, even when that surrounding data is part of the same
> > conceptual object as the IO buffer.  It's most often true, but certainly
> > not guaranteed.  In addition, as Mark pointed out in a prior reply,
> > sometimes the DMA buffer is on the stack, and even returning from the
> > function that starts the IO operation affects the cacheline associated
> > with the DMA buffer.  Consider something like this:
> >
> >void do_io()
> >{
> >int buffer;
> >start_read(&buffer);
> >// maybe do other stuff here
> >wait_for_read_done();
> >}
> >
> > start_read() gets some IO going, so before it returns a call has been
> > made to bus_dmamap_sync(..., BUS_DMASYNC_PREREAD) and an invalidate gets
> > done on the cacheline containing the variable 'buffer'.

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-18 Thread Svatopluk Kraus
On Thu, May 17, 2012 at 10:07 PM, Ian Lepore
 wrote:
> [... Ian Lepore's reply of 2012-05-17 quoted here in full; trimmed -- the message itself appears below in this archive ...]

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-17 Thread Ian Lepore
On Thu, 2012-05-17 at 15:20 +0200, Svatopluk Kraus wrote:
> [... original message of 2012-05-17 quoted in full; trimmed -- it appears at the end of this archive ...]

I'm adding freebsd-arm@ to the CC list; that's where this has been
discussed before.

Your analysis is correct... to the degree that it works at all right
now, it's working by accident.  At work we've been making the good
accident a bit more likely by setting the minimum allocation size to
arm_dcache_align in kern_malloc.c.  This makes it somewhat less likely
that unrelated objects in the kernel are sharing a cache line, but it
also reduces the effectiveness of the cache somewhat.
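
The minimum-allocation-size workaround amounts to rounding every allocation up to the data-cache line size so that two unrelated objects can never share a line. A minimal sketch of that rounding (the constant and function name are illustrative; the actual kern_malloc.c change differs in detail):

```c
#include <assert.h>
#include <stddef.h>

#define ARM_DCACHE_ALIGN 32   /* assumed data-cache line size */

/* Round an allocation size up to a whole number of cache lines so that
 * no other allocation can land in the same line. */
static size_t
roundup_to_line(size_t size)
{
    return (size + ARM_DCACHE_ALIGN - 1) & ~(size_t)(ARM_DCACHE_ALIGN - 1);
}
```

The trade-off is exactly the one described above: small allocations waste up to a line's worth of memory, which reduces cache effectiveness.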

Another factor, not mentioned in your analysis, is the size of the IO
operation.  Even if the beginning of the DMA buffer is cache-aligned, if
the size isn't exactly a multiple of the cache line size you still have
the partial flush situation and all of its problems.

It's not guaranteed that data surrounding a DMA buffer will be untouched
during the DMA, even when that surrounding data is part of the same
conceptual object as the IO buffer.  It's most often true, but certainly
not guaranteed.  In addition, as Mark pointed out in a prior reply,
sometimes the DMA buffer is on the stack, and even returning from the
function that starts the IO operation affects the cacheline associated
with the DMA buffer.  Consider something like this:

    void do_io()
    {
        int buffer;
        start_read(&buffer);
        // maybe do other stuff here
        wait_for_read_done();
    }

start_read() gets some IO going, so before it returns a call has been
made to bus_dmamap_sync(..., BUS_DMASYNC_PREREAD) and an invalidate gets
done on the cacheline containing the variable 'buffer'.  The act of
returning from the start_read() function causes that cacheline to get
reloaded, so now the stale pre-DMA value of the variable 'buffer' is in
cache again.  Right after that, the DMA completes so that ram has a
newer value that belongs in the buffer variable and the copy in the
cacheline is stale.  

Before control gets into the wait_for_read_done() routine that will
attempt to handle the POSTREAD parti
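
One way to sidestep the stack-buffer hazard is to never DMA into a stack variable at all, and instead allocate DMA buffers that are both aligned to the line size and padded to a whole number of lines, so no cache line is shared with anything else. A hedged userland sketch (aligned_alloc stands in for whatever the kernel allocator would do; names are illustrative):

```c
#include <stdlib.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32   /* assumed line size for illustration */

/* Allocate a DMA-safe buffer: start aligned to the cache line and the
 * length padded to a whole number of lines, so the buffer's first and
 * last lines contain nothing but the buffer itself. */
static void *
dma_safe_alloc(size_t len)
{
    size_t padded = (len + CACHE_LINE_SIZE - 1) &
        ~(size_t)(CACHE_LINE_SIZE - 1);

    /* aligned_alloc requires the size to be a multiple of the
     * alignment, which the padding guarantees. */
    return aligned_alloc(CACHE_LINE_SIZE, padded);
}
```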

Re: ARM + CACHE_LINE_SIZE + DMA

2012-05-17 Thread Mark Tinguely
On Thu, May 17, 2012 at 8:20 AM, Svatopluk Kraus  wrote:
> [... original message of 2012-05-17 quoted in full; trimmed -- it appears at the end of this archive ...]

Sigh. Several ideas from several people, but a good answer has not been
found yet. SMP will make this worse.

To make things worse, there are drivers that use memory from the stack as
DMA buffers.

I was hoping that mbufs are pretty well self-contained, unless you expect to
modify them while DMA is happening.

This is on my to-do list.

--Mark.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


ARM + CACHE_LINE_SIZE + DMA

2012-05-17 Thread Svatopluk Kraus
Hi,

I'm working on the DMA bus implementation for the ARM11 MPCore
platform. I've looked at the implementation in the ARM tree, but IMHO
it only works under certain assumptions. There is a problem with DMA on
a memory block that is not aligned on CACHE_LINE_SIZE (start and end)
if the memory is not coherent.

Let's have a DMA buffer that is not aligned on CACHE_LINE_SIZE. Then
the first cache line associated with the buffer can be divided into two
parts, A and B, where A is memory we know nothing about and B is buffer
memory. The same holds for the last cache line associated with the
buffer. We have no problem if the memory is coherent; otherwise it
depends on memory attributes.
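
The A/B split can be made concrete with a little address arithmetic. A sketch (the line size is an assumption for illustration) that counts the "part A" bytes sharing the buffer's first and last cache lines:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32   /* assumed line size for illustration */

/* Bytes in the buffer's first cache line that lie BEFORE the buffer
 * (the front "part A"). Zero when the start is line-aligned. */
static size_t
head_foreign(uintptr_t buf)
{
    return buf & (CACHE_LINE_SIZE - 1);
}

/* Bytes in the buffer's last cache line that lie AFTER the buffer
 * (the back "part A"). Zero when the end is line-aligned. */
static size_t
tail_foreign(uintptr_t buf, size_t len)
{
    return (CACHE_LINE_SIZE - ((buf + len) & (CACHE_LINE_SIZE - 1))) &
        (CACHE_LINE_SIZE - 1);
}
```

For example, a 100-byte buffer at 0x1004 shares 4 leading and 24 trailing bytes of its boundary lines with unknown neighbours.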

1. [no cache] attribute
No problem as memory is coherent.

2. [write through] attribute
Part A can be invalidated without loss of any data, so this is no problem either.

3. [write back] attribute
In general, there is no way to keep both parts consistent. At the start
of the DMA transaction, the cache line is written back and invalidated.
However, as we know nothing about the memory associated with part A of
the cache line, the line can be refilled at any time, and will mess up
the DMA transaction if it is flushed. Even if the cache line is only
refilled but not flushed during the DMA transaction, we must make it
coherent with memory afterwards. The current ARM (MIPS) implementation
uses a trick: save part A of the line into a temporary buffer,
invalidate the line, and restore part A. However, if somebody writes to
the memory associated with part A of the line during this trick, part A
will be corrupted. Moreover, part A can be part of another DMA
transaction.
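
The trick, and the race it carries, can be simulated in userland. This is only a model of the sequence described above (loosely modeled on the busdma handling; the layout and names are illustrative, not the real kernel code): one cache line holds `off` bytes of foreign part A followed by DMA-buffer bytes (part B).

```c
#include <string.h>

#define LINE_SIZE 32

static unsigned char ram[LINE_SIZE];    /* what DMA wrote to memory    */
static unsigned char cache[LINE_SIZE];  /* the CPU's stale cached copy */

/* POSTREAD handling of a partial line: keep the CPU's view of part A
 * (bytes [0, off)) while picking up the DMA data in part B. */
static void
postread_partial_line(size_t off)
{
    unsigned char save[LINE_SIZE];

    memcpy(save, cache, off);       /* 1. save part A from the line      */
    memcpy(cache, ram, LINE_SIZE);  /* 2. invalidate: reload from RAM    */
    memcpy(cache, save, off);       /* 3. restore part A                 */
    /* The race: any store into part A made between steps 1 and 3 is
     * silently overwritten by the restore in step 3. */
}
```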

To safely use DMA with non-coherent memory, memory with the [no cache]
or [write through] attribute can be used without problems. Memory with
the [write back] attribute must be aligned on CACHE_LINE_SIZE.
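
Stated as a predicate, the rule above says a write-back buffer needs bouncing whenever either its start or its size is not a multiple of the line size. A hedged sketch (the line size and function are illustrative, not the busdma API):

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32   /* assumed line size for illustration */

/* Would this buffer need bounce pages under the rule above?
 * write_back is nonzero for [write back] memory. */
static int
needs_bounce(uintptr_t buf, size_t len, int write_back)
{
    if (!write_back)
        return 0;   /* uncached or write-through memory is safe as-is */
    return (buf & (CACHE_LINE_SIZE - 1)) != 0 ||
           (len & (CACHE_LINE_SIZE - 1)) != 0;
}
```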

However, a DMA buffer can be part of a structure (an mbuf, for example)
where the structure is aligned on CACHE_LINE_SIZE but the buffer itself
is not. We may know that nobody will write to the structure during the
DMA transaction, so it's safe to use the buffer even if it's not
aligned on CACHE_LINE_SIZE.

So, in practice, if a DMA buffer is not aligned on CACHE_LINE_SIZE and
we want to avoid the bounce-page overhead, we must supply additional
information to the DMA transaction. It should be easy to supply that
information for drivers' own data buffers. However, what about OS data
buffers like the mbufs mentioned above?

The question is the following. Is it, or can it be, guaranteed for all
(or at least the well-known) OS data buffers that may take part in DMA
that a buffer not aligned on CACHE_LINE_SIZE is surrounded only by data
belonging to the same object as the buffer, and that this surrounding
data is not written by the OS while the buffer is handed to a driver?

Any answer is appreciated. However, 'bounce pages' is not an answer.

Thanks, Svata