Access to DMA memory while DMA in progress?

2017-10-27 Thread Mouse
Eight days ago, I asked about bus_dmamap_unload() at splhigh().
Thanks to very helpful off-list responses (thank you - you know who you
are!), I think I understand that a little better now.  (Summary: no,
that can't be counted on to work, and it's documented, just not where I
was looking - it's in the splhigh doc, not the bus_dmamap_unload doc.)

Now, I've got another problem.

I would like to read the DMA buffer while DMA is still going on.  That
is, I have a buffer of (say) 64K and the hardware is busily writing
into it; I want to read the buffer and see what the hardware has
written in the memory it has written and what used to be there in the
memory it hasn't.  I'm fine if the CPU's view lags the hardware's view
slightly, but I do care about the CPU's view of the DMA write order
matching the hardware's: that is, if the CPU sees the value written by
a given DMA cycle, then the CPU must also see the values written by all
previous DMA cycles.  (This reading is being carried out from within
the kernel, by driver code.  I might be able to move it to userland,
but it would surprise me if userland could do something the kernel
can't.)

But I'm not sure what sort of sync calls I need to make.  Because of
things like bounce buffers and data caches, I presumably need
bus_dmamap_sync(BUS_DMASYNC_POSTREAD) somewhere in the mix, but it is
not clear to me how/when, nor how fine-grained those calls can be.  Do
I just POSTREAD each byte/word/whatever before I read it?  How
expensive is bus_dmamap_sync - for example, is a 1K sync significantly
cheaper than four 256-byte syncs covering the same memory?  If I'm
reading a bunch of (say) uint32_ts, is it reasonable to POSTREAD each
uint32_t individually?  If I POSTREAD something that DMA hasn't written
yet, will it work to POSTREAD it again (and then read it) after DMA
_has_ written it?  Is BUS_DMA_STREAMING relevant?  I will be
experimenting to see what seems to work, but I'd like to understand
what is promised, not just what happens to work on my development
system.

Of course, there is the risk of reading a partially-written datum.  In
my case (aligned uint32_ts on amd64) I don't think that can happen.

The presence of bus_dmamem_mmap seems to me to imply that it should be
possible to make simple memory accesses Just Work, but it's not clear
to me to what extent bus_dmamem_mmap supports _concurrent_ access by
DMA and userland (for example, does the driver have to
BUS_DMASYNC_POSTREAD after the DMA and before userland access to
mmapped memory, or does the equivalent happen automagically, eg in the
page fault handler, or does bus_dmamem_mmap succeed only on systems
where no such care needs to be taken, or what?).

My impression is that bus_dma is pretty stable, and, thus, version
doesn't matter much.  But, in case it matters, 5.2 on amd64.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Paul.Koning

> On Oct 27, 2017, at 9:38 AM, Mouse wrote:
> 
> ...
> I would like to read the DMA buffer while DMA is still going on.  That
> is, I have a buffer of (say) 64K and the hardware is busily writing
> into it; I want to read the buffer and see what the hardware has
> written in the memory it has written and what used to be there in the
> memory it hasn't.  I'm fine if the CPU's view lags the hardware's view
> slightly, but I do care about the CPU's view of the DMA write order
> matching the hardware's: that is, if the CPU sees the value written by
> a given DMA cycle, then the CPU must also see the values written by all
> previous DMA cycles. 

I'm not sure if that requirement is necessarily supported by hardware.  For 
example, in machines that have incoherent DMA, I would think it isn't.

paul



Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Mouse
>> I would like to read the DMA buffer while DMA is still going on.
>> [...]  I'm fine if the CPU's view lags the hardware's view slightly,
>> but I do care about the CPU's view of the DMA write order matching
>> the hardware's: that is, if the CPU sees the value written by a
>> given DMA cycle, then the CPU must also see the values written by
>> all previous DMA cycles.
> I'm not sure if that requirement is necessarily supported by hardware.  [...]

Hm!  On such hardware, then, you can't count on any particular portion
of a DMA transfer being visible until the whole transfer is finished?

For my purposes, unless amd64 is such a platform, I'm willing to write
off portability to such machines.  Is there any way to detect them from
within the driver?  I could just ignore the issue, but I'd prefer to
give an error at attach time.



Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Paul.Koning

> On Oct 27, 2017, at 10:36 AM, Mouse wrote:
> 
>>> I would like to read the DMA buffer while DMA is still going on.
>>> [...]  I'm fine if the CPU's view lags the hardware's view slightly,
>>> but I do care about the CPU's view of the DMA write order matching
>>> the hardware's: that is, if the CPU sees the value written by a
>>> given DMA cycle, then the CPU must also see the values written by
>>> all previous DMA cycles.
>> I'm not sure if that requirement is necessarily supported by hardware.  [...]
> 
> Hm!  On such hardware, then, you can't count on any particular portion
> of a DMA transfer being visible until the whole transfer is finished?

Yes.  I'm assuming here that the driver would do a data cache invalidate
(for the address range, if possible) at DMA end.  Given that, during the
transfer you would see pieces that weren't in the cache before, and would
not see pieces for which there are cache hits.

> For my purposes, unless amd64 is such a platform, I'm willing to write
> off portability to such machines.  Is there any way to detect them from
> within the driver?  I could just ignore the issue, but I'd prefer to
> give an error at attach time.

I don't know much about x86 style platforms.  An example of the sort of
platform I mentioned would be the MIPS R5000.  I still have some scars
from building a fast router on top of its incoherent DMA...

paul


Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Mouse
>>> [...access to DMA buffer while DMA in progress...ordering...]
>> Hm!  On such hardware, then, you can't count on any particular
>> portion of a DMA transfer being visible until the whole transfer is
>> finished?
> Yes.  I'm assuming here that the driver would do a data cache
> invalidate (for the address range, if possible) at DMA end.

That sounds to me like something bus_dmamap_sync(BUS_DMASYNC_POSTREAD)
would do.  If so, it's not an issue for me; I'm perfectly fine with
some such operation being part of any CPU access to the buffer during
the transfer.

> I don't know much about x86 style platforms.  An example of the sort
> of platform I mentioned would be the MIPS R5000.  I still have some
> scars from building a fast router on top of its incoherent DMA...

Heh.  Well, reading the x86 bus_dma implementation (amd64 doesn't seem
to have a separate bus_dma implementation of its own) leads me to think
it has no such issues; all POSTREAD does there is copy from the bounce
buffer (if the transfer is bounced) and issue an lfence memory barrier.
(Hmm, I wonder whether the memory barrier needs to be before the memcpy.)



Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Eduardo Horvath
On Fri, 27 Oct 2017, Mouse wrote:

> I would like to read the DMA buffer while DMA is still going on.  That
> is, I have a buffer of (say) 64K and the hardware is busily writing
> into it; I want to read the buffer and see what the hardware has
> written in the memory it has written and what used to be there in the
> memory it hasn't.  I'm fine if the CPU's view lags the hardware's view
> slightly, but I do care about the CPU's view of the DMA write order
> matching the hardware's: that is, if the CPU sees the value written by
> a given DMA cycle, then the CPU must also see the values written by all
> previous DMA cycles.  (This reading is being carried out from within
> the kernel, by driver code.  I might be able to move it to userland,
> but it would surprise me if userland could do something the kernel
> can't.)

This is all very hardware dependent.

Make sure you map that area with the BUS_DMA_COHERENT flag.  It will 
disable as much caching as possible on those sections of memory, and on 
some hardware may be required or the CPU won't be able to read the data 
until the segment is bus_dmamem_unmap()ped even with the bus_dmamap_sync() 
operations.

Many NICs do something like this.  They have a ring buffer the CPU sets up 
with pointers to other buffers to hold incoming packets.  When a packet 
comes in the NIC writes out the contents and then updates the pointer to 
indicate DMA completion.  The CPU then swaps the pointer with one pointing 
to an empty buffer.
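
A minimal sketch of the ring-harvest pattern described above, written as
userland C so it compiles on its own.  The descriptor layout, the
DESC_DONE status bit, RING_SLOTS and the function name are assumptions
made for illustration (a real NIC defines its own descriptor format, and
a real driver would issue the appropriate bus_dmamap_sync() calls around
these accesses).  Completion is signalled here by a status bit rather
than by a pointer update, purely to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4
#define DESC_DONE  0x1u		/* hypothetical "device finished" bit */

struct rx_desc {
	void    *buf;		/* where the device writes the packet */
	uint32_t len;		/* filled in by the device */
	uint32_t status;	/* device sets DESC_DONE last */
};

struct rx_ring {
	struct rx_desc desc[RING_SLOTS];
	unsigned next;		/* next slot the CPU will look at */
};

/*
 * If the next descriptor is complete, return its buffer, store the
 * packet length in *lenp, and hand the slot back to the device with
 * the caller's empty buffer.  Return NULL if nothing is ready yet.
 */
static void *
rx_ring_harvest(struct rx_ring *r, void *empty, uint32_t *lenp)
{
	assert(r != NULL && empty != NULL && lenp != NULL);

	struct rx_desc *d = &r->desc[r->next];
	if ((d->status & DESC_DONE) == 0)
		return NULL;

	void *full = d->buf;
	*lenp = d->len;

	/* Swap in the empty buffer and give the slot back. */
	d->buf = empty;
	d->len = 0;
	d->status = 0;
	r->next = (r->next + 1) % RING_SLOTS;
	return full;
}
```

The point of the pattern is that the CPU only touches a packet buffer
after it has seen the completion marker for it, so the marker is the
only thing that has to be read while DMA may still be in flight.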

> 
> But I'm not sure what sort of sync calls I need to make.  Because of
> things like bounce buffers and data caches, I presumably need
> bus_dmamap_sync(BUS_DMASYNC_POSTREAD) somewhere in the mix, but it is
> not clear to me how/when, nor how fine-grained those calls can be.  Do
> I just POSTREAD each byte/word/whatever before I read it?  How
> expensive is bus_dmamap_sync - for example, is a 1K sync significantly
> cheaper than four 256-byte syncs covering the same memory?  If I'm
> reading a bunch of (say) uint32_ts, is it reasonable to POSTREAD each
> uint32_t individually?  If I POSTREAD something that DMA hasn't written
> yet, will it work to POSTREAD it again (and then read it) after DMA
> _has_ written it?  Is BUS_DMA_STREAMING relevant?  I will be
> experimenting to see what seems to work, but I'd like to understand
> what is promised, not just what happens to work on my development
> system.
> 
> Of course, there is the risk of reading a partially-written datum.  In
> my case (aligned uint32_ts on amd64) I don't think that can happen.

You want to do a bus_dmamap_sync(BUS_DMASYNC_POSTREAD) for each... let's 
call it a snapshot.  It will try to provide the CPU a consistent view of 
that section of memory at the time the sync call is made.

The cost of these operations is very hardware dependent.  On some machines 
the bus_dmamem_map() operation with or without the BUS_DMA_COHERENT flag 
will turn off all caches and the bus_dmamap_sync() calls are no-ops.

On hardware that has an I/O cache, bus_dmamap_sync() may need to flush it 
first to get the DMA data into the coherency domain.

If there's a CPU cache that has not been disabled for that section of 
memory, bus_dmamap_sync() may need to invalidate it.

In the NIC example above, you map the ring buffer with BUS_DMA_COHERENT, 
fill it up and do a bus_dmamap_sync(BUS_DMASYNC_PREREAD).  When you want 
to read it (usually after getting an interrupt) you do 
bus_dmamap_sync(BUS_DMASYNC_POSTREAD) before doing the read.
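
The call ordering just described can be sketched as follows.  This is
a userland mock, not driver code: fake_dmamap_sync() is a hypothetical
stand-in that only records which operation was requested, and the
SYNC_* constants are placeholder values rather than the real
BUS_DMASYNC_* definitions.  In a kernel the corresponding call would
be bus_dmamap_sync(tag, map, offset, len, ops).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; a driver would use BUS_DMASYNC_* instead. */
enum { SYNC_PREREAD = 1, SYNC_POSTREAD = 2 };

static int    sync_log[16];	/* ops requested so far, in order */
static size_t sync_count;

/* Hypothetical stand-in for bus_dmamap_sync(): just logs the op. */
static void
fake_dmamap_sync(size_t offset, size_t len, int ops)
{
	(void)offset;
	(void)len;
	assert(sync_count < 16);
	sync_log[sync_count++] = ops;
}

/* Before starting the device: PREREAD over the whole buffer. */
static void
snapshot_arm(size_t buflen)
{
	fake_dmamap_sync(0, buflen, SYNC_PREREAD);
}

/*
 * Take one snapshot of words [first, first + n): POSTREAD over just
 * that byte range, then copy the words out.
 */
static void
snapshot_read(const volatile uint32_t *buf, size_t first, size_t n,
    uint32_t *out)
{
	assert(buf != NULL && out != NULL);
	fake_dmamap_sync(first * sizeof(uint32_t), n * sizeof(uint32_t),
	    SYNC_POSTREAD);
	for (size_t i = 0; i < n; i++)
		out[i] = buf[first + i];
}
```

One POSTREAD per snapshot, covering the range about to be read, matches
the advice above; whether a single large sync is cheaper than several
small ones remains hardware dependent.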

I have long argued that we should also have bus_dma accessor functions 
like the ones used by bus_space to access device registers.  They can do fun 
things like fixing up alignment and endianness swapping without having to 
litter the driver with code only needed for certain hardware.

> The presence of bus_dmamem_mmap seems to me to imply that it should be
> possible to make simple memory accesses Just Work, but it's not clear
> to me to what extent bus_dmamem_mmap supports _concurrent_ access by
> DMA and userland (for example, does the driver have to
> BUS_DMASYNC_POSTREAD after the DMA and before userland access to
> mmapped memory, or does the equivalent happen automagically, eg in the
> page fault handler, or does bus_dmamem_mmap succeed only on systems
> where no such care needs to be taken, or what?).

Trying to do this in userland on a machine with an I/O cache won't work 
too well.

> My impression is that bus_dma is pretty stable, and, thus, version
> doesn't matter much.  But, in case it matters, 5.2 on amd64.

AFAIK amd64 disables all caches on BUS_DMA_COHERENT, so the sync 
operations aren't really necessary.  But jumping through all these hoops 
is important on other hardware.

Eduardo


Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Mouse
>> I would like to read the DMA buffer while DMA is still going on.
>> [...]
> This is all very hardware dependent.

That doesn't really surprise me; as MI (machine-independent) as bus_dma/bus_space tries to
make device drivers, there are still some things it can't really
compensate for (or can compensate for in theory, but only at a ruinous
performance penalty).

> Make sure you map that area with the BUS_DMA_COHERENT flag.

Thank you!  The manpage called it a hint, so I left it off.  I've added
it; I'll see if that helps anything.

> Many NICs do something like this.

Hmm, true; a NIC descriptor ring is not a bad example of the sort of
thing I want to do: a block of memory I want to read while DMA is
mutating it.

>> But I'm not sure what sort of sync calls I need to make.  [...]
> You want to do a bus_dmamap_sync(BUS_DMASYNC_POSTREAD) [...]
> In the NIC example above, you map the ring buffer with
> BUS_DMA_COHERENT, fill it up and do a
> bus_dmamap_sync(BUS_DMASYNC_PREREAD).  When you want to read it
> (usually after getting an interrupt) you do
> bus_dmamap_sync(BUS_DMASYNC_POSTREAD) before doing the read.

Don't you need to PREWRITE after filling it?  Based on the mental
models I've formed, that feels necessary.

>> [...] bus_dmamem_mmap [...]
> Trying to do this in userland on a machine with an I/O cache won't
> work too well.

That's about what I'd expect.

> AFAIK amd64 disables all caches on BUS_DMA_COHERENT, so the sync
> operations aren't really necessary.  But jumping through all these
> hoops is important on other hardware.

Okay.  If I can make it work on amd64 I'll be content for the moment,
even if I have to be horribly MD about it.  It'd be nice for this to be
MI, but not necessary.  (The hardware is a data acquisition card which
DMAs received data into host memory, but as far as I have been able to
tell has no way for the host to tell how far a DMA transfer in progress
has progressed.  In my case, the data is strongly enough patterned I
can fill the buffer with impossible values and tell how far it's gone
by looking at the buffer...but that means reading the buffer.)
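
A minimal sketch of that sentinel approach, in plain C.  DMA_SENTINEL
and both function names are hypothetical, and the value chosen is only
an example of something the hardware is assumed never to produce.  The
scan relies on the ordering requirement discussed earlier in the thread:
if the CPU sees DMA writes in order, the first sentinel found marks the
end of the data written so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value the device is assumed never to write. */
#define DMA_SENTINEL UINT32_C(0xFFFFFFFF)

/* Fill the whole buffer with the sentinel before starting DMA. */
static void
dma_buf_poison(volatile uint32_t *buf, size_t nwords)
{
	assert(buf != NULL);
	for (size_t i = 0; i < nwords; i++)
		buf[i] = DMA_SENTINEL;
}

/*
 * Return how many words appear to have been written, resuming the
 * scan at 'start' (the result of a previous call, or 0).  Assumes
 * in-order visibility of DMA writes; without that, a non-sentinel
 * word could sit beyond a word that still looks unwritten.
 */
static size_t
dma_buf_progress(const volatile uint32_t *buf, size_t nwords, size_t start)
{
	assert(buf != NULL && start <= nwords);
	size_t i = start;
	while (i < nwords && buf[i] != DMA_SENTINEL)
		i++;
	return i;
}
```

In a driver, each call to the scan would presumably be preceded by a
bus_dmamap_sync(BUS_DMASYNC_POSTREAD) over the range about to be
examined, as suggested above.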

While I don't always mind being MD, I _do_ rather like to know when and
how I am. :-)



Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Eduardo Horvath
On Fri, 27 Oct 2017, Mouse wrote:

> >> But I'm not sure what sort of sync calls I need to make.  [...]
> > You want to do a bus_dmamap_sync(BUS_DMASYNC_POSTREAD) [...]
> > In the NIC example above, you map the ring buffer with
> > BUS_DMA_COHERENT, fill it up and do a
> > bus_dmamap_sync(BUS_DMASYNC_PREREAD).  When you want to read it
> > (usually after getting an interrupt) you do
> > bus_dmamap_sync(BUS_DMASYNC_POSTREAD) before doing the read.
> 
> Don't you need to PREWRITE after filling it?  Based on the mental
> models I've formed, that feels necessary.

You'd want to do a PREWRITE and a POSTWRITE, but since writing wasn't part 
of your usage model I skipped that part.

Eduardo


Re: Access to DMA memory while DMA in progress?

2017-10-30 Thread David Holland
On Fri, Oct 27, 2017 at 11:15:34AM -0400, Mouse wrote:
 > Heh.  Well, reading the x86 bus_dma implementation (amd64 doesn't seem
 > to have a separate bus_dma implementation of its own) leads me to think
 > it has no such issues; all POSTREAD does there is copy from the bounce
 > buffer (if the transfer is bounced) and issue an lfence memory barrier.
 > (Hmm, I wonder whether the memory barrier needs to be before the memcpy.)

This was discussed at length on irc over the weekend, and the
conclusion seems to be "yes". With a side order of "why can't
architecture manuals ever be clear about these things?"
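
As an illustration of the ordering in question only: the sketch below
puts the barrier before the copy out of the bounce buffer.  It is not
the x86 bus_dma code; bounce_postread() is a hypothetical name, and the
C11 acquire fence is a portable stand-in for the lfence mentioned
above, whose exact guarantees with respect to device writes are
precisely what the architecture manuals leave unclear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy bounced data to its final destination.  The fence comes first
 * so that the loads performed by memcpy() are not reordered ahead of
 * it; placing it after the copy would order nothing the copy reads.
 */
static void
bounce_postread(void *dst, const void *bounce, size_t len)
{
	assert(dst != NULL && bounce != NULL);
	atomic_thread_fence(memory_order_acquire);	/* stand-in for lfence */
	memcpy(dst, bounce, len);
}
```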

-- 
David A. Holland
dholl...@netbsd.org