Re: Am I using bus_dma right?
> You missed the most important part of my response:
>>> So I have to treat it like a DMA write even if there is never any
>>> write-direction DMA actually going on?
>> Yes.

Actually, I didn't miss that, though now that you mention it I can see
how it would seem like it, because I didn't overtly respond to it.  I
probably should have said something like "Got it!" there.

And, I think this too I didn't say, but thank you very much - you and
everyone who's contributed - for taking the time and effort to help me
with this!

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Am I using bus_dma right?
You missed the most important part of my response:

On Fri, 24 Apr 2020, Eduardo Horvath wrote:

> > So I have to treat it like a DMA write even if there is never any
> > write-direction DMA actually going on?
>
> Yes.
>
> Then the problem *probably* is not bus_dma botchery.

Eduardo
Re: Am I using bus_dma right?
>> I've been treating it as though my inspection of a given sample in
>> the buffer counts as "transfer completed" for purposes of that
>> sample.
> Are you inspecting the buffer only after receipt of an interrupt or
> are you polling?

Polling.  (Polls are provoked by userland doing a read() that ends up
in my driver's read routine.)

> [...] POSTWRITE does tell the kernel it can free up any bounce
> buffers it may have allocated if it allocated bounce buffers, but I
> digress.

Someone else asked me (off-list) if bounce buffers were actually in
use.  I don't know; when next I'm back at that job (it's only three
days a week), one thing I intend to do is peek under the hood of the
bus_dma structures and find out.

>> For my immediate needs, I don't care about anything other than
>> amd64.  But I'd prefer to understand the paradigm properly for the
>> benefit of potential future work.
> I believe if you use COHERENT on amd64 none of this matters since it
> turns off caching on those memory regions.  (But I don't have time
> to grep the sources to verify this.)

I do - or, rather, I will.  I don't recall whether I'm using COHERENT,
but it's easy enough to add if I'm not.

>> And, indeed, I tried making the read routine do POSTREAD|POSTWRITE
>> before and PREREAD|PREWRITE after its read-test-write of the
>> samples, and it didn't help.
> Ah, now we're getting to something interesting.
> What failure mode are you seeing?

That off-list person asked me that too.  I wrote up a more detailed
explanation, and I saved it in case someone else wanted it; I'll
include the relevant text below.  The short summary is that I'm seeing
data get _severely_ delayed before reaching CPU visibility - severely
delayed as in multiple seconds.  Here's why I believe that's what's
going on:

Perhaps I should explain why I believe what I do about the behaviour.
The commercial product is a turnkey system involving some heavily
custom application-specific hardware.
It generates blobs of data which historically it sent up to the host
over the 7300 to a DOS program (the version in use when I came into
the project was running under DOS, and I was brought in to help move
it to something more modern).

Shortly into the project, we learned that the 7300 had been EOLed by
Adlink with no replacement device suggested.  We cast about and ended
up putting another small CPU on the generating end which sends the
data up over Ethernet.  The only reason we still care about data over
the 7300 is a relatively large installed base that doesn't have
Ethernet-capable data-generating hardware, but which we want to
upgrade (the DOS versions have various problems in addition to lacking
features).

But my test hardware does have Ethernet.  And, in the presence of that
hardware, it always sends the data both ways, both as Ethernet packets
and as signals on differential pairs (which get converted to the
single-ended signals the 7300 needs very close to it - the
differential pairs are for better noise immunity over a relatively
long cable run in end-user installations).

For my test runs, I not only ran the application, which I told to read
from the 7300, but also a snoop program, which (a) uses BPF to capture
the Ethernet form of the data and (b) uses a snoop facility I added to
the 7300 driver to record a copy of everything that got returned
through a read() call.  I also added code to the userland application
to record everything it gets from read().  (The driver code I put up
for FTP does not have that.  I can make that version available too if
you want.)

What I'm seeing is an Ethernet packet arriving containing, let us say,
11 22 33 44 55 66 77 88 99 aa bb cc dd ee, but the 7300 driver
returning only, say, 11 22 33 44 55 66 to userland, after which
userland calls read() many times - enough to burn a full second of
time - getting "no data here" each time (see the next paragraph for
what this means).
Multiple seconds later, after userland has timed out and gone into its
"I'm not getting data" recovery mode, the driver sees the 77 88 99 aa
bb cc dd ee part and passes it back to userland.

When userland calls read(), the driver inspects the next sample slot
in the DMA buffer, looking to see whether it's been overwritten.  If
it has, the samples it finds are passed back to userland and their
places in the buffer are written over with a value that cannot
correspond to a sample (23 of the 32 data pins are grounded, so those
bits cannot be nonzero in a sample).

The driver uses interrupts only to deal with the case of data arriving
over the 7300 but userland not reading it.  The driver wants to track
where the hardware is DMAing into, so it knows where to look for new
data.  I configure the hardware to interrupt every half-meg of data
(in a 16M buffer); if the writing is getting too close to the reading,
I push the read point forward, clearing the buffer to the "impossible"
value in the process.  But, in the tests I'm doing, I doubt that's
happening (I can
Re: Am I using bus_dma right?
On Thu, 23 Apr 2020, Mouse wrote:

> Okay, here's the first problem.  There is no clear "transaction
> completes".  Let's clarify that.
>
> The card has a DMA engine on it (a PLX9080, on the off chance you've
> run into it before) that can DMA into chained buffers.  I set it up
> with a ring of buffers - a chain of buffers with the last buffer
> pointing to the first, none of them with the "end of chain" bit set
> - and tell it to go.  I request an interrupt at completion of each
> buffer, so I have a buffer-granularity idea of where it's at, modulo
> interrupt servicing latency.
>
> This means that there is no clear "this transfer has completed"
> moment.  What I want to do is inspect the DMA buffer to see how far
> it's been overwritten, since there is a data value I know cannot be
> generated by the hardware that's feeding samples to the card (over
> half the data pins are hardwired to known logic levels).
>
> I've been treating it as though my inspection of a given sample in
> the buffer counts as "transfer completed" for purposes of that
> sample.

Are you inspecting the buffer only after receipt of an interrupt or
are you polling?

> > When you do a write operation you should:
> >
> > 1) Make sure the buffer contains all the data you want to
> > transmit.
> >
> > 2) Do a BUS_DMASYNC_PREWRITE to make sure any data that may remain
> > in the CPU writeback cache is flushed to memory.
> >
> > 3) Tell the hardware to do the write operation.
> >
> > 4) When the write operation completes... well it shouldn't matter.
>
> ...but, according to the 8.0 manpage, I should do a POSTWRITE
> anyway, and going under the hood (this is all on amd64), I find that
> PREREAD is a no-op and POSTWRITE might matter because it issues an
> mfence to avoid memory access reordering issues.

I doubt the mfence does much of anything in this circumstance, but
POSTWRITE does tell the kernel it can free up any bounce buffers it
may have allocated if it allocated bounce buffers, but I digress.
> > If you have a ring buffer you should try to map it CONSISTENT
> > which will disable all caching of that memory.
>
> CONSISTENT?  I don't find that anywhere; do you mean COHERENT?

Yes, COHERENT.  (That's what I get for relying on my memory.)

> > However, some CPUs will not allow you to disable caching, so you
> > should put in the appropriate bus_dmamap_sync() operations so the
> > code will not break on those machines.
>
> For my immediate needs, I don't care about anything other than
> amd64.  But I'd prefer to understand the paradigm properly for the
> benefit of potential future work.

I believe if you use COHERENT on amd64 none of this matters since it
turns off caching on those memory regions.  (But I don't have time to
grep the sources to verify this.)

> > Then copy the data out of the ring buffer and do another
> > BUS_DMASYNC_PREREAD or BUS_DMASYNC_PREWRITE as appropriate.
>
> Then I think I was already doing everything necessary.  And, indeed,
> I tried making the read routine do POSTREAD|POSTWRITE before and
> PREREAD|PREWRITE after its read-test-write of the samples, and it
> didn't help.

Ah, now we're getting to something interesting.  What failure mode are
you seeing?

> >> One of the things that confuses me is that I have no
> >> write-direction DMA going on at all; all the DMA is in the read
> >> direction.  But there is a driver write to the buffer that is, to
> >> put it loosely, half of a write DMA operation (the "host writes
> >> the buffer" half).
> > When the CPU updates the contents of the ring buffer it *is* a DMA
> > write,
>
> Well, maybe from bus_dma's point of view, but I would not say there
> is write-direction DMA happening unless something DMAs data out of
> memory.
>
> > even if the device never tries to read the contents, since the
> > update must be flushed from the cache to DRAM or you may end up
> > reading stale data later.
>
> So I have to treat it like a DMA write even if there is never any
> write-direction DMA actually going on?

Yes.
Then the problem *probably* is not bus_dma botchery.

Eduardo