--- Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> On Thu, Nov 09, 2006 at 07:06:00PM -0800, Jonathan Day ([EMAIL PROTECTED]) wrote:
> > Hi,
> >
> > I've got an interesting problem to contend with and need some advice
> > from the great wise ones here.
> >
> > First of all, is it possible (and/or "reasonable practice") when
> > developing a network driver to do zero-copy transfers between main
> > memory and the network device?
>
> What do you mean?
> DMA from NIC memory into CPU memory?
Yes. I want to bypass the kernel altogether and think there may be a way
to do this, but I want to make very certain that I'm not going down the
wrong track.

The underlying problem is this. The group I'm working with is building
their own networking device that will run at the same speed as the bus
leading to the host (2.5 gigabits/second). The device has its own DMA
controller and can operate as bus master. It's my task to figure out how
to get the data into the host at near-100% bandwidth without dropping
anything, with minimal latency and real-time characteristics. (I talked
them out of making me do this blindfolded, but on further consideration,
I'm not sure that was a good idea.)

> > Secondly, the network device is only designed to work with short
> > packets and I really want to keep the throughput up. My thought was
> > that if I fired off an interrupt then transferred a page of data into
> > an area I know is safe, the kernel would have enough time to find a
> > new safe area and post the address before the next page is ready to
> > send.
> >
> > Can anyone suggest why this wouldn't work or, assuming it can work,
> > why this would be a Bad Idea?
>
> There should not be any kind of 'kernel will have enough time to do
> something'; instead you must guarantee that there will not be any kind
> of races. You can either preallocate several buffers or allocate them
> on demand in interrupts.

The exact process I was thinking of is as follows:

1. The driver sets up a full page and pins it.
2. The driver obtains the physical address and places that address into
   a known, fixed location.
3. The driver sends an interrupt to the network device to say that
   everything is ready.
4. On receiving the interrupt, the network device sets a bit to record
   that the host is ready to receive.
5. The network device keeps a counter of the number of packets that can
   be put in one page.
   If this counter is zero and the bit is set, then:
   5.1. The counter is set to the maximum number of packets storable in
        the page.
   5.2. The page address is DMAed out of the known location in host
        memory and placed in network device memory.
   5.3. The bit is cleared.
   5.4. The driver is given an interrupt to tell it to prepare the next
        page.
   5.5. If the sender had previously been told to stop transmitting, it
        is now told to continue.
6. Every received packet is placed in a ring buffer on the network
   device.
7. The counter is reduced by 1.
8. If the counter reaches zero, OR the packet is followed by an
   end-of-transmission notification:
   8.1. The ring buffer is DMAed en masse into host memory at the
        location given in the previously cached page address.
   8.2. If the new-page bit has not been set, the network device
        notifies the sender to pause.

The idea is that we're DMAing to a page that we know is safe, because
the driver has ensured that before telling the network device where it
is. (That is, the driver ensures that the page actually exists, has been
fully set up in the VMM, and isn't going to move in physical memory or
into swap.)

The above description could be enhanced by having two or more available
pages, so that the device could switch between them rapidly. The only
requirement for a sustainable system is that I can allocate and pin
pages at an average rate equal to or faster than they are consumed. The
xon/xoff-style flow control is then simply a way of guaranteeing that
the network device never runs out of places to put things.

The people working on the hardware have said that they can handle the
hardware side of my description, but I want to make sure that (a) this
will actually work the way I expect, and (b) there isn't something
staring me in the face that's a billion times easier and a trillion
times more efficient.
What I'm looking for is every argument that can possibly be thrown
against this method: latency, throughput, accepted standards for Linux
drivers, excessive weirdness, whatever.

> > Lastly, assuming my sanity lasts that long, would I be correct in
> > assuming that the first step in the process of getting the driver
> > peer-reviewed and accepted would be to post the patches here?
>
> Actually not, the first step in that process is learning jig dance and
> of course providing enough beer and other goodies to network
> maintainers.

I tried doing a jig dance once, but the saw cut my shoes in half. I can
try getting beer. Oregon has some acceptable microbreweries, but I miss
having a pint of Hatters in England. Mead is easier. I brew mead.
Strong, dry, rocket-fuel mead.

> > Thanks for any help,
>
> No problem, but to answer at least one of your questions, more
> information should be provided.

Hopefully what's there now is sufficient, though I'd be happy to add
more if need be.