Rusty Russell wrote:
> On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>>>
>>>> Nope. Being async is critical for copyless networking:
>>>>
>> With async operations, the saga continues like this: the host-side
>> driver allocates an skb, get_page()s and attaches the data to the new
>> skb, this skb crosses the bridge, trickles into the real ethernet
>> device, gets queued there, sent, interrupts fire, triggering async
>> completion. On this completion, we send a virtual interrupt to the
>> guest, which tells it to destroy the skb and reclaim the pages attached
>> to it.
>>
>
> Hi Avi!
>
> Thanks for spelling it out, I now understand your POV. I had
> considered it obvious that a (non-async) write which didn't copy would
> block until the skb was finished with, which is easy to code up within
> the tap device itself. Otherwise it's actually an async write without a
> notification mechanism, which I agree is broken.
>
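The async transmit walk-through quoted above can be sketched as a small Python simulation. Everything in it is hypothetical: `GuestPage`, `Skb`, `HostDriver` and the `raise_virq` hook are illustrative stand-ins, not kernel, tap or KVM APIs, and the `pins` counter only mimics what get_page()/put_page() do to a page's reference count. What it shows is that the submit path returns immediately while the guest's pages stay pinned, and only the asynchronous completion unpins them and notifies the guest.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GuestPage:
    """Hypothetical stand-in for a guest page the host borrows without copying."""
    data: bytes
    pins: int = 0  # host-side references, mimicking get_page()/put_page()


@dataclass
class Skb:
    """Hypothetical, minimal skb: it only records the guest pages it borrows."""
    frags: List[GuestPage] = field(default_factory=list)
    on_complete: Callable[["Skb"], None] = lambda skb: None


class HostDriver:
    def __init__(self, raise_virq: Callable[[List[GuestPage]], None]):
        self.raise_virq = raise_virq  # hypothetical "virtual interrupt" hook
        self.device_queue: List[Skb] = []

    def xmit(self, pages: List[GuestPage]) -> Skb:
        """Submit path: pin the pages, attach them, queue, return at once."""
        skb = Skb(on_complete=self._tx_done)
        for page in pages:
            page.pins += 1          # like get_page(): guest must not reuse it yet
            skb.frags.append(page)  # zero copy: the skb points at guest memory
        self.device_queue.append(skb)  # crosses the bridge, queued on the NIC
        return skb

    def device_tx_irq(self) -> None:
        """The real device finished sending: complete everything queued."""
        while self.device_queue:
            skb = self.device_queue.pop(0)
            skb.on_complete(skb)

    def _tx_done(self, skb: Skb) -> None:
        """Async completion: unpin the pages and tell the guest they are free."""
        for page in skb.frags:
            page.pins -= 1          # like put_page()
        self.raise_virq(skb.frags)  # guest may now destroy the skb, reclaim pages
        skb.frags = []
```

The completion callback is the piece the non-async variant has to replace: without it, a copyless write must either block until the skb is freed or become an async write with no notification, which is the broken case Rusty mentions.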
I hadn't considered an always-blocking (or unbuffered) networking API.
It's very counter to current APIs, but does make sense with things like
syslets. Without syslets, I don't think it's very useful, as you need
some artificial threads to keep things humming along.

(How would userspace specify it? O_DIRECT when opening the tap?)

I don't think there's a lot of difference between implementing aio or
always-blocking copyless writes for tap. They just differ in how they
sleep and in how they access user pages.

> Note though: if the guest can change the packet headers they can
> subvert some firewall rules and possibly crash the host. None of the
> networking code I wrote expects packets to change in flight 8(
>
> This applies to a userspace or kernelspace driver.
>

Umm, right. We could write-protect the packets (which would be very
expensive). We could set the evil bit on guest-originated packets, and
rewrite the entire networking stack to copy any part which is inspected
if the evil bit is set. We need more head-scratching on this.

>>> Yes, and this is already present in the tap device. Anthony suggested a
>>> slightly nasty hack for multiple sg packets in one writev()/readv, which
>>> could also give us batching.
>>>
>> No need for hacks if we get list aio support one day.
>>
>
> As you point out though, aio is not something we want to hold our breath
> for. Plus, aio never makes things simpler, and complexity kills
> puppies.
>

The puppies had better stay away from qemu then, as it is completely
async.

Always-blocking writes won't reduce complexity. Suddenly you need a
thread for each request batch and some pleasant code for joining the
threads when done. Syslets do make it go away, though they're more for
the mostly-nonblocking-with-occasional-blockage stuff than for the
always-blocking thingie you describe.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
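The in-flight modification problem raised in this thread is a time-of-check/time-of-use race, and the "copy any part which is inspected" idea can be sketched in a few lines of Python. The toy header layout (a big-endian 16-bit destination port at offset 0), the `BLOCKED_PORT` rule and both function names are hypothetical; a `bytearray` stands in for guest memory the host uses without copying, and a callback simulates the guest rewriting it after the check. This illustrates the idea only, not how the tap device or the networking stack behaves.

```python
import struct
from typing import Callable

BLOCKED_PORT = 23  # hypothetical firewall rule: drop this destination port


def firewall_allows(header) -> bool:
    """Toy filter: reads the destination port from offset 0 of the header."""
    (dport,) = struct.unpack_from("!H", header, 0)
    return dport != BLOCKED_PORT


def forward_in_place(shared: bytearray,
                     guest_meddles: Callable[[bytearray], None]) -> int:
    """Checks and uses guest memory separately: the guest can win the race."""
    if not firewall_allows(shared):
        return -1                  # dropped
    guest_meddles(shared)          # guest rewrites the header after the check
    (dport,) = struct.unpack_from("!H", shared, 0)
    return dport                   # port actually sent


def forward_from_copy(shared: bytearray,
                      guest_meddles: Callable[[bytearray], None],
                      hdr_len: int = 2) -> int:
    """Snapshots every inspected byte into host-private memory first."""
    header = bytes(shared[:hdr_len])  # private copy; payload could stay shared
    if not firewall_allows(header):
        return -1
    guest_meddles(shared)          # later guest writes miss the private copy
    (dport,) = struct.unpack_from("!H", header, 0)
    return dport
```

Only the inspected header bytes are copied here, so the payload can still go out without a copy; that is the trade-off against write-protecting whole packets.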
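To make the complexity point about always-blocking writes concrete, here is a minimal Python sketch of what a caller ends up writing when each copyless write blocks until the host is done with the buffers. `blocking_copyless_write` is hypothetical (it is not a tap interface) and simulates the wait with a short sleep; `submit_batches` shows the thread-per-batch fan-out and the join loop.

```python
import threading
import time
from typing import List


def blocking_copyless_write(batch: List[bytes], sent: List[bytes],
                            lock: threading.Lock) -> None:
    """Hypothetical always-blocking write: returns only once the host has
    finished with the buffers (simulated with a short sleep)."""
    time.sleep(0.01)  # stand-in for "until the skb is freed"
    with lock:
        sent.extend(batch)


def submit_batches(batches: List[List[bytes]]) -> List[bytes]:
    sent: List[bytes] = []
    lock = threading.Lock()
    # One thread per in-flight batch, so the caller never blocks itself.
    workers = [
        threading.Thread(target=blocking_copyless_write, args=(b, sent, lock))
        for b in batches
    ]
    for w in workers:
        w.start()
    # The joining code: wait for every batch before reporting completion.
    for w in workers:
        w.join()
    return sent
```

Batches may finish in any order, so callers cannot rely on the order of `sent` across batches. With aio (or syslets) the same bookkeeping becomes one completion per request instead of one thread per batch.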