Rusty Russell wrote:
> On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>>>   
>>>       
>>>> Nope.  Being async is critical for copyless networking:
>>>>
>>>>         
>> With async operations, the saga continues like this: the host-side 
>> driver allocates an skb, get_page()s and attaches the data to the new 
>> skb, this skb crosses the bridge, trickles into the real ethernet 
>> device, gets queued there, sent, interrupts fire, triggering async 
>> completion.  On this completion, we send a virtual interrupt to the 
>> guest, which tells it to destroy the skb and reclaim the pages attached 
>> to it.
>>     
>
> Hi Avi!
>
>       Thanks for spelling it out, I now understand your POV.  I had
> considered it obvious that a (non-async) write which didn't copy would
> block until the skb was finished with, which is easy to code up within
> the tap device itself.  Otherwise it's actually an async write without a
> notification mechanism, which I agree is broken.
>
>   

I hadn't considered an always-blocking (or unbuffered) networking API. 
It runs counter to current APIs, but it does make sense with things like
syslets.  Without syslets, I don't think it's very useful, as you need
some artificial threads to keep things humming along.

(How would userspace specify it? O_DIRECT when opening the tap?)

I don't think there's a lot of difference between implementing aio and
always-blocking copyless writes for tap.  They just differ in how they
sleep and in how they access user pages.

>       Note though: if the guest can change the packet headers they can
> subvert some firewall rules and possibly crash the host.  None of the
> networking code I wrote expects packets to change in flight 8(
>
>       This applies to a userspace or kernelspace driver.
>
>   

Umm, right.  We could write-protect the packets (which would be very
expensive).  We could set the evil bit on guest-originated packets, and
rewrite the entire networking stack to copy any part which is inspected
if the evil bit is set.  We need more head-scratching on this.

>>> Yes, and this is already present in the tap device.  Anthony suggested a
>>> slightly nasty hack for multiple sg packets in one writev()/readv, which
>>> could also give us batching.
>>>       
>> No need for hacks if we get list aio support one day.
>>     
>
> As you point out though, aio is not something we want to hold our breath
> for.  Plus, aio never makes things simpler, and complexity kills
> puppies.
>   

The puppies had better stay away from qemu then, as it is completely async.

Always-blocking writes won't reduce complexity.  Suddenly you need a
thread for each request batch and some pleasant code for joining the
threads when done.  Syslets do make that go away, though they're more for
the mostly-nonblocking-with-occasional-blockage stuff rather than the
always-blocking thingie you describe.
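
To make the thread-per-batch cost concrete, here is a rough userspace
sketch, not code from qemu or the tap driver.  It assumes a hypothetical
fd whose writev() blocks until the data has been consumed; struct batch,
batch_writer(), submit_batches() and MAX_BATCHES are names made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_BATCHES 16  /* arbitrary limit for this sketch */

/* One batch of scatter-gather buffers, written with a single writev(). */
struct batch {
    int fd;                   /* stand-in for the tap fd (hypothetical) */
    const struct iovec *iov;  /* buffers making up the batch */
    int iovcnt;               /* number of entries in iov */
    ssize_t result;           /* bytes written, or -1 on error */
};

/*
 * Thread body.  Assumption: on the real device this writev() would
 * block until the buffers are no longer in use.
 */
static void *batch_writer(void *arg)
{
    struct batch *b = arg;

    b->result = writev(b->fd, b->iov, b->iovcnt);
    return NULL;
}

/*
 * Issue every batch on its own thread, then join them all.
 * Returns 0 if every thread was created and joined, -1 otherwise.
 * Per-batch byte counts are left in batches[i].result.
 */
static int submit_batches(struct batch *batches, size_t n)
{
    pthread_t tid[MAX_BATCHES];
    size_t started = 0;
    int rc = 0;

    assert(n <= MAX_BATCHES);

    for (; started < n; started++) {
        if (pthread_create(&tid[started], NULL, batch_writer,
                           &batches[started]) != 0) {
            rc = -1;
            break;
        }
    }

    /* Join only the threads that were actually started. */
    for (size_t i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) != 0)
            rc = -1;
    }
    return rc;
}
```

Build with -pthread.  Even in this toy form, the caller has to carry
thread bookkeeping and a join step purely to keep more than one batch in
flight, which is the extra complexity I mean.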

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
