On Fri, 2007-08-17 at 01:26 -0400, Gregory Haskins wrote:
> Hi Rusty,
>
> Comments inline...
>
> On Fri, 2007-08-17 at 11:25 +1000, Rusty Russell wrote:
> >
> > Transport has several parts. What the hypervisor knows about (usually
> > shared memory and some interrupt mechanism and possibly "DMA") and what
> > is convention between users (eg. ringbuffer layouts). Whether it's 1:1
> > or n-way (if 1:1, is it symmetrical?).
>
> TBH, I am not sure what you mean by 1:1 vs n-way ringbuffers (it's
> probably just lack of sleep and tomorrow I will smack myself for
> asking ;)
>
> But could you elaborate here?

Hi Gregory,

Sure, these discussions can get pretty esoteric. The question is
whether you want a point-to-point transport (as we discuss here), or an
N-way one. Lguest has N-way, but I'm not convinced it's worthwhile, as
there's some overhead involved in looking up recipients (basically futex
code).

> > And not having inter-guest is just
> > poor form (and putting it in later is impossible, as we'll see).
>
> I agree that having an ability to do inter-guest is a good idea.
> However, I don't know if I am convinced that it has to be done in a
> direct, zero-copy way. Mediating through the host certainly can work and
> is probably acceptable for most things. In this way the host is
> essentially acting as a DMA agent to copy from one guest's memory to the
> other. It solves the "trust" issue and simplifies the need to have a
> "grant table" like mechanism which can get pretty hairy, IMHO.

I agree that page sharing is silly. But we can design a mechanism where
such a "DMA agent" need only enforce a few very simple rules, not the
whole protocol, and yet the guest doesn't know whether it's talking to
an agent or the host.

> > So we end up with an array of descriptors with next pointers, and two
> > ring buffers which refer to those descriptors: one for what descriptors
> > are pending, and one for what descriptors have been used (by the other
> > end).
>
> That's certainly one way to do it. IOQ (coming from the "simple ordered
> event sequence" mindset) has one logically linear ring. It uses a set
> of two "head/tail" indices ("valid" and "inuse") and an ownership flag
> (per descriptor) to essentially offer similar services as you mention.
> Producers "push" items at the index head, and consumers "pop" items from
> the index tail. Only the guest side can manipulate the valid index.
> Only the producer can manipulate the inuse-head. And only the consumer
> can manipulate the inuse-tail. Either side can manipulate the ownership
> bit, but only in strict accordance with the production or consumption of
> data.

Well, for cache reasons you should really try to avoid having both sides
write to the same data. Hence two separate cache-aligned regions are
better than one region and a flip bit. And if you make them separate
pages, then this can also be inter-guest safe 8)

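To make the layout concrete, here is a rough sketch of a descriptor
array plus two rings, each ring written by only one side. All names,
the ring size and the 64-byte cache line are assumptions made up for
illustration; this is not lguest or IOQ code, and a real version would
need memory barriers around the index updates.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE  256  /* hypothetical; any power of two */
#define CACHE_LINE 64   /* assumed cache line size */

/* One buffer descriptor; 'next' chains descriptors into one request. */
struct desc {
    uint64_t addr;   /* buffer address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* e.g. read/write type, valid bit */
    uint16_t next;   /* index of the next descriptor in the chain */
};

/* Written only by the side that posts buffers. */
struct pending_ring {
    alignas(CACHE_LINE) uint16_t head;  /* free-running count of posts */
    uint16_t ring[RING_SIZE];           /* descriptor indices */
};

/* Written only by the other end, once it is done with a descriptor. */
struct used_ring {
    alignas(CACHE_LINE) uint16_t head;  /* free-running count of returns */
    uint16_t ring[RING_SIZE];
};

struct queue {
    struct desc descs[RING_SIZE];
    struct pending_ring pending;
    struct used_ring used;
};

/* The two rings start on different cache lines, so the two sides never
 * write to the same line. Page alignment would be the inter-guest case. */
_Static_assert(offsetof(struct queue, pending) % CACHE_LINE == 0,
               "pending ring must be cache aligned");
_Static_assert(offsetof(struct queue, used) % CACHE_LINE == 0,
               "used ring must be cache aligned");

/* Poster: publish descriptor index d. Touches only 'pending'. */
static inline void post_desc(struct queue *q, uint16_t d)
{
    q->pending.ring[q->pending.head % RING_SIZE] = d;
    /* a real implementation needs a write barrier here */
    q->pending.head++;
}

/* Other end: hand descriptor index d back. Touches only 'used'. */
static inline void return_desc(struct queue *q, uint16_t d)
{
    q->used.ring[q->used.head % RING_SIZE] = d;
    /* a real implementation needs a write barrier here */
    q->used.head++;
}
```

Each side reads the other's ring but writes only its own, so there is
no shared flip bit bouncing between caches.
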
> One thing that is particularly cool about the IOQ design is that it's
> possible to get to 0 IO events for certain circumstances. For instance,
> if you look at the IOQNET driver, it has what I would call
> "bidirectional NAPI". I think everyone here probably understands how
> standard NAPI disables RX interrupts after the first packet is received.
> Well, IOQNET can also disable TX hypercalls after the first one goes
> down to the host. Any subsequent writes will simply post to the queue
> until the host catches up and re-enables "interrupts". Maybe all of
> these queue schemes typically do that...I'm not sure...but I thought it
> was pretty cool.

Yeah, I agree. I'm not sure how important it is IRL, but it *feels*
clever 8)

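For anyone following along, the TX half of that trick can be sketched
roughly as below. This is a single-threaded illustration with made-up
names (struct txq, notify_wanted, notify_host), not IOQNET code; which
side clears the flag is my assumption, and a real version needs
atomics or barriers between guest and host.

```c
#include <stdbool.h>
#include <stdint.h>

struct txq {
    uint32_t head;           /* producer (guest) index */
    uint32_t tail;           /* consumer (host) index */
    bool     notify_wanted;  /* host asks to be kicked on the next post */
    unsigned hypercalls;     /* counts kicks, for illustration only */
};

/* Stand-in for the real hypercall: just count it. */
static void notify_host(struct txq *q)
{
    q->hypercalls++;
}

/* Guest side: queue one packet, kicking the host only if it asked. */
static void tx_post(struct txq *q)
{
    q->head++;
    if (q->notify_wanted) {
        q->notify_wanted = false;  /* further posts just queue up */
        notify_host(q);
    }
}

/* Host side: drain everything, then re-enable kicks. The re-check
 * after setting the flag closes the window where the guest posted
 * while the flag was still clear. Returns the number drained. */
static unsigned host_drain(struct txq *q)
{
    unsigned n = 0;

    for (;;) {
        while (q->tail != q->head) {
            q->tail++;
            n++;
        }
        q->notify_wanted = true;
        if (q->tail == q->head)
            break;
        q->notify_wanted = false;
    }
    return n;
}
```

With the flag initially set, a burst of posts costs one hypercall; the
rest simply land in the queue until the host catches up.
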
> > (1) have the hypervisor be aware of the descriptor page format, location
> > and which guest can access it.
> > (2) have the descriptors themselves contain a type (read/write) and a
> > valid bit.
> > (3) have a "DMA" hypercall to copy to/from someone else's descriptors.
> >
> > Note that this means we do a copy for the untrusted case which doesn't
> > exist for the trusted case. In theory the hypervisor could do some
> > tricky copy-on-write page-sharing for very large well-aligned buffers,
> > but it remains to be seen if that is actually useful.
>
> That sounds *somewhat* similar to what I was getting at above with the
> dma/loopback thingy. Though you are talking about that "grant table"
> stuff and are scaring me ;)

Yeah, I fear grant tables too. But in any scheme, the descriptors imply
permission, so with a little careful design and implementation it should
"just work"...

Cheers,
Rusty.
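
On "the descriptors imply permission": the three rules quoted above
(known descriptor location, a type plus valid bit, and a "DMA"
hypercall) might boil down to a check like the following. This is only
a sketch under my own assumptions: hv_dma_copy, struct dma_desc and the
flag values are hypothetical, and a real hypervisor would work on
guest-physical addresses rather than plain pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_VALID 0x1  /* descriptor is live and may be used */
#define DESC_WRITE 0x2  /* owner accepts writes; clear means read-only */

struct dma_desc {
    void    *buf;    /* stand-in for a guest-physical address */
    uint32_t len;    /* size of the owner's buffer */
    uint32_t flags;  /* DESC_VALID, DESC_WRITE */
};

/* Copy between the caller's 'local' buffer and a peer's descriptor.
 * write != 0 copies local -> peer, write == 0 copies peer -> local.
 * The only rules enforced: the descriptor is valid, the direction
 * matches its type, and the copy never exceeds its length.
 * Returns bytes copied, or -1 if the descriptor forbids the access. */
static long hv_dma_copy(const struct dma_desc *peer, void *local,
                        uint32_t len, int write)
{
    if (!(peer->flags & DESC_VALID))
        return -1;
    if (write && !(peer->flags & DESC_WRITE))
        return -1;
    if (!write && (peer->flags & DESC_WRITE))
        return -1;

    if (len > peer->len)
        len = peer->len;  /* clamp to what the owner offered */

    if (write)
        memcpy(peer->buf, local, len);
    else
        memcpy(local, peer->buf, len);
    return (long)len;
}
```

The agent knows nothing about the ring protocol here; it only honours
what each descriptor says may be done with it.
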
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel