Re: [PATCHv2 1/2] mm: export use_mm/unuse_mm to modules

2009-09-16 Thread Andrew Morton
On Thu, 17 Sep 2009 08:38:18 +0300 "Michael S. Tsirkin"  wrote:

> Hi Andrew,
> On Tue, Aug 11, 2009 at 03:10:10PM -0700, Andrew Morton wrote:
> > On Wed, 12 Aug 2009 00:27:52 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > vhost net module wants to do copy to/from user from a kernel thread,
> > > which needs use_mm (like what fs/aio has).  Move that into mm/ and
> > > export to modules.
> > 
> > OK by me.  Please include this change in the virtio patchset.  Which I
> > shall cheerfully not be looking at :)
> 
> The virtio patches are somewhat delayed as we are ironing out the
> kernel/user interface with Rusty. Can the patch moving use_mm to mm/ be
> applied without exporting to modules for now? This will make it easier
> for virtio which will only have to patch in the EXPORT line.

That was 10,000 patches ago.

> I also have a small patch optimizing atomic usage in use_mm (which I did for
> virtio) and it's easier to apply it if the code is in the new place.
> 
> If ok, pls let me know and I'll post the patch without the EXPORT line.

Please just send them all out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv2 1/2] mm: export use_mm/unuse_mm to modules

2009-09-16 Thread Michael S. Tsirkin
Hi Andrew,
On Tue, Aug 11, 2009 at 03:10:10PM -0700, Andrew Morton wrote:
> On Wed, 12 Aug 2009 00:27:52 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > vhost net module wants to do copy to/from user from a kernel thread,
> > which needs use_mm (like what fs/aio has).  Move that into mm/ and
> > export to modules.
> 
> OK by me.  Please include this change in the virtio patchset.  Which I
> shall cheerfully not be looking at :)

The virtio patches are somewhat delayed as we are ironing out the
kernel/user interface with Rusty. Can the patch moving use_mm to mm/ be
applied without exporting to modules for now? This will make it easier
for virtio which will only have to patch in the EXPORT line.

I also have a small patch optimizing atomic usage in use_mm (which I did for
virtio) and it's easier to apply it if the code is in the new place.

If ok, pls let me know and I'll post the patch without the EXPORT line.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
>>> There is no role reversal.
>> So if I have virtio-blk driver running on the x86 and vhost-blk device
>> running on the ppc board, I can use the ppc board as a block-device.
>> What if I really wanted to go the other way?
> 
> It seems ppc is the only one that can initiate DMA to an arbitrary
> address, so you can't do this really, or you can by tunneling each
> request back to ppc, or doing an extra data copy, but it's unlikely to
> work well.
> 
> The limitation comes from hardware, not from the API we use.

Understood, but presumably it can be exposed as a sub-function of the
ppc's board's register file as a DMA-controller service to the x86.
This would fall into the "tunnel requests back" category you mention
above, though I think "tunnel" implies a heavier protocol than it would
actually require.  This would look more like a PIO cycle to a DMA
controller than some higher layer protocol.

You would then utilize that DMA service inside the memctx, and it the
rest of vbus would work transparently with the existing devices/drivers.

I do agree it would require some benchmarking to determine its
feasibility, which is why I was careful to say things like "may work"
;).  I also do not even know if its possible to expose the service this
way on his system.  If this design is not possible or performs poorly, I
admit vbus is just as hosed as vhost in regard to the "role correction"
benefit.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 10:10:55AM -0400, Gregory Haskins wrote:
> > There is no role reversal.
> 
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?

It seems ppc is the only one that can initiate DMA to an arbitrary
address, so you can't do this really, or you can by tunneling each
request back to ppc, or doing an extra data copy, but it's unlikely to
work well.

The limitation comes from hardware, not from the API we use.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 10:22 PM, Gregory Haskins wrote:
>> Avi Kivity wrote:
>>   
>>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>> 
> If kvm can do it, others can.
>
>  
 The problem is that you seem to either hand-wave over details like
 this,
 or you give details that are pretty much exactly what vbus does
 already.
My point is that I've already sat down and thought about these
 issues
 and solved them in a freely available GPL'ed software package.


>>> In the kernel.  IMO that's the wrong place for it.
>>>  
>> 3) "in-kernel": You can do something like virtio-net to vhost to
>> potentially meet some of the requirements, but not all.
>>
>> In order to fully meet (3), you would need to do some of that stuff you
>> mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
>> we need to have a facility for mapping eventfds and establishing a
>> signaling mechanism (like PIO+qid), etc. KVM does this with
>> IRQFD/IOEVENTFD, but we dont have KVM in this case so it needs to be
>> invented.
>>
> 
> irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

Not per se, but it needs to be interfaced.  How do I register that
eventfd with the fastpath in Ira's rig? How do I signal the eventfd
(x86->ppc, and ppc->x86)?

To take it to the next level, how do I organize that mechanism so that
it works for more than one IO-stream (e.g. address the various queues
within ethernet or a different device like the console)?  KVM has
IOEVENTFD and IRQFD managed with MSI and PIO.  This new rig does not
have the luxury of an established IO paradigm.

Is vbus the only way to implement a solution?  No.  But it is _a_ way,
and its one that was specifically designed to solve this very problem
(as well as others).

(As an aside, note that you generally will want an abstraction on top of
irqfd/eventfd like shm-signal or virtqueues to do shared-memory based
event mitigation, but I digress.  That is a separate topic).

> 
>> To meet performance, this stuff has to be in kernel and there has to be
>> a way to manage it.
> 
> and management belongs in userspace.

vbus does not dictate where the management must be.  Its an extensible
framework, governed by what you plug into it (ala connectors and devices).

For instance, the vbus-kvm connector in alacrityvm chooses to put DEVADD
and DEVDROP hotswap events into the interrupt stream, because they are
simple and we already needed the interrupt stream anyway for fast-path.

As another example: venet chose to put ->call(MACQUERY) "config-space"
into its call namespace because its simple, and we already need
->calls() for fastpath.  It therefore exports an attribute to sysfs that
allows the management app to set it.

I could likewise have designed the connector or device-model differently
as to keep the mac-address and hotswap-events somewhere else (QEMU/PCI
userspace) but this seems silly to me when they are so trivial, so I didn't.

> 
>> Since vbus was designed to do exactly that, this is
>> what I would advocate.  You could also reinvent these concepts and put
>> your own mux and mapping code in place, in addition to all the other
>> stuff that vbus does.  But I am not clear why anyone would want to.
>>
> 
> Maybe they like their backward compatibility and Windows support.

This is really not relevant to this thread, since we are talking about
Ira's hardware.  But if you must bring this up, then I will reiterate
that you just design the connector to interface with QEMU+PCI and you
have that too if that was important to you.

But on that topic: Since you could consider KVM a "motherboard
manufacturer" of sorts (it just happens to be virtual hardware), I don't
know why KVM seems to consider itself the only motherboard manufacturer
in the world that has to make everything look legacy.  If a company like
ASUS wants to add some cutting edge IO controller/bus, they simply do
it.  Pretty much every product release may contain a different array of
devices, many of which are not backwards compatible with any prior
silicon.  The guy/gal installing Windows on that system may see a "?" in
device-manager until they load a driver that supports the new chip, and
subsequently it works.  It is certainly not a requirement to make said
chip somehow work with existing drivers/facilities on bare metal, per
se.  Why should virtual systems be different?

So, yeah, the current design of the vbus-kvm connector means I have to
provide a driver.  This is understood, and I have no problem with that.

The only thing that I would agree has to be backwards compatible is the
BIOS/boot function.  If you can't support running an image like the
Windows installer, you are hosed.  If you can't use your ethernet until
you get a chance to install a driver after the install completes, its
just like most other systems in existence.  IOW: It's not a big deal.

For cases where the IO system is needed 

Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication

2009-09-16 Thread Anthony Liguori
Alan Cox wrote:
>> This device is very much a serial port.  I don't see any reason not
>> to treat it like one.
>> 
>
> Here are a few
>
> - You don't need POSIX multi-open semantics, hangup and the like
>   

We do actually want hangup and a few other of the tty specific ops.  The 
only thing we really don't want is a baud rate.

> - Seek makes sense on some kinds of fixed attributes
>   

I don't think we're dealing with fixed attributes.  These are streams.  
Fundamentally, this is a paravirtual uart.  The improvement over a 
standard uart is that there can be a larger number of ports, ports can 
have some identification associated with them, and we are not 
constrained to the emulated hardware interface which doesn't exist on 
certain platforms (like s390).

> - TTY has a relatively large memory overhead per device
> - Sysfs is what everything else uses
> - Sysfs has some rather complete lifetime management you'll need to
>   redo by hand
>   

sysfs doesn't model streaming data which is what this driver provides.

> - You don't need idiotic games with numbering spaces
>
> Abusing tty for this is ridiculous.

If the argument is that tty is an awkward interface that should only be 
used for legacy purposes, then sure, we should just implement a new 
userspace interface for this.  In fact, this is probably supported by 
the very existence of hvc.

On the other hand, this is fundamentally a paravirtual serial device.  
Since serial devices are exposed via the tty subsystem, it seems like a 
logical choice.

>  In some ways putting much of it in
> kernel is ridiculous too as you can do it with a FUSE fs or simply
> export the info guest-guest using SNMP.
>   

This device cannot be implemented as-is in userspace because it depends 
on DMA which precludes the use of something like uio_pci.  We could 
modify the device to avoid dma if the feeling was that there was no 
interest in putting this in the kernel.

Regards,

Anthony Liguori
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 10:22 PM, Gregory Haskins wrote:
> Avi Kivity wrote:
>
>> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>  
 If kvm can do it, others can.

  
>>> The problem is that you seem to either hand-wave over details like this,
>>> or you give details that are pretty much exactly what vbus does already.
>>>My point is that I've already sat down and thought about these issues
>>> and solved them in a freely available GPL'ed software package.
>>>
>>>
>> In the kernel.  IMO that's the wrong place for it.
>>  
> 3) "in-kernel": You can do something like virtio-net to vhost to
> potentially meet some of the requirements, but not all.
>
> In order to fully meet (3), you would need to do some of that stuff you
> mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
> we need to have a facility for mapping eventfds and establishing a
> signaling mechanism (like PIO+qid), etc. KVM does this with
> IRQFD/IOEVENTFD, but we dont have KVM in this case so it needs to be
> invented.
>

irqfd/eventfd is the abstraction layer, it doesn't need to be reabstracted.

> To meet performance, this stuff has to be in kernel and there has to be
> a way to manage it.

and management belongs in userspace.

> Since vbus was designed to do exactly that, this is
> what I would advocate.  You could also reinvent these concepts and put
> your own mux and mapping code in place, in addition to all the other
> stuff that vbus does.  But I am not clear why anyone would want to.
>

Maybe they like their backward compatibility and Windows support.

> So no, the kernel is not the wrong place for it.  Its the _only_ place
> for it.  Otherwise, just use (1) and be done with it.
>
>

I'm talking about the config stuff, not the data path.

>>   Further, if we adopt
>> vbus, if drop compatibility with existing guests or have to support both
>> vbus and virtio-pci.
>>  
> We already need to support both (at least to support Ira).  virtio-pci
> doesn't work here.  Something else (vbus, or vbus-like) is needed.
>

virtio-ira.

>>> So the question is: is your position that vbus is all wrong and you wish
>>> to create a new bus-like thing to solve the problem?
>>>
>> I don't intend to create anything new, I am satisfied with virtio.  If
>> it works for Ira, excellent.  If not, too bad.
>>  
> I think that about sums it up, then.
>

Yes.  I'm all for reusing virtio, but I'm not going switch to vbus or 
support both for this esoteric use case.

>>> If so, how is it
>>> different from what Ive already done?  More importantly, what specific
>>> objections do you have to what Ive done, as perhaps they can be fixed
>>> instead of starting over?
>>>
>>>
>> The two biggest objections are:
>> - the host side is in the kernel
>>  
> As it needs to be.
>

vhost-net somehow manages to work without the config stuff in the kernel.

> With all due respect, based on all of your comments in aggregate I
> really do not think you are truly grasping what I am actually building here.
>

Thanks.



>>> Bingo.  So now its a question of do you want to write this layer from
>>> scratch, or re-use my framework.
>>>
>>>
>> You will have to implement a connector or whatever for vbus as well.
>> vbus has more layers so it's probably smaller for vbus.
>>  
> Bingo!

(addictive, isn't it)

> That is precisely the point.
>
> All the stuff for how to map eventfds, handle signal mitigation, demux
> device/function pointers, isolation, etc, are built in.  All the
> connector has to do is transport the 4-6 verbs and provide a memory
> mapping/copy function, and the rest is reusable.  The device models
> would then work in all environments unmodified, and likewise the
> connectors could use all device-models unmodified.
>

Well, virtio has a similar abstraction on the guest side.  The host side 
abstraction is limited to signalling since all configuration is in 
userspace.  vhost-net ought to work for lguest and s390 without change.

>> It was already implemented three times for virtio, so apparently that's
>> extensible too.
>>  
> And to my point, I'm trying to commoditize as much of that process as
> possible on both the front and backends (at least for cases where
> performance matters) so that you don't need to reinvent the wheel for
> each one.
>

Since you're interested in any-to-any connectors it makes sense to you.  
I'm only interested in kvm-host-to-kvm-guest, so reducing the already 
minor effort to implement a new virtio binding has little appeal to me.

>> You mean, if the x86 board was able to access the disks and dma into the
>> ppb boards memory?  You'd run vhost-blk on x86 and virtio-net on ppc.
>>  
> But as we discussed, vhost doesn't work well if you try to run it on the
> x86 side due to its assumptions about pagable "guest" memory, right?  So
> is that even an option?  And even still, you would still need to solve
> t

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>>
>>> If kvm can do it, others can.
>>>  
>> The problem is that you seem to either hand-wave over details like this,
>> or you give details that are pretty much exactly what vbus does already.
>>   My point is that I've already sat down and thought about these issues
>> and solved them in a freely available GPL'ed software package.
>>
> 
> In the kernel.  IMO that's the wrong place for it.

In conversations with Ira, he indicated he needs kernel-to-kernel
ethernet for performance, and needs at least an ethernet and console
connectivity.  You could conceivably build a solution for this system 3
basic ways:

1) "completely" in userspace: use things like tuntap on the ppc boards,
and tunnel packets across a custom point-to-point connection formed over
the pci link to a userspace app on the x86 board.  This app then
reinjects the packets into the x86 kernel as a raw socket or tuntap,
etc.  Pretty much vanilla tuntap/vpn kind of stuff.  Advantage: very
little kernel code.  Problem: performance (citation: hopefully obvious).

2) "partially" in userspace: have an in-kernel virtio-net driver talk to
a userspace based virtio-net backend.  This is the (current, non-vhost
oriented) KVM/qemu model.  Advantage, re-uses existing kernel-code.
Problem: performance (citation: see alacrityvm numbers).

3) "in-kernel": You can do something like virtio-net to vhost to
potentially meet some of the requirements, but not all.

In order to fully meet (3), you would need to do some of that stuff you
mentioned in the last reply with muxing device-nr/reg-nr.  In addition,
we need to have a facility for mapping eventfds and establishing a
signaling mechanism (like PIO+qid), etc. KVM does this with
IRQFD/IOEVENTFD, but we dont have KVM in this case so it needs to be
invented.

To meet performance, this stuff has to be in kernel and there has to be
a way to manage it.  Since vbus was designed to do exactly that, this is
what I would advocate.  You could also reinvent these concepts and put
your own mux and mapping code in place, in addition to all the other
stuff that vbus does.  But I am not clear why anyone would want to.

So no, the kernel is not the wrong place for it.  Its the _only_ place
for it.  Otherwise, just use (1) and be done with it.

>  Further, if we adopt
> vbus, if drop compatibility with existing guests or have to support both
> vbus and virtio-pci.

We already need to support both (at least to support Ira).  virtio-pci
doesn't work here.  Something else (vbus, or vbus-like) is needed.

> 
>> So the question is: is your position that vbus is all wrong and you wish
>> to create a new bus-like thing to solve the problem?
> 
> I don't intend to create anything new, I am satisfied with virtio.  If
> it works for Ira, excellent.  If not, too bad.

I think that about sums it up, then.


>  I believe it will work without too much trouble.

Afaict it wont for the reasons I mentioned.

> 
>> If so, how is it
>> different from what Ive already done?  More importantly, what specific
>> objections do you have to what Ive done, as perhaps they can be fixed
>> instead of starting over?
>>
> 
> The two biggest objections are:
> - the host side is in the kernel

As it needs to be.

> - the guest side is a new bus instead of reusing pci (on x86/kvm),
> making Windows support more difficult

Thats a function of the vbus-connector, which is different from
vbus-core.  If you don't like it (and I know you don't), we can write
one that interfaces to qemu's pci system.  I just don't like the
limitations that imposes, nor do I think we need that complexity of
dealing with a split PCI model, so I chose to not implement vbus-kvm
this way.

With all due respect, based on all of your comments in aggregate I
really do not think you are truly grasping what I am actually building here.

> 
> I guess these two are exactly what you think are vbus' greatest
> advantages, so we'll probably have to extend our agree-to-disagree on
> this one.
> 
> I also had issues with using just one interrupt vector to service all
> events, but that's easily fixed.

Again, function of the connector.

> 
>>> There is no guest and host in this scenario.  There's a device side
>>> (ppc) and a driver side (x86).  The driver side can access configuration
>>> information on the device side.  How to multiplex multiple devices is an
>>> interesting exercise for whoever writes the virtio binding for that
>>> setup.
>>>  
>> Bingo.  So now its a question of do you want to write this layer from
>> scratch, or re-use my framework.
>>
> 
> You will have to implement a connector or whatever for vbus as well. 
> vbus has more layers so it's probably smaller for vbus.

Bingo! That is precisely the point.

All the stuff for how to map eventfds, handle signal mitigation, demux
device/function pointers, isolation, etc, are built in.  All the
connector has to do is transport the 4-6 verbs and provide

Re: vhost-net todo list

2009-09-16 Thread Avi Kivity
On 09/16/2009 06:27 PM, Arnd Bergmann wrote:
> That scenario is probably not so relevant for KVM, unless you
> consider the guest taking over the qemu host process a valid
> security threat.
>

It is.  We address it by using SCM_RIGHTS for all sensitive operations 
and selinuxing qemu as tightly as possible.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 05:27:25PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > 
> > > No, I think this is less important, because the bridge code
> > > also doesn't do this.
> > 
> > True, but the reason might be that it is much harder in bridge (you have
> > to snoop multicast registrations). With macvlan you know which
> > multicasts does each device want.
> 
> Right. It shouldn't be hard to do, and I'll probably get to
> that after the other changes.
> > > One of the problems that raw packet sockets have is the requirement
> > > for root permissions (e.g. through libvirt). Tap sockets and
> > > macvtap both don't have this limitation, so you can use them as
> > > a regular user without libvirt.
> > 
> > I don't see a huge difference here.
> > If you are happy with the user being able to bypass filters in host,
> > just give her CAP_NET_RAW capability.  It does not have to be root.
> 
> Capabilities are nice in theory, but I've never seen them being used
> effectively in practice, where it essentially comes down to some
> SUID wrapper.

Heh, for tap people seem to just give out write access to
it and that's all.  Not really different.

> Also, I might not want to allow the user to open a
> random random raw socket, but only one on a specific downstream
> port of a macvlan interface, so I can filter out the data from
> that respective MAC address in an external switch.

I agree. Maybe we can fix that for raw sockets, want me to
add it to the list? :)

> That scenario is probably not so relevant for KVM, unless you
> consider the guest taking over the qemu host process a valid
> security threat.

Defence in depth is a good thing, anyway.

>   Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 05:22:37PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> > > 
> > > This might have portability issues. On x86 it should work, but if the
> > > host is powerpc or similar, you cannot reliably access PCI I/O memory
> > > through copy_tofrom_user but have to use memcpy_toio/fromio or 
> > > readl/writel
> > > calls, which don't work on user pointers.
> > > 
> > > Specifically on powerpc, copy_from_user cannot access unaligned buffers
> > > if they are on an I/O mapping.
> > > 
> > We are talking about doing this in userspace, not in kernel.
> 
> Ok, that's fine then. I thought the idea was to use the vhost_net driver

It's a separate issue. We were talking generally about configuration
and setup. Gregory implemented it in kernel, Avi wants it
moved to userspace, with only fastpath in kernel.

> to access the user memory, which would be a really cute hack otherwise,
> as you'd only need to provide the eventfds from a hardware specific
> driver and could use the regular virtio_net on the other side.
> 
>   Arnd <><

To do that, maybe copy to user on ppc can be fixed, or wrapped
around in a arch specific macro, so that everyone else
does not have to go through abstraction layers.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 05:10 PM, Gregory Haskins wrote:
>
>> If kvm can do it, others can.
>>  
> The problem is that you seem to either hand-wave over details like this,
> or you give details that are pretty much exactly what vbus does already.
>   My point is that I've already sat down and thought about these issues
> and solved them in a freely available GPL'ed software package.
>

In the kernel.  IMO that's the wrong place for it.  Further, if we adopt 
vbus, if drop compatibility with existing guests or have to support both 
vbus and virtio-pci.

> So the question is: is your position that vbus is all wrong and you wish
> to create a new bus-like thing to solve the problem?

I don't intend to create anything new, I am satisfied with virtio.  If 
it works for Ira, excellent.  If not, too bad.  I believe it will work 
without too much trouble.

> If so, how is it
> different from what Ive already done?  More importantly, what specific
> objections do you have to what Ive done, as perhaps they can be fixed
> instead of starting over?
>

The two biggest objections are:
- the host side is in the kernel
- the guest side is a new bus instead of reusing pci (on x86/kvm), 
making Windows support more difficult

I guess these two are exactly what you think are vbus' greatest 
advantages, so we'll probably have to extend our agree-to-disagree on 
this one.

I also had issues with using just one interrupt vector to service all 
events, but that's easily fixed.

>> There is no guest and host in this scenario.  There's a device side
>> (ppc) and a driver side (x86).  The driver side can access configuration
>> information on the device side.  How to multiplex multiple devices is an
>> interesting exercise for whoever writes the virtio binding for that setup.
>>  
> Bingo.  So now its a question of do you want to write this layer from
> scratch, or re-use my framework.
>

You will have to implement a connector or whatever for vbus as well.  
vbus has more layers so it's probably smaller for vbus.


  
>>> I am talking about how we would tunnel the config space for N devices
>>> across his transport.
>>>
>>>
>> Sounds trivial.
>>  
> No one said it was rocket science.  But it does need to be designed and
> implemented end-to-end, much of which Ive already done in what I hope is
> an extensible way.
>

It was already implemented three times for virtio, so apparently that's 
extensible too.

>>   Write an address containing the device number and
>> register number to on location, read or write data from another.
>>  
> You mean like the "u64 devh", and "u32 func" fields I have here for the
> vbus-kvm connector?
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64
>
>

Probably.



>>> That sounds convenient given his hardware, but it has its own set of
>>> problems.  For one, the configuration/inventory of these boards is now
>>> driven by the wrong side and has to be addressed.
>>>
>> Why is it the wrong side?
>>  
> "Wrong" is probably too harsh a word when looking at ethernet.  Its
> certainly "odd", and possibly inconvenient.  It would be like having
> vhost in a KVM guest, and virtio-net running on the host.  You could do
> it, but its weird and awkward.  Where it really falls apart and enters
> the "wrong" category is for non-symmetric devices, like disk-io.
>
>


It's not odd or wrong or wierd or awkward.  An ethernet NIC is not 
symmetric, one side does DMA and issues interrupts, the other uses its 
own memory.  That's exactly the case with Ira's setup.

If the ppc boards were to emulate a disk controller, you'd run 
virtio-blk on x86 and vhost-blk on the ppc boards.

>>> Second, the role
>>> reversal will likely not work for many models other than ethernet (e.g.
>>> virtio-console or virtio-blk drivers running on the x86 board would be
>>> naturally consuming services from the slave boards...virtio-net is an
>>> exception because 802.x is generally symmetrical).
>>>
>>>
>> There is no role reversal.
>>  
> So if I have virtio-blk driver running on the x86 and vhost-blk device
> running on the ppc board, I can use the ppc board as a block-device.
> What if I really wanted to go the other way?
>

You mean, if the x86 board was able to access the disks and dma into the 
ppb boards memory?  You'd run vhost-blk on x86 and virtio-net on ppc.

As long as you don't use the words "guest" and "host" but keep to 
"driver" and "device", it all works out.

>> The side doing dma is the device, the side
>> accessing its own memory is the driver.  Just like that other 1e12
>> driver/device pairs out there.
>>  
> IIUC, his ppc boards really can be seen as "guests" (they are linux
> instances that are utilizing services from the x86, not the other way
> around).

They aren't guests.  Guests d

Re: vhost-net todo list

2009-09-16 Thread Arnd Bergmann
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > 
> > No, I think this is less important, because the bridge code
> > also doesn't do this.
> 
> True, but the reason might be that it is much harder in bridge (you have
> to snoop multicast registrations). With macvlan you know which
> multicasts does each device want.

Right. It shouldn't be hard to do, and I'll probably get to
that after the other changes.

> > One of the problems that raw packet sockets have is the requirement
> > for root permissions (e.g. through libvirt). Tap sockets and
> > macvtap both don't have this limitation, so you can use them as
> > a regular user without libvirt.
> 
> I don't see a huge difference here.
> If you are happy with the user being able to bypass filters in host,
> just give her CAP_NET_RAW capability.  It does not have to be root.

Capabilities are nice in theory, but I've never seen them being used
effectively in practice, where it essentially comes down to some
SUID wrapper. Also, I might not want to allow the user to open a
random random raw socket, but only one on a specific downstream
port of a macvlan interface, so I can filter out the data from
that respective MAC address in an external switch.

That scenario is probably not so relevant for KVM, unless you
consider the guest taking over the qemu host process a valid
security threat.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Arnd Bergmann
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> > On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> > 
> > This might have portability issues. On x86 it should work, but if the
> > host is powerpc or similar, you cannot reliably access PCI I/O memory
> > through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
> > calls, which don't work on user pointers.
> > 
> > Specifically on powerpc, copy_from_user cannot access unaligned buffers
> > if they are on an I/O mapping.
> > 
> We are talking about doing this in userspace, not in kernel.

Ok, that's fine then. I thought the idea was to use the vhost_net driver
to access the user memory, which would be a really cute hack otherwise,
as you'd only need to provide the eventfds from a hardware specific
driver and could use the regular virtio_net on the other side.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 05:08:46PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> > > On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > > vhost-net driver projects
> > > 
> > > I still think that list should include
> > 
> > Yea, why not. Go wild.
> > 
> > > - UDP multicast socket support
> > > - TCP socket support
> > 
> > Switch to UDP unicast while we are at it?
> > tunneling raw packets over TCP looks wrong.
> 
> Well, TCP is what qemu supports right now, that's why
> I added it to the list. We could add UDP unicast as
> yet another protocol in both qemu and vhost_net if there
> is demand for it. The implementation should be trivial
> based on the existing code paths.
> 
> > > One thing I'm planning to work on is bridge support in macvlan,
> > > together with VEPA compliant operation, i.e. not sending back
> > > multicast frames to the origin.
> > 
> > is multicast filtering already there (i.e. only getting
> > frames for groups you want)?
> 
> No, I think this is less important, because the bridge code
> also doesn't do this.

True, but the reason might be that it is much harder in bridge (you have
to snoop multicast registrations). With macvlan you know which
multicasts does each device want.

> > > I'll also keep looking into macvtap, though that will be less
> > > important once you get the tap socket support running.
> > 
> > Not sure I see the connection. to get an equivalent to macvtap,
> > what you need is tso etc support in packet sockets. No?
> 
> I'm not worried about tso support here.
> 
> One of the problems that raw packet sockets have is the requirement
> for root permissions (e.g. through libvirt). Tap sockets and
> macvtap both don't have this limitation, so you can use them as
> a regular user without libvirt.

I don't see a huge difference here.
If you are happy with the user being able to bypass filters in host,
just give her CAP_NET_RAW capability.  It does not have to be root.

>   Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 04:57:42PM +0200, Arnd Bergmann wrote:
> On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> > Userspace in x86 maps a PCI region, uses it for communication with ppc?
> 
> This might have portability issues. On x86 it should work, but if the
> host is powerpc or similar, you cannot reliably access PCI I/O memory
> through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
> calls, which don't work on user pointers.
> 
> Specifically on powerpc, copy_from_user cannot access unaligned buffers
> if they are on an I/O mapping.
> 
>   Arnd <><

We are talking about doing this in userspace, not in kernel.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Arnd Bergmann
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> > On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > > vhost-net driver projects
> > 
> > I still think that list should include
> 
> Yea, why not. Go wild.
> 
> > - UDP multicast socket support
> > - TCP socket support
> 
> Switch to UDP unicast while we are at it?
> tunneling raw packets over TCP looks wrong.

Well, TCP is what qemu supports right now, that's why
I added it to the list. We could add UDP unicast as
yet another protocol in both qemu and vhost_net if there
is demand for it. The implementation should be trivial
based on the existing code paths.

> > One thing I'm planning to work on is bridge support in macvlan,
> > together with VEPA compliant operation, i.e. not sending back
> > multicast frames to the origin.
> 
> is multicast filtering already there (i.e. only getting
> frames for groups you want)?

No, I think this is less important, because the bridge code
also doesn't do this.
 
> > I'll also keep looking into macvtap, though that will be less
> > important once you get the tap socket support running.
> 
> Not sure I see the connection. to get an equivalent to macvtap,
> what you need is tso etc support in packet sockets. No?

I'm not worried about tso support here.

One of the problems that raw packet sockets have is the requirement
for root permissions (e.g. through libvirt). Tap sockets and
macvtap both don't have this limitation, so you can use them as
a regular user without libvirt.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > vhost-net driver projects
> 
> I still think that list should include

Why not. But note that including things in a list will not magically
make them done :)

> - UDP multicast socket support
> - TCP socket support

Switch to UDP unicast while we are at it?
tunneling raw packets over TCP looks wrong.

> - raw packet socket support for qemu (from Or Gerlitz)
> if we have those, plus the tap support that is already on
> your list, we can use vhost-net as a generic offload
> for the host networking in qemu.
> 
> > projects involing networking stack
> > - export socket from tap so vhost can use it - working on it now
> > - extend raw sockets to support GSO/checksum offloading,
> >   and teach vhost to use that capability
> >   [one way to do this: virtio net header support]
> >   will allow working with e.g. macvlan
> 
> One thing I'm planning to work on is bridge support in macvlan,
> together with VEPA compliant operation, i.e. not sending back
> multicast frames to the origin.

is multicast filtering already there (i.e. only getting
frames for groups you want)?

> I'll also keep looking into macvtap, though that will be less
> important once you get the tap socket support running.

Not sure I see the connection. to get an equivalent to macvtap,
what you need is tso etc support in packet sockets. No?

>   Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Michael S. Tsirkin
On Wed, Sep 16, 2009 at 04:52:40PM +0200, Arnd Bergmann wrote:
> On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> > vhost-net driver projects
> 
> I still think that list should include

Yea, why not. Go wild.

> - UDP multicast socket support
> - TCP socket support

Switch to UDP unicast while we are at it?
tunneling raw packets over TCP looks wrong.

> - raw packet socket support for qemu (from Or Gerlitz)
> if we have those, plus the tap support that is already on
> your list, we can use vhost-net as a generic offload
> for the host networking in qemu.
> 
> > projects involing networking stack
> > - export socket from tap so vhost can use it - working on it now
> > - extend raw sockets to support GSO/checksum offloading,
> >   and teach vhost to use that capability
> >   [one way to do this: virtio net header support]
> >   will allow working with e.g. macvlan
> 
> One thing I'm planning to work on is bridge support in macvlan,
> together with VEPA compliant operation, i.e. not sending back
> multicast frames to the origin.

is multicast filtering already there (i.e. only getting
frames for groups you want)?

> I'll also keep looking into macvtap, though that will be less
> important once you get the tap socket support running.

Not sure I see the connection. to get an equivalent to macvtap,
what you need is tso etc support in packet sockets. No?

>   Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Arnd Bergmann
On Tuesday 15 September 2009, Michael S. Tsirkin wrote:
> Userspace in x86 maps a PCI region, uses it for communication with ppc?

This might have portability issues. On x86 it should work, but if the
host is powerpc or similar, you cannot reliably access PCI I/O memory
through copy_tofrom_user but have to use memcpy_toio/fromio or readl/writel
calls, which don't work on user pointers.

Specifically on powerpc, copy_from_user cannot access unaligned buffers
if they are on an I/O mapping.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: vhost-net todo list

2009-09-16 Thread Arnd Bergmann
On Wednesday 16 September 2009, Michael S. Tsirkin wrote:
> vhost-net driver projects

I still think that list should include
- UDP multicast socket support
- TCP socket support
- raw packet socket support for qemu (from Or Gerlitz)

if we have those, plus the tap support that is already on
your list, we can use vhost-net as a generic offload
for the host networking in qemu.

> projects involing networking stack
> - export socket from tap so vhost can use it - working on it now
> - extend raw sockets to support GSO/checksum offloading,
>   and teach vhost to use that capability
>   [one way to do this: virtio net header support]
>   will allow working with e.g. macvlan

One thing I'm planning to work on is bridge support in macvlan,
together with VEPA compliant operation, i.e. not sending back
multicast frames to the origin.

I'll also keep looking into macvtap, though that will be less
important once you get the tap socket support running.

Arnd <><
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/16/2009 02:44 PM, Gregory Haskins wrote:
>> The problem isn't where to find the models...the problem is how to
>> aggregate multiple models to the guest.
>>
> 
> You mean configuration?
> 
>>> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
>>> supported configuration for kvm.
>>>  
>> But this is not KVM.
>>
>>
> 
> If kvm can do it, others can.

The problem is that you seem to either hand-wave over details like this,
or you give details that are pretty much exactly what vbus does already.
 My point is that I've already sat down and thought about these issues
and solved them in a freely available GPL'ed software package.

So the question is: is your position that vbus is all wrong and you wish
to create a new bus-like thing to solve the problem?  If so, how is it
different from what Ive already done?  More importantly, what specific
objections do you have to what Ive done, as perhaps they can be fixed
instead of starting over?

> 
 His slave boards surface themselves as PCI devices to the x86
 host.  So how do you use that to make multiple vhost-based devices (say
 two virtio-nets, and a virtio-console) communicate across the
 transport?


>>> I don't really see the difference between 1 and N here.
>>>  
>> A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
>> we do in Ira's case where the entire guest represents itself as a PCI
>> device to the host, and nothing the other way around?
>>
> 
> There is no guest and host in this scenario.  There's a device side
> (ppc) and a driver side (x86).  The driver side can access configuration
> information on the device side.  How to multiplex multiple devices is an
> interesting exercise for whoever writes the virtio binding for that setup.

Bingo.  So now its a question of do you want to write this layer from
scratch, or re-use my framework.

> 
 There are multiple ways to do this, but what I am saying is that
 whatever is conceived will start to look eerily like a vbus-connector,
 since this is one of its primary purposes ;)


>>> I'm not sure if you're talking about the configuration interface or data
>>> path here.
>>>  
>> I am talking about how we would tunnel the config space for N devices
>> across his transport.
>>
> 
> Sounds trivial.

No one said it was rocket science.  But it does need to be designed and
implemented end-to-end, much of which Ive already done in what I hope is
an extensible way.

>  Write an address containing the device number and
> register number to on location, read or write data from another.

You mean like the "u64 devh", and "u32 func" fields I have here for the
vbus-kvm connector?

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64

> Just
> like the PCI cf8/cfc interface.
> 
>>> They aren't in the "guest".  The best way to look at it is
>>>
>>> - a device side, with a dma engine: vhost-net
>>> - a driver side, only accessing its own memory: virtio-net
>>>
>>> Given that Ira's config has the dma engine in the ppc boards, that's
>>> where vhost-net would live (the ppc boards acting as NICs to the x86
>>> board, essentially).
>>>  
>> That sounds convenient given his hardware, but it has its own set of
>> problems.  For one, the configuration/inventory of these boards is now
>> driven by the wrong side and has to be addressed.
> 
> Why is it the wrong side?

"Wrong" is probably too harsh a word when looking at ethernet.  Its
certainly "odd", and possibly inconvenient.  It would be like having
vhost in a KVM guest, and virtio-net running on the host.  You could do
it, but its weird and awkward.  Where it really falls apart and enters
the "wrong" category is for non-symmetric devices, like disk-io.

> 
>> Second, the role
>> reversal will likely not work for many models other than ethernet (e.g.
>> virtio-console or virtio-blk drivers running on the x86 board would be
>> naturally consuming services from the slave boards...virtio-net is an
>> exception because 802.x is generally symmetrical).
>>
> 
> There is no role reversal.

So if I have virtio-blk driver running on the x86 and vhost-blk device
running on the ppc board, I can use the ppc board as a block-device.
What if I really wanted to go the other way?

> The side doing dma is the device, the side
> accessing its own memory is the driver.  Just like that other 1e12
> driver/device pairs out there.

IIUC, his ppc boards really can be seen as "guests" (they are linux
instances that are utilizing services from the x86, not the other way
around).  vhost forces the model to have the ppc boards act as IO-hosts,
whereas vbus would likely work in either direction due to its more
refined abstraction layer.

> 
>>> I have no idea, that's for Ira to solve.
>>>  
>> Bingo.  Thus

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/16/2009 02:44 PM, Gregory Haskins wrote:
> The problem isn't where to find the models...the problem is how to
> aggregate multiple models to the guest.
>

You mean configuration?

>> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
>> supported configuration for kvm.
>>  
> But this is not KVM.
>
>

If kvm can do it, others can.

>>> His slave boards surface themselves as PCI devices to the x86
>>> host.  So how do you use that to make multiple vhost-based devices (say
>>> two virtio-nets, and a virtio-console) communicate across the transport?
>>>
>>>
>> I don't really see the difference between 1 and N here.
>>  
> A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
> we do in Ira's case where the entire guest represents itself as a PCI
> device to the host, and nothing the other way around?
>

There is no guest and host in this scenario.  There's a device side 
(ppc) and a driver side (x86).  The driver side can access configuration 
information on the device side.  How to multiplex multiple devices is an 
interesting exercise for whoever writes the virtio binding for that setup.

>>> There are multiple ways to do this, but what I am saying is that
>>> whatever is conceived will start to look eerily like a vbus-connector,
>>> since this is one of its primary purposes ;)
>>>
>>>
>> I'm not sure if you're talking about the configuration interface or data
>> path here.
>>  
> I am talking about how we would tunnel the config space for N devices
> across his transport.
>

Sounds trivial.  Write an address containing the device number and 
register number to on location, read or write data from another.  Just 
like the PCI cf8/cfc interface.

>> They aren't in the "guest".  The best way to look at it is
>>
>> - a device side, with a dma engine: vhost-net
>> - a driver side, only accessing its own memory: virtio-net
>>
>> Given that Ira's config has the dma engine in the ppc boards, that's
>> where vhost-net would live (the ppc boards acting as NICs to the x86
>> board, essentially).
>>  
> That sounds convenient given his hardware, but it has its own set of
> problems.  For one, the configuration/inventory of these boards is now
> driven by the wrong side and has to be addressed.

Why is it the wrong side?

> Second, the role
> reversal will likely not work for many models other than ethernet (e.g.
> virtio-console or virtio-blk drivers running on the x86 board would be
> naturally consuming services from the slave boards...virtio-net is an
> exception because 802.x is generally symmetrical).
>

There is no role reversal.  The side doing dma is the device, the side 
accessing its own memory is the driver.  Just like that other 1e12 
driver/device pairs out there.

>> I have no idea, that's for Ira to solve.
>>  
> Bingo.  Thus my statement that the vhost proposal is incomplete.  You
> have the virtio-net and vhost-net pieces covering the fast-path
> end-points, but nothing in the middle (transport, aggregation,
> config-space), and nothing on the management-side.  vbus provides most
> of the other pieces, and can even support the same virtio-net protocol
> on top.  The remaining part would be something like a udev script to
> populate the vbus with devices on board-insert events.
>

Of course vhost is incomplete, in the same sense that Linux is 
incomplete.  Both require userspace.

>> If he could fake the PCI
>> config space as seen by the x86 board, he would just show the normal pci
>> config and use virtio-pci (multiple channels would show up as a
>> multifunction device).  Given he can't, he needs to tunnel the virtio
>> config space some other way.
>>  
> Right, and note that vbus was designed to solve this.  This tunneling
> can, of course, be done without vbus using some other design.  However,
> whatever solution is created will look incredibly close to what I've
> already done, so my point is "why reinvent it"?
>

virtio requires binding for this tunnelling, so does vbus.  Its the same 
problem with the same solution.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>>
>>> There's virtio-console, virtio-blk etc.  None of these have kernel-mode
>>> servers, but these could be implemented if/when needed.
>>>  
>> IIUC, Ira already needs at least ethernet and console capability.
>>
>>
> 
> He's welcome to pick up the necessary code from qemu.

The problem isn't where to find the models...the problem is how to
aggregate multiple models to the guest.

> 
 b) what do you suppose this protocol to aggregate the connections would
 look like? (hint: this is what a vbus-connector does).


>>> You mean multilink?  You expose the device as a multiqueue.
>>>  
>> No, what I mean is how do you surface multiple ethernet and consoles to
>> the guests?  For Ira's case, I think he needs at minimum at least one of
>> each, and he mentioned possibly having two unique ethernets at one point.
>>
> 
> You instantiate multiple vhost-nets.  Multiple ethernet NICs is a
> supported configuration for kvm.

But this is not KVM.

> 
>> His slave boards surface themselves as PCI devices to the x86
>> host.  So how do you use that to make multiple vhost-based devices (say
>> two virtio-nets, and a virtio-console) communicate across the transport?
>>
> 
> I don't really see the difference between 1 and N here.

A KVM surfaces N virtio-devices as N pci-devices to the guest.  What do
we do in Ira's case where the entire guest represents itself as a PCI
device to the host, and nothing the other way around?


> 
>> There are multiple ways to do this, but what I am saying is that
>> whatever is conceived will start to look eerily like a vbus-connector,
>> since this is one of its primary purposes ;)
>>
> 
> I'm not sure if you're talking about the configuration interface or data
> path here.

I am talking about how we would tunnel the config space for N devices
across his transport.

As an aside, the vbus-kvm connector makes them one and the same, but
they do not have to be.  Its all in the connector design.

> 
 c) how do you manage the configuration, especially on a per-board
 basis?


>>> pci (for kvm/x86).
>>>  
>> Ok, for kvm understood (and I would also add "qemu" to that mix).  But
>> we are talking about vhost's application in a non-kvm environment here,
>> right?.
>>
>> So if the vhost-X devices are in the "guest",
> 
> They aren't in the "guest".  The best way to look at it is
> 
> - a device side, with a dma engine: vhost-net
> - a driver side, only accessing its own memory: virtio-net
> 
> Given that Ira's config has the dma engine in the ppc boards, that's
> where vhost-net would live (the ppc boards acting as NICs to the x86
> board, essentially).

That sounds convenient given his hardware, but it has its own set of
problems.  For one, the configuration/inventory of these boards is now
driven by the wrong side and has to be addressed.  Second, the role
reversal will likely not work for many models other than ethernet (e.g.
virtio-console or virtio-blk drivers running on the x86 board would be
naturally consuming services from the slave boards...virtio-net is an
exception because 802.x is generally symmetrical).

IIUC, vbus would support having the device models live properly on the
x86 side, solving both of these problems.  It would be impossible to
reverse vhost given its current design.

> 
>> and the x86 board is just
>> a slave...How do you tell each ppc board how many devices and what
>> config (e.g. MACs, etc) to instantiate?  Do you assume that they should
>> all be symmetric and based on positional (e.g. slot) data?  What if you
>> want asymmetric configurations (if not here, perhaps in a different
>> environment)?
>>
> 
> I have no idea, that's for Ira to solve.

Bingo.  Thus my statement that the vhost proposal is incomplete.  You
have the virtio-net and vhost-net pieces covering the fast-path
end-points, but nothing in the middle (transport, aggregation,
config-space), and nothing on the management-side.  vbus provides most
of the other pieces, and can even support the same virtio-net protocol
on top.  The remaining part would be something like a udev script to
populate the vbus with devices on board-insert events.

> If he could fake the PCI
> config space as seen by the x86 board, he would just show the normal pci
> config and use virtio-pci (multiple channels would show up as a
> multifunction device).  Given he can't, he needs to tunnel the virtio
> config space some other way.

Right, and note that vbus was designed to solve this.  This tunneling
can, of course, be done without vbus using some other design.  However,
whatever solution is created will look incredibly close to what I've
already done, so my point is "why reinvent it"?

> 
>>> Yes.  virtio is really virtualization oriented.
>>>  
>> I would say that its vhost in particular that is virtualization
>> oriented.  virtio, as a concept, generally should work in physical
>

Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication

2009-09-16 Thread Alan Cox
> This device is very much a serial port.  I don't see any reason not
> to treat it like one.

Here are a few

- You don't need POSIX multi-open semantics, hangup and the like
- Seek makes sense on some kinds of fixed attributes
- TTY has a relatively large memory overhead per device
- Sysfs is what everything else uses
- Sysfs has some rather complete lifetime management you'll need to
  redo by hand
- You don't need idiotic games with numbering spaces

Abusing tty for this is ridiculous. In some ways putting much of it in
kernel is ridiculous too as you can do it with a FUSE fs or simply
export the info guest-guest using SNMP.

Alan
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


vhost-net todo list

2009-09-16 Thread Michael S. Tsirkin
Some people asked about getting involved with vhost.

Here's a short list of projects.

vhost-net driver projects
- profiling would be very helpful, I have not done any yet
- tap support - working on it now
- merged buffers - working on it now
- scalability/fairness for large # of guests - working on it now
- logging support with dirty page tracking in kernel - working on it now
- indirect buffers - worth it?
- vm exit mitigation for TX (worth it?
  naive implementation does not seem to help)
- interrupt mitigation for RX
- level triggered interrupts - what's the best thing
  to do here?

qemu projects
- migration support
- level triggered interrupts - what's the best thing
  to do here?
- upstream support for injecting interrupts from kernel,
  from qemu-kvm.git to qemu.git
  (this is a vhost dependency, without it vhost
   can't be upstreamed, or it can, but without real benefit)
- general cleanup and upstreaming

projects involing networking stack
- export socket from tap so vhost can use it - working on it now
- extend raw sockets to support GSO/checksum offloading,
  and teach vhost to use that capability
  [one way to do this: virtio net header support]
  will allow working with e.g. macvlan

long term projects
- multiqueue (involves all of vhost, qemu, virtio,
  networking stack)

- More testing is always good

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-16 Thread Avi Kivity
On 09/15/2009 11:08 PM, Gregory Haskins wrote:
>
>> There's virtio-console, virtio-blk etc.  None of these have kernel-mode
>> servers, but these could be implemented if/when needed.
>>  
> IIUC, Ira already needs at least ethernet and console capability.
>
>

He's welcome to pick up the necessary code from qemu.

>>> b) what do you suppose this protocol to aggregate the connections would
>>> look like? (hint: this is what a vbus-connector does).
>>>
>>>
>> You mean multilink?  You expose the device as a multiqueue.
>>  
> No, what I mean is how do you surface multiple ethernet and consoles to
> the guests?  For Ira's case, I think he needs at minimum at least one of
> each, and he mentioned possibly having two unique ethernets at one point.
>

You instantiate multiple vhost-nets.  Multiple ethernet NICs is a 
supported configuration for kvm.

> His slave boards surface themselves as PCI devices to the x86
> host.  So how do you use that to make multiple vhost-based devices (say
> two virtio-nets, and a virtio-console) communicate across the transport?
>

I don't really see the difference between 1 and N here.

> There are multiple ways to do this, but what I am saying is that
> whatever is conceived will start to look eerily like a vbus-connector,
> since this is one of its primary purposes ;)
>

I'm not sure if you're talking about the configuration interface or data 
path here.

>>> c) how do you manage the configuration, especially on a per-board basis?
>>>
>>>
>> pci (for kvm/x86).
>>  
> Ok, for kvm understood (and I would also add "qemu" to that mix).  But
> we are talking about vhost's application in a non-kvm environment here,
> right?.
>
> So if the vhost-X devices are in the "guest",

They aren't in the "guest".  The best way to look at it is

- a device side, with a dma engine: vhost-net
- a driver side, only accessing its own memory: virtio-net

Given that Ira's config has the dma engine in the ppc boards, that's 
where vhost-net would live (the ppc boards acting as NICs to the x86 
board, essentially).

> and the x86 board is just
> a slave...How do you tell each ppc board how many devices and what
> config (e.g. MACs, etc) to instantiate?  Do you assume that they should
> all be symmetric and based on positional (e.g. slot) data?  What if you
> want asymmetric configurations (if not here, perhaps in a different
> environment)?
>

I have no idea, that's for Ira to solve.  If he could fake the PCI 
config space as seen by the x86 board, he would just show the normal pci 
config and use virtio-pci (multiple channels would show up as a 
multifunction device).  Given he can't, he needs to tunnel the virtio 
config space some other way.

>> Yes.  virtio is really virtualization oriented.
>>  
> I would say that its vhost in particular that is virtualization
> oriented.  virtio, as a concept, generally should work in physical
> systems, if perhaps with some minor modifications.  The biggest "limit"
> is having "virt" in its name ;)
>

Let me rephrase.  The virtio developers are virtualization oriented.  If 
it works for non-virt applications, that's good, but not a design goal.

-- 
error compiling committee.c: too many arguments to function

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization