Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread H. Peter Anvin
Alok Kataria wrote:
> 
> What do you suggest would be the right time?
> 
> Please note that the next major release of VMware's product will not
> support this. Also, most of our customers will actually be running
> some distro's enterprise release rather than the cutting-edge kernel,
> so IMO there is still a window of around 1-1.5 years before a customer
> actually sees a kernel which has dropped VMI support.
> 

I would say it might make sense to pull it out around the end of 2010,
which would be about six kernel releases from now -- 2.6.37.

-hpa



Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Alok Kataria

On Tue, 2009-09-22 at 14:27 -0700, H. Peter Anvin wrote:
> Alok Kataria wrote:
> > Hi Ingo,
> > 
> > On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
> > 
> >> The thing is, the overwhelming majority of VMware users don't benefit
> >> from hardware features like nested page tables yet. So this needs to be
> >> done _way_ more carefully, with a proper sunset period of a couple of
> >> kernel cycles.
> > 
> > I am fine with that too. Below is a patch which adds a note to
> > feature-removal-schedule.txt; I have marked VMI for removal in 2.6.34.
> > Please consider this patch for 2.6.32.
> > 
> 
> This seems way, way too early still.

What do you suggest would be the right time?

Please note that the next major release of VMware's product will not
support this. Also, most of our customers will actually be running
some distro's enterprise release rather than the cutting-edge kernel,
so IMO there is still a window of around 1-1.5 years before a customer
actually sees a kernel which has dropped VMI support.

Thanks,
Alok



Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread H. Peter Anvin
Alok Kataria wrote:
> Hi Ingo,
> 
> On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
> 
>> The thing is, the overwhelming majority of VMware users don't benefit
>> from hardware features like nested page tables yet. So this needs to be
>> done _way_ more carefully, with a proper sunset period of a couple of
>> kernel cycles.
> 
> I am fine with that too. Below is a patch which adds a note to
> feature-removal-schedule.txt; I have marked VMI for removal in 2.6.34.
> Please consider this patch for 2.6.32.
> 

This seems way, way too early still.

-hpa


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Jeremy Fitzhardinge
On 09/22/09 12:30, Alok Kataria wrote:
> We can certainly look at removing any paravirt hooks that are only
> used by VMI. I'm not sure whether there are any, but I will take a look
> when we actually remove VMI.
>   

There are a couple:

* pte_update_defer
* alloc_pmd_clone

lguest appears to still use pte_update(), but I suspect its two
callsites could be recast in the form of other existing pvops.
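
For reference, this is roughly how those hooks are routed through
pv_mmu_ops in the current paravirt code (a simplified sketch, not the
literal source; alloc_pmd_clone's exact signature is elided):

  struct pv_mmu_ops {
          /* ... */
          void (*pte_update)(struct mm_struct *mm, unsigned long addr,
                             pte_t *ptep);
          void (*pte_update_defer)(struct mm_struct *mm, unsigned long addr,
                                   pte_t *ptep);
          /* alloc_pmd_clone and the other MMU hooks elided */
  };

  static inline void pte_update(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep)
  {
          PVOP_VCALL3(pv_mmu_ops.pte_update, mm, addr, ptep);
  }

On native hardware these slots are essentially no-ops that get patched
out, so dropping the VMI-only ones mostly removes dead indirection.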

J


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Alok Kataria
Hi Ingo,

On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:

> 
> The thing is, the overwhelming majority of VMware users don't benefit
> from hardware features like nested page tables yet. So this needs to be
> done _way_ more carefully, with a proper sunset period of a couple of
> kernel cycles.

I am fine with that too. Below is a patch which adds a note to
feature-removal-schedule.txt; I have marked VMI for removal in 2.6.34.
Please consider this patch for 2.6.32.

> If we were able to rip out all (or most) of paravirt from arch/x86 it 
> would be tempting for other technical reasons - but the patch above is 
> well localized.

We can certainly look at removing any paravirt hooks that are only
used by VMI. I'm not sure whether there are any, but I will take a look
when we actually remove VMI.

Thanks,
Alok

--

Mark VMI for deprecation in feature-removal-schedule.txt.

From: Alok N Kataria 

Add text to feature-removal-schedule.txt and also modify Kconfig to
disable VMI by default.
Patch on top of tip/master.

Details about VMware's plan for retiring VMI can be found here:
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

---

 Documentation/feature-removal-schedule.txt |   24 ++++++++++++++++++++++++
 arch/x86/Kconfig                           |    8 +++++---
 2 files changed, 29 insertions(+), 3 deletions(-)


diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index fa75220..b985328 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -459,3 +459,27 @@ Why:   OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
        will also allow making ALSA OSS emulation independent of
        sound_core.  The dependency will be broken then too.
 Who:   Tejun Heo 
+
+
+
+What:  Support for VMware's guest paravirtualization technique [VMI] will be
+   dropped.
+When:  2.6.34
+Why:   With the recent innovations in CPU hardware acceleration technologies
+   from Intel and AMD, VMware ran a few experiments to compare these
+   techniques to the guest paravirtualization technique on VMware's platform.
+   These hardware-assisted virtualization techniques have outperformed
+   VMI in terms of performance in most of the workloads. VMware
+   expects that these hardware features will be ubiquitous in a couple of
+   years; as a result, VMware has started a phased retirement of this
+   feature from the hypervisor. We will be removing this feature from the
+   kernel too, in a couple of releases.
+   Please note that VMI has always been an optimization and non-VMI kernels
+   still work fine on VMware's platform.
+
+   For more details about the VMI retirement, take a look at:
+   http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who:   Alok N Kataria 
+
+
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e214f45..1f3e156 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -485,14 +485,16 @@ if PARAVIRT_GUEST
 source "arch/x86/xen/Kconfig"
 
 config VMI
-   bool "VMI Guest support"
-   select PARAVIRT
-   depends on X86_32
+   bool "VMI Guest support [will be deprecated soon]"
+   default n
+   depends on X86_32 && PARAVIRT
---help---
  VMI provides a paravirtualized interface to the VMware ESX server
  (it could be used by other hypervisors in theory too, but is not
  at the moment), by linking the kernel to a GPL-ed ROM module
  provided by the hypervisor.
+ VMware has started a phased retirement of this feature from their
+ products. Please see feature-removal-schedule.txt for details.
 
 config KVM_CLOCK
bool "KVM paravirtualized clock"




Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Jeremy Fitzhardinge
On 09/22/09 12:04, Ingo Molnar wrote:
> Sorry for being dense, but what does that mean precisely? No available
> hardware? Xen doesn't run?

Nobody has implemented hybrid PV mode yet, so we haven't got anything to
measure.

Also, I don't think there have been very many measurements of Linux HVM
(full virtualization) Xen guests, because Linux is typically run
paravirtualized and HVM support is primarily tuned for Windows guests.

J



Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Ingo Molnar

* Jeremy Fitzhardinge  wrote:

> On 09/22/09 11:02, Ingo Molnar wrote:
>
> > Obviously they are workload-dependent - that's why numbers were
> > posted in this thread with various workloads. Do you concur with the
> > conclusion that hardware acceleration is generally a speedup over
> > paravirt? If not, which are the workloads where paravirt offers a
> > significant speedup over hardware acceleration?
> 
> We're not in a position to do any useful measurements yet.

Sorry for being dense, but what does that mean precisely? No available
hardware? Xen doesn't run?

Ingo


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Stephen Hemminger wrote:
> > My idea for that was to open multiple file descriptors to the same
> > macvtap device and let the kernel figure out the right thing to
> > do with that. You can do the same with raw packet sockets in case
> > of vhost_net, but I wouldn't want to add more complexity to the
> > tun/tap driver for this.
> > 
> Or get tap out of the way entirely. The packets should not have
> to go out to user space at all (see veth).

How does veth relate to that? Do you mean vhost_net? With vhost_net,
you could still open multiple sockets; only the access is in the kernel.
Obviously, once it all is in the kernel, that could be done under the
covers, but I think it would be cleaner to treat vhost_net purely as
a way to bypass the syscalls for user space, with as little visible
impact otherwise as possible.

Arnd <><


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Jeremy Fitzhardinge
On 09/22/09 11:02, Ingo Molnar wrote:
> Obviously they are workload-dependent - that's why numbers were posted
> in this thread with various workloads. Do you concur with the conclusion
> that hardware acceleration is generally a speedup over paravirt? If not,
> which are the workloads where paravirt offers a significant speedup over
> hardware acceleration?
>   

We're not in a position to do any useful measurements yet.

J



Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Ingo Molnar

* Jeremy Fitzhardinge  wrote:

> On 09/22/09 01:09, Ingo Molnar wrote:
> >>> kvm will be removing the pvmmu support soon; and Xen is talking about
> >>> running paravirtualized guests in a vmx/svm container where they don't
> >>> need most of the hooks.
> >>>   
> >> We have no plans to drop support for non-vmx/svm capable processors, 
> >> let alone require ept/npt.
> > 
> > But, just to map out our plans for the future, do you concur with 
> > the statements and numbers offered here by the VMware and KVM folks 
> > that on sufficiently recent hardware, hardware-assisted 
> > virtualization outperforms paravirt_ops in many (most?) workloads?
> 
> Well, what Avi is referring to here is some discussions about a hybrid 
> paravirtualized mode, in which Xen runs a normal Xen PV guest within a 
> hardware container in order to get some immediate optimisations, and 
> allow further optimisations like using hardware assisted paging 
> extensions.
> 
> For KVM and VMI, which always use a shadow pagetable scheme, hardware 
> paging is now unambiguously better than shadow pagetables, but for Xen 
> PV guests the picture is mixed since they don't use shadow pagetables. 
> The NPT/EPT extensions make updating the pagetable more efficient, but 
> actual access is more expensive because of the higher load on the TLB 
> and the increased expense of a TLB miss, so the actual performance 
> effects are very workload-dependent.

Obviously they are workload-dependent - that's why numbers were posted
in this thread with various workloads. Do you concur with the conclusion
that hardware acceleration is generally a speedup over paravirt? If not,
which are the workloads where paravirt offers a significant speedup over
hardware acceleration?

Ingo


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Jeremy Fitzhardinge
On 09/22/09 00:22, Rusty Russell wrote:
> When they're all gone, even I don't think lguest is a sufficient excuse
> to keep CONFIG_PARAVIRT.  Oh well.  But that will probably be a while.
>   

/Solidarność/!

J

Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Jeremy Fitzhardinge
On 09/22/09 01:09, Ingo Molnar wrote:
>>> kvm will be removing the pvmmu support soon; and Xen is talking about
>>> running paravirtualized guests in a vmx/svm container where they don't
>>> need most of the hooks.
>>>   
>> We have no plans to drop support for non-vmx/svm capable processors, 
>> let alone require ept/npt.
>> 
> But, just to map out our plans for the future, do you concur with the 
> statements and numbers offered here by the VMware and KVM folks that
> on sufficiently recent hardware, hardware-assisted virtualization 
> outperforms paravirt_ops in many (most?) workloads?
>   

Well, what Avi is referring to here is some discussions about a hybrid
paravirtualized mode, in which Xen runs a normal Xen PV guest within a
hardware container in order to get some immediate optimisations, and
allow further optimisations like using hardware assisted paging extensions.

For KVM and VMI, which always use a shadow pagetable scheme, hardware
paging is now unambiguously better than shadow pagetables, but for Xen
PV guests the picture is mixed since they don't use shadow pagetables. 
The NPT/EPT extensions make updating the pagetable more efficient, but
actual access is more expensive because of the higher load on the TLB
and the increased expense of a TLB miss, so the actual performance
effects are very workload-dependent.
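
To put a rough number on that TLB-miss cost (back-of-envelope only,
assuming 4-level paging on both the guest and host side):

  native 4-level walk:   4 memory references per TLB miss
  nested NPT/EPT walk:   up to (4+1) * (4+1) - 1 = 24 references per miss

Hardware walk caches hide much of this in practice, which is part of why
the end result is so workload-dependent.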

> I.e. paravirt_ops becomes a legacy hardware thing, not a core component 
> of the design of arch/x86/.
>
> (with a long obsoletion period, of course.)
>   

I expect we'll eventually get to the point that the performance delta
and the installed userbase will no longer justify the effort in
maintaining the full set of pvops hooks.  But I don't have a good
feeling for when that might be.

J


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Stephen Hemminger
On Tue, 22 Sep 2009 13:50:54 +0200
Arnd Bergmann  wrote:

> On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > > More importantly, when virtualization is used with multi-queue
> > > > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > > > NIC should preserve the parallelism (lock free) using multiple
> > > > receive/transmit queues. The number of queues should equal the
> > > > number of CPUs.
> > > 
> > > Yup, multiqueue virtio is on the todo list ;-)
> > > 
> > 
> > Note we'll need multiqueue tap for that to help.
> 
> My idea for that was to open multiple file descriptors to the same
> macvtap device and let the kernel figure out the right thing to
> do with that. You can do the same with raw packet sockets in case
> of vhost_net, but I wouldn't want to add more complexity to the
> tun/tap driver for this.
> 
>   Arnd <><


Or get tap out of the way entirely. The packets should not have
to go out to user space at all (see veth).


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Avi Kivity
On 09/22/2009 06:25 PM, Ira W. Snyder wrote:
>
>> Yes.  vbus is more finely layered so there is less code duplication.
>>
>> The virtio layering was more or less dictated by Xen which doesn't have
>> shared memory (it uses grant references instead).  As a matter of fact
>> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that
>> part is duplicated.  It's probably possible to add a virtio-shmem.ko
>> library that people who do have shared memory can reuse.
>>
>>  
> Seems like a nice benefit of vbus.
>

Yes, it is.  With some work virtio can gain that too (virtio-shmem.ko).

>>> I've given it some thought, and I think that running vhost-net (or
>>> similar) on the ppc boards, with virtio-net on the x86 crate server will
>>> work. The virtio-ring abstraction is almost good enough to work for this
>>> situation, but I had to re-invent it to work with my boards.
>>>
>>> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
>>> Remember that this is the "host" system. I used each 4K block as a
>>> "device descriptor" which contains:
>>>
>>> 1) the type of device, config space, etc. for virtio
>>> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
>>> 3) the "avail" table (available entries in the desc table)
>>>
>>>
>> Won't access from x86 to this memory be slow? (On the other hand, if you
>> change it to main memory, access from ppc will be slow... really depends
>> on how your system is tuned.)
>>
>>  
> Writes across the bus are fast, reads across the bus are slow. These are
> just the descriptor tables for memory buffers, not the physical memory
> buffers themselves.
>
> These only need to be written by the guest (x86), and read by the host
> (ppc). The host never changes the tables, so we can cache a copy in the
> guest, for a fast detach_buf() implementation (see virtio-ring, which
> I'm copying the design from).
>
> The only accesses are writes across the PCI bus. There is never a need
> to do a read (except for slow-path configuration).
>

Okay, sounds like what you're doing is optimal then.

> In the spirit of "post early and often", I'm making my code available,
> that's all. I'm asking anyone interested for some review, before I have
> to re-code this for about the fifth time now. I'm trying to avoid
> Haskins' situation, where he's invented and debugged a lot of new code,
> and then been told to do it completely differently.
>
> Yes, the code I posted is only compile-tested, because quite a lot of
> code (kernel and userspace) must be working before anything works at
> all. I hate to design the whole thing, then be told that something
> fundamental about it is wrong, and have to completely re-write it.
>

Understood.  Best to get a review from Rusty then.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication

2009-09-22 Thread Amit Shah
On (Tue) Sep 22 2009 [12:14:04], Rusty Russell wrote:
> On Sat, 12 Sep 2009 01:30:10 am Alan Cox wrote:
> > > The interface presented to guest userspace is of a simple char
> > > device, so it can be used like this:
> > > 
> > > fd = open("/dev/vcon2", O_RDWR);
> > > ret = read(fd, buf, 100);
> > > ret = write(fd, string, strlen(string));
> > > 
> > > Each port is to be assigned a unique function, for example, the
> > > first 4 ports may be reserved for libvirt usage, the next 4 for
> > > generic streaming data and so on. This port-function mapping
> > > isn't finalised yet.
> > 
> > Unless I am missing something, this looks completely bonkers.
> > 
> > Every time we have a table of numbers for functionality it ends in
> > tears. We have to keep tables up to date and managed, and we have to
> > administer the magical number-to-name space.
> 
> The number comes from the ABI; we need some identifier for the different
> ports.  Amit started using names, and I said "just use numbers"; they have
> to be documented and agreed by all clients anyway.
> 
> I.e. the host says "here's a port id 7", which might be the cut & paste
> port or whatever.

Yeah; port 0 has to be reserved for a console (and then we might need
to do a bit more for multiple consoles -- hvc operates on a 'vtermno',
so we need to allocate them as well).

Also, a 'name' property can be attached to ports, as has been suggested:

qemu ... -device virtconport,name=org.qemu.clipboard,port=3,...

spawns a port at id 3 and the guest will also place a file:

/sys/class/virtio-console/vcon3/name

which has "org.qemu.clipboard" as contents, so udev scripts could
create a symlink:

/dev/vcon/org.qemu.clipboard -> /dev/vcon3
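
To make that concrete, here is a minimal user-space sketch of what such a
udev helper or application might do, assuming the
/sys/class/virtio-console/vconN/name layout and /dev/vconN nodes proposed
above (none of this is a settled ABI yet):

  /* Sketch only: find the /dev/vconN port whose sysfs 'name' attribute
   * matches the requested name, under the layout proposed above. */
  #include <dirent.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>

  static int open_port_by_name(const char *wanted)
  {
          DIR *d = opendir("/sys/class/virtio-console");
          struct dirent *e;
          char path[256], name[128];

          if (!d)
                  return -1;
          while ((e = readdir(d)) != NULL) {
                  FILE *f;
                  snprintf(path, sizeof(path),
                           "/sys/class/virtio-console/%s/name", e->d_name);
                  f = fopen(path, "r");
                  if (!f)
                          continue;       /* skips ".", "..", ports w/o name */
                  if (fgets(name, sizeof(name), f) &&
                      !strncmp(name, wanted, strlen(wanted))) {
                          fclose(f);
                          closedir(d);
                          snprintf(path, sizeof(path), "/dev/%s", e->d_name);
                          return open(path, O_RDWR);
                  }
                  fclose(f);
          }
          closedir(d);
          return -1;
  }

With that, fd = open_port_by_name("org.qemu.clipboard") gives the same
result as opening the udev-created symlink.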

Amit


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Ira W. Snyder
On Tue, Sep 22, 2009 at 12:43:36PM +0300, Avi Kivity wrote:
> On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
> >
> >> Sure, virtio-ira and he is on his own to make a bus-model under that, or
> >> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
> >> model can work, I agree.
> >>
> >>  
> > Yes, I'm having to create my own bus model, a la lguest, virtio-pci, and
> > virtio-s390. It isn't especially easy. I can steal lots of code from the
> > lguest bus model, but sometimes it is good to generalize, especially
> > after the fourth implementation or so. I think this is what GHaskins tried
> > to do.
> >
> 
> Yes.  vbus is more finely layered so there is less code duplication.
> 
> The virtio layering was more or less dictated by Xen which doesn't have 
> shared memory (it uses grant references instead).  As a matter of fact 
> lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
> part is duplicated.  It's probably possible to add a virtio-shmem.ko 
> library that people who do have shared memory can reuse.
> 

Seems like a nice benefit of vbus.

> > I've given it some thought, and I think that running vhost-net (or
> > similar) on the ppc boards, with virtio-net on the x86 crate server will
> > work. The virtio-ring abstraction is almost good enough to work for this
> > situation, but I had to re-invent it to work with my boards.
> >
> > I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> > Remember that this is the "host" system. I used each 4K block as a
> > "device descriptor" which contains:
> >
> > 1) the type of device, config space, etc. for virtio
> > 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> > 3) the "avail" table (available entries in the desc table)
> >
> 
> Won't access from x86 to this memory be slow? (On the other hand, if you
> change it to main memory, access from ppc will be slow... really depends
> on how your system is tuned.)
> 

Writes across the bus are fast, reads across the bus are slow. These are
just the descriptor tables for memory buffers, not the physical memory
buffers themselves.

These only need to be written by the guest (x86), and read by the host
(ppc). The host never changes the tables, so we can cache a copy in the
guest, for a fast detach_buf() implementation (see virtio-ring, which
I'm copying the design from).

The only accesses are writes across the PCI bus. There is never a need
to do a read (except for slow-path configuration).
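
For anyone trying to picture the layout, a rough C sketch of one such 4K
descriptor block follows (names and sizes are purely illustrative, not the
actual structure):

  /* Illustrative sketch only: one possible layout for the 4K per-device
   * blocks described above (type + config space + three desc/avail pairs). */
  #include <stdint.h>

  #define SKETCH_RING_SIZE 64
  #define SKETCH_NUM_VQS    3

  struct sketch_vring_desc {          /* mirrors struct vring_desc */
          uint64_t addr;
          uint32_t len;
          uint16_t flags;
          uint16_t next;
  };

  struct sketch_vring_avail {         /* mirrors struct vring_avail */
          uint16_t flags;
          uint16_t idx;
          uint16_t ring[SKETCH_RING_SIZE];
  };

  struct sketch_dev_block {           /* one 4K block in the PCI BAR */
          uint32_t device_type;       /* virtio device ID */
          uint32_t config_len;
          uint8_t  config[128];       /* virtio config space */
          struct {
                  struct sketch_vring_desc  desc[SKETCH_RING_SIZE];
                  struct sketch_vring_avail avail;
          } vq[SKETCH_NUM_VQS];
  };                                  /* ~3.6K with these sizes, fits in 4K */

The guest (x86) writes into desc/avail across the bus and keeps a local
shadow for detach_buf(); the host (ppc) only reads these tables.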

> > Parts 2 and 3 are repeated three times, to allow for a maximum of three
> > virtqueues per device. This is good enough for all current drivers.
> >
> 
> The plan is to switch to multiqueue soon.  Will not affect you if your 
> boards are uniprocessor or small smp.
> 

Everything I have is UP. I don't need extreme performance, either.
40MB/sec is the minimum I need to reach, though I'd like to have some
headroom.

For reference, using the CPU to handle data transfers, I get ~2MB/sec
transfers. Using the DMA engine, I've hit about 60MB/sec with my
"crossed-wires" virtio-net.

> > I've gotten plenty of email about this from lots of interested
> > developers. There are people who would like this kind of system to just
> > work, while having to write just some glue for their device, just like a
> > network driver. I hunch most people have created some proprietary mess
> > that basically works, and left it at that.
> >
> 
> So long as you keep the system-dependent features hookable or 
> configurable, it should work.
> 
> > So, here is a desperate cry for help. I'd like to make this work, and
> > I'd really like to see it in mainline. I'm trying to give back to the
> > community from which I've taken plenty.
> >
> 
> Not sure who you're crying for help to.  Once you get this working, post 
> patches.  If the patches are reasonably clean and don't impact 
> performance for the main use case, and if you can show the need, I 
> expect they'll be merged.
> 

In the spirit of "post early and often", I'm making my code available,
that's all. I'm asking anyone interested for some review, before I have
to re-code this for about the fifth time now. I'm trying to avoid
Haskins' situation, where he's invented and debugged a lot of new code,
and then been told to do it completely differently.

Yes, the code I posted is only compile-tested, because quite a lot of
code (kernel and userspace) must be working before anything works at
all. I hate to design the whole thing, then be told that something
fundamental about it is wrong, and have to completely re-write it.

Thanks for the comments,
Ira

> -- 
> error compiling committee.c: too many arguments to function
> 


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Arnd Bergmann
On Tuesday 22 September 2009, Michael S. Tsirkin wrote:
> > > More importantly, when virtualization is used with multi-queue
> > > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > > NIC should preserve the parallelism (lock free) using multiple
> > > receive/transmit queues. The number of queues should equal the
> > > number of CPUs.
> > 
> > Yup, multiqueue virtio is on the todo list ;-)
> > 
> 
> Note we'll need multiqueue tap for that to help.

My idea for that was to open multiple file descriptors to the same
macvtap device and let the kernel figure out the right thing to
do with that. You can do the same with raw packet sockets in case
of vhost_net, but I wouldn't want to add more complexity to the
tun/tap driver for this.
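
In user-space terms the idea would look something like the sketch below
(hypothetical: per-fd queues on macvtap don't exist yet, and /dev/tap0 is
just an assumed name for the macvtap node of the target interface):

  /* Hypothetical sketch of the "multiple fds on one macvtap device" idea,
   * assuming the kernel would spread queues across the open fds. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NUM_QUEUES 4

  int main(void)
  {
          int fds[NUM_QUEUES];
          int i;

          for (i = 0; i < NUM_QUEUES; i++) {
                  fds[i] = open("/dev/tap0", O_RDWR); /* same device, N times */
                  if (fds[i] < 0) {
                          perror("open /dev/tap0");
                          return 1;
                  }
          }
          /* ... hand one fd per guest queue to the hypervisor/vhost ... */
          for (i = 0; i < NUM_QUEUES; i++)
                  close(fds[i]);
          return 0;
  }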

Arnd <><


Re: [RFC] Virtual Machine Device Queues(VMDq) support on KVM

2009-09-22 Thread Michael S. Tsirkin
On Mon, Sep 21, 2009 at 09:27:18AM -0700, Chris Wright wrote:
> * Stephen Hemminger (shemmin...@vyatta.com) wrote:
> > On Mon, 21 Sep 2009 16:37:22 +0930
> > Rusty Russell  wrote:
> > 
> > > > > Actually this framework can apply to traditional network adapters
> > > > > which have just one tx/rx queue pair. And applications using the
> > > > > same user/kernel interface can utilize this framework to
> > > > > send/receive network traffic directly through a tx/rx queue pair
> > > > > in a network adapter.
> > > > > 
> > 
> > More importantly, when virtualization is used with multi-queue
> > NICs, the virtio-net NIC is a single-CPU bottleneck. The virtio-net
> > NIC should preserve the parallelism (lock free) using multiple
> > receive/transmit queues. The number of queues should equal the
> > number of CPUs.
> 
> Yup, multiqueue virtio is on the todo list ;-)
> 
> thanks,
> -chris

Note we'll need multiqueue tap for that to help.

-- 
MST


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-22 Thread Avi Kivity
On 09/22/2009 12:43 AM, Ira W. Snyder wrote:
>
>> Sure, virtio-ira and he is on his own to make a bus-model under that, or
>> virtio-vbus + vbus-ira-connector to use the vbus framework.  Either
>> model can work, I agree.
>>
>>  
> Yes, I'm having to create my own bus model, a la lguest, virtio-pci, and
> virtio-s390. It isn't especially easy. I can steal lots of code from the
> lguest bus model, but sometimes it is good to generalize, especially
> after the fourth implementation or so. I think this is what GHaskins tried
> to do.
>

Yes.  vbus is more finely layered so there is less code duplication.

The virtio layering was more or less dictated by Xen which doesn't have 
shared memory (it uses grant references instead).  As a matter of fact 
lguest, kvm/pci, and kvm/s390 all have shared memory, as you do, so that 
part is duplicated.  It's probably possible to add a virtio-shmem.ko 
library that people who do have shared memory can reuse.

> I've given it some thought, and I think that running vhost-net (or
> similar) on the ppc boards, with virtio-net on the x86 crate server will
> work. The virtio-ring abstraction is almost good enough to work for this
> situation, but I had to re-invent it to work with my boards.
>
> I've exposed a 16K region of memory as PCI BAR1 from my ppc board.
> Remember that this is the "host" system. I used each 4K block as a
> "device descriptor" which contains:
>
> 1) the type of device, config space, etc. for virtio
> 2) the "desc" table (virtio memory descriptors, see virtio-ring)
> 3) the "avail" table (available entries in the desc table)
>

Won't access from x86 to this memory be slow? (On the other hand, if you
change it to main memory, access from ppc will be slow... really depends
on how your system is tuned.)

> Parts 2 and 3 are repeated three times, to allow for a maximum of three
> virtqueues per device. This is good enough for all current drivers.
>

The plan is to switch to multiqueue soon.  Will not affect you if your 
boards are uniprocessor or small smp.

> I've gotten plenty of email about this from lots of interested
> developers. There are people who would like this kind of system to just
> work, while having to write just some glue for their device, just like a
> network driver. I hunch most people have created some proprietary mess
> that basically works, and left it at that.
>

So long as you keep the system-dependent features hookable or 
configurable, it should work.

> So, here is a desperate cry for help. I'd like to make this work, and
> I'd really like to see it in mainline. I'm trying to give back to the
> community from which I've taken plenty.
>

Not sure who you're crying for help to.  Once you get this working, post 
patches.  If the patches are reasonably clean and don't impact 
performance for the main use case, and if you can show the need, I 
expect they'll be merged.

-- 
error compiling committee.c: too many arguments to function



Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Ingo Molnar

* Jeremy Fitzhardinge  wrote:

> On 09/20/09 02:00, Avi Kivity wrote:
> > On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
> >> On Sun, 20 Sep 2009 09:42:47 +0200
> >> Ingo Molnar  wrote:
> >>
> >>   
> >>> If we were able to rip out all (or most) of paravirt from arch/x86 it
> >>> would be tempting for other technical reasons - but the patch above
> >>> is well localized.
> >>>  
> >> The interesting question is whether this would allow us to remove a few
> >> of the paravirt hooks.
> >>
> >
> > kvm will be removing the pvmmu support soon; and Xen is talking about
> > running paravirtualized guests in a vmx/svm container where they don't
> > need most of the hooks.
> 
> We have no plans to drop support for non-vmx/svm capable processors, 
> let alone require ept/npt.

But, just to map out our plans for the future, do you concur with the 
statements and numbers offered here by the VMware and KVM folks that
on sufficiently recent hardware, hardware-assisted virtualization 
outperforms paravirt_ops in many (most?) workloads?

I.e. paravirt_ops becomes a legacy hardware thing, not a core component 
of the design of arch/x86/.

(with a long obsoletion period, of course.)

Thanks,

Ingo


Re: Paravirtualization on VMware's Platform [VMI].

2009-09-22 Thread Rusty Russell
On Sun, 20 Sep 2009 06:30:21 pm Avi Kivity wrote:
> On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
> > On Sun, 20 Sep 2009 09:42:47 +0200
> > Ingo Molnar  wrote:
> >
> >
> >> If we were able to rip out all (or most) of paravirt from arch/x86 it
> >> would be tempting for other technical reasons - but the patch above
> >> is well localized.
> >>  
> > The interesting question is whether this would allow us to remove a few
> > of the paravirt hooks.
> >
> 
> kvm will be removing the pvmmu support soon; and Xen is talking about 
> running paravirtualized guests in a vmx/svm container where they don't 
> need most of the hooks.

When they're all gone, even I don't think lguest is a sufficient excuse
to keep CONFIG_PARAVIRT.  Oh well.  But that will probably be a while.

Cheers,
Rusty.