date:20091001

KVM: x86: disable paravirt mmu reporting

2009-10-01 Thread Marcelo Tosatti


Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
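From userspace, the capability value computed in kvm_dev_ioctl_check_extension() is read with the KVM_CHECK_EXTENSION ioctl on /dev/kvm, so the effect of this change can be checked directly. A minimal sketch, assuming a Linux host. The constant values follow <linux/kvm.h> but are defined locally so no kernel headers are needed, and probe_pv_mmu() is a hypothetical helper name. With the patch applied, the probe should return 0 whether or not TDP (EPT/NPT) is enabled.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values from the KVM ABI (<linux/kvm.h>), defined locally for the sketch. */
#ifndef KVMIO
#define KVMIO 0xAE
#endif
#ifndef KVM_CHECK_EXTENSION
#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03)
#endif
#ifndef KVM_CAP_PV_MMU
#define KVM_CAP_PV_MMU 13
#endif

/*
 * Returns the KVM_CAP_PV_MMU value reported by the kernel
 * (0 = not offered), or -1 if the device cannot be opened or queried.
 */
int probe_pv_mmu(const char *dev_path)
{
    int fd = open(dev_path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int r = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PV_MMU);
    close(fd);
    return r;
}
```

Before the patch, the same call returned !tdp_enabled, which is nonzero only on hosts using shadow paging.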

Signed-off-by: Marcelo Tosatti 


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffccb5c..fe467cb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1238,8 +1238,8 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_NR_MEMSLOTS:
 		r = KVM_MEMORY_SLOTS;
 		break;
-	case KVM_CAP_PV_MMU:
-		r = !tdp_enabled;
+	case KVM_CAP_PV_MMU:	/* obsolete */
+		r = 0;
 		break;
 	case KVM_CAP_IOMMU:
 		r = iommu_found();

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM: VMX: flush TLB with INVEPT on cpu migration

2009-10-01 Thread Marcelo Tosatti


It is possible that stale EPTP-tagged mappings are used, if a 
vcpu migrates to a different pcpu.

Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which
will invalidate both VPID and EPT mappings on the next vm-entry.
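The patch replaces an immediate vpid_sync_vcpu_all() with a deferred request. vmx_vcpu_load() only records KVM_REQ_TLB_FLUSH, and the flush happens when that request is consumed before the next vm-entry, where it covers both VPID- and EPT-tagged entries. The userspace model below sketches that pattern with C11 atomics. vcpu_model, REQ_TLB_FLUSH and vcpu_enter_guest() are hypothetical stand-ins for the kernel's vcpu, request bit and entry path, not KVM code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number standing in for KVM_REQ_TLB_FLUSH. */
#define REQ_TLB_FLUSH 0

struct vcpu_model {
    int cpu;               /* pcpu the vcpu was last loaded on, -1 if none */
    atomic_ulong requests; /* pending request bits */
    unsigned flushes;      /* flushes performed on guest entry */
};

/* Like the patched vmx_vcpu_load(): on a pcpu change, only record a request.
 * (For simplicity the model also records the new cpu itself.) */
void vcpu_load(struct vcpu_model *v, int cpu)
{
    if (v->cpu != cpu) {
        atomic_fetch_or(&v->requests, 1UL << REQ_TLB_FLUSH);
        v->cpu = cpu;
    }
}

/* Atomically clear a request bit and report whether it was set. */
static bool test_and_clear_req(struct vcpu_model *v, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_and(&v->requests, ~mask) & mask) != 0;
}

/* Just before vm-entry: service the pending flush, if any. */
void vcpu_enter_guest(struct vcpu_model *v)
{
    if (test_and_clear_req(v, REQ_TLB_FLUSH))
        v->flushes++; /* real code would flush VPID- and EPT-tagged entries */
}
```

Loading a vcpu twice on the same pcpu yields a single flush. Moving it to another pcpu yields a second one on the next entry.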

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e86f1a6..97f4265 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -708,7 +708,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu->cpu != cpu) {
 		vcpu_clear(vmx);
 		kvm_migrate_timers(vcpu);
-		vpid_sync_vcpu_all(vmx);
+		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
 		local_irq_disable();
 		list_add(&vmx->local_vcpus_link,
 			 &per_cpu(vcpus_on_cpu, cpu));

Re: INFO: task kjournal:337 blocked for more than 120 seconds

2009-10-01 Thread Shirley Ma

Switching to a different scheduler doesn't make the problem go away.

Shirley


Re: INFO: task kjournal:337 blocked for more than 120 seconds

2009-10-01 Thread Shirley Ma

On Thu, 2009-10-01 at 16:03 -0500, Javier Guerra wrote:
> deadline is the most recommended one for virtualization hosts.  some
> distros set it as default if you select Xen or KVM at installation
> time.  (and noop for the guests)

I spoke too early; after a while the noop scheduler hit the same issue. I
am switching to deadline to test it again.

Thanks
Shirley


Re: [Qemu-devel] Release plan for 0.12.0

2009-10-01 Thread Luiz Capitulino

On Wed, 30 Sep 2009 08:05:16 -0500
Anthony Liguori  wrote:

> Avi Kivity wrote:
> > On 09/30/2009 01:54 AM, Anthony Liguori wrote:
> >> Hi,
> >>
> >> Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.
> >>
> >> I'd like to do a few things differently this time around.  I don't
> >> think the -rc process went very well as I don't think we got more 
> >> testing out of it.  I'd like to shorten the timeline for 0.12.0 a 
> >> good bit.  The 0.10 stable tree got pretty difficult to maintain 
> >> toward the end of the cycle.  We also had a pretty huge amount of 
> >> change between 0.10 and 0.11 so I think a shorter cycle is warranted.
> >>
> >> I think aiming for early to mid-December would give us roughly a 3 
> >> month cycle and would align well with some of the Linux distribution 
> >> cycles.  I'd like to limit things to a single -rc that lasted only 
> >> for about a week.  This is enough time to fix most of the obvious 
> >> issues I think.
> >>
> >> I'd also like to try to enumerate some features for this release.  
> >> Here's a short list of things I expect to see for this release 
> >> (target-i386 centric).  Please add or comment on items that you'd 
> >> either like to see in the release or are planning on working on.
> >>
> >> o VMState conversion -- I expect most of the pc target to be completed
> >> o qdev conversion -- I hope that we'll get most of the pc target 
> >> completely converted to qdev
> >> o storage live migration
> >> o switch to SeaBIOS (need to finish porting features from Bochs)
> >> o switch to gPXE (need to resolve slirp tftp server issue)
> >> o KSM integration
> >> o in-kernel APIC support for KVM
> >> o guest SMP support for KVM
> >> o updates to the default pc machine type
> >
> > Machine monitor protocol.
> 
> If we're going to support the protocol for 0.12, I'd like to have most of the
> code merged by the end of October.

 Four weeks... Not much time, but let's try.

 There are two major issues that may delay QMP.

 Firstly, we are still in the infrastructure/design phase, which
naturally takes time. Maybe when handlers start getting converted
en masse things will be faster.

 Secondly: testing. I have a very ugly python script to test the
already converted handlers. The problem is not only the ugliness;
the right way to do this would be to use kvm-autotest. So, I was
planning to take a detailed look at it and perhaps start writing
tests for QMP right when each handler is converted. Right Thing,
but takes time.


Re: INFO: task kjournal:337 blocked for more than 120 seconds

2009-10-01 Thread Javier Guerra

On Thu, Oct 1, 2009 at 4:00 PM, Shirley Ma  wrote:
> I talked to Mingming; she suggested using a different IO scheduler. The
> default scheduler is cfq; after I switched to noop, the problem is gone.

deadline is the most recommended one for virtualization hosts.  some
distros set it as default if you select Xen or KVM at installation
time.  (and noop for the guests)
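On Linux the active elevator is the bracketed entry in /sys/block/<dev>/queue/scheduler, and writing a scheduler name to that file switches the device at runtime (the elevator= boot parameter sets the default). A small sketch, assuming that sysfs layout. active_scheduler() and set_scheduler() are hypothetical helper names, and writing the file needs root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the active scheduler from a scheduler line such as
 * "noop anticipatory deadline [cfq]". Returns 0 on success, -1 otherwise.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (!lb || !rb || outlen == 0)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n >= outlen)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Select a scheduler by writing its name to the sysfs file (needs root). */
int set_scheduler(const char *sysfs_path, const char *name)
{
    FILE *f = fopen(sysfs_path, "w");
    if (!f)
        return -1;
    int ok = fputs(name, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a host this would be called as set_scheduler("/sys/block/sda/queue/scheduler", "deadline"), with sda replaced by the actual disk.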


-- 
Javier

Re: INFO: task kjournal:337 blocked for more than 120 seconds

2009-10-01 Thread Shirley Ma

I talked to Mingming; she suggested using a different IO scheduler. The
default scheduler is cfq; after I switched to noop, the problem is gone.

So there seems to be a bug in the cfq scheduler. It's easily reproduced
when running the guest kernel; so far I haven't hit this problem on the
host side.

If I need to file a bug for someone to look at, please let me know.

Thanks
Shirley


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Gregory Haskins

Avi Kivity wrote:
> On 09/30/2009 10:04 PM, Gregory Haskins wrote:
> 
> 
>>> A 2.6.27 guest, or Windows guest with the existing virtio drivers,
>>> won't work
>>> over vbus.
>>>  
>> Binary compatibility with existing virtio drivers, while nice to have,
>> is not a specific requirement nor goal.  We will simply load an updated
>> KMP/MSI into those guests and they will work again.  As previously
>> discussed, this is how more or less any system works today.  It's like
>> we are removing an old adapter card and adding a new one to "uprev the
>> silicon".
>>
> 
> Virtualization is about not doing that.  Sometimes it's necessary (when
> you have made unfixable design mistakes), but just to replace a bus,
> with no advantages to the guest that has to be changed (other
> hypervisors or hypervisorless deployment scenarios aren't).

The problem is that your continued assertion that there is no advantage
to the guest is a completely unsubstantiated claim.  As it stands right
now, I have a public git tree that, to my knowledge, is the fastest KVM
PV networking implementation around.  It also has capabilities that are
demonstrably not found elsewhere, such as the ability to render generic
shared-memory interconnects (scheduling, timers), interrupt-priority
(qos), and interrupt-coalescing (exit-ratio reduction).  I designed each
of these capabilities after carefully analyzing where KVM was coming up
short.

Those are facts.

I can't easily prove which of my new features alone are what makes it
special per se, because I don't have unit tests for each part that
breaks it down.  What I _can_ state is that it's the fastest and most
feature rich KVM-PV tree that I am aware of, and others may download and
test it themselves to verify my claims.

The disproof, on the other hand, would be in a counter example that
still meets all the performance and feature criteria under all the same
conditions while maintaining the existing ABI.  To my knowledge, this
doesn't exist.

Therefore, if you believe my work is irrelevant, show me a git tree that
accomplishes the same feats in a binary compatible way, and I'll rethink
my position.  Until then, complaining about lack of binary compatibility
is pointless since it is not an insurmountable proposition, and the one
and only available solution declares it a required casualty.

> 
>>>   Further, non-shmem virtio can't work over vbus.
>>>  
>> Actually I misspoke earlier when I said virtio works over non-shmem.
>> Thinking about it some more, both virtio and vbus fundamentally require
>> shared-memory, since sharing their metadata concurrently on both sides
>> is their raison d'être.
>>
>> The difference is that virtio utilizes a pre-translation/mapping (via
>> ->add_buf) from the guest side.  OTOH, vbus uses a post translation
>> scheme (via memctx) from the host-side.  If anything, vbus is actually
>> more flexible because it doesn't assume the entire guest address space
>> is directly mappable.
>>
>> In summary, your statement is incorrect (though it is my fault for
>> putting that idea in your head).
>>
> 
> Well, Xen requires pre-translation (since the guest has to give the host
> (which is just another guest) permissions to access the data).

Actually I am not sure that it does require pre-translation.  You might
be able to use the memctx->copy_to/copy_from scheme in post translation
as well, since those would be able to communicate to something like the
xen kernel.  But I suppose either method would result in extra exits, so
there is no distinct benefit to using vbus there. As you say below, "they're
just different".

The biggest difference is that my proposed model gets around the notion
that the entire guest address space can be represented by an arbitrary
pointer.  For instance, the copy_to/copy_from routines take a GPA, but
may use something indirect like a DMA controller to access that GPA.  On
the other hand, virtio fully expects a viable pointer to come out of the
interface iiuc.  This is in part what makes vbus more adaptable to non-virt.
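The post-translation idea can be illustrated with an interface where the host never holds a pointer into guest memory and instead asks a context object to copy by guest-physical address. This is a simplified, hypothetical sketch in the spirit of the memctx->copy_to/copy_from scheme described above, not vbus's actual API. struct memctx and the flat backend are invented for illustration. A backend that drives a DMA controller would implement the same two callbacks without ever producing a host pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical memory context: all guest access goes through GPA-based copies. */
struct memctx {
    int (*copy_to)(struct memctx *ctx, uint64_t gpa, const void *src, size_t len);
    int (*copy_from)(struct memctx *ctx, void *dst, uint64_t gpa, size_t len);
};

/* One possible backend: guest memory happens to be a flat host buffer. */
struct flat_memctx {
    struct memctx ops; /* first member, so a memctx* can be downcast */
    uint8_t *base;
    uint64_t size;
};

static int flat_copy_to(struct memctx *ctx, uint64_t gpa, const void *src, size_t len)
{
    struct flat_memctx *m = (struct flat_memctx *)ctx;
    if (gpa > m->size || len > m->size - gpa)
        return -1; /* GPA range outside guest memory */
    memcpy(m->base + gpa, src, len);
    return 0;
}

static int flat_copy_from(struct memctx *ctx, void *dst, uint64_t gpa, size_t len)
{
    struct flat_memctx *m = (struct flat_memctx *)ctx;
    if (gpa > m->size || len > m->size - gpa)
        return -1;
    memcpy(dst, m->base + gpa, len);
    return 0;
}

void flat_memctx_init(struct flat_memctx *m, uint8_t *base, uint64_t size)
{
    m->ops.copy_to = flat_copy_to;
    m->ops.copy_from = flat_copy_from;
    m->base = base;
    m->size = size;
}
```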

> So neither is a superset of the other, they're just different.
> 
> It doesn't really matter since Xen is unlikely to adopt virtio.

Agreed.

> 
>> An interesting thing here is that you don't even need a fancy
>> multi-homed setup to see the effects of my exit-ratio reduction work:
>> even single port configurations suffer from the phenomenon since many
>> devices have multiple signal-flows (e.g. network adapters tend to have
>> at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
>> etc.)).  What's worse is that the flows are often indirectly related (for
>> instance, many host adapters will free tx skbs during rx operations, so
>> you tend to get bursts of tx-completes at the same time as rx-ready).  If
>> the flows map 1:1 with IDT, they will suffer the same problem.
>>
> 
> You can simply use the same vector for both rx and tx and poll both at
> every interrupt.

Yes, but that has its own problems: e.
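Avi's alternative, one vector shared by rx and tx with both rings polled on every interrupt, can be modeled as follows. The device model and names (nic_model, shared_vector_irq()) are hypothetical. The sketch only shows that tx completions arriving alongside rx work are reaped in the same interrupt, not the cost of polling a ring that turns out to be empty.

```c
/* Hypothetical device model: pending work on each signal flow. */
struct nic_model {
    unsigned rx_pending;   /* received packets not yet processed */
    unsigned tx_completed; /* transmitted buffers not yet freed */
    unsigned interrupts;   /* interrupts taken (exits, for a guest) */
};

/* Take all pending work from one ring. */
static unsigned drain(unsigned *pending)
{
    unsigned n = *pending;
    *pending = 0;
    return n;
}

/*
 * Handler for a vector shared by rx and tx: every interrupt polls both
 * rings. Returns the total amount of work done.
 */
unsigned shared_vector_irq(struct nic_model *nic)
{
    nic->interrupts++;
    unsigned work = drain(&nic->rx_pending);
    work += drain(&nic->tx_completed);
    return work;
}
```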

Re: kvm or qemu-kvm?

2009-10-01 Thread Jim Paris

Avi Kivity wrote:
> On 10/01/2009 08:06 PM, Jim Paris wrote:
>> That's what I do.  Just as a warning, if you're using the libvirt
>> packages from Debian unstable, make sure you also install
>> linux-libc-dev from unstable before building qemu-kvm.
>>
>> Otherwise, virtio networking will fail.  The reason is that qemu-kvm
>> will be built against the Lenny linux-libc-dev which does not have
>> IFF_VNET_HDR, while the libvirt was built against a newer
>> linux-libc-dev that did define IFF_VNET_HDR.  If they don't agree then
>> things break.
>
> Isn't that a libvirt bug?  libvirt shouldn't assume qemu supports  
> IFF_VNET_HDR just because it sees it.

Probably, but I think it has no way of knowing, because qemu doesn't
report whether it supports it.

http://libvirt.org/git/?p=libvirt.git;a=commit;h=b4f62abbf1191c8fbab3306b4bf2f2567e18067f

"However, we need to be careful to only set the flag when a) QEMU has
support for this ABI and b) the value of the flag is queryable using
the TUNGETIFF ioctl.

It's nearly five months since kvm-74 - the first KVM release with this
feature - was released. Up until now, we've not added libvirt support
because there is no clean way to detect support for this in QEMU at
runtime. A brief attempt to add a "info capabilities" monitor command
to QEMU floundered. Perfect is the enemy of good enough. Probing the
KVM version will suffice for now."
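The libvirt commit separates two questions: whether the headers a program was built against define IFF_VNET_HDR (the compile-time check that disagreed in Jim's Debian setup), and whether a given tap device actually has the flag, which TUNGETIFF can report at runtime. A sketch, assuming a Linux system with <linux/if_tun.h>. built_with_vnet_hdr() and tap_has_vnet_hdr() are hypothetical names, and obtaining a real tap fd (opening /dev/net/tun and calling TUNSETIFF, usually as root) is left out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Compile-time: did the headers used for this build know the flag? */
int built_with_vnet_hdr(void)
{
#ifdef IFF_VNET_HDR
    return 1;
#else
    return 0;
#endif
}

/* Runtime: ask the kernel which flags a tap fd has. 1/0, or -1 on error. */
int tap_has_vnet_hdr(int tap_fd)
{
#if defined(TUNGETIFF) && defined(IFF_VNET_HDR)
    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    if (ioctl(tap_fd, TUNGETIFF, &ifr) < 0)
        return -1;
    return (ifr.ifr_flags & IFF_VNET_HDR) != 0;
#else
    (void)tap_fd;
    errno = ENOSYS; /* headers too old to even ask */
    return -1;
#endif
}
```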

-jim


Re: kvm or qemu-kvm?

2009-10-01 Thread Avi Kivity


On 10/01/2009 08:06 PM, Jim Paris wrote:

That's what I do.  Just as a warning, if you're using the libvirt
packages from Debian unstable, make sure you also install
linux-libc-dev from unstable before building qemu-kvm.

Otherwise, virtio networking will fail.  The reason is that qemu-kvm
will be built against the Lenny linux-libc-dev which does not have
IFF_VNET_HDR, while the libvirt was built against a newer
linux-libc-dev that did define IFF_VNET_HDR.  If they don't agree then
things break.
   


Isn't that a libvirt bug?  libvirt shouldn't assume qemu supports 
IFF_VNET_HDR just because it sees it.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: [PATCH v2] core, x86: Add user return notifiers

2009-10-01 Thread H. Peter Anvin

On 10/01/2009 08:30 AM, Avi Kivity wrote:
> 
> While I'd prefer it to be in .32, as there's much churn in both trees, I 
> guess we should follow the process and schedule it for .33.
> 

OK, sounds like a plan.  I'll pull it into -tip.

-hpa

Re: kvm or qemu-kvm?

2009-10-01 Thread Jim Paris

Ross Boylan wrote:
> My distro (Debian) is only at 85, even in unstable.  Since it wasn't
> current, and also the dependencies will have wide effects on my system
> (which I'm trying to keep at the stable release Lenny), I figured
> getting the current source and building it myself would be the best
> move.  For other reasons I'm already running a 2.6.30 kernel from
> Debian, which includes kernel side kvm.  So I figure I only need to mess
> with user space.

That's what I do.  Just as a warning, if you're using the libvirt
packages from Debian unstable, make sure you also install
linux-libc-dev from unstable before building qemu-kvm.

Otherwise, virtio networking will fail.  The reason is that qemu-kvm
will be built against the Lenny linux-libc-dev which does not have
IFF_VNET_HDR, while the libvirt was built against a newer
linux-libc-dev that did define IFF_VNET_HDR.  If they don't agree then
things break.

(just had to track this one down the other day)

-jim

Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-10-01 Thread Avi Kivity


On 10/01/2009 07:00 PM, Mark McLoughlin wrote:

On Thu, 2009-10-01 at 18:49 +0200, Avi Kivity wrote:
   

On 09/30/2009 03:59 PM, Mark McLoughlin wrote:
 

I think we should keep the vlan stuff, just de-emphasise it.

   

Maybe we should do what X.org does, break it silently and remove it some
time later when no one complains.
 

Well, the 'silently' part isn't going to work now, is it?
   


Me and my big mouth.


Anyway, I'm sure at least Jan would notice if his 'dump' backend stopped
working
   


Patch it to use libpcap?


Re: kvm or qemu-kvm?

2009-10-01 Thread Avi Kivity


On 10/01/2009 06:51 PM, Ross Boylan wrote:

So which file should I start from?

   

qemu-kvm is the userspace component, kvm-kmod is the kernel component as
an external module.  'kvm' is a package containing both.
 

That helps a lot; maybe that info could go up on the bugs or faq page.
I couldn't find it (that is, there is info that there are kernel and
user space components, but not how these relate to the tar files).
   


Yes, I plan on writing something up.


In general the best place to start is with the distro provided packages.

 

My distro (Debian) is only at 85, even in unstable.  Since it wasn't
current, and also the dependencies will have wide effects on my system
(which I'm trying to keep at the stable release Lenny), I figured
getting the current source and building it myself would be the best
move.  For other reasons I'm already running a 2.6.30 kernel from
Debian, which includes kernel side kvm.  So I figure I only need to mess
with user space.
   


Right, stick with your kernel's kvm.ko, qemu-kvm-0.11.0 should make a 
good fit.



Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-10-01 Thread Mark McLoughlin

On Thu, 2009-10-01 at 18:49 +0200, Avi Kivity wrote:
> On 09/30/2009 03:59 PM, Mark McLoughlin wrote:
> > I think we should keep the vlan stuff, just de-emphasise it.
> >
> 
> Maybe we should do what X.org does, break it silently and remove it some 
> time later when no one complains.

Well, the 'silently' part isn't going to work now, is it?

Anyway, I'm sure at least Jan would notice if his 'dump' backend stopped
working

Cheers,
Mark.


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Avi Kivity


On 10/01/2009 06:50 PM, Anthony Liguori wrote:

Avi Kivity wrote:

On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

Discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a 
pipe from a signal handler.  The problem is that a write() can 
return a partial result and following the partial result, we can end 
up getting an EAGAIN.  We have no way to queue signals beyond that 
point and we have no sane way to deal with partial writes.


pipe buffers are multiples of the signalfd size.  As long as we
read and write signalfd-sized blocks, we won't get partial writes.
It's true that depending on an implementation detail is bad practice,
but this is emulation code, and if it helps simplify everything else,
I think it's fine to use it.


That's a pretty hairy detail to rely upon..


Well, it's a posix detail, as I quoted below.  I'm not in love with it 
but it should work.
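The property being relied on here is the POSIX rule that a write of at most PIPE_BUF bytes to a pipe is atomic: on a non-blocking pipe it either transfers the whole record or fails with EAGAIN and writes nothing. A minimal sketch of a pipe-based signalfd emulation built on that rule, assuming Linux (for struct signalfd_siginfo from <sys/signalfd.h>) and one process-wide pipe. emu_signalfd() and emu_handler() are hypothetical names, not the compatfd code. As Anthony points out, a full pipe still means the signal is dropped rather than queued.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* POSIX: pipe writes of at most PIPE_BUF bytes are atomic. */
_Static_assert(sizeof(struct signalfd_siginfo) <= PIPE_BUF,
               "signalfd record must fit in one atomic pipe write");

static int emu_pipe[2] = { -1, -1 };

/* Handler: write one whole signalfd_siginfo record, or nothing (EAGAIN). */
static void emu_handler(int signo, siginfo_t *info, void *uctx)
{
    (void)uctx;
    int saved_errno = errno;
    struct signalfd_siginfo rec = { 0 };
    rec.ssi_signo = (uint32_t)signo;
    rec.ssi_pid = (uint32_t)info->si_pid;
    if (write(emu_pipe[1], &rec, sizeof rec) < 0) {
        /* EAGAIN: pipe full, this signal is lost, but no torn record. */
    }
    errno = saved_errno;
}

/* Install the emulation for one signal; returns the fd to read, or -1. */
int emu_signalfd(int signo)
{
    if (emu_pipe[0] < 0) {
        if (pipe(emu_pipe) < 0)
            return -1;
        for (int i = 0; i < 2; i++) {
            fcntl(emu_pipe[i], F_SETFL, fcntl(emu_pipe[i], F_GETFL) | O_NONBLOCK);
            fcntl(emu_pipe[i], F_SETFD, FD_CLOEXEC);
        }
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = emu_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;
    return emu_pipe[0];
}
```

A reader then always reads whole sizeof(struct signalfd_siginfo) records from the returned fd, exactly as it would from a real signalfd.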




Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the 
fd's buffer.  This requires that we use an fd-per-signal which means 
we end up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals which means less EINTRs.  
I don't think that's a huge advantage and the inability to do 
backwards compatibility in a sane way means that emulated signalfd 
is not workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?


Do we have any indication that this difference is actually 
observable?  This seems like very premature optimization.


Multiply the signal rate by "a few microseconds", if you get more than 
0.1% cpu it's worthwhile in my opinion.  The code is localized, and 
signalfd is a better interface than signals.
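The 0.1% threshold is simple arithmetic: CPU fraction = signal rate times extra cost per signal. With an assumed 1,000 signals per second and an assumed 3 µs of extra cost each, that is 3 ms of CPU per second, or 0.3%, which clears the bar. At 100 signals per second it is 0.03%, which does not. These figures are illustrative, not measurements from the thread, and the helper names are hypothetical.

```c
/* Fraction of one CPU spent on extra per-signal cost. */
double overhead_fraction(double signals_per_sec, double extra_us)
{
    return signals_per_sec * extra_us * 1e-6;
}

/* Rule of thumb from the thread: worth it above 0.1% of a CPU. */
int worth_optimizing(double signals_per_sec, double extra_us)
{
    return overhead_fraction(signals_per_sec, extra_us) > 0.001;
}
```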





The same is generally true for eventfd.


eventfd emulation will also never get partial writes.


But you cannot emulate eventfd faithfully because eventfd is supposed 
to be additive.  If you write 1 to eventfd 50 times, you should be able to
read a set of integers that add up to 50.  If you hit EAGAIN in a 
signal handler, you have no way of handling that.


We never rely on the count anyway.  You can simply ignore EAGAIN.

As I said earlier, the better thing to do is have a higher level 
interface that has a subset of the behavior of eventfd/signalfd that 
we can emulate correctly.


Sure, but it's more work.  Copying an existing interface is easier.  
It's not like there's no other work in qemu left to be done.



Re: INFO: task journal:337 blocked for more than 120 seconds

2009-10-01 Thread Shirley Ma

On Thu, 2009-10-01 at 10:20 -0300, Marcelo Tosatti wrote:
> I've hit this in the past with ext3, mounting with data=writeback made
> it disappear.

Thanks. I will give it a try. Someone should fix this.

Shirley


Re: kvm or qemu-kvm?

2009-10-01 Thread Ross Boylan

On Thu, 2009-10-01 at 18:42 +0200, Avi Kivity wrote:
> On 09/30/2009 04:21 AM, Ross Boylan wrote:
> > http://www.linux-kvm.org/page/HOWTO1 says to build kvm I should get the
> > latest kvm-release.tar.gz.
> >
> > http://www.linux-kvm.org/page/Downloads says "If you want to use the
> > latest version of KVM kernel modules and supporting userspace, you can
> > download the latest version from
> > http://sourceforge.net/project/showfiles.php?group_id=180599.";
> > That page shows the latest version is qemu-kvm-0.11.0.tar.gz.
> >
> > The most recent kvm-release.tar.gz appears to be for kvm-88.
> >
> > So which file should I start from?
> >
> 
> qemu-kvm is the userspace component, kvm-kmod is the kernel component as 
> an external module.  'kvm' is a package containing both.
That helps a lot; maybe that info could go up on the bugs or faq page.
I couldn't find it (that is, there is info that there are kernel and
user space components, but not how these relate to the tar files).
> 
> In general the best place to start is with the distro provided packages.
> 
My distro (Debian) is only at 85, even in unstable.  Since it wasn't
current, and also the dependencies will have wide effects on my system
(which I'm trying to keep at the stable release Lenny), I figured
getting the current source and building it myself would be the best
move.  For other reasons I'm already running a 2.6.30 kernel from
Debian, which includes kernel side kvm.  So I figure I only need to mess
with user space.

Ross


Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori


Avi Kivity wrote:

On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

Discussed it with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up 
getting an EAGAIN.  We have no way to queue signals beyond that point 
and we have no sane way to deal with partial writes.


pipe buffers are multiples of the signalfd size.  As long as we
read and write signalfd-sized blocks, we won't get partial writes.
It's true that depending on an implementation detail is bad practice,
but this is emulation code, and if it helps simplify everything else,
I think it's fine to use it.


That's a pretty hairy detail to rely upon..

Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the 
fd's buffer.  This requires that we use an fd-per-signal which means 
we end up with a different model than signalfd.
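The upstream QEMU scheme described above, one non-blocking pipe per signal with a single byte written from the handler and EAGAIN ignored because a non-empty pipe already means "fired", might look like this. A sketch, assuming POSIX signals and pipes. install_wake_fd(), wake_handler(), consume_wakeups() and the WAKE_MAX_SIG bound are hypothetical, not QEMU's actual functions. Compared with signalfd, the reader learns only that a signal arrived, not how many times or with which siginfo.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define WAKE_MAX_SIG 65 /* hypothetical bound covering Linux signal numbers */

/* One pipe per signal; a byte in it only means "fired at least once". */
static int wake_pipe[WAKE_MAX_SIG][2];

static void wake_handler(int signo)
{
    int saved_errno = errno;
    char b = 0;
    /* EAGAIN means the pipe is full of wakeups already, which is enough. */
    if (write(wake_pipe[signo][1], &b, 1) < 0) {
    }
    errno = saved_errno;
}

/* Returns the read end to poll for signo, or -1 on error. */
int install_wake_fd(int signo)
{
    if (signo <= 0 || signo >= WAKE_MAX_SIG)
        return -1;
    if (pipe(wake_pipe[signo]) < 0)
        return -1;
    for (int i = 0; i < 2; i++)
        fcntl(wake_pipe[signo][i], F_SETFL,
              fcntl(wake_pipe[signo][i], F_GETFL) | O_NONBLOCK);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;
    return wake_pipe[signo][0];
}

/* Drain pending bytes; returns 1 if the signal fired since the last call. */
int consume_wakeups(int rfd)
{
    char buf[64];
    int fired = 0;
    while (read(rfd, buf, sizeof buf) > 0)
        fired = 1;
    return fired;
}
```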


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals, which means fewer EINTRs.  I
don't think that's a huge advantage and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not 
workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?


Do we have any indication that this difference is actually observable?  
This seems like very premature optimization.



The same is generally true for eventfd.


eventfd emulation will also never get partial writes.


But you cannot emulate eventfd faithfully because eventfd is supposed to 
be additive.  If you write 1 to eventfd 50 times, you should be able to read
a set of integers that add up to 50.  If you hit EAGAIN in a signal 
handler, you have no way of handling that.
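The additive behavior is easy to see with a real eventfd on Linux: fifty writes of 1 are folded into one counter, and a single read returns 50 and resets it. An emulation that drops a write on EAGAIN would under-count, which is the objection here; Avi's answer elsewhere in the thread is that qemu never relies on the count. A sketch, assuming Linux 2.6.27 or later and glibc's <sys/eventfd.h>. eventfd_sum_demo() is a hypothetical helper.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Write 1 to an eventfd n times, then read once. Because eventfd is a
 * counter, the read returns the sum and resets it. Returns 0 on error
 * or when nothing was written (non-blocking read fails with EAGAIN).
 */
uint64_t eventfd_sum_demo(unsigned n)
{
    int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (fd < 0)
        return 0;
    uint64_t one = 1, total = 0;
    for (unsigned i = 0; i < n; i++)
        if (write(fd, &one, sizeof one) != (ssize_t)sizeof one)
            break;
    if (read(fd, &total, sizeof total) != (ssize_t)sizeof total)
        total = 0;
    close(fd);
    return total;
}
```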


As I said earlier, the better thing to do is have a higher level 
interface that has a subset of the behavior of eventfd/signalfd that we 
can emulate correctly.


Regards,

Anthony Liguori

Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-10-01 Thread Avi Kivity


On 09/30/2009 03:59 PM, Mark McLoughlin wrote:

I think we should keep the vlan stuff, just de-emphasise it.
   


Maybe we should do what X.org does, break it silently and remove it some 
time later when no one complains.



Re: kvm or qemu-kvm?

2009-10-01 Thread Avi Kivity


On 09/30/2009 04:21 AM, Ross Boylan wrote:

http://www.linux-kvm.org/page/HOWTO1 says to build kvm I should get the
latest kvm-release.tar.gz.

http://www.linux-kvm.org/page/Downloads says "If you want to use the
latest version of KVM kernel modules and supporting userspace, you can
download the latest version from
http://sourceforge.net/project/showfiles.php?group_id=180599.";
That page shows the latest version is qemu-kvm-0.11.0.tar.gz.

The most recent kvm-release.tar.gz appears to be for kvm-88.

So which file should I start from?
   


qemu-kvm is the userspace component, kvm-kmod is the kernel component as 
an external module.  'kvm' is a package containing both.


In general the best place to start is with the distro provided packages.


Re: kvm or qemu-kvm?

2009-10-01 Thread Ross Boylan

On Wed, 2009-09-30 at 20:51 -0500, Charles Duffy wrote:
> Ross Boylan wrote:
> > http://www.linux-kvm.org/page/HOWTO1 says to build kvm I should get the
> > latest kvm-release.tar.gz.
> > 
> > http://www.linux-kvm.org/page/Downloads says "If you want to use the
> > latest version of KVM kernel modules and supporting userspace, you can
> > download the latest version from
> > http://sourceforge.net/project/showfiles.php?group_id=180599.";
> > That page shows the latest version is qemu-kvm-0.11.0.tar.gz.
> > 
> > The most recent kvm-release.tar.gz appears to be for kvm-88.
> > 
> > So which file should I start from?
> 
> If you don't know what you want, you want qemu-kvm, which is based off a 
> stable release of qemu.
I'm trying to follow the advice on the bug reporting page to "Please use
the latest release version of kvm at the time you submit the bug."
Ross


Re: [PATCH v2] core, x86: Add user return notifiers

2009-10-01 Thread Avi Kivity


On 10/01/2009 05:25 PM, H. Peter Anvin wrote:

On 10/01/2009 08:21 AM, Avi Kivity wrote:
   

Re-ping?

If accepted, please merge just the core patch and I will carry all of
them in parallel.  Once tip is merged I'll drop my copy of the first patch.

 

I was just talking to Ingo about this... would you consider this .32 or
.33 material at this time?  It came in awfully late for .32...
   


While I'd prefer it to be in .32, as there's much churn in both trees, I 
guess we should follow the process and schedule it for .33.



Re: [PATCH v2] core, x86: Add user return notifiers

2009-10-01 Thread H. Peter Anvin

On 10/01/2009 08:21 AM, Avi Kivity wrote:
> 
> Re-ping?
> 
> If accepted, please merge just the core patch and I will carry all of
> them in parallel.  Once tip is merged I'll drop my copy of the first patch.
> 

I was just talking to Ingo about this... would you consider this .32 or
.33 material at this time?  It came in awfully late for .32...

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH v2] core, x86: Add user return notifiers

2009-10-01 Thread Avi Kivity


On 09/22/2009 06:19 PM, H. Peter Anvin wrote:

Ingo Molnar wrote:

* Avi Kivity  wrote:


On 09/19/2009 09:40 AM, Avi Kivity wrote:

Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace.  The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.
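To illustrate the mechanism described above, here is a minimal userspace simulation in C. It is not the patch's code: the names `user_return_notifier` and `on_user_return` only mirror the shape of the proposed API, and a single global list plus a boolean stand in for the per-cpu list and the thread_info flag. The point it shows is that the return path tests one flag, so there is no cost when nothing is registered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the proposed notifier structure. */
struct user_return_notifier {
    void (*on_user_return)(struct user_return_notifier *urn);
    struct user_return_notifier *next;
};

/* Simulation of a single cpu: one list and one flag replace the
 * per-cpu list and the thread_info flag used by the real patch. */
static struct user_return_notifier *urn_list;
static bool return_notify_flag;

static void urn_register(struct user_return_notifier *urn)
{
    urn->next = urn_list;
    urn_list = urn;
    return_notify_flag = true;          /* arm the slow path */
}

static void urn_unregister(struct user_return_notifier *urn)
{
    struct user_return_notifier **pp = &urn_list;
    while (*pp && *pp != urn)
        pp = &(*pp)->next;
    if (*pp)
        *pp = urn->next;
    if (!urn_list)
        return_notify_flag = false;     /* back to the fast path */
}

/* Called on every simulated return to userspace. */
static void return_to_user(void)
{
    if (!return_notify_flag)            /* fast path: a single flag test */
        return;
    struct user_return_notifier *urn = urn_list, *next;
    while (urn) {
        next = urn->next;               /* callback may unregister itself */
        urn->on_user_return(urn);
        urn = next;
    }
}

/* Example callback: counts how often it ran. */
static int calls;
static void count_call(struct user_return_notifier *urn)
{
    (void)urn;
    calls++;
}
```

Registering `count_call` and calling `return_to_user()` twice runs the callback twice; after unregistering, the flag is cleared and the return path does nothing.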

Ingo/Peter?


Would be nice to convert some existing open-coded 
return-to-user-space logic to this facility. One such candidate would 
be lockdep_sys_exit?


Ingo


Sorry, limited bandwidth due to LinuxCon, but I like the concept, and 
the previous (partial) patch was really clean.  I agree with Ingo that 
arch support so we can use this as a general facility would be nice, 
but I don't consider that as a prerequisite for merging.


Re-ping?

If accepted, please merge just the core patch and I will carry all of 
them in parallel.  Once tip is merged I'll drop my copy of the first patch.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

Re: Build problem found during daily testing (09/30/09)

2009-10-01 Thread Lucas Meneghel Rodrigues

On Wed, Sep 30, 2009 at 2:36 PM, Marcelo Tosatti  wrote:
> On Wed, Sep 30, 2009 at 08:51:43AM -0300, Lucas Meneghel Rodrigues wrote:
>> Today's git test failed due to a build problem:
>
>> /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: error: 
>> unknown field ‘change_pte’ specified in initializer
>> /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: warning: 
>> initialization from incompatible pointer type
>> make[3]: *** [/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.o] 
>> Error 1
>>
>> Relevant commit hashes:
>>
>> 09/30 04:52:40 INFO | kvm_utils:0182| Commit hash for 
>> git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git is 
>> d80e68823cada7b6d850330da1edfdf8bff9e2e6 (v2.6.31-rc3-11538-gd80e688)
>> 09/30 04:53:21 INFO | kvm_utils:0182| Commit hash for 
>> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
>> 692d9aca97b865b0f7903565274a52606910f129 (kvm-88-1366-g692d9ac)
>> 09/30 04:53:23 INFO | kvm_utils:0182| Commit hash for 
>> git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git is 
>> b86de9524511f75bf9115047b7b57e1da86bfb37 (kvm-88-22-gb86de95)
>>
>> If you need more info please let me know,
>>
>> Lucas
>
> Lucas,
>
> I committed a half-assed fix to kvm-kmod.
>

And indeed it worked, today's git test is running. Thank you very much Marcelo!
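For context on this class of breakage: "unknown field 'change_pte' specified in initializer" means kvm_main.c was compiled against headers whose notifier ops structure has no `change_pte` member (presumably older kernels that kvm-kmod also has to support). The actual kvm-kmod fix is not shown in this thread. A common compat pattern is to guard both the member initializer and the static callback behind a feature macro, so neither the error nor a "defined but not used" warning appears. The sketch below uses a hypothetical `HAVE_CHANGE_PTE` macro and a simplified ops struct, not the real `struct mmu_notifier_ops`.

```c
/* Simplified stand-in for a kernel ops table (hypothetical). */
struct notifier_ops {
    void (*invalidate_page)(unsigned long address);
#ifdef HAVE_CHANGE_PTE
    void (*change_pte)(unsigned long address);
#endif
};

static unsigned long last_invalidated;

static void my_invalidate_page(unsigned long address)
{
    last_invalidated = address;
}

#ifdef HAVE_CHANGE_PTE
/* Guarded as well: an unreferenced static function would trigger
 * a "defined but not used" warning when the member is absent. */
static void my_change_pte(unsigned long address)
{
    last_invalidated = address;
}
#endif

static const struct notifier_ops my_ops = {
    .invalidate_page = my_invalidate_page,
#ifdef HAVE_CHANGE_PTE
    .change_pte      = my_change_pte,
#endif
};
```

The same file then builds with or without `-DHAVE_CHANGE_PTE`.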

Cheers,

Lucas

Migration assigned device interrupts

2009-10-01 Thread Avi Kivity

It occurs to me that we're handling assigned device interrupts 
inefficiently: an interrupt is received on cpu A, injected, and wakes up 
(or forces out of guest mode) a vcpu on cpu B.  This involves an IPI and 
bothers two cpus instead of one.


But we often know which vcpu will be woken up (DM_FIXED interrupts) and 
which cpu it runs on (vcpu->cpu, preempt notifiers), so we can migrate 
the host interrupt to follow the vcpu it wakes.  This should improve 
latency and cpu utilization.
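A userspace way to picture "migrate the host interrupt to follow the vcpu" is the per-IRQ affinity file `/proc/irq/<N>/smp_affinity`, which takes a hexadecimal CPU bitmask. This is an assumption about mechanism, not what the proposal implements (an in-kernel version would presumably retarget the IRQ from the vcpu load / preempt-notifier path). The sketch only formats a single-CPU mask and writes it; it handles CPUs 0-31 (one 32-bit mask word), and writing the file requires root.

```c
#include <stdio.h>

/* Format a single-CPU affinity mask as the hex string accepted by
 * /proc/irq/<irq>/smp_affinity. Supports CPUs 0-31 only.
 * Returns 0 on success, -1 if cpu is out of range or buf is too small. */
static int format_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Point host IRQ 'irq' at 'cpu' (e.g. the cpu the target vcpu now
 * runs on). Requires root. Returns 0 on success, -1 on error. */
static int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];
    if (format_affinity_mask(cpu, mask, sizeof(mask)) != 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `set_irq_affinity(24, 5)` would write `20` to `/proc/irq/24/smp_affinity`.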


I'm not sure how to do this generically (with irqfd), so vhost-net can 
benefit from it as well - migrate the vhost threads and the interrupts 
that feed them too.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Avi Kivity


On 10/01/2009 04:23 PM, Anthony Liguori wrote:

Juan Quintela wrote:

Discussed this with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up getting 
an EAGAIN.  We have no way to queue signals beyond that point and we 
have no sane way to deal with partial writes.


pipe buffers are multiples of the signalfd size.  As long as we read 
and write signalfd-sized blocks, we won't get partial writes.  It's true 
that depending on an implementation detail is bad practice, but this is 
emulation code, and if it helps simplify everything else, I think it's 
fine to use it.


hmm, pipe(7) says writes of at most PIPE_BUF bytes are atomic:

   O_NONBLOCK enabled, n <= PIPE_BUF
          If there is room to write n bytes to the pipe, then write(2)
          succeeds immediately, writing all n bytes; otherwise write(2)
          fails, with errno set to EAGAIN.

so it seems this practice has been blessed by POSIX.

Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the fd's 
buffer.  This requires that we use an fd-per-signal which means we end 
up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals, which means fewer EINTRs.  I 
don't think that's a huge advantage, and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not workable.


signalfd is several microseconds faster than signals + pipes.  Do we 
have so much performance we can throw some of it away?



The same is generally true for eventfd.


eventfd emulation will also never get partial writes.
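Why eventfd emulation cannot hit partial writes: eventfd transfers exactly one 8-byte counter value per read() or write(), and an emulation over a pipe can write the same 8 bytes, far below PIPE_BUF. A small Linux sketch of the native semantics (helper names are illustrative): writes add to a counter, and a read returns the accumulated sum and resets it to zero.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add 'n' to the eventfd counter; returns 0 or -1. */
static int notify_add(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Fetch and reset the counter; returns 0 if nothing was pending
 * (read fails with EAGAIN on a non-blocking eventfd). */
static uint64_t notify_collect(int efd)
{
    uint64_t v = 0;
    if (read(efd, &v, sizeof(v)) != (ssize_t)sizeof(v))
        return 0;
    return v;
}
```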

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori


Juan Quintela wrote:

Discussed this with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly),


It's not an issue of being difficult.

To emulate signalfd, we need to create a thread that writes to a pipe 
from a signal handler.  The problem is that a write() can return a 
partial result and following the partial result, we can end up getting 
an EAGAIN.  We have no way to queue signals beyond that point and we 
have no sane way to deal with partial writes.


Instead, how we do this in upstream QEMU is that we install a signal 
handler and write one byte to the fd.  If we get EAGAIN, that's fine 
because all we care about is that at least one byte exists in the fd's 
buffer.  This requires that we use an fd-per-signal which means we end 
up with a different model than signalfd.


The reason to use signalfd over what we do in upstream QEMU is that 
signalfd can allow us to mask the signals, which means fewer EINTRs.  I 
don't think that's a huge advantage, and the inability to do backwards 
compatibility in a sane way means that emulated signalfd is not workable.


We could possibly introduce a higher level interface that only required 
one fd per signal and that had a function that drained the signals from 
the fd without returning any special information.


The same is generally true for eventfd.

Regards,

Anthony Liguori

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Anthony Liguori


Christoph Hellwig wrote:

On Thu, Oct 01, 2009 at 01:58:10PM +0200, Juan Quintela wrote:
  

Discussed this with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly), and eventfd ...  The current
eventfd emulation is worse than the pipe code that it replaces.

His suggestion here was to create a new abstraction with an API like:

push_notify()

pop_notify()

and then you can implement it with eventfd() pipes/whatever.

What was missing for you in compatfd (qemu_eventfd/qemu_signalfd)?
Would push_notify()/pop_notify() work for you?



I don't desperately want to use it myself anyway.  I just want to get
rid of the highly annoying spurious differences in the AIO code due
to use of compatfd.  I would be perfectly fine with just killing this
use of eventfd in qemu-kvm.
  


That's what I'd suggest.  The use of eventfd in qemu-kvm is wrong 
because the compat function is not implemented correctly.


Regards,

Anthony Liguori

Re: [PATCH] Fix warning in sync

2009-10-01 Thread Marcelo Tosatti

On Wed, Sep 30, 2009 at 05:05:35PM -1000, Zachary Amsden wrote:
> Patch is self-explanatory.
> commit 071a800cd07c2b9d13c7909aa99016d89a814ae6
> Author: Zachary Amsden 
> Date:   Wed Sep 30 17:03:16 2009 -1000
> 
> Remove warning due to kvm_mmu_notifier_change_pte being static
> 
> Signed-off-by: Zachary Amsden 

Applied, thanks.

Re: [PATCH] KVM: Fix task switch back link handling (take 2)

2009-10-01 Thread Marcelo Tosatti

On Wed, Sep 30, 2009 at 05:39:07PM +0200, Juan Quintela wrote:
> Now, also remove pre_task_link setting in save_state_to_tss16.
> 
>   commit b237ac37a149e8b56436fabf093532483bff13b0
>   Author: Gleb Natapov 
>   Date:   Mon Mar 30 16:03:24 2009 +0300
> 
> KVM: Fix task switch back link handling.
> 
> CC: Gleb Natapov 
> Signed-off-by: Juan Quintela 

Applied, thanks.

Re: INFO: task journal:337 blocked for more than 120 seconds

2009-10-01 Thread Marcelo Tosatti

On Wed, Sep 30, 2009 at 02:11:52PM -0700, Shirley Ma wrote:
> Hello all,
> 
> Has anybody seen this problem before? I keep hitting this issue with a
> 2.6.31 guest kernel, even with a simple network test.
> 
> INFO: task kjournal:337 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 
> kjournald D 0041  0   337 2 0x
> 
> My test is completely blocked.

I've hit this in the past with ext3; mounting with data=writeback made it
disappear.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Christoph Hellwig

On Thu, Oct 01, 2009 at 01:58:10PM +0200, Juan Quintela wrote:
> Discussed this with Anthony.  signalfd is complicated for qemu
> upstream (too difficult to use properly), and eventfd ...  The current
> eventfd emulation is worse than the pipe code that it replaces.
> 
> His suggestion here was to create a new abstraction with an API like:
> 
> push_notify()
> 
> pop_notify()
> 
> and then you can implement it with eventfd() pipes/whatever.
> 
> What was missing for you in compatfd (qemu_eventfd/qemu_signalfd)?
> Would push_notify()/pop_notify() work for you?

I don't desperately want to use it myself anyway.  I just want to get
rid of the highly annoying spurious differences in the AIO code due
to use of compatfd.  I would be perfectly fine with just killing this
use of eventfd in qemu-kvm.

Re: [PATCH 05/24] compatfd is included before, and it is compiled unconditionally

2009-10-01 Thread Juan Quintela

Christoph Hellwig  wrote:
> On Tue, Sep 22, 2009 at 03:25:13PM +0200, Juan Quintela wrote:
>> Christoph Hellwig  wrote:
>> > Btw, what's the state of getting compatfd upstream?  It's a pretty
>> > annoying difference between qemu upstream and qemu-kvm.
>> 
>> I haven't tried.  I can try to send a patch.  Do you have any use case
>> that will help the cause?
>
> Well, the eventfd compat is used in the thread pool AIO code.  I don't
> know what difference it makes, but I really hate this code being
> different in both trees.  I want to see compatfd used either in both or
> none.

Discussed this with Anthony.  signalfd is complicated for qemu
upstream (too difficult to use properly), and eventfd ...  The current
eventfd emulation is worse than the pipe code that it replaces.

His suggestion here was to create a new abstraction with an API like:

push_notify()

pop_notify()

and then you can implement it with eventfd() pipes/whatever.
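The proposed abstraction might look like the sketch below. Only the names push_notify()/pop_notify() come from this thread; the `struct notifier_fd` type, `notifier_init()`, and the semantics (pop drains everything and reports whether anything was pending) are assumptions. It uses eventfd when available and falls back to a non-blocking pipe, so callers never see which mechanism is underneath.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/eventfd.h>
#endif

/* Hypothetical notifier: one fd to poll, one fd to write. With eventfd
 * both are the same descriptor; with a pipe they are the two ends. */
struct notifier_fd {
    int rfd, wfd;
};

static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0) ? -1 : 0;
}

static int notifier_init(struct notifier_fd *n)
{
#ifdef __linux__
    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (efd >= 0) {
        n->rfd = n->wfd = efd;
        return 0;
    }
#endif
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (set_nonblock(fds[0]) != 0 || set_nonblock(fds[1]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Signal the consumer. An 8-byte write suits both eventfd and the pipe;
 * if the write fails with EAGAIN, a notification is already pending. */
static void push_notify(struct notifier_fd *n)
{
    uint64_t one = 1;
    ssize_t r = write(n->wfd, &one, sizeof(one));
    (void)r;
}

/* Consume all pending notifications; returns 1 if any were pending. */
static int pop_notify(struct notifier_fd *n)
{
    uint64_t buf[8];
    int pending = 0;
    while (read(n->rfd, buf, sizeof(buf)) > 0)
        pending = 1;
    return pending;
}
```

Because pop_notify() only reports "something happened", neither backend has to preserve per-event data, which sidesteps the partial-write question entirely.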

What was missing for you in compatfd (qemu_eventfd/qemu_signalfd)?
Would push_notify()/pop_notify() work for you?

Later, Juan.

RE: Q: Stopped VM still using host cpu CPU ?

2009-10-01 Thread Daniel Schwager

One more,

> So, how can I prevent the paused/stopped VMs from using CPU on
> the host system? Is there a way to handle this?

If I send a STOP/CONT signal (kill -STOP  or kill -CONT )
to the KVM process, it looks like kvm does not (of course ;-) use
any host CPU.

- Are there any side effects of this approach?
  (e.g. with networking, ...)

- And why is there a way to send STOP/CONT to the KVM process via the
  socket? Why not use the signal-sending approach?

best regards
Danny

Q: Stopped VM still using host cpu CPU ?

2009-10-01 Thread Daniel Schwager

Hi,

we are running some stopped VMs on our system (stopped by sending
"stop" via the kvm monitor socket). My intention was to pause (stop) the
VMs and unpause (cont) them on demand (very fast, without delay, within 2
seconds).

After 'stop'ping, the VMs still cause CPU load, as "top" shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25983 root      20   0  495m 407m 1876 R  8.9  2.5 228:09.15 qemu-system-x86
25523 root      20   0  495m 2040 1868 S  7.9  0.0   2700:16 qemu-system-x86
19738 root      20   0 1139m  40m 1876 R  7.3  0.3   2501:57 qemu-system-x86
32521 root      20   0  495m 407m 1876 S  6.9  2.5 136:23.23 qemu-system-x86
21773 root      20   0  495m 407m 1876 S  5.6  2.5 575:01.44 qemu-system-x86
 6720 root      20   0 2168m 2.0g 1876 R  4.6 12.9 524:55.70 qemu-system-x86
 6819 root      20   0 2168m 2.0g 1876 S  4.6 12.9 489:24.24 qemu-system-x86
12752 root      20   0  495m 407m 1876 S  4.6  2.5   0:44.57 qemu-system-x86
22803 root      20   0 1139m 211m 1868 S  4.0  1.3   1217:21 qemu-system-x86
27083 root      20   0 1139m 1.0g 1868 S  4.0  6.5 121:58.05 qemu-system-x86
21970 root      20   0  495m 407m 1876 S  3.6  2.5 434:48.36 qemu-system-x86
 3649 root      20   0  496m 394m 1908 S  3.3  2.5  39:41.61 qemu-system-x86
19919 root      20   0  559m 402m 1880 S  3.3  2.5 469:57.14 qemu-system-x86
24461 root      20   0  495m 2048 1876 S  3.3  0.0   1081:41 qemu-system-x86
24619 root      20   0  688m 536m 1880 S  3.0  3.3 293:15.38 qemu-system-x86
24776 root      20   0  495m 407m 1856 S  3.0  2.5  92:08.95 qemu-system-x86
...

So, how can I prevent the paused/stopped VMs from using CPU on
the host system? Is there a way to handle this?

Sure, stopping and migrating the VM to a file would solve my problem, but
restarting it (migrating back from the file into RAM) takes a long time
(>30 seconds).


best regards 

Danny

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Michael S. Tsirkin

On Thu, Oct 01, 2009 at 10:34:17AM +0200, Avi Kivity wrote:
>> Second, I do not use ioeventfd anymore because it has too many problems
>> with the surrounding technology.  However, that is a topic for a
>> different thread.
>>
>
> Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.

I second that. AFAIK ioeventfd/irqfd got exposed to userspace in 2.6.32-rc1;
if there are issues, we had better nail them before 2.6.32 is out.
And yes, please start a different thread.

-- 
MST

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-10-01 Thread Avi Kivity


On 09/30/2009 10:04 PM, Gregory Haskins wrote:



A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work
over vbus.
 

Binary compatibility with existing virtio drivers, while nice to have,
is not a specific requirement nor goal.  We will simply load an updated
KMP/MSI into those guests and they will work again.  As previously
discussed, this is how more or less any system works today.  It's like
we are removing an old adapter card and adding a new one to "uprev the
silicon".
   


Virtualization is about not doing that.  Sometimes it's necessary (when 
you have made unfixable design mistakes), but not just to replace a bus 
with no advantages to the guest that has to be changed (other 
hypervisors or hypervisorless deployment scenarios aren't advantages to that guest).



  Further, non-shmem virtio can't work over vbus.
 

Actually I misspoke earlier when I said virtio works over non-shmem.
Thinking about it some more, both virtio and vbus fundamentally require
shared-memory, since sharing their metadata concurrently on both sides
is their raison d'être.

The difference is that virtio utilizes a pre-translation/mapping (via
->add_buf) from the guest side.  OTOH, vbus uses a post translation
scheme (via memctx) from the host-side.  If anything, vbus is actually
more flexible because it doesn't assume the entire guest address space
is directly mappable.

In summary, your statement is incorrect (though it is my fault for
putting that idea in your head).
   


Well, Xen requires pre-translation (since the guest has to give the host 
(which is just another guest) permissions to access the data).  So 
neither is a superset of the other, they're just different.


It doesn't really matter since Xen is unlikely to adopt virtio.


An interesting thing here is that you don't even need a fancy
multi-homed setup to see the effects of my exit-ratio reduction work:
even single-port configurations suffer from the phenomenon, since many
devices have multiple signal-flows (e.g. network adapters tend to have
at least 3 flows: rx-ready, tx-complete, and control-events (link-state,
etc.)).  What's worse is that the flows are often indirectly related (for
instance, many host adapters will free tx skbs during rx operations, so
you tend to get bursts of tx-completes at the same time as rx-ready).  If
the flows map 1:1 with IDT, they will suffer the same problem.
   


You can simply use the same vector for both rx and tx and poll both at 
every interrupt.
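Avi's alternative amounts to one interrupt handler that services every flow on each invocation. A minimal, hypothetical sketch (the ring and nic types are illustrative, not virtio or vbus structures): one interrupt drains both the rx ring and the tx-completion ring, so a burst of tx-completes arriving together with rx-ready costs one injection instead of two.

```c
#include <stddef.h>

/* Illustrative ring: only a count of pending entries. */
struct ring {
    size_t pending;
};

struct nic {
    struct ring rx, tx_done;
    size_t rx_processed, tx_reclaimed;
};

/* Process everything pending on one ring; returns entries handled. */
static size_t ring_drain(struct ring *r)
{
    size_t n = r->pending;
    r->pending = 0;
    return n;
}

/* Shared vector: poll both rings on every interrupt. Returns the total
 * work done, so a caller could count interrupts that found nothing. */
static size_t nic_shared_irq(struct nic *nic)
{
    size_t rx = ring_drain(&nic->rx);
    size_t tx = ring_drain(&nic->tx_done);
    nic->rx_processed += rx;
    nic->tx_reclaimed += tx;
    return rx + tx;
}
```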



In any case, here is an example run of a simple single-homed guest over
standard GigE.  What's interesting here is the .qnotify to .notify
ratio, as this is the interrupt-to-signal ratio.  In this case, it's
151918/170047, which comes out to about 11% savings in interrupt injections:

vbus-guest:/home/ghaskins # netperf -H dev
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
dev.laurelwood.net (192.168.1.10) port 0 AF_INET
Recv   SendSend
Socket Socket  Message  Elapsed
Size   SizeSize Time Throughput
bytes  bytes   bytessecs.10^6bits/sec

1048576  16384  1638410.01 940.77
vbus-guest:/home/ghaskins # cat /sys/kernel/debug/pci-to-vbus-bridge
   .events: 170048
   .qnotify   : 151918
   .qinject   : 0
   .notify: 170047
   .inject: 18238
   .bridgecalls   : 18
   .buscalls  : 12
vbus-guest:/home/ghaskins # cat /proc/interrupts
 CPU0
0: 87   IO-APIC-edge  timer
1:  6   IO-APIC-edge  i8042
4:733   IO-APIC-edge  serial
6:  2   IO-APIC-edge  floppy
7:  0   IO-APIC-edge  parport0
8:  0   IO-APIC-edge  rtc0
9:  0   IO-APIC-fasteoi   acpi
   10:  0   IO-APIC-fasteoi   virtio1
   12: 90   IO-APIC-edge  i8042
   14:   3041   IO-APIC-edge  ata_piix
   15:   1008   IO-APIC-edge  ata_piix
   24: 151933   PCI-MSI-edge  vbus
   25:  0   PCI-MSI-edge  virtio0-config
   26:190   PCI-MSI-edge  virtio0-input
   27: 28   PCI-MSI-edge  virtio0-output
  NMI:  0   Non-maskable interrupts
  LOC:   9854   Local timer interrupts
  SPU:  0   Spurious interrupts
  CNT:  0   Performance counter interrupts
  PND:  0   Performance pending work
  RES:  0   Rescheduling interrupts
  CAL:  0   Function call interrupts
  TLB:  0   TLB shootdowns
  TRM:  0   Thermal event interrupts
  THR:  0   Threshold APIC interrupts
  MCE:  0   Machine check exceptions
  MCP:  1   Machine check polls
  ERR:  0
  MIS:  0
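The savings figure follows from two of the counters above: .notify = 170047 signals were raised and .qnotify = 151918 interrupts were injected, so the fraction of injections avoided is 1 - 151918/170047, roughly 10.7%, which the message rounds to 11%. A trivial C check of that arithmetic (the function name is illustrative):

```c
/* Fraction of signals that did not need their own interrupt injection. */
static double injection_savings(unsigned long signals, unsigned long interrupts)
{
    return signals ? 1.0 - (double)interrupts / (double)signals : 0.0;
}
```

`injection_savings(170047, 151918)` evaluates to about 0.1066.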

It's important to note here that we are actually looking at the interrupt
rate, not the exit rate (which is usually a multiple of the interrupt
rate, since you have to factor in as many as three exits per interrupt
(IPI,

Re: [Qemu-devel] Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-10-01 Thread Mark McLoughlin

On Wed, 2009-09-30 at 21:15 +0200, Gerd Hoffmann wrote:
> On 09/30/09 15:59, Mark McLoughlin wrote:
> > I'm planning on adding -hostnet and -nic arguments, which would not use
> > vlans by default but rather connect the nic directly to the host side.
> 
> No new -nic argument please.  We should just finalize the qdev-ifycation 
> of the nic drivers, then you'll do either
> 
>-device e1000,vlan=
> 
> or
> 
>-device e1000,hostnet=
> 
> and be done with it.

Yeah, that makes sense.

Cheers,
Mark.
