Re: [PATCH] KVM: x86: Remove user space triggerable MCE error message

2011-01-16 Thread Huang Ying
On Mon, 2011-01-17 at 15:11 +0800, Jan Kiszka wrote:
> On 2011-01-17 01:54, Huang Ying wrote:
> > On Sat, 2011-01-15 at 17:00 +0800, Jan Kiszka wrote:
> >> From: Jan Kiszka 
> >>
> >> This case is a pure user space error we do not need to record. Moreover,
> >> it can be misused to flood the kernel log. Remove it.
> > 
> > I don't think this is a pure user space error.  This happens on real
> > hardware too, if the Machine Check exception is raised during early boot
> > stage or the second MC exception is raised before the first MC exception
> > is processed/cleared.
> > 
> > So I use printk here to help debugging these issues.
> > 
> > To avoid flooding the kernel log, we can use ratelimit.
> 
> With user space I meant qemu, and maybe "error" was the wrong term. This
> code path is only triggered if qemu decides to.

It is decided not only by qemu but also by the guest OS.  If the guest OS
does not clear the MSR, or does not set the X86_CR4_MCE bit in cr4, the
triple fault will be triggered.
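
To make the condition concrete, here is a minimal user-space sketch of the check under discussion. It is a model, not the KVM code: the bit positions are the architectural ones (MCi_STATUS.UC is bit 61, MCG_STATUS.MCIP is bit 2, CR4.MCE is bit 6), while the enum, classify_mce_injection() and mce_model_examples() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions; the names mirror the kernel macros. */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCG_STATUS_MCIP (1ULL << 2)   /* machine check in progress */
#define X86_CR4_MCE     (1ULL << 6)   /* machine-check exception enabled */

/* Hypothetical result type, for illustration only. */
enum mce_outcome { MCE_LOG_ONLY, MCE_DELIVER, MCE_TRIPLE_FAULT };

/*
 * Model of the check in kvm_vcpu_ioctl_x86_set_mce(): an uncorrected
 * error injected while MCIP is still set, or while CR4.MCE is clear,
 * ends in a triple fault instead of a delivered #MC.
 */
enum mce_outcome classify_mce_injection(uint64_t mci_status,
                                        uint64_t mcg_status,
                                        uint64_t cr4)
{
    if (!(mci_status & MCI_STATUS_UC))
        return MCE_LOG_ONLY;      /* in this model: recorded, no exception */
    if ((mcg_status & MCG_STATUS_MCIP) || !(cr4 & X86_CR4_MCE))
        return MCE_TRIPLE_FAULT;
    return MCE_DELIVER;
}

/* Usage example covering the cases mentioned in this thread. */
void mce_model_examples(void)
{
    /* Guest has not cleared MCIP after a previous machine check. */
    assert(classify_mce_injection(MCI_STATUS_UC, MCG_STATUS_MCIP,
                                  X86_CR4_MCE) == MCE_TRIPLE_FAULT);
    /* Guest has not enabled CR4.MCE yet (for example, early boot). */
    assert(classify_mce_injection(MCI_STATUS_UC, 0, 0) == MCE_TRIPLE_FAULT);
    /* Normal case: the exception can be delivered. */
    assert(classify_mce_injection(MCI_STATUS_UC, 0,
                                  X86_CR4_MCE) == MCE_DELIVER);
}
```

Both triple-fault cases depend on guest state (MCIP and CR4.MCE), which is the point made above.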

> And there you may also
> print this event (and you already do).

Sorry, which print do you mean?  I cannot find a similar print in user
space.

> Another reason to not rely on catching this case here: KVM_X86_SET_MCE
> is obsolete on current kernels. Qemu will use a combination of
> KVM_SET_MSRS and KVM_SET_VCPU_EVENTS in the future, only falling back to
> this interface on pre-vcpu-events kernels. Then you need to debug this
> in user space anyway as the triple fault will no longer make it to the
> kernel.

OK.  Then I think it would help debugging if we printed something like
this in the user space implementation.

Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Remove user space triggerable MCE error message

2011-01-16 Thread Jan Kiszka
On 2011-01-17 01:54, Huang Ying wrote:
> On Sat, 2011-01-15 at 17:00 +0800, Jan Kiszka wrote:
>> From: Jan Kiszka 
>>
>> This case is a pure user space error we do not need to record. Moreover,
>> it can be misused to flood the kernel log. Remove it.
> 
> I don't think this is a pure user space error.  This happens on real
> hardware too, if the Machine Check exception is raised during early boot
> stage or the second MC exception is raised before the first MC exception
> is processed/cleared.
> 
> So I use printk here to help debugging these issues.
> 
> To avoid flooding the kernel log, we can use ratelimit.

With user space I meant qemu, and maybe "error" was the wrong term. This
code path is only triggered if qemu decides to. And there you may also
print this event (and you already do).

Another reason to not rely on catching this case here: KVM_X86_SET_MCE
is obsolete on current kernels. Qemu will use a combination of
KVM_SET_MSRS and KVM_SET_VCPU_EVENTS in the future, only falling back to
this interface on pre-vcpu-events kernels. Then you need to debug this
in user space anyway as the triple fault will no longer make it to the
kernel.

Jan

> 
> Best Regards,
> Huang Ying
> 
>> Signed-off-by: Jan Kiszka 
>> ---
>>  arch/x86/kvm/x86.c |3 ---
>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 9dda70d..7f7e4a5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2575,9 +2575,6 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>>          if (mce->status & MCI_STATUS_UC) {
>>                  if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
>>                      !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) {
>> -                        printk(KERN_DEBUG "kvm: set_mce: "
>> -                               "injects mce exception while "
>> -                               "previous one is in progress!\n");
>>                          kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>>                          return 0;
>>                  }
> 
> 






buildbot failure in kvm on next-ia64

2011-01-16 Thread kvm
The Buildbot has detected a new failure of next-ia64 on kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-ia64/builds/50

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH uq/master 2/2] MCE, unpoison memory address across reboot

2011-01-16 Thread Huang Ying
On Fri, 2011-01-14 at 16:38 +0800, Jan Kiszka wrote:
> Am 14.01.2011 02:51, Huang Ying wrote:
> > On Thu, 2011-01-13 at 17:01 +0800, Jan Kiszka wrote:
> >> Am 13.01.2011 09:34, Huang Ying wrote:
[snip]
> >>> +
> >>> +void kvm_unpoison_all(void *param)
> >>
> >> Minor nit: This can be static now.
> > 
> > In uq/master, it can be made static.  But in kvm/master, kvm_arch_init
> > is not compiled because of conditional compiling, so we will get warning
> > and error for unused symbol.  Should we consider kvm/master in this
> > patch?
> 
> qemu-kvm is very close to switching to upstream kvm_*init. As long as it
> requires this service in its own modules, it will have to patch this
> detail. It does this for other functions already.

OK.  I will change this.

[snip]
> >> As indicated, I'm sitting on lots of fixes and refactorings of the MCE
> >> user space code. How do you test your patches? Any suggestions how to do
> >> this efficiently would be warmly welcome.
> > 
> > We use a self-made test script to test.  Repository is at:
> > 
> > git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
> > 
> > The kvm test script is in kvm sub-directory.
> > 
> > The qemu patch attached is needed by the test script.
> > 
> 
> Yeah, I already found this yesterday and started reading. I was just
> searching for p2v in qemu, but now it's clear where it comes from. Will
> have a look (if you want to preview my changes:
> git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream).
> 
> I was almost about to use MADV_HWPOISON instead of the injection module.
> Is there a way to recover the fake corruption afterward? I think that
> would allow to move some of the test logic into qemu and avoid p2v which
> - IIRC - was disliked upstream.

I don't know how to fully recover from MADV_HWPOISON.  You can recover
the virtual address space via qemu_ram_remap() introduced in 1/2 of this
patchset.  But you will lose one or several physical pages for each
test run.  I think that may not be a big issue for a test machine.

Ccing Andi and Fengguang, they know more than me about MADV_HWPOISON.

> Also, is there a way to simulate corrected errors (BUS_MCEERR_AO)?

BUS_MCEERR_AO is a recoverable uncorrected error, not a corrected
error.

The test script is for BUS_MCEERR_AO and BUS_MCEERR_AR.  To see the
effect of pure BUS_MCEERR_AO, just remove the memory accessing loop
(memset) in tools/simple_process/simple_process.c.
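
For readers who have not seen the two codes before, here is a minimal sketch of how a process can tell them apart in a SIGBUS handler. It assumes Linux and its si_code values; describe_sigbus(), sigbus_handler(), install_sigbus_handler(), sigbus_model_examples() and last_mce_code are hypothetical names, and this is not taken from the mce-test scripts.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Fallbacks for older libc headers; the values match the Linux ABI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: describe what a SIGBUS si_code asks for. */
const char *describe_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required";  /* the faulting access hit poisoned memory */
    case BUS_MCEERR_AO:
        return "action optional";  /* poison was found, handling can be deferred */
    default:
        return "not a memory error";
    }
}

/* Last machine-check code seen; written from the signal handler. */
volatile sig_atomic_t last_mce_code;

/* Handler sketch: only async-signal-safe work belongs here. */
void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_mce_code = info->si_code;  /* info->si_addr is the affected address */
}

/* Install the handler with SA_SIGINFO so that si_code is available. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Usage example. */
void sigbus_model_examples(void)
{
    assert(strcmp(describe_sigbus(BUS_MCEERR_AO), "action optional") == 0);
    assert(strcmp(describe_sigbus(BUS_MCEERR_AR), "action required") == 0);
}
```

Removing the memset loop from simple_process.c means the process never touches the poisoned page, so only the action-optional case can be observed.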

Best Regards,
Huang Ying




Re: [PATCH] KVM: x86: Remove user space triggerable MCE error message

2011-01-16 Thread Huang Ying
On Sat, 2011-01-15 at 17:00 +0800, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> This case is a pure user space error we do not need to record. Moreover,
> it can be misused to flood the kernel log. Remove it.

I don't think this is a pure user space error.  This happens on real
hardware too, if the Machine Check exception is raised during early boot
stage or the second MC exception is raised before the first MC exception
is processed/cleared.

So I use printk here to help debugging these issues.

To avoid flooding the kernel log, we can use ratelimit.
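
As a rough illustration of what rate limiting means here, the sketch below models an interval/burst limiter in user space. In the kernel one would use the existing printk_ratelimit() or printk_ratelimited() helpers rather than open-coding this; struct ratelimit and ratelimit_allow() are hypothetical names for this model only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of an interval/burst rate limiter. */
struct ratelimit {
    long interval;  /* window length, in caller-defined time units */
    int burst;      /* messages allowed per window */
    long begin;     /* start of the current window */
    int printed;    /* messages emitted in the current window */
    int missed;     /* messages suppressed in the current window */
};

/* Returns true if the caller may log at time `now`, false to suppress. */
bool ratelimit_allow(struct ratelimit *rl, long now)
{
    assert(rl != NULL && rl->interval > 0 && rl->burst > 0);

    if (now - rl->begin >= rl->interval) {
        /* A new window starts: reset the counters. */
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}
```

With interval = 5 and burst = 2, the first two messages in a window are logged and the rest are counted as missed until the window rolls over, so a guest that keeps triggering the path cannot flood the log.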

Best Regards,
Huang Ying

> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/kvm/x86.c |3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9dda70d..7f7e4a5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2575,9 +2575,6 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>          if (mce->status & MCI_STATUS_UC) {
>                  if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
>                      !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) {
> -                        printk(KERN_DEBUG "kvm: set_mce: "
> -                               "injects mce exception while "
> -                               "previous one is in progress!\n");
>                          kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>                          return 0;
>                  }




Re: [PATCH 1/2] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally

2011-01-16 Thread Huang Ying
Hi, Andrew,

On Sun, 2011-01-16 at 23:35 +0800, Avi Kivity wrote:
> On 01/14/2011 03:37 AM, Huang Ying wrote:
> > On Thu, 2011-01-13 at 18:43 +0800, Avi Kivity wrote:
> > >  On 01/13/2011 10:42 AM, Huang Ying wrote:
> > >  >  Make __get_user_pages return -EHWPOISON for HWPOISON page only if
> > >  >  FOLL_HWPOISON is specified.  With this patch, the interested callers
> > >  >  can distinguish HWPOISON page from general FAULT page, while other
> > >  >  callers will still get -EFAULT for pages, so the user space interface
> > >  >  need not be changed.
> > >  >
> > >  >  get_user_pages_hwpoison is added as a variant of get_user_pages that
> > >  >  can return -EHWPOISON for HWPOISON page.
> > >  >
> > >  >  This feature is needed by KVM, where UCR MCE should be relayed to
> > >  >  guest for HWPOISON page, while instruction emulation and MMIO will be
> > >  >  tried for general FAULT page.
> > >  >
> > >  >  The idea comes from Andrew Morton.
> > >  >
> > >  >  Signed-off-by: Huang Ying
> > >  >  Cc: Andrew Morton
> > >  >
> > >  >  +#ifdef CONFIG_MEMORY_FAILURE
> > >  >  +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
> > >  >  +                            unsigned long start, int nr_pages, int write,
> > >  >  +                            int force, struct page **pages,
> > >  >  +                            struct vm_area_struct **vmas);
> > >  >  +#else
> > >
> > >  Since we'd also like to add get_user_pages_noio(), perhaps adding a
> > >  flags field to get_user_pages() is better.
> >
> > Or export __get_user_pages()?
> 
> That's better, yes.

Do you think it is a good idea to export __get_user_pages() instead of
adding get_user_pages_noio() and get_user_pages_hwpoison()?
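
To spell out the semantics being proposed, here is a minimal user-space model of the flag-based behaviour. It is only a sketch: the MODEL_FOLL_* bits are hypothetical and are not the kernel's FOLL_* values, and model_fault_errno() and model_gup_examples() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133  /* asm-generic Linux value, for libcs that lack it */
#endif

/* Hypothetical flag bits for this model; not the kernel's FOLL_* values. */
#define MODEL_FOLL_WRITE    0x01u
#define MODEL_FOLL_HWPOISON 0x02u

/*
 * A poisoned page is reported as -EHWPOISON only to callers that asked
 * for it.  Everyone else keeps seeing -EFAULT, so existing callers and
 * the user space interface are unaffected.
 */
int model_fault_errno(int page_is_poisoned, unsigned int gup_flags)
{
    if (page_is_poisoned && (gup_flags & MODEL_FOLL_HWPOISON))
        return -EHWPOISON;
    return -EFAULT;
}

/* Usage example. */
void model_gup_examples(void)
{
    /* KVM-style caller that wants to relay an MCE to the guest. */
    assert(model_fault_errno(1, MODEL_FOLL_HWPOISON) == -EHWPOISON);
    /* Ordinary caller: same error as before the patch. */
    assert(model_fault_errno(1, MODEL_FOLL_WRITE) == -EFAULT);
}
```

With a flags argument, a later "no I/O" variant would be one more bit rather than one more exported wrapper.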

Best Regards,
Huang Ying




Re: Flow Control and Port Mirroring Revisited

2011-01-16 Thread Rusty Russell
On Mon, 17 Jan 2011 09:07:30 am Simon Horman wrote:

[snip]

I've been away, but what concerns me is that socket buffer limits are
bypassed in various configurations, due to skb cloning.  We should probably
drop such limits altogether, or fix them to be consistent.

Simple fix is as someone suggested here, to attach the clone.  That might
seriously reduce your sk limit, though.  I haven't thought about it hard,
but might it make sense to move ownership into skb_shared_info; ie. the
data, rather than the skb head?

Cheers,
Rusty.


Re: Flow Control and Port Mirroring Revisited

2011-01-16 Thread Simon Horman
On Fri, Jan 14, 2011 at 08:54:15AM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 14, 2011 at 03:35:28PM +0900, Simon Horman wrote:
> > On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
> > > On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
> > > > On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> > > > > > On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman wrote:
> > > > > > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> > > > > >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > > > > >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > > > > >> >
> > > > > >> > [ snip ]
> > > > > >> > >
> > > > > > >> > > I know that everyone likes a nice netperf result but I agree with
> > > > > > >> > > Michael that this probably isn't the right question to be asking.  I
> > > > > > >> > > don't think that socket buffers are a real solution to the flow
> > > > > > >> > > control problem: they happen to provide that functionality but it's
> > > > > > >> > > more of a side effect than anything.  It's just that the amount of
> > > > > > >> > > memory consumed by packets in the queue(s) doesn't really have any
> > > > > > >> > > implicit meaning for flow control (think multiple physical adapters,
> > > > > > >> > > all with the same speed instead of a virtual device and a physical
> > > > > > >> > > device with wildly different speeds).  The analog in the physical
> > > > > > >> > > world that you're looking for would be Ethernet flow control.
> > > > > > >> > > Obviously, if the question is limiting CPU or memory consumption then
> > > > > > >> > > that's a different story.
> > > > > > >> >
> > > > > > >> > Point taken. I will see if I can control CPU (and thus memory) consumption
> > > > > > >> > using cgroups and/or tc.
> > > > > >>
> > > > > >> I have found that I can successfully control the throughput using
> > > > > >> the following techniques
> > > > > >>
> > > > > >> 1) Place a tc egress filter on dummy0
> > > > > >>
> > > > > > >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> > > > > >>    this is effectively the same as one of my hacks to the datapath
> > > > > >>    that I mentioned in an earlier mail. The result is that eth1
> > > > > >>    "paces" the connection.
> > > 
> > > This is actually a bug. This means that one slow connection will affect
> > > fast ones. I intend to change the default for qemu to sndbuf=0 : this
> > > will fix it but break your "pacing". So pls do not count on this
> > > behaviour.
> > 
> > Do you have a patch I could test?
> 
> You can (and users already can) just run qemu with sndbuf=0. But if you
> like, below.

Thanks

> > > > > > Further to this, I wonder if there is any interest in providing
> > > > > > > a method to switch the action order - using ovs-ofctl is a hack imho -
> > > > > > and/or switching the default action order for mirroring.
> > > > > 
> > > > > I'm not sure that there is a way to do this that is correct in the
> > > > > generic case.  It's possible that the destination could be a VM while
> > > > > packets are being mirrored to a physical device or we could be
> > > > > multicasting or some other arbitrarily complex scenario.  Just think
> > > > > of what a physical switch would do if it has ports with two different
> > > > > speeds.
> > > > 
> > > > Yes, I have considered that case. And I agree that perhaps there
> > > > > is no sensible default. But perhaps we could make it configurable somehow?
> > > 
> > > The fix is at the application level. Run netperf with -b and -w flags to
> > > limit the speed to a sensible value.
> > 
> > Perhaps I should have stated my goals more clearly.
> > I'm interested in situations where I don't control the application.
> 
> Well an application that streams UDP without any throttling
> at the application level will break on a physical network, right?
> So I am not sure why should one try to make it work on the virtual one.
> 
> But let's assume that you do want to throttle the guest
> for reasons such as QOS. The proper approach seems
> to be to throttle the sender, not have a dummy throttled
> receiver "pacing" it. Place the qemu process in the
> correct net_cls cgroup, set the class id and apply a rate limit?

I would like to be able to use a class to rate limit egress packets.
That much works fine for me.

What I would also like is for there to be back-pressure such that the guest
doesn't consume lots of CPU, spinning, sending packets as fast as it can,
almost all of which are dropped. That does seem like a lot of wasted
CPU to me.

Unfortunately there are several problems with this and I am fast concluding
that I will need to use a CPU cgroup. Which does make some sense, as what I
am really trying to limit here is CP

[Bug 26872] New: qemu stop responding if using kvm with usb passthru

2011-01-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=26872

   Summary: qemu stop responding if using kvm with usb passthru
   Product: Virtualization
   Version: unspecified
Kernel Version: 2.6.37
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: alien.vi...@gmail.com
Regression: No


qemu stops responding when kvm is used with usb passthru.
If kvm is disabled, qemu does not hang, but the usb device 2022:0008 does not work.
in processes:
13304 pts/0    Dl+    0:10  |   \_ /usr/bin/qemu-system-x86_64
--enable-kvm -boot c -vnc :8 -drive
file=/raid5/image.vmdk,cache=none,if=virtio,boot=on -name image1 -uuid
564d125f-0948-a1ff-7ae3-c9acb9a25626 -cdrom /virtio-win-last.iso -fda
/virtio-win-last.vfd -enable-kvm -m 1024 -usb -smp 1 -net
vde,vlan=0,name=vmwin1,sock=/var/run/vde.sock -net
nic,vlan=0,macaddr=00:0c:29:a2:56:26,model=virtio -usbdevice tablet -vga std
-monitor telnet:0.0.0.0:4008,server,nowait -device
usb-host,vendorid=2022,productid=8 -no-kvm-irqchip -no-kvm-pit

in console
husb: open device 2.2
husb: config #1 need -1
husb: 2 interfaces claimed for configuration 1
husb: grabbed usb device 2.2
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1

in messages
Jan 17 01:02:45 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:02:45 gentoo kernel: usb 2-9: reset full speed USB device using
ohci_hcd and address 2
Jan 17 01:02:45 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:02:46 gentoo kernel: usbfs 2-9:1.0: forced unbind
Jan 17 01:02:46 gentoo kernel: usbfs 2-9:1.1: forced unbind
Jan 17 01:02:46 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:02:46 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:02:46 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:02:46 gentoo kernel: usb 2-9: reset full speed USB device using
ohci_hcd and address 2
Jan 17 01:02:46 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:06 gentoo kernel: usbfs 2-9:1.0: forced unbind
Jan 17 01:03:06 gentoo kernel: usbfs 2-9:1.1: forced unbind
Jan 17 01:03:06 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:06 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:06 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:06 gentoo kernel: usb 2-9: reset full speed USB device using
ohci_hcd and address 2
Jan 17 01:03:06 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:07 gentoo kernel: usbfs 2-9:1.0: forced unbind
Jan 17 01:03:07 gentoo kernel: usbfs 2-9:1.1: forced unbind
Jan 17 01:03:07 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:07 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:07 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:07 gentoo kernel: usb 2-9: reset full speed USB device using
ohci_hcd and address 2
Jan 17 01:03:07 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:07 gentoo kernel: ohci_hcd :00:02.0: urb 88011f54a200 path
9 ep4in 4216 cc 4 --> status -32
Jan 17 01:03:26 gentoo kernel: usbfs 2-9:1.0: forced unbind
Jan 17 01:03:26 gentoo kernel: usbfs 2-9:1.1: forced unbind
Jan 17 01:03:26 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:26 gentoo kernel: ohci_hcd :00:02.0: GetStatus
roothub.portstatus [8] = 0x00100103 PRSC PPS PES CCS
Jan 17 01:03:26 gentoo kernel: hub 2-0:1.0: state 7 ports 10 chg  evt 0200
Jan 17 01:03:26 gentoo kerne

Re: [PATCH -v2] vmx: increase ple_gap default to 64

2011-01-16 Thread Avi Kivity

On 01/04/2011 04:51 PM, Rik van Riel wrote:

On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger
PLE exits, even with the minimalistic PLE test from kvm-unit-tests.

http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545

For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in
order to get pause loop exits:

# modprobe kvm_intel ple_gap=47
# taskset 1 /usr/local/bin/qemu-system-x86_64 -device testdev,chardev=log -chardev stdio,id=log -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 58298446
# rmmod kvm_intel
# modprobe kvm_intel ple_gap=48
# taskset 1 /usr/local/bin/qemu-system-x86_64 -device testdev,chardev=log -chardev stdio,id=log -kernel x86/vmexit.flat -append ple-round-robin -smp 2
VNC server running on `::1:5900'
enabling apic
enabling apic
ple-round-robin 36616

Increase the ple_gap to 128 to be on the safe side.  Is this enough
for a CPU with HT that has a busy sibling thread, or should it be
even larger?   On the X5670, loading up the sibling thread with an
infinite loop does not seem to increase the required ple_gap.



Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2 v2] perf-kvm support for SVM

2011-01-16 Thread Avi Kivity

On 01/16/2011 05:35 PM, Joerg Roedel wrote:

On Sun, Jan 16, 2011 at 12:49:41PM +0200, Avi Kivity wrote:
>  On 01/14/2011 05:45 PM, Joerg Roedel wrote:

>>  here is the reworked version of the patch-set. Only patch 1/2 has
>>  changed and now contains the real fix for the crashes that were seen and
>>  has an updated log message.
>>
>
>  Thanks, applied.  2.6.37 and earlier aren't affected, yes?  So I'm
>  queuing it for 2.6.38 only.

I think the problem is there since KVM has lazy state switching. So the
fix in patch 1 should make it in all currently maintained stable-trees.



The problem is with load_gs_index(), yes?  In 2.6.37 this is called 
before stgi(), so it's protected from nmi.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally

2011-01-16 Thread Avi Kivity

On 01/14/2011 03:37 AM, Huang Ying wrote:

On Thu, 2011-01-13 at 18:43 +0800, Avi Kivity wrote:
>  On 01/13/2011 10:42 AM, Huang Ying wrote:
>  >  Make __get_user_pages return -EHWPOISON for HWPOISON page only if
>  >  FOLL_HWPOISON is specified.  With this patch, the interested callers
>  >  can distinguish HWPOISON page from general FAULT page, while other
>  >  callers will still get -EFAULT for pages, so the user space interface
>  >  need not be changed.
>  >
>  >  get_user_pages_hwpoison is added as a variant of get_user_pages that
>  >  can return -EHWPOISON for HWPOISON page.
>  >
>  >  This feature is needed by KVM, where UCR MCE should be relayed to
>  >  guest for HWPOISON page, while instruction emulation and MMIO will be
>  >  tried for general FAULT page.
>  >
>  >  The idea comes from Andrew Morton.
>  >
>  >  Signed-off-by: Huang Ying
>  >  Cc: Andrew Morton
>  >
>  >  +#ifdef CONFIG_MEMORY_FAILURE
>  >  +int get_user_pages_hwpoison(struct task_struct *tsk, struct mm_struct *mm,
>  >  +                            unsigned long start, int nr_pages, int write,
>  >  +                            int force, struct page **pages,
>  >  +                            struct vm_area_struct **vmas);
>  >  +#else
>
>  Since we'd also like to add get_user_pages_noio(), perhaps adding a
>  flags field to get_user_pages() is better.

Or export __get_user_pages()?


That's better, yes.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2 v2] perf-kvm support for SVM

2011-01-16 Thread Joerg Roedel
On Sun, Jan 16, 2011 at 12:49:41PM +0200, Avi Kivity wrote:
> On 01/14/2011 05:45 PM, Joerg Roedel wrote:

>> here is the reworked version of the patch-set. Only patch 1/2 has
>> changed and now contains the real fix for the crashes that were seen and
>> has an updated log message.
>>
>
> Thanks, applied.  2.6.37 and earlier aren't affected, yes?  So I'm  
> queuing it for 2.6.38 only.

I think the problem is there since KVM has lazy state switching. So the
fix in patch 1 should make it in all currently maintained stable-trees.

Joerg



Re: [RFC -v5 PATCH 3/4] export pid symbols needed for kvm_vcpu_on_spin

2011-01-16 Thread Avi Kivity

On 01/14/2011 10:04 AM, Rik van Riel wrote:

Export the symbols required for a race-free kvm_vcpu_on_spin.



Needs to be reordered with the first patch for bisectability.

--
error compiling committee.c: too many arguments to function



Re: [RFC -v5 PATCH 1/4] kvm: keep track of which task is running a KVM vcpu

2011-01-16 Thread Avi Kivity

On 01/14/2011 10:03 AM, Rik van Riel wrote:

Keep track of which task is running a KVM vcpu.  This helps us
figure out later what task to wake up if we want to boost a
vcpu that got preempted.

Unfortunately there are no guarantees that the same task
always keeps the same vcpu, so we can only track the task
across a single "run" of the vcpu.


diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5225052..65e997a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -185,6 +185,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
         vcpu->cpu = -1;
         vcpu->kvm = kvm;
         vcpu->vcpu_id = id;
+        vcpu->pid = 0;


NULL


@@ -1456,6 +1459,12 @@ static long kvm_vcpu_ioctl(struct file *filp,
                 r = -EINVAL;
                 if (arg)
                         goto out;
+                if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
+                        /* The thread running this VCPU changed. */
+                        struct pid *oldpid = vcpu->pid;
+                        vcpu->pid = get_task_pid(current, PIDTYPE_PID);
+                        put_pid(oldpid);
+                }


This is subject to the same race as before.  If another vcpu picks up 
vcpu->pid before the assignment (that is, oldpid), but dereferences it 
after put_pid(), it hits freed memory.


You want something like

    struct pid *oldpid = vcpu->pid;
    struct pid *newpid = get_task_pid(current, PIDTYPE_PID);

    rcu_assign_pointer(vcpu->pid, newpid);
    synchronize_rcu();    /* wait for readers that may still use oldpid */
    put_pid(oldpid);

with rcu_read_lock() / rcu_dereference() protection on the reader side.
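
Roughly, the pairing of writer and reader would look like the sketch below. It is a single-threaded user-space model: the rcu_* macros are no-op stand-ins that only mark where the real kernel calls go, and struct model_pid, struct model_vcpu and the model_* functions are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Single-threaded stand-ins for the kernel RCU primitives.  They mark
 * where the real calls would go and provide no actual synchronization.
 */
#define rcu_read_lock()           ((void)0)
#define rcu_read_unlock()         ((void)0)
#define rcu_dereference(p)        (p)
#define rcu_assign_pointer(p, v)  ((p) = (v))
#define synchronize_rcu()         ((void)0)

/* Hypothetical refcounted object standing in for struct pid. */
struct model_pid { int refcount; int nr; };
struct model_vcpu { struct model_pid *pid; };

static struct model_pid *model_get_pid(struct model_pid *pid)
{
    if (pid)
        pid->refcount++;
    return pid;
}

static void model_put_pid(struct model_pid *pid)
{
    if (pid) {
        assert(pid->refcount > 0);
        pid->refcount--;
    }
}

/* Writer: publish the new pid, wait for readers, then drop the old one. */
void model_vcpu_set_pid(struct model_vcpu *vcpu, struct model_pid *newpid)
{
    struct model_pid *oldpid = vcpu->pid;

    rcu_assign_pointer(vcpu->pid, model_get_pid(newpid));
    synchronize_rcu();  /* readers that started earlier have finished */
    model_put_pid(oldpid);
}

/* Reader: take a private reference inside the read-side section. */
struct model_pid *model_vcpu_get_pid(struct model_vcpu *vcpu)
{
    struct model_pid *pid;

    rcu_read_lock();
    pid = model_get_pid(rcu_dereference(vcpu->pid));
    rcu_read_unlock();
    return pid;  /* the caller drops this reference when done */
}
```

The ordering is what matters: the old reference is dropped only after the grace period, so a reader that picked up the old pointer takes its own reference before the writer's put can free the object.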

--
error compiling committee.c: too many arguments to function



Re: Errors on MMIO read access on VM suspend / resume operations

2011-01-16 Thread Avi Kivity

On 01/14/2011 09:27 PM, Stefan Berger wrote:




Can you sprinkle some printfs() around kvm_run (in qemu-kvm.c) to 
verify this?



Here's what I did:


interrupt exit requested


It appears from this you're using qemu.git.  Please try qemu-kvm.git, 
where the code appears to be correct.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2 v2] perf-kvm support for SVM

2011-01-16 Thread Avi Kivity

On 01/14/2011 05:45 PM, Joerg Roedel wrote:

Hi,

here is the reworked version of the patch-set. Only patch 1/2 has
changed and now contains the real fix for the crashes that were seen and
has an updated log message.



Thanks, applied.  2.6.37 and earlier aren't affected, yes?  So I'm 
queuing it for 2.6.38 only.


--
error compiling committee.c: too many arguments to function
