Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Gleb Natapov
On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:
> Instead of blindly attempting to inject an event before each guest entry,
> check for a possible event first in vcpu->requests.  Sites that can trigger
> event injection are modified to set KVM_REQ_EVENT:
> 
> - interrupt, nmi window opening
> - ppr updates
> - i8259 output changes
> - local apic irr changes
> - rflags updates
> - gif flag set
> - event set on exit
> 
What about userspace irq chip? Does it work with this patch? I don't see
that you set KVM_REQ_EVENT on ioctl(KVM_INTERRUPT) for instance and
vcpu->run->request_interrupt_window should be probably checked out of
if (KVM_REQ_EVEN). It looks like with this approach we scatter irq
injection logic all over the code instead of having it in one place.

> This improves non-injecting entry performance, and sets the stage for
> non-atomic injection.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/i8259.c |1 +
>  arch/x86/kvm/lapic.c |   12 ++--
>  arch/x86/kvm/svm.c   |8 +++-
>  arch/x86/kvm/vmx.c   |6 ++
>  arch/x86/kvm/x86.c   |   35 ++-
>  include/linux/kvm_host.h |1 +
>  6 files changed, 51 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index 8d10c06..9f7ab44 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -64,6 +64,7 @@ static void pic_unlock(struct kvm_pic *s)
>   if (!found)
>   found = s->kvm->bsp_vcpu;
>  
> + kvm_make_request(KVM_REQ_EVENT, found);
>   kvm_vcpu_kick(found);
>   }
>  }
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 77d8c0f..e83d203 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -259,9 +259,10 @@ static inline int apic_find_highest_isr(struct kvm_lapic 
> *apic)
>  
>  static void apic_update_ppr(struct kvm_lapic *apic)
>  {
> - u32 tpr, isrv, ppr;
> + u32 tpr, isrv, ppr, old_ppr;
>   int isr;
>  
> + old_ppr = apic_get_reg(apic, APIC_PROCPRI);
>   tpr = apic_get_reg(apic, APIC_TASKPRI);
>   isr = apic_find_highest_isr(apic);
>   isrv = (isr != -1) ? isr : 0;
> @@ -274,7 +275,10 @@ static void apic_update_ppr(struct kvm_lapic *apic)
>   apic_debug("vlapic %p, ppr 0x%x, isr 0x%x, isrv 0x%x",
>  apic, ppr, isr, isrv);
>  
> - apic_set_reg(apic, APIC_PROCPRI, ppr);
> + if (old_ppr != ppr) {
> + apic_set_reg(apic, APIC_PROCPRI, ppr);
> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> + }
>  }
>  
>  static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
> @@ -391,6 +395,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
> delivery_mode,
>   break;
>   }
>  
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
>   kvm_vcpu_kick(vcpu);
>   break;
>  
> @@ -416,6 +421,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
> delivery_mode,
>  "INIT on a runnable vcpu %d\n",
>  vcpu->vcpu_id);
>   vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
>   kvm_vcpu_kick(vcpu);
>   } else {
>   apic_debug("Ignoring de-assert INIT to vcpu %d\n",
> @@ -430,6 +436,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
> delivery_mode,
>   result = 1;
>   vcpu->arch.sipi_vector = vector;
>   vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
>   kvm_vcpu_kick(vcpu);
>   }
>   break;
> @@ -475,6 +482,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
>   trigger_mode = IOAPIC_EDGE_TRIG;
>   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
>   kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> + kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>  }
>  
>  static void apic_send_ipi(struct kvm_lapic *apic)
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 56c9b6b..a51e067 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2258,6 +2258,7 @@ static int stgi_interception(struct vcpu_svm *svm)
>  
>   svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>   skip_emulated_instruction(&svm->vcpu);
> + kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>  
>   enable_gif(svm);
>  
> @@ -2644,6 +2645,7 @@ static int interrupt_window_interception(struct 
> vcpu_svm *svm)
>  {
>   struct kvm_run *kvm_run = svm->vcpu.run;
>  
> + kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>   svm_clear_vintr(svm);
>   svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
>   /*
> @@ -3089,8 +3091,10 @@ static void svm_complet

Re: [PATCH 5/6] kvm, x86: use ro page and don't copy shared page

2010-07-28 Thread Gleb Natapov
On Thu, Jul 29, 2010 at 10:15:22AM +0800, Lai Jiangshan wrote:
> On 07/16/2010 03:19 PM, Gleb Natapov wrote:
> 
> >> +/* get a current mapped page fast, and test whether the page is writable. 
> >> */
> >> +static struct page *get_user_page_and_protection(unsigned long addr,
> >> +  int *writable)
> >> +{
> >> +  struct page *page[1];
> >> +
> >> +  if (__get_user_pages_fast(addr, 1, 1, page) == 1) {
> >> +  *writable = 1;
> >> +  return page[0];
> >> +  }
> >> +  if (__get_user_pages_fast(addr, 1, 0, page) == 1) {
> >> +  *writable = 0;
> >> +  return page[0];
> >> +  }
> >> +  return NULL;
> >> +}
> >> +
> >> +static pfn_t kvm_get_pfn_for_page_fault(struct kvm *kvm, gfn_t gfn,
> >> +  int write_fault, int *host_writable)
> >> +{
> >> +  unsigned long addr;
> >> +  struct page *page;
> >> +
> >> +  if (!write_fault) {
> >> +  addr = gfn_to_hva(kvm, gfn);
> >> +  if (kvm_is_error_hva(addr)) {
> >> +  get_page(bad_page);
> >> +  return page_to_pfn(bad_page);
> >> +  }
> >> +
> >> +  page = get_user_page_and_protection(addr, host_writable);
> >> +  if (page)
> >> +  return page_to_pfn(page);
> >> +  }
> >> +
> >> +  *host_writable = 1;
> >> +  return kvm_get_pfn_for_gfn(kvm, gfn);
> >> +}
> >> +
> > kvm_get_pfn_for_gfn() returns fault_page if page is mapped RO, so caller
> > of kvm_get_pfn_for_page_fault() and kvm_get_pfn_for_gfn() will get
> > different results when called on the same page. Not good.
> > kvm_get_pfn_for_page_fault() logic should be folded into
> > kvm_get_pfn_for_gfn().
> > 
> 
> 
> The different results are the things we just need.
How so? Users of kvm_get_pfn_for_gfn() will think that page is invalid
and may report misconfiguration to userspace and users of
kvm_get_pfn_for_page_fault() will think that the access to page is OK.
There are no many users of kvm_get_pfn_for_gfn() and may be your patch
replace all of them with kvm_get_pfn_for_page_fault(), but this just
strengthen the point that they should be merged.

> We don't want to copy and write a page which is mapped RO when
> only read fault.
I don't see how returning inconsistent results helps us achieving that.

BTW since kvm_get_pfn_for_gfn() will never map RO page
get_user_page_and_protection() will never find any RO pages. Looks like
kvm_get_pfn_for_page_fault() is equivalent to kvm_get_pfn_for_gfn()
since !write_fault section will at best find mapped RW page.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM Test: time_drift_with_stop: Wait a few seconds after the continue

2010-07-28 Thread Amos Kong
Except for kvm-clock, we need to wait a few seconds to let the interrupt
lost during the stop to be reinjected. So sleep for 30 seconds ( could
be configurated throug configuration file).

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/timedrift_with_stop.py 
b/client/tests/kvm/tests/timedrift_with_stop.py
index fe98571..3473276 100644
--- a/client/tests/kvm/tests/timedrift_with_stop.py
+++ b/client/tests/kvm/tests/timedrift_with_stop.py
@@ -20,6 +20,7 @@ def run_timedrift_with_stop(test, params, env):
 @param env: Dictionary with the test environment.
 """
 login_timeout = int(params.get("login_timeout", 360))
+sleep_time = int(params.get("sleep_time", 30))
 vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 session = kvm_test_utils.wait_for_login(vm, timeout=login_timeout)
 
@@ -54,6 +55,10 @@ def run_timedrift_with_stop(test, params, env):
 time.sleep(stop_time)
 vm.monitor.cmd("cont")
 
+# Sleep for a while to wait the interrupt to be reinjected
+logging.info("Waiting for the interrupt to be reinjected ...")
+time.sleep(sleep_time)
+
 # Get time after current iteration
 (ht1_, gt1_) = kvm_test_utils.get_time(session, time_command,
time_filter_re, time_format)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Marcelo Tosatti
On Thu, Jul 29, 2010 at 02:18:20AM +0300, Mohammed Gamal wrote:
> >  Fourth test:
> >   qemu-system-x86-12850 [001]  5167.510302: kvm_emulate_insn: 0:4200: 9c 
> > (real)
> >   qemu-system-x86-12850 [001]  5167.510304: kvm_emulate_insn: 0:4201: 58 
> > (real)
> >   qemu-system-x86-12850 [001]  5167.510306: kvm_emulate_insn: 0:4202:
> >  83 e0 fd (real)
> >   qemu-system-x86-12850 [001]  5167.510308: kvm_emulate_insn: 0:4205:
> >  0d 28 80 (real)
> >   qemu-system-x86-12850 [001]  5167.510310: kvm_emulate_insn: 0:4208: 50 
> > (real)
> >   qemu-system-x86-12850 [001]  5167.510312: kvm_emulate_insn: 0:4209: 0e 
> > (real)
> >   qemu-system-x86-12850 [001]  5167.510313: kvm_emulate_insn: 0:420a:
> >  e8 02 00 (real)
> >   qemu-system-x86-12850 [001]  5167.510315: kvm_emulate_insn: 0:420f: cf 
> > (real)
> >   qemu-system-x86-12850 [001]  5167.510318: kvm_emulate_insn: 0:420d:
> >  eb 01 (real)
> >
> 
> As an extra note, you need to run realmode.flat with the
> emulate_invalid_guest_state=1 module option for kvm_intel. If you
> don't use that option, the emulator is not going to be invoked.

Doh, right.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kvm, x86: use ro page and don't copy shared page

2010-07-28 Thread Lai Jiangshan
On 07/17/2010 07:26 AM, Marcelo Tosatti wrote:
> On Fri, Jul 16, 2010 at 10:19:36AM +0300, Gleb Natapov wrote:
>> On Fri, Jul 16, 2010 at 10:13:07AM +0800, Lai Jiangshan wrote:
>>> When page fault, we always call get_user_pages(write=1).
>>>
>>> Actually, we don't need to do this when it is not write fault.
>>> get_user_pages(write=1) will cause shared page(ksm) copied.
>>> If this page is not modified in future, this copying and the copied page
>>> are just wasted. Ksm may scan and merge them and may cause thrash.
>>>
>> But is page is written into afterwords we will get another page fault.
>>
>>> In this patch, if the page is RO for host VMM and it not write fault for 
>>> guest,
>>> we will use RO page, otherwise we use a writable page.
>>>
>> Currently pages allocated for guest memory are required to be RW, so after 
>> your series
>> behaviour will remain exactly the same as before.
> 
> Except KSM pages.
> 
>>> Signed-off-by: Lai Jiangshan 
>>> ---
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 8ba9b0d..6382140 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -1832,6 +1832,45 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  
>>> gfn_t gfn)
>>> }
>>>  }
>>>  
>>> +/* get a current mapped page fast, and test whether the page is writable. 
>>> */
>>> +static struct page *get_user_page_and_protection(unsigned long addr,
>>> +   int *writable)
>>> +{
>>> +   struct page *page[1];
>>> +
>>> +   if (__get_user_pages_fast(addr, 1, 1, page) == 1) {
>>> +   *writable = 1;
>>> +   return page[0];
>>> +   }
>>> +   if (__get_user_pages_fast(addr, 1, 0, page) == 1) {
>>> +   *writable = 0;
>>> +   return page[0];
>>> +   }
>>> +   return NULL;
>>> +}
>>> +
>>> +static pfn_t kvm_get_pfn_for_page_fault(struct kvm *kvm, gfn_t gfn,
>>> +   int write_fault, int *host_writable)
>>> +{
>>> +   unsigned long addr;
>>> +   struct page *page;
>>> +
>>> +   if (!write_fault) {
>>> +   addr = gfn_to_hva(kvm, gfn);
>>> +   if (kvm_is_error_hva(addr)) {
>>> +   get_page(bad_page);
>>> +   return page_to_pfn(bad_page);
>>> +   }
>>> +
>>> +   page = get_user_page_and_protection(addr, host_writable);
>>> +   if (page)
>>> +   return page_to_pfn(page);
>>> +   }
>>> +
>>> +   *host_writable = 1;
>>> +   return kvm_get_pfn_for_gfn(kvm, gfn);
>>> +}
>>> +
>> kvm_get_pfn_for_gfn() returns fault_page if page is mapped RO, so caller
>> of kvm_get_pfn_for_page_fault() and kvm_get_pfn_for_gfn() will get
>> different results when called on the same page. Not good.
>> kvm_get_pfn_for_page_fault() logic should be folded into
>> kvm_get_pfn_for_gfn().
> 
> Agreed. Please keep gfn_to_pfn related code in virt/kvm/kvm_main.c.
> 
> 

Pass write_fault parameter to kvm_get_pfn_for_gfn()?
But only X86 use this parameter currently, I think it is OK to
keep these code in arch/x86/kvm/mmu.c
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kvm, x86: use ro page and don't copy shared page

2010-07-28 Thread Lai Jiangshan
On 07/16/2010 03:19 PM, Gleb Natapov wrote:

>> +/* get a current mapped page fast, and test whether the page is writable. */
>> +static struct page *get_user_page_and_protection(unsigned long addr,
>> +int *writable)
>> +{
>> +struct page *page[1];
>> +
>> +if (__get_user_pages_fast(addr, 1, 1, page) == 1) {
>> +*writable = 1;
>> +return page[0];
>> +}
>> +if (__get_user_pages_fast(addr, 1, 0, page) == 1) {
>> +*writable = 0;
>> +return page[0];
>> +}
>> +return NULL;
>> +}
>> +
>> +static pfn_t kvm_get_pfn_for_page_fault(struct kvm *kvm, gfn_t gfn,
>> +int write_fault, int *host_writable)
>> +{
>> +unsigned long addr;
>> +struct page *page;
>> +
>> +if (!write_fault) {
>> +addr = gfn_to_hva(kvm, gfn);
>> +if (kvm_is_error_hva(addr)) {
>> +get_page(bad_page);
>> +return page_to_pfn(bad_page);
>> +}
>> +
>> +page = get_user_page_and_protection(addr, host_writable);
>> +if (page)
>> +return page_to_pfn(page);
>> +}
>> +
>> +*host_writable = 1;
>> +return kvm_get_pfn_for_gfn(kvm, gfn);
>> +}
>> +
> kvm_get_pfn_for_gfn() returns fault_page if page is mapped RO, so caller
> of kvm_get_pfn_for_page_fault() and kvm_get_pfn_for_gfn() will get
> different results when called on the same page. Not good.
> kvm_get_pfn_for_page_fault() logic should be folded into
> kvm_get_pfn_for_gfn().
> 


The different results are the things we just need.
We don't want to copy and write a page which is mapped RO when
only read fault.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Mohammed Gamal
On Thu, Jul 29, 2010 at 2:16 AM, Mohammed Gamal  wrote:
> On Thu, Jul 29, 2010 at 2:13 AM, Mohammed Gamal  wrote:
>> On Wed, Jul 28, 2010 at 11:56 PM, Marcelo Tosatti  
>> wrote:
>>> On Wed, Jul 28, 2010 at 11:36:16PM +0300, Mohammed Gamal wrote:
 On Wed, Jul 28, 2010 at 11:32 PM, Marcelo Tosatti  
 wrote:
 > On Wed, Jul 28, 2010 at 11:28:05PM +0300, Mohammed Gamal wrote:
 >> On Wed, Jul 28, 2010 at 10:55 PM, Marcelo Tosatti  
 >> wrote:
 >> > On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
 >> >> This adds a unit test for real mode emulation of the iret instruction
 >> >>
 >> >> Signed-off-by: Mohammed Gamal 
 >> >>
 >> >> ---
 >> >> Changes from v1:
 >> >> - Added test for 16-bit iret
 >> >> - Added tests for returned eflags
 >> >> ---
 >> >>  kvm/test/x86/realmode.c |   79 
 >> >> +++
 >> >>  1 files changed, 79 insertions(+), 0 deletions(-)
 >> >>
 >> >> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
 >> >> index bd79348..74456c3 100644
 >> >> --- a/kvm/test/x86/realmode.c
 >> >> +++ b/kvm/test/x86/realmode.c
 >> >> @@ -865,6 +865,84 @@ void test_pusha_popa()
 >> >>               print_serial("Pusha/Popa Test2: PASS\n");
 >> >>  }
 >> >>
 >> >> +void test_iret()
 >> >> +{
 >> >> +     struct regs inregs = { 0 }, outregs;
 >> >> +
 >> >> +     MK_INSN(iret32, "pushf\n\t"
 >> >> +                     "pushl %cs\n\t"
 >> >> +                     "call 1f\n\t" /* a near call will push eip 
 >> >> onto the stack */
 >> >> +                     "jmp 2f\n\t"
 >> >> +                     "1: iret\n\t"
 >> >> +                     "2:\n\t"
 >> >> +                  );
 >> >> +
 >> >> +     MK_INSN(iret16, "pushfw\n\t"
 >> >> +                     "pushw %cs\n\t"
 >> >> +                     "callw 1f\n\t"
 >> >> +                     "jmp 2f\n\t"
 >> >> +                     "1: iretw\n\t"
 >> >> +                     "2:\n\t");
 >> >
 >> > Unless iret causes a task switch, it will not cause an exit.
 >> >
 >> >
 >> The test covers only the real mode case, we won't have a task switch 
 >> there, no?
 >
 > No. And there is no vmexit for iret, so it does not go through the
 > emulator.
 >
 I'm pretty sure the iret instruction went through the emulator. I saw
 the instruction opcode (0xcf) being in ftrace.
>>>
>>> Can you confirm that please? I don't see it here.
>>>
>>
> Sure, here is the ftrace output corresponding to the test instructions
>
> From the first test
> qemu-system-x86-12850 [001]  5167.509842: kvm_emulate_insn: 0:4200: 66 9c 
> (real)
>  qemu-system-x86-12850 [001]  5167.509844: kvm_emulate_insn: 0:4202:
> 66 0e (real)
>  qemu-system-x86-12850 [001]  5167.509847: kvm_emulate_insn: 0:4204:
> 66 e8 02 00 00 00 (real)
>  qemu-system-x86-12850 [001]  5167.509849: kvm_emulate_insn: 0:420c:
> 66 cf (real)
>  qemu-system-x86-12850 [001]  5167.509852: kvm_emulate_insn: 0:420a:
> eb 02 (real)
>
> Second test:
> qemu-system-x86-12850 [001]  5167.509998: kvm_emulate_insn: 0:4200: 9c (real)
>  qemu-system-x86-12850 [001]  5167.51: kvm_emulate_insn: 0:4201: 0e (real)
>  qemu-system-x86-12850 [001]  5167.510002: kvm_emulate_insn: 0:4202:
> e8 02 00 (real)
>   qemu-system-x86-12850 [001]  5167.510004: kvm_emulate_insn: 0:4207: cf 
> (real)
>   qemu-system-x86-12850 [001]  5167.510006: kvm_emulate_insn: 0:4205:
> eb 01 (real)
>
> Third test:
>           <...>-12850 [001]  5167.510162: kvm_emulate_insn: 0:4200: 66 9c 
> (real)
>            <...>-12850 [001]  5167.510164: kvm_emulate_insn: 0:4202:
>  66 58 (real)
>            <...>-12850 [001]  5167.510166: kvm_emulate_insn: 0:4204:
>  66 83 e0 fd (real)
>            <...>-12850 [001]  5167.510168: kvm_emulate_insn: 0:4208:
>  66 0d 28 80 c0 ff (real)
>            <...>-12850 [001]  5167.510170: kvm_emulate_insn: 0:420e:
>  66 50 (real)
>            <...>-12850 [001]  5167.510172: kvm_emulate_insn: 0:4210:
>  66 0e (real)
>            <...>-12850 [001]  5167.510174: kvm_emulate_insn: 0:4212:
>  66 e8 02 00 00 00 (real)
>            <...>-12850 [001]  5167.510176: kvm_emulate_insn: 0:421a:
>  66 cf (real)
>            <...>-12850 [001]  5167.510179: kvm_emulate_insn: 0:4218:
>  eb 02 (real)
>
>
>  Fourth test:
>   qemu-system-x86-12850 [001]  5167.510302: kvm_emulate_insn: 0:4200: 9c 
> (real)
>   qemu-system-x86-12850 [001]  5167.510304: kvm_emulate_insn: 0:4201: 58 
> (real)
>   qemu-system-x86-12850 [001]  5167.510306: kvm_emulate_insn: 0:4202:
>  83 e0 fd (real)
>   qemu-system-x86-12850 [001]  5167.510308: kvm_emulate_insn: 0:4205:
>  0d 28 80 (real)
>   qemu-system-x86-12850 [001]  5167.510310: kvm_emulate_insn: 0:4208: 50 
> (real)
>   qemu-system-x86-12850 [001]  5167.510312: kvm_emulate_insn: 0:4209: 0e 
> (real)
>   qemu-system-x86-12850 [001]  5167.51

Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Mohammed Gamal
On Thu, Jul 29, 2010 at 2:13 AM, Mohammed Gamal  wrote:
> On Wed, Jul 28, 2010 at 11:56 PM, Marcelo Tosatti  wrote:
>> On Wed, Jul 28, 2010 at 11:36:16PM +0300, Mohammed Gamal wrote:
>>> On Wed, Jul 28, 2010 at 11:32 PM, Marcelo Tosatti  
>>> wrote:
>>> > On Wed, Jul 28, 2010 at 11:28:05PM +0300, Mohammed Gamal wrote:
>>> >> On Wed, Jul 28, 2010 at 10:55 PM, Marcelo Tosatti  
>>> >> wrote:
>>> >> > On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
>>> >> >> This adds a unit test for real mode emulation of the iret instruction
>>> >> >>
>>> >> >> Signed-off-by: Mohammed Gamal 
>>> >> >>
>>> >> >> ---
>>> >> >> Changes from v1:
>>> >> >> - Added test for 16-bit iret
>>> >> >> - Added tests for returned eflags
>>> >> >> ---
>>> >> >>  kvm/test/x86/realmode.c |   79 
>>> >> >> +++
>>> >> >>  1 files changed, 79 insertions(+), 0 deletions(-)
>>> >> >>
>>> >> >> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
>>> >> >> index bd79348..74456c3 100644
>>> >> >> --- a/kvm/test/x86/realmode.c
>>> >> >> +++ b/kvm/test/x86/realmode.c
>>> >> >> @@ -865,6 +865,84 @@ void test_pusha_popa()
>>> >> >>               print_serial("Pusha/Popa Test2: PASS\n");
>>> >> >>  }
>>> >> >>
>>> >> >> +void test_iret()
>>> >> >> +{
>>> >> >> +     struct regs inregs = { 0 }, outregs;
>>> >> >> +
>>> >> >> +     MK_INSN(iret32, "pushf\n\t"
>>> >> >> +                     "pushl %cs\n\t"
>>> >> >> +                     "call 1f\n\t" /* a near call will push eip onto 
>>> >> >> the stack */
>>> >> >> +                     "jmp 2f\n\t"
>>> >> >> +                     "1: iret\n\t"
>>> >> >> +                     "2:\n\t"
>>> >> >> +                  );
>>> >> >> +
>>> >> >> +     MK_INSN(iret16, "pushfw\n\t"
>>> >> >> +                     "pushw %cs\n\t"
>>> >> >> +                     "callw 1f\n\t"
>>> >> >> +                     "jmp 2f\n\t"
>>> >> >> +                     "1: iretw\n\t"
>>> >> >> +                     "2:\n\t");
>>> >> >
>>> >> > Unless iret causes a task switch, it will not cause an exit.
>>> >> >
>>> >> >
>>> >> The test covers only the real mode case, we won't have a task switch 
>>> >> there, no?
>>> >
>>> > No. And there is no vmexit for iret, so it does not go through the
>>> > emulator.
>>> >
>>> I'm pretty sure the iret instruction went through the emulator. I saw
>>> the instruction opcode (0xcf) being in ftrace.
>>
>> Can you confirm that please? I don't see it here.
>>
>
Sure, here is the ftrace output corresponding to the test instructions

>From the first test
qemu-system-x86-12850 [001]  5167.509842: kvm_emulate_insn: 0:4200: 66 9c (real)
 qemu-system-x86-12850 [001]  5167.509844: kvm_emulate_insn: 0:4202:
66 0e (real)
 qemu-system-x86-12850 [001]  5167.509847: kvm_emulate_insn: 0:4204:
66 e8 02 00 00 00 (real)
 qemu-system-x86-12850 [001]  5167.509849: kvm_emulate_insn: 0:420c:
66 cf (real)
 qemu-system-x86-12850 [001]  5167.509852: kvm_emulate_insn: 0:420a:
eb 02 (real)

Second test:
qemu-system-x86-12850 [001]  5167.509998: kvm_emulate_insn: 0:4200: 9c (real)
 qemu-system-x86-12850 [001]  5167.51: kvm_emulate_insn: 0:4201: 0e (real)
 qemu-system-x86-12850 [001]  5167.510002: kvm_emulate_insn: 0:4202:
e8 02 00 (real)
  qemu-system-x86-12850 [001]  5167.510004: kvm_emulate_insn: 0:4207: cf (real)
  qemu-system-x86-12850 [001]  5167.510006: kvm_emulate_insn: 0:4205:
eb 01 (real)

Third test:
          <...>-12850 [001]  5167.510162: kvm_emulate_insn: 0:4200: 66 9c (real)
           <...>-12850 [001]  5167.510164: kvm_emulate_insn: 0:4202:
 66 58 (real)
           <...>-12850 [001]  5167.510166: kvm_emulate_insn: 0:4204:
 66 83 e0 fd (real)
           <...>-12850 [001]  5167.510168: kvm_emulate_insn: 0:4208:
 66 0d 28 80 c0 ff (real)
           <...>-12850 [001]  5167.510170: kvm_emulate_insn: 0:420e:
 66 50 (real)
           <...>-12850 [001]  5167.510172: kvm_emulate_insn: 0:4210:
 66 0e (real)
           <...>-12850 [001]  5167.510174: kvm_emulate_insn: 0:4212:
 66 e8 02 00 00 00 (real)
           <...>-12850 [001]  5167.510176: kvm_emulate_insn: 0:421a:
 66 cf (real)
           <...>-12850 [001]  5167.510179: kvm_emulate_insn: 0:4218:
 eb 02 (real)


 Fourth test:
  qemu-system-x86-12850 [001]  5167.510302: kvm_emulate_insn: 0:4200: 9c (real)
  qemu-system-x86-12850 [001]  5167.510304: kvm_emulate_insn: 0:4201: 58 (real)
  qemu-system-x86-12850 [001]  5167.510306: kvm_emulate_insn: 0:4202:
 83 e0 fd (real)
  qemu-system-x86-12850 [001]  5167.510308: kvm_emulate_insn: 0:4205:
 0d 28 80 (real)
  qemu-system-x86-12850 [001]  5167.510310: kvm_emulate_insn: 0:4208: 50 (real)
  qemu-system-x86-12850 [001]  5167.510312: kvm_emulate_insn: 0:4209: 0e (real)
  qemu-system-x86-12850 [001]  5167.510313: kvm_emulate_insn: 0:420a:
 e8 02 00 (real)
  qemu-system-x86-12850 [001]  5167.510315: kvm_emulate_insn: 0:420f: cf (real)
  qemu-system-x86-12850 [001]  5167.510318: kvm_emulate_insn: 0:420d:
 eb 01 (real)
--
To unsubscribe from this list

Re: [PATCH RFC 0/4] Paravirt-spinlock implementation for KVM guests (Version 0)

2010-07-28 Thread Konrad Rzeszutek Wilk
On Wed, Jul 28, 2010 at 06:10:59PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 26, 2010 at 11:41:50AM +0530, Srivatsa Vaddagiri wrote:
> > This patch-series implements paravirt-spinlock implementation for KVM 
> > guests,
> > based heavily on Xen's implementation. I tried to refactor Xen's spinlock
> > implementation to make it common for both Xen and KVM - but found that 
> > few differences between Xen and KVM (Xen has the ability to block on a
> > particular event/irq for example) _and_ the fact that the guest kernel 
> > can be compiled to support both Xen and KVM hypervisors (CONFIG_XEN and
> > CONFIG_KVM_GUEST can both be turned on) makes the "common" code a eye-sore.
> > There will have to be:
> > 
> > if (xen) {
> > ...
> > } else if (kvm) {
> > ..
> > }
> > 
> > or possibly:
> > 
> > alternative(NOP, some_xen_specific_call, )
> 
> Why not utilize the pvops path?

Duh. You did use it. Sorry about the noise.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 11/14] KVM-test: Add a subtest of changing mac address

2010-07-28 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:36 +0800, Amos Kong wrote:
> Mainly test steps:
> 1. get a new mac from pool, and the old mac addr of guest.
> 2. execute the mac_change.sh in guest.
> 3. relogin to guest and query the interfaces info by `ifconfig`
> 
> Signed-off-by: Cao, Chen 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/tests/mac_change.py 
> b/client/tests/kvm/tests/mac_change.py
> new file mode 100644
> index 000..dc93377
> --- /dev/null
> +++ b/client/tests/kvm/tests/mac_change.py
> @@ -0,0 +1,66 @@
> +import logging
> +from autotest_lib.client.common_lib import error
> +import kvm_utils, kvm_test_utils, kvm_net_utils
> +
> +
> +def run_mac_change(test, params, env):
> +"""
> +Change MAC Address of Guest.
> +
> +1. get a new mac from pool, and the old mac addr of guest.
> +2. set new mac in guest and regain new IP.
> +3. re-log into guest with new mac
> +
> +@param test: kvm test object
> +@param params: Dictionary with the test parameters
> +@param env: Dictionary with test environment.
> +"""
> +timeout = int(params.get("login_timeout", 360))
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +logging.info("Trying to log into guest '%s' by serial", vm.name)
> +session = kvm_utils.wait_for(lambda: vm.serial_login(),
> +  timeout, 0, step=2)

^ One thing that I forgot to comment on previous patches: For more
clarity, it'd be good to name the session variable in a way that lets
people refer easily that is a serial session

session_serial = ...

> +if not session:
> +raise error.TestFail("Could not log into guest '%s'" % vm.name)
> +
> +old_mac = vm.get_macaddr(0)
> +kvm_utils.put_mac_to_pool(vm.root_dir, old_mac, vm.instance)
> +new_mac = kvm_utils.get_mac_from_pool(vm.root_dir,
> +  vm=vm.instance,
> +  nic_index=0,
> +  prefix=vm.mac_prefix)
> +logging.info("The initial MAC address is %s" % old_mac)
> +interface = kvm_net_utils.get_linux_ifname(session, old_mac)
> +
> +# Start change mac address
> +logging.info("Changing mac address to %s" % new_mac)
> +change_cmd = "ifconfig %s down && ifconfig %s hw ether %s && ifconfig %s 
> up"\
> + % (interface, interface, new_mac, interface)
> +if session.get_command_status(change_cmd) != 0:
> +raise error.TestFail("Fail to send mac_change command")
> +
> +# Verify whether mac address is changed to new one
> +logging.info("Verifying the new mac address")
> +if session.get_command_status("ifconfig | grep -i %s" % new_mac) != 0:
> +raise error.TestFail("Fail to change mac address")
> +
> +# Restart `dhclient' to regain IP for new mac address
> +logging.info("Re-start the network to gain new ip")
> +dhclient_cmd = "dhclient -r && dhclient %s" % interface
> +session.sendline(dhclient_cmd)
> +
> +# Re-log into the guest after changing mac address
> +if kvm_utils.wait_for(session.is_responsive, 120, 20, 3):
> +# Just warning when failed to see the session become dead,
> +# because there is a little chance the ip does not change.
> +logging.warn("The session is still responsive, settings may fail.")

^ Isn't this a serial session? Then why the IP of guest changing would
make this session un-responsive? I think the best idea here is to:

1) Release the IP through dhclient -r [interface]
2) Make sure we can't stablish a ssh based session to the guest by
making a try/except block with kvm_test_utils.wait_for_login() with the
appropriate timeouts and other parameters, if succeeds, fail the test,
if it doesn't, proceed with the test.
3) Get a new IP with dhclient [interface]
4) Try to stablish a new, ssh based session to the guest and see if that
works.

> +session.close()
> +
> +# Re-log into guest and check if session is responsive
> +logging.info("Re-log into the guest")
> +session = kvm_test_utils.wait_for_login(vm,
> +  timeout=int(params.get("login_timeout", 360)))
> +if not session.is_responsive():
> +raise error.TestFail("The new session is not responsive.")

^ Is it possible that right after you stablish the session it becomes
non-responsive? It seems like a redundant verification step to me.

> +session.close()
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 5515601..7716d48 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -394,6 +394,10 @@ variants:
>  restart_vm = yes
>  pxe_timeout = 60
>  
> +- mac_change: install setup unattended_install.cdrom
> +type = mac_change
> +kill_vm = yes
> +
>  - physical_resources_check: install setup unattende

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Alex Williamson
On Thu, 2010-07-29 at 00:57 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> > 
> > Something like GET_MSIX_VECTORS seems like a user library routine to me.
> > The PCI config space is well specified and if we try to do more than
> > shortcut trivial operations (like getting the BAR length), we risk
> > losing functionality.  And for my purposes, translating to and from a
> > made up API to PCI for the guest seems like a pain.
> 
> Won't a userspace library do just as well for you?

You mean aside from qemu's reluctance to add dependencies for more
libraries?  My only concern is that I want enough virtualized/raw config
space that I'm not always translating PCI config accesses from the guest
into some userspace API.  If it makes sense to do this for things like
MSI, since I need someone to figure out what resources can actually be
allocated on the host, then maybe the library makes sense for that.
Then again, if every user needs to do this, let the vfio kernel driver
check what it can get and virtualize the available MSIs in exposed
config space, and my driver would be even happier.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/4] Paravirt-spinlock implementation for KVM guests (Version 0)

2010-07-28 Thread Konrad Rzeszutek Wilk
On Mon, Jul 26, 2010 at 11:41:50AM +0530, Srivatsa Vaddagiri wrote:
> This patch-series implements paravirt-spinlock implementation for KVM guests,
> based heavily on Xen's implementation. I tried to refactor Xen's spinlock
> implementation to make it common for both Xen and KVM - but found that 
> few differences between Xen and KVM (Xen has the ability to block on a
> particular event/irq for example) _and_ the fact that the guest kernel 
> can be compiled to support both Xen and KVM hypervisors (CONFIG_XEN and
> CONFIG_KVM_GUEST can both be turned on) makes the "common" code a eye-sore.
> There will have to be:
> 
>   if (xen) {
>   ...
>   } else if (kvm) {
>   ..
>   }
> 
> or possibly:
> 
>   alternative(NOP, some_xen_specific_call, )

Why not utilize the pvops path?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 10/14] KVM-test: Add a subtest of pxe

2010-07-28 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:36 +0800, Amos Kong wrote:
> This case just snoop tftp packet through tcpdump, it depends on public dhcp
> server, better to test it through dnsmasq.

It would be a good idea to have an alternate implementation using
dnsmasq, but not urgent.

> Signed-off-by: Jason Wang 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/tests/pxe.py b/client/tests/kvm/tests/pxe.py
> new file mode 100644
> index 000..8859aaa
> --- /dev/null
> +++ b/client/tests/kvm/tests/pxe.py
> @@ -0,0 +1,30 @@
> +import logging
> +from autotest_lib.client.common_lib import error
> +import kvm_subprocess, kvm_test_utils, kvm_utils
> +
> +
> +def run_pxe(test, params, env):
> +"""
> +PXE test:
> +
> +1) Snoop the tftp packet in the tap device
> +2) Wait for some seconds
> +3) Check whether capture tftp packets
> +
> +@param test: kvm test object
> +@param params: Dictionary with the test parameters
> +@param env: Dictionary with test environment.
> +"""
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +timeout = int(params.get("pxe_timeout", 60))
> +
> +logging.info("Try to boot from pxe")
> +status, output = kvm_subprocess.run_fg("tcpdump -nli %s" % 
> vm.get_ifname(),
> +   logging.debug,
> +   "(pxe) ",
> +   timeout)

^ The only complaint I could make here is that since this command
doesn't need to live throughout tests, utils.run would do just fine.
Other than that, looks fine to me.

> +logging.info("Analysing the tcpdump result...")

^ typo, analyzing 

> +if not "tftp" in output:
> +raise error.TestFail("Couldn't find tftp packet in %s seconds" % 
> timeout)
> +logging.info("Found tftp packet")
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 9594a38..5515601 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -381,6 +381,19 @@ variants:
>  mgroup_count = 20
>  flood_minutes = 1
>  
> +- pxe:
> +type = pxe
> +images = pxe
> +image_name_pxe = pxe-test
> +image_size_pxe = 1G
> +force_create_image_pxe = yes
> +remove_image_pxe = yes
> +extra_params += ' -boot n'
> +kill_vm_on_error = yes
> +network = bridge
> +restart_vm = yes
> +pxe_timeout = 60
> +
>  - physical_resources_check: install setup unattended_install.cdrom
>  type = physical_resources_check
>  catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 03:57:02PM -0600, Alex Williamson wrote:
> 
> Something like GET_MSIX_VECTORS seems like a user library routine to me.
> The PCI config space is well specified and if we try to do more than
> shortcut trivial operations (like getting the BAR length), we risk
> losing functionality.  And for my purposes, translating to and from a
> made up API to PCI for the guest seems like a pain.

Won't a userspace library do just as well for you?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Alex Williamson
On Wed, 2010-07-28 at 14:14 -0700, Tom Lyon wrote:
> On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> > On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > > 
> > > I think the fundamental issue to resolve is to decide on the model which
> > > the VFIO driver presents to its users.
> > > 
> > > Fundamentally, VFIO as part of the OS must protect the system from its
> > > users and also protect the users from each other.  No disagreement here.
> > > 
> > > But another fundamental purpose of an OS to to present an abstract model
> > > of the underlying hardware to its users, so that the users don't have to
> > > deal with the full complexity of the hardware.
> > > 
> > > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > > users whereas Michael has argued for a simpler model of presenting the
> > > real PCI device config registers while preventing writes only to the
> > > registers which would clearly disrupt the system.
> > 
> > In fact, there is no contradiction. I am all for an abstracted
> > API *and* I think the virtualization concept is a bad way
> > to build this API.
> > 
> > The 'virtual' interface you present is very complex and hardware specific:
> > you do not hide literally *anything*. Deciding which functionality
> > userspace needs, and exposing it to userspace as a set of APIs would be
> > abstract. Instead you ask people to go read the PCI spec, the device spec,
> > and bang on PCI registers, little-endian-ness and all, then try to
> > interpret what do the virtual values mean.
> 
> Exactly! The PCI bus is far better *specified*, *documented*, and widely 
> implemented than a Linux driver could ever hope to be.  And there are lots of 
> current Linux drivers which bang around in pci config space simply because 
> the 
> authors were not aware of some api call buried deep in linux which would do 
> the work for them - or - got tired of using OS-specific APIs when porting a 
> driver and decided to just ask the hardware.
> 
> > Example:
> > 
> > How do I find # of MSI-X vectors? Sure, scan the capability list,
> > find the capability, read the value, convert from little endian
> > at each step.
> > A page or two of code, and let's hope I have a ppc to test on.
> > And note no driver has this code - they all use OS routines.
> > 
> > So why wouldn't
> > ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> > better serve the declared goal of presenting an abstracted PCI device to
> > users?
> 
> By and large, the user drivers just know how many because the hardware is 
> constant.

Something like GET_MSIX_VECTORS seems like a user library routine to me.
The PCI config space is well specified and if we try to do more than
shortcut trivial operations (like getting the BAR length), we risk
losing functionality.  And for my purposes, translating to and from a
made up API to PCI for the guest seems like a pain.

> And inventing 20 or 30 ioctls to do a bunch of random stuff is gross when you 
> can instead use normal read and write calls to a well defined structure.

Yep, this sounds like a job for libvfio.

> > > Now, the virtual model *could* look little like the real hardware, and
> > > use bunches of ioctls for everything it needs,
> > 
> > Or reads/writes at special offsets, or sysfs attributes.
> > 
> > > or it could look a lot like PCI and
> > > use reads and writes of the virtual PCI config registers to trigger its
> > > actions.  The latter makes things more amenable to those porting drivers
> > > from other environments.
> > 
> > I really doubt this helps at all. Drivers typically use OS-specific
> > APIs. It is very uncommon for them to touch standard registers,
> > which is 100% of what your patch seem to be dealing with.
> > 
> > And again, how about a small userspace library that would wrap vfio and
> > add the abstractions for drivers that do need them?
> 
> Yes, there will be userspace libraries - I already have a vfio backend for 
> libpci.
> > 
> > > I realize that to date the VFIO driver has been a  bit of a mish-mash
> > > between the ioctl and config based techniques; I intend to clean that
> > > up.  And, yes, the abstract model presented by VFIO will need plenty of
> > > documentation.
> > 
> > And, it will need to be maintained forever, bugs and all.
> > For example, if you change some register you emulated
> > to fix a bug, to the driver this looks like a hardware change,
> > and it will crash.
> 
> The changes will come only to allow for a more-perfect emulation, so I doubt 
> that  will cause driver problems.  No different than discovering and fixing 
> bugs in the ioctls needed in you scenario.
> 
> > 
> > The PCI spec has some weak versioning support, but it
> > is mostly not a problem in that space: a specific driver needs to
> > only deal with a specific device.  We have a generic driver so PCI
> > configuration space is a bad interface to use.
> 
> PCI h

Re: [Qemu-devel] [RFC PATCH 09/14] KVM-test: Add a subtest of multicast

2010-07-28 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:36 +0800, Amos Kong wrote:
> Use 'ping' to test send/recive multicat packets. Flood ping test is also 
> added.
> Limit guest network as 'bridge' mode, because multicast packets could not be
> transmitted to guest when using 'user' network.
> Add join_mcast.py for joining machine into multicast groups.
> 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/scripts/join_mcast.py 
> b/client/tests/kvm/scripts/join_mcast.py
> new file mode 100755
> index 000..0d90e5c
> --- /dev/null
> +++ b/client/tests/kvm/scripts/join_mcast.py
> @@ -0,0 +1,29 @@
> +import socket, struct, os, signal, sys
> +# this script is used to join machine into multicast groups
> +# author: Amos Kong 
> +
> +if len(sys.argv) < 4:
> +print """%s [mgroup_count] [prefix] [suffix]
> +mgroup_count: count of multicast addresses
> +prefix: multicast address prefix
> +suffix: multicast address suffix""" % sys.argv[0]
> +sys.exit()
> +
> +mgroup_count = int(sys.argv[1])
> +prefix = sys.argv[2]
> +suffix = int(sys.argv[3])
> +
> +s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> +for i in range(mgroup_count):
> +mcast = prefix + "." + str(suffix + i)
> +try:
> +mreq = struct.pack("4sl", socket.inet_aton(mcast), socket.INADDR_ANY)
> +s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
> +except:
> +s.close()
> +print "Could not join multicast: %s" % mcast
> +raise
> +
> +print "join_mcast_pid:%s" % os.getpid()
> +os.kill(os.getpid(), signal.SIGSTOP)
> +s.close()
> diff --git a/client/tests/kvm/tests/multicast.py 
> b/client/tests/kvm/tests/multicast.py
> new file mode 100644
> index 000..6b0e106
> --- /dev/null
> +++ b/client/tests/kvm/tests/multicast.py
> @@ -0,0 +1,67 @@
> +import logging, commands, os, re
> +from autotest_lib.client.common_lib import error
> +import kvm_test_utils, kvm_net_utils
> +
> +
> +def run_multicast(test, params, env):
> +"""
> +Test multicast function of nic (rtl8139/e1000/virtio)
> +
> +1) Create a VM
> +2) Join guest into multicast groups
> +3) Ping multicast addresses on host
> +4) Flood ping test with different size of packets
> +5) Final ping test and check if lose packet
> +
> +@param test: Kvm test object
> +@param params: Dictionary with the test parameters.
> +@param env: Dictionary with test environment.
> +"""
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +session = kvm_test_utils.wait_for_login(vm,
> +  timeout=int(params.get("login_timeout", 360)))
> +
> +# stop iptable/selinux on guest/host
> +cmd = "/etc/init.d/iptables stop && echo 0 > /selinux/enforce"

^ Different linux distros handle init scripts differently, so it's
better to just flush the firewall rules with iptables -F instead of
resorting to the init script.

Also, not all distros use selinux, so it'd be better to test for the
presence of selinux on the system (for example, by checking if there's
a /selinux/enforce file before trying to write to it). Also, need to do
proper return code checking/treatment.

> +session.get_command_status(cmd)
> +commands.getoutput(cmd)
> +# make sure guest replies to broadcasts
> +session.get_command_status("echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore"
> +"_broadcasts && echo 0 > 
> /proc/sys/net/ipv4/icmp_echo_ignore_all")
> +
> +# base multicast address
> +mcast = params.get("mcast", "225.0.0.1")
> +# count of multicast addresses, less than 20
> +mgroup_count = int(params.get("mgroup_count", 5))
> +flood_minutes = float(params.get("flood_minutes", 10))
> +ifname = vm.get_ifname()
> +prefix = re.findall("\d+.\d+.\d+", mcast)[0]
> +suffix = int(re.findall("\d+", mcast)[-1])
> +# copy python script to guest for joining guest to multicast groups
> +mcast_path = os.path.join(test.bindir, "scripts/join_mcast.py")
> +vm.copy_files_to(mcast_path, "/tmp")

^ What if copy_files_to fails? Need to do proper return handling here

> +output = session.get_command_output("python /tmp/join_mcast.py %d %s %d" 
> %
> +(mgroup_count, prefix, suffix))
> +# if success to join multicast the process will be paused, and return 
> pid.
> +if not re.findall("join_mcast_pid:(\d+)", output):
> +raise error.TestFail("Can't join multicast groups,output:%s" % 
> output)
> +pid = output.split()[0]
> +
> +try:
> +for i in range(mgroup_count):
> +new_suffix = suffix + i
> +mcast = "%s.%d" % (prefix, new_suffix)
> +logging.info("Initial ping test, mcast: %s", mcast)
> +s, o = kvm_net_utils.ping(mcast, 10, interface=ifname, 
> timeout=20)
> +if s != 0:
> +raise error.TestFail(" Ping return non-zero value %s" % o)
> +logging.info("Flo

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 02:14:21PM -0700, Tom Lyon wrote:
> On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> > On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > > 
> > > I think the fundamental issue to resolve is to decide on the model which
> > > the VFIO driver presents to its users.
> > > 
> > > Fundamentally, VFIO as part of the OS must protect the system from its
> > > users and also protect the users from each other.  No disagreement here.
> > > 
> > > But another fundamental purpose of an OS to to present an abstract model
> > > of the underlying hardware to its users, so that the users don't have to
> > > deal with the full complexity of the hardware.
> > > 
> > > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > > users whereas Michael has argued for a simpler model of presenting the
> > > real PCI device config registers while preventing writes only to the
> > > registers which would clearly disrupt the system.
> > 
> > In fact, there is no contradiction. I am all for an abstracted
> > API *and* I think the virtualization concept is a bad way
> > to build this API.
> > 
> > The 'virtual' interface you present is very complex and hardware specific:
> > you do not hide literally *anything*. Deciding which functionality
> > userspace needs, and exposing it to userspace as a set of APIs would be
> > abstract. Instead you ask people to go read the PCI spec, the device spec,
> > and bang on PCI registers, little-endian-ness and all, then try to
> > interpret what do the virtual values mean.
> 
> Exactly! The PCI bus is far better *specified*, *documented*, and widely 
> implemented than a Linux driver could ever hope to be.

Yes but it does not map all that well to what you need to do.
We need a sane backward compatibility plan, cross-platform support,
error reporting, atomicity ... PCI config has support for none of this.
So you implement a "kind of" PCI config, where accesses might fail
or not go through to device, where there are some atomicity guarantees
but not others ...
And there won't even be a header file to look at to say "aha,
this driver has this functionality".
How does an application know whether you support capability X?
Reading the driver source seems to be shaping up the only way.

>  And there are lots of 
> current Linux drivers which bang around in pci config space simply because 
> the 
> authors were not aware of some api call buried deep in linux which would do 
> the work for them - or - got tired of using OS-specific APIs when porting a 
> driver and decided to just ask the hardware.

Really? Example? drivers either use proper APIs or are broken in some way.
You can not even size the BARs without using the OS API.
So what's safe to do directly? Maybe reading out device/vendor/revision ID ...
looks like small change to me.

> 
> > Example:
> > 
> > How do I find # of MSI-X vectors? Sure, scan the capability list,
> > find the capability, read the value, convert from little endian
> > at each step.
> > A page or two of code, and let's hope I have a ppc to test on.
> > And note no driver has this code - they all use OS routines.
> > 
> > So why wouldn't
> > ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> > better serve the declared goal of presenting an abstracted PCI device to
> > users?
> 
> By and large, the user drivers just know how many because the hardware is 
> constant.

But you might not have CPU resources to allocate all vectors.
And, same will apply to any register you spend code virtualizing.

> And inventing 20 or 30 ioctls to do a bunch of random stuff is gross


If you dislike ioctls, use read/write at a defined offset,
or sysfs. Just don't pretend you can say "look at PCI spec"
and avoid the need to document your interface this way.

> when you 
> can instead use normal read and write calls to a well defined structure.

It is not all that well defined.
What if hardware supports MSIX but host controller does not?
Do you return error from write enabling MSIX?
Virtualize it and pretend there is no capability?
PCI has no provision for this, and deciding what to do
here is policy which kernel should not dictate.


> > 
> > > Now, the virtual model *could* look little like the real hardware, and
> > > use bunches of ioctls for everything it needs,
> > 
> > Or reads/writes at special offsets, or sysfs attributes.
> > 
> > > or it could look a lot like PCI and
> > > use reads and writes of the virtual PCI config registers to trigger its
> > > actions.  The latter makes things more amenable to those porting drivers
> > > from other environments.
> > 
> > I really doubt this helps at all. Drivers typically use OS-specific
> > APIs. It is very uncommon for them to touch standard registers,
> > which is 100% of what your patch seem to be dealing with.
> > 
> > And again, how about a small userspace library that would wrap vfio and
> > add the abstrac

Re: [RFC PATCH 08/14] KVM-test: Add a subtest of nic promisc

2010-07-28 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:35 +0800, Amos Kong wrote:
> This test mainly covers TCP sent from host to guest and from guest to host
> with repeatedly turn on/off NIC promiscuous mode.
> 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/tests/nic_promisc.py 
> b/client/tests/kvm/tests/nic_promisc.py
> new file mode 100644
> index 000..9a0c979
> --- /dev/null
> +++ b/client/tests/kvm/tests/nic_promisc.py
> @@ -0,0 +1,87 @@
> +import logging, commands
> +from autotest_lib.client.common_lib import error
> +import kvm_utils, kvm_test_utils, kvm_net_utils
> +
> +def run_nic_promisc(test, params, env):
> +"""
> +Test nic driver in promisc mode:
> +
> +1) Boot up a guest
> +2) Repeatedly enable/disable promiscuous mode in guest
> +3) TCP data transmission from host to guest, and from guest to host,
> +   with 1/1460/65000/1 bytes payloads
> +4) Clean temporary files
> +5) Stop enable/disable promiscuous mode change
> +
> +@param test: kvm test object
> +@param params: Dictionary with the test parameters
> +@param env: Dictionary with test environment
> +"""
> +timeout = int(params.get("login_timeout", 360))
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
> +logging.info("Trying to log into guest '%s' by serial", vm.name)
> +session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
> +  timeout, 0, step=2)
> +if not session2:
> +raise error.TestFail("Could not log into guest '%s'" % vm.name)
> +
> +def compare(filename):
> +cmd = "md5sum %s" % filename
> +s1, ret_host = commands.getstatusoutput(cmd)
> +s2, ret_guest = session.get_command_status_output(cmd)
> +if s1 != 0 or s2 != 0:
> +logging.debug("ret_host:%s, ret_guest:%s" % (ret_host, 
> ret_guest))
> +logging.error("Could not get md5, cmd:%s" % cmd)
> +return False
> +if ret_host.strip() != ret_guest.strip():
> +logging.debug("ret_host :%s, ret_guest:%s" % (ret_host, 
> ret_guest))
> +logging.error("Files' md5sum mismatch" % (receiver))
> +return False

^ The above debug messages will be confusing when looked by someone who
is not familiar with the test code, so we should make their lives
easier:

def compare(filename):
cmd = "md5sum %s" % filename
rc_host, md5_host = commands.getstatusoutput(cmd)
rc_guest, md5_guest = session.get_command_status_output(cmd)
if rc_host:
logging.debug('Could not get MD5 hash for file %s on host, output: 
%s', filename, md5_host)
return False
if rc_guest:
logging.debug('Could not get MD5 hash for file %s on guest, output: 
%s', filename, md5_guest)
return False
md5host = md5host.strip()
md5guest = md5guest.strip()
if md5host != md5guest:
logging.error('MD5 hash mismatch between file %s present on guest 
and on host', filename)
logging.error('MD5 hash for file on guest: %s, MD5 hash for file on 
host: %s', md5_host, md5_guest)
return False
return True


> +return True
> +
> +ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
> +set_promisc_cmd = "ip link set %s promisc on; sleep 0.01;" % ethname
> +set_promisc_cmd += "ip link set %s promisc off; sleep 0.01" % ethname

^ You could do the above on a single attribution, see comment on patch 7
of the patchseries.

> +logging.info("Set promisc change repeatedly in guest")
> +session2.sendline("while true; do %s; done" % set_promisc_cmd)
> +
> +dd_cmd = "dd if=/dev/urandom of=%s bs=%d count=1"
> +filename = "/tmp/nic_promisc_file"
> +file_size = params.get("file_size", "1, 1460, 65000, 
> 1").split(",")
> +try:
> +for size in file_size:
> +logging.info("Create %s bytes file on host" % size)
> +s, o = commands.getstatusoutput(dd_cmd % (filename, int(size)))
> +if s != 0:
> +logging.debug("Output: %s"% o)
> +raise error.TestFail("Create file on host failed")
> +
> +logging.info("Transfer file from host to guest")
> +if not vm.copy_files_to(filename, filename):
> +raise error.TestFail("File transfer failed")
> +if not compare(filename):
> +raise error.TestFail("Compare file failed")

^ It'd be better if we don't abruptly fail the whole test if we get a
failure for a single size, what about having a global failure counter,
and increment it if we have failures, making sure we log errors
appropriately?

> +logging.info("Create %s bytes file on guest" % size)
> +if session.get_command_status(dd_cmd % (filename, int(size))

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Tom Lyon
On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
> On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
> > [ Sorry for the long hiatus, I've been wrapped up in other issues.]
> > 
> > I think the fundamental issue to resolve is to decide on the model which
> > the VFIO driver presents to its users.
> > 
> > Fundamentally, VFIO as part of the OS must protect the system from its
> > users and also protect the users from each other.  No disagreement here.
> > 
> > But another fundamental purpose of an OS to to present an abstract model
> > of the underlying hardware to its users, so that the users don't have to
> > deal with the full complexity of the hardware.
> > 
> > So I think VFIO should present a 'virtual', abstracted PCI device to its
> > users whereas Michael has argued for a simpler model of presenting the
> > real PCI device config registers while preventing writes only to the
> > registers which would clearly disrupt the system.
> 
> In fact, there is no contradiction. I am all for an abstracted
> API *and* I think the virtualization concept is a bad way
> to build this API.
> 
> The 'virtual' interface you present is very complex and hardware specific:
> you do not hide literally *anything*. Deciding which functionality
> userspace needs, and exposing it to userspace as a set of APIs would be
> abstract. Instead you ask people to go read the PCI spec, the device spec,
> and bang on PCI registers, little-endian-ness and all, then try to
> interpret what do the virtual values mean.

Exactly! The PCI bus is far better *specified*, *documented*, and widely 
implemented than a Linux driver could ever hope to be.  And there are lots of 
current Linux drivers which bang around in pci config space simply because the 
authors were not aware of some api call buried deep in linux which would do 
the work for them - or - got tired of using OS-specific APIs when porting a 
driver and decided to just ask the hardware.

> Example:
> 
> How do I find # of MSI-X vectors? Sure, scan the capability list,
> find the capability, read the value, convert from little endian
> at each step.
> A page or two of code, and let's hope I have a ppc to test on.
> And note no driver has this code - they all use OS routines.
> 
> So why wouldn't
>   ioctl(dev, VFIO_GET_MSIX_VECTORS, &n);
> better serve the declared goal of presenting an abstracted PCI device to
> users?

By and large, the user drivers just know how many because the hardware is 
constant.

And inventing 20 or 30 ioctls to do a bunch of random stuff is gross when you 
can instead use normal read and write calls to a well defined structure.
> 
> > Now, the virtual model *could* look little like the real hardware, and
> > use bunches of ioctls for everything it needs,
> 
> Or reads/writes at special offsets, or sysfs attributes.
> 
> > or it could look a lot like PCI and
> > use reads and writes of the virtual PCI config registers to trigger its
> > actions.  The latter makes things more amenable to those porting drivers
> > from other environments.
> 
> I really doubt this helps at all. Drivers typically use OS-specific
> APIs. It is very uncommon for them to touch standard registers,
> which is 100% of what your patch seem to be dealing with.
> 
> And again, how about a small userspace library that would wrap vfio and
> add the abstractions for drivers that do need them?

Yes, there will be userspace libraries - I already have a vfio backend for 
libpci.
> 
> > I realize that to date the VFIO driver has been a  bit of a mish-mash
> > between the ioctl and config based techniques; I intend to clean that
> > up.  And, yes, the abstract model presented by VFIO will need plenty of
> > documentation.
> 
> And, it will need to be maintained forever, bugs and all.
> For example, if you change some register you emulated
> to fix a bug, to the driver this looks like a hardware change,
> and it will crash.

The changes will come only to allow for a more-perfect emulation, so I doubt 
that  will cause driver problems.  No different than discovering and fixing 
bugs in the ioctls needed in you scenario.

> 
> The PCI spec has some weak versioning support, but it
> is mostly not a problem in that space: a specific driver needs to
> only deal with a specific device.  We have a generic driver so PCI
> configuration space is a bad interface to use.

PCI has great versioning. Damn near every change made in 16+ years has been 
upwards compatible.  BIOS and OS writers don't have trouble with generic PCI, 
why should vfio?

> 
> > Since KVM/qemu already has its own notion of a virtual PCI device which
> > it presents to the guest OS, we either need to reconcile VFIO and qemu,
> > or provide a bypass of the VFIO virtual model.  This could be direct
> > access through sysfs, or else an ioctl to VFIO.  Since I have no
> > internals knowledge of qemu, I look to others to choose.
> 
> Ah, so there will be 2 APIs, one for qemu, one for userspace drivers?

I hope not,

Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Mohammed Gamal
On Wed, Jul 28, 2010 at 11:32 PM, Marcelo Tosatti  wrote:
> On Wed, Jul 28, 2010 at 11:28:05PM +0300, Mohammed Gamal wrote:
>> On Wed, Jul 28, 2010 at 10:55 PM, Marcelo Tosatti  
>> wrote:
>> > On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
>> >> This adds a unit test for real mode emulation of the iret instruction
>> >>
>> >> Signed-off-by: Mohammed Gamal 
>> >>
>> >> ---
>> >> Changes from v1:
>> >> - Added test for 16-bit iret
>> >> - Added tests for returned eflags
>> >> ---
>> >>  kvm/test/x86/realmode.c |   79 
>> >> +++
>> >>  1 files changed, 79 insertions(+), 0 deletions(-)
>> >>
>> >> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
>> >> index bd79348..74456c3 100644
>> >> --- a/kvm/test/x86/realmode.c
>> >> +++ b/kvm/test/x86/realmode.c
>> >> @@ -865,6 +865,84 @@ void test_pusha_popa()
>> >>               print_serial("Pusha/Popa Test2: PASS\n");
>> >>  }
>> >>
>> >> +void test_iret()
>> >> +{
>> >> +     struct regs inregs = { 0 }, outregs;
>> >> +
>> >> +     MK_INSN(iret32, "pushf\n\t"
>> >> +                     "pushl %cs\n\t"
>> >> +                     "call 1f\n\t" /* a near call will push eip onto the 
>> >> stack */
>> >> +                     "jmp 2f\n\t"
>> >> +                     "1: iret\n\t"
>> >> +                     "2:\n\t"
>> >> +                  );
>> >> +
>> >> +     MK_INSN(iret16, "pushfw\n\t"
>> >> +                     "pushw %cs\n\t"
>> >> +                     "callw 1f\n\t"
>> >> +                     "jmp 2f\n\t"
>> >> +                     "1: iretw\n\t"
>> >> +                     "2:\n\t");
>> >
>> > Unless iret causes a task switch, it will not cause an exit.
>> >
>> >
>> The test covers only the real mode case, we won't have a task switch there, 
>> no?
>
> No. And there is no vmexit for iret, so it does not go through the
> emulator.
>
I'm pretty sure the iret instruction went through the emulator. I saw
the instruction opcode (0xcf) being in ftrace.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Marcelo Tosatti
On Wed, Jul 28, 2010 at 11:28:05PM +0300, Mohammed Gamal wrote:
> On Wed, Jul 28, 2010 at 10:55 PM, Marcelo Tosatti  wrote:
> > On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
> >> This adds a unit test for real mode emulation of the iret instruction
> >>
> >> Signed-off-by: Mohammed Gamal 
> >>
> >> ---
> >> Changes from v1:
> >> - Added test for 16-bit iret
> >> - Added tests for returned eflags
> >> ---
> >>  kvm/test/x86/realmode.c |   79 
> >> +++
> >>  1 files changed, 79 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
> >> index bd79348..74456c3 100644
> >> --- a/kvm/test/x86/realmode.c
> >> +++ b/kvm/test/x86/realmode.c
> >> @@ -865,6 +865,84 @@ void test_pusha_popa()
> >>               print_serial("Pusha/Popa Test2: PASS\n");
> >>  }
> >>
> >> +void test_iret()
> >> +{
> >> +     struct regs inregs = { 0 }, outregs;
> >> +
> >> +     MK_INSN(iret32, "pushf\n\t"
> >> +                     "pushl %cs\n\t"
> >> +                     "call 1f\n\t" /* a near call will push eip onto the 
> >> stack */
> >> +                     "jmp 2f\n\t"
> >> +                     "1: iret\n\t"
> >> +                     "2:\n\t"
> >> +                  );
> >> +
> >> +     MK_INSN(iret16, "pushfw\n\t"
> >> +                     "pushw %cs\n\t"
> >> +                     "callw 1f\n\t"
> >> +                     "jmp 2f\n\t"
> >> +                     "1: iretw\n\t"
> >> +                     "2:\n\t");
> >
> > Unless iret causes a task switch, it will not cause an exit.
> >
> >
> The test covers only the real mode case, we won't have a task switch there, 
> no?

No. And there is no vmexit for iret, so it does not go through the
emulator.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Mohammed Gamal
On Wed, Jul 28, 2010 at 10:55 PM, Marcelo Tosatti  wrote:
> On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
>> This adds a unit test for real mode emulation of the iret instruction
>>
>> Signed-off-by: Mohammed Gamal 
>>
>> ---
>> Changes from v1:
>> - Added test for 16-bit iret
>> - Added tests for returned eflags
>> ---
>>  kvm/test/x86/realmode.c |   79 
>> +++
>>  1 files changed, 79 insertions(+), 0 deletions(-)
>>
>> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
>> index bd79348..74456c3 100644
>> --- a/kvm/test/x86/realmode.c
>> +++ b/kvm/test/x86/realmode.c
>> @@ -865,6 +865,84 @@ void test_pusha_popa()
>>               print_serial("Pusha/Popa Test2: PASS\n");
>>  }
>>
>> +void test_iret()
>> +{
>> +     struct regs inregs = { 0 }, outregs;
>> +
>> +     MK_INSN(iret32, "pushf\n\t"
>> +                     "pushl %cs\n\t"
>> +                     "call 1f\n\t" /* a near call will push eip onto the 
>> stack */
>> +                     "jmp 2f\n\t"
>> +                     "1: iret\n\t"
>> +                     "2:\n\t"
>> +                  );
>> +
>> +     MK_INSN(iret16, "pushfw\n\t"
>> +                     "pushw %cs\n\t"
>> +                     "callw 1f\n\t"
>> +                     "jmp 2f\n\t"
>> +                     "1: iretw\n\t"
>> +                     "2:\n\t");
>
> Unless iret causes a task switch, it will not cause an exit.
>
>
The test covers only the real mode case, we won't have a task switch there, no?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Marcelo Tosatti
On Wed, Jul 28, 2010 at 12:39:01PM +0300, Mohammed Gamal wrote:
> This adds a unit test for real mode emulation of the iret instruction
> 
> Signed-off-by: Mohammed Gamal 
> 
> ---
> Changes from v1:
> - Added test for 16-bit iret
> - Added tests for returned eflags
> ---
>  kvm/test/x86/realmode.c |   79 
> +++
>  1 files changed, 79 insertions(+), 0 deletions(-)
> 
> diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
> index bd79348..74456c3 100644
> --- a/kvm/test/x86/realmode.c
> +++ b/kvm/test/x86/realmode.c
> @@ -865,6 +865,84 @@ void test_pusha_popa()
>   print_serial("Pusha/Popa Test2: PASS\n");
>  }
>  
> +void test_iret()
> +{
> + struct regs inregs = { 0 }, outregs;
> +
> + MK_INSN(iret32, "pushf\n\t"
> + "pushl %cs\n\t"
> + "call 1f\n\t" /* a near call will push eip onto the 
> stack */
> + "jmp 2f\n\t"
> + "1: iret\n\t"
> + "2:\n\t"
> +  );
> +
> + MK_INSN(iret16, "pushfw\n\t"
> + "pushw %cs\n\t"
> + "callw 1f\n\t"
> + "jmp 2f\n\t"
> + "1: iretw\n\t"
> + "2:\n\t");

Unless iret causes a task switch, it will not cause an exit.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Marcelo Tosatti
On Wed, Jul 28, 2010 at 07:53:32PM +0300, Avi Kivity wrote:
>  On 07/28/2010 07:37 PM, Marcelo Tosatti wrote:
> >On Wed, Jul 28, 2010 at 07:31:03PM +0300, Avi Kivity wrote:
> >>  On 07/28/2010 07:21 PM, Marcelo Tosatti wrote:
> >>>On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:
> Instead of blindly attempting to inject an event before each guest entry,
> check for a possible event first in vcpu->requests.  Sites that can 
> trigger
> event injection are modified to set KVM_REQ_EVENT:
> 
> - interrupt, nmi window opening
> - ppr updates
> - i8259 output changes
> - local apic irr changes
> - rflags updates
> - gif flag set
> - event set on exit
> 
> This improves non-injecting entry performance, and sets the stage for
> non-atomic injection.
> 
> Signed-off-by: Avi Kivity
> ---
>   arch/x86/kvm/i8259.c |1 +
>   arch/x86/kvm/lapic.c |   12 ++--
>   arch/x86/kvm/svm.c   |8 +++-
>   arch/x86/kvm/vmx.c   |6 ++
>   arch/x86/kvm/x86.c   |   35 ++-
>   include/linux/kvm_host.h |1 +
>   6 files changed, 51 insertions(+), 12 deletions(-)
> 
> @@ -4731,17 +4737,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   goto out;
>   }
> 
> - inject_pending_event(vcpu);
> + if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
> + inject_pending_event(vcpu);
> 
> - /* enable NMI/IRQ window open exits if needed */
> - if (vcpu->arch.nmi_pending)
> - kvm_x86_ops->enable_nmi_window(vcpu);
> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> - kvm_x86_ops->enable_irq_window(vcpu);
> + /* enable NMI/IRQ window open exits if needed */
> + if (vcpu->arch.nmi_pending)
> + kvm_x86_ops->enable_nmi_window(vcpu);
> + else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> + kvm_x86_ops->enable_irq_window(vcpu);
> >>>Problem is it might not be possible to inject the event signalled by
> >>>KVM_REQ_EVENT, say an interrupt from an irqchip, if there is an event
> >>>that needs reinjection (or an exception).
> >>That can happen event now, no?  A pending exception, interrupt comes
> >>along, injection picks up the exception but leaves the interrupt.
> >>
> >>Now the situation can be more complicated:
> >>
> >>- pending exception
> >>- injection
> >>- interrupt, sets KVM_REQ_EVENT
> >>- notices KVM_REQ_EVENT
> >>- drops KVM_REQ_EVENT, cancels exception (made pending again)
> >>- goes back
> >>- injection (injects exception again, interrupt is pending)
> >>
> >>as far as I can tell, this is all fine.
> >But you cleared KVM_REQ_EVENT. Which means you're not going to inject
> >the pending interrupt on the next entry.
> 
> Doh.  So we need to set KVM_REQ_EVENT again, after the final check
> for vcpu->requests, to make sure we redo injection again.
> 
> So we can make inject_pending_event() return true if there's more in
> the queue, and if it did, re-raise KVM_REQ_EVENT just before entry?

Yeah, that would do it.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 07/14] KVM-test: Add a subtest of load/unload nic driver

2010-07-28 Thread Lucas Meneghel Rodrigues
On Tue, 2010-07-20 at 09:35 +0800, Amos Kong wrote:
> Repeatedly load/unload nic driver, try to transfer file between guest and host
> by threads at the same time, and check the md5sum.
> 
> Signed-off-by: Amos Kong 
> ---
>  0 files changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/tests/nicdriver_unload.py 
> b/client/tests/kvm/tests/nicdriver_unload.py
> new file mode 100644
> index 000..22f9f44
> --- /dev/null
> +++ b/client/tests/kvm/tests/nicdriver_unload.py
> @@ -0,0 +1,128 @@
> +import logging, commands, threading, re, os
> +from autotest_lib.client.common_lib import error
> +import kvm_utils, kvm_test_utils, kvm_net_utils
> +
> +def run_nicdriver_unload(test, params, env):
> +"""
> +Test nic driver
> +
> +1) Boot a vm
> +2) Get the nic driver name
> +3) Repeatedly unload/load nic driver
> +4) Multi-session TCP transfer on test interface
> +5) Check the test interface should still work
> +
> +@param test: KVM test object
> +@param params: Dictionary with the test parameters
> +@param env: Dictionary with test environment.
> +"""
> +timeout = int(params.get("login_timeout", 360))
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
> +logging.info("Trying to log into guest '%s' by serial", vm.name)
> +session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
> +  timeout, 0, step=2)
> +if not session2:
> +raise error.TestFail("Could not log into guest '%s'" % vm.name)
> +
> +ethname = kvm_net_utils.get_linux_ifname(session, vm.get_macaddr(0))
> +try:
> +# FIXME: Try three waies to get nic driver name, because the
> +# modprobe.conf is dropped in latest system, and ethtool method is 
> not
> +# supported by virtio_nic.
> +
> +output = session.get_command_output("cat /etc/modprobe.conf")
> +driver = re.findall(r'%s (\w+)' % ethname,output)
> +if not driver:
> +output = session.get_command_output("ethtool -i %s" % ethname)
> +driver = re.findall(r'driver: (\w+)', output)
> +if not driver:
> +output = session.get_command_output("lspci -k")
> +driver = re.findall("Ethernet controller.*\n.*\n.*Kernel driver"
> +" in use: (\w+)", output)
> +driver = driver[0]
> +except IndexError:
> +raise error.TestError("Could not find driver name")

^ About this whole block where there's an attempt to discover what the
driver name is. The methods listed there are not reliable (depends on
ethtool being installed, lspci -k not allways will list the kernel
driver in use, /etc/modprobe.conf will not be present on more recent
distributions). Instead of these methods, why don't we have a variant
for this test on each linux distro definition block, with the
appropriate driver names? It'd be something along the lines:

- 13.64:
image_name = f13-64
cdrom_cd1 = linux/Fedora-13-x86_64-DVD.iso
md5sum = 6fbae6379cf27f36e1f2c7827ba7dc35
md5sum_1m = 68821b9de4d3b5975d6634334e7f47a6
unattended_install:
unattended_file = unattended/Fedora-13.ks
tftp = images/f13-64/tftpboot
floppy = images/f13-64/floppy.img
nicdriver_unload:
e1000:
driver = e1000c
virtio:
driver = virtio_net
rtl8139:
driver = rtl8939

I believe it's safer than having to rely on the above methods.

> +
> +logging.info("driver is %s" % driver)
> +
> +class ThreadScp(threading.Thread):
> +def run(self):
> +remote_file = '/tmp/' + self.getName()
> +file_list.append(remote_file)
> +ret = vm.copy_files_to(file_name, remote_file, 
> timeout=scp_timeout)
> +logging.debug("Copy result of %s: %s" % (remote_file, ret))

^ Here it'd be worth to have 2 debug messages,

if ret:
logging.debug("File %s was transfered successfuly", remote_file)
else:
logging.debug("Failed to transfer file %s", remote_file)

> +def compare(origin_file, receive_file):
> +cmd = "md5sum %s"
> +output1 = commands.getstatusoutput(cmd % origin_file)[1].strip()

^ If we are only interested on the output, getoutput() could be used.
But in this case we care whether md5 succeeded or not, so better to do
appropriate return code treatment, as you do below. Also, you could use
utils.hash() instead of using directly md5, and that *must* yield that
exact same result, being faster than resorting to subprocesses.

Also, remembering that I do

Re: [PATCH 6/6] test: verify that the emulator honours svm intercepts

2010-07-28 Thread Avi Kivity

 On 07/28/2010 07:25 PM, Avi Kivity wrote:


+static void corrupt_cr3_intercept_bypass(void *_test)
+{
+struct test *test = _test;
+extern volatile u32 mmio_insn;
+
+while (!__sync_bool_compare_and_swap(&test->scratch, 1, 2))
+pause();
+pause();
+pause();
+pause();
+mmio_insn = 0x90d8200f;  // mov %cr3, %rax; nop
+}
+
+static void prepare_cr3_intercept_bypass(struct test *test)
+{
+default_prepare(test);
+test->vmcb->control.intercept_cr_read |= 1<<  3;
+on_cpu_async(1, corrupt_cr3_intercept_bypass, test);
+}
+
+static void test_cr3_intercept_bypass(struct test *test)
+{
+ulong a = 0xa;
+
+test->scratch = 1;
+while (test->scratch != 2)
+barrier();
+
+asm volatile ("mmio_insn: mov %0, (%0); nop"
+  : "+a"(a) : : "memory");
+test->scratch = a;
+}


ftrace makes it quite easy to see how things go wrong:

 qemu-system-x86-10545 [004] 98291.582062: kvm_exit: reason vmrun rip 
0x4003d4


step into the guest


 qemu-system-x86-10546 [006] 98291.582064: kvm_inj_virq: irq 32


here's out evil IPI

 qemu-system-x86-10545 [004] 98291.582064: kvm_nested_vmrun: rip: 
0x004003d1 vmcb: 0x07ff8000 nrip: 0x00400330 
int_ctl: 0x event_inj: 0x npt: off

 qemu-system-x86-10546 [006] 98291.582065: kvm_inj_virq: irq 32
 qemu-system-x86-10545 [004] 98291.582065: kvm_nested_intercepts: 
cr_read: 0008 cr_write:  excp:  intercept: 0003


Note cr3 reads are intercepted


 qemu-system-x86-10546 [006] 98291.582065: kvm_entry: vcpu 1
 qemu-system-x86-10545 [004] 98291.582070: kvm_entry: vcpu 0
 qemu-system-x86-10545 [004] 98291.582072: kvm_exit: reason npf rip 
0x400368
 qemu-system-x86-10545 [004] 98291.582073: kvm_nested_vmexit: rip: 
0x00400368 reason: npf ext_inf1: 0x0006 ext_inf2: 
0x000a ext_int: 0x ext_int_err: 0x


access assigned mmio -> trap to host

 qemu-system-x86-10546 [006] 98291.582074: kvm_exit: reason npf rip 
0x4013c9
 qemu-system-x86-10545 [004] 98291.582074: kvm_page_fault: address 
a error_code 6
 qemu-system-x86-10546 [006] 98291.582074: kvm_page_fault: address 
fee000b0 error_code 6


IPI on its way out

 qemu-system-x86-10545 [004] 98291.582075: kvm_emulate_insn: 0:400368: 
0f 20 d8 (prot64)


Emulating mov %cr3, %rax

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Nested SVM fix and next_rip emulation

2010-07-28 Thread Marcelo Tosatti
On Tue, Jul 27, 2010 at 06:14:19PM +0200, Joerg Roedel wrote:
> Hi Avi, Marcelo,
> 
> here are two small patches for kvm-amd. The first patch fixes a small bug in
> nested svm efer handling. The second patch implements emulation for the
> next_rip feature.

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86 emulator: Add IRET instruction

2010-07-28 Thread Marcelo Tosatti
On Wed, Jul 28, 2010 at 12:38:40PM +0300, Mohammed Gamal wrote:
> Ths patch adds IRET instruction (opcode 0xcf).
> Currently, only IRET in real mode is emulated. Protected mode support is to 
> be added later if needed.
> 
> Signed-off-by: Mohammed Gamal 
> 
> 
> Changes from v2:
> - Delay committing cs until all pop operation are ensured to be successful
> - Corrected handling for eflags
> ---
>  arch/x86/kvm/emulate.c |   81 
> 
>  1 files changed, 81 insertions(+), 0 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Avi Kivity

 On 07/28/2010 07:37 PM, Marcelo Tosatti wrote:

On Wed, Jul 28, 2010 at 07:31:03PM +0300, Avi Kivity wrote:

  On 07/28/2010 07:21 PM, Marcelo Tosatti wrote:

On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:

Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests.  Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:

- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit

This improves non-injecting entry performance, and sets the stage for
non-atomic injection.

Signed-off-by: Avi Kivity
---
  arch/x86/kvm/i8259.c |1 +
  arch/x86/kvm/lapic.c |   12 ++--
  arch/x86/kvm/svm.c   |8 +++-
  arch/x86/kvm/vmx.c   |6 ++
  arch/x86/kvm/x86.c   |   35 ++-
  include/linux/kvm_host.h |1 +
  6 files changed, 51 insertions(+), 12 deletions(-)

@@ -4731,17 +4737,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
goto out;
}

-   inject_pending_event(vcpu);
+   if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+   inject_pending_event(vcpu);

-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);

Problem is it might not be possible to inject the event signalled by
KVM_REQ_EVENT, say an interrupt from an irqchip, if there is an event
that needs reinjection (or an exception).

That can happen event now, no?  A pending exception, interrupt comes
along, injection picks up the exception but leaves the interrupt.

Now the situation can be more complicated:

- pending exception
- injection
- interrupt, sets KVM_REQ_EVENT
- notices KVM_REQ_EVENT
- drops KVM_REQ_EVENT, cancels exception (made pending again)
- goes back
- injection (injects exception again, interrupt is pending)

as far as I can tell, this is all fine.

But you cleared KVM_REQ_EVENT. Which means you're not going to inject
the pending interrupt on the next entry.


Doh.  So we need to set KVM_REQ_EVENT again, after the final check for 
vcpu->requests, to make sure we redo injection again.


So we can make inject_pending_event() return true if there's more in the 
queue, and if it did, re-raise KVM_REQ_EVENT just before entry?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Marcelo Tosatti
On Wed, Jul 28, 2010 at 07:31:03PM +0300, Avi Kivity wrote:
>  On 07/28/2010 07:21 PM, Marcelo Tosatti wrote:
> >On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:
> >>Instead of blindly attempting to inject an event before each guest entry,
> >>check for a possible event first in vcpu->requests.  Sites that can trigger
> >>event injection are modified to set KVM_REQ_EVENT:
> >>
> >>- interrupt, nmi window opening
> >>- ppr updates
> >>- i8259 output changes
> >>- local apic irr changes
> >>- rflags updates
> >>- gif flag set
> >>- event set on exit
> >>
> >>This improves non-injecting entry performance, and sets the stage for
> >>non-atomic injection.
> >>
> >>Signed-off-by: Avi Kivity
> >>---
> >>  arch/x86/kvm/i8259.c |1 +
> >>  arch/x86/kvm/lapic.c |   12 ++--
> >>  arch/x86/kvm/svm.c   |8 +++-
> >>  arch/x86/kvm/vmx.c   |6 ++
> >>  arch/x86/kvm/x86.c   |   35 ++-
> >>  include/linux/kvm_host.h |1 +
> >>  6 files changed, 51 insertions(+), 12 deletions(-)
> >>
> >>@@ -4731,17 +4737,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>goto out;
> >>}
> >>
> >>-   inject_pending_event(vcpu);
> >>+   if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
> >>+   inject_pending_event(vcpu);
> >>
> >>-   /* enable NMI/IRQ window open exits if needed */
> >>-   if (vcpu->arch.nmi_pending)
> >>-   kvm_x86_ops->enable_nmi_window(vcpu);
> >>-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >>-   kvm_x86_ops->enable_irq_window(vcpu);
> >>+   /* enable NMI/IRQ window open exits if needed */
> >>+   if (vcpu->arch.nmi_pending)
> >>+   kvm_x86_ops->enable_nmi_window(vcpu);
> >>+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> >>+   kvm_x86_ops->enable_irq_window(vcpu);
> >Problem is it might not be possible to inject the event signalled by
> >KVM_REQ_EVENT, say an interrupt from an irqchip, if there is an event
> >that needs reinjection (or an exception).
> 
> That can happen event now, no?  A pending exception, interrupt comes
> along, injection picks up the exception but leaves the interrupt.
> 
> Now the situation can be more complicated:
> 
> - pending exception
> - injection
> - interrupt, sets KVM_REQ_EVENT
> - notices KVM_REQ_EVENT
> - drops KVM_REQ_EVENT, cancels exception (made pending again)
> - goes back
> - injection (injects exception again, interrupt is pending)
> 
> as far as I can tell, this is all fine.

But you cleared KVM_REQ_EVENT. Which means you're not going to inject
the pending interrupt on the next entry.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Avi Kivity

 On 07/28/2010 07:21 PM, Marcelo Tosatti wrote:

On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:

Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests.  Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:

- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit

This improves non-injecting entry performance, and sets the stage for
non-atomic injection.

Signed-off-by: Avi Kivity
---
  arch/x86/kvm/i8259.c |1 +
  arch/x86/kvm/lapic.c |   12 ++--
  arch/x86/kvm/svm.c   |8 +++-
  arch/x86/kvm/vmx.c   |6 ++
  arch/x86/kvm/x86.c   |   35 ++-
  include/linux/kvm_host.h |1 +
  6 files changed, 51 insertions(+), 12 deletions(-)

@@ -4731,17 +4737,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
goto out;
}

-   inject_pending_event(vcpu);
+   if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+   inject_pending_event(vcpu);

-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);

Problem is it might not be possible to inject the event signalled by
KVM_REQ_EVENT, say an interrupt from an irqchip, if there is an event
that needs reinjection (or an exception).


That can happen event now, no?  A pending exception, interrupt comes 
along, injection picks up the exception but leaves the interrupt.


Now the situation can be more complicated:

- pending exception
- injection
- interrupt, sets KVM_REQ_EVENT
- notices KVM_REQ_EVENT
- drops KVM_REQ_EVENT, cancels exception (made pending again)
- goes back
- injection (injects exception again, interrupt is pending)

as far as I can tell, this is all fine.


Perhaps moving atomic_set(&vcpu->guest_mode, 1) up to preemptible
section is safe, because kvm_vcpu_kick can only IPI stale vcpu->cpu
while preemption is enabled. In that case, it will hit

if (!atomic_read(&vcpu->guest_mode)

later.



I don't really follow.


Although the KVM_REQ_EVENT idea is nice. Can you think of a way
to fix the issue?




--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] test: verify that the emulator honours svm intercepts

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/svm.c |   37 +
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/kvm/test/x86/svm.c b/kvm/test/x86/svm.c
index 3c30118..cb26af6 100644
--- a/kvm/test/x86/svm.c
+++ b/kvm/test/x86/svm.c
@@ -3,6 +3,7 @@
 #include "processor.h"
 #include "msr.h"
 #include "vm.h"
+#include "smp.h"
 
 static void setup_svm(void)
 {
@@ -168,6 +169,39 @@ static bool check_cr3_nointercept(struct test *test)
 return null_check(test) && test->scratch == read_cr3();
 }
 
+static void corrupt_cr3_intercept_bypass(void *_test)
+{
+struct test *test = _test;
+extern volatile u32 mmio_insn;
+
+while (!__sync_bool_compare_and_swap(&test->scratch, 1, 2))
+pause();
+pause();
+pause();
+pause();
+mmio_insn = 0x90d8200f;  // mov %cr3, %rax; nop
+}
+
+static void prepare_cr3_intercept_bypass(struct test *test)
+{
+default_prepare(test);
+test->vmcb->control.intercept_cr_read |= 1 << 3;
+on_cpu_async(1, corrupt_cr3_intercept_bypass, test);
+}
+
+static void test_cr3_intercept_bypass(struct test *test)
+{
+ulong a = 0xa;
+
+test->scratch = 1;
+while (test->scratch != 2)
+barrier();
+
+asm volatile ("mmio_insn: mov %0, (%0); nop"
+  : "+a"(a) : : "memory");
+test->scratch = a;
+}
+
 static struct test tests[] = {
 { "null", default_prepare, null_test, default_finished, null_check },
 { "vmrun", default_prepare, test_vmrun, default_finished, check_vmrun },
@@ -177,6 +211,8 @@ static struct test tests[] = {
   default_finished, check_cr3_intercept },
 { "cr3 read nointercept", default_prepare, test_cr3_intercept,
   default_finished, check_cr3_nointercept },
+{ "cr3 read intercept emulate", prepare_cr3_intercept_bypass,
+  test_cr3_intercept_bypass, default_finished, check_cr3_intercept }
 };
 
 int main(int ac, char **av)
@@ -185,6 +221,7 @@ int main(int ac, char **av)
 struct vmcb *vmcb;
 
 setup_vm();
+smp_init();
 
 if (!(cpuid(0x8001).c & 4)) {
 printf("SVM not availble\n");
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] test: add intercepted and unintercepted cr3 read tests for svm

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/svm.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/kvm/test/x86/svm.c b/kvm/test/x86/svm.c
index 5bb64ff..222356d 100644
--- a/kvm/test/x86/svm.c
+++ b/kvm/test/x86/svm.c
@@ -146,12 +146,36 @@ static bool check_vmrun(struct test *test)
 return test->vmcb->control.exit_code == SVM_EXIT_VMRUN;
 }
 
+static void prepare_cr3_intercept(struct test *test)
+{
+default_prepare(test);
+test->vmcb->control.intercept_cr_read |= 1 << 3;
+}
+
+static void test_cr3_intercept(struct test *test)
+{
+asm volatile ("mov %%cr3, %0" : "=r"(test->scratch) : : "memory");
+}
+
+static bool check_cr3_intercept(struct test *test)
+{
+return test->vmcb->control.exit_code == SVM_EXIT_READ_CR3;
+}
+
+static bool check_cr3_nointercept(struct test *test)
+{
+return null_check(test) && test->scratch == read_cr3();
+}
+
 static struct test tests[] = {
 { "null", default_prepare, null_test, default_finished, null_check },
 { "vmrun", default_prepare, test_vmrun, default_finished, check_vmrun },
 { "vmrun intercept check", prepare_no_vmrun_int, null_test,
   default_finished, check_no_vmrun_int },
-
+{ "cr3 read intercept", prepare_cr3_intercept, test_cr3_intercept,
+  default_finished, check_cr3_intercept },
+{ "cr3 read nointercept", default_prepare, test_cr3_intercept,
+  default_finished, check_cr3_nointercept },
 };
 
 int main(int ac, char **av)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] svm intercept tests

2010-07-28 Thread Avi Kivity
This patchset adds three more svm tests: cr3 read intercept, cr3 read
intercept (disabled), and cr3 read intercept through the instruction
emulator.  As usual, 66.7% of the tests pass.

Avi Kivity (6):
  test: add scratch word for use by svm tests
  test: add intercepted and unintercepted cr3 read tests for svm
  test: add pause() instruction accessor
  test: add cli() and sti() instruction accessors
  test: ensure svm tests are executed with interrupts disabled by
default
  test: verify that the emulator honours svm intercepts

 kvm/test/lib/x86/processor.h |   15 +
 kvm/test/x86/svm.c   |   65 +-
 2 files changed, 79 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] test: add cli() and sti() instruction accessors

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/x86/processor.h |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kvm/test/lib/x86/processor.h b/kvm/test/lib/x86/processor.h
index 0376d04..67d7ca5 100644
--- a/kvm/test/lib/x86/processor.h
+++ b/kvm/test/lib/x86/processor.h
@@ -248,4 +248,14 @@ static inline void pause(void)
 asm volatile ("pause");
 }
 
+static inline void cli(void)
+{
+asm volatile ("cli");
+}
+
+static inline void sti(void)
+{
+asm volatile ("sti");
+}
+
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] test: ensure svm tests are executed with interrupts disabled by default

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/svm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kvm/test/x86/svm.c b/kvm/test/x86/svm.c
index 222356d..3c30118 100644
--- a/kvm/test/x86/svm.c
+++ b/kvm/test/x86/svm.c
@@ -110,6 +110,7 @@ static bool test_run(struct test *test, struct vmcb *vmcb)
 static void default_prepare(struct test *test)
 {
 vmcb_ident(test->vmcb);
+cli();
 }
 
 static bool default_finished(struct test *test)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] test: add pause() instruction accessor

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/x86/processor.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kvm/test/lib/x86/processor.h b/kvm/test/lib/x86/processor.h
index ea44a9d..0376d04 100644
--- a/kvm/test/lib/x86/processor.h
+++ b/kvm/test/lib/x86/processor.h
@@ -243,4 +243,9 @@ static inline struct cpuid cpuid(u32 function)
 return cpuid_indexed(function, 0);
 }
 
+static inline void pause(void)
+{
+asm volatile ("pause");
+}
+
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] test: add scratch word for use by svm tests

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/svm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kvm/test/x86/svm.c b/kvm/test/x86/svm.c
index af0e60c..5bb64ff 100644
--- a/kvm/test/x86/svm.c
+++ b/kvm/test/x86/svm.c
@@ -63,6 +63,7 @@ struct test {
 bool (*succeeded)(struct test *test);
 struct vmcb *vmcb;
 int exits;
+ulong scratch;
 };
 
 static void test_thunk(struct test *test)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] KVM: Check for pending events before attempting injection

2010-07-28 Thread Marcelo Tosatti
On Tue, Jul 27, 2010 at 04:19:35PM +0300, Avi Kivity wrote:
> Instead of blindly attempting to inject an event before each guest entry,
> check for a possible event first in vcpu->requests.  Sites that can trigger
> event injection are modified to set KVM_REQ_EVENT:
> 
> - interrupt, nmi window opening
> - ppr updates
> - i8259 output changes
> - local apic irr changes
> - rflags updates
> - gif flag set
> - event set on exit
> 
> This improves non-injecting entry performance, and sets the stage for
> non-atomic injection.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/i8259.c |1 +
>  arch/x86/kvm/lapic.c |   12 ++--
>  arch/x86/kvm/svm.c   |8 +++-
>  arch/x86/kvm/vmx.c   |6 ++
>  arch/x86/kvm/x86.c   |   35 ++-
>  include/linux/kvm_host.h |1 +
>  6 files changed, 51 insertions(+), 12 deletions(-)
> 

> @@ -4731,17 +4737,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   goto out;
>   }
>  
> - inject_pending_event(vcpu);
> + if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
> + inject_pending_event(vcpu);
>  
> - /* enable NMI/IRQ window open exits if needed */
> - if (vcpu->arch.nmi_pending)
> - kvm_x86_ops->enable_nmi_window(vcpu);
> - else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> - kvm_x86_ops->enable_irq_window(vcpu);
> + /* enable NMI/IRQ window open exits if needed */
> + if (vcpu->arch.nmi_pending)
> + kvm_x86_ops->enable_nmi_window(vcpu);
> + else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
> + kvm_x86_ops->enable_irq_window(vcpu);

Problem is it might not be possible to inject the event signalled by
KVM_REQ_EVENT, say an interrupt from an irqchip, if there is an event
that needs reinjection (or an exception).

Perhaps moving atomic_set(&vcpu->guest_mode, 1) up to preemptible
section is safe, because kvm_vcpu_kick can only IPI stale vcpu->cpu
while preemption is enabled. In that case, it will hit

if (!atomic_read(&vcpu->guest_mode)

later.

Although the KVM_REQ_EVENT idea is nice. Can you think of a way 
to fix the issue?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Freezing Windows 2008 x64bit guest

2010-07-28 Thread Gleb Natapov
On Wed, Jul 28, 2010 at 12:53:02AM +0300, Harri Olin wrote:
> Gleb Natapov wrote:
> >On Wed, Jul 21, 2010 at 09:25:31AM +0300, Harri Olin wrote:
> >>Gleb Natapov kirjoitti:
> >>>On Mon, Jul 19, 2010 at 10:17:02AM +0300, Harri Olin wrote:
> Gleb Natapov kirjoitti:
> >On Thu, Jul 15, 2010 at 03:19:44PM +0200, Christoph Adomeit wrote:
> >>But one Windows 2008 64 Bit Server Standard is freezing regularly.
> >>This happens sometimes 3 times a day, sometimes it takes 2 days
> >>until freeze. The Windows Machine is a clean fresh install.
> I think I have seen same problem occur on my Windows 2008 SBS SP2
> 64bit system, but a bit less often, only like once a week.
> Now I haven't seen crashes but only freezes with qemu on 100% and
> virtual system unresponsive.
> >Does sendkey from monitor works? qemu-kvm-0.11.1 is very old and this is
> >not total freeze which even harder to debug. I don't see anything
> >extraordinary in your logs. 4643 interrupt per second for 4 cpus is
> >normal if windows runs multimedia or other app that need hi-res timers.
> >Does your host swapping? Is there any chance that you can try upstream 
> >qemu-kvm?
> 
> I tried running qemu-kvm from git but it exhibited the same problem
> as 12.x that I tried before, BSODing once in a while, running kernel
> 2.6.34.1.
> 
That should be pretty stable config, although it would be nice if you
could try running in qemy-kvm.git head.

> sample BSOD failure details:
> These two with Realtec nic and qemu cpu
> 0x0019 (0x0020, 0xf88007e65970,
> 0xf88007e65990, 0x0502040f)
> 0x0019 (0x0020, 0xf88007a414c0,
> 0xf88007a414e0, 0x0502044c)
> 
> These are with e1000 and -cpu host
> 0x003b (0xc005, 0xf80001c5d842,
> 0xfa60093ddb70, 0x)
> 0x003b (0xc005, 0xf80001cb8842,
> 0xfa600c94ab70, 0x)
> 0x000a (0x0080, 0x000c,
> 0x0001, 0xf80001cadefd)
> 
Can you attach screenshots of BSODs? Have you reinstalled your guests or
are you running the same images you ran in 11.x?

> I'll see if I can analyze minidumps later.
> 
> In addition to these there have been as many reboots that have been
> only logged as 'disruptive shutdown'.
> 
> Right now I'm running the problematic guest under Xen
> 3.2.1-something from Debian to see if it works better.
> 
> -- 
> Harri.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Fix segfault in mmio subpage handling code.

2010-07-28 Thread Gleb Natapov
It is possible that subpage mmio is registered over existing memory
page. When this happens "memory" will have real memory address and not
index into io_mem array so next access to the page will generate
segfault. It is uncommon to have some part of a page to be accessed as
memory and some as mmio, but qemu shouldn't crash even when guest does
stupid things. So lets just pretend that the rest of the page is
unassigned if guest configure part of the memory page as mmio.

Signed-off-by: Gleb Natapov 
---
 exec.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 5e9a5b7..5945496 100644
--- a/exec.c
+++ b/exec.c
@@ -3363,6 +3363,8 @@ static int subpage_register (subpage_t *mmio, uint32_t 
start, uint32_t end,
mmio, start, end, idx, eidx, memory);
 #endif
 memory = (memory >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+if ((memory & ~TARGET_PAGE_MASK) == IO_MEM_RAM)
+memory = IO_MEM_UNASSIGNED;
 for (; idx <= eidx; idx++) {
 mmio->sub_io_index[idx] = memory;
 mmio->region_offset[idx] = region_offset;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] cpu_register_physical_memory() is completely broken.

2010-07-28 Thread Gleb Natapov
Or just a little bit?

Nothing prevents guest from configuring pci mmio bar to overlap system
memory region and the physical memory address will became mmio, but
when guest will change pci bar mapping the physical address location
will not become memory again, but instead it becomes unassigned. Yes,
guest can only hurt itself by doing this, but real HW works different,
so things that may work on real HW will break in qemu.

Anyway attached are two patches that fix more pressing issues: segfault and
abourt() that can be triggered by a guest.

To trigger segfaul run Linux in qemu tcg (or apply patch 2 and then kvm
can be used too) with standard config. In the guest do the following:
# setpci -s 00:03.0 0x14.L=0xc000
# dd if=/dev/zero of=/dev/mem bs=4096 count=1 seek=12


To trigger abort run Linux in qemu with kvm and do:
# setpci -s 00:03.0 0x14.L=0xc000

Gleb Natapov (2):
  Fix segfault in mmio subpage handling code.
  Remove guest triggerable abort()

 exec.c|2 ++
 kvm-all.c |   16 
 2 files changed, 6 insertions(+), 12 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Remove guest triggerable abort()

2010-07-28 Thread Gleb Natapov
This abort() condition is easily triggerable by a guest if it configures
pci bar with unaligned address that overlaps main memory.

Signed-off-by: Gleb Natapov 
---
 kvm-all.c |   16 
 1 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index fec6d05..ad46b10 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -437,18 +437,10 @@ static void kvm_set_phys_mem(target_phys_addr_t 
start_addr,
 KVMSlot *mem, old;
 int err;
 
-if (start_addr & ~TARGET_PAGE_MASK) {
-if (flags >= IO_MEM_UNASSIGNED) {
-if (!kvm_lookup_overlapping_slot(s, start_addr,
- start_addr + size)) {
-return;
-}
-fprintf(stderr, "Unaligned split of a KVM memory slot\n");
-} else {
-fprintf(stderr, "Only page-aligned memory slots supported\n");
-}
-abort();
-}
+/* kvm works in page size chunks, but the function may be called
+   with sub-page size and analigned start address. */
+size = TARGET_PAGE_ALIGN(size);
+start_addr = TARGET_PAGE_ALIGN(start_addr);
 
 /* KVM does not support read-only slots */
 phys_offset &= ~IO_MEM_ROM;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/13] Emulator group simplification

2010-07-28 Thread Marcelo Tosatti
On Mon, Jul 26, 2010 at 02:37:38PM +0300, Avi Kivity wrote:
> This patchset simplifies the x86 emulator group decoding, cleans up the 
> decoder
> tables (and adds a missing Lock marker as well).
> 
> Avi Kivity (13):
>   KVM: x86 emulator: add macros for repetitive instructions
>   KVM: x86 emulator: consolidate inc/dec reg decoding
>   KVM: x86 emulator: consolidate push/pop reg decoding
>   KVM: X86 emulator: consolidate Jcc rel8 decoding
>   KVM: x86 emulator: consolidate MOV reg, imm decoding
>   KVM: x86 emulator: consolidate CMOVcc decoding
>   KVM: x86 emulator: consolidate Jcc rel32 decoding
>   KVM: x86 emulator: Make group storage bits separate from operand bits
>   KVM: x86 emulator: add Undefined decode flag
>   KVM: x86 emulator: mix decode bits from opcode and group decode
> tables
>   KVM: x86 emulator: simplify Group 1 decoding
>   KVM: x86 emulator: Allow LOCK prefix for NEG and NOT
>   KVM: x86 emulator: unify the two Group 3 variants
> 
>  arch/x86/kvm/emulate.c |  143 
> 
>  1 files changed, 47 insertions(+), 96 deletions(-)

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: fix compile warning

2010-07-28 Thread Marcelo Tosatti
On Mon, Jul 26, 2010 at 01:47:23PM +0800, Xiao Guangrong wrote:
> 
> Stephen Rothwell wrote:
> > Hi all,
> > 
> > After merging the kvm tree, today's linux-next build (x86_64
> > allmodconfig) produced this warning:
> > 
> > arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
> > arch/x86/kvm/mmu.c:2908: warning: suggest parentheses around operand of '!' 
> > or change '&' to '&&' or '!' to '~'
> > 
> > Introduced by commit af8720393c71a8f81ade71c404798d80c68c5d73 ("KVM: MMU:
> > add missing reserved bits check in speculative path").
> 
> Hi Stephen,
> 
> Thanks for your report, this patch can fix it.
> 
> Subject: [PATCH] KVM: fix compile warning
> 
> Fix:
> arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
> arch/x86/kvm/mmu.c:2908: warning: suggest parentheses around operand of '!' 
> or change '&' to '&&' or '!' to '~'
> 
> Reported-by: Stephen Rothwell 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: remove valueless output message

2010-07-28 Thread Marcelo Tosatti
On Tue, Jul 27, 2010 at 11:21:18AM +0800, Xiao Guangrong wrote:
> After commit 53383eaad08d, the '*spte' has updated before call
> rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so
> remove this information from error message
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   11 +--
>  1 files changed, 5 insertions(+), 6 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86 emulator: fix handling for unemulated instructions

2010-07-28 Thread Marcelo Tosatti
On Sun, Jul 25, 2010 at 02:51:16PM +0300, Avi Kivity wrote:
> If an instruction is present in the decode tables but not in the execution
> switch, it will be emulated as a NOP.  An example is IRET (0xcf).
> 
> Fix by adding default: labels to the execution switches.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/emulate.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Fix GDT limit corruption on Intel

2010-07-28 Thread Marcelo Tosatti
On Mon, Jul 26, 2010 at 06:32:37PM +0300, Avi Kivity wrote:
> Intel vmx does not restore GDT.LIMIT; this allows host userspace to look at
> some host kernel bits by loading a segment register and looking whether a
> trap happened or not.
> 
> Fix that by reloading GDT on heavyweight exits.
> 
> Avi Kivity (2):
>   KVM: VMX: Fix host GDT.LIMIT corruption
>   KVM: VMX: Use host_gdt variable wherever we need the host gdt
> 
>  arch/x86/kvm/vmx.c |   19 ++-
>  1 files changed, 10 insertions(+), 9 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test: emulator: test LOCK NEG and LOCK NOT

2010-07-28 Thread Marcelo Tosatti
On Mon, Jul 26, 2010 at 03:21:54PM +0300, Avi Kivity wrote:
> Signed-off-by: Avi Kivity 
> ---
>  kvm/test/x86/emulator.c |   21 ++---
>  1 files changed, 18 insertions(+), 3 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/4] Add yield hypercall for KVM guests

2010-07-28 Thread Srivatsa Vaddagiri
On Mon, Jul 26, 2010 at 10:19:41AM -0700, Jeremy Fitzhardinge wrote:
>  On 07/25/2010 11:14 PM, Srivatsa Vaddagiri wrote:
> >Add KVM hypercall for yielding vcpu timeslice.
> 
> Can you do a directed yield?

We don't have that support yet in Linux scheduler. Also I feel it would be more
useful when the target vcpu and yielding vcpu are on the same physical cpu,
rather than when they are on separate cpus. With latter, yielding (or
donating) timeslice need not ensure that target vcpu runs immediately 
and also I suspect fairness issues needs to be tackled as well (large number of
waiters shouldn't boot a lock-holders time slice too much that it gets a 
larger share).

- vatsa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/4] Paravirt-spinlock implementation for KVM guests (Version 0)

2010-07-28 Thread Srivatsa Vaddagiri
On Mon, Jul 26, 2010 at 10:18:58AM -0700, Jeremy Fitzhardinge wrote:
> >I tried to refactor Xen's spinlock
> >implementation to make it common for both Xen and KVM - but found that
> >few differences between Xen and KVM (Xen has the ability to block on a
> >particular event/irq for example) _and_ the fact that the guest kernel
> >can be compiled to support both Xen and KVM hypervisors (CONFIG_XEN and
> >CONFIG_KVM_GUEST can both be turned on) makes the "common" code a eye-sore.
> >There will have to be:
> >
> > if (xen) {
> > ...
> > } else if (kvm) {
> > ..
> > }
> >
> >or possibly:
> >
> > alternative(NOP, some_xen_specific_call, )
> >
> >type of code in the common implementation.
> 
> No, that doesn't look like a good approach.  It suggests the
> apparently commonality isn't really there.
> 
> >For the time-being, I have made this KVM-specific only. At somepoint in 
> >future,
> >I hope this can be made common between Xen/KVM.
> 
> Did you see the patch series I posted a couple of weeks ago to
> revamp pv spinlocks?  Specifically, I dropped the notion of pv
> spinlocks in which the entire spinlock implementation is replaced,
> and added pv ticketlocks where the ticketlock algorithm is always
> used for the fastpath, but it calls out to pvop calls for the
> slowpath (a long spin, or unlocking a lock with waiters).  It
> significantly reduces the amount of hypervisor-specific code.

Hmmm interesting - I will go thr' it in detail.

> You can see the current patches in
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
>xen/pvticketlock-git

[snip]

> That's not actually the real problem.  It's *a* problem, but
> insignificant compared to the ticketlock-specific "next-in-line vcpu
> scheduler bunfight" problem - lock holder preemption is a misnomer.
> Fortunately the solutions to both are (nearly) the same.
>
> See Thomas Friebel's talk "Prevent Guests from Spinning Around
> (http://www.xen.org/files/xensummitboston08/LHP.pdf).

Yes I had seen Thomas's slides reporting huge degradation in performance with
tick spinlock.

> >b. Avoid preempting a lock-holder while its holding a (spin-) lock.
> >
> >In this scheme, guest OS can hint (set some flag in memory shared with
> >hypervisor) whenever its holding a lock and hypervisor could defer 
> > preempting
> >the guest vcpu when its holding a lock.  With this scheme, we should 
> > never
> >have a lock-acquiring vcpu spin on a preempted vcpu to release its lock. 
> > If
> >ever it spins, its because somebody *currently running* is holding the 
> > lock -
> >and hence it won't have to spin-wait too long. IOW we are pro-actively
> >trying to prevent the LHP problem from occuring in the first place. This
> >should improve job turnaround time for some workloads. [1] has some
> >results based on this approach.
> 
> This doesn't actually help the problem mentioned above, because it's
> not a problem with the lock holder getting preempted, but what
> happens once the lock has been released.

Good point. I agree that the latter problem needs more attention, given a
ticket-type implementation of spinlocks. Have you considered possible solutions
for unmodified guests, which have similar ticket-type lock implementations? 
Not sure if that's important enough to investigate solutions like gang 
scheduling ..

- vatsa

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 03/14] KVM Test: Add a common ping module for network related tests

2010-07-28 Thread Michael Goldish
On 07/28/2010 02:50 PM, Michael Goldish wrote:
> On 07/27/2010 04:01 PM, Lucas Meneghel Rodrigues wrote:
>> On Tue, 2010-07-20 at 09:35 +0800, Amos Kong wrote:
>>> The kvm_net_utils.py is a just a place that wraps common network
>>> related commands which is used to do the network-related tests.
>>> Use -1 as the packet ratio for loss analysis.
>>> Use quiet mode when doing the flood ping.
>>>
>>> Signed-off-by: Jason Wang 
>>> Signed-off-by: Amos Kong 
>>> ---
>>>  0 files changed, 0 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/client/tests/kvm/kvm_net_utils.py 
>>> b/client/tests/kvm/kvm_net_utils.py
>>> index ede4965..8a71858 100644
>>> --- a/client/tests/kvm/kvm_net_utils.py
>>> +++ b/client/tests/kvm/kvm_net_utils.py
>>> @@ -1,4 +1,114 @@
>>> -import re
>>> +import logging, re, signal
>>> +from autotest_lib.client.common_lib import error
>>> +import kvm_subprocess, kvm_utils
>>> +
>>> +def get_loss_ratio(output):
>>> +"""
>>> +Get the packet loss ratio from the output of ping
>>> +
>>> +@param output
>>> +"""
>>> +try:
>>> +return int(re.findall('(\d+)% packet loss', output)[0])
>>> +except IndexError:
>>> +logging.debug(output)
>>> +return -1
>>
>>> +def raw_ping(command, timeout, session, output_func):
>>> +"""
>>> +Low-level ping command execution.
>>> +
>>> +@param command: ping command
>>> +@param timeout: timeout of the ping command
>>> +@param session: local executon hint or session to execute the ping 
>>> command
>>> +"""
>>> +if session == "localhost":
> 
> I have no problem with this, but wouldn't it be nicer to use None to
> indicate that the session should be local?
> 
>>> +process = kvm_subprocess.run_bg(command, output_func=output_func,
>>> +timeout=timeout)
>>
>> Do we really have to resort to kvm_subprocess here? We use subprocess on
>> special occasions, usually when the command in question is required to
>> live throughout the tests, which doesn't seem to be the case.
>> kvm_subprocess is a good library, but it is a more heavyweight
>> alternative (spawns its own shell process, for example). Maybe you are
>> using it because you can directly send input to the process at any time,
>> such as the ctrl+c later on this same code.
>>
>>> +
>>> +# Send SIGINT singal to notify the timeout of running ping process,
>>
>> ^ typo, signal
>>
>>> +# Because ping have the ability to catch the SIGINT signal so we 
>>> can
>>> +# always get the packet loss ratio even if timeout.
>>> +if process.is_alive():
>>> +kvm_utils.kill_process_tree(process.get_pid(), signal.SIGINT)
>>> +
>>> +status = process.get_status()
>>> +output = process.get_output()
>>> +
>>> +process.close()
>>> +return status, output
>>> +else:
>>> +session.sendline(command)
>>> +status, output = session.read_up_to_prompt(timeout=timeout,
>>> +   print_func=output_func)
>>> +if status is False:
>>
>> ^ For None objects, it is better to explicitly test for is None. However
>> the above construct seems more natural if put as
>>
>> if not status:
>>
>> Any particular reason you tested explicitely for False?
> 
> read_up_to_prompt() returns True/False as the first member of the tuple.
> 
>>> +# Send ctrl+c (SIGINT) through ssh session
>>> +session.sendline("\003")
> 
> I think you can use send() here.  sendline() calls send() with a newline
> added to the string.
> 
>>> +status, output2 = 
>>> session.read_up_to_prompt(print_func=output_func)
>>> +output += output2
>>> +if status is False:
>>> +# We also need to use this session to query the return 
>>> value
>>> +session.sendline("\003")
> 
> Same here.
> 
>>> +
>>> +session.sendline(session.status_test_command)
>>> +s2, o2 = session.read_up_to_prompt()
>>> +if s2 is False:
>>
>> ^ See comment above
>>
>>> +status = -1
>>> +else:
>>> +try:
>>> +status = int(re.findall("\d+", o2)[0])
>>> +except:
>>> +status = -1
>>> +
>>> +return status, output
>>> +
>>> +def ping(dest = "localhost", count = None, interval = None, interface = 
>>> None,
>>> + packetsize = None, ttl = None, hint = None, adaptive = False,
>>> + broadcast = False, flood = False, timeout = 0,
>>> + output_func = logging.debug, session = "localhost"):
>>
>> ^ On function headers, we pretty much follow PEP 8 and don't use spacing
>> between argument keys, the equal sign and the default value, so this
>> header should be rewritten
>>
>> def ping(dest="localhost",...)
>>
>>> +"""
>>> +Wrapper of ping.
>>> +
>>> +@param dest: destination address
>>> +@param count: count of icmp packet
>>> +@param interval: inter

Re: [PATCH] x86 emulator: Add IRET instruction

2010-07-28 Thread Paolo Bonzini

On 07/28/2010 03:30 PM, Paolo Bonzini wrote:

On 07/28/2010 11:38 AM, Mohammed Gamal wrote:

+ unsigned long mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF
| EFLG_TF |
+ EFLG_IF | EFLG_DF | EFLG_OF | EFLG_IOPL | EFLG_NT | EFLG_RF |
+ EFLG_AC | EFLG_ID | (1 << 1); /* Last one is the reserved bit */
+ unsigned long vm86_mask = EFLG_VM | EFLG_VIF | EFLG_VIP;
...
+ if (c->op_bytes == 4)
+ ctxt->eflags = ((temp_eflags & mask) | (ctxt->eflags & vm86_mask));
+ else if (c->op_bytes == 2) {
+ ctxt->eflags &= ~0x;
+ ctxt->eflags |= temp_eflags;
+ }


I think that's still not it. You can set reserved bits for c->op_bytes
== 2, and you can clear bit 1 for both 16- and 32-bit IRET.

IOW you need something like this:

mask = ...; /* without (1 << 1); */
ctxt_mask = (1 << 1) | EFLG_VM | EFLG_VIF | EFLG_VIP;

if (c->op_bytes == 2) {
mask &= 0x;
ctxt_mask |= ~0x;
}

ctxt->eflags = (temp_eflags & mask) | (ctxt->eflags & ctxt_mask);


Sorry, I replied to v3 of the patch while reviewing v2.

Looks good indeed.

Thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v2 0/2] Use kvm64/kvm32 when running under KVM

2010-07-28 Thread Markus Armbruster
jes.soren...@redhat.com writes:

> From: Jes Sorensen 
>
> This set of patches adds default CPU types to the PC compat
> definitions, and patch #2 sets the CPU type to kvm64/kvm32 when
> running under KVM.
>
> Long term we might want to qdev'ify the CPUs but I think it is better
> to keep it simple for 0.13.

Makes sense to me.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL net-next-2.6] vhost-net patchset for 2.6.36

2010-07-28 Thread Michael S. Tsirkin
David,
The following tree includes the current vhost-net patchset.
Please merge it for 2.6.36.
Thanks!

The following changes since commit 4cfa580e7eebb8694b875d2caff3b989ada2efac:

  r6040: Fix args to phy_mii_ioctl(). (2010-07-21 21:10:49 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next

David Stevens (1):
  vhost-net: mergeable buffers support

Michael S. Tsirkin (1):
  vhost: apply cgroup to vhost workers

Sridhar Samudrala (1):
  cgroups: Add an API to attach a task to current task's cgroup

Tejun Heo (1):
  vhost: replace vhost_workqueue with per-vhost kthread

 drivers/vhost/net.c|  293 +---
 drivers/vhost/vhost.c  |  228 -
 drivers/vhost/vhost.h  |   55 ++---
 include/linux/cgroup.h |7 +
 kernel/cgroup.c|   23 
 5 files changed, 515 insertions(+), 91 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86 emulator: Add IRET instruction

2010-07-28 Thread Paolo Bonzini

On 07/28/2010 11:38 AM, Mohammed Gamal wrote:

+   unsigned long mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | 
EFLG_TF |
+EFLG_IF | EFLG_DF | EFLG_OF | EFLG_IOPL | EFLG_NT 
| EFLG_RF |
+EFLG_AC | EFLG_ID | (1 << 1); /* Last one is the 
reserved bit */
+   unsigned long vm86_mask = EFLG_VM | EFLG_VIF | EFLG_VIP;
...
+   if (c->op_bytes == 4)
+   ctxt->eflags = ((temp_eflags & mask) | (ctxt->eflags & 
vm86_mask));
+   else if (c->op_bytes == 2) {
+   ctxt->eflags &= ~0x;
+   ctxt->eflags |= temp_eflags;
+   }


I think that's still not it.  You can set reserved bits for c->op_bytes 
== 2, and you can clear bit 1 for both 16- and 32-bit IRET.


IOW you need something like this:

mask = ...; /* without (1 << 1); */
ctxt_mask = (1 << 1) | EFLG_VM | EFLG_VIF | EFLG_VIP;

if (c->op_bytes == 2) {
mask &= 0x;
ctxt_mask |= ~0x;
}

ctxt->eflags = (temp_eflags & mask) | (ctxt->eflags & ctxt_mask);

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: KVM call agenda for July 27

2010-07-28 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 28.07.2010 13:22, schrieb Markus Armbruster:
>> Anthony Liguori  writes:
>> 
>>> On 07/27/2010 10:22 AM, Markus Armbruster wrote:
[...]
 Raw can't be probed safely, by its very nature.  For historical reasons,
 we try anyway.  I think we should stop doing that, even though that
 breaks existing use relying on the misfeature.  Announce it now, spit
 out scary warnings, kill it for good 1-2 releases later.

 If we're unwilling to do that, then I'd *strongly* prefer doing nothing
 over silently messing with the raw writes to sector 0 (so does
 Christoph, and he explained why).
>>>
>>> If we add docs/deprecated-features.txt, schedule removal for at least
>>> 1 year in the future, and put a warning in the code that prints
>>> whenever raw is probed, I think I could warm up to this.
>>>
>>> Since libvirt should be insulating users from this today, I think the
>>> fall out might not be terrible.
>> 
>> Okay, I'll prepare a patch.
>
> This kills -hda and friends for raw images. I'm not sure this is a good
> idea.

Please wait and see my patch.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Nested SVM unit tests

2010-07-28 Thread Avi Kivity

 On 07/28/2010 03:39 PM, Roedel, Joerg wrote:

On Wed, Jul 28, 2010 at 07:53:59AM -0400, Avi Kivity wrote:

The test framework supports smp, so you can have the second vcpu do
nasty stuff.  The difficult part is when the failure depends on host
state that is not visible to the guest.  Perhaps we can add test-only
hypercalls that allow the guest to manipulate this state (drop shadows,
etc.)

I just tried this out and it works great so far. Should be quite easy to
add new tests to this framework (as long as the failure cases are no
race conditions). I guess the small multiboot-kernels run with an
identity mapped page table?



Yes. vm.c has some functions to add more complicated mappings.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Use kvm32/kvm64 as default CPUs when running under KVM.

2010-07-28 Thread Jes . Sorensen
From: Jes Sorensen 

KVM has a minimum CPU requirement in order to run, so there is no
reason to default to the very basic family 6, model 2 (or model 3 for
qemu32) CPU since the additional features are going to be available on
the host CPU.

Signed-off-by: Jes Sorensen 
---
 hw/pc.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 58dea57..b17a199 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,6 +39,7 @@
 #include "msix.h"
 #include "sysbus.h"
 #include "sysemu.h"
+#include "kvm.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -866,11 +867,19 @@ void pc_cpus_init(const char *cpu_model)
 
 /* init CPUs */
 if (cpu_model == NULL) {
+if (kvm_enabled()) {
 #ifdef TARGET_X86_64
-cpu_model = "qemu64";
+cpu_model = "kvm64";
 #else
-cpu_model = "qemu32";
+cpu_model = "kvm32";
 #endif
+} else {
+#ifdef TARGET_X86_64
+cpu_model = "qemu64";
+#else
+cpu_model = "qemu32";
+#endif
+}
 }
 
 for(i = 0; i < smp_cpus; i++) {
-- 
1.7.1.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Set a default CPU type for compat PC machine defs.

2010-07-28 Thread Jes . Sorensen
From: Jes Sorensen 

This allows for changing the default CPU type on the current PC
definition without breaking legacy mode.

Signed-off-by: Jes Sorensen 
---
 hw/boards.h  |1 +
 hw/pc_piix.c |   15 +++
 vl.c |2 ++
 3 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/boards.h b/hw/boards.h
index 6f0f0d7..932bc23 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -19,6 +19,7 @@ typedef struct QEMUMachine {
 QEMUMachineInitFunc *init;
 int use_scsi;
 int max_cpus;
+const char *def_cpu_model;
 unsigned int no_serial:1,
 no_parallel:1,
 use_virtcon:1,
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 812ddfd..51742a0 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -223,6 +223,11 @@ static QEMUMachine pc_machine_v0_12 = {
 .desc = "Standard PC",
 .init = pc_init_pci,
 .max_cpus = 255,
+#ifdef TARGET_X86_64
+.def_cpu_model = "qemu64",
+#else
+.def_cpu_model = "qemu32",
+#endif
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-serial-pci",
@@ -242,6 +247,11 @@ static QEMUMachine pc_machine_v0_11 = {
 .desc = "Standard PC, qemu 0.11",
 .init = pc_init_pci,
 .max_cpus = 255,
+#ifdef TARGET_X86_64
+.def_cpu_model = "qemu64",
+#else
+.def_cpu_model = "qemu32",
+#endif
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-blk-pci",
@@ -277,6 +287,11 @@ static QEMUMachine pc_machine_v0_10 = {
 .desc = "Standard PC, qemu 0.10",
 .init = pc_init_pci,
 .max_cpus = 255,
+#ifdef TARGET_X86_64
+.def_cpu_model = "qemu64",
+#else
+.def_cpu_model = "qemu32",
+#endif
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-blk-pci",
diff --git a/vl.c b/vl.c
index ba6ee11..ca2c509 100644
--- a/vl.c
+++ b/vl.c
@@ -2868,6 +2868,8 @@ int main(int argc, char **argv, char **envp)
 }
 qemu_add_globals();
 
+if ((cpu_model == NULL) && (machine->def_cpu_model))
+cpu_model = machine->def_cpu_model;
 machine->init(ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
 
-- 
1.7.1.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] Use kvm64/kvm32 when running under KVM

2010-07-28 Thread Jes . Sorensen
From: Jes Sorensen 

This set of patches adds default CPU types to the PC compat
definitions, and patch #2 sets the CPU type to kvm64/kvm32 when
running under KVM.

Long term we might want to qdev'ify the CPUs but I think it is better
to keep it simple for 0.13.

Jes Sorensen (2):
  Set a default CPU type for compat PC machine defs.
  Use kvm32/kvm64 as default CPUs when running under KVM.

 hw/boards.h  |1 +
 hw/pc.c  |   13 +++--
 hw/pc_piix.c |   15 +++
 vl.c |2 ++
 4 files changed, 29 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Nested SVM unit tests

2010-07-28 Thread Roedel, Joerg
On Wed, Jul 28, 2010 at 07:53:59AM -0400, Avi Kivity wrote:
> The test framework supports smp, so you can have the second vcpu do 
> nasty stuff.  The difficult part is when the failure depends on host 
> state that is not visible to the guest.  Perhaps we can add test-only 
> hypercalls that allow the guest to manipulate this state (drop shadows, 
> etc.)

I just tried this out and it works great so far. Should be quite easy to
add new tests to this framework (as long as the failure cases are no
race conditions). I guess the small multiboot-kernels run with an
identity mapped page table?

Joerg

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.35-rc1 regression with pvclock and smp guests

2010-07-28 Thread Andre Przywara

Andre Przywara wrote:

Andre Przywara wrote:

Avi Kivity wrote:

  On 07/27/2010 04:48 PM, Andre Przywara wrote:

Wierd.  Maybe the clock goes crazy.

Let's see if it jumps forward alot:

 } while (unlikely(last != ret));
+
+   {
+static u64 last_report;
+if (ret > last_report + 1) {
+last_report = ret;
+printk("kvmclock: %llx\n", ret);
+}
+
+   }

 return ret;
  }

Worth updating the 'return last' to update ret and goto the new code, 
so we don't miss that path.
Did that. There is _a lot_ of output (about 350 lines per second via 
the 115k serial console), both with smp=1 and smp=2.
The majority is differing about 2,000,000 (ticks?), but a handful of 
them are in the range of 20 million. 

nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.


No difference between smp=2 and smp=1.
I also get some "BUG: recent printk recursion!" and I don't see any 
kernel boot progress beyond outputting the BogoMIPS value.

Right, printk() wants the time too.


BTW: I found two message from your earlier debug statement:
[0.00] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
[0.00] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock

Those are from kvmclock initialization, not from the older patch.

I'm completely confused, everything seems to be in order.

Let's see.  if you s/return last/return ret/ in the original, does this 
help things along?  this makes pvclock drop the computation and should 
be exactly the same as before the patch.
Yes, this works, both smp version boot. I see a short very short break 
after the line in question, but then it proceeds well.
Thanks for your help, now I got a much better insight into the issue. I 
will see if I can find something more.

Did some more investigations, some observations:
- The cmpxchg does not seem to be a problem, I didn't see the loop 
iterated more than once.
- Turning off printk-timestamps makes the bug go away. But I guess it is 
just hiding or deferring it, and it's no real workaround anyway.
- I instrumented the "if (ret < last) return last;" statement, when the 
kernel hangs I get only printks from there, although it has hit before:

--
[0.82] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[0.82] returning last instead (cnt=19001)
[0.82] returning last instead (cnt=20001)
The last line repeats forever with the same timestamp, the counter 
(counting the number of "return last;") increments about 3500 times/second.


I will see if I find something more...
Added some more instrumentation, seems like the values read from the 
pvclock is bogus *sometimes*:

 returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
This is from the first time the if-statement triggers. So I guess the 
value read is ridiculously far in the future (multiple days), so next 
calls to clocksource_read() will always return this bogus last value.
This means that the clock does not make progress (for several days) and 
thus timing loops will never come to an end. I also instrumented the 
serial driver, the last thing I saw was autoconfig_irq, where obviously 
udelay() is called.


Does that ring a bell with someone?

I will now concentrate on the pvclock readout/HV write part to see which 
of the values used here are wrong.


Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Avi Kivity

 On 07/28/2010 03:18 PM, Roedel, Joerg wrote:




The guest (L2 in this case) is doomed since it execution cannot
continue.  But L1 and L0 are fine.  The problem with L2 avoiding
intercepts is that L2 can change control registers and take over L1.

Right too. We can not ignore it. The right fix is probably a check for
the instruction intercepts right after the decoder has run and before
the emulator ran.


Should be easy - just like we have the Priv flag, add a bitfield to 
opcode_table that says which bit we need to check in the control area.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Roedel, Joerg
On Wed, Jul 28, 2010 at 07:57:36AM -0400, Avi Kivity wrote:

> If the instruction opcode is on mmio, the processor never sees the 
> opcode and thus can not intercept.  Or the processor may see one 
> instruction, which is not intercepted, but by the time the emulator 
> kicks in a different instruction takes its place, since another vcpu is 
> evilly cross-modifying the code.

Right. X-modifying code is a problem too.

> The guest (L2 in this case) is doomed since it execution cannot 
> continue.  But L1 and L0 are fine.  The problem with L2 avoiding 
> intercepts is that L2 can change control registers and take over L1.

Right too. We can not ignore it. The right fix is probably a check for
the instruction intercepts right after the decoder has run and before
the emulator ran.

Joer

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-28 Thread Tejun Heo
Hello,

On 07/28/2010 12:48 PM, Michael S. Tsirkin wrote:
> I'm unsure how flush_work operates under these conditions.  E.g. in
> workqueue.c, this seems to work by keeping a pointer to current
> workqueue in the work.  But what prevents us from destroying the
> workqueue when work might not be running?

In cmwq, work points to the gcwq it was on, which keeps track of all
the works in progress, so flushing work which is on a destroyed
workqueue should be fine, but in the original implementation, it would
end up accessing freed memory.

> Is this currently broken if you use multiple workqueues
> for the same work? If yes, I propose we do as I did,
> making flush_work get worker pointer, and only flushing
> on that worker.

The original semantics of workqueue is that flush_work() guarantees
that the work has finished executing on the workqueue it was last
queued on.  Adding @worker to flush_work() is okay, I think.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Avi Kivity

 On 07/28/2010 02:51 PM, Roedel, Joerg wrote:

On Wed, Jul 28, 2010 at 07:34:11AM -0400, Avi Kivity wrote:

   On 07/28/2010 02:25 PM, Roedel, Joerg wrote:

On Wed, Jul 28, 2010 at 06:28:06AM -0400, Avi Kivity wrote:

We have a slightly different problem, if the nested guest manages to get
an instruction to be emulated by the host (if the guest assigned it the
cirrus framebuffer, for example, so from L1's point of view it is RAM,
but from L0's point of view it is emulated), then we miss the
intercept.  L2 could take over L1 this way.

I wonder how this could happen. Shouldn't the shadow paging code take
care of this?


L1 thinks the memory is RAM, so it maps it directly and forgets about
it.  L0 knows it isn't, so it leaves it unmapped and emulates any
instruction which accesses it.  The emulator needs to check whether the
instruction is intercepted or not.

Instruction intercepts take precedence over exception intercepts. So if
the L2 executes an instruction which the L1 hypervisor wants to
intercept we get this instruction intercept on the host side and
re-inject it.
To my understanding the fault-intercept which causes the emulator to run
can only happen if the instruction causing the fault isn't intercepted
itself.


If the instruction opcode is on mmio, the processor never sees the 
opcode and thus can not intercept.  Or the processor may see one 
instruction, which is not intercepted, but by the time the emulator 
kicks in a different instruction takes its place, since another vcpu is 
evilly cross-modifying the code.



Note, I think if the instruction operand is in mmio, we're safe, since
the intercept has higher priority than memory access.  But if the
instruction itself is on mmio, or if we entered the emulator through smp
trickery, then the emulator will execute the instruction in nested guest
context.

Right. But if the guest executes code which is on mmio we are doomed
anyway because our instruction emulator does not emulate the whole x86
instruction set, right?


The guest (L2 in this case) is doomed since it execution cannot 
continue.  But L1 and L0 are fine.  The problem with L2 avoiding 
intercepts is that L2 can change control registers and take over L1.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Nested SVM unit tests

2010-07-28 Thread Avi Kivity

 On 07/28/2010 02:40 PM, Roedel, Joerg wrote:

On Wed, Jul 28, 2010 at 06:18:19AM -0400, Avi Kivity wrote:

The following patchset adds a framework for nested SVM unit testing.  As
you may guess, future fixes and enhancements to nested SVM will require
matching tests.

Cool stuff. I guess the third test failed (vmrun intercept check),
should be easy to fix. This is generally a very useful thing.



Yes.


Currently, our nested SVM implementation has a 66.67% success rate using
these tests.

Okay, thats the downside :-) Anyway, are KVM changes not realated to
nested-svm required to pass these tests too?


Yes.  These tests are run as part of autotest (well, I forgot to update 
kvm/test/x86/unittests.cfg).  If something wants to be in 'master', it 
needs to pass these tests.  So it will be a good idea to add tests for 
things that are liable to break as part of normal code churn, e.g. event 
injection.



How about fixes for race conditions? We had bugs in the past with lost
interrupts in the L2 guest which only showed up under very special
conditions that may be hard to reproduce directly. They may depend on an
L1 interrupt becoming pending for example.
(And I think we still have one bug in this area left which is not yet
  root-caused)


The test framework supports smp, so you can have the second vcpu do 
nasty stuff.  The difficult part is when the failure depends on host 
state that is not visible to the guest.  Perhaps we can add test-only 
hypercalls that allow the guest to manipulate this state (drop shadows, 
etc.)


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Roedel, Joerg
On Wed, Jul 28, 2010 at 07:34:11AM -0400, Avi Kivity wrote:
>   On 07/28/2010 02:25 PM, Roedel, Joerg wrote:
> > On Wed, Jul 28, 2010 at 06:28:06AM -0400, Avi Kivity wrote:
> >> We have a slightly different problem, if the nested guest manages to get
> >> an instruction to be emulated by the host (if the guest assigned it the
> >> cirrus framebuffer, for example, so from L1's point of view it is RAM,
> >> but from L0's point of view it is emulated), then we miss the
> >> intercept.  L2 could take over L1 this way.
> > I wonder how this could happen. Shouldn't the shadow paging code take
> > care of this?
> >
> 
> L1 thinks the memory is RAM, so it maps it directly and forgets about 
> it.  L0 knows it isn't, so it leaves it unmapped and emulates any 
> instruction which accesses it.  The emulator needs to check whether the 
> instruction is intercepted or not.

Instruction intercepts take precedence over exception intercepts. So if
the L2 executes an instruction which the L1 hypervisor wants to
intercept we get this instruction intercept on the host side and
re-inject it.
To my understanding the fault-intercept which causes the emulator to run
can only happen if the instruction causing the fault isn't intercepted
itself.

> Note, I think if the instruction operand is in mmio, we're safe, since 
> the intercept has higher priority than memory access.  But if the 
> instruction itself is on mmio, or if we entered the emulator through smp 
> trickery, then the emulator will execute the instruction in nested guest 
> context.

Right. But if the guest executes code which is on mmio we are doomed
anyway because our instruction emulator does not emulate the whole x86
instruction set, right?

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 03/14] KVM Test: Add a common ping module for network related tests

2010-07-28 Thread Michael Goldish
On 07/27/2010 04:01 PM, Lucas Meneghel Rodrigues wrote:
> On Tue, 2010-07-20 at 09:35 +0800, Amos Kong wrote:
>> The kvm_net_utils.py is a just a place that wraps common network
>> related commands which is used to do the network-related tests.
>> Use -1 as the packet ratio for loss analysis.
>> Use quiet mode when doing the flood ping.
>>
>> Signed-off-by: Jason Wang 
>> Signed-off-by: Amos Kong 
>> ---
>>  0 files changed, 0 insertions(+), 0 deletions(-)
>>
>> diff --git a/client/tests/kvm/kvm_net_utils.py 
>> b/client/tests/kvm/kvm_net_utils.py
>> index ede4965..8a71858 100644
>> --- a/client/tests/kvm/kvm_net_utils.py
>> +++ b/client/tests/kvm/kvm_net_utils.py
>> @@ -1,4 +1,114 @@
>> -import re
>> +import logging, re, signal
>> +from autotest_lib.client.common_lib import error
>> +import kvm_subprocess, kvm_utils
>> +
>> +def get_loss_ratio(output):
>> +"""
>> +Get the packet loss ratio from the output of ping
>> +
>> +@param output
>> +"""
>> +try:
>> +return int(re.findall('(\d+)% packet loss', output)[0])
>> +except IndexError:
>> +logging.debug(output)
>> +return -1
> 
>> +def raw_ping(command, timeout, session, output_func):
>> +"""
>> +Low-level ping command execution.
>> +
>> +@param command: ping command
>> +@param timeout: timeout of the ping command
>> +@param session: local executon hint or session to execute the ping 
>> command
>> +"""
>> +if session == "localhost":

I have no problem with this, but wouldn't it be nicer to use None to
indicate that the session should be local?

>> +process = kvm_subprocess.run_bg(command, output_func=output_func,
>> +timeout=timeout)
> 
> Do we really have to resort to kvm_subprocess here? We use subprocess on
> special occasions, usually when the command in question is required to
> live throughout the tests, which doesn't seem to be the case.
> kvm_subprocess is a good library, but it is a more heavyweight
> alternative (spawns its own shell process, for example). Maybe you are
> using it because you can directly send input to the process at any time,
> such as the ctrl+c later on this same code.
> 
>> +
>> +# Send SIGINT singal to notify the timeout of running ping process,
> 
> ^ typo, signal
> 
>> +# Because ping have the ability to catch the SIGINT signal so we can
>> +# always get the packet loss ratio even if timeout.
>> +if process.is_alive():
>> +kvm_utils.kill_process_tree(process.get_pid(), signal.SIGINT)
>> +
>> +status = process.get_status()
>> +output = process.get_output()
>> +
>> +process.close()
>> +return status, output
>> +else:
>> +session.sendline(command)
>> +status, output = session.read_up_to_prompt(timeout=timeout,
>> +   print_func=output_func)
>> +if status is False:
> 
> ^ For None objects, it is better to explicitly test for is None. However
> the above construct seems more natural if put as
> 
> if not status:
> 
> Any particular reason you tested explicitely for False?

read_up_to_prompt() returns True/False as the first member of the tuple.

>> +# Send ctrl+c (SIGINT) through ssh session
>> +session.sendline("\003")

I think you can use send() here.  sendline() calls send() with a newline
added to the string.

>> +status, output2 = 
>> session.read_up_to_prompt(print_func=output_func)
>> +output += output2
>> +if status is False:
>> +# We also need to use this session to query the return value
>> +session.sendline("\003")

Same here.

>> +
>> +session.sendline(session.status_test_command)
>> +s2, o2 = session.read_up_to_prompt()
>> +if s2 is False:
> 
> ^ See comment above
> 
>> +status = -1
>> +else:
>> +try:
>> +status = int(re.findall("\d+", o2)[0])
>> +except:
>> +status = -1
>> +
>> +return status, output
>> +
>> +def ping(dest = "localhost", count = None, interval = None, interface = 
>> None,
>> + packetsize = None, ttl = None, hint = None, adaptive = False,
>> + broadcast = False, flood = False, timeout = 0,
>> + output_func = logging.debug, session = "localhost"):
> 
> ^ On function headers, we pretty much follow PEP 8 and don't use spacing
> between argument keys, the equal sign and the default value, so this
> header should be rewritten
> 
> def ping(dest="localhost",...)
> 
>> +"""
>> +Wrapper of ping.
>> +
>> +@param dest: destination address
>> +@param count: count of icmp packet
>> +@param interval: interval of two icmp echo request
>> +@param interface: specified interface of the source address
>> +@param packetsize: packet size of icmp
>> +@param ttl: ip time to live
>> +@param hin

Re: [PATCH 0/8] Nested SVM unit tests

2010-07-28 Thread Roedel, Joerg
On Wed, Jul 28, 2010 at 06:18:19AM -0400, Avi Kivity wrote:
> The following patchset adds a framework for nested SVM unit testing.  As
> you may guess, future fixes and enhancements to nested SVM will require
> matching tests.

Cool stuff. I guess the third test failed (vmrun intercept check),
should be easy to fix. This is generally a very useful thing.

> Currently, our nested SVM implementation has a 66.67% success rate using
> these tests.

Okay, thats the downside :-) Anyway, are KVM changes not realated to
nested-svm required to pass these tests too?
How about fixes for race conditions? We had bugs in the past with lost
interrupts in the L2 guest which only showed up under very special
conditions that may be hard to reproduce directly. They may depend on an
L1 interrupt becoming pending for example.
(And I think we still have one bug in this area left which is not yet
 root-caused)

Joerg

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Avi Kivity

 On 07/28/2010 02:25 PM, Roedel, Joerg wrote:

On Wed, Jul 28, 2010 at 06:28:06AM -0400, Avi Kivity wrote:

We have a slightly different problem, if the nested guest manages to get
an instruction to be emulated by the host (if the guest assigned it the
cirrus framebuffer, for example, so from L1's point of view it is RAM,
but from L0's point of view it is emulated), then we miss the
intercept.  L2 could take over L1 this way.

I wonder how this could happen. Shouldn't the shadow paging code take
care of this?



L1 thinks the memory is RAM, so it maps it directly and forgets about 
it.  L0 knows it isn't, so it leaves it unmapped and emulates any 
instruction which accesses it.  The emulator needs to check whether the 
instruction is intercepted or not.


Note, I think if the instruction operand is in mmio, we're safe, since 
the intercept has higher priority than memory access.  But if the 
instruction itself is on mmio, or if we entered the emulator through smp 
trickery, then the emulator will execute the instruction in nested guest 
context.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: KVM call agenda for July 27

2010-07-28 Thread Kevin Wolf
Am 28.07.2010 13:22, schrieb Markus Armbruster:
> Anthony Liguori  writes:
> 
>> On 07/27/2010 10:22 AM, Markus Armbruster wrote:
>>> Kevin Wolf  writes:
>>>
>>>
 Am 27.07.2010 15:00, schrieb Anthony Liguori:
  
> On 07/27/2010 02:19 AM, Markus Armbruster wrote:
>
>> Anthony Liguori   writes:
>>
>>
>>  
>>> - any additional input on probed_raw?
>>>
>>>
>> Isn't it a fait accompli?  I stopped providing input when commit
>> 79368c81 appeared.
>>
>>  
> No.  79368c81 was to close the security hole (and I do consider it a
> security hole).  But as I mentioned on the list, I'm also not satisfied
> with it and that's why I proposed probed_raw.  I was hoping to get a
> little more input from those that objected to 79368c81 as to whether
> probed_raw was more agreeable.
>
 Actually I believe qraw is less agreeable. It just too much magic. You
 wouldn't expect that your raw images are turned into some other format
 that you can't mount or use with any other program any more.
  
>>> I also dislike probed_raw, for the same reasons.
>>>
>>> Raw can't be probed safely, by its very nature.  For historical reasons,
>>> we try anyway.  I think we should stop doing that, even though that
>>> breaks existing use relying on the misfeature.  Announce it now, spit
>>> out scary warnings, kill it for good 1-2 releases later.
>>>
>>> If we're unwilling to do that, then I'd *strongly* prefer doing nothing
>>> over silently messing with the raw writes to sector 0 (so does
>>> Christoph, and he explained why).
>>
>> If we add docs/deprecated-features.txt, schedule removal for at least
>> 1 year in the future, and put a warning in the code that prints
>> whenever raw is probed, I think I could warm up to this.
>>
>> Since libvirt should be insulating users from this today, I think the
>> fall out might not be terrible.
> 
> Okay, I'll prepare a patch.

This kills -hda and friends for raw images. I'm not sure this is a good
idea.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Roedel, Joerg
On Wed, Jul 28, 2010 at 06:28:06AM -0400, Avi Kivity wrote:
> We have a slightly different problem, if the nested guest manages to get 
> an instruction to be emulated by the host (if the guest assigned it the 
> cirrus framebuffer, for example, so from L1's point of view it is RAM, 
> but from L0's point of view it is emulated), then we miss the 
> intercept.  L2 could take over L1 this way.

I wonder how this could happen. Shouldn't the shadow paging code take
care of this?

Joerg

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: KVM call agenda for July 27

2010-07-28 Thread Markus Armbruster
Anthony Liguori  writes:

> On 07/27/2010 10:22 AM, Markus Armbruster wrote:
>> Kevin Wolf  writes:
>>
>>
>>> Am 27.07.2010 15:00, schrieb Anthony Liguori:
>>>  
 On 07/27/2010 02:19 AM, Markus Armbruster wrote:

> Anthony Liguori   writes:
>
>
>  
>> - any additional input on probed_raw?
>>
>>
> Isn't it a fait accompli?  I stopped providing input when commit
> 79368c81 appeared.
>
>  
 No.  79368c81 was to close the security hole (and I do consider it a
 security hole).  But as I mentioned on the list, I'm also not satisfied
 with it and that's why I proposed probed_raw.  I was hoping to get a
 little more input from those that objected to 79368c81 as to whether
 probed_raw was more agreeable.

>>> Actually I believe qraw is less agreeable. It just too much magic. You
>>> wouldn't expect that your raw images are turned into some other format
>>> that you can't mount or use with any other program any more.
>>>  
>> I also dislike probed_raw, for the same reasons.
>>
>> Raw can't be probed safely, by its very nature.  For historical reasons,
>> we try anyway.  I think we should stop doing that, even though that
>> breaks existing use relying on the misfeature.  Announce it now, spit
>> out scary warnings, kill it for good 1-2 releases later.
>>
>> If we're unwilling to do that, then I'd *strongly* prefer doing nothing
>> over silently messing with the raw writes to sector 0 (so does
>> Christoph, and he explained why).
>
> If we add docs/deprecated-features.txt, schedule removal for at least
> 1 year in the future, and put a warning in the code that prints
> whenever raw is probed, I think I could warm up to this.
>
> Since libvirt should be insulating users from this today, I think the
> fall out might not be terrible.

Okay, I'll prepare a patch.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Use kvm32/kvm64 as default CPUs when running under KVM.

2010-07-28 Thread Jes Sorensen
On 07/28/10 12:51, Avi Kivity wrote:
>  On 07/28/2010 01:05 PM, jes.soren...@redhat.com wrote:
>> From: Jes Sorensen
>>
>> KVM has a minimum CPU requirement in order to run, so there is no
>> reason to default to the very basic family 6, model 2 (or model 3 for
>> qemu32) CPU since the additional features are going to be available on
>> the host CPU.
>>
>>
>> @@ -866,11 +867,19 @@ void pc_cpus_init(const char *cpu_model)
>>
>>   /* init CPUs */
>>   if (cpu_model == NULL) {
>> +if (kvm_enabled()) {
>>   #ifdef TARGET_X86_64
>> -cpu_model = "qemu64";
>> +cpu_model = "kvm64";
>>   #else
>> -cpu_model = "qemu32";
>> +cpu_model = "kvm32";
>>   #endif
>> +} else {
>> +#ifdef TARGET_X86_64
>> +cpu_model = "qemu64";
>> +#else
>> +cpu_model = "qemu32";
>> +#endif
>> +}
>>   }
> 
> What about -M 0.12?  It needs to retain the old values.
> 

Urgh, good point. I'll come up with a fix for that. So much for a simple
patch :)

Jes

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-28 Thread Michael S. Tsirkin
On Wed, Jul 28, 2010 at 09:48:31AM +0200, Tejun Heo wrote:
> On 07/27/2010 09:19 PM, Michael S. Tsirkin wrote:
> >> Thinking a bit more about it, it kind of sucks that queueing to
> >> another worker from worker->func() breaks flush.  Maybe the right
> >> thing to do there is using atomic_t for done_seq?
> > 
> > I don't believe it will help: we might have:
> > 
> > worker1 runs work
> > work requeues itself queued index = 1
> > worker1 reads queued index = 1
> > worker2 runs work
> > work requeues itself queued index = 2
> > worker2 runs work
> > worker2 reads queued index = 2
> > worker2 writes done index = 2
> > worker1 writes done index = 1
> > 
> > As you see, done index got moved back.
> 
> Yeah, I think the flushing logic should be moved to the worker.
> Are you interested in doing it w/ your change?
> 
> Thanks.

I'm unsure how flush_work operates under these conditions.  E.g. in
workqueue.c, this seems to work by keeping a pointer to current
workqueue in the work.  But what prevents us from destroying the
workqueue when work might not be running?

Is this currently broken if you use multiple workqueues
for the same work? If yes, I propose we do as I did,
making flush_work get worker pointer, and only flushing
on that worker.

> -- 
> tejun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Use kvm32/kvm64 as default CPUs when running under KVM.

2010-07-28 Thread Avi Kivity

 On 07/28/2010 01:05 PM, jes.soren...@redhat.com wrote:

From: Jes Sorensen

KVM has a minimum CPU requirement in order to run, so there is no
reason to default to the very basic family 6, model 2 (or model 3 for
qemu32) CPU since the additional features are going to be available on
the host CPU.


@@ -866,11 +867,19 @@ void pc_cpus_init(const char *cpu_model)

  /* init CPUs */
  if (cpu_model == NULL) {
+if (kvm_enabled()) {
  #ifdef TARGET_X86_64
-cpu_model = "qemu64";
+cpu_model = "kvm64";
  #else
-cpu_model = "qemu32";
+cpu_model = "kvm32";
  #endif
+} else {
+#ifdef TARGET_X86_64
+cpu_model = "qemu64";
+#else
+cpu_model = "qemu32";
+#endif
+}
  }


What about -M 0.12?  It needs to retain the old values.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.35-rc1 regression with pvclock and smp guests

2010-07-28 Thread Avi Kivity

 On 07/28/2010 12:00 AM, Arjan Koers wrote:

On 2010-07-26 20:59, Arjan Koers wrote:


I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest
kernels hang during boot.


It appears that last is way ahead of ret twice.
The kernel boots with this debug patch that makes the clock go
backwards if the difference is big:

last = atomic64_read(&last_value);
do {
-   if (ret<  last)
-   return last;
+   if (ret<  last) {
+   if ( last - ret<  2500 )
+   return last;
+   else
+   printk("pvclock backwards: ret = %llx; last = 
%llx\n", ret, last);
+   }
last = atomic64_cmpxchg(&last_value, last, ret);
} while (unlikely(last != ret));



[0.037122] Total of 2 processors activated (11198.08 BogoMIPS).
[0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0.04] pvclock backwards: ret = 108373705fe2; last = 210aff61470a


Zcc?!

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86 emulator: Add IRET instruction

2010-07-28 Thread Avi Kivity

 On 07/28/2010 12:38 PM, Mohammed Gamal wrote:

Ths patch adds IRET instruction (opcode 0xcf).
Currently, only IRET in real mode is emulated. Protected mode support is to be 
added later if needed.


Looks good.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 02/14] KVM Test: Add a function get_interface_name() to kvm_net_utils.py

2010-07-28 Thread Michael Goldish
On 07/27/2010 05:08 AM, Lucas Meneghel Rodrigues wrote:
> On Tue, 2010-07-20 at 09:35 +0800, Amos Kong wrote:
>> The function get_interface_name is used to get the interface name of linux
>> guest through the macaddress of specified macaddress.
> 
> I wonder if it wouldn't be overkill to have separate utility libraries
> on the kvm test instead of a single kvm_utils and kvm_test_utils like
> you are proposing. Any thoughts Michael?

Yes, looks like this could be in kvm_test_utils.py, especially if
there's only a small number of functions here.

>> Signed-off-by: Jason Wang 
>> Signed-off-by: Amos Kong 
>> ---
>>  0 files changed, 0 insertions(+), 0 deletions(-)
>>
>> diff --git a/client/tests/kvm/kvm_net_utils.py 
>> b/client/tests/kvm/kvm_net_utils.py
>> new file mode 100644
>> index 000..ede4965
>> --- /dev/null
>> +++ b/client/tests/kvm/kvm_net_utils.py
>> @@ -0,0 +1,18 @@
>> +import re
>> +
>> +def get_linux_ifname(session, mac_address):
>> +"""
>> +Get the interface name through the mac address.
>> +
>> +@param session: session to the virtual machine
>> +@mac_address: the macaddress of nic
>> +"""
>> +
>> +output = session.get_command_output("ifconfig -a")
>> +
>> +try:
>> +ethname = re.findall("(\w+)\s+Link.*%s" % mac_address, output,
>> + re.IGNORECASE)[0]
>> +return ethname
>> +except:
>> +return None
>>
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Nested SVM fix and next_rip emulation

2010-07-28 Thread Avi Kivity

 On 07/27/2010 07:14 PM, Joerg Roedel wrote:

Hi Avi, Marcelo,

here are two small patches for kvm-amd. The first patch fixes a small bug in
nested svm efer handling. The second patch implements emulation for the
next_rip feature.


Please post test cases for both patches.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Avi Kivity

 On 07/28/2010 12:37 PM, Roedel, Joerg wrote:



Can it be really this simple?  Suppose we emulate a nested guest
instruction just before vmexit, doesn't that invalidate
vmcb->control.next_rip?  Can that happen?

Good point. I looked again into it. The documentation states:

The next sequential instruction pointer (nRIP) is saved in
the guest VMCB control area at location C8h on all #VMEXITs that
are due to instruction intercepts, as defined in Section 15.8 on
page 378, as well as MSR and IOIO intercepts and exceptions
caused by the INT3, INTO, and BOUND instructions. For all other
intercepts, nRIP is reset to zero.

There are a few intercepts that may need injection when running nested
immediatly after an instruction emulation on the host side:

INTR, NMI
#PF, #GP, ...

All these instructions do not provide a valid next_rip on #vmexit so we
should be save here. The other way around, copying back a next_rip
pointer when there should be none, should also not happen as far as I
see it. The next_rip is only set for instruction intercepts which are
either handled on the host side or reinjected directly into the L1
hypervisor.
When you don't see a failing case either, I think we are save with this
simple implementation.


I agree, looks like everything's fine here.

We have a slightly different problem, if the nested guest manages to get 
an instruction to be emulated by the host (if the guest assigned it the 
cirrus framebuffer, for example, so from L1's point of view it is RAM, 
but from L0's point of view it is emulated), then we miss the 
intercept.  L2 could take over L1 this way.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/8] test: add processor register access functions

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/x86/processor.h |  246 ++
 1 files changed, 246 insertions(+), 0 deletions(-)
 create mode 100644 kvm/test/lib/x86/processor.h

diff --git a/kvm/test/lib/x86/processor.h b/kvm/test/lib/x86/processor.h
new file mode 100644
index 000..ea44a9d
--- /dev/null
+++ b/kvm/test/lib/x86/processor.h
@@ -0,0 +1,246 @@
+#ifndef LIBCFLAT_PROCESSOR_H
+#define LIBCFLAT_PROCESSOR_H
+
+#include "libcflat.h"
+
+struct descriptor_table_ptr {
+u16 limit;
+ulong base;
+} __attribute__((packed));
+
+static inline void barrier(void)
+{
+asm volatile ("" : : : "memory");
+}
+
+static inline u16 read_cs(void)
+{
+unsigned val;
+
+asm ("mov %%cs, %0" : "=mr"(val));
+return val;
+}
+
+static inline u16 read_ds(void)
+{
+unsigned val;
+
+asm ("mov %%ds, %0" : "=mr"(val));
+return val;
+}
+
+static inline u16 read_es(void)
+{
+unsigned val;
+
+asm ("mov %%es, %0" : "=mr"(val));
+return val;
+}
+
+static inline u16 read_ss(void)
+{
+unsigned val;
+
+asm ("mov %%ss, %0" : "=mr"(val));
+return val;
+}
+
+static inline u16 read_fs(void)
+{
+unsigned val;
+
+asm ("mov %%fs, %0" : "=mr"(val));
+return val;
+}
+
+static inline u16 read_gs(void)
+{
+unsigned val;
+
+asm ("mov %%gs, %0" : "=mr"(val));
+return val;
+}
+
+static inline void write_ds(unsigned val)
+{
+asm ("mov %0, %%ds" : : "rm"(val) : "memory");
+}
+
+static inline void write_es(unsigned val)
+{
+asm ("mov %0, %%es" : : "rm"(val) : "memory");
+}
+
+static inline void write_ss(unsigned val)
+{
+asm ("mov %0, %%ss" : : "rm"(val) : "memory");
+}
+
+static inline void write_fs(unsigned val)
+{
+asm ("mov %0, %%fs" : : "rm"(val) : "memory");
+}
+
+static inline void write_gs(unsigned val)
+{
+asm ("mov %0, %%gs" : : "rm"(val) : "memory");
+}
+
+static inline u64 rdmsr(u32 index)
+{
+u32 a, d;
+asm volatile ("rdmsr" : "=a"(a), "=d"(d) : "c"(index) : "memory");
+return a | ((u64)d << 32);
+}
+
+static inline void wrmsr(u32 index, u64 val)
+{
+u32 a = val, d = val >> 32;
+asm volatile ("wrmsr" : : "a"(a), "d"(d), "c"(index) : "memory");
+}
+
+static inline void write_cr0(ulong val)
+{
+asm volatile ("mov %0, %%cr0" : : "r"(val) : "memory");
+}
+
+static inline ulong read_cr0(void)
+{
+ulong val;
+asm volatile ("mov %%cr0, %0" : "=r"(val) : : "memory");
+return val;
+}
+
+static inline void write_cr2(ulong val)
+{
+asm volatile ("mov %0, %%cr2" : : "r"(val) : "memory");
+}
+
+static inline ulong read_cr2(void)
+{
+ulong val;
+asm volatile ("mov %%cr2, %0" : "=r"(val) : : "memory");
+return val;
+}
+
+static inline void write_cr3(ulong val)
+{
+asm volatile ("mov %0, %%cr3" : : "r"(val) : "memory");
+}
+
+static inline ulong read_cr3(void)
+{
+ulong val;
+asm volatile ("mov %%cr3, %0" : "=r"(val) : : "memory");
+return val;
+}
+
+static inline void write_cr4(ulong val)
+{
+asm volatile ("mov %0, %%cr4" : : "r"(val) : "memory");
+}
+
+static inline ulong read_cr4(void)
+{
+ulong val;
+asm volatile ("mov %%cr4, %0" : "=r"(val) : : "memory");
+return val;
+}
+
+static inline void write_cr8(ulong val)
+{
+asm volatile ("mov %0, %%cr8" : : "r"(val) : "memory");
+}
+
+static inline ulong read_cr8(void)
+{
+ulong val;
+asm volatile ("mov %%cr8, %0" : "=r"(val) : : "memory");
+return val;
+}
+
+static inline void lgdt(const struct descriptor_table_ptr *ptr)
+{
+asm volatile ("lgdt %0" : : "m"(*ptr));
+}
+
+static inline void sgdt(struct descriptor_table_ptr *ptr)
+{
+asm volatile ("sgdt %0" : "=m"(*ptr));
+}
+
+static inline void lidt(const struct descriptor_table_ptr *ptr)
+{
+asm volatile ("lidt %0" : : "m"(*ptr));
+}
+
+static inline void sidt(struct descriptor_table_ptr *ptr)
+{
+asm volatile ("sidt %0" : "=m"(*ptr));
+}
+
+static inline void lldt(unsigned val)
+{
+asm volatile ("lldt %0" : : "rm"(val));
+}
+
+static inline u16 sldt(void)
+{
+u16 val;
+asm volatile ("sldt %0" : "=rm"(val));
+return val;
+}
+
+static inline void ltr(unsigned val)
+{
+asm volatile ("ltr %0" : : "rm"(val));
+}
+
+static inline u16 str(void)
+{
+u16 val;
+asm volatile ("str %0" : "=rm"(val));
+return val;
+}
+
+static inline void write_dr6(ulong val)
+{
+asm volatile ("mov %0, %%dr6" : : "r"(val) : "memory");
+}
+
+static inline ulong read_dr6(void)
+{
+ulong val;
+asm volatile ("mov %%dr6, %0" : "=r"(val));
+return val;
+}
+
+static inline void write_dr7(ulong val)
+{
+asm volatile ("mov %0, %%dr7" : : "r"(val) : "memory");
+}
+
+static inline ulong read_dr7(void)
+{
+ulong val;
+asm volatile ("mov %%dr7, %0" : "=r"(val));
+return val;
+}
+
+struct cpuid { u32 a, b, c, d; };
+
+static inline struct cpuid cpuid_indexed(u32 function, u32 index)
+{
+struct cpuid r;
+asm volatile ("cpuid"
+  : "=a"(r.a), "=b"(r.b), "=c"(r.c), 

[PATCH 1/8] test: move ARRAY_SIZE() to libcflat.h

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/libcflat.h |2 ++
 kvm/test/x86/vmexit.c   |2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kvm/test/lib/libcflat.h b/kvm/test/lib/libcflat.h
index 1da4013..7274fed 100644
--- a/kvm/test/lib/libcflat.h
+++ b/kvm/test/lib/libcflat.h
@@ -39,4 +39,6 @@ extern int vsnprintf(char *buf, int size, const char *fmt, 
va_list va);
 
 extern void puts(const char *s);
 
+#define ARRAY_SIZE(_a)  (sizeof(_a)/sizeof((_a)[0]))
+
 #endif
diff --git a/kvm/test/x86/vmexit.c b/kvm/test/x86/vmexit.c
index 731316b..707d5c6 100644
--- a/kvm/test/x86/vmexit.c
+++ b/kvm/test/x86/vmexit.c
@@ -167,8 +167,6 @@ static void do_test(struct test *test)
printf("%s %d\n", test->name, (int)((t2 - t1) / iterations));
 }
 
-#define ARRAY_SIZE(_x) (sizeof(_x) / sizeof((_x)[0]))
-
 static void enable_nx(void *junk)
 {
wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_NX_MASK);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/8] test: add svm tests

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/config-x86-common.mak |2 +
 kvm/test/config-x86_64.mak |1 +
 kvm/test/x86/svm.c |  180 
 3 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 kvm/test/x86/svm.c

diff --git a/kvm/test/config-x86-common.mak b/kvm/test/config-x86-common.mak
index 00817dc..19bffd4 100644
--- a/kvm/test/config-x86-common.mak
+++ b/kvm/test/config-x86-common.mak
@@ -68,6 +68,8 @@ $(TEST_DIR)/xsave.flat: $(cstart.o) $(TEST_DIR)/idt.o 
$(TEST_DIR)/xsave.o
 $(TEST_DIR)/rmap_chain.flat: $(cstart.o) $(TEST_DIR)/rmap_chain.o \
 $(TEST_DIR)/print.o $(TEST_DIR)/vm.o
 
+$(TEST_DIR)/svm.flat: $(cstart.o) $(TEST_DIR)/vm.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/kvm/test/config-x86_64.mak b/kvm/test/config-x86_64.mak
index 3ffbcc1..b99cf85 100644
--- a/kvm/test/config-x86_64.mak
+++ b/kvm/test/config-x86_64.mak
@@ -7,5 +7,6 @@ CFLAGS += -D__x86_64__
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat
+tests += $(TEST_DIR)/svm.flat
 
 include config-x86-common.mak
diff --git a/kvm/test/x86/svm.c b/kvm/test/x86/svm.c
new file mode 100644
index 000..af0e60c
--- /dev/null
+++ b/kvm/test/x86/svm.c
@@ -0,0 +1,180 @@
+#include "svm.h"
+#include "libcflat.h"
+#include "processor.h"
+#include "msr.h"
+#include "vm.h"
+
+static void setup_svm(void)
+{
+void *hsave = alloc_page();
+
+wrmsr(MSR_VM_HSAVE_PA, virt_to_phys(hsave));
+wrmsr(MSR_EFER, rdmsr(MSR_EFER) | EFER_SVME);
+}
+
+static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
+ u64 base, u32 limit, u32 attr)
+{
+seg->selector = selector;
+seg->attrib = attr;
+seg->limit = limit;
+seg->base = base;
+}
+
+static void vmcb_ident(struct vmcb *vmcb)
+{
+u64 vmcb_phys = virt_to_phys(vmcb);
+struct vmcb_save_area *save = &vmcb->save;
+struct vmcb_control_area *ctrl = &vmcb->control;
+u32 data_seg_attr = 3 | SVM_SELECTOR_S_MASK | SVM_SELECTOR_P_MASK
+| SVM_SELECTOR_DB_MASK | SVM_SELECTOR_G_MASK;
+u32 code_seg_attr = 9 | SVM_SELECTOR_S_MASK | SVM_SELECTOR_P_MASK
+| SVM_SELECTOR_L_MASK | SVM_SELECTOR_G_MASK;
+struct descriptor_table_ptr desc_table_ptr;
+
+memset(vmcb, 0, sizeof(*vmcb));
+asm volatile ("vmsave" : : "a"(vmcb_phys) : "memory");
+vmcb_set_seg(&save->es, read_es(), 0, -1U, data_seg_attr);
+vmcb_set_seg(&save->cs, read_cs(), 0, -1U, code_seg_attr);
+vmcb_set_seg(&save->ss, read_ss(), 0, -1U, data_seg_attr);
+vmcb_set_seg(&save->ds, read_ds(), 0, -1U, data_seg_attr);
+sgdt(&desc_table_ptr);
+vmcb_set_seg(&save->gdtr, 0, desc_table_ptr.base, desc_table_ptr.limit, 0);
+sidt(&desc_table_ptr);
+vmcb_set_seg(&save->idtr, 0, desc_table_ptr.base, desc_table_ptr.limit, 0);
+save->cpl = 0;
+save->efer = rdmsr(MSR_EFER);
+save->cr4 = read_cr4();
+save->cr3 = read_cr3();
+save->cr0 = read_cr0();
+save->dr7 = read_dr7();
+save->dr6 = read_dr6();
+save->cr2 = read_cr2();
+save->g_pat = rdmsr(MSR_IA32_CR_PAT);
+save->dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR);
+ctrl->intercept = (1ULL << INTERCEPT_VMRUN) | (1ULL << INTERCEPT_VMMCALL);
+}
+
+struct test {
+const char *name;
+void (*prepare)(struct test *test);
+void (*guest_func)(struct test *test);
+bool (*finished)(struct test *test);
+bool (*succeeded)(struct test *test);
+struct vmcb *vmcb;
+int exits;
+};
+
+static void test_thunk(struct test *test)
+{
+test->guest_func(test);
+asm volatile ("vmmcall" : : : "memory");
+}
+
+static bool test_run(struct test *test, struct vmcb *vmcb)
+{
+u64 vmcb_phys = virt_to_phys(vmcb);
+u64 guest_stack[1];
+bool success;
+
+test->vmcb = vmcb;
+test->prepare(test);
+vmcb->save.rip = (ulong)test_thunk;
+vmcb->save.rsp = (ulong)(guest_stack + ARRAY_SIZE(guest_stack));
+do {
+asm volatile (
+"clgi \n\t"
+"vmload \n\t"
+"push %%rbp \n\t"
+"push %1 \n\t"
+"vmrun \n\t"
+"pop %1 \n\t"
+"pop %%rbp \n\t"
+"vmsave \n\t"
+"stgi"
+: : "a"(vmcb_phys), "D"(test)
+: "rbx", "rcx", "rdx", "rsi",
+  "r8", "r9", "r10", "r11" , "r12", "r13", "r14", "r15",
+  "memory");
+++test->exits;
+} while (!test->finished(test));
+
+success = test->succeeded(test);
+
+printf("%s: %s\n", test->name, success ? "PASS" : "FAIL");
+
+return success;
+}
+
+static void default_prepare(struct test *test)
+{
+vmcb_ident(test->vmcb);
+}
+
+static bool default_finished(struct test *test)
+{
+return true; /* one vmexit */
+}
+
+static void nul

[PATCH 7/8] test: add msr definitions header

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/x86/msr.h |  406 
 1 files changed, 406 insertions(+), 0 deletions(-)
 create mode 100644 kvm/test/lib/x86/msr.h

diff --git a/kvm/test/lib/x86/msr.h b/kvm/test/lib/x86/msr.h
new file mode 100644
index 000..509a421
--- /dev/null
+++ b/kvm/test/lib/x86/msr.h
@@ -0,0 +1,406 @@
+#ifndef _ASM_X86_MSR_INDEX_H
+#define _ASM_X86_MSR_INDEX_H
+
+/* CPU model specific register (MSR) numbers */
+
+/* x86-64 specific MSRs */
+#define MSR_EFER   0xc080 /* extended feature register */
+#define MSR_STAR   0xc081 /* legacy mode SYSCALL target */
+#define MSR_LSTAR  0xc082 /* long mode SYSCALL target */
+#define MSR_CSTAR  0xc083 /* compat mode SYSCALL target */
+#define MSR_SYSCALL_MASK   0xc084 /* EFLAGS mask for syscall */
+#define MSR_FS_BASE0xc100 /* 64bit FS base */
+#define MSR_GS_BASE0xc101 /* 64bit GS base */
+#define MSR_KERNEL_GS_BASE 0xc102 /* SwapGS GS shadow */
+#define MSR_TSC_AUX0xc103 /* Auxiliary TSC */
+
+/* EFER bits: */
+#define _EFER_SCE  0  /* SYSCALL/SYSRET */
+#define _EFER_LME  8  /* Long mode enable */
+#define _EFER_LMA  10 /* Long mode active (read-only) */
+#define _EFER_NX   11 /* No execute enable */
+#define _EFER_SVME 12 /* Enable virtualization */
+#define _EFER_LMSLE13 /* Long Mode Segment Limit Enable */
+#define _EFER_FFXSR14 /* Enable Fast FXSAVE/FXRSTOR */
+
+#define EFER_SCE   (1<<_EFER_SCE)
+#define EFER_LME   (1<<_EFER_LME)
+#define EFER_LMA   (1<<_EFER_LMA)
+#define EFER_NX(1<<_EFER_NX)
+#define EFER_SVME  (1<<_EFER_SVME)
+#define EFER_LMSLE (1<<_EFER_LMSLE)
+#define EFER_FFXSR (1<<_EFER_FFXSR)
+
+/* Intel MSRs. Some also available on other CPUs */
+#define MSR_IA32_PERFCTR0  0x00c1
+#define MSR_IA32_PERFCTR1  0x00c2
+#define MSR_FSB_FREQ   0x00cd
+
+#define MSR_MTRRcap0x00fe
+#define MSR_IA32_BBL_CR_CTL0x0119
+
+#define MSR_IA32_SYSENTER_CS   0x0174
+#define MSR_IA32_SYSENTER_ESP  0x0175
+#define MSR_IA32_SYSENTER_EIP  0x0176
+
+#define MSR_IA32_MCG_CAP   0x0179
+#define MSR_IA32_MCG_STATUS0x017a
+#define MSR_IA32_MCG_CTL   0x017b
+
+#define MSR_IA32_PEBS_ENABLE   0x03f1
+#define MSR_IA32_DS_AREA   0x0600
+#define MSR_IA32_PERF_CAPABILITIES 0x0345
+
+#define MSR_MTRRfix64K_0   0x0250
+#define MSR_MTRRfix16K_8   0x0258
+#define MSR_MTRRfix16K_A   0x0259
+#define MSR_MTRRfix4K_C0x0268
+#define MSR_MTRRfix4K_C80000x0269
+#define MSR_MTRRfix4K_D0x026a
+#define MSR_MTRRfix4K_D80000x026b
+#define MSR_MTRRfix4K_E0x026c
+#define MSR_MTRRfix4K_E80000x026d
+#define MSR_MTRRfix4K_F0x026e
+#define MSR_MTRRfix4K_F80000x026f
+#define MSR_MTRRdefType0x02ff
+
+#define MSR_IA32_CR_PAT0x0277
+
+#define MSR_IA32_DEBUGCTLMSR   0x01d9
+#define MSR_IA32_LASTBRANCHFROMIP  0x01db
+#define MSR_IA32_LASTBRANCHTOIP0x01dc
+#define MSR_IA32_LASTINTFROMIP 0x01dd
+#define MSR_IA32_LASTINTTOIP   0x01de
+
+/* DEBUGCTLMSR bits (others vary by model): */
+#define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
+#define DEBUGCTLMSR_BTF(1UL <<  1) /* single-step on 
branches */
+#define DEBUGCTLMSR_TR (1UL <<  6)
+#define DEBUGCTLMSR_BTS(1UL <<  7)
+#define DEBUGCTLMSR_BTINT  (1UL <<  8)
+#define DEBUGCTLMSR_BTS_OFF_OS (1UL <<  9)
+#define DEBUGCTLMSR_BTS_OFF_USR(1UL << 10)
+#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11)
+
+#define MSR_IA32_MC0_CTL   0x0400
+#define MSR_IA32_MC0_STATUS0x0401
+#define MSR_IA32_MC0_ADDR  0x0402
+#define MSR_IA32_MC0_MISC  0x0403
+
+#define MSR_IA32_MCx_CTL(x)(MSR_IA32_MC0_CTL + 4*(x))
+#define MSR_IA32_MCx_STATUS(x) (MSR_IA32_MC0_STATUS + 4*(x))
+#define MSR_IA32_MCx_ADDR(x)   (MSR_IA32_MC0_ADDR + 4*(x))
+#define MSR_IA32_MCx_MISC(x)   (MSR_IA32_MC0_MISC + 4*(x))
+
+/* These are consecutive and not in the normal 4er MCE bank block */
+#define MSR_IA32_MC0_CTL2  0x0280
+#define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
+
+#define CMCI_EN(1ULL << 30)
+#define CMCI_THRESHOLD_MASK   

[PATCH 6/8] test: add svm definitions header

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/svm.h |  328 
 1 files changed, 328 insertions(+), 0 deletions(-)
 create mode 100644 kvm/test/x86/svm.h

diff --git a/kvm/test/x86/svm.h b/kvm/test/x86/svm.h
new file mode 100644
index 000..3fdc0d3
--- /dev/null
+++ b/kvm/test/x86/svm.h
@@ -0,0 +1,328 @@
+#ifndef __SVM_H
+#define __SVM_H
+
+#include "libcflat.h"
+
+enum {
+   INTERCEPT_INTR,
+   INTERCEPT_NMI,
+   INTERCEPT_SMI,
+   INTERCEPT_INIT,
+   INTERCEPT_VINTR,
+   INTERCEPT_SELECTIVE_CR0,
+   INTERCEPT_STORE_IDTR,
+   INTERCEPT_STORE_GDTR,
+   INTERCEPT_STORE_LDTR,
+   INTERCEPT_STORE_TR,
+   INTERCEPT_LOAD_IDTR,
+   INTERCEPT_LOAD_GDTR,
+   INTERCEPT_LOAD_LDTR,
+   INTERCEPT_LOAD_TR,
+   INTERCEPT_RDTSC,
+   INTERCEPT_RDPMC,
+   INTERCEPT_PUSHF,
+   INTERCEPT_POPF,
+   INTERCEPT_CPUID,
+   INTERCEPT_RSM,
+   INTERCEPT_IRET,
+   INTERCEPT_INTn,
+   INTERCEPT_INVD,
+   INTERCEPT_PAUSE,
+   INTERCEPT_HLT,
+   INTERCEPT_INVLPG,
+   INTERCEPT_INVLPGA,
+   INTERCEPT_IOIO_PROT,
+   INTERCEPT_MSR_PROT,
+   INTERCEPT_TASK_SWITCH,
+   INTERCEPT_FERR_FREEZE,
+   INTERCEPT_SHUTDOWN,
+   INTERCEPT_VMRUN,
+   INTERCEPT_VMMCALL,
+   INTERCEPT_VMLOAD,
+   INTERCEPT_VMSAVE,
+   INTERCEPT_STGI,
+   INTERCEPT_CLGI,
+   INTERCEPT_SKINIT,
+   INTERCEPT_RDTSCP,
+   INTERCEPT_ICEBP,
+   INTERCEPT_WBINVD,
+   INTERCEPT_MONITOR,
+   INTERCEPT_MWAIT,
+   INTERCEPT_MWAIT_COND,
+};
+
+
+struct __attribute__ ((__packed__)) vmcb_control_area {
+   u16 intercept_cr_read;
+   u16 intercept_cr_write;
+   u16 intercept_dr_read;
+   u16 intercept_dr_write;
+   u32 intercept_exceptions;
+   u64 intercept;
+   u8 reserved_1[42];
+   u16 pause_filter_count;
+   u64 iopm_base_pa;
+   u64 msrpm_base_pa;
+   u64 tsc_offset;
+   u32 asid;
+   u8 tlb_ctl;
+   u8 reserved_2[3];
+   u32 int_ctl;
+   u32 int_vector;
+   u32 int_state;
+   u8 reserved_3[4];
+   u32 exit_code;
+   u32 exit_code_hi;
+   u64 exit_info_1;
+   u64 exit_info_2;
+   u32 exit_int_info;
+   u32 exit_int_info_err;
+   u64 nested_ctl;
+   u8 reserved_4[16];
+   u32 event_inj;
+   u32 event_inj_err;
+   u64 nested_cr3;
+   u64 lbr_ctl;
+   u64 reserved_5;
+   u64 next_rip;
+   u8 reserved_6[816];
+};
+
+
+#define TLB_CONTROL_DO_NOTHING 0
+#define TLB_CONTROL_FLUSH_ALL_ASID 1
+
+#define V_TPR_MASK 0x0f
+
+#define V_IRQ_SHIFT 8
+#define V_IRQ_MASK (1 << V_IRQ_SHIFT)
+
+#define V_INTR_PRIO_SHIFT 16
+#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
+
+#define V_IGN_TPR_SHIFT 20
+#define V_IGN_TPR_MASK (1 << V_IGN_TPR_SHIFT)
+
+#define V_INTR_MASKING_SHIFT 24
+#define V_INTR_MASKING_MASK (1 << V_INTR_MASKING_SHIFT)
+
+#define SVM_INTERRUPT_SHADOW_MASK 1
+
+#define SVM_IOIO_STR_SHIFT 2
+#define SVM_IOIO_REP_SHIFT 3
+#define SVM_IOIO_SIZE_SHIFT 4
+#define SVM_IOIO_ASIZE_SHIFT 7
+
+#define SVM_IOIO_TYPE_MASK 1
+#define SVM_IOIO_STR_MASK (1 << SVM_IOIO_STR_SHIFT)
+#define SVM_IOIO_REP_MASK (1 << SVM_IOIO_REP_SHIFT)
+#define SVM_IOIO_SIZE_MASK (7 << SVM_IOIO_SIZE_SHIFT)
+#define SVM_IOIO_ASIZE_MASK (7 << SVM_IOIO_ASIZE_SHIFT)
+
+#define SVM_VM_CR_VALID_MASK   0x001fULL
+#define SVM_VM_CR_SVM_LOCK_MASK 0x0008ULL
+#define SVM_VM_CR_SVM_DIS_MASK  0x0010ULL
+
+struct __attribute__ ((__packed__)) vmcb_seg {
+   u16 selector;
+   u16 attrib;
+   u32 limit;
+   u64 base;
+};
+
+struct __attribute__ ((__packed__)) vmcb_save_area {
+   struct vmcb_seg es;
+   struct vmcb_seg cs;
+   struct vmcb_seg ss;
+   struct vmcb_seg ds;
+   struct vmcb_seg fs;
+   struct vmcb_seg gs;
+   struct vmcb_seg gdtr;
+   struct vmcb_seg ldtr;
+   struct vmcb_seg idtr;
+   struct vmcb_seg tr;
+   u8 reserved_1[43];
+   u8 cpl;
+   u8 reserved_2[4];
+   u64 efer;
+   u8 reserved_3[112];
+   u64 cr4;
+   u64 cr3;
+   u64 cr0;
+   u64 dr7;
+   u64 dr6;
+   u64 rflags;
+   u64 rip;
+   u8 reserved_4[88];
+   u64 rsp;
+   u8 reserved_5[24];
+   u64 rax;
+   u64 star;
+   u64 lstar;
+   u64 cstar;
+   u64 sfmask;
+   u64 kernel_gs_base;
+   u64 sysenter_cs;
+   u64 sysenter_esp;
+   u64 sysenter_eip;
+   u64 cr2;
+   u8 reserved_6[32];
+   u64 g_pat;
+   u64 dbgctl;
+   u64 br_from;
+   u64 br_to;
+   u64 last_excp_from;
+   u64 last_excp_to;
+};
+
+struct __attribute__ ((__packed__)) vmcb {
+   struct vmcb_control_area control;
+   struct vmcb_save_area save;
+};
+
+#define SVM_CPUID_FEATURE_SHIFT 2
+#define SVM_CPUID_FUNC 0x800a
+
+#define SVM_VM_CR_SVM_DISABLE 4
+
+#define SVM_SELECTOR_S_SHIFT 4
+#define SVM_SELECTOR_DPL_SHIFT 5
+#define SVM_SELECTOR_P_SHIFT 7

[PATCH 5/8] test: make use of new processor.h header

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/x86/apic.c |8 
 kvm/test/x86/emulator.c |   10 +-
 kvm/test/x86/vm.c   |   20 +++-
 kvm/test/x86/vm.h   |   43 ++-
 4 files changed, 14 insertions(+), 67 deletions(-)

diff --git a/kvm/test/x86/apic.c b/kvm/test/x86/apic.c
index b6718ec..48fa0f7 100644
--- a/kvm/test/x86/apic.c
+++ b/kvm/test/x86/apic.c
@@ -127,14 +127,6 @@ void test_enable_x2apic(void)
 }
 }
 
-static u16 read_cs(void)
-{
-u16 v;
-
-asm("mov %%cs, %0" : "=rm"(v));
-return v;
-}
-
 static void init_idt(void)
 {
 struct {
diff --git a/kvm/test/x86/emulator.c b/kvm/test/x86/emulator.c
index db6a134..1483e3b 100644
--- a/kvm/test/x86/emulator.c
+++ b/kvm/test/x86/emulator.c
@@ -18,7 +18,7 @@ void report(const char *name, int result)
}
 }
 
-static char str[] = "abcdefghijklmnop";
+static char st1[] = "abcdefghijklmnop";
 
 void test_stringio()
 {
@@ -27,18 +27,18 @@ void test_stringio()
 "movw %0, %%dx \n\t"
 "rep outsb \n\t"
 : : "i"((short)TESTDEV_IO_PORT),
-  "S"(str), "c"(sizeof(str) - 1));
+  "S"(st1), "c"(sizeof(st1) - 1));
asm volatile("inb %1, %0\n\t" : "=a"(r) : "i"((short)TESTDEV_IO_PORT));
-   report("outsb up", r == str[sizeof(str) - 2]); /* last char */
+   report("outsb up", r == st1[sizeof(st1) - 2]); /* last char */
 
asm volatile("std \n\t"
 "movw %0, %%dx \n\t"
 "rep outsb \n\t"
 : : "i"((short)TESTDEV_IO_PORT),
-  "S"(str + sizeof(str) - 2), "c"(sizeof(str) - 1));
+  "S"(st1 + sizeof(st1) - 2), "c"(sizeof(st1) - 1));
asm volatile("cld \n\t" : : );
asm volatile("in %1, %0\n\t" : "=a"(r) : "i"((short)TESTDEV_IO_PORT));
-   report("outsb down", r == str[0]);
+   report("outsb down", r == st1[0]);
 }
 
 void test_cmps_one(unsigned char *m1, unsigned char *m3)
diff --git a/kvm/test/x86/vm.c b/kvm/test/x86/vm.c
index c8f1553..62b3ba8 100644
--- a/kvm/test/x86/vm.c
+++ b/kvm/test/x86/vm.c
@@ -118,19 +118,13 @@ void install_page(unsigned long *cr3,
 }
 
 
-struct gdt_table_descr
-{
-unsigned short len;
-unsigned long *table;
-} __attribute__((packed));
-
 static inline void load_gdt(unsigned long *table, int nent)
 {
-struct gdt_table_descr descr;
+struct descriptor_table_ptr descr;
 
-descr.len = nent * 8 - 1;
-descr.table = table;
-asm volatile ( "lgdt %0" : : "m"(descr) );
+descr.limit = nent * 8 - 1;
+descr.base = (ulong)table;
+lgdt(&descr);
 }
 
 #define SEG_CS_32 8
@@ -158,11 +152,11 @@ static void setup_mmu(unsigned long len)
install_page(cr3, phys, (void *)phys);
phys += PAGE_SIZE;
 }
-load_cr3(virt_to_phys(cr3));
+write_cr3(virt_to_phys(cr3));
 #ifndef __x86_64__
-load_cr4(X86_CR4_PSE);
+write_cr4(X86_CR4_PSE);
 #endif
-load_cr0(X86_CR0_PG |X86_CR0_PE);
+write_cr0(X86_CR0_PG |X86_CR0_PE);
 
 printf("paging enabled\n");
 printf("cr0 = %x\n", read_cr0());
diff --git a/kvm/test/x86/vm.h b/kvm/test/x86/vm.h
index 80dab8b..a3d2676 100644
--- a/kvm/test/x86/vm.h
+++ b/kvm/test/x86/vm.h
@@ -1,6 +1,8 @@
 #ifndef VM_H
 #define VM_H
 
+#include "processor.h"
+
 #define PAGE_SIZE 4096ul
 #ifdef __x86_64__
 #define LARGE_PAGE_SIZE (512 * PAGE_SIZE)
@@ -41,45 +43,4 @@ static inline void *phys_to_virt(unsigned long phys)
 return (void *)phys;
 }
 
-
-static inline void load_cr3(unsigned long cr3)
-{
-asm ( "mov %0, %%cr3" : : "r"(cr3) );
-}
-
-static inline unsigned long read_cr3()
-{
-unsigned long cr3;
-
-asm volatile ( "mov %%cr3, %0" : "=r"(cr3) );
-return cr3;
-}
-
-static inline void load_cr0(unsigned long cr0)
-{
-asm volatile ( "mov %0, %%cr0" : : "r"(cr0) );
-}
-
-static inline unsigned long read_cr0()
-{
-unsigned long cr0;
-
-asm volatile ( "mov %%cr0, %0" : "=r"(cr0) );
-return cr0;
-}
-
-
-static inline void load_cr4(unsigned long cr4)
-{
-asm volatile ( "mov %0, %%cr4" : : "r"(cr4) );
-}
-
-static inline unsigned long read_cr4()
-{
-unsigned long cr4;
-
-asm volatile ( "mov %%cr4, %0" : "=r"(cr4) );
-return cr4;
-}
-
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/8] test: add type bool

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/libcflat.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kvm/test/lib/libcflat.h b/kvm/test/lib/libcflat.h
index 2e2a8bf..d0d3df2 100644
--- a/kvm/test/lib/libcflat.h
+++ b/kvm/test/lib/libcflat.h
@@ -28,6 +28,10 @@ typedef unsigned u32;
 typedef unsigned long ulong;
 typedef unsigned long long u64;
 typedef unsigned long size_t;
+typedef _Bool bool;
+
+#define true 1
+#define false 0
 
 extern void exit(int code);
 extern void panic(char *fmt, ...);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] test: move memset() to libcflat

2010-07-28 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 kvm/test/lib/libcflat.h |3 +++
 kvm/test/lib/string.c   |   11 +++
 kvm/test/x86/access.c   |8 
 kvm/test/x86/idt.c  |8 
 kvm/test/x86/vm.c   |   10 --
 5 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/kvm/test/lib/libcflat.h b/kvm/test/lib/libcflat.h
index 7274fed..2e2a8bf 100644
--- a/kvm/test/lib/libcflat.h
+++ b/kvm/test/lib/libcflat.h
@@ -27,6 +27,7 @@ typedef unsigned short u16;
 typedef unsigned u32;
 typedef unsigned long ulong;
 typedef unsigned long long u64;
+typedef unsigned long size_t;
 
 extern void exit(int code);
 extern void panic(char *fmt, ...);
@@ -39,6 +40,8 @@ extern int vsnprintf(char *buf, int size, const char *fmt, 
va_list va);
 
 extern void puts(const char *s);
 
+extern void *memset(void *s, int c, size_t n);
+
 #define ARRAY_SIZE(_a)  (sizeof(_a)/sizeof((_a)[0]))
 
 #endif
diff --git a/kvm/test/lib/string.c b/kvm/test/lib/string.c
index 42be946..acac3c0 100644
--- a/kvm/test/lib/string.c
+++ b/kvm/test/lib/string.c
@@ -19,3 +19,14 @@ char *strcat(char *dest, const char *src)
;
 return dest;
 }
+
+void *memset(void *s, int c, size_t n)
+{
+size_t i;
+char *a = s;
+
+for (i = 0; i < n; ++i)
+a[i] = c;
+
+return s;
+}
diff --git a/kvm/test/x86/access.c b/kvm/test/x86/access.c
index 3338fbc..7e6ffb0 100644
--- a/kvm/test/x86/access.c
+++ b/kvm/test/x86/access.c
@@ -160,14 +160,6 @@ void lidt(idt_entry_t *idt, int nentries)
 asm volatile ("lidt %0" : : "m"(dt));
 }
 
-void memset(void *a, unsigned char v, int n)
-{
-unsigned char *x = a;
-
-while (n--)
-   *x++ = v;
-}
-
 unsigned short read_cs()
 {
 unsigned short r;
diff --git a/kvm/test/x86/idt.c b/kvm/test/x86/idt.c
index 999b3f0..590839f 100644
--- a/kvm/test/x86/idt.c
+++ b/kvm/test/x86/idt.c
@@ -39,14 +39,6 @@ unsigned short read_cs()
 return r;
 }
 
-void memset(void *a, unsigned char v, int n)
-{
-unsigned char *x = a;
-
-while (n--)
-   *x++ = v;
-}
-
 void set_idt_entry(idt_entry_t *e, void *addr, int dpl)
 {
 memset(e, 0, sizeof *e);
diff --git a/kvm/test/x86/vm.c b/kvm/test/x86/vm.c
index 0596d9c..c8f1553 100644
--- a/kvm/test/x86/vm.c
+++ b/kvm/test/x86/vm.c
@@ -14,16 +14,6 @@
 static void *free = 0;
 static void *vfree_top = 0;
 
-void *memset(void *data, int c, unsigned long len)
-{
-char *s = data;
-
-while (len--)
-   *s++ = c;
-
-return data;
-}
-
 static void free_memory(void *mem, unsigned long size)
 {
 while (size >= PAGE_SIZE) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/8] Nested SVM unit tests

2010-07-28 Thread Avi Kivity
The following patchset adds a framework for nested SVM unit testing.  As
you may guess, future fixes and enhancements to nested SVM will require
matching tests.

Currently, our nested SVM implementation has a 66.67% success rate using
these tests.

Avi Kivity (8):
  test: move ARRAY_SIZE() to libcflat.h
  test: move memset() to libcflat
  test: add type bool
  test: add processor register access functions
  test: make use of new processor.h header
  test: add svm definitions header
  test: add msr definitions header
  test: add svm tests

 kvm/test/config-x86-common.mak |2 +
 kvm/test/config-x86_64.mak |1 +
 kvm/test/lib/libcflat.h|9 +
 kvm/test/lib/string.c  |   11 +
 kvm/test/lib/x86/msr.h |  406 
 kvm/test/lib/x86/processor.h   |  246 
 kvm/test/x86/access.c  |8 -
 kvm/test/x86/apic.c|8 -
 kvm/test/x86/emulator.c|   10 +-
 kvm/test/x86/idt.c |8 -
 kvm/test/x86/svm.c |  180 ++
 kvm/test/x86/svm.h |  328 
 kvm/test/x86/vm.c  |   30 +---
 kvm/test/x86/vm.h  |   43 +
 kvm/test/x86/vmexit.c  |2 -
 15 files changed, 1197 insertions(+), 95 deletions(-)
 create mode 100644 kvm/test/lib/x86/msr.h
 create mode 100644 kvm/test/lib/x86/processor.h
 create mode 100644 kvm/test/x86/svm.c
 create mode 100644 kvm/test/x86/svm.h

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Use kvm32/kvm64 as default CPUs when running under KVM.

2010-07-28 Thread Jes . Sorensen
From: Jes Sorensen 

KVM has a minimum CPU requirement in order to run, so there is no
reason to default to the very basic family 6, model 2 (or model 3 for
qemu32) CPU since the additional features are going to be available on
the host CPU.

Signed-off-by: Jes Sorensen 
---
 hw/pc.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 58dea57..b17a199 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,6 +39,7 @@
 #include "msix.h"
 #include "sysbus.h"
 #include "sysemu.h"
+#include "kvm.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -866,11 +867,19 @@ void pc_cpus_init(const char *cpu_model)
 
 /* init CPUs */
 if (cpu_model == NULL) {
+if (kvm_enabled()) {
 #ifdef TARGET_X86_64
-cpu_model = "qemu64";
+cpu_model = "kvm64";
 #else
-cpu_model = "qemu32";
+cpu_model = "kvm32";
 #endif
+} else {
+#ifdef TARGET_X86_64
+cpu_model = "qemu64";
+#else
+cpu_model = "qemu32";
+#endif
+}
 }
 
 for(i = 0; i < smp_cpus; i++) {
-- 
1.7.1.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] test: Add emulator test for iret instruction

2010-07-28 Thread Mohammed Gamal
This adds a unit test for real mode emulation of the iret instruction

Signed-off-by: Mohammed Gamal 

---
Changes from v1:
- Added test for 16-bit iret
- Added tests for returned eflags
---
 kvm/test/x86/realmode.c |   79 +++
 1 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/kvm/test/x86/realmode.c b/kvm/test/x86/realmode.c
index bd79348..74456c3 100644
--- a/kvm/test/x86/realmode.c
+++ b/kvm/test/x86/realmode.c
@@ -865,6 +865,84 @@ void test_pusha_popa()
print_serial("Pusha/Popa Test2: PASS\n");
 }
 
+void test_iret()
+{
+   struct regs inregs = { 0 }, outregs;
+
+   MK_INSN(iret32, "pushf\n\t"
+   "pushl %cs\n\t"
+   "call 1f\n\t" /* a near call will push eip onto the 
stack */
+   "jmp 2f\n\t"
+   "1: iret\n\t"
+   "2:\n\t"
+);
+
+   MK_INSN(iret16, "pushfw\n\t"
+   "pushw %cs\n\t"
+   "callw 1f\n\t"
+   "jmp 2f\n\t"
+   "1: iretw\n\t"
+   "2:\n\t");
+
+   MK_INSN(iret_flags32, "pushfl\n\t"
+ "popl %eax\n\t"
+ "andl $~0x2, %eax\n\t"
+ "orl $0xffc08028, %eax\n\t"
+ "pushl %eax\n\t"
+ "pushl %cs\n\t"
+ "call 1f\n\t"
+ "jmp 2f\n\t"
+ "1: iret\n\t"
+ "2:\n\t");
+
+   MK_INSN(iret_flags16, "pushfw\n\t"
+ "popw %ax\n\t"
+ "and $~0x2, %ax\n\t"
+ "or $0x8028, %ax\n\t" 
+ "pushw %ax\n\t"
+ "pushw %cs\n\t"
+ "callw 1f\n\t"
+ "jmp 2f\n\t"
+ "1: iretw\n\t"
+ "2:\n\t");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_iret32,
+ insn_iret32_end - insn_iret32);
+
+   if (!regs_equal(&inregs, &outregs, 0))
+   print_serial("iret Test 1: FAIL\n");
+   else
+   print_serial("iret Test 1: PASS\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_iret16,
+ insn_iret16_end - insn_iret16);
+
+   if (!regs_equal(&inregs, &outregs, 0))
+   print_serial("iret Test 2: FAIL\n");
+   else
+   print_serial("iret Test 2: PASS\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_iret_flags32,
+ insn_iret_flags32_end - insn_iret_flags32);
+
+   if (!regs_equal(&inregs, &outregs, R_AX))
+   print_serial("iret Test 3: FAIL\n");
+   else
+   print_serial("iret Test 3: PASS\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+ insn_iret_flags16,
+ insn_iret_flags16_end - insn_iret_flags16);
+
+   if (!regs_equal(&inregs, &outregs, R_AX))
+   print_serial("iret Test 4: FAIL\n");
+   else
+   print_serial("iret Test 4: PASS\n");
+}
+
 void realmode_start(void)
 {
test_null();
@@ -886,6 +964,7 @@ void realmode_start(void)
/* long jmp test uses call near so test it after testing call */
test_long_jmp();
test_xchg();
+   test_iret();
 
exit(0);
 }
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] x86 emulator: Add IRET instruction

2010-07-28 Thread Mohammed Gamal
Ths patch adds IRET instruction (opcode 0xcf).
Currently, only IRET in real mode is emulated. Protected mode support is to be 
added later if needed.

Signed-off-by: Mohammed Gamal 


Changes from v2:
- Delay committing cs until all pop operation are ensured to be successful
- Corrected handling for eflags
---
 arch/x86/kvm/emulate.c |   81 
 1 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b38bd8b..620f609 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -392,6 +392,9 @@ static u32 group2_table[] = {
 #define EFLG_PF (1<<2)
 #define EFLG_CF (1<<0)
 
+#define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
+#define EFLG_RESERVED_ONE_MASK 2
+
 /*
  * Instruction emulation:
  * Most instructions are emulated directly via a fragment of inline assembly
@@ -1778,6 +1781,78 @@ static int emulate_popa(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static int emulate_iret_real(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops)
+{
+   struct decode_cache *c = &ctxt->decode;
+   int rc = X86EMUL_CONTINUE;
+   unsigned long temp_eip = 0;
+   unsigned long temp_eflags = 0;
+   unsigned long cs = 0;
+   unsigned long mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | 
EFLG_TF |
+EFLG_IF | EFLG_DF | EFLG_OF | EFLG_IOPL | EFLG_NT 
| EFLG_RF |
+EFLG_AC | EFLG_ID | (1 << 1); /* Last one is the 
reserved bit */
+   unsigned long vm86_mask = EFLG_VM | EFLG_VIF | EFLG_VIP;
+
+   /* TODO: Add stack limit check */
+
+   rc = emulate_pop(ctxt, ops, &temp_eip, c->op_bytes);
+
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   if (temp_eip & ~0x) {
+   emulate_gp(ctxt, 0);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+
+   rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
+
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   rc = emulate_pop(ctxt, ops, &temp_eflags, c->op_bytes);
+
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
+
+   if (rc != X86EMUL_CONTINUE)
+   return rc;
+
+   c->eip = temp_eip;
+
+   
+   if (c->op_bytes == 4)
+   ctxt->eflags = ((temp_eflags & mask) | (ctxt->eflags & 
vm86_mask));
+   else if (c->op_bytes == 2) {
+   ctxt->eflags &= ~0x;
+   ctxt->eflags |= temp_eflags;
+   }
+
+   ctxt->eflags &= ~EFLG_RESERVED_ZEROS_MASK; /* Clear reserved zeros */
+   ctxt->eflags |= EFLG_RESERVED_ONE_MASK;
+
+   return rc;
+}
+
+static inline int emulate_iret(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops* ops)
+{
+   switch(ctxt->mode) {
+   case X86EMUL_MODE_REAL:
+   return emulate_iret_real(ctxt, ops);
+   case X86EMUL_MODE_VM86:
+   case X86EMUL_MODE_PROT16:
+   case X86EMUL_MODE_PROT32:
+   case X86EMUL_MODE_PROT64:
+   default:
+   /* iret from protected mode unimplemented yet */
+   return X86EMUL_UNHANDLEABLE;
+   }
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops)
 {
@@ -2910,6 +2985,12 @@ special_insn:
if (rc != X86EMUL_CONTINUE)
goto done;
break;
+   case 0xcf:  /* iret */
+   rc = emulate_iret(ctxt, ops);
+
+   if (rc != X86EMUL_CONTINUE)
+   goto done;
+   break;
case 0xd0 ... 0xd1: /* Grp2 */
c->src.val = 1;
emulate_grp2(ctxt);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: SVM: Emulate next_rip svm feature

2010-07-28 Thread Roedel, Joerg
On Tue, Jul 27, 2010 at 02:32:35PM -0400, Avi Kivity wrote:
>   On 07/27/2010 07:14 PM, Joerg Roedel wrote:
> > This patch implements the emulations of the svm next_rip
> > feature in the nested svm implementation in kvm.
> >
> > Signed-off-by: Joerg Roedel
> > ---
> >   arch/x86/kvm/svm.c |8 +++-
> >   1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 7d10f2c..b44c9cc 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -1919,6 +1919,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
> > nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
> > nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
> > nested_vmcb->control.exit_int_info_err = 
> > vmcb->control.exit_int_info_err;
> > +   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
> >
> 
> Can it be really this simple?  Suppose we emulate a nested guest 
> instruction just before vmexit, doesn't that invalidate 
> vmcb->control.next_rip?  Can that happen?

Good point. I looked again into it. The documentation states:

The next sequential instruction pointer (nRIP) is saved in
the guest VMCB control area at location C8h on all #VMEXITs that
are due to instruction intercepts, as defined in Section 15.8 on
page 378, as well as MSR and IOIO intercepts and exceptions
caused by the INT3, INTO, and BOUND instructions. For all other
intercepts, nRIP is reset to zero.

There are a few intercepts that may need injection when running nested
immediatly after an instruction emulation on the host side:

INTR, NMI
#PF, #GP, ...

All these instructions do not provide a valid next_rip on #vmexit so we
should be save here. The other way around, copying back a next_rip
pointer when there should be none, should also not happen as far as I
see it. The next_rip is only set for instruction intercepts which are
either handled on the host side or reinjected directly into the L1
hypervisor.
When you don't see a failing case either, I think we are save with this
simple implementation.

Joerg

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How to use live migration with non-shared storage about kvm

2010-07-28 Thread 姚远
>From qemu-kvm-0.12.1, kvm has the function about live migration with
non-shared storage. But I don't know how to use it. In the qemu monitor,
"migrate -d tcp:10.1.10.42:444" which is based on shared storage is going
well. I don't know how to use the command of "-b or -i" or maybe I don't
know the correct format of uri.

the source and destination system is Fedora-13, and the version of kvm is
qemu-kvm-0.12.3.
When I use the "migrate -d -b tcp:10.1.10.42:", the qemu monitor don't
show nothing. When I use "info migrate",there shows "migrate status:
failed". I don't know why? I hope someone who used this command successfully
would help me, thank you.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.35-rc1 regression with pvclock and smp guests

2010-07-28 Thread Andre Przywara

Zachary Amsden wrote:

On 07/27/2010 11:51 AM, Andre Przywara wrote:

Andre Przywara wrote:

Avi Kivity wrote:

  On 07/27/2010 04:48 PM, Andre Przywara wrote:

Wierd.  Maybe the clock goes crazy.

Let's see if it jumps forward alot:

 } while (unlikely(last != ret));
+
+   {
+static u64 last_report;
+if (ret > last_report + 1) {
+last_report = ret;
+printk("kvmclock: %llx\n", ret);
+}
+
+   }

 return ret;
  }

Worth updating the 'return last' to update ret and goto the new 
code, so we don't miss that path.
Did that. There is _a lot_ of output (about 350 lines per second 
via the 115k serial console), both with smp=1 and smp=2.
The majority is differing about 2,000,000 (ticks?), but a handful 
of them are in the range of 20 million. 

nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.


No difference between smp=2 and smp=1.
I also get some "BUG: recent printk recursion!" and I don't see any 
kernel boot progress beyond outputting the BogoMIPS value.

Right, printk() wants the time too.


BTW: I found two message from your earlier debug statement:
[0.00] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
[0.00] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock

Those are from kvmclock initialization, not from the older patch.

I'm completely confused, everything seems to be in order.

Let's see.  if you s/return last/return ret/ in the original, does 
this help things along?  this makes pvclock drop the computation and 
should be exactly the same as before the patch.
Yes, this works, both smp version boot. I see a short very short 
break after the line in question, but then it proceeds well.
Thanks for your help, now I got a much better insight into the issue. 
I will see if I can find something more.

Did some more investigations, some observations:
- The cmpxchg does not seem to be a problem, I didn't see the loop 
iterated more than once.
- Turning off printk-timestamps makes the bug go away. But I guess it 
is just hiding or deferring it, and it's no real workaround anyway.
- I instrumented the "if (ret < last) return last;" statement, when 
the kernel hangs I get only printks from there, although it has hit 
before:

--
[0.82] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[0.82] returning last instead (cnt=19001)
[0.82] returning last instead (cnt=20001)
The last line repeats forever with the same timestamp, the counter 
(counting the number of "return last;") increments about 3500 
times/second.


I will see if I find something more...




gcc --version?

That would be 4.3.3
I compiled the guest kernel with 4.4.4 also, that made no difference.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: SVM: Sync efer back into nested vmcb

2010-07-28 Thread Roedel, Joerg
On Tue, Jul 27, 2010 at 03:07:32PM -0400, Avi Kivity wrote:
>   On 07/27/2010 07:14 PM, Joerg Roedel wrote:
> > This patch fixes a bug in a nested hypervisor that heavily
> > switches between real-mode and long-mode. The problem is
> > fixed by syncing back efer into the guest vmcb on emulated
> > vmexit.
> 
> This is vmware?  IIRC their old kernel was 32-bit, so they need to 
> switch all the time.

No, it is a small interal research hypervisor. But if this fixes vmware
too its fine :-)

Joerg

-- 
Joerg Roedel - AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-28 Thread Tejun Heo
On 07/27/2010 09:19 PM, Michael S. Tsirkin wrote:
>> Thinking a bit more about it, it kind of sucks that queueing to
>> another worker from worker->func() breaks flush.  Maybe the right
>> thing to do there is using atomic_t for done_seq?
> 
> I don't believe it will help: we might have:
> 
> worker1 runs work
> work requeues itself queued index = 1
> worker1 reads queued index = 1
> worker2 runs work
> work requeues itself queued index = 2
> worker2 runs work
> worker2 reads queued index = 2
> worker2 writes done index = 2
> worker1 writes done index = 1
> 
> As you see, done index got moved back.

Yeah, I think the flushing logic should be moved to the worker.  Are
you interested in doing it w/ your change?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >