Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-05-06 Thread Lai Jiangshan
Marcelo Tosatti wrote:
> On Thu, Apr 29, 2010 at 09:43:40PM +0300, Avi Kivity wrote:
>> On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:
>>> You missed quadrant on 4mb large page emulation with shadow (see updated
>>> patch below).
>> Good catch.
>>
>>> Also, for some reason I can't understand, the assumption
>>> does not hold for large sptes with TDP, so reverted for now.
>> It's unrelated to TDP, same issue with shadow.  I think the
>> calculation is correct.  For example the 4th spte for a level=2 page
>> will yield gfn=4*512.
> 
> Under testing i see sp at level 2, with sp->gfn == 4096, mmu_set_spte
> setting index 8 to gfn 4096 (whereas kvm_mmu_page_get_gfn returns 4096 +
> 8*512).
> 
> Lai, can you please take a look at it? You should see the
> kvm_mmu_page_set_gfn BUG_ON by using -mem-path on hugetlbfs.
> 

Could you tell me how you tested it? It would be better if I could
follow your test steps.

I also hit the kvm_mmu_page_set_gfn BUG_ON. It happens because
FNAME(fetch)() sets sp->gfn incorrectly. The patch
[PATCH] kvm: calculate correct gfn for small host pages which emulate large
guest pages
fixes it.

I can no longer hit the kvm_mmu_page_set_gfn BUG_ON after that patch is
also applied.

So could you tell me your test steps:
The host: ept/npt enabled? 64-bit? test code running on the host?
The guest: OS? PAE? 32-bit? 64-bit? test code running in the guest?

Lai
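
For reference, the index-to-gfn relation being debugged above can be
sketched as follows; this is a simplified illustration with a
hypothetical helper name, not the exact in-tree kvm_mmu_page_get_gfn():

static gfn_t sp_gfn_for_index(struct kvm_mmu_page *sp, int index)
{
	/* each spte in a level-N shadow page spans 512^(N-1) guest pages */
	unsigned int shift = (sp->role.level - 1) * 9;	/* 9 index bits per level */

	return sp->gfn + (index << shift);	/* index 8, level 2 -> sp->gfn + 8*512 */
}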


Re: [GIT PULL] amended: first round of vhost-net enhancements for net-next

2010-05-06 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 4 May 2010 14:21:01 +0300

> This is an amended pull request: I have rebased the tree to the
> correct patches. This has been through basic tests and seems
> to work fine here.
> 
> The following tree includes a couple of enhancements that help vhost-net.
> Please pull them for net-next. Another set of patches is under
> debugging/testing and I hope to get them ready in time for 2.6.35,
> so there may be another pull request later.

Pulled, thanks Michael.


Re: vCPU scalability for linux VMs

2010-05-06 Thread Alec Istomin
On Wednesday, May 5, 2010 at 01:53:55 -0400, Srivatsa Vaddagiri wrote:
> On Wed, May 05, 2010 at 12:31:11PM -0700, Alec Istomin wrote:
>> 
>> On Wednesday, May 5, 2010 at 13:27:39 -0400, Srivatsa Vaddagiri wrote:
>> >>  My preliminary results show that single vCPU Linux VMs perform up to 10
>> >>  times better than 4vCPU Linux VMs (consolidated performance of 8 VMs on
>> >>  8 core pre-Nehalem server). I suspect that I'm missing something major
>> >>  and look for any means that can help improve SMP VMs performance.
>> 
>> > Could you provide more details of what workload was used to compare this
>> > performance?
>> 
>> It's Dell's DVD-Store LAMP stack test suite with Medium database (about
>> 1 GB MySQL data) . 8x 2Gb VMs (on 16GB RAM host) with different vCPU
>> configuration simultaneously loaded by external web clients.

> Ok... Do you have any more data on how the vCPU scalability looks (how
> the throughput scales with increasing vCPUs)?

See attached for 1 vs 2 vs 4 vCPU consolidated throughput from multiple
VMs with the same client payload. Some DVD-Store (LAMP) scalability
effects are mixed in here as well; what type of test would you recommend
to get cleaner results?

Regards,
 Alec

Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest

2010-05-06 Thread Avi Kivity

On 05/06/2010 07:23 AM, Cui, Dexuan wrote:
>>> +   goto err;
>>> +   vcpu->arch.guest_xstate_mask = new_bv;
>>> +   xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);
>>
>> Can't we run with the host xcr0?  Isn't it guaranteed to be a superset
>> of the guest xcr0?
>
> I agree host xcr0 is a superset of guest xcr0.
> In the case guest xcr0 has fewer bits set than host xcr0, I suppose
> running with guest xcr0 would be a bit faster.

However, switching xcr0 may be slow.  That's our experience with MSRs.
Can you measure its latency?

It can also be avoided if guest and host xcr0 match.
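
A minimal sketch of that optimization, reusing the names from the patch
above (the helper itself is hypothetical; xsetbv() is the wrapper the
patch introduces):

static void load_guest_xcr0(struct kvm_vcpu *vcpu, u64 host_xcr0)
{
	/* skip the (possibly slow) xsetbv when the two masks already match */
	if (vcpu->arch.guest_xstate_mask != host_xcr0)
		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);
}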


> If you think simplifying the code by removing the field
> guest_xstate_mask is more important, we can do it.

Well, we need guest_xstate_mask; it's the guest's xcr0, no?

btw, it needs save/restore for live migration, as well as save/restore
for the new fpu state.

>>> +   skip_emulated_instruction(vcpu);
>>> +   return 1;
>>> +err:
>>> +   kvm_inject_gp(vcpu, 0);
>>
>> Need to #UD in some circumstances.
>
> I don't think so. When the CPU doesn't support XSAVE, or the guest
> doesn't set CR4.OSXSAVE, the guest's attempt at executing xsetbv gets a
> #UD immediately and no VM exit occurs.

Ok.


>>> @@ -62,6 +64,7 @@
>>>  (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
>>>    | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
>>>    | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR  \
>>> + | (cpu_has_xsave ? X86_CR4_OSXSAVE : 0)   \
>>>    | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
>>
>> It also depends on the guest cpuid value.  Please add it outside the
>> macro.
>
> If cpu_has_xsave is true, we always present xsave to the guest via
> cpuid, so I think the code is correct here.

We don't pass all host cpuid values to the guest.  We expose them to
userspace via KVM_GET_SUPPORTED_CPUID, and then userspace decides what
cpuid to present to the guest via KVM_SET_CPUID2.  So the guest may run
with fewer features than the host.
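
For reference, the userspace side of that flow can be sketched with the
real KVM ioctls like this (simplified: fd setup is omitted, and the
entry count is an arbitrary assumption):

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

int configure_vcpu_cpuid(int kvm_fd, int vcpu_fd)
{
	int nent = 100;
	struct kvm_cpuid2 *cpuid = calloc(1, sizeof(*cpuid) +
			nent * sizeof(struct kvm_cpuid_entry2));

	cpuid->nent = nent;
	/* everything the kvm module can virtualize on this host */
	if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
		return -1;
	/* userspace policy may mask features out here, e.g. hide XSAVE */
	/* what the guest will actually see */
	if (ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid) < 0)
		return -1;
	free(cpuid);
	return 0;
}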



> I saw the 2 patches you sent. They (like the new APIs fpu_alloc/fpu_free)
> will simplify the implementation of kvm xsave support. Thanks a lot!

Thanks.  To simplify things, please separate host xsave support
(switching to the fpu api) and guest xsave support (cpuid, xsetbv, new
ioctls) into separate patches.  In fact, guest xsave support is probably
best split into several patches as well.


--
error compiling committee.c: too many arguments to function



[PATCH RFC] qemu/virtio: use last used index published by guest

2010-05-06 Thread Michael S. Tsirkin
Reduce irq_window in the guest by injecting an interrupt
only if the guest has handled all the buffers we have
used so far.

Signed-off-by: Michael S. Tsirkin 
---

This is the qemu part of the story.

 hw/vhost_net.c |6 ++
 hw/virtio.c|   15 +++
 hw/virtio.h|4 
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 2e292ee..d77c9b2 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -51,6 +51,9 @@ unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
 if (!(net->dev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC))) {
 features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
 }
+if (!(net->dev.features & (1 << VIRTIO_RING_F_PUBLISH_USED))) {
+features &= ~(1 << VIRTIO_RING_F_PUBLISH_USED);
+}
 features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
 return features;
 }
@@ -64,6 +67,9 @@ void vhost_net_ack_features(struct vhost_net *net, unsigned features)
 if (features & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
 net->dev.acked_features |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
 }
+if (features & (1 << VIRTIO_RING_F_PUBLISH_USED)) {
+net->dev.acked_features |= (1 << VIRTIO_RING_F_PUBLISH_USED);
+}
 }
 
 static int vhost_net_get_fd(VLANClientState *backend)
diff --git a/hw/virtio.c b/hw/virtio.c
index e7657ae..5d686f0 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -71,6 +71,7 @@ struct VirtQueue
 target_phys_addr_t pa;
 uint16_t last_avail_idx;
 int inuse;
+int num_notify;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 VirtIODevice *vdev;
@@ -139,6 +140,11 @@ static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
 return lduw_phys(pa);
 }
 
+static inline uint16_t vring_last_used_idx(VirtQueue *vq)
+{
+return vring_avail_ring(vq, vq->vring.num);
+}
+
 static inline void vring_used_ring_id(VirtQueue *vq, int i, uint32_t val)
 {
 target_phys_addr_t pa;
@@ -234,6 +240,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 wmb();
 vring_used_idx_increment(vq, count);
 vq->inuse -= count;
+vq->num_notify += count;
 }
 
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
@@ -603,6 +610,14 @@ void virtio_irq(VirtQueue *vq)
 
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
+/* Do not notify if guest did not yet see the last update. */
+if ((vdev->guest_features & (1 << VIRTIO_RING_F_PUBLISH_USED)) &&
+   vring_last_used_idx(vq) !=
+   (uint16_t)(vring_used_idx(vq) + vq->num_notify))
+   return;
+
+vq->num_notify = 0;
+
 /* Always notify when queue is empty (when feature acknowledge) */
 if ((vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT) &&
 (!(vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) ||
diff --git a/hw/virtio.h b/hw/virtio.h
index f885f1b..4bdfded 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -43,6 +43,8 @@
 #define VIRTIO_F_NOTIFY_ON_EMPTY24
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC 28
+/* The Guest publishes last-seen used index at the end of the avail ring. */
+#define VIRTIO_RING_F_PUBLISH_USED 29
 /* A guest should never accept this.  It implies negotiation is broken. */
 #define VIRTIO_F_BAD_FEATURE   30
 
@@ -189,6 +191,8 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev);
 void virtio_net_exit(VirtIODevice *vdev);
 
 #define DEFINE_VIRTIO_COMMON_FEATURES(_state, _field) \
+   DEFINE_PROP_BIT("publish_used", _state, _field, \
+   VIRTIO_RING_F_PUBLISH_USED, true), \
DEFINE_PROP_BIT("indirect_desc", _state, _field, \
VIRTIO_RING_F_INDIRECT_DESC, true)
 
-- 
1.7.1.12.g42b7f
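
As an aside, the trick behind vring_last_used_idx() above is that the
guest publishes its last-seen used index in the slot just past the avail
ring. A stripped-down sketch of that layout (demo struct only, host
endianness, no barriers):

#include <stdint.h>

struct vring_avail_demo {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[];	/* num entries, then one extra published slot */
};

static inline uint16_t last_used_published(const struct vring_avail_demo *a,
					   unsigned int num)
{
	return a->ring[num];	/* what vring_avail_ring(vq, vq->vring.num) reads */
}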


[PATCH 1/3 v2] KVM: VMX code cleanup and preparation.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..e77da89 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,6 +445,19 @@ static void vmcs_clear(struct vmcs *vmcs)
   vmcs, phys_addr);
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+
+   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
+   : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+   : "cc", "memory");
+   if (error)
+   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+  vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
struct vcpu_vmx *vmx = arg;
@@ -769,7 +782,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 phys_addr = __pa(vmx->vmcs);
u64 tsc_this, delta, new_offset;
 
if (vcpu->cpu != cpu) {
@@ -783,15 +795,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
 
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-   u8 error;
-
per_cpu(current_vmcs, cpu) = vmx->vmcs;
-   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
- : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
- : "cc");
-   if (error)
-   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-  vmx->vmcs, phys_addr);
+   vmcs_load(vmx->vmcs);
}
 
if (vcpu->cpu != cpu) {
@@ -1220,6 +1225,13 @@ static __init int vmx_disabled_by_bios(void)
/* locked but not enabled */
 }
 
+static void kvm_cpu_vmxon(u64 addr)
+{
+   asm volatile (ASM_VMX_VMXON_RAX
+   : : "a"(&addr), "m"(addr)
+   : "memory", "cc");
+}
+
 static int hardware_enable(void *garbage)
 {
int cpu = raw_smp_processor_id();
@@ -1240,9 +1252,7 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   asm volatile (ASM_VMX_VMXON_RAX
- : : "a"(&phys_addr), "m"(phys_addr)
- : "memory", "cc");
+   kvm_cpu_vmxon(phys_addr);
 
ept_sync_global();
 
@@ -1266,13 +1276,13 @@ static void vmclear_local_vcpus(void)
 static void kvm_cpu_vmxoff(void)
 {
asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
-   write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
 static void hardware_disable(void *garbage)
 {
vmclear_local_vcpus();
kvm_cpu_vmxoff();
+   write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
-- 
1.6.3


[PATCH 0/3 v2] KVM: VMX: Support hosted VMM coexistence.

2010-05-06 Thread Xu, Dongxiao
Main changes from v1:
1) Add a module option "vmm_coexistence" to decide whether to
enable this feature. It is currently off by default.
2) Each time a KVM vcpu is scheduled in, it invalidates the EPT
and VPID TLBs to avoid conflicts between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs. 

The following NOTE is taken from Intel SDM 3B, chapter 27.3,
MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usage. VMCLEAR is called when a VCPU is scheduled
out of a physical CPU, while VMPTRLD is called when a VCPU is
scheduled in. This approach also eliminates the IPI mechanism the
original VMCLEAR required. As suggested by the SDM, VMXOFF is
called after VMCLEAR, and VMXON is called before VMPTRLD; see the
pseudo-code sketch below.
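
In pseudo-code, the per-schedule sequence looks roughly like this (a
simplified sketch; the helper names are illustrative, not the exact
patch):

void vcpu_sched_in(struct vcpu_vmx *vmx)
{
	kvm_cpu_vmxon(vmxarea_phys_addr());	/* enter VMX operation */
	vmcs_load(vmx->vmcs);			/* VMPTRLD */
	invalidate_ept_vpid_tlbs();		/* flush entries another VMM may have left */
}

void vcpu_sched_out(struct vcpu_vmx *vmx)
{
	vmcs_clear(vmx->vmcs);			/* VMCLEAR: flush cached VMCS state to memory */
	kvm_cpu_vmxoff();			/* leave VMX operation */
}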

With this patchset, KVM and VMware Workstation 7 can launch separate
guests, and they work well with each other.


[PATCH 3/3 v2] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

The Intel SDM suggests that VMXOFF should be called after doing
VMCLEAR; correspondingly, VMXON should be called before VMPTRLD.

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   30 --
 1 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aa3b2f9..85a17a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -168,6 +168,9 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);
+
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -787,8 +790,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
+   u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
if (vmm_coexistence) {
+   kvm_cpu_vmxon(phys_addr);
vmcs_load(vmx->vmcs);
set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
} else {
@@ -846,6 +851,7 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
vmcs_clear(vmx->vmcs);
rdtscll(vmx->vcpu.arch.host_tsc);
vmx->launched = 0;
+   kvm_cpu_vmxoff();
}
 }
 
@@ -1269,9 +1275,11 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   kvm_cpu_vmxon(phys_addr);
 
-   ept_sync_global();
+   if (!vmm_coexistence) {
+   kvm_cpu_vmxon(phys_addr);
+   ept_sync_global();
+   }
 
return 0;
 }
@@ -1297,9 +1305,10 @@ static void kvm_cpu_vmxoff(void)
 
 static void hardware_disable(void *garbage)
 {
-   if (!vmm_coexistence)
+   if (!vmm_coexistence) {
vmclear_local_vcpus();
-   kvm_cpu_vmxoff();
+   kvm_cpu_vmxoff();
+   }
write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
@@ -3967,6 +3976,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
int err;
struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+   u64 phys_addr = __pa(per_cpu(vmxarea, smp_processor_id()));
int cpu;
 
if (!vmx)
@@ -3990,9 +4000,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 
if (vmm_coexistence) {
preempt_disable();
+   kvm_cpu_vmxon(phys_addr);
vmcs_load(vmx->vmcs);
err = vmx_vcpu_setup(vmx);
vmcs_clear(vmx->vmcs);
+   kvm_cpu_vmxoff();
preempt_enable();
} else {
vmcs_clear(vmx->vmcs);
@@ -4138,12 +4150,16 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
struct vcpu_vmx *vmx = to_vmx(vcpu);
+   int cpu = raw_smp_processor_id();
+   u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
u32 exec_control;
 
vmx->rdtscp_enabled = false;
if (vmx_rdtscp_supported()) {
-   if (vmm_coexistence)
+   if (vmm_coexistence) {
+   kvm_cpu_vmxon(phys_addr);
vmcs_load(vmx->vmcs);
+   }
exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
if (exec_control & SECONDARY_EXEC_RDTSCP) {
best = kvm_find_cpuid_entry(vcpu, 0x8001, 0);
@@ -4155,8 +4171,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
exec_control);
}
}
-   if (vmm_coexistence)
+   if (vmm_coexistence) {
vmcs_clear(vmx->vmcs);
+   kvm_cpu_vmxoff();
+   }
}
 }
 
-- 
1.6.3


[PATCH v3 1/2] x86: eliminate TS_XSAVE

2010-05-06 Thread Avi Kivity
The fpu code currently uses current->thread_info->status & TS_XSAVE as
a way to distinguish between XSAVE capable processors and older processors.
The decision is not really task specific; instead we use the task status to
avoid a global memory reference - the value should be the same across all
threads.

Eliminate this tie-in into the task structure by using an alternative
instruction keyed off the XSAVE cpu feature; this results in shorter and
faster code, without introducing a global memory reference.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/i387.h|   20 
 arch/x86/include/asm/thread_info.h |1 -
 arch/x86/kernel/cpu/common.c   |5 +
 arch/x86/kernel/i387.c |5 +
 arch/x86/kernel/xsave.c|6 +++---
 5 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index da29309..a301a68 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -56,6 +56,18 @@ extern int restore_i387_xstate_ia32(void __user *buf);
 
 #define X87_FSW_ES (1 << 7)/* Exception Summary */
 
+static inline bool use_xsave(void)
+{
+   u8 has_xsave;
+
+   alternative_io("mov $0, %0",
+  "mov $1, %0",
+  X86_FEATURE_XSAVE,
+  "=g"(has_xsave));
+
+   return has_xsave;
+}
+
 #ifdef CONFIG_X86_64
 
 /* Ignore delayed exceptions from user space */
@@ -99,7 +111,7 @@ static inline void clear_fpu_state(struct task_struct *tsk)
/*
 * xsave header may indicate the init state of the FP.
 */
-   if ((task_thread_info(tsk)->status & TS_XSAVE) &&
+   if (use_xsave() &&
!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
return;
 
@@ -164,7 +176,7 @@ static inline void fxsave(struct task_struct *tsk)
 
 static inline void __save_init_fpu(struct task_struct *tsk)
 {
-   if (task_thread_info(tsk)->status & TS_XSAVE)
+   if (use_xsave())
xsave(tsk);
else
fxsave(tsk);
@@ -218,7 +230,7 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
  */
 static inline void __save_init_fpu(struct task_struct *tsk)
 {
-   if (task_thread_info(tsk)->status & TS_XSAVE) {
+   if (use_xsave()) {
struct xsave_struct *xstate = &tsk->thread.xstate->xsave;
struct i387_fxsave_struct *fx = &tsk->thread.xstate->fxsave;
 
@@ -266,7 +278,7 @@ end:
 
 static inline int restore_fpu_checking(struct task_struct *tsk)
 {
-   if (task_thread_info(tsk)->status & TS_XSAVE)
+   if (use_xsave())
return xrstor_checking(&tsk->thread.xstate->xsave);
else
return fxrstor_checking(&tsk->thread.xstate->fxsave);
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d017ed5..d4092fa 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -242,7 +242,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TS_POLLING 0x0004  /* true if in idle loop
   and not sleeping */
 #define TS_RESTORE_SIGMASK 0x0008  /* restore signal mask in do_signal() */
-#define TS_XSAVE   0x0010  /* Use xsave/xrstor */
 
 #define tsk_is_polling(t) (task_thread_info(t)->status & TS_POLLING)
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4868e4a..c1c00d0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1243,10 +1243,7 @@ void __cpuinit cpu_init(void)
/*
 * Force FPU initialization:
 */
-   if (cpu_has_xsave)
-   current_thread_info()->status = TS_XSAVE;
-   else
-   current_thread_info()->status = 0;
+   current_thread_info()->status = 0;
clear_used_math();
mxcsr_feature_mask_init();
 
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 54c31c2..14ca1dc 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -102,10 +102,7 @@ void __cpuinit fpu_init(void)
 
mxcsr_feature_mask_init();
/* clean state in init */
-   if (cpu_has_xsave)
-   current_thread_info()->status = TS_XSAVE;
-   else
-   current_thread_info()->status = 0;
+   current_thread_info()->status = 0;
clear_used_math();
 }
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 782c3a3..c1b0a11 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -99,7 +99,7 @@ int save_i387_xstate(void __user *buf)
if (err)
return err;
 
-   if (task_thread_info(tsk)->status & TS_XSAVE)
+   if (use_xsave())
err = xsave_user(buf);
else
err = fxsave_user(buf);
@@ -116,7 +11

[PATCH v3 2/2] x86: Introduce 'struct fpu' and related API

2010-05-06 Thread Avi Kivity
Currently all fpu state access is through tsk->thread.xstate.  Since we wish
to generalize fpu access to non-task contexts, wrap the state in a new
'struct fpu' and convert existing access to use an fpu API.

Signal frame handlers are not converted to the API since they will remain
task-context-only.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/i387.h  |  115 --
 arch/x86/include/asm/processor.h |6 ++-
 arch/x86/include/asm/xsave.h |7 +-
 arch/x86/kernel/i387.c   |  102 +-
 arch/x86/kernel/process.c|   20 +++
 arch/x86/kernel/process_32.c |2 +-
 arch/x86/kernel/process_64.c |2 +-
 arch/x86/kernel/xsave.c  |2 +-
 arch/x86/math-emu/fpu_aux.c  |6 +-
 9 files changed, 160 insertions(+), 102 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index a301a68..1a8cca3 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -103,10 +104,10 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
values. The kernel data segment can be sometimes 0 and sometimes
new user value. Both should be ok.
Use the PDA as safe address because it should be already in L1. */
-static inline void clear_fpu_state(struct task_struct *tsk)
+static inline void fpu_clear(struct fpu *fpu)
 {
-   struct xsave_struct *xstate = &tsk->thread.xstate->xsave;
-   struct i387_fxsave_struct *fx = &tsk->thread.xstate->fxsave;
+   struct xsave_struct *xstate = &fpu->state->xsave;
+   struct i387_fxsave_struct *fx = &fpu->state->fxsave;
 
/*
 * xsave header may indicate the init state of the FP.
@@ -123,6 +124,11 @@ static inline void clear_fpu_state(struct task_struct *tsk)
  X86_FEATURE_FXSAVE_LEAK);
 }
 
+static inline void clear_fpu_state(struct task_struct *tsk)
+{
+   fpu_clear(&tsk->thread.fpu);
+}
+
 static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
 {
int err;
@@ -147,7 +153,7 @@ static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
return err;
 }
 
-static inline void fxsave(struct task_struct *tsk)
+static inline void fpu_fxsave(struct fpu *fpu)
 {
/* Using "rex64; fxsave %0" is broken because, if the memory operand
   uses any extended registers for addressing, a second REX prefix
@@ -157,42 +163,45 @@ static inline void fxsave(struct task_struct *tsk)
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
   starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
-: "=m" (tsk->thread.xstate->fxsave));
+: "=m" (fpu->state->fxsave));
 #elif 0
/* Using, as a workaround, the properly prefixed form below isn't
   accepted by any binutils version so far released, complaining that
   the same type of prefix is used twice if an extended register is
   needed for addressing (fix submitted to mainline 2005-11-21). */
__asm__ __volatile__("rex64/fxsave %0"
-: "=m" (tsk->thread.xstate->fxsave));
+: "=m" (fpu->state->fxsave));
 #else
/* This, however, we can work around by forcing the compiler to select
   an addressing mode that doesn't require extended registers. */
__asm__ __volatile__("rex64/fxsave (%1)"
-: "=m" (tsk->thread.xstate->fxsave)
-: "cdaSDb" (&tsk->thread.xstate->fxsave));
+: "=m" (fpu->state->fxsave)
+: "cdaSDb" (&fpu->state->fxsave));
 #endif
 }
 
-static inline void __save_init_fpu(struct task_struct *tsk)
+static inline void fpu_save_init(struct fpu *fpu)
 {
if (use_xsave())
-   xsave(tsk);
+   fpu_xsave(fpu);
else
-   fxsave(tsk);
+   fpu_fxsave(fpu);
 
-   clear_fpu_state(tsk);
+   fpu_clear(fpu);
+}
+
+static inline void __save_init_fpu(struct task_struct *tsk)
+{
+   fpu_save_init(&tsk->thread.fpu);
task_thread_info(tsk)->status &= ~TS_USEDFPU;
 }
 
 #else  /* CONFIG_X86_32 */
 
 #ifdef CONFIG_MATH_EMULATION
-extern void finit_task(struct task_struct *tsk);
+extern void finit_soft_fpu(struct i387_soft_struct *soft);
 #else
-static inline void finit_task(struct task_struct *tsk)
-{
-}
+static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
 #endif
 
 static inline void tolerant_fwait(void)
@@ -228,13 +237,13 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
 /*
  * These must be called with preempt disabled
  */
-static inline void __save_init_fpu(struct task_struct *tsk)
+static inline void fpu_save_init(struct fpu *fpu)
 {
if (use

[PATCH v3 0/2] x86 FPU API

2010-05-06 Thread Avi Kivity
Currently all fpu accessors are wedded to task_struct.  However kvm also uses
the fpu in a different context.  Introduce an FPU API, and replace the
current uses with the new API.

While this patchset is oriented towards deeper changes, as a first step it
simplifies xsave for kvm.

v3:
use u8 instead of bool in asm to avoid bad code generation on older
gccs.

v2:
eliminate useless padding in use_xsave() by using a larger instruction

Avi Kivity (2):
  x86: eliminate TS_XSAVE
  x86: Introduce 'struct fpu' and related API

 arch/x86/include/asm/i387.h|  135 +++-
 arch/x86/include/asm/processor.h   |6 ++-
 arch/x86/include/asm/thread_info.h |1 -
 arch/x86/include/asm/xsave.h   |7 +-
 arch/x86/kernel/cpu/common.c   |5 +-
 arch/x86/kernel/i387.c |  107 ++---
 arch/x86/kernel/process.c  |   20 +++---
 arch/x86/kernel/process_32.c   |2 +-
 arch/x86/kernel/process_64.c   |2 +-
 arch/x86/kernel/xsave.c|8 +-
 arch/x86/math-emu/fpu_aux.c|6 +-
 11 files changed, 181 insertions(+), 118 deletions(-)



[PATCH 2/3 v2] KVM: VMX: VMCLEAR/VMPTRLD usage changes.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

Originally VMCLEAR/VMPTRLD are called on vcpu migration. To
support hosted VMM coexistence, VMCLEAR is executed on vcpu
schedule-out, and VMPTRLD is executed on vcpu schedule-in.
This also eliminates the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   70 +---
 1 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e77da89..aa3b2f9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -62,6 +62,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+static int __read_mostly vmm_coexistence;
+module_param(vmm_coexistence, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -222,6 +225,7 @@ static u64 host_efer;
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
 
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu);
 /*
  * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
  * away by decrementing the array size.
@@ -784,25 +788,31 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu->cpu != cpu) {
-   vcpu_clear(vmx);
-   kvm_migrate_timers(vcpu);
+   if (vmm_coexistence) {
+   vmcs_load(vmx->vmcs);
set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
-   local_irq_disable();
-   list_add(&vmx->local_vcpus_link,
-&per_cpu(vcpus_on_cpu, cpu));
-   local_irq_enable();
-   }
+   } else {
+   if (vcpu->cpu != cpu) {
+   vcpu_clear(vmx);
+   set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
+   local_irq_disable();
+   list_add(&vmx->local_vcpus_link,
+   &per_cpu(vcpus_on_cpu, cpu));
+   local_irq_enable();
+   }
 
-   if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-   per_cpu(current_vmcs, cpu) = vmx->vmcs;
-   vmcs_load(vmx->vmcs);
+   if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
+   per_cpu(current_vmcs, cpu) = vmx->vmcs;
+   vmcs_load(vmx->vmcs);
+   }
}
 
if (vcpu->cpu != cpu) {
struct desc_ptr dt;
unsigned long sysenter_esp;
 
+   kvm_migrate_timers(vcpu);
+
vcpu->cpu = cpu;
/*
 * Linux uses per-cpu TSS and GDT, so set these when switching
@@ -829,7 +839,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
__vmx_load_host_state(to_vmx(vcpu));
+   if (vmm_coexistence) {
+   vmcs_clear(vmx->vmcs);
+   rdtscll(vmx->vcpu.arch.host_tsc);
+   vmx->launched = 0;
+   }
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1280,7 +1297,8 @@ static void kvm_cpu_vmxoff(void)
 
 static void hardware_disable(void *garbage)
 {
-   vmclear_local_vcpus();
+   if (!vmm_coexistence)
+   vmclear_local_vcpus();
kvm_cpu_vmxoff();
write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
@@ -3927,7 +3945,8 @@ static void vmx_free_vmcs(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
if (vmx->vmcs) {
-   vcpu_clear(vmx);
+   if (!vmm_coexistence)
+   vcpu_clear(vmx);
free_vmcs(vmx->vmcs);
vmx->vmcs = NULL;
}
@@ -3969,13 +3988,20 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (!vmx->vmcs)
goto free_msrs;
 
-   vmcs_clear(vmx->vmcs);
-
-   cpu = get_cpu();
-   vmx_vcpu_load(&vmx->vcpu, cpu);
-   err = vmx_vcpu_setup(vmx);
-   vmx_vcpu_put(&vmx->vcpu);
-   put_cpu();
+   if (vmm_coexistence) {
+   preempt_disable();
+   vmcs_load(vmx->vmcs);
+   err = vmx_vcpu_setup(vmx);
+   vmcs_clear(vmx->vmcs);
+   preempt_enable();
+   } else {
+   vmcs_clear(vmx->vmcs);
+   cpu = get_cpu();
+   vmx_vcpu_load(&vmx->vcpu, cpu);
+   err = vmx_vcpu_setup(vmx);
+   vmx_vcpu_put(&vmx->vcpu);
+   put_cpu();
+   }
if (err)
goto free_vmcs;
if (vm_need_virtualize_apic_accesses(kvm))
@@ -4116,6 +4142,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)

Re: [PATCH 0/5] Important fixes for KVM-AMD

2010-05-06 Thread Avi Kivity

On 05/05/2010 05:04 PM, Joerg Roedel wrote:
> Hi Avi, Marcelo,
>
> here is a set of patches which fix problems in kvm-amd. Patch 1 fixes a
> stupid problem with the event-reinjection introduced by me in my
> previous patchset.  Patch 2 was a helper to find the bug patch 3 fixes.
> I kept it in the patchset because it may be helpful in the future to
> debug other problems too. Patch 3 is the most important fix because it
> makes kvm-amd on 32-bit hosts work again.  Without this patch the first
> vmrun fails with exit-reason VMEXIT_INVALID. Patch 4 fixes the Xen 4.0
> shipped with SLES11 in nested svm. The last patch in this series fixes a
> potential l2-guest breakout scenario, because it may be possible for the
> l2-guest to issue hypercalls directly to the host if the l1-hypervisor
> does not intercept VMMCALL.

All applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: KVM hook for code integrity checking

2010-05-06 Thread Avi Kivity

On 05/06/2010 07:51 AM, Suen Chun Hui wrote:
> Hi,
>
> Thanks for the reply.
>
> On 05/05/2010 11:05 AM, Avi Kivity wrote:
>> On 04/30/2010 05:53 PM, Suen Chun Hui wrote:
>>> Dear KVM developers,
>>>
>>> I'm currently working on an open source security patch to use KVM to
>>> implement code verification on a guest VM at runtime. Thus, it would be
>>> very helpful if someone could point me to the right function or place
>>> to look at for adding 2 hooks into the KVM paging code to:
>>>
>>> 1. Detect a new guest page (which I assume will imply a new pte and
>>> imply a new spte).
>>> Currently, I'm considering putting a hook in the function
>>> mmu_set_spte(), but maybe there is a better place.
>>> This hook will be used as the main entry point into the code
>>> verification function.
>>
>> This is in general not possible.  Hosts with npt or ept will not see
>> new guest ptes.
>
> Yes, I was only considering the case of using shadow paging. Would this
> be possible then, since the walker would have to parse gptes anyway?

It's possible, but it's not a good idea to require shadow paging.  It's
slow and doesn't scale well.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: make kvm_mmu_zap_page() return the number of pages it actually freed.

2010-05-06 Thread Avi Kivity

On 05/05/2010 04:03 AM, Gui Jianfeng wrote:
> Currently, kvm_mmu_zap_page() returns the number of freed child sps.
> This might confuse the caller, because the caller doesn't know the
> actual number freed. Let's make kvm_mmu_zap_page() return the number of
> pages it actually freed.

	if (kvm_mmu_zap_page(kvm, sp))
		goto restart;

Needs to be updated.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: mark page dirty when page is actually modified.

2010-05-06 Thread Avi Kivity

On 05/05/2010 04:09 AM, Gui Jianfeng wrote:
> Sometimes cmpxchg_gpte doesn't modify the gpte; in such a case, don't
> mark the page table page as dirty.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Fix debug output error

2010-05-06 Thread Avi Kivity

On 05/05/2010 04:58 AM, Gui Jianfeng wrote:
> Fix a debug output error in walk_addr.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Avi Kivity

On 04/29/2010 02:58 PM, Gleb Natapov wrote:
> Do not kill the VM if instruction emulation fails while the vcpu is in
> userspace. Inject #UD instead, in the hope that the guest OS will kill
> the offending process. Emulation failure is still traced by an ftrace
> point to help analyze the problem.

Still, there's the risk here that a critical failure goes unnoticed.
ftrace isn't on at all times.

We could probably inject a #UD unconditionally and exit to userspace.
Userspace would then report the problem to the user and reenter the
guest, which would then recover or not.
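
A sketch of what that proposal would amount to (simplified and
hypothetical, not committed code; kvm_queue_exception() and
KVM_EXIT_INTERNAL_ERROR are the existing primitives):

static int handle_emulation_failure(struct kvm_vcpu *vcpu)
{
	/* let a well-behaved guest kill the offending process */
	kvm_queue_exception(vcpu, UD_VECTOR);
	/* ...but surface the event so userspace can report it */
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	return 0;	/* returning 0 exits to userspace; re-entry resumes the guest */
}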


--
error compiling committee.c: too many arguments to function



Re: High CPU load on target host after migration

2010-05-06 Thread Avi Kivity

On 04/29/2010 12:14 AM, Thomas Beinicke wrote:
> Hi,
>
> I have been toying around with kvm / libvirt / virt-manager and its
> migration feature.
> Both host machines are running a 2.6.33 kernel.
>
> One host is a dual quad-core Intel Xeon E5520 @ 2.27GHz and the other
> is a dual quad-core Intel E5420 @ 2.50GHz.
>
> Migrating Linux machines works great, but Windows XP SP3 is giving me a
> headache.
> The migration process finishes without any crash/error, but the CPU
> load on the target host is ~50% on two cores. There is no CPU-intensive
> task running inside the VM though.
> "Removing" the network card from the VM and migrating it between the
> two machines doesn't seem to trigger the high CPU load.
> As network driver I tried the Realtek and Red Hat virtio drivers, but
> it doesn't seem to make a difference.
>
> Any insights on what could cause this situation and how best to debug
> it?

Can you run a profiler and see where qemu (or kvm) is spending its time?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM MMU: fix sp->unsync type error in trace event definition.

2010-05-06 Thread Avi Kivity

On 04/22/2010 12:33 PM, Gui Jianfeng wrote:
> sp->unsync is bool now, so update the trace event declaration.
>
> Signed-off-by: Gui Jianfeng
> ---
>  arch/x86/kvm/mmutrace.h |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index 3851f1f..9966e80 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -11,7 +11,7 @@
>  	__field(__u64, gfn) \
>  	__field(__u32, role) \
>  	__field(__u32, root_count) \
> -	__field(__u32, unsync)
> +	__field(bool, unsync)

This was intended to be more or less binary compatible (well, mmutrace
not so much).


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: make kvm_mmu_zap_page() return the number of pages it actually freed.

2010-05-06 Thread Gui Jianfeng
Avi Kivity wrote:
> On 05/05/2010 04:03 AM, Gui Jianfeng wrote:
>> Currently, kvm_mmu_zap_page() returns the number of freed child sps.
>> This might confuse the caller, because the caller doesn't know the
>> actual number freed. Let's make kvm_mmu_zap_page() return the number
>> of pages it actually freed.
>
>	if (kvm_mmu_zap_page(kvm, sp))
>		goto restart;
> 
> Needs to be updated.

Hi Avi,

If kvm_mmu_zap_page() returns 1, we don't know whether the freed sp is
the one we passed in or a child. So just restart the hash walk whenever
kvm_mmu_zap_page() returns a positive number, even though an unneeded
hash walk will sometimes happen. This makes the code simpler; the
idea comes from Marcelo.
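
For reference, the pattern in question is a bucket walk that restarts
after a zap; a simplified sketch of the loop shape (not an exact quote
of mmu.c):

restart:
	hlist_for_each_entry_safe(sp, node, n, bucket, hash_link)
		if (sp->gfn == gfn && kvm_mmu_zap_page(kvm, sp))
			/*
			 * The zap may have freed this sp and/or sibling
			 * entries in the bucket, so the iterator is stale:
			 * start the walk over.
			 */
			goto restart;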

Thanks,
Gui

> 



Re: [PATCH] KVM: make kvm_mmu_zap_page() return the number of pages it actually freed.

2010-05-06 Thread Avi Kivity

On 05/06/2010 12:25 PM, Gui Jianfeng wrote:
> Avi Kivity wrote:
>> On 05/05/2010 04:03 AM, Gui Jianfeng wrote:
>>> Currently, kvm_mmu_zap_page() returns the number of freed child sps.
>>> This might confuse the caller, because the caller doesn't know the
>>> actual number freed. Let's make kvm_mmu_zap_page() return the number
>>> of pages it actually freed.
>>
>>	if (kvm_mmu_zap_page(kvm, sp))
>>		goto restart;
>>
>> Needs to be updated.
>
> Hi Avi,
>
> If kvm_mmu_zap_page() returns 1, we don't know whether the freed sp is
> the one we passed in or a child. So just restart the hash walk whenever
> kvm_mmu_zap_page() returns a positive number, even though an unneeded
> hash walk will sometimes happen. This makes the code simpler; the idea
> comes from Marcelo.

I see.  Note this can invoke quadratic behaviour.  I'll apply the patch,
but we need to improve this area.


--
error compiling committee.c: too many arguments to function



[PATCH v4 0/9] KVM MMU: allow more shadow pages become asynchronous

2010-05-06 Thread Xiao Guangrong
Changelog v4:

- fix the bug reported by Marcelo

- fix the race in the invlpg code

Changelog v3:

These changes all come from Avi's suggestions, thanks.

- use a smarter way to fix the bug in patch 1
- remove duplicated code in patch 5
- check the error code and fix a forgotten page release in patch 9
- sync shadow pages in a batch instead of one by one

And there is one TODO item:
mark shadow pages as unsync at creation time to avoid write-protection.
This idea is from Avi:

|Another interesting case is to create new shadow pages in the unsync state.
|That can help when the guest starts a short lived process: we can avoid write
|protecting its pagetables completely

I'll send that patch out after this patchset is applied.

Changelog v2:

- when level is PT_DIRECTORY_LEVEL, the 'offset' should be
  'role.quadrant << 8', thanks Avi for pointing it out

- keep the invlpg code in paging_tmpl.h, addressing Avi's suggestion

- split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
  to clarify the code, addressing Avi's suggestion


[PATCH v4 1/9] KVM MMU: split kvm_sync_page() function

2010-05-06 Thread Xiao Guangrong
Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code, addressing Avi's suggestion.

The kvm_sync_page_transient() function only updates the shadow page; it
does not mark it sync and does not write-protect sp->gfn. It will be
used by a later patch.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   28 
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index de99638..72238ec 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1196,16 +1196,20 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+  bool clear_unsync)
 {
if (sp->role.cr4_pae != !!is_pae(vcpu)) {
kvm_mmu_zap_page(vcpu->kvm, sp);
return 1;
}
 
-   if (rmap_write_protect(vcpu->kvm, sp->gfn))
-   kvm_flush_remote_tlbs(vcpu->kvm);
-   kvm_unlink_unsync_page(vcpu->kvm, sp);
+   if (clear_unsync) {
+   if (rmap_write_protect(vcpu->kvm, sp->gfn))
+   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_unlink_unsync_page(vcpu->kvm, sp);
+   }
+
if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
kvm_mmu_zap_page(vcpu->kvm, sp);
return 1;
@@ -1215,6 +1219,22 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return 0;
 }
 
+static void mmu_convert_notrap(struct kvm_mmu_page *sp);
+static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
+  struct kvm_mmu_page *sp)
+{
+   int ret;
+
+   ret = __kvm_sync_page(vcpu, sp, false);
+   mmu_convert_notrap(sp);
+   return ret;
+}
+
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+   return __kvm_sync_page(vcpu, sp, true);
+}
+
 struct mmu_page_path {
struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
unsigned int idx[PT64_ROOT_LEVEL-1];
-- 
1.6.1.2





[PATCH v4 2/9] KVM MMU: don't write-protect if have new mapping to unsync page

2010-05-06 Thread Xiao Guangrong
Two cases may happen in the kvm_mmu_get_page() function:

- one case is that the goal sp is already in the cache: if the sp is
  unsync, we only need to update it to make sure the mapping is valid,
  but we need not mark it sync nor write-protect sp->gfn, since that
  does not break the unsync rule (one shadow page per gfn)

- the other case is that the goal sp does not exist: we need to create
  a new sp for the gfn, i.e. the gfn may have another shadow page, and
  to keep the unsync rule we should sync (mark sync and write-protect)
  the gfn's unsync shadow pages. After enabling multiple unsync shadows,
  we sync those shadow pages only when the new sp is not allowed to
  become unsync (also per the unsync rule; the new rule is: allow all
  pte pages to become unsync)

Changelog:
- fix forgetting to mark the parent's unsync_children bit when mapping a
  new parent to an unsync shadow page

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 72238ec..1dbb96e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1333,7 +1333,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
unsigned index;
unsigned quadrant;
struct hlist_head *bucket;
-   struct kvm_mmu_page *sp;
+   struct kvm_mmu_page *sp, *unsync_sp = NULL;
struct hlist_node *node, *tmp;
 
role = vcpu->arch.mmu.base_role;
@@ -1352,20 +1352,30 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
if (sp->gfn == gfn) {
if (sp->unsync)
-   if (kvm_sync_page(vcpu, sp))
-   continue;
+   unsync_sp = sp;
 
if (sp->role.word != role.word)
continue;
 
+   if (!direct && unsync_sp &&
+ kvm_sync_page_transient(vcpu, unsync_sp)) {
+   unsync_sp = NULL;
+   break;
+   }
+
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
kvm_mmu_mark_parents_unsync(sp);
-   }
+   } else if (sp->unsync)
+   kvm_mmu_mark_parents_unsync(sp);
+
trace_kvm_mmu_get_page(sp, false);
return sp;
}
+   if (!direct && unsync_sp)
+   kvm_sync_page(vcpu, unsync_sp);
+
++vcpu->kvm->stat.mmu_cache_miss;
sp = kvm_mmu_alloc_page(vcpu, parent_pte);
if (!sp)
-- 
1.6.1.2





[PATCH v4 3/9] KVM MMU: allow more page become unsync at gfn mapping time

2010-05-06 Thread Xiao Guangrong
In the current code, a shadow page can become asynchronous only if there
is a single shadow page for its gfn; this rule is too strict. In fact we
can let every last-level mapping page (i.e. a pte page) become unsync,
and sync them at invlpg or TLB-flush time.

This patch allows more pages to become asynchronous at gfn mapping time.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   81 +++
 1 files changed, 37 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1dbb96e..ae8c43b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1166,26 +1166,6 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
return __mmu_unsync_walk(sp, pvec);
 }
 
-static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
-{
-   unsigned index;
-   struct hlist_head *bucket;
-   struct kvm_mmu_page *sp;
-   struct hlist_node *node;
-
-   pgprintk("%s: looking for gfn %lx\n", __func__, gfn);
-   index = kvm_page_table_hashfn(gfn);
-   bucket = &kvm->arch.mmu_page_hash[index];
-   hlist_for_each_entry(sp, node, bucket, hash_link)
-   if (sp->gfn == gfn && !sp->role.direct
-   && !sp->role.invalid) {
-   pgprintk("%s: found role %x\n",
-__func__, sp->role.word);
-   return sp;
-   }
-   return NULL;
-}
-
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
WARN_ON(!sp->unsync);
@@ -1753,47 +1733,60 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
 
-static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+   trace_kvm_mmu_unsync_page(sp);
+   ++vcpu->kvm->stat.mmu_unsync;
+   sp->unsync = 1;
+
+   kvm_mmu_mark_parents_unsync(sp);
+   mmu_convert_notrap(sp);
+}
+
+static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 {
-   unsigned index;
struct hlist_head *bucket;
struct kvm_mmu_page *s;
struct hlist_node *node, *n;
+   unsigned index;
 
-   index = kvm_page_table_hashfn(sp->gfn);
+   index = kvm_page_table_hashfn(gfn);
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
-   /* don't unsync if pagetable is shadowed with multiple roles */
+
hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
-   if (s->gfn != sp->gfn || s->role.direct)
+   if (s->gfn != gfn || s->role.direct || s->unsync)
continue;
-   if (s->role.word != sp->role.word)
-   return 1;
+   WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
+   __kvm_unsync_page(vcpu, s);
}
-   trace_kvm_mmu_unsync_page(sp);
-   ++vcpu->kvm->stat.mmu_unsync;
-   sp->unsync = 1;
-
-   kvm_mmu_mark_parents_unsync(sp);
-
-   mmu_convert_notrap(sp);
-   return 0;
 }
 
 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
  bool can_unsync)
 {
-   struct kvm_mmu_page *shadow;
+   unsigned index;
+   struct hlist_head *bucket;
+   struct kvm_mmu_page *s;
+   struct hlist_node *node, *n;
+   bool need_unsync = false;
+
+   index = kvm_page_table_hashfn(gfn);
+   bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+   hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
+   if (s->gfn != gfn || s->role.direct)
+   continue;
 
-   shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
-   if (shadow) {
-   if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
+   if (s->role.level != PT_PAGE_TABLE_LEVEL)
return 1;
-   if (shadow->unsync)
-   return 0;
-   if (can_unsync && oos_shadow)
-   return kvm_unsync_page(vcpu, shadow);
-   return 1;
+
+   if (!need_unsync && !s->unsync) {
+   if (!can_unsync || !oos_shadow)
+   return 1;
+   need_unsync = true;
+   }
}
+   if (need_unsync)
+   kvm_unsync_pages(vcpu, gfn);
return 0;
 }
 
-- 
1.6.1.2







[PATCH v4 4/9] KVM MMU: allow more page become unsync at getting sp time

2010-05-06 Thread Xiao Guangrong
Allow more pages to become asynchronous at get-sp time: if we need to
create a new shadow page for a gfn but that page is not allowed to be
unsync (level > 1), we should sync all of the gfn's unsync pages.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   47 +--
 1 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ae8c43b..26edc11 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1215,6 +1215,35 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return __kvm_sync_page(vcpu, sp, true);
 }
 
+/* @gfn should be write-protected at the call site */
+static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
+{
+   struct hlist_head *bucket;
+   struct kvm_mmu_page *s;
+   struct hlist_node *node, *n;
+   unsigned index;
+   bool flush = false;
+
+   index = kvm_page_table_hashfn(gfn);
+   bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+   hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
+   if (s->gfn != gfn || !s->unsync)
+   continue;
+
+   WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
+   if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
+   (vcpu->arch.mmu.sync_page(vcpu, s))) {
+   kvm_mmu_zap_page(vcpu->kvm, s);
+   continue;
+   }
+   kvm_unlink_unsync_page(vcpu->kvm, s);
+   flush = true;
+   }
+
+   if (flush)
+   kvm_mmu_flush_tlb(vcpu);
+}
+
 struct mmu_page_path {
struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
unsigned int idx[PT64_ROOT_LEVEL-1];
@@ -1313,8 +1342,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
unsigned index;
unsigned quadrant;
struct hlist_head *bucket;
-   struct kvm_mmu_page *sp, *unsync_sp = NULL;
+   struct kvm_mmu_page *sp;
struct hlist_node *node, *tmp;
+   bool need_sync = false;
 
role = vcpu->arch.mmu.base_role;
role.level = level;
@@ -1331,17 +1361,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
if (sp->gfn == gfn) {
-   if (sp->unsync)
-   unsync_sp = sp;
+   if (!need_sync && sp->unsync)
+   need_sync = true;
 
if (sp->role.word != role.word)
continue;
 
-   if (!direct && unsync_sp &&
- kvm_sync_page_transient(vcpu, unsync_sp)) {
-   unsync_sp = NULL;
+   if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;
-   }
 
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
@@ -1353,9 +1380,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
trace_kvm_mmu_get_page(sp, false);
return sp;
}
-   if (!direct && unsync_sp)
-   kvm_sync_page(vcpu, unsync_sp);
-
++vcpu->kvm->stat.mmu_cache_miss;
sp = kvm_mmu_alloc_page(vcpu, parent_pte);
if (!sp)
@@ -1366,6 +1390,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (!direct) {
if (rmap_write_protect(vcpu->kvm, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);
+   if (level > PT_PAGE_TABLE_LEVEL && need_sync)
+   kvm_sync_pages(vcpu, gfn);
+
account_shadowed(vcpu->kvm, gfn);
}
if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
-- 
1.6.1.2


[PATCH v4 5/9] KVM MMU: rename 'root_count' to 'active_count'

2010-05-06 Thread Xiao Guangrong
Rename 'root_count' to 'active_count' in kvm_mmu_page, since unsync
pages will also use it in a later patch.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |7 ++-
 arch/x86/kvm/mmu.c  |   14 +++---
 arch/x86/kvm/mmutrace.h |6 +++---
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ed48904..86a8550 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -203,7 +203,12 @@ struct kvm_mmu_page {
DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
bool multimapped; /* More than one parent_pte? */
bool unsync;
-   int root_count;  /* Currently serving as active root */
+	/*
+	 * If active_count > 0, this page is not freed immediately:
+	 * it is still in use by an active root or by unsync pages
+	 * that are currently out of kvm->mmu_lock's protection.
+	 */
+   int active_count;
unsigned int unsync_children;
union {
u64 *parent_pte;   /* !multimapped */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 26edc11..58cf0f1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1539,7 +1539,7 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
unaccount_shadowed(kvm, sp->gfn);
if (sp->unsync)
kvm_unlink_unsync_page(kvm, sp);
-   if (!sp->root_count) {
+   if (!sp->active_count) {
hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
} else {
@@ -2060,8 +2060,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
hpa_t root = vcpu->arch.mmu.root_hpa;
 
sp = page_header(root);
-   --sp->root_count;
-   if (!sp->root_count && sp->role.invalid)
+   --sp->active_count;
+   if (!sp->active_count && sp->role.invalid)
kvm_mmu_zap_page(vcpu->kvm, sp);
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2073,8 +2073,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
if (root) {
root &= PT64_BASE_ADDR_MASK;
sp = page_header(root);
-   --sp->root_count;
-   if (!sp->root_count && sp->role.invalid)
+   --sp->active_count;
+   if (!sp->active_count && sp->role.invalid)
kvm_mmu_zap_page(vcpu->kvm, sp);
}
vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
@@ -2120,7 +2120,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
  PT64_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
-   ++sp->root_count;
+   ++sp->active_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
@@ -2150,7 +2150,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
  PT32_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
-   ++sp->root_count;
+   ++sp->active_count;
spin_unlock(&vcpu->kvm->mmu_lock);
 
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 42f07b1..8c8d265 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -10,13 +10,13 @@
 #define KVM_MMU_PAGE_FIELDS \
__field(__u64, gfn) \
__field(__u32, role) \
-   __field(__u32, root_count) \
+   __field(__u32, active_count) \
__field(bool, unsync)
 
 #define KVM_MMU_PAGE_ASSIGN(sp) \
__entry->gfn = sp->gfn;  \
__entry->role = sp->role.word;   \
-   __entry->root_count = sp->root_count;\
+   __entry->active_count = sp->active_count;\
__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({   \
@@ -37,7 +37,7 @@
 access_str[role.access],   \
 role.invalid ? " invalid" : "",\
 role.nxe ? "" : "!",   \
-__entry->root_count,   \
+__entry->active_count, \
 __entry->unsync ? "unsync" : "sync", 0);   \
ret;\
})
-- 
1.6.1.2






[PATCH v4 6/9] KVM MMU: support keeping sp live while it's out of protection

2010-05-06 Thread Xiao Guangrong
If we want to keep an sp live while it's out of kvm->mmu_lock protection,
we can increase sp->active_count.

Then an invalid page is no longer only an active root but may also be an
unsync sp; we should filter those out when we make a page unsync.

Also move 'hlist_del(&sp->hash_link)' into kvm_mmu_free_page(), so that an
invalid unsync page can be freed by calling kvm_mmu_free_page() directly.
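
As an illustration, a minimal sketch of the intended pin/unpin pattern
(the worker helper is hypothetical): pin the sp before dropping the lock,
and free it on unpin if it was zapped meanwhile.

	spin_lock(&kvm->mmu_lock);
	sp->active_count++;		/* keep sp live while unlocked */
	spin_unlock(&kvm->mmu_lock);

	work_on_sp_without_lock(sp);	/* hypothetical worker */

	spin_lock(&kvm->mmu_lock);
	if (!--sp->active_count && sp->role.invalid)
		kvm_mmu_free_page(kvm, sp);	/* zapped while we were out */
	spin_unlock(&kvm->mmu_lock);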

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58cf0f1..8ab1a49 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -894,6 +894,7 @@ static int is_empty_shadow_page(u64 *spt)
 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp->spt));
+   hlist_del(&sp->hash_link);
list_del(&sp->link);
__free_page(virt_to_page(sp->spt));
__free_page(virt_to_page(sp->gfns));
@@ -1539,13 +1540,14 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
unaccount_shadowed(kvm, sp->gfn);
if (sp->unsync)
kvm_unlink_unsync_page(kvm, sp);
-   if (!sp->active_count) {
-   hlist_del(&sp->hash_link);
+   if (!sp->active_count)
kvm_mmu_free_page(kvm, sp);
-   } else {
+   else {
sp->role.invalid = 1;
list_move(&sp->link, &kvm->arch.active_mmu_pages);
-   kvm_reload_remote_mmus(kvm);
+   /* No need to reload the mmu if the zapped page is an unsync page */
+   if (sp->role.level != PT_PAGE_TABLE_LEVEL)
+   kvm_reload_remote_mmus(kvm);
}
kvm_mmu_reset_last_pte_updated(kvm);
return ret;
@@ -1781,7 +1783,8 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  
gfn_t gfn)
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
 
hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
-   if (s->gfn != gfn || s->role.direct || s->unsync)
+   if (s->gfn != gfn || s->role.direct || s->unsync ||
+ s->role.invalid)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
__kvm_unsync_page(vcpu, s);
@@ -1806,7 +1809,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, 
gfn_t gfn,
if (s->role.level != PT_PAGE_TABLE_LEVEL)
return 1;
 
-   if (!need_unsync && !s->unsync) {
+   if (!need_unsync && !s->unsync && !s->role.invalid) {
if (!can_unsync || !oos_shadow)
return 1;
need_unsync = true;
-- 
1.6.1.2








[PATCH v4 7/9] KVM MMU: separate invlpg code form kvm_mmu_pte_write()

2010-05-06 Thread Xiao Guangrong
Make invlpg not depend on the kvm_mmu_pte_write() path; a later patch
will need this.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   22 +-
 arch/x86/kvm/paging_tmpl.h |   44 +++-
 2 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8ab1a49..5e32751 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2318,6 +2318,10 @@ static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 
gpte, int level)
return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
+static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+ u64 gpte);
+static void mmu_release_page_from_pte_write(struct kvm_vcpu *vcpu);
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
@@ -2631,6 +2635,14 @@ static void mmu_guess_page_from_pte_write(struct 
kvm_vcpu *vcpu, gpa_t gpa,
vcpu->arch.update_pte.pfn = pfn;
 }
 
+static void mmu_release_page_from_pte_write(struct kvm_vcpu *vcpu)
+{
+   if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
+   kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
+   vcpu->arch.update_pte.pfn = bad_pfn;
+   }
+}
+
 static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
u64 *spte = vcpu->arch.last_pte_updated;
@@ -2663,12 +2675,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int flooded = 0;
int npte;
int r;
-   int invlpg_counter;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-   invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
-
/*
 * Assume that the pte write on a page table of the same type
 * as the current vcpu paging mode.  This is nearly always true
@@ -2701,8 +2710,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 
mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
-   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
-   gentry = 0;
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
@@ -2779,10 +2786,7 @@ restart:
}
kvm_mmu_audit(vcpu, "post pte write");
spin_unlock(&vcpu->kvm->mmu_lock);
-   if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
-   kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
-   vcpu->arch.update_pte.pfn = bad_pfn;
-   }
+   mmu_release_page_from_pte_write(vcpu);
 }
 
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 89d66ca..93ee2d9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -462,11 +462,11 @@ out_unlock:
 
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
+   struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator iterator;
-   gpa_t pte_gpa = -1;
-   int level;
-   u64 *sptep;
-   int need_flush = 0;
+   gfn_t gfn = -1;
+   u64 *sptep = NULL, gentry;
+   int invlpg_counter, level, offset = 0, need_flush = 0;
 
spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -475,15 +475,14 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
gva)
sptep = iterator.sptep;
 
if (is_last_spte(*sptep, level)) {
-   struct kvm_mmu_page *sp = page_header(__pa(sptep));
-   int offset, shift;
+   int shift;
 
+   sp = page_header(__pa(sptep));
shift = PAGE_SHIFT -
  (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
+   gfn = sp->gfn;
offset = sp->role.quadrant << shift;
-
-   pte_gpa = (sp->gfn << PAGE_SHIFT) + offset;
-   pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
+   offset += (sptep - sp->spt) * sizeof(pt_element_t);
 
if (is_shadow_present_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
@@ -492,6 +491,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
need_flush = 1;
}
__set_spte(sptep, shadow_trap_nonpresent_pte);
+   sp->active_count++;
break;
}
 
@@ -502,16 +502,34 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
gva)
if (need_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
 
-   atomic_inc(&vcpu->kvm->arch.invlpg_counter);
-
+   invlpg_counter = atomic_add_return(1, &vcpu->kvm->arch.invlpg_counter);
spin_unlock(&vcpu->kvm->mmu_lock);
 
-   if (pte_gpa == -1)
+   if (gfn == -1)
   

[PATCH v4 8/9] KVM MMU: no need atomic operation for 'invlpg_counter'

2010-05-06 Thread Xiao Guangrong
'invlpg_counter' is protected by 'kvm->mmu_lock', so atomic operations
are not needed anymore
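
As a sketch of the pattern this relies on -- the increment and the later
comparison both run under mmu_lock, so a plain integer is race-free (the
update helper below is hypothetical):

	unsigned int invlpg_counter;

	spin_lock(&vcpu->kvm->mmu_lock);
	invlpg_counter = ++vcpu->kvm->arch.invlpg_counter;
	spin_unlock(&vcpu->kvm->mmu_lock);

	/* ... copy the guest pte with the lock dropped ... */

	spin_lock(&vcpu->kvm->mmu_lock);
	if (vcpu->kvm->arch.invlpg_counter == invlpg_counter)
		update_the_spte(vcpu);	/* no invlpg raced with us */
	spin_unlock(&vcpu->kvm->mmu_lock);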

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/paging_tmpl.h  |7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 86a8550..f4e8973 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -387,7 +387,7 @@ struct kvm_arch {
unsigned int n_free_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_alloc_mmu_pages;
-   atomic_t invlpg_counter;
+   unsigned int invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 93ee2d9..ceaac55 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -466,7 +466,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
struct kvm_shadow_walk_iterator iterator;
gfn_t gfn = -1;
u64 *sptep = NULL, gentry;
-   int invlpg_counter, level, offset = 0, need_flush = 0;
+   unsigned int invlpg_counter;
+   int level, offset = 0, need_flush = 0;
 
spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -502,7 +503,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (need_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
 
-   invlpg_counter = atomic_add_return(1, &vcpu->kvm->arch.invlpg_counter);
+   invlpg_counter = ++vcpu->kvm->arch.invlpg_counter;
spin_unlock(&vcpu->kvm->mmu_lock);
 
if (gfn == -1)
@@ -522,7 +523,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
kvm_mmu_free_page(vcpu->kvm, sp);
goto unlock_exit;
}
-   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) == invlpg_counter &&
+   if (vcpu->kvm->arch.invlpg_counter == invlpg_counter &&
sp->role.level == PT_PAGE_TABLE_LEVEL) {
++vcpu->kvm->stat.mmu_pte_updated;
FNAME(update_pte)(vcpu, sp, sptep, &gentry);
-- 
1.6.1.2







[PATCH v4 9/9] KVM MMU: optimize sync/update unsync-page

2010-05-06 Thread Xiao Guangrong
invlpg only needs to update unsync pages; sp->unsync and sp->unsync_children
help us find them

Now a gfn may have many shadow pages: when one sp needs to be synced, we
write-protect sp->gfn and sync that sp, but keep the other shadow pages
asynchronous

So when a page fault happens on the gfn, it does not touch unsync pages;
unsync pages are only updated at invlpg/TLB-flush time
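
A simplified sketch of the resulting invlpg walk (update_unsync_spte() is
a hypothetical helper); the unsync markers let the walk prune branches
that cannot contain stale sptes:

	struct kvm_shadow_walk_iterator iterator;
	struct kvm_mmu_page *sp;

	for_each_shadow_entry(vcpu, gva, iterator) {
		sp = page_header(__pa(iterator.sptep));
		if (is_last_spte(*iterator.sptep, iterator.level)) {
			if (sp->unsync)
				update_unsync_spte(vcpu, sp, iterator.sptep);
			break;
		}
		if (!sp->unsync_children)
			break;	/* nothing stale below this point */
	}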

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |3 ++-
 arch/x86/kvm/paging_tmpl.h |   12 
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e32751..7ea551c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2731,7 +2731,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 
 restart:
hlist_for_each_entry_safe(sp, node, n, bucket, hash_link) {
-   if (sp->gfn != gfn || sp->role.direct || sp->role.invalid)
+   if (sp->gfn != gfn || sp->role.direct || sp->role.invalid ||
+ sp->unsync)
continue;
pte_size = sp->role.cr4_pae ? 8 : 4;
misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index ceaac55..5687c0e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,10 +475,15 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
gva)
level = iterator.level;
sptep = iterator.sptep;
 
+   sp = page_header(__pa(sptep));
if (is_last_spte(*sptep, level)) {
int shift;
 
-   sp = page_header(__pa(sptep));
+   if (!sp->unsync)
+   break;
+
+   WARN_ON(level != PT_PAGE_TABLE_LEVEL);
+
shift = PAGE_SHIFT -
  (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
gfn = sp->gfn;
@@ -496,7 +501,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
break;
}
 
-   if (!is_shadow_present_pte(*sptep))
+   if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
break;
}
 
@@ -523,8 +528,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
kvm_mmu_free_page(vcpu->kvm, sp);
goto unlock_exit;
}
-   if (vcpu->kvm->arch.invlpg_counter == invlpg_counter &&
-   sp->role.level == PT_PAGE_TABLE_LEVEL) {
+   if (vcpu->kvm->arch.invlpg_counter == invlpg_counter) {
++vcpu->kvm->stat.mmu_pte_updated;
FNAME(update_pte)(vcpu, sp, sptep, &gentry);
}
-- 
1.6.1.2







Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-06 Thread Roedel, Joerg
On Wed, May 05, 2010 at 04:57:00PM -0400, Przywara, Andre wrote:

> If I understood this correctly, there is a bug somewhere, maybe even in 
> KVM's nested SVM implementation. Xen is fine with this bit-set provoking 
> a #GP. I haven't had time yet to further investigate this, though.

Ok, I looked at this again and reproduced the traces I already deleted
and fetched the Xen crash message and found something I missed before.
The relevant part of the KVM trace is:

 qemu-system-x86-7364  [012]   790.715351: kvm_exit: reason msr rip 
0x82c4801b5c93
 qemu-system-x86-7364  [012]   790.715352: kvm_msr: msr_write c080 = 0x3d01
 qemu-system-x86-7364  [012]   790.715354: kvm_inj_exception: #GP (0x0)

And the Xen-Crash message is:

(XEN) Xen call trace:
(XEN)[] svm_cpu_up+0x135/0x200
(XEN)[] start_svm+0x3c/0xe0
(XEN)[] identify_cpu+0xd2/0x240
(XEN)[] __start_xen+0x1dbb/0x3660
(XEN)[] __high_start+0xa1/0xa3
(XEN)
(XEN) 
(XEN) 
(XEN) Panic on CPU 0:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=]
(XEN) 

The MSR write happens on rip 0x82c4801b5c93 while the #GP is
injected at rip 82c4801b5c95 (== right after the wrmsr instruction).
So yes, there is another bug in KVM here. The problem is that the
set_efer function does not report write errors to its caller and injects
the #GP directly. The svm:wrmsr_interception recognizes a success and
advances the rip.
The attached patch fixes this.

From e0d69cf7a396d35ae9aa4778e87f82c243bfa0ae Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Thu, 6 May 2010 11:07:46 +0200
Subject: [PATCH] KVM: X86: Inject #GP with the right rip on efer writes

This patch fixes a bug in the KVM efer-msr write path. If a
guest writes to a reserved efer bit the set_efer function
injects the #GP directly. The architecture dependent wrmsr
function does not see this, assumes success and advances the
rip. This results in a #GP in the guest with the wrong rip.
This patch fixes this by reporting efer write errors back to
the architectural wrmsr function.
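
For illustration, a rough sketch of the consumer side this enables,
modeled on svm's wrmsr_interception (kvm_read_edx_eax() is an assumed
helper here): the rip is only advanced when the write succeeded, so the
injected #GP points at the wrmsr itself.

	static int wrmsr_interception(struct vcpu_svm *svm)
	{
		u32 ecx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
		u64 data = kvm_read_edx_eax(&svm->vcpu);	/* assumed helper */

		svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
		if (svm_set_msr(&svm->vcpu, ecx, data))
			kvm_inject_gp(&svm->vcpu, 0);	/* rip not advanced */
		else
			skip_emulated_instruction(&svm->vcpu);
		return 1;
	}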

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/x86.c |   31 ---
 1 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c83528e..5bd7b30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -683,37 +683,29 @@ static u32 emulated_msrs[] = {
MSR_IA32_MISC_ENABLE,
 };
 
-static void set_efer(struct kvm_vcpu *vcpu, u64 efer)
+static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
-   if (efer & efer_reserved_bits) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if (efer & efer_reserved_bits)
+   return 1;
 
if (is_paging(vcpu)
-   && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME)) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
+   return 1;
 
if (efer & EFER_FFXSR) {
struct kvm_cpuid_entry2 *feat;
 
feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
-   if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT))) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if (!feat || !(feat->edx & bit(X86_FEATURE_FXSR_OPT)))
+   return 1;
}
 
if (efer & EFER_SVME) {
struct kvm_cpuid_entry2 *feat;
 
feat = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
-   if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM))) {
-   kvm_inject_gp(vcpu, 0);
-   return;
-   }
+   if (!feat || !(feat->ecx & bit(X86_FEATURE_SVM)))
+   return 1;
}
 
kvm_x86_ops->set_efer(vcpu, efer);
@@ -725,6 +717,8 @@ static void set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
vcpu->arch.mmu.base_role.nxe = (efer & EFER_NX) && !tdp_enabled;
kvm_mmu_reset_context(vcpu);
+
+   return 0;
 }
 
 void kvm_enable_efer_bits(u64 mask)
@@ -1145,8 +1139,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 data)
 {
switch (msr) {
case MSR_EFER:
-   set_efer(vcpu, data);
-   break;
+   return set_efer(vcpu, data);
case MSR_K7_HWCR:
data &= ~(u64)0x40; /* ignore flush filter disable */
data &= ~(u64)0x100;/* ignore ignne emulation enable */
-- 
1.7.1





Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Avi Kivity

On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:
> +	/* We publish the last-seen used index at the end of the available ring.
> +	 * It is at the end for backwards compatibility. */
> +	vr->last_used_idx = &(vr)->avail->ring[num];
> +	/* Verify that last used index does not spill over the used ring. */
> +	BUG_ON((void *)vr->last_used_idx +
> +	       sizeof *vr->last_used_idx > (void *)vr->used);
>  }

Shouldn't this be on its own cache line?

One way to achieve this is to allocate it at the end.
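
For instance, a rough sketch of that, assuming the ring allocation
reserves the extra space: round the publish location up to the next
cache line instead of packing it right after avail->ring[].

	vr->last_used_idx = (__u16 *)ALIGN((unsigned long)&vr->avail->ring[num],
					   SMP_CACHE_BYTES);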

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Gleb Natapov
On Thu, May 06, 2010 at 12:15:58PM +0300, Avi Kivity wrote:
> On 04/29/2010 02:58 PM, Gleb Natapov wrote:
> >Do not kill VM If instruction emulation fails while vcpu is in
> >userspace. Inject #UD instead in a hope that guest OS will kill offending
> >process. Emulation failure is still traced by ftrace point to help
> >analyze the problem.
> 
> Still there's the risk here that a critical failure goes unnoticed.
> ftrace isn't on at all times.
> 
Kvm_stat will still show that there was emulation failure, so if strange
application behaviour is reported kvm_stat output will have hints where
to look. Next step in analyzing the problem will be enabling emulator
tracing.

> We can probably inject a #UD unconditionally and exit to userspace.
> Userspace would then report the problem to the user and reenter the
> guest, which would then recover or not.
> 
By "unconditionally" you mean even if guest is in kernel mode? There is
no point in trying to continue after that happens. Instead of getting
paused VM at exact place where problem happened and easily analyzable we
will get misbehaved VM with undefined state.

--
Gleb.


Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Avi Kivity

On 05/06/2010 01:06 PM, Gleb Natapov wrote:
> On Thu, May 06, 2010 at 12:15:58PM +0300, Avi Kivity wrote:
>> On 04/29/2010 02:58 PM, Gleb Natapov wrote:
>>> Do not kill VM If instruction emulation fails while vcpu is in
>>> userspace. Inject #UD instead in a hope that guest OS will kill offending
>>> process. Emulation failure is still traced by ftrace point to help
>>> analyze the problem.
>> Still there's the risk here that a critical failure goes unnoticed.
>> ftrace isn't on at all times.
> Kvm_stat will still show that there was emulation failure, so if strange
> application behaviour is reported kvm_stat output will have hints where
> to look. Next step in analyzing the problem will be enabling emulator
> tracing.

We can expect that from a developer or a user subscribed to k...@.  But
what about some random user running virt-manager?

dmesg and kvm_stat will not go up the management stack.

>> We can probably inject a #UD unconditionally and exit to userspace.
>> Userspace would then report the problem to the user and reenter the
>> guest, which would then recover or not.
> By "unconditionally" you mean even if guest is in kernel mode?

Yes.

> There is
> no point in trying to continue after that happens. Instead of getting
> paused VM at exact place where problem happened and easily analyzable we
> will get misbehaved VM with undefined state.

True.  But the same problem exists with cpl>0 #UD.  It may be a critical
driver in userspace (say, video driver).

Also need to consider nested kernels (which are userspace for this
purpose).

How about default to unconditional #UD and report, and pause if
requested (in userspace)?  Usually emulation failures will be 100%
reproducible, so the user can rerun their workload.


--
error compiling committee.c: too many arguments to function



Re: [patch uq/master 5/9] kvm: synchronize state from cpu context

2010-05-06 Thread Avi Kivity

On 05/04/2010 03:45 PM, Marcelo Tosatti wrote:
> From: Jan Kiszka
>
> It is not safe to retrieve the KVM internal state of a given cpu
> while it's potentially modifying it.
>
> Queue the request to run on cpu context, similarly to qemu-kvm.

Even better is to query the state in the cpu thread; but that can come
later.


--
error compiling committee.c: too many arguments to function



Re: [patch uq/master 0/9] enable smp > 1 and related fixes

2010-05-06 Thread Avi Kivity

On 05/05/2010 09:24 PM, Anthony Liguori wrote:
> On 05/04/2010 07:45 AM, Marcelo Tosatti wrote:
>
> How does this work without an in-kernel apic (or does uq/master
> already have an in-kernel apic)?

An in-kernel apic isn't needed (qemu-kvm supports smp with userspace apic).

--
error compiling committee.c: too many arguments to function



Re: [patch uq/master 0/9] enable smp > 1 and related fixes

2010-05-06 Thread Avi Kivity

On 05/04/2010 03:45 PM, Marcelo Tosatti wrote:

Applied all, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-06 Thread Avi Kivity

On 05/06/2010 12:38 PM, Roedel, Joerg wrote:
> Subject: [PATCH] KVM: X86: Inject #GP with the right rip on efer writes
> This patch fixes a bug in the KVM efer-msr write path. If a
> guest writes to a reserved efer bit the set_efer function
> injects the #GP directly. The architecture dependent wrmsr
> function does not see this, assumes success and advances the
> rip. This results in a #GP in the guest with the wrong rip.
> This patch fixes this by reporting efer write errors back to
> the architectural wrmsr function.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Gleb Natapov
On Thu, May 06, 2010 at 01:13:30PM +0300, Avi Kivity wrote:
> On 05/06/2010 01:06 PM, Gleb Natapov wrote:
> >On Thu, May 06, 2010 at 12:15:58PM +0300, Avi Kivity wrote:
> >>On 04/29/2010 02:58 PM, Gleb Natapov wrote:
> >>>Do not kill VM If instruction emulation fails while vcpu is in
> >>>userspace. Inject #UD instead in a hope that guest OS will kill offending
> >>>process. Emulation failure is still traced by ftrace point to help
> >>>analyze the problem.
> >>Still there's the risk here that a critical failure goes unnoticed.
> >>ftrace isn't on at all times.
> >>
> >Kvm_stat will still show that there was emulation failure, so if strange
> >application behaviour is reported kvm_stat output will have hints where
> >to look. Next step in analyzing the problem will be enabling emulator
> >tracing.
> 
> We can expect that from a developer or a user subscribed to k...@.
> But what about some random user running virt-manager?
> 
What would virt-manager do about such an error?

> dmesg and kvm_stat will not go up the management stack.
> 
> >>We can probably inject a #UD unconditionally and exit to userspace.
> >>Userspace would then report the problem to the user and reenter the
> >>guest, which would then recover or not.
> >>
> >By "unconditionally" you mean even if guest is in kernel mode?
> 
> Yes.
> 
> >There is
> >no point in trying to continue after that happens. Instead of getting
> >paused VM at exact place where problem happened and easily analyzable we
> >will get misbehaved VM with undefined state.
> 
> True.  But the same problem exists with cpl>0 #UD.  It may be a
> critical driver in userspace (say, video driver).
> 
> Also need to consider nested kernels (which are userspace for
> this purpose).
Ugh, we can check if vcpu is in nested mode.

> 
> How about default to unconditional #UD and report, and pause if
> requested (in userspace)?  Usually emulation failures will be 100%
> reproducible, so the user can rerun their workload.
> 
Didn't want to involve userspace in this, but it can be done of course.
 
--
Gleb.


virtio-win problem

2010-05-06 Thread Riccardo Veraldi

Hello,
if I install the virtio-win drivers on Windows 2008 Server R2, I have the
problem of signed device drivers.
I can install the drivers, but Windows 2008 Server refuses to use them
unless I start the machine pressing F8 at every reboot to bypass the
check for signed, certified drivers, and this is annoying,
since I cannot reboot the virtual machine automatically.

Anyone solved this issue ?
thanks

Rick



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Avi Kivity

On 05/06/2010 03:25 PM, Gleb Natapov wrote:
>> We can expect that from a developer or a user subscribed to k...@.
>> But what about some random user running virt-manager?
> What would virt-manager do about such an error?

Call up abrt.

>> True.  But the same problem exists with cpl>0 #UD.  It may be a
>> critical driver in userspace (say, video driver).
>>
>> Also need to consider nested kernels (which are userspace for
>> this purpose).
> Ugh, we can check if vcpu is in nested mode.

And do what?  Inject #UD to the guest?  Or force some vmexit?

>> How about default to unconditional #UD and report, and pause if
>> requested (in userspace)?  Usually emulation failures will be 100%
>> reproducible, so the user can rerun their workload.
> Didn't want to involve userspace in this, but it can be done of course.

Whenever we have to make a decision, we involve userspace.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Gleb Natapov
On Thu, May 06, 2010 at 03:33:12PM +0300, Avi Kivity wrote:
> On 05/06/2010 03:25 PM, Gleb Natapov wrote:
> >
> >>We can expect that from a developer or a user subscribed to k...@.
> >>But what about some random user running virt-manager?
> >>
> >What would virt-manager do about such an error?
> 
> Call up abrt.
> 
The idea is not to let a userspace process running in a VM kill the VM.

> >>True.  But the same problem exists with cpl>0 #UD.  It may be a
> >>critical driver in userspace (say, video driver).
> >>
> >>Also need to think consider nested kernels (which are userspace for
> >>this purpose).
> >Ugh, we can check if vcpu is in nested mode.
> 
> And do what?  Inject #UD to the guest?  Or force some vmexit?
> 
Will the host emulator ever run on behalf of a nested guest? We have
an emulator inside the nested guest for this.

> >>How about default to unconditional #UD and report, and pause if
> >>requested (in userspace)?  Usually emulation failures will be 100%
> >>reproducible, so the user can rerun their workload.
> >>
> >Didn't want to involve userspace in this, but it can be done of course.
> 
> Whenever we have to make a decision, we involve userspace.
> 
> -- 
> error compiling committee.c: too many arguments to function

--
Gleb.



Re: [PATCH RFC] KVM: inject #UD if instruction emulation fails while vcpu is in cpl==3

2010-05-06 Thread Avi Kivity

On 05/06/2010 03:41 PM, Gleb Natapov wrote:
> On Thu, May 06, 2010 at 03:33:12PM +0300, Avi Kivity wrote:
>> On 05/06/2010 03:25 PM, Gleb Natapov wrote:
>>>> We can expect that from a developer or a user subscribed to k...@.
>>>> But what about some random user running virt-manager?
>>> What would virt-manager do about such an error?
>> Call up abrt.
> The idea is not to let a userspace process running in a VM kill the VM.

Well, log the problem (including registers and instruction code), and
continue with the #UD.

>>>> True.  But the same problem exists with cpl>0 #UD.  It may be a
>>>> critical driver in userspace (say, video driver).
>>>>
>>>> Also need to consider nested kernels (which are userspace for
>>>> this purpose).
>>> Ugh, we can check if vcpu is in nested mode.
>> And do what?  Inject #UD to the guest?  Or force some vmexit?
> Will the host emulator ever run on behalf of a nested guest? We have
> an emulator inside the nested guest for this.

If the guest doesn't map the page, it will emulate.  If it does map the
page, and the host doesn't, the host emulates.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/3 v2] KVM: VMX code cleanup and preparation.

2010-05-06 Thread Avi Kivity

On 05/06/2010 11:45 AM, Xu, Dongxiao wrote:
> From: Dongxiao Xu

Changelog needs to describe what's being done and why.  Splitting the
changes will help.


--
error compiling committee.c: too many arguments to function



[PATCH] IOzone preprocessing: Fix wrong column mapping on graph generation

2010-05-06 Thread Lucas Meneghel Rodrigues
Fix a silly bug on graph generation: it was mapping the wrong
columns when plotting the 2D throughput graphs. Sorry for the
mistake.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/iozone/postprocessing.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/iozone/postprocessing.py 
b/client/tests/iozone/postprocessing.py
index c995aea..3a77c83 100755
--- a/client/tests/iozone/postprocessing.py
+++ b/client/tests/iozone/postprocessing.py
@@ -384,7 +384,7 @@ class IOzonePlotter(object):
 record size vs. throughput.
 """
 datasource_2d = os.path.join(self.output_dir, '2d-datasource-file')
-for index, label in zip(range(1, 14), _LABELS[2:]):
+for index, label in zip(range(2, 15), _LABELS[2:]):
 commands_path = os.path.join(self.output_dir, '2d-%s.do' % label)
 commands = ""
 commands += "set title 'Iozone performance: %s'\n" % label
-- 
1.7.0.1



Re: [PATCH 2/3 v2] KVM: VMX: VMCLEAR/VMPTRLD usage changes.

2010-05-06 Thread Avi Kivity

On 05/06/2010 11:45 AM, Xu, Dongxiao wrote:
> From: Dongxiao Xu
>
> Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
> support hosted VMM coexistence, VMCLEAR is executed on vcpu
> schedule out, and VMPTRLD is executed on vcpu schedule in.
> This could also eliminate the IPI when doing VMCLEAR.
>
> +static int __read_mostly vmm_coexistence;
> +module_param(vmm_coexistence, bool, S_IRUGO);

I think we can call it 'exclusive' defaulting to true.

> +
>  #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \
> 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
>  #define KVM_GUEST_CR0_MASK	\
> @@ -222,6 +225,7 @@ static u64 host_efer;
>
>  static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
>
> +static void vmx_flush_tlb(struct kvm_vcpu *vcpu);
>  /*
>   * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
>   * away by decrementing the array size.
> @@ -784,25 +788,31 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
> 	u64 tsc_this, delta, new_offset;
>
> -	if (vcpu->cpu != cpu) {
> -		vcpu_clear(vmx);
> -		kvm_migrate_timers(vcpu);
> +	if (vmm_coexistence) {
> +		vmcs_load(vmx->vmcs);
> 		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
> -		local_irq_disable();
> -		list_add(&vmx->local_vcpus_link,
> -			 &per_cpu(vcpus_on_cpu, cpu));
> -		local_irq_enable();
> -	}
> +	} else {
> +		if (vcpu->cpu != cpu) {
> +			vcpu_clear(vmx);
> +			set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
> +			local_irq_disable();
> +			list_add(&vmx->local_vcpus_link,
> +				 &per_cpu(vcpus_on_cpu, cpu));
> +			local_irq_enable();
> +		}
>
> -	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
> -		per_cpu(current_vmcs, cpu) = vmx->vmcs;
> -		vmcs_load(vmx->vmcs);
> +		if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
> +			per_cpu(current_vmcs, cpu) = vmx->vmcs;
> +			vmcs_load(vmx->vmcs);
> +		}
> 	}

Please keep the exclusive use code as it is, and just add !exclusive
cases.  For example, the current_vmcs test will always fail since
current_vmcs will be set to NULL on vmx_vcpu_put, so we can have just
one vmcs_load().

> @@ -829,7 +839,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> 	__vmx_load_host_state(to_vmx(vcpu));
> +	if (vmm_coexistence) {
> +		vmcs_clear(vmx->vmcs);
> +		rdtscll(vmx->vcpu.arch.host_tsc);
> +		vmx->launched = 0;

This code is also duplicated.  Please refactor to avoid duplication.

> +	}
>  }
>
>  static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
> @@ -1280,7 +1297,8 @@ static void kvm_cpu_vmxoff(void)
>
>  static void hardware_disable(void *garbage)
>  {
> -	vmclear_local_vcpus();
> +	if (!vmm_coexistence)
> +		vmclear_local_vcpus();
> 	kvm_cpu_vmxoff();
> 	write_cr4(read_cr4() & ~X86_CR4_VMXE);
>  }
> @@ -3927,7 +3945,8 @@ static void vmx_free_vmcs(struct kvm_vcpu *vcpu)
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> 	if (vmx->vmcs) {
> -		vcpu_clear(vmx);
> +		if (!vmm_coexistence)
> +			vcpu_clear(vmx);

What's wrong with doing this unconditionally?

> 		free_vmcs(vmx->vmcs);
> 		vmx->vmcs = NULL;
> 	}
> @@ -3969,13 +3988,20 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> 	if (!vmx->vmcs)
> 		goto free_msrs;
>
> -	vmcs_clear(vmx->vmcs);
> -
> -	cpu = get_cpu();
> -	vmx_vcpu_load(&vmx->vcpu, cpu);
> -	err = vmx_vcpu_setup(vmx);
> -	vmx_vcpu_put(&vmx->vcpu);
> -	put_cpu();
> +	if (vmm_coexistence) {
> +		preempt_disable();
> +		vmcs_load(vmx->vmcs);
> +		err = vmx_vcpu_setup(vmx);
> +		vmcs_clear(vmx->vmcs);
> +		preempt_enable();

Why won't the normal sequence work?

> +	} else {
> +		vmcs_clear(vmx->vmcs);
> +		cpu = get_cpu();
> +		vmx_vcpu_load(&vmx->vcpu, cpu);
> +		err = vmx_vcpu_setup(vmx);
> +		vmx_vcpu_put(&vmx->vcpu);
> +		put_cpu();
> +	}
> 	if (err)
> 		goto free_vmcs;
> 	if (vm_need_virtualize_apic_accesses(kvm))
> @@ -4116,6 +4142,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>
> 	vmx->rdtscp_enabled = false;
> 	if (vmx_rdtscp_supported()) {
> +		if (vmm_coexistence)
> +			vmcs_load(vmx->vmcs);

Don't understand why this is needed.  Shouldn't vmx_vcpu_load() load the
vmptr?

Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-06 Thread Arnd Bergmann
On Wednesday 05 May 2010, Takuya Yoshikawa wrote:
> > That's why the bitmaps are defined as little endian u64 aligned, even on 
> > big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
> > and u64 alignment ensures we can use long-sized bitops on mixed size 
> > systems.

> Ok, I see.

> There was a suggestion to propose set_le_bit_user() kind of macros.
> But what I thought was these have a constraint you two explained and seemed 
> to be
> a little bit specific to some area, like KVM.
> 
> So I decided to propose just the offset calculation macro.

I'm not sure I understand how this macro is going to be used though.
If you are just using this in kernel space, that's fine, please go for
it.

However, if the intention is to use the same macro in user space, putting
it into asm-generic/bitops/* is not going to help, because those headers
are not available in user space, and I wouldn't want to change that.

The definition of the macro is not part of the ABI, so just duplicate
it in KVM if you need it there.
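
For reference, one plausible shape for such a duplicated macro (a sketch,
not the actual proposal): it converts a little-endian bitmap bit number
into the bit number the native long-sized bitops expect.

	#if defined(__BIG_ENDIAN)
	#define kvm_le_bit_offset(nr)	((nr) ^ ((BITS_PER_LONG - 1) & ~0x7))
	#else
	#define kvm_le_bit_offset(nr)	(nr)
	#endif

With the u64-aligned little-endian layout, set_bit(kvm_le_bit_offset(n),
bitmap) then touches the same storage bit on either endianness.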

Arnd


Re: [Autotest] [PATCH] IOzone preprocessing: Fix wrong column mapping on graph generation

2010-05-06 Thread Martin Bligh
LGTM

On Thu, May 6, 2010 at 6:24 AM, Lucas Meneghel Rodrigues  
wrote:
> Fix a silly bug on graph generation: it was mapping the wrong
> columns when plotting the 2D throughput graphs. Sorry for the
> mistake.
>
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  client/tests/iozone/postprocessing.py |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/client/tests/iozone/postprocessing.py 
> b/client/tests/iozone/postprocessing.py
> index c995aea..3a77c83 100755
> --- a/client/tests/iozone/postprocessing.py
> +++ b/client/tests/iozone/postprocessing.py
> @@ -384,7 +384,7 @@ class IOzonePlotter(object):
>         record size vs. throughput.
>         """
>         datasource_2d = os.path.join(self.output_dir, '2d-datasource-file')
> -        for index, label in zip(range(1, 14), _LABELS[2:]):
> +        for index, label in zip(range(2, 15), _LABELS[2:]):
>             commands_path = os.path.join(self.output_dir, '2d-%s.do' % label)
>             commands = ""
>             commands += "set title 'Iozone performance: %s'\n" % label
> --
> 1.7.0.1
>


RE: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest

2010-05-06 Thread Cui, Dexuan
Avi Kivity wrote:
> On 05/06/2010 07:23 AM, Cui, Dexuan wrote:
>> 
 +  goto err;
 +  vcpu->arch.guest_xstate_mask = new_bv;
 +  xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.guest_xstate_mask);
 
 
>>> Can't we run with the host xcr0?  isn't it guaranteed to be a
>>> superset of the guest xcr0? 
>>> 
>> I agree host xcr0 is a superset of guest xcr0.
>> In the case guest xcr0 has less bits set  than host xcr0, I suppose
>> running with guest xcr0 would be a bit faster.
> 
> However, switching xcr0 may be slow.  That's our experience with msrs.
> Can you measure its latency?
We can measure that.
However, I think changing xcr0 to the guest xcr0 in handle_xsetbv() is
necessary -- otherwise xgetbv() inside the guest would return the host
xcr0 rather than the guest xcr0, which is obviously incorrect. Once
handle_xsetbv() changes xcr0 to the guest's value, the xsetbv() in
kvm_fx_restore_host() is also necessary, and so is the xsetbv() in
kvm_fx_restore_guest(). So it looks like the guest can't run with the
host xcr0.
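
A minimal sketch of that switching, reusing the names from this thread
(guest_xstate_mask, kvm_fx_restore_guest/host); 'host_xstate_mask' is an
assumed variable caching the host xcr0, and the comparison also gives the
"skip when they match" shortcut:

	static void kvm_fx_restore_guest(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.guest_xstate_mask != host_xstate_mask)
			xsetbv(XCR_XFEATURE_ENABLED_MASK,
			       vcpu->arch.guest_xstate_mask);
	}

	static void kvm_fx_restore_host(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.guest_xstate_mask != host_xstate_mask)
			xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xstate_mask);
	}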

 
> Can also be avoided if guest and host xcr0 match.
> 
>> If you think simplying the code by removing the field
>> guest_xstate_mask is more important, we can do it.
>> 
> 
> Well we need guest_xstate_mask, it's the guest's xcr0, no?
I misread it. Yes, we need guest_xstate_mask.

> 
> btw, it needs save/restore for live migration, as well as save/restore
> for the new fpu state.
Yes. This part is missing. Sheng and I are also working on this -- it may be a
little troublesome as the number of XSTATEs can grow as time goes on. We'll
have to handle the compatibility issue.

> 
 +  skip_emulated_instruction(vcpu);
 +  return 1;
 +err:
 +  kvm_inject_gp(vcpu, 0);
 
 
>>> Need to #UD in some circumstances.
>>> 
>> I don't think so. When the CPU doesn't support XSAVE, or the guest
>> doesn't set CR4.OSXSAVE, the guest's attempt at executing xsetbv
>> gets a #UD immediately
>> and no VM exit occurs.
>> 
> 
> Ok.
> 
 @@ -62,6 +64,7 @@
 	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE	\
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE	\
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR	\
 +			  | (cpu_has_xsave ? X86_CR4_OSXSAVE : 0)	\
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
 
 
>>> It also depends on the guest cpuid value.  Please add it outside the
>>> 
>> If cpu_has_xsave is true, we always present xsave to guest via
>> cpuid, so I 
>> think the code is correct here.
>> 
> 
> We don't pass all host cpuid values to the guest.  We expose them to
> userspace via KVM_GET_SUPPORTED_CPUID, and then userspace decides what
> cpuid to present to the guest via KVM_SET_CPUID2.  So the guest may
> run with fewer features than the host.
Sorry, I didn't notice this. Will look into this.

> 
>> I saw the 2 patches you sent. They (like the new APIs
>> fpu_alloc/fpu_free) will simplify the implementation of kvm xsave
>> support. Thanks a lot! 
>> 
> 
> Thanks.  To simplify things, please separate host xsave support
> (switching to the fpu api) and guest xsave support (cpuid, xsetbv, new
> ioctls) into separate patches.  In fact, guest xsave support is
> probably best split into patches as well.
Ok. Will try to do these.

Thanks,
-- Dexuan


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
> > Seems over-zealous.
> > If the recovery_header held a strong checksum of the recovery_data you would
> > not need the first fsync, and as long as you have two places to write 
> > recovery
> > data, you don't need the 3rd and 4th syncs.
> > Just:
> >   
> > write_internally_checksummed_recovery_data_and_header_to_unused_log_space()
> >   fsync / msync
> >   overwrite_with_new_data()
> > 
> > To recovery you choose the most recent log_space and replay the content.
> > That may be a redundant operation, but that is no loss.
> 
> I think you missed a checksum for the new data?  Otherwise we can't tell if
> the new data is completely written.

The data checksum can go in the recovery-data block.  If there's
enough slack in the log, by the time that recovery-data block is
overwritten, you can be sure that an fsync has been done for that
data (by a later commit).
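
As a hypothetical on-disk sketch of such a record (none of these names
come from TDB): the record checksum covers the header and a second
checksum covers the new data, so replay can tell how far a write got.

	#include <stdint.h>

	struct recovery_record {
		uint64_t seq;		/* newest complete record wins at replay */
		uint64_t data_off;	/* where the overwrite will land */
		uint32_t data_len;
		uint32_t data_csum;	/* strong checksum of the new data */
		uint32_t self_csum;	/* checksum of this record, field taken as 0 */
		uint8_t  payload[];	/* the recovery data itself */
	};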

> But yes, I will steal this scheme for TDB2, thanks!

Take a look at the filesystems.  I think ext4 did some optimisations
in this area, and that checksums had to be added anyway due to a
subtle replay-corruption problem that happens when the log is
partially corrupted, and followed by non-corrupt blocks.

Also, you can remove even more fsyncs by adding a bit of slack to the
data space and writing into unused/fresh areas some of the time -
i.e. a bit like btrfs/zfs or anything log-structured, but you don't
have to go all the way with that.

> In practice, it's the first sync which is glacial, the rest are pretty cheap.

The 3rd and 4th fsyncs imply a disk seek each, just because the
preceding writes are to different areas of the disk.  Seeks are quite
slow - but not as slow as ext3 fsyncs :-) What do you mean by cheap?
That it's only a couple of seeks, or that you don't see even that?

> 
> > Also cannot see the point of msync if you have already performed an fsync,
> > and if there is a point, I would expect you to call msync before
> > fsync... Maybe there is some subtlety there that I am not aware of.
> 
> I assume it's this from the msync man page:
> 
>msync()  flushes  changes  made  to the in-core copy of a file that was
>mapped into memory using mmap(2) back to disk.   Without  use  of  this
>call  there  is  no guarantee that changes are written back before mun‐
>map(2) is called. 

Historically, that means msync() ensures dirty mapping data is written
to the file as if with write(), and that mapping pages are removed or
refreshed to get the effect of read() (possibly a lazy one).  It's
more obvious in the early mmap implementations where mappings don't
share pages with the filesystem cache, so msync() has explicit
behaviour.

Like with write(), after calling msync() you would then call fsync()
to ensure the data is flushed to disk.
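
In code, the ordering described above looks like this (minimal error
handling):

	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	/* flush dirty mapped pages into the file, then commit the file */
	static void commit_mapping(void *addr, size_t len, int fd)
	{
		if (msync(addr, len, MS_SYNC) < 0)
			perror("msync");
		if (fsync(fd) < 0)
			perror("fsync");
	}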

If you've been calling fsync then msync, I guess that's another fine
example of how these function are so hard to test, that they aren't.

Historically on Linux, msync has been iffy on some architectures, and
I'm still not sure it has the same semantics as other unixes.  fsync
as we know has also been iffy, and even now that fsync is tidier it
does not always issue a hardware-level cache commit.

But then historically writable mmap has been iffy on a boatload of
unixes.

> > > It's an implementation detail; barrier has less flexibility because it has
> > > less information about what is required. I'm saying I want to give you as
> > > much information as I can, even if you don't use it yet.
> > 
> > Only we know that approach doesn't work.
> > People will learn that they don't need to give the extra information to 
> > still
> > achieve the same result - just like they did with ext3 and fsync.
> > Then when we improve the implementation to only provide the guarantees that
> > you asked for, people will complain that they are getting empty files that
> > they didn't expect.
> 
> I think that's an oversimplification: IIUC that occurred to people *not*
> using fsync().  They weren't using it because it was too slow.  Providing
> a primitive which is as fast or faster and more specific doesn't have the
> same magnitude of social issues.

I agree with Rusty.  Let's make it perform well so there is no reason
to deliberately avoid using it, and let's make say what apps actually
want to request without being way too strong.

And please, if anyone has ideas on how we could make correct use of
these functions *testable* by app authors, I'm all ears.  Right now it
is quite difficult - pulling power on hard disks mid-transaction is
not a convenient method :)

> > The abstraction I would like to see is a simple 'barrier' that contains no
> > data and has a filesystem-wide effect.
> 
> I think you lack ambition ;)
> 
> Thinking about the single-file use case (eg. kvm guest or tdb), isn't that
> suboptimal for md?  Since you have to hand your barrier to every device
> whereas a file-wide primitive may theoretically only go to some.

Yes.

Note that database-like programs s

RE: virtio-win problem

2010-05-06 Thread Martin Maurer
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Riccardo Veraldi
> Sent: Donnerstag, 06. Mai 2010 14:10
> To: kvm@vger.kernel.org
> Subject: virtio-win problem
> 
> Hello,
> if I install the virtio-win drivers on Windows 2008 Server R2, I have the
> problem of signed device drivers.
> I can install the drivers, but Windows 2008 Server refuses to use them
> unless I start the machine pressing F8 at every reboot to bypass the
> check for signed, certified drivers, and this is annoying,
> since I cannot reboot the virtual machine automatically.
> 
> Anyone solved this issue ?

Red Hat released signed drivers working on win2008r2, but there is no public
download or free use (you need a subscription; it's a special license).

Or you need to follow the howtos on the KVM wiki pages, 
http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers

Br, Martin



Re: [Autotest] [PATCH 6/9] KVM test: Raise error when met unknown type in kvm_vm.remote_login().

2010-05-06 Thread Lucas Meneghel Rodrigues
On Mon, Apr 26, 2010 at 7:04 AM, Jason Wang  wrote:
> Need to raise an error when an unknown type of shell_client is met in
> kvm_vm.remote_login() in order to avoid the traceback.

In order to keep consistency, please make the function return None
instead of throwing an exception. You might log the message as a
logging.error() record.

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_vm.py |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
> index 8f4753f..0cdf925 100755
> --- a/client/tests/kvm/kvm_vm.py
> +++ b/client/tests/kvm/kvm_vm.py
> @@ -806,7 +806,9 @@ class VM:
>         elif client == "nc":
>             session = kvm_utils.netcat(address, port, username, password,
>                                        prompt, linesep, timeout)
> -
> +        else:
> +            raise error.TestError("Unknown shell_client type %s" % client)
> +
>         if session:
>             session.set_status_test_command(self.params.get("status_test_"
>                                                             "command", ""))
>



-- 
Lucas


Re: [PATCH 1/9] KVM test: Introduce the prompt assist

2010-05-06 Thread Lucas Meneghel Rodrigues
On Thu, 2010-05-06 at 10:55 +0800, Jason Wang wrote:
> Michael Goldish wrote:
> > On 04/26/2010 01:03 PM, Jason Wang wrote:
> >   
> >> Sometimes we need to send an assist string to a session in order to
> >> get the prompt especially when re-connecting to an already logged
> >> serial session. This patch send the assist string before doing the
> >> pattern matching of remote_login.
> >> 
> >
> > Can you give an example of a prompt assist string, and a typical usage
> > example?  What guests require prompt assist strings?
> >
> >   
> It was just used by the serial console: when the first test case has
> already connected to the serial console, the second test must send
> something in order to get the prompt string. But it may be better to
> log out when the session is closed.

This has the upside of making things cleaner; I'd consider that.

> >> Signed-off-by: Jason Wang 
> >> ---
> >>  client/tests/kvm/kvm_utils.py |9 +++--
> >>  1 files changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
> >> index 25f3c8c..9adbaee 100644
> >> --- a/client/tests/kvm/kvm_utils.py
> >> +++ b/client/tests/kvm/kvm_utils.py
> >> @@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir):
> >>  # The following are functions used for SSH, SCP and Telnet communication 
> >> with
> >>  # guests.
> >>  
> >> -def remote_login(command, password, prompt, linesep="\n", timeout=10):
> >> +def remote_login(command, password, prompt, linesep="\n", timeout=10,
> >> + prompt_assist = None):
> >> 
> >  ^ ^
> > These spaces do not conform with PEP 8.
> >   
> Would change them.
> >   
> >>  """
> >>  Log into a remote host (guest) using SSH or Telnet. Run the given 
> >> command
> >>  using kvm_spawn and provide answers to the questions asked. If timeout
> >> @@ -468,7 +469,8 @@ def remote_login(command, password, prompt, 
> >> linesep="\n", timeout=10):
> >>  @param timeout: The maximal time duration (in seconds) to wait for 
> >> each
> >>  step of the login procedure (i.e. the "Are you sure" prompt, 
> >> the
> >>  password prompt, the shell prompt, etc)
> >> -
> >> +@prarm prompt_assist: An assistant string sent before the pattern
> >> 
> >
> > Typo^
> >
> >   
> >> +matching in order to get the prompt for some kinds of 
> >> shell_client.
> >>  @return Return the kvm_spawn object on success and None on failure.
> >>  """
> >>  sub = kvm_subprocess.kvm_shell_session(command,
> >> @@ -479,6 +481,9 @@ def remote_login(command, password, prompt, 
> >> linesep="\n", timeout=10):
> >>  
> >>  logging.debug("Trying to login with command '%s'" % command)
> >>  
> >> +if prompt_assist is not None:
> >> +sub.sendline(prompt_assist)
> >> +
> >>  while True:
> >>  (match, text) = sub.read_until_last_line_matches(
> >>  [r"[Aa]re you sure", r"[Pp]assword:\s*$", 
> >> r"^\s*[Ll]ogin:\s*$",
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> >
> >   
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
> On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
> > Jens Axboe wrote:
> > > On Tue, May 04 2010, Rusty Russell wrote:
> > > > ISTR someone mentioning a desire for such an API years ago, so CC'ing 
> > > > the
> > > > usual I/O suspects...
> > > 
> > > It would be nice to have a more fuller API for this, but the reality is
> > > that only the flush approach is really workable. Even just strict
> > > ordering of requests could only be supported on SCSI, and even there the
> > > kernel still lacks proper guarantees on error handling to prevent
> > > reordering there.
> > 
> > There's a few I/O scheduling differences that might be useful:
> > 
> > 1. The I/O scheduler could freely move WRITEs before a FLUSH but not
> >before a BARRIER.  That might be useful for time-critical WRITEs,
> >and those issued by high I/O priority.
> 
> This is only because no one actually wants flushes or barriers, though
> I/O people seem to only offer that.  We really want "<A> must
> occur before <B>".  That offers maximum choice to the I/O subsystem
> and potentially to smart (virtual?) disks.

We do want flushes for the "D" in ACID - such things as after
receiving a mail, or blog update into a database file (could be TDB),
and confirming that to the sender, to have high confidence that the
update won't disappear on system crash or power failure.

Less obviously, it's also needed for the "C" in ACID when more than
one file is involved.  "C" is about differently updated things staying
consistent with each other.

For example, imagine you have a TDB file mapping Samba usernames to
passwords, and another mapping Samba usernames to local usernames.  (I
don't know if you do this; it's just an illustration).

To rename a Samba user involves updating both.  Let's ignore transient
transactional issues :-) and just think about what happens with
per-file barriers and no sync, when a crash happens long after the
updates, and before the system has written out all data and issued low
level cache flushes.

After restarting, due to lack of sync, the Samba username could be
present in one file and not the other.

> > 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
> >only for data belonging to a particular file (e.g. fdatasync with
> >no file size change, even on btrfs if O_DIRECT was used for the
> >writes being committed).  That would entail tagging FLUSHes and
> >WRITEs with a fs-specific identifier (such as inode number), opaque
> >to the scheduler which only checks equality.
> 
> This is closer.  In userspace I'd be happy with an "all prior writes to this
> struct file before all future writes".  Even if the original guarantees were
> stronger (ie. inode basis).  We currently implement transactions using 4
> fsync/msync pairs.
> 
>   write_recovery_data(fd);
>   fsync(fd);
>   msync(mmap);
>   write_recovery_header(fd);
>   fsync(fd);
>   msync(mmap);
>   overwrite_with_new_data(fd);
>   fsync(fd);
>   msync(mmap);
>   remove_recovery_header(fd);
>   fsync(fd);
>   msync(mmap);
> 
> Yet we really only need ordering, not guarantees about it actually hitting
> disk before returning.
> 
> > In other words, FLUSH can be more relaxed than BARRIER inside the
> > kernel.  It's ironic that we think of fsync as stronger than
> > fbarrier outside the kernel :-)
> 
> It's an implementation detail; barrier has less flexibility because it has
> less information about what is required. I'm saying I want to give you as
> much information as I can, even if you don't use it yet.

I agree, and I've started a few threads about it over the last couple of years.

An fsync_range() system call would be very easy to use and
(most importantly) easy to understand.

With optional flags to weaken it (into fdatasync, barrier without sync,
sync without barrier, one-sided barrier, no lowlevel cache-flush, don't rush,
etc.), it would be very versatile, and still easy to understand.
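
To make that concrete, such a call might look like this (a hypothetical
sketch of the interface; fsync_range() is not a Linux system call and
the flag names here are invented):

#include <sys/types.h>

/* hypothetical flags -- names invented for illustration */
#define FSYNC_DATA_ONLY  0x1  /* like fdatasync: skip inode metadata */
#define FSYNC_BARRIER    0x2  /* order only, don't wait for media    */
#define FSYNC_NO_FLUSH   0x4  /* skip the lowlevel disk cache flush  */
#define FSYNC_DONT_RUSH  0x8  /* no deadline, complete when idle     */

/* sync (or just order) bytes [start, start+length) of fd */
int fsync_range(int fd, int flags, off_t start, off_t length);

/* e.g. commit one database record without forcing the whole file:
 * fsync_range(fd, FSYNC_DATA_ONLY, rec_off, rec_len);            */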

Add an AIO version, and another flag meaning "don't rush, just return
when satisfied", and I suspect it would be useful for the most
demanding I/O apps.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 8/9] KVM test: Create the background threads before calling process()

2010-05-06 Thread Lucas Meneghel Rodrigues
On Wed, Apr 28, 2010 at 8:55 AM, Michael Goldish  wrote:
> On 04/26/2010 01:04 PM, Jason Wang wrote:
>> If the screendump and scrialdump threads are created after the
>> process(), we may lose the progress tracking of guest shutting
>> down. So this patch creates them before calling process() in preprocess.
>>
>> Signed-off-by: Jason Wang 
>> ---
>>  client/tests/kvm/kvm_preprocessing.py |    5 ++---
>>  1 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/client/tests/kvm/kvm_preprocessing.py 
>> b/client/tests/kvm/kvm_preprocessing.py
>> index 50d0e35..73e835a 100644
>> --- a/client/tests/kvm/kvm_preprocessing.py
>> +++ b/client/tests/kvm/kvm_preprocessing.py
>> @@ -257,9 +257,6 @@ def preprocess(test, params, env):
>>                          int(params.get("pre_command_timeout", "600")),
>>                          params.get("pre_command_noncritical") == "yes")
>>
>> -    # Preprocess all VMs and images
>> -    process(test, params, env, preprocess_image, preprocess_vm)
>> -
>>      # Start the screendump thread
>>      if params.get("take_regular_screendumps") == "yes":
>>          logging.debug("Starting screendump thread")
>> @@ -278,6 +275,8 @@ def preprocess(test, params, env):
>>                                                args=(test, params, env))
>>          _serialdump_thread.start()
>>
>> +    # Preprocess all VMs and images
>> +    process(test, params, env, preprocess_image, preprocess_vm)
>>
>>
>>  def postprocess(test, params, env):
>
> The initial shutdown procedure is carried out automatically by the
> preprocessor in order to prepare the VMs for the current test, and is
> not part of the test.  During the procedure VMs from a previous test are
> shut down and/or restarted.  I think it'll be confusing (or at least
> irrelevant) for the user to see a Fedora guest shutting down at the
> beginning of a Windows test.  Also, it will be inconsistent with the
> pre_*.ppm screendumps which are taken after the new VMs are started.

Agreed. Besides, if we didn't specify kill_vm for that particular
test, it means that we are not terribly interested on what happens
with the VM if it has to be shutdown (worst case scenario is, the
postprocessor will quit the VM). Of course, it is always scary to
lose information, but I can't think of why we'd need that info anyway
(correct me if I am wrong).



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] KVM test: Redirect the console to serial for all linux guests

2010-05-06 Thread Lucas Meneghel Rodrigues
On Thu, 2010-05-06 at 11:08 +0800, Jason Wang wrote:
> Michael Goldish wrote:
> > On 04/26/2010 01:04 PM, Jason Wang wrote:
> >   
> >> As we have the ability to dump the content from serial console or use
> >> a session through it, we need to redirect the console to serial
> >> through unattended files to make use of it.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>  client/tests/kvm/unattended/Fedora-10.ks |2 +-
> >>  client/tests/kvm/unattended/Fedora-11.ks |2 +-
> >>  client/tests/kvm/unattended/Fedora-12.ks |2 +-
> >>  client/tests/kvm/unattended/Fedora-8.ks  |2 +-
> >>  client/tests/kvm/unattended/Fedora-9.ks  |2 +-
> >>  client/tests/kvm/unattended/OpenSUSE-11.xml  |1 +
> >>  client/tests/kvm/unattended/RHEL-3-series.ks |2 +-
> >>  client/tests/kvm/unattended/RHEL-4-series.ks |2 +-
> >>  client/tests/kvm/unattended/RHEL-5-series.ks |2 +-
> >>  client/tests/kvm/unattended/SLES-11.xml  |1 +
> >>  10 files changed, 10 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/client/tests/kvm/unattended/Fedora-10.ks 
> >> b/client/tests/kvm/unattended/Fedora-10.ks
> >> index 61e59d7..8628036 100644
> >> --- a/client/tests/kvm/unattended/Fedora-10.ks
> >> +++ b/client/tests/kvm/unattended/Fedora-10.ks
> >> @@ -11,7 +11,7 @@ firewall --enabled --ssh
> >>  selinux --enforcing
> >>  timezone --utc America/New_York
> >>  firstboot --disable
> >> -bootloader --location=mbr
> >> +bootloader --location=mbr --append="console=ttyS0,115200"
> >> 
> >
> > I see no reason to completely silence the regular console.  We can let
> > some output go to the regular console as well as the serial one by using
> >
> > --append="console=tty0 console=ttyS0"
> >
> >   
> Yes, we should always keep the tty0. Thanks for pointing that.

It seems like we are opening possibilities on the Windows side - there
is an administration console running on Windows' serial ports
by default:

http://technet.microsoft.com/en-us/library/cc785873%28WS.10%29.aspx

Need to check that out.

Excellent. Jason, the patchset looks mostly good, please implement the
suggestions pointed out on the comments and re-send it.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
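
For reference, the resulting kickstart line, combining the serial
settings above with Michael's console=tty0 suggestion, would read:

bootloader --location=mbr --append="console=tty0 console=ttyS0,115200"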


Re: [Qemu-devel] question on virtio

2010-05-06 Thread Jamie Lokier
Michael S. Tsirkin wrote:
> Hi!
> I see this in virtio_ring.c:
> 
> /* Put entry in available array (but don't update avail->idx *
>  until they do sync). */
> 
> Why is it done this way?
> It seems that updating the index straight away would be simpler, while
> this might allow the host to specilatively look up the buffer and handle
> it, without waiting for the kick.

Even better, if the host updates a location containing which index it
has seen recently, you can avoid the kick entirely during sustained
flows - just like your recent patch to avoid sending irqs to the
guest.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
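
The suppression Jamie suggests can be modelled roughly like this (an
illustrative C sketch with invented names; the real virtio ring layout
and barriers differ):

#include <stdatomic.h>
#include <stdint.h>

struct ring {
    _Atomic uint16_t avail_idx;      /* written by the guest  */
    _Atomic uint16_t host_seen_idx;  /* published by the host */
};

static void notify_host(void) { /* doorbell write would go here */ }

static void guest_publish(struct ring *r, uint16_t nbufs)
{
    uint16_t old = atomic_load_explicit(&r->avail_idx,
                                        memory_order_relaxed);

    /* descriptors must be visible before the index moves */
    atomic_store_explicit(&r->avail_idx, (uint16_t)(old + nbufs),
                          memory_order_release);

    /* kick only if the host had caught up with everything published
     * before this batch -- i.e. it may have stopped polling */
    if (atomic_load_explicit(&r->host_seen_idx,
                             memory_order_acquire) == old)
        notify_host();
}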


Re: [Qemu-devel] Re: How to map PCI memory into the VM without trapping

2010-05-06 Thread Cam Macdonell
On Wed, May 5, 2010 at 5:20 PM, Frank Berreth  wrote:
> Hi there,
> this is just an update if you are interested in the outcome. It turns out
> that my initial assumption that there would be page faults/trapping on the
> memory pages was false. The reason the throughput is so low is because the
> memory was mapped non-cached. The VGA driver and the ivshmem driver use
> pci_ioremap_bar which will *always* map the PCI bar non-cached (even the
> resourceX_wc).
> Changing the driver(s) to use ioremap_cache or ioremap_wc speeds up things
> quite a bit. I don't know if VGA framebuffer was always mapped this way --
> it appears on a real system that usually graphics memory is mapped WC.
> Mapping it UC would cause quite a performance degradation. This could be the
> reason for the reported VGA performance drop in another email thread. IMHO,
> since QEMU emulates VGA, this could be mapped WB.
> Thanks,
> Frank.
>

Hi Frank,

Thanks for the note.  I'll make the change to cached and see if that
helps with some of the tests I'm running.

Cheers,
Cam
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
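
The change Frank describes amounts to something like this in a driver
(a minimal sketch; ioremap_wc(), pci_resource_start() and
pci_resource_len() are real kernel interfaces, the wrapper is
illustrative):

#include <linux/pci.h>
#include <linux/io.h>

/* map a RAM-like BAR write-combined instead of uncached */
static void __iomem *map_bar_wc(struct pci_dev *pdev, int bar)
{
	/* pci_ioremap_bar() always maps uncached (UC); for a
	 * framebuffer or shared-memory BAR that is very slow */
	return ioremap_wc(pci_resource_start(pdev, bar),
	                  pci_resource_len(pdev, bar));
}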


Re: [Autotest] [PATCH 3/3] KVM test: Remove the duplicated KERNEL paramters in the pxe configuration file

2010-05-06 Thread Lucas Meneghel Rodrigues
On Mon, Apr 26, 2010 at 7:07 AM, Jason Wang  wrote:
> Remove the duplicated "KERNEL vmlinuz" in unattended.py

Good catch, applied, thanks!

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/scripts/unattended.py |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/client/tests/kvm/scripts/unattended.py 
> b/client/tests/kvm/scripts/unattended.py
> index e41bc86..fdadd03 100755
> --- a/client/tests/kvm/scripts/unattended.py
> +++ b/client/tests/kvm/scripts/unattended.py
> @@ -209,7 +209,6 @@ class UnattendedInstall(object):
>         pxe_config.write('PROMPT 0\n')
>         pxe_config.write('LABEL pxeboot\n')
>         pxe_config.write('     KERNEL vmlinuz\n')
> -        pxe_config.write('     KERNEL vmlinuz\n')
>         pxe_config.write('     APPEND initrd=initrd.img %s\n' %
>                          self.kernel_args)
>         pxe_config.close()
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 2/3] KVM test: Create ksm scanner through pre_command

2010-05-06 Thread Lucas Meneghel Rodrigues
On Mon, Apr 26, 2010 at 7:07 AM, Jason Wang  wrote:
> KSM may have various control interfaces on different distributions, so
> this patch launches KSM through pre_command instead of the hard-coded
> bits in the test. Users may specify their own suitable commands or
> parameters.

Applied, thanks!

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/tests/ksm_overcommit.py |   15 ---
>  client/tests/kvm/tests_base.cfg.sample   |    2 ++
>  2 files changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/client/tests/kvm/tests/ksm_overcommit.py 
> b/client/tests/kvm/tests/ksm_overcommit.py
> index 2dd46c4..4aa6deb 100644
> --- a/client/tests/kvm/tests/ksm_overcommit.py
> +++ b/client/tests/kvm/tests/ksm_overcommit.py
> @@ -412,21 +412,6 @@ def run_ksm_overcommit(test, params, env):
>                                      (3100 - 64.0)))
>                 mem = int(math.floor(host_mem * overcommit / vmsc))
>
> -    logging.debug("Checking KSM status...")
> -    ksm_flag = 0
> -    for line in os.popen('ksmctl info').readlines():
> -        if line.startswith('flags'):
> -            ksm_flag = int(line.split(' ')[1].split(',')[0])
> -    if int(ksm_flag) != 1:
> -        logging.info("KSM module is not loaded! Trying to load module and "
> -                     "start ksmctl...")
> -        try:
> -            utils.run("modprobe ksm")
> -            utils.run("ksmctl start 5000 100")
> -        except error.CmdError, e:
> -            raise error.TestFail("Failed to load KSM: %s" % e)
> -    logging.debug("KSM module loaded and ksmctl started")
> -
>     swap = int(utils.read_from_meminfo("SwapTotal")) / 1024
>
>     logging.debug("Overcommit = %f", overcommit)
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index e73ba44..2db0d2c 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -285,6 +285,8 @@ variants:
>         catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
>
>     - ksm_overcommit:
> +        pre_command = "[ -e /dev/ksm ] && true || modprobe ksm && ksmctl 
> start 5000 50"
> +        pre_command_critical = yes
>         # Don't preprocess any vms as we need to change its params
>         vms = ''
>         image_snapshot = yes
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 1/3] KVM test: Use customized command to get the version of kvm and its

2010-05-06 Thread Lucas Meneghel Rodrigues
On Mon, Apr 26, 2010 at 7:07 AM, Jason Wang  wrote:
> userspace
>
> The current method may or may not work for various kinds of
> distribution. So this patch enables the ability to use customized
> commands to get the version of kvm and its userspace. "kvm_ver_cmd" is
> used for the kvm version and "kvm_userspace_ver_cmd" is for its userspace.

The method we are currently using is pretty satisfactory - if we fail
in getting /sys/module/kvm/version we use the kernel version as a
fallback, which is good for the kernel module. For qemu, we make a
regular expression searching for numbers following the string version,
so I don't see a reason on why we should make it configurable. Care to
provide an example of a situation where the current method fails?

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/kvm_preprocessing.py |   18 +-
>  1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_preprocessing.py 
> b/client/tests/kvm/kvm_preprocessing.py
> index 4b9290c..16200ab 100644
> --- a/client/tests/kvm/kvm_preprocessing.py
> +++ b/client/tests/kvm/kvm_preprocessing.py
> @@ -225,10 +225,10 @@ def preprocess(test, params, env):
>     # Get the KVM kernel module version and write it as a keyval
>     logging.debug("Fetching KVM module version...")
>     if os.path.exists("/dev/kvm"):
> -        try:
> -            kvm_version = open("/sys/module/kvm/version").read().strip()
> -        except:
> -            kvm_version = os.uname()[2]
> +        kvm_ver_cmd = params.get("kvm_ver_cmd", "cat 
> /sys/module/kvm/version")
> +        s, kvm_version = commands.getstatusoutput(kvm_ver_cmd)
> +        if s != 0:
> +            kvm_version = "Unknown"
>     else:
>         kvm_version = "Unknown"
>         logging.debug("KVM module not loaded")
> @@ -239,11 +239,11 @@ def preprocess(test, params, env):
>     logging.debug("Fetching KVM userspace version...")
>     qemu_path = kvm_utils.get_path(test.bindir, params.get("qemu_binary",
>                                                            "qemu"))
> -    version_line = commands.getoutput("%s -help | head -n 1" % qemu_path)
> -    matches = re.findall("[Vv]ersion .*?,", version_line)
> -    if matches:
> -        kvm_userspace_version = " ".join(matches[0].split()[1:]).strip(",")
> -    else:
> +    def_qemu_ver_cmd = "%s -help | head -n 1 | awk '{ print $5}'" % qemu_path
> +    kvm_userspace_ver_cmd = params.get("kvm_userspace_ver_cmd",
> +                                       def_qemu_ver_cmd)
> +    s, kvm_userspace_version = 
> commands.getstatusoutput(kvm_userspace_ver_cmd)
> +    if s != 0:
>         kvm_userspace_version = "Unknown"
>         logging.debug("Could not fetch KVM userspace version")
>     logging.debug("KVM userspace version: %s" % kvm_userspace_version)
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-06 Thread Anthony Liguori

On 04/21/2010 12:53 PM, Cam Macdonell wrote:

Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guests by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

 -device ivshmem,size=[,shm=]

Interrupts are supported between multiple VMs by using a shared memory server
over a chardev socket.

 -device ivshmem,size=[,shm=]
 [,chardev=][,msi=on][,irqfd=on][,vectors=n]
 -chardev socket,path=,id=

(shared memory server is qemu.git/contrib/ivshmem-server)

Sample programs and init scripts are in a git repo here:

 www.gitorious.org/nahanni
---
  Makefile.target |3 +
  hw/ivshmem.c|  727 +++
  qemu-char.c |6 +
  qemu-char.h |3 +
  qemu-doc.texi   |   25 ++
  5 files changed, 764 insertions(+), 0 deletions(-)
  create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 1ffd802..bc9a681 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
  obj-y += rtl8139.o
  obj-y += e1000.o

+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
  # Hardware support
  obj-i386-y = pckbd.o dma.o
  obj-i386-y += vga.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..f8d8fdb
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,727 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
   


This will break Windows along with any non-Linux unix or any Linux
old enough to not have eventfd support.


If it's based on cirrus_vga.c and rtl8139.c, then it ought to carry the 
respective copyrights, no?


Regards,

Anthony Liguori


+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "msix.h"
+#include "qemu-kvm.h"
+#include "libkvm.h"
+
+#include
+#include
+#include
+#include
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#define DEBUG_IVSHMEM
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState * chr;
+CharDriverState ** eventfd_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+unsigned long ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+int nr_allocated_vms;
+/* array of eventfds for each guest */
+int ** eventfds;
+/* keep track of # of eventfds for each guest*/
+int * eventfds_posn_count;
+
+int nr_alloc_guests;
+int vm_id;
+int num_eventfds;
+uint32_t vectors;
+uint32_t features;
+EventfdEntry *eventfd_table;
+
+char * shmobj;
+char * sizearg;
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 4,
+IVPosition = 8,
+Doorbell = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) {
+return (ivs->features & (1 << feature));
+}
+
+static inline int is_power_of_two(int x) {
+return (x & (x-1)) == 0;
+}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr, (uint32_t)size);
+cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s->intrstatus & s->intrmask) & 0xffff;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
+   isr ? 1 : 0, s->intrstatus, s->intrmask);
+}
+
+qemu_set_irq(s->dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
+
+s->intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+uint32_t ret = s->intrmask;
+
+IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
+
+return ret;
+}
+
+static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
+
+s->intrstatus = val;
+
+ivshmem_u
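
For orientation, a guest-side sketch of driving the register block
defined above (the offsets come from the patch; the packing of peer id
and vector into the doorbell is an assumption made for illustration,
not taken from this code):

#include <stdint.h>

enum ivshmem_regs { IntrMask = 0, IntrStatus = 4,
                    IVPosition = 8, Doorbell = 12 };

/* assumed packing: destination peer in the high half, vector low */
static void ivshmem_ring(volatile uint32_t *bar0,
                         uint16_t peer, uint16_t vector)
{
    bar0[Doorbell / 4] = ((uint32_t)peer << 16) | vector;
}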

Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-06 Thread Cam Macdonell
On Thu, May 6, 2010 at 11:32 AM, Anthony Liguori  wrote:
> On 04/21/2010 12:53 PM, Cam Macdonell wrote:
>>
>> Support an inter-vm shared memory device that maps a shared-memory object
>> as a
>> PCI device in the guest.  This patch also supports interrupts between
>> guests by
>> communicating over a unix domain socket.  This patch applies to the
>> qemu-kvm
>> repository.
>>
>>     -device ivshmem,size=[,shm=]
>>
>> Interrupts are supported between multiple VMs by using a shared memory
>> server
>> over a chardev socket.
>>
>>     -device ivshmem,size=[,shm=]
>>                     [,chardev=][,msi=on][,irqfd=on][,vectors=n]
>>     -chardev socket,path=,id=
>>
>> (shared memory server is qemu.git/contrib/ivshmem-server)
>>
>> Sample programs and init scripts are in a git repo here:
>>
>>     www.gitorious.org/nahanni
>> ---
>>  Makefile.target |    3 +
>>  hw/ivshmem.c    |  727
>> +++
>>  qemu-char.c     |    6 +
>>  qemu-char.h     |    3 +
>>  qemu-doc.texi   |   25 ++
>>  5 files changed, 764 insertions(+), 0 deletions(-)
>>  create mode 100644 hw/ivshmem.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 1ffd802..bc9a681 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>  obj-y += rtl8139.o
>>  obj-y += e1000.o
>>
>> +# Inter-VM PCI shared memory
>> +obj-y += ivshmem.o
>> +
>>  # Hardware support
>>  obj-i386-y = pckbd.o dma.o
>>  obj-i386-y += vga.o
>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
>> new file mode 100644
>> index 000..f8d8fdb
>> --- /dev/null
>> +++ b/hw/ivshmem.c
>> @@ -0,0 +1,727 @@
>> +/*
>> + * Inter-VM Shared Memory PCI device.
>> + *
>> + * Author:
>> + *      Cam Macdonell
>> + *
>> + * Based On: cirrus_vga.c and rtl8139.c
>> + *
>> + * This code is licensed under the GNU GPL v2.
>> + */
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>>
>
> This will break Windows along with any non-Linux unix or any Linux old
> enough to not have eventfd support.

I'll wrap it with

#ifdef CONFIG_EVENTFD

> If it's based on cirrus_vga.c and rtl8139.c, then it ought to carry the
> respective copyrights, no?

Sure, I can add those

Cam

>
> Regards,
>
> Anthony Liguori
>
>> +#include "hw.h"
>> +#include "console.h"
>> +#include "pc.h"
>> +#include "pci.h"
>> +#include "sysemu.h"
>> +
>> +#include "msix.h"
>> +#include "qemu-kvm.h"
>> +#include "libkvm.h"
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +#define IVSHMEM_IRQFD   0
>> +#define IVSHMEM_MSI     1
>> +
>> +#define DEBUG_IVSHMEM
>> +#ifdef DEBUG_IVSHMEM
>> +#define IVSHMEM_DPRINTF(fmt, args...)        \
>> +    do {printf("IVSHMEM: " fmt, ##args); } while (0)
>> +#else
>> +#define IVSHMEM_DPRINTF(fmt, args...)
>> +#endif
>> +
>> +typedef struct EventfdEntry {
>> +    PCIDevice *pdev;
>> +    int vector;
>> +} EventfdEntry;
>> +
>> +typedef struct IVShmemState {
>> +    PCIDevice dev;
>> +    uint32_t intrmask;
>> +    uint32_t intrstatus;
>> +    uint32_t doorbell;
>> +
>> +    CharDriverState * chr;
>> +    CharDriverState ** eventfd_chr;
>> +    int ivshmem_mmio_io_addr;
>> +
>> +    pcibus_t mmio_addr;
>> +    unsigned long ivshmem_offset;
>> +    uint64_t ivshmem_size; /* size of shared memory region */
>> +    int shm_fd; /* shared memory file descriptor */
>> +
>> +    int nr_allocated_vms;
>> +    /* array of eventfds for each guest */
>> +    int ** eventfds;
>> +    /* keep track of # of eventfds for each guest*/
>> +    int * eventfds_posn_count;
>> +
>> +    int nr_alloc_guests;
>> +    int vm_id;
>> +    int num_eventfds;
>> +    uint32_t vectors;
>> +    uint32_t features;
>> +    EventfdEntry *eventfd_table;
>> +
>> +    char * shmobj;
>> +    char * sizearg;
>> +} IVShmemState;
>> +
>> +/* registers for the Inter-VM shared memory device */
>> +enum ivshmem_registers {
>> +    IntrMask = 0,
>> +    IntrStatus = 4,
>> +    IVPosition = 8,
>> +    Doorbell = 12,
>> +};
>> +
>> +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int
>> feature) {
>> +    return (ivs->features & (1 << feature));
>> +}
>> +
>> +static inline int is_power_of_two(int x) {
>> +    return (x & (x-1)) == 0;
>> +}
>> +
>> +static void ivshmem_map(PCIDevice *pci_dev, int region_num,
>> +                    pcibus_t addr, pcibus_t size, int type)
>> +{
>> +    IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
>> +
>> +    IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr,
>> (uint32_t)size);
>> +    cpu_register_physical_memory(addr, s->ivshmem_size,
>> s->ivshmem_offset);
>> +
>> +}
>> +
>> +/* accessing registers - based on rtl8139 */
>> +static void ivshmem_update_irq(IVShmemState *s, int val)
>> +{
>> +    int isr;
>> +    isr = (s->intrstatus & s->intrmask) & 0xffff;
>> +
>> +    /* don't print ISR resets */
>> +    if (isr) {
>> +        IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
>> +           isr ? 1 : 0, s->intrstatus, s->int

Re: [Autotest] [KVM_AUTOTEST][PATCH] KSM_overcommit: dynamic reserve calculation

2010-05-06 Thread Lucas Meneghel Rodrigues
On Wed, 2010-05-05 at 21:52 +0100, Lukáš Doktor wrote:
> Hi,
> 
> we are back with new features of KSM_overcommit test:
> * NEW: guest_reserve and host_reserve are now calculated based on used 
> memory
> * NEW: tmpfs reserve is also evaluated to fit the overhead
> * NEW: VM alive check during split_guest()
> * FIX: In function split_guest() we used incorrect session
> * MOD: Increase number of VNC ports

Ok guys, thanks for working on this automatic evaluation of reserved
memory for host and guest!

> 
> host_reserve:
> * Still possible to set this value in cfg
> * Calculation:
> 1) host_reserve = Available memory + minimal guest(128MB)
> 2) host_reserve += number of guests * 64
> 
> guest_reserve:
> * Still possible to set this value in cfg
> * Calculation:
> 1) guest_reserve = 256
> 2) guest_reserve += guest memory * constant
> ... where constant represents (TMPFS overhead and multiplicative OS 
> memory consumption) per MB

Ok, this sounds good, I'll give my round of testing on it after you guys
re-send the patch to me with the small comments below addressed.

Also, I noticed you guys are not using the full awesomeness of git.
Please read the mini tutorial I made so you can use git send-email.

http://autotest.kernel.org/wiki/GitWorkflow

I made comments on your patch using --- so it's easier to identify
what is a comment and what is the actual patch. Next time, please send
the patch using git send-email.

> 
> Tested by ldoktor and jzupka on various HW (2GB-36GB host memory).
> 

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 6bc7987..1d83120 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -396,7 +396,7 @@ class VM:
 
 # Find available VNC port, if needed
 if params.get("display") == "vnc":
-self.vnc_port = kvm_utils.find_free_port(5900, 6000)
+self.vnc_port = kvm_utils.find_free_port(5900, 6100)
 
 # Find random UUID if specified 'uuid = random' in config file
 if params.get("uuid") == "random":
diff --git a/client/tests/kvm/scripts/allocator.py 
b/client/tests/kvm/scripts/allocator.py
index 1036893..227745a 100755
--- a/client/tests/kvm/scripts/allocator.py
+++ b/client/tests/kvm/scripts/allocator.py
@@ -8,10 +8,12 @@ Auxiliary script used to allocate memory on guests.
 """
 
 
-import os, array, sys, struct, random, copy, inspect, tempfile, datetime
+import os, array, sys, struct, random, copy, inspect, tempfile, datetime, math
 
 PAGE_SIZE = 4096 # machine page size
 
+TMPFS_OVERHEAD = 0.0022 # overhead on 1MB of write data 
+

---
Cool, how did you guys get to this constant?
---

 class MemFill(object):
 """
@@ -32,7 +34,8 @@ class MemFill(object):
 
 self.tmpdp = tempfile.mkdtemp()
 ret_code = os.system("mount -o size=%dM tmpfs %s -t tmpfs" %
- ((mem + 25), self.tmpdp))
+ ((mem+math.ceil(mem*TMPFS_OVERHEAD)), 
+ self.tmpdp))
 if ret_code != 0:
 if os.getuid() != 0:
 print ("FAIL: Unable to mount tmpfs "
@@ -42,7 +45,7 @@ class MemFill(object):
 else:
 self.f = tempfile.TemporaryFile(prefix='mem', dir=self.tmpdp)
 self.allocate_by = 'L'
-self.npages = (mem * 1024 * 1024) / PAGE_SIZE
+self.npages = ((mem * 1024 * 1024) / PAGE_SIZE)
 self.random_key = random_key
 self.static_value = static_value
 print "PASS: Initialization"
@@ -83,7 +86,7 @@ class MemFill(object):
 @return: return array of bytes size PAGE_SIZE.
 """
 a = array.array("B")
-for i in range(PAGE_SIZE / a.itemsize):
+for i in range((PAGE_SIZE / a.itemsize)):
 try:
 a.append(value)
 except:
diff --git a/client/tests/kvm/tests/ksm_overcommit.py 
b/client/tests/kvm/tests/ksm_overcommit.py
index 2dd46c4..31d5c61 100644
--- a/client/tests/kvm/tests/ksm_overcommit.py
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -142,6 +142,10 @@ def run_ksm_overcommit(test, params, env):
 session = None
 vm = None
 for i in range(1, vmsc):
+# Check VMs
+for j in range(0, vmsc):
+if not lvms[j].is_alive():
+raise error.TestFail("one of other VMs is death")

---
The message above should be along the lines 

e_msg = "VM %s died while executing static_random_fill on allocator loop" % i
---
 vm = lvms[i]
 session = lsessions[i]
 a_cmd = "mem.static_random_fill()"
@@ -154,6 +158,8 @@ def run_ksm_overcom

[PATCH] Fix segfault after device assignment hot remove

2010-05-06 Thread Alex Williamson
We keep a qlist of assigned devices for irq updates, but we forgot to
remove entries from it if they're hot unplugged.  This makes
assigned_dev_update_irqs() a timebomb that goes off when the guest is
rebooted.

Signed-off-by: Alex Williamson 
---

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1f13a6d..b9cc06f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1390,6 +1390,7 @@ static int assigned_exitfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
+QLIST_REMOVE(dev, next);
 deassign_device(dev);
 free_assigned_device(dev);
 return 0;


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm mmu: reduce 50% memory usage

2010-05-06 Thread Marcelo Tosatti
On Thu, May 06, 2010 at 03:03:48PM +0800, Lai Jiangshan wrote:
> Marcelo Tosatti wrote:
> > On Thu, Apr 29, 2010 at 09:43:40PM +0300, Avi Kivity wrote:
> >> On 04/29/2010 09:09 PM, Marcelo Tosatti wrote:
> >>> You missed quadrant on 4mb large page emulation with shadow (see updated
> >>> patch below).
> >> Good catch.
> >>
> >>> Also for some reason i can't understand the assumption
> >>> does not hold for large sptes with TDP, so reverted for now.
> >> It's unrelated to TDP, same issue with shadow.  I think the
> >> calculation is correct.  For example the 4th spte for a level=2 page
> >> will yield gfn=4*512.
> > 
> > Under testing i see sp at level 2, with sp->gfn == 4096, mmu_set_spte
> > setting index 8 to gfn 4096 (whereas kvm_mmu_page_get_gfn returns 4096 +
> > 8*512).
> > 
> > Lai, can you please take a look at it? You should see the
> > kvm_mmu_page_set_gfn BUG_ON by using -mem-path on hugetlbfs.
> > 
> 
> Could you tell me how you test it? It will be better if I follow
> your test steps.

mount -t hugetlbfs none /mnt/
echo xyz > /proc/sys/vm/nr_hugepages
qemu-kvm parameters -mem-path /mnt/

> I also hit the kvm_mmu_page_set_gfn BUG_ON, It is because
> FNAME(fetch)() set sp->gfn wrong. The patch:
> [PATCH] kvm: calculate correct gfn for small host pages which emulates large 
> guest pages
> fix it.
> 
> I can not hit kvm_mmu_page_set_gfn BUG_ON after this patch also
> applied.
> 
> So could you tell me your test steps:
> The host: ept/npt enabled? 64bit? testing codes in host?

Intel EPT enabled.

> The guest: OS? PAE? 32bit? 64bit? testing codes in guest?

FC12 guest.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
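
The index-to-gfn arithmetic being debated looks roughly like this (a
sketch, not the exact kvm helper):

#include <stdint.h>

/* each spte of a level-N shadow page spans 512^(N-1) small pages, so
 * slot 8 of a level-2 sp with sp->gfn == 4096 maps gfn 4096 + 8*512 */
static uint64_t sp_index_to_gfn(uint64_t sp_gfn, int level, int index)
{
    return sp_gfn + ((uint64_t)index << ((level - 1) * 9));
}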


Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest

2010-05-06 Thread Avi Kivity

On 05/06/2010 05:20 PM, Cui, Dexuan wrote:
>> However, switching xcr0 may be slow.  That's our experience with msrs.
>> Can you measure its latency?
>
> We can measure that.
> However, I think the changing of xcr0 to guest xcr0 in handle_xsetbv() is
> necessary -- or else, inside the guest, xgetbv() would return the host
> xcr0 rather than the guest xcr0 -- this is obviously incorrect. Once
> handle_xsetbv() changes the xcr0 to the guest's value, the xsetbv() in
> kvm_fx_restore_host() is also necessary, and the xsetbv() in
> kvm_fx_restore_guest() is also necessary. So it looks like the guest
> can't run with the host xcr0.

Right.  Moreover, xsave would write into memory the guest doesn't expect.

>> btw, it needs save/restore for live migration, as well as save/restore
>> for the new fpu state.
>
> Yes. This part is missing. Sheng and I are also doing this -- it may be
> a little troublesome as the number of XSTATEs can grow as time goes on.
> We'll have to handle the compatibility issue.

Reserve tons of space in the ioctl - and we can use the same format as
xsave.

All those control registers are annoying, we have cr1 and cr5-cr7 free,
cr9-cr15 on x86_64, infinite msr space, and now we have XCRs.  Great.

Looking forward to YCR0.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: X86: add the support of XSAVE/XRSTOR to guest

2010-05-06 Thread Avi Kivity

On 05/06/2010 10:45 PM, Avi Kivity wrote:


> All those control registers are annoying, we have cr1 and cr5-cr7
> free, cr9-cr15 on x86_64, infinite msr space, and now we have XCRs.
> Great.
>
> Looking forward to YCR0.



I think I see the reason - xgetbv is unprivileged, so applications can 
see what the OS supports instead of looking at cpuid and hoping.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
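
For reference, that unprivileged probe is a one-liner (sketch; it
assumes the CPU supports XSAVE and the OS has set CR4.OSXSAVE,
otherwise xgetbv faults):

#include <stdint.h>

static inline uint64_t xgetbv(uint32_t xcr)
{
    uint32_t lo, hi;
    /* xgetbv: opcode 0f 01 d0, reads XCR[ecx] into edx:eax */
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0"
                     : "=a"(lo), "=d"(hi) : "c"(xcr));
    return ((uint64_t)hi << 32) | lo;
}

/* xgetbv(0) returns XCR0: which xsave states the OS has enabled */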


Re: virtio-win problem

2010-05-06 Thread Brian Jackson
On Thursday, May 06, 2010 07:10:07 am Riccardo Veraldi wrote:
> Hello,
> if I install virtio-win drivers on windows 2008 Server R2, I have the
> problem of signed device drivers.
> I can install the drivers, but Windows 2008 Server refuses to use them
> unless I start the machine pressing F8 at each reboot, bypassing the
> check for signed certified drivers, and this is annoying,
> since I cannot reboot the virtual machine automatically.


I have some compiled and signed here:

http://theiggy.com/tmp/virtio-20100228.zip

These are not guaranteed to work and they will probably kill kittens. That 
said, I've had luck with them and had only a few reports of things not working 
(mostly with the balloon drivers).



> 
> Anyone solved this issue ?
> thanks
> 
> Rick
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Another SIGFPE in display code, now in cirrus

2010-05-06 Thread Michael Tokarev

There was a bug recently fixed in vnc code.  Apparently
there's something similar in the cirrus emulation as well.
Here it triggers _always_ (including old versions of kvm)
when running Windows NT and hitting the "test" button in its
display resolution dialog.  Here's what gdb is to say:

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0xf76cab70 (LWP 580)]
0x080c5e45 in cirrus_do_copy (s=0x86134dc, dst=96, src=0, w=2, h=9)
at hw/cirrus_vga.c:687
687 sx = (src % ABS(s->cirrus_blt_srcpitch)) / depth;
(gdb) p depth
$1 = 2
(gdb) p s->cirrus_blt_srcpitch
$2 = 0
(gdb) p *s
$3 = {vga = {
vram_ptr = 0xd5e42000 
"\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 
vram_offset = 537133056,

vram_size = 16777216, lfb_addr = 4026531840, lfb_end = 4043309056,
map_addr = 4026531840, map_end = 4043309056, lfb_vram_mapped = 1,
bios_offset = 0, bios_size = 0, latch = 3876589584, sr_index = 19 '\023',
sr = 
"\003!\017\000\016\000\022\027\000\000\030\230\000\000\000?\000\004\017$\000\000\000\024\024\024\024-",
 '\000' ,
gr_index = 56 '8',
gr = "\000\000\000\000\...@\005\017\377\000\000$", '\000' , 
"\017\000\000\000\000\000\000\000\001\000\b\000\001\000\000\000\000\246\016\000\000\000\000\000\000\201\016",
 '\000' , ar_index = 32 ' ',
ar = "\000\001\002\003\004\005\024\a89:;<=>?\005\000\017\b",
ar_flip_flop = 1, cr_index = 39 '\'',
cr = "}cc\200k\032\230\360\000`\016\017\000\000\000\000}#w\...@w\230\303\377\000\000\"", '\000' 
"\270, ", '\000' ,
msr = 103 'g', fcr = 0 '\000', st00 = 0 '\000', st01 = 0 '\000',
dac_state = 0 '\000', dac_sub_index = 0 '\000',
dac_read_index = 16 '\020', dac_write_index = 16 '\020',
dac_cache = "**?", dac_8bit = 0,
palette = 
"\000\000\000\000\000*\000*\000\000***\000\000*\000***\000***\000\000\025\000\000?\000*\025\000*?*\000\025*\000?**\025**?\000\025\000\000\025*\000?\000\000?**\025\000*\025**?\000*?*\000\025\025\000\025?\000?\025\000??*\025\025*\025?*?\025*??\025\000\000\025\000*\025*\000\025**?\000\000?\000*?*\000?**\025\000\025\025\000?\025*\025\025*??\000\025?\000??*\025?*?\025\025\000\025\025*\025?\000\025?*?\025\000?\025*??\000??*\025\025\025\025\025?\025?\025\025???\025\025?\025???\025???",
 '\000' , bank_offset = 0,
vga_io_memory = 56, get_bpp = 0x80c70e0 ,
get_offsets = 0x80c6f9e ,
get_resolution = 0x80c717e , vbe_index = 0,
vbe_regs = {45248, 0, 0, 0, 0, 0, 0, 0, 0, 0}, vbe_start_addr = 0,
vbe_line_offset = 0, vbe_bank_mask = 255, vbe_mapped = 0, ds = 0x8489fb0,
font_offsets = {2, 2}, graphic_mode = 1, shift_control = 2 '\002',
double_scan = 0 '\000', line_offset = 1600, line_compare = 1023,
start_addr = 0, plane_updated = 0, last_line_offset = 1600,
last_cw = 9 '\t', last_ch = 16 '\020', last_width = 800,
last_height = 600, last_scr_width = 800, last_scr_height = 600,
last_depth = 16, cursor_start = 14 '\016', cursor_end = 15 '\017',
cursor_offset = 0, rgb_to_pixel = 0x809fadb ,
update = 0x80a19f4 ,
invalidate = 0x80a1ac1 ,
screen_dump = 0x80a2fda ,
text_update = 0x80a1e83 , invalidated_y_table = {
  0 },
cursor_invalidate = 0x80c8b9c ,
cursor_draw_line = 0x80c8e33 , last_palette = {0,
  168, 43008, 43176, 11010048, 11010216, 11032320, 11053224, 5723991,
  5724159, 5766999, 5767167, 16734039, 16734207, 16777047, 16777215,
  0 }, last_ch_attr = {0 ,
  4294967295, 0 },
retrace = 0x809b64c ,
update_retrace_info = 0x809b298 ,
retrace_info = {precise = {ticks_per_char = 0, total_chars = 0,
htotal = 0, hstart = 0, hend = 0, vstart = 0, vend = 0, freq = 0}},
is_vbe_vmstate = 1 '\001'}, cirrus_linear_io_addr = 64,
  cirrus_linear_bitblt_io_addr = 72, cirrus_mmio_io_addr = 80,
  cirrus_addr_mask = 4194303, linear_mmio_mask = 4194048,
  cirrus_shadow_gr0 = 0 '\000', cirrus_shadow_gr1 = 0 '\000',
  cirrus_hidden_dac_lockindex = 0 '\000', cirrus_hidden_dac_data = 225 '\341',
  cirrus_bank_base = {0, 32768}, cirrus_bank_limit = {4194304, 4161536},
  cirrus_hidden_palette = '\000' "\377, \377\377",
  hw_cursor_x = 0, hw_cursor_y = 0, cirrus_blt_pixelwidth = 1,
  cirrus_blt_width = 2, cirrus_blt_height = 9, cirrus_blt_dstpitch = 1,
  cirru
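
With cirrus_blt_srcpitch == 0 above, the modulo in cirrus_do_copy()
divides by zero. A guard along these lines would avoid the trap (a
sketch, not a reviewed fix):

/* in cirrus_do_copy(), before computing sx/sy */
if (s->cirrus_blt_srcpitch == 0 || s->cirrus_blt_dstpitch == 0) {
    return; /* ignore the malformed blit */
}
sx = (src % ABS(s->cirrus_blt_srcpitch)) / depth;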

Re: [PATCH] KVM: Get rid of KVM_REQ_KICK

2010-05-06 Thread Marcelo Tosatti
On Mon, May 03, 2010 at 05:19:08PM +0300, Avi Kivity wrote:
> KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
> operation.  This causes the fast path check for a clear vcpu->requests
> to fail all the time, triggering tons of atomic operations.
> 
> Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/x86.c   |   17 ++---
>  include/linux/kvm_host.h |1 +
>  2 files changed, 11 insertions(+), 7 deletions(-)

Reviewed-by: Marcelo Tosatti 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
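
The shape of the fix is roughly this (a sketch; field and helper names
are approximate):

/* entry path: flag guest mode separately so that vcpu->requests
 * stays clear -- and cheap to test -- in steady state */
vcpu->guest_mode = 1;
smp_wmb();
if (vcpu->requests || need_resched() || signal_pending(current))
    vcpu->guest_mode = 0;   /* bail out of the entry and retry */

/* kicker: only send the IPI if the vcpu is really in guest mode */
smp_rmb();
if (vcpu->guest_mode)
    smp_send_reschedule(vcpu->cpu);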


Re: [PATCH v2 0/5] Fix EFER.NX=0 with EPT

2010-05-06 Thread Marcelo Tosatti
On Sun, May 02, 2010 at 12:48:49PM +0300, Avi Kivity wrote:
> Currently we run with EFER.NX=1 on the guest even if the guest value is 0.
> This is fine with shadow, since we check bit 63 when instantiating a page
> table, and fault if bit 63 is set while EFER.NX is clear.
> 
> This doesn't work with EPT, since we no longer get the change to check guest
> ptes.  So we need to run with EFER.NX=0.
> 
> This is complicated by the fact that if we switch EFER.NX on the host, we'll
> trap immediately, since some host pages are mapped with the NX bit set.  As
> a result, we need to switch the MSR atomically during guest entry and exit.
> 
> This patchset implements the complications described above.
> 
> v2:
> Fix transition from long mode to legacy mode

Reviewed-by: Marcelo Tosatti 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots

2010-05-06 Thread Marcelo Tosatti
On Tue, May 04, 2010 at 01:03:50PM +0300, Avi Kivity wrote:
> On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.
> 
> Push the spinlock into mmu_alloc_roots(), and only take it after we've read
> the pdptr.
> 
> Signed-off-by: Avi Kivity 
> ---
> 
> Marcelo, dropping and re-acquiring the lock before mmu_sync_roots(), is fine,
> yes?

Yes, but you should call kvm_mmu_free_some_pages after reacquiring the
spin_lock, to guarantee kvm_mmu_alloc_page won't fail.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
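
I.e. the resulting ordering would be (a sketch; the argument list of
kvm_mmu_get_page() is approximate):

/* mmu_alloc_roots(): read pdptrs outside the mmu lock */
pdptr = kvm_pdptr_read(vcpu, i);       /* may sleep on svm */
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);         /* so the alloc below can't fail */
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
                      1, ACC_ALL, NULL);
spin_unlock(&vcpu->kvm->mmu_lock);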


Re: virtio-win problem

2010-05-06 Thread Jernej Simončič
On Thursday, May 6, 2010, 21:59:21, Brian Jackson wrote:

> http://theiggy.com/tmp/virtio-20100228.zip
> These are not guaranteed to work and they will probably kill kittens. That
> said, I've had luck with them and had only a few reports of things not working
> (mostly with the balloon drivers).

XP32 drivers from this pack didn't work for me, but Vista64 work fine.

-- 
< Jernej Simončič ><><><><>< http://eternallybored.org/ >

Beauty times brains equals a constant.
   -- Beckhap's Law

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-win problem

2010-05-06 Thread Brian Jackson
On Thursday, May 06, 2010 03:11:00 pm Jernej Simončič wrote:
> On Thursday, May 6, 2010, 21:59:21, Brian Jackson wrote:
> > http://theiggy.com/tmp/virtio-20100228.zip
> > These are not guaranteed to work and they will probably kill kittens.
> > That said, I've had luck with them and had only a few reports of things
> > not working (mostly with the balloon drivers).
> 
> XP32 drivers from this pack didn't work for me, but Vista64 work fine.

What about the XP32 drivers from:

http://theiggy.com/tmp/virtio-20091208.zip
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


OPCODE Emulation

2010-05-06 Thread Matteo Signorini

Dear Yaniv, Dear Avi,

I would like to add the "sidt emulation" feature in kvm, but in order to
implement it I need to know the details on how the OPCODE works and how 
exactly opcodes are emulated within kvm.

For example let's take the SIDT instruction.
I know the SIDT opcode is "0F 01 /1" but what does 0F, 01 and /1 mean?
I also know that this instruction has only the operand "ModRM:r/m (w)"
but where is this operand stored and how can I access it in emulation?
Could you please suggest where I can find some detailed docs on
the subject?
(I have already read the Intel Volume 2B Instruction Set Reference N-Z 
page 4-440, but I have not found detailed enough information)


Thank you

Matteo Signorini
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-win problem

2010-05-06 Thread Jernej Simončič
On Thursday, May 6, 2010, 22:36:02, Brian Jackson wrote:

> What about the XP32 drivers from:
> http://theiggy.com/tmp/virtio-20091208.zip

This is what I currently use on XP, and it works fine (I think I
mentioned this on IRC - my nickname's ender` there).

-- 
< Jernej Simončič ><><><><>< http://eternallybored.org/ >

An executive will always return to work from lunch early if no one takes him.
   -- Kelly's Law

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-win problem

2010-05-06 Thread Brian Jackson
On Thursday, May 06, 2010 04:05:17 pm Jernej Simončič wrote:
> On Thursday, May 6, 2010, 22:36:02, Brian Jackson wrote:
> > What about the XP32 drivers from:
> > http://theiggy.com/tmp/virtio-20091208.zip
> 
> This is what I currently use on XP, and it works fine (I think I
> mentioned this on IRC - my nickname's ender` there).

Ahh, didn't make the connection. I'm still working on new drivers that
will hopefully fix that.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM test: Add a subtest iofuzz

2010-05-06 Thread Lucas Meneghel Rodrigues
From: Jason Wang 

The design of iofuzz is simple: it just generates random I/O port
activity inside the virtual machine. The correctness of the device
emulation may be verified through this test.

As the instructions are randomly generated, the guest may enter a
wrong state. The test solves this issue by detecting the hang and
restarting the virtual machine.

The test duration can be adjusted through "fuzz_count", and the
parameter "skip_devices" specifies the devices which should not be
fuzzed.

In the current version, every activity is logged and the commands are
sent through a session between host and guest. This method may slow
down the whole test, but it works well. The enumeration is done
through /proc/ioports and the activity scenario is not aggressive.

Suggestions are welcome.

Signed-off-by: Jason Wang 
---
 client/tests/kvm/tests/iofuzz.py   |  138 
 client/tests/kvm/tests_base.cfg.sample |2 +
 2 files changed, 140 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/iofuzz.py

diff --git a/client/tests/kvm/tests/iofuzz.py b/client/tests/kvm/tests/iofuzz.py
new file mode 100644
index 000..5bd1b8e
--- /dev/null
+++ b/client/tests/kvm/tests/iofuzz.py
@@ -0,0 +1,138 @@
+import logging, time, re, random
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+
+def run_iofuzz(test, params, env):
+"""
+KVM iofuzz test:
+1) Log into a guest
+2) Enumerate all IO port ranges through /proc/ioports
+3) On each port of the range:
+* Read it
+* Write 0 to it
+* Write a random value to a random port on a random order
+
+If the guest SSH session hangs, the test detects the hang and the guest
+is then rebooted. The test fails if the qemu process terminates
+while executing the procedure.
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+def outb(session, port, data):
+"""
+Write data to a given port.
+
+@param session: SSH session established to a VM
+@param port: Port where we'll write the data
+@param data: Integer value that will be written on the port. This
+value will be converted to octal before it is written.
+"""
+logging.debug("outb(0x%x, 0x%x)", port, data)
+outb_cmd = ("echo -e '\\%s' | dd of=/dev/port seek=%d bs=1 count=1" %
+(oct(data), port))
+s, o = session.get_command_status_output(outb_cmd)
+if s is None:
+logging.debug("Command did not return")
+elif s != 0:
+logging.debug("Command returned status %s", s)
+
+
+def inb(session, port):
+"""
+Read from a given port.
+
+@param session: SSH session established to a VM
+@param port: Port where we'll read data
+"""
+logging.debug("inb(0x%x)", port)
+inb_cmd = "dd if=/dev/port seek=%d of=/dev/null bs=1 count=1" % port
+s, o = session.get_command_status_output(inb_cmd)
+if s is None:
+logging.debug("Command did not return")
+elif s != 0:
+logging.debug("Command returned status %s", s)
+
+
+def fuzz(session, inst_list):
+"""
+Executes a series of read/write/randwrite instructions.
+
+If the guest SSH session hangs, an attempt to relogin will be made.
+If it fails, the guest will be reset. If during the process the VM
+process abnormally ends, the test fails.
+
+@param inst_list: List of instructions that will be executed.
+@raise error.TestFail: If the VM process dies in the middle of the
+fuzzing procedure.
+"""
+for (op, operand) in inst_list:
+if op == "read":
+inb(session, operand[0])
+elif op =="write":
+outb(session, operand[0], operand[1])
+else:
+raise error.TestError("Unknown command %s" % op)
+
+if not session.is_responsive():
+logging.debug("Session is not responsive")
+if vm.process.is_alive():
+logging.debug("VM is alive, try to re-login")
+try:
+session = kvm_test_utils.wait_for_login(vm, 0, 10, 0, 
2)
+except:
+logging.debug("Could not re-login, reboot the guest")
+session = kvm_test_utils.reboot(vm, session,
+method = 
"system_reset")
+else:
+raise error.TestFail("VM has quit abnormally during %s",
+ (op, operand))
+
+
+boot_timeout = float(params.get("boot_timeout", 240))
+vm = kvm_test_utils.ge

Re: [Autotest] [PATCH] KVM test: Add a subtest iofuzz

2010-05-06 Thread Lucas Meneghel Rodrigues
On Wed, Apr 7, 2010 at 8:55 AM, Jason Wang  wrote:
> The design of iofuzz is simple: it just generates random I/O port
> activity inside the virtual machine. The correctness of the device
> emulation may be verified through this test.
>
> As the instructions are randomly generated, the guest may enter a
> wrong state. The test solves this issue by detecting the hang and
> restarting the virtual machine.
>
> The test duration can be adjusted through "fuzz_count", and the
> parameter "skip_devices" specifies the devices which should not be
> fuzzed.
>
> In the current version, every activity is logged and the commands are
> sent through a session between host and guest. This method may slow
> down the whole test, but it works well. The enumeration is done
> through /proc/ioports and the activity scenario is not aggressive.
>
> Suggestions are welcome.

Hi Jason, nice test! I have made some little changes, re-submitted and
will commit. Thanks!

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/tests/iofuzz.py       |   97 
> 
>  client/tests/kvm/tests_base.cfg.sample |    2 +
>  2 files changed, 99 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/iofuzz.py
>
> diff --git a/client/tests/kvm/tests/iofuzz.py 
> b/client/tests/kvm/tests/iofuzz.py
> new file mode 100644
> index 000..c2f22af
> --- /dev/null
> +++ b/client/tests/kvm/tests/iofuzz.py
> @@ -0,0 +1,97 @@
> +import logging, time, re, random
> +from autotest_lib.client.common_lib import error
> +import kvm_subprocess, kvm_test_utils, kvm_utils
> +
> +
> +def run_iofuzz(test, params, env):
> +    """
> +    KVM iofuzz test:
> +    1) Log into a guest
> +
> +    @param test: kvm test object
> +    @param params: Dictionary with the test parameters
> +    @param env: Dictionary with test environment.
> +    """
> +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +    session = kvm_test_utils.wait_for_login(vm, 0,
> +                                         float(params.get("boot_timeout", 
> 240)),
> +                                         0, 2)
> +
> +    def outb(session, port, data):
> +        logging.debug("outb(0x%x,0x%x)" % (port, data))
> +        outb_cmd = "echo -e '\\%s' | dd of=/dev/port seek=%d bs=1 count=1" % 
> \
> +                   (oct(data), port)
> +        s, o = session.get_command_status_output(outb_cmd)
> +        if s != 0:
> +            logging.debug("None zero value returned")
> +
> +    def inb(session, port):
> +        logging.debug("inb(0x%x)" % port)
> +        inb_cmd = "dd if=/dev/port seek=%d of=/dev/null bs=1 count=1" % port
> +        s, o = session.get_command_status_output(inb_cmd)
> +        if s != 0:
> +            logging.debug("None zero value returned")
> +
> +    def fuzz(session, inst_list):
> +        for (op, operand) in inst_list:
> +            if op == "read":
> +                inb(session, operand[0])
> +            elif op =="write":
> +                outb(session, operand[0], operand[1])
> +            else:
> +                raise error.TestError("Unknown command %s" % op)
> +
> +            if not session.is_responsive():
> +                logging.debug("Session is not responsive")
> +                if vm.process.is_alive():
> +                    logging.debug("VM is alive, try to re-login")
> +                    try:
> +                        session = kvm_test_utils.wait_for_login(vm, 0, 10, 
> 0, 2)
> +                    except:
> +                        logging.debug("Could not re-login, reboot the guest")
> +                        session = kvm_test_utils.reboot(vm, session,
> +                                                        method = 
> "system_reset")
> +                else:
> +                    raise error.TestFail("VM have quit abnormally")
> +    try:
> +        ports = {}
> +        ran = random.SystemRandom()
> +
> +        logging.info("Enumerate the device through /proc/ioports")
> +        ioports = session.get_command_output("cat /proc/ioports")
> +        logging.debug(ioports)
> +        devices = re.findall("(\w+)-(\w+)\ : (.*)", ioports)
> +
> +        skip_devices = params.get("skip_devices","")
> +        fuzz_count = int(params.get("fuzz_count", 10))
> +
> +        for (beg, end, name) in devices:
> +            ports[(int(beg, base=16), int(end, base=16))] = name.strip()
> +
> +        for (beg, end) in ports.keys():
> +
> +            name = ports[(beg, end)]
> +            if name in skip_devices:
> +                logging.info("Skipping %s" % name)
> +                continue
> +
> +            logging.info("Fuzzing %s at 0x%x-0x%x" % (name, beg, end))
> +            inst = []
> +
> +            # Read all ports
> +            for port in range(beg, end+1):
> +                inst.append(("read", [port]))
> +
> +            # Outb with zero
> +            for port in range(beg, end+1):
> +                inst.append(("w

Re: OPCODE Emulation

2010-05-06 Thread Mohammed Gamal
On Thu, May 6, 2010 at 11:37 PM, Matteo Signorini
 wrote:
>
> Dear Yaniv, Dear Avi,
>
> I would like to add the "sidt emulation" feature to kvm, but in order to
> implement it I need to know the details of how the OPCODE works and how
> exactly opcodes are emulated within kvm.
> For example let's take the SIDT instruction.
> I know the SIDT opcode is "0F 01 /1", but what do 0F, 01 and /1 mean?
> I also know that this instruction has only the operand "ModRM:r/m (w)",
> but where is this operand stored and how can I access it in emulation?
> Could you please suggest to me where I can find some detailed docs on the
> subject?
> (I have already read the Intel Volume 2B Instruction Set Reference N-Z page
> 4-440 but I have not found enough detailed information)
>
> Thank you
>
> Matteo Signorini

Hi Matteo,
arch/x86/kvm/emulate.c is the best place to start. Everything you need
to look at is there.
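
For the archives, a quick decode of that notation (paraphrasing the SDM,
so double-check it there): 0F is the two-byte opcode escape, 01 is the
second opcode byte, and /1 means the reg field of the following ModRM
byte acts as an opcode extension and must be 1 to select SIDT. The
operand "ModRM:r/m (w)" is therefore encoded by the mod and r/m fields,
which name the register or memory destination. A minimal sketch of
splitting the byte:

    #include <stdint.h>

    /* ModRM layout: mod = bits 7-6, reg = bits 5-3, rm = bits 2-0.
     * For "0F 01 /1" (SIDT), reg is the opcode extension and equals 1;
     * mod and rm select the operand the IDTR gets stored to. */
    static void split_modrm(uint8_t modrm, uint8_t *mod,
                            uint8_t *reg, uint8_t *rm)
    {
        *mod = modrm >> 6;
        *reg = (modrm >> 3) & 7;
        *rm  = modrm & 7;
    }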

Regards,
Mohammed
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 3/3] KVM Test: Add ioquit test case

2010-05-06 Thread Lucas Meneghel Rodrigues
On Wed, Apr 7, 2010 at 5:49 AM, Feng Yang  wrote:
> Signed-off-by: Feng Yang 
> ---
>  client/tests/kvm/tests/ioquit.py       |   54 
> 
>  client/tests/kvm/tests_base.cfg.sample |    4 ++
>  2 files changed, 58 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/ioquit.py
>
> diff --git a/client/tests/kvm/tests/ioquit.py 
> b/client/tests/kvm/tests/ioquit.py
> new file mode 100644
> index 000..c75a0e3
> --- /dev/null
> +++ b/client/tests/kvm/tests/ioquit.py
> @@ -0,0 +1,54 @@
> +import logging, time, random, signal, os
> +from autotest_lib.client.common_lib import error
> +import kvm_test_utils, kvm_utils
> +
> +
> +def run_ioquit(test, params, env):
> +    """
> +    Emulate the poweroff under IO workload (dbench so far) using monitor
> +    command 'quit'.
> +
> +   �...@param test: Kvm test object
> +   �...@param params: Dictionary with the test parameters.
> +   �...@param env: Dictionary with test environment.
> +    """

Hi Feng, after reading your test I *think* I got the idea. You want to
put some heavy load on the system, quit the VM through a monitor
command and then we pray for it to not segfault during the process.

However:

1) Using autotest in the background to generate the high load
certainly seems overkill. I'd say to use a rather standard shell
one-liner to generate the load, put it to run in the background, and
that is it.
2) At no point does this test actually issue a 'quit' through the monitor
3) Once sending 'quit' is implemented, if the VM segfaults the
crash handler will pick up the core dump

So, please simplify the test by removing the function that forks autotest
and runs it in the background, actually send quit through the monitor,
give it a good round of testing and then send it again. I am still not
100% convinced of the usefulness of the test, but it's worth a try.
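
For the load-plus-quit part, something as small as this should do (just
a sketch; I am assuming a send_monitor_cmd() helper on the vm object
here, so substitute whatever the framework actually exposes):

    session.sendline("dd if=/dev/zero of=/tmp/load bs=1M count=2048 &")
    time.sleep(random.randrange(30, 100))   # let the I/O load build up
    vm.send_monitor_cmd("quit")             # quit under load; a segfault
                                            # is caught by the crash handler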

> +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +    session = kvm_test_utils.wait_for_login(vm,
> +                  timeout=int(params.get("login_timeout", 360)))
> +    session2 = kvm_test_utils.wait_for_login(vm,
> +                  timeout=int(params.get("login_timeout", 360)))
> +    def is_autotest_launched():
> +        if session.get_command_status("pgrep autotest") != 0:
> +            logging.debug("Autotest process not found")
> +            return False
> +        return True
> +
> +    test_name = params.get("background_test", "dbench")
> +    control_file = params.get("control_file", "dbench.control")
> +    timeout = int(params.get("test_timeout", 300))
> +    control_path = os.path.join(test.bindir, "autotest_control",
> +                                control_file)
> +    outputdir = test.outputdir
> +
> +    pid = kvm_test_utils.run_autotest_background(vm, session2, control_path,
> +                                                 timeout, test_name,
> +                                                 outputdir)
> +    if pid < 0:
> +        raise error.TestError("Could not create child process to execute "
> +                              "autotest background")
> +
> +    if kvm_utils.wait_for(is_autotest_launched, 240, 0, 2):
> +        logging.debug("Background autotest successfully")
> +    else:
> +        logging.debug("Background autotest failed, start the test anyway")
> +
> +    time.sleep(100 + random.randrange(0,100))
> +    logging.info("Kill the virtual machine")
> +    vm.process.close()
> +
> +    logging.info("Kill the tracking process")
> +    kvm_utils.safe_kill(pid, signal.SIGKILL)
> +    kvm_test_utils.wait_autotest_background(pid)
> +    session.close()
> +    session2.close()
> +
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 9b12fc2..d8530f6 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -305,6 +305,10 @@ variants:
>             - ksm_parallel:
>                 ksm_mode = "parallel"
>
> +    - ioquit:
> +        type = ioquit
> +        control_file = dbench.control.200
> +        background_test = dbench
>     # system_powerdown, system_reset and shutdown *must* be the last ones
>     # defined (in this order), since the effect of such tests can leave
>     # the VM on a bad state.
> --
> 1.5.5.6
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Endless loop in qcow2_alloc_cluster_offset

2010-05-06 Thread Marcelo Tosatti
On Thu, Nov 19, 2009 at 01:19:55PM +0100, Jan Kiszka wrote:
> Hi,
> 
> I just managed to push a qemu-kvm process (git rev. b496fe3431) into an
> endless loop in qcow2_alloc_cluster_offset, namely over
> QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight):
> 
> (gdb) bt
> #0  0x0048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, 
> offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at 
> /data/qemu-kvm/block/qcow2-cluster.c:750
> #1  0x004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at 
> /data/qemu-kvm/block/qcow2.c:587
> #2  0x00482a44 in qcow_aio_writev (bs=<value optimized out>,
> sector_num=<value optimized out>, qiov=<value optimized out>,
> nb_sectors=<value optimized out>, cb=<value optimized out>,
> opaque=<value optimized out>) at /data/qemu-kvm/block/qcow2.c:645
> #3  0x00470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, 
> qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 , 
> opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362
> #4  0x00472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, 
> buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736
> #5  0x00435581 in ide_sector_write (s=0xc92650) at 
> /data/qemu-kvm/hw/ide/core.c:622
> #6  0x00425fc2 in kvm_handle_io (env=<value optimized out>) at 
> /data/qemu-kvm/kvm-all.c:553
> #7  kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964
> #8  0x00426049 in kvm_cpu_exec (env=0x1000) at 
> /data/qemu-kvm/qemu-kvm.c:1651
> #9  0x0042627d in kvm_main_loop_cpu (_env=<value optimized out>) at 
> /data/qemu-kvm/qemu-kvm.c:1893
> #10 ap_main_loop (_env=<value optimized out>) at 
> /data/qemu-kvm/qemu-kvm.c:1943
> #11 0x7f48ae89d070 in start_thread () from /lib64/libpthread.so.0
> #12 0x7f48abf0711d in clone () from /lib64/libc.so.6
> #13 0x in ?? ()
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, 
> depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight 
> = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
> 
> So next == first.
> 

Seen the exact same bug twice in a row while installing FC12 with IDE
disk, current qemu-kvm.git. 

qemu-system-x86_64 -drive file=/root/images/fc12-ide.img,cache=writeback \
-m 1000  -vnc :1 \
-net nic,model=virtio \
-net tap,script=/root/ifup.sh -serial stdio \
-cdrom /root/iso/linux/Fedora-12-x86_64-DVD.iso -monitor
telnet::4445,server,nowait -usbdevice tablet

Can't reproduce though.
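
For what it's worth, the gdb dump above already shows why it spins:
next_in_flight.le_next of the in-flight QCowL2Meta points back at the
element itself (0xcb3568), and QLIST_FOREACH expands to roughly the loop
below, so the termination condition can never become false:

    for (old_alloc = s->cluster_allocs.lh_first;
         old_alloc != NULL;
         old_alloc = old_alloc->next_in_flight.le_next) {
        /* with le_next == old_alloc this advance is a no-op */
    }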

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH 3/3] KVM Test: Add ioquit test case

2010-05-06 Thread Feng Yang
Hi, Lucas

Thanks for your comment.  I am sorry for not responding for so long.

I will update it according to your comments.

Also thanks to Michael for his comment.





- "Lucas Meneghel Rodrigues"  wrote:

> From: "Lucas Meneghel Rodrigues" 
> To: "Feng Yang" 
> Cc: autot...@test.kernel.org, kvm@vger.kernel.org
> Sent: Friday, May 7, 2010 7:32:33 AM GMT +08:00 Beijing / Chongqing / Hong 
> Kong / Urumqi
> Subject: Re: [Autotest] [PATCH 3/3] KVM Test: Add ioquit test case
>
> On Wed, Apr 7, 2010 at 5:49 AM, Feng Yang  wrote:
> > Signed-off-by: Feng Yang 
> > ---
> >  client/tests/kvm/tests/ioquit.py       |   54
> 
> >  client/tests/kvm/tests_base.cfg.sample |    4 ++
> >  2 files changed, 58 insertions(+), 0 deletions(-)
> >  create mode 100644 client/tests/kvm/tests/ioquit.py
> >
> > diff --git a/client/tests/kvm/tests/ioquit.py
> b/client/tests/kvm/tests/ioquit.py
> > new file mode 100644
> > index 000..c75a0e3
> > --- /dev/null
> > +++ b/client/tests/kvm/tests/ioquit.py
> > @@ -0,0 +1,54 @@
> > +import logging, time, random, signal, os
> > +from autotest_lib.client.common_lib import error
> > +import kvm_test_utils, kvm_utils
> > +
> > +
> > +def run_ioquit(test, params, env):
> > +    """
> > +    Emulate the poweroff under IO workload (dbench so far) using
> monitor
> > +    command 'quit'.
> > +
> > +   �...@param test: Kvm test object
> > +   �...@param params: Dictionary with the test parameters.
> > +   �...@param env: Dictionary with test environment.
> > +    """
> 
> Hi Feng, after reading your test I *think* I got the idea. You want
> to
> put some heavy load on the system, quit the VM through a monitor
> command and then we pray for it to not segfault during the process.
> 
> However:
> 
> 1) Using autotest in the background to generate the high load
> certainly seems overkill. I'd say to use a rather standard shell
> one-liner to generate the load, put it to run in the background, and
> that is it.
> 2) At no point does this test actually issue a 'quit' through the monitor
> 3) Once sending 'quit' is implemented, if the VM segfaults the
> crash handler will pick up the core dump
> 
> So, please simplify the test by removing the function that forks
> autotest
> and runs it in the background, actually send quit through the monitor,
> give it a good round of testing and then send it again. I am still
> not
> 100% convinced of the usefulness of the test, but it's worth a try.
> 
> > +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> > +    session = kvm_test_utils.wait_for_login(vm,
> > +                  timeout=int(params.get("login_timeout", 360)))
> > +    session2 = kvm_test_utils.wait_for_login(vm,
> > +                  timeout=int(params.get("login_timeout", 360)))
> > +    def is_autotest_launched():
> > +        if session.get_command_status("pgrep autotest") != 0:
> > +            logging.debug("Autotest process not found")
> > +            return False
> > +        return True
> > +
> > +    test_name = params.get("background_test", "dbench")
> > +    control_file = params.get("control_file", "dbench.control")
> > +    timeout = int(params.get("test_timeout", 300))
> > +    control_path = os.path.join(test.bindir, "autotest_control",
> > +                                control_file)
> > +    outputdir = test.outputdir
> > +
> > +    pid = kvm_test_utils.run_autotest_background(vm, session2,
> control_path,
> > +                                                 timeout,
> test_name,
> > +                                                 outputdir)
> > +    if pid < 0:
> > +        raise error.TestError("Could not create child process to
> execute "
> > +                              "autotest background")
> > +
> > +    if kvm_utils.wait_for(is_autotest_launched, 240, 0, 2):
> > +        logging.debug("Background autotest successfully")
> > +    else:
> > +        logging.debug("Background autotest failed, start the test
> anyway")
> > +
> > +    time.sleep(100 + random.randrange(0,100))
> > +    logging.info("Kill the virtual machine")
> > +    vm.process.close()
> > +
> > +    logging.info("Kill the tracking process")
> > +    kvm_utils.safe_kill(pid, signal.SIGKILL)
> > +    kvm_test_utils.wait_autotest_background(pid)
> > +    session.close()
> > +    session2.close()
> > +
> > diff --git a/client/tests/kvm/tests_base.cfg.sample
> b/client/tests/kvm/tests_base.cfg.sample
> > index 9b12fc2..d8530f6 100644
> > --- a/client/tests/kvm/tests_base.cfg.sample
> > +++ b/client/tests/kvm/tests_base.cfg.sample
> > @@ -305,6 +305,10 @@ variants:
> >             - ksm_parallel:
> >                 ksm_mode = "parallel"
> >
> > +    - ioquit:
> > +        type = ioquit
> > +        control_file = dbench.control.200
> > +        background_test = dbench
> >     # system_powerdown, system_reset and shutdown *must* be the last
> ones
> >     # defined (in this order), since the effect of such tests can
> leave
> >     # the VM on a bad state.
> > --
> > 1

RE: [PATCH 2/3 v2] KVM: VMX: VMCLEAR/VMPTRLD usage changes.

2010-05-06 Thread Xu, Dongxiao
Avi Kivity wrote:
> On 05/06/2010 11:45 AM, Xu, Dongxiao wrote:
>> From: Dongxiao Xu
>> 
>> Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
>> support hosted VMM coexistence, VMCLEAR is executed on vcpu
>> schedule out, and VMPTRLD is executed on vcpu schedule in.
>> This could also eliminate the IPI when doing VMCLEAR.
>> 
>> 
>> 
>> +static int __read_mostly vmm_coexistence;
>> +module_param(vmm_coexistence, bool, S_IRUGO);
>> 
> 
> I think we can call it 'exclusive' defaulting to true.

I will change it in next version.

> 
>> +
>>   #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  
>> \
>>  (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
>>   #define KVM_GUEST_CR0_MASK \
>> @@ -222,6 +225,7 @@ static u64 host_efer;
>> 
>>   static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
>> 
>> +static void vmx_flush_tlb(struct kvm_vcpu *vcpu);
>>   /*
>>* Keep MSR_K6_STAR at the end, as setup_msrs() will try to
>> optimize it 
>>* away by decrementing the array size.
>> @@ -784,25 +788,31 @@ static void vmx_vcpu_load(struct kvm_vcpu
>>  *vcpu, int cpu) struct vcpu_vmx *vmx = to_vmx(vcpu);
>>  u64 tsc_this, delta, new_offset;
>> 
>> -if (vcpu->cpu != cpu) {
>> -vcpu_clear(vmx);
>> -kvm_migrate_timers(vcpu);
>> +if (vmm_coexistence) {
>> +vmcs_load(vmx->vmcs);
>>  set_bit(KVM_REQ_TLB_FLUSH,&vcpu->requests);
>> -local_irq_disable();
>> -list_add(&vmx->local_vcpus_link,
>> -&per_cpu(vcpus_on_cpu, cpu));
>> -local_irq_enable();
>> -}
>> +} else {
>> +if (vcpu->cpu != cpu) {
>> +vcpu_clear(vmx);
>> +set_bit(KVM_REQ_TLB_FLUSH,&vcpu->requests);
>> +local_irq_disable();
>> +list_add(&vmx->local_vcpus_link,
>> +&per_cpu(vcpus_on_cpu, cpu));
>> +local_irq_enable();
>> +}
>> 
>> -if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
>> -per_cpu(current_vmcs, cpu) = vmx->vmcs;
>> -vmcs_load(vmx->vmcs);
>> +if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
>> +per_cpu(current_vmcs, cpu) = vmx->vmcs;
>> +vmcs_load(vmx->vmcs);
>> +}
>>  }
>> 
> 
> Please keep the exclusive use code as it is, and just add !exclusive
> cases.  For example. the current_vmcs test will always fail since
> current_vmcs will be set to NULL on vmx_vcpu_put, so we can have just
> one vmcs_load().

Thanks for the suggestion.

> 
>> 
>> @@ -829,7 +839,14 @@ static void vmx_vcpu_load(struct kvm_vcpu
>> *vcpu, int cpu) 
>> 
>>   static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>>   {
>> +struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>>  __vmx_load_host_state(to_vmx(vcpu));
>> +if (vmm_coexistence) {
>> +vmcs_clear(vmx->vmcs);
>> +rdtscll(vmx->vcpu.arch.host_tsc);
>> +vmx->launched = 0;
>> 
> 
> This code is also duplicated.  Please refactor to avoid duplication.

Thanks.

> 
>> +}
>>   }
>> 
>>   static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
>> @@ -1280,7 +1297,8 @@ static void kvm_cpu_vmxoff(void)
>> 
>>   static void hardware_disable(void *garbage)
>>   {
>> -vmclear_local_vcpus();
>> +if (!vmm_coexistence)
>> +vmclear_local_vcpus();
>>  kvm_cpu_vmxoff();
>>  write_cr4(read_cr4()&  ~X86_CR4_VMXE);
>>   }
>> @@ -3927,7 +3945,8 @@ static void vmx_free_vmcs(struct kvm_vcpu
>>  *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu);
>> 
>>  if (vmx->vmcs) {
>> -vcpu_clear(vmx);
>> +if (!vmm_coexistence)
>> +vcpu_clear(vmx);
>> 
> 
> What's wrong with doing this unconditionally?

The change should have been put in PATCH 3/3.
This judgement is to avoid calling vcpu_clear() after kvm_cpu_vmxoff().
However it will not appear in the next version, since I will set vcpu->cpu=-1
in the vmx_vcpu_put() !vmm_exclusive case, so that vcpu_clear() will not
actually be executed.

> 
>>  free_vmcs(vmx->vmcs);
>>  vmx->vmcs = NULL;
>>  }
>> @@ -3969,13 +3988,20 @@ static struct kvm_vcpu
>>  *vmx_create_vcpu(struct kvm *kvm, unsigned int id)  if
>> (!vmx->vmcs) goto free_msrs; 
>> 
>> -vmcs_clear(vmx->vmcs);
>> -
>> -cpu = get_cpu();
>> -vmx_vcpu_load(&vmx->vcpu, cpu);
>> -err = vmx_vcpu_setup(vmx);
>> -vmx_vcpu_put(&vmx->vcpu);
>> -put_cpu();
>> +if (vmm_coexistence) {
>> +preempt_disable();
>> +vmcs_load(vmx->vmcs);
>> +err = vmx_vcpu_setup(vmx);
>> +vmcs_clear(vmx->vmcs);
>> +preempt_enable();
>> 
> 
> Why won't the normal sequence work?

Yes you are right.

> 
>> +} else {
>> +vmcs_clear(vmx->vmcs);
>> +cpu = get_cpu();
>> +vmx_vcpu_load(&vm

[PATCH 1/4 v3] KVM: VMX: Define new functions to wrapper direct call of asm code.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct calls to asm
code. Also move the VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..e77da89 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,6 +445,19 @@ static void vmcs_clear(struct vmcs *vmcs)
   vmcs, phys_addr);
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+
+   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
+   : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
+   : "cc", "memory");
+   if (error)
+   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+  vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
struct vcpu_vmx *vmx = arg;
@@ -769,7 +782,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 phys_addr = __pa(vmx->vmcs);
u64 tsc_this, delta, new_offset;
 
if (vcpu->cpu != cpu) {
@@ -783,15 +795,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
 
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
-   u8 error;
-
per_cpu(current_vmcs, cpu) = vmx->vmcs;
-   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
- : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
- : "cc");
-   if (error)
-   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
-  vmx->vmcs, phys_addr);
+   vmcs_load(vmx->vmcs);
}
 
if (vcpu->cpu != cpu) {
@@ -1220,6 +1225,13 @@ static __init int vmx_disabled_by_bios(void)
/* locked but not enabled */
 }
 
+static void kvm_cpu_vmxon(u64 addr)
+{
+   asm volatile (ASM_VMX_VMXON_RAX
+   : : "a"(&addr), "m"(addr)
+   : "memory", "cc");
+}
+
 static int hardware_enable(void *garbage)
 {
int cpu = raw_smp_processor_id();
@@ -1240,9 +1252,7 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   asm volatile (ASM_VMX_VMXON_RAX
- : : "a"(&phys_addr), "m"(phys_addr)
- : "memory", "cc");
+   kvm_cpu_vmxon(phys_addr);
 
ept_sync_global();
 
@@ -1266,13 +1276,13 @@ static void vmclear_local_vcpus(void)
 static void kvm_cpu_vmxoff(void)
 {
asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
-   write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
 static void hardware_disable(void *garbage)
 {
vmclear_local_vcpus();
kvm_cpu_vmxoff();
+   write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
-- 
1.6.3
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4 v3] KVM: VMX: Some minor changes to code structure.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

Do some preparations for vmm coexistence support.

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e77da89..49b0850 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -784,15 +784,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu->cpu != cpu) {
+   if (vcpu->cpu != cpu)
vcpu_clear(vmx);
-   kvm_migrate_timers(vcpu);
-   set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
-   local_irq_disable();
-   list_add(&vmx->local_vcpus_link,
-&per_cpu(vcpus_on_cpu, cpu));
-   local_irq_enable();
-   }
 
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->vmcs;
@@ -803,6 +796,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct desc_ptr dt;
unsigned long sysenter_esp;
 
+   kvm_migrate_timers(vcpu);
+   set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
+   local_irq_disable();
+   list_add(&vmx->local_vcpus_link,
+&per_cpu(vcpus_on_cpu, cpu));
+   local_irq_enable();
+
vcpu->cpu = cpu;
/*
 * Linux uses per-cpu TSS and GDT, so set these when switching
-- 
1.6.3
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4 v3] KVM: VMX: Support hosted VMM coexsitence.

2010-05-06 Thread Xu, Dongxiao
Hi all,

This is hosted VMM coexistence support v3.

Main changes from v2:
1) Change vmm_coexistence to vmm_exclusive.
2) Some code structure changes. Split the original 3 patches to 4.
3) Address some comments from Avi.

Main changes from v1:
1) Add a module option "vmm_coexistence" to decide whether to
enable this feature. Currently it is off by default.
2) Each time when a KVM vcpu is scheduled in, it will invalidate EPT
and VPID TLBs to avoid conflict between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs. 

The following NOTE is picked up from Intel SDM 3B, chapter 27.3,
MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled in a physical CPU. Also this approach eliminates
the IPI mechanism for the original VMCLEAR. As suggested by the SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
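
Put together, the per-schedule sequence the patches below implement for
the non-exclusive case is roughly this (a distilled sketch, not literal
kernel code):

    /* vcpu scheduled in (vmx_vcpu_load): */
    kvm_cpu_vmxon(phys_addr);      /* VMXON: enter VMX operation */
    vmcs_load(vmx->vmcs);          /* VMPTRLD this vcpu's VMCS */
    /* ... guest runs ... */
    /* vcpu scheduled out (vmx_vcpu_put): */
    vmcs_clear(vmx->vmcs);         /* VMCLEAR: flush VMCS state to memory */
    kvm_cpu_vmxoff();              /* VMXOFF: leave VMX operation */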

With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in the vmm coexistence case, we should first call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |   45 ++---
 1 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c536b9d..59d7443 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -168,6 +168,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -786,8 +788,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
+   u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
-   if (vmm_exclusive && vcpu->cpu != cpu)
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+   else if (vcpu->cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -833,8 +838,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
-   if (!vmm_exclusive)
+   if (!vmm_exclusive) {
__vcpu_clear(to_vmx(vcpu));
+   kvm_cpu_vmxoff();
+   }
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1257,9 +1264,11 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   kvm_cpu_vmxon(phys_addr);
 
-   ept_sync_global();
+   if (vmm_exclusive) {
+   kvm_cpu_vmxon(phys_addr);
+   ept_sync_global();
+   }
 
return 0;
 }
@@ -1285,8 +1294,10 @@ static void kvm_cpu_vmxoff(void)
 
 static void hardware_disable(void *garbage)
 {
-   vmclear_local_vcpus();
-   kvm_cpu_vmxoff();
+   if (vmm_exclusive) {
+   vmclear_local_vcpus();
+   kvm_cpu_vmxoff();
+   }
write_cr4(read_cr4() & ~X86_CR4_VMXE);
 }
 
@@ -3949,6 +3960,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
kmem_cache_free(kvm_vcpu_cache, vmx);
 }
 
+static inline void vmcs_init(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id()));
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+
+   vmcs_clear(vmcs);
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxoff();
+}
+
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
int err;
@@ -3974,13 +3998,14 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
if (!vmx->vmcs)
goto free_msrs;
 
-   vmcs_clear(vmx->vmcs);
+   vmcs_init(vmx->vmcs);
 
cpu = get_cpu();
vmx_vcpu_load(&vmx->vcpu, cpu);
err = vmx_vcpu_setup(vmx);
vmx_vcpu_put(&vmx->vcpu);
put_cpu();
+
if (err)
goto free_vmcs;
if (vm_need_virtualize_apic_accesses(kvm))
@@ -4118,7 +4143,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *best;
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+   int cpu;
 
+   cpu = get_cpu();
+   vmx_vcpu_load(&vmx->vcpu, cpu);
vmx->rdtscp_enabled = false;
if (vmx_rdtscp_supported()) {
exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
@@ -4133,6 +4161,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
}
}
}
+   vmx_vcpu_put(&vmx->vcpu);
+   put_cpu();
+
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
1.6.3
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4 v3] KVM: VMX: VMCLEAR/VMPTRLD usage changes.

2010-05-06 Thread Xu, Dongxiao
From: Dongxiao Xu 

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistence, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

vmm_exclusive is introduced as a module parameter to indicate
whether the vmm coexistence feature is enabled or not.
Currently the feature is disabled by default.

Signed-off-by: Dongxiao Xu 
---
 arch/x86/kvm/vmx.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49b0850..c536b9d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -62,6 +62,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+static int __read_mostly vmm_exclusive = 1;
+module_param(vmm_exclusive, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -784,7 +787,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu->cpu != cpu)
+   if (vmm_exclusive && vcpu->cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
@@ -830,6 +833,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
+   if (!vmm_exclusive)
+   __vcpu_clear(to_vmx(vcpu));
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
-- 
1.6.3
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio: put last_used and last_avail index into ring itself.

2010-05-06 Thread Rusty Russell
On Thu, 6 May 2010 03:57:55 pm Michael S. Tsirkin wrote:
> On Thu, May 06, 2010 at 10:22:12AM +0930, Rusty Russell wrote:
> > On Wed, 5 May 2010 03:52:36 am Michael S. Tsirkin wrote:
> > > What do you think?
> > 
> > I think everyone is settled on 128 byte cache lines for the foreseeable
> > future, so it's not really an issue.
> 
> You mean with 64 bit descriptors we will be bouncing a cache line
> between host and guest, anyway?

I'm confused by this entire thread.

Descriptors are 16 bytes.  They are at the start, so presumably aligned to
cache boundaries.

Available ring follows that at 2 bytes per entry, so it's also packed nicely
into cachelines.

Then there's padding to the page boundary.  That puts us on a cacheline again
for the used ring; 8 bytes per entry there.
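
Concretely, with num = 256 (entry sizes from the virtio headers; a
worked sketch):

    desc[256] : 256 * 16 B = 4096 B             -> fills page 0 exactly
    avail     : 4 B header + 256 * 2 B = 516 B  -> starts page 1, aligned
    (padding to the next page boundary)
    used      : 4 B header + 256 * 8 B = 2052 B -> starts page 2, aligned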

I don't see how any change in layout could be more cache friendly?
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Rusty Russell
On Thu, 6 May 2010 07:30:00 pm Avi Kivity wrote:
> On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:
> > +   /* We publish the last-seen used index at the end of the available ring.
> > +* It is at the end for backwards compatibility. */
> > +   vr->last_used_idx =&(vr)->avail->ring[num];
> > +   /* Verify that last used index does not spill over the used ring. */
> > +   BUG_ON((void *)vr->last_used_idx +
> > +  sizeof *vr->last_used_idx>  (void *)vr->used);
> >   }
> >
> 
> Shouldn't this be on its own cache line?

It's next to the available ring; because that's where the guest publishes
its data.  That whole page is guest-write, host-read.

Putting it on a cacheline by itself would be a slight pessimization; the host
cpu would have to get the last_used_idx cacheline and the avail descriptor
cacheline every time.  This way, they are sometimes the same cacheline.

Hope that clarifies,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-06 Thread Rusty Russell
On Thu, 6 May 2010 03:49:46 pm Michael S. Tsirkin wrote:
> Now, I also added an mb() in guest between read and write so
> that last used index write can not get ahead of used index read.
> It does feel good to have it there, but I can not say why
> it's helpful. Works fine without it, but then these
> subtle races might be hard to trigger. What do you think?

I couldn't see that in the patch?  I don't think it's necessary
though, since the write of last_used depends on the read of
used (and no platform we care about would reorder such a thing).

I'm reasonably happy, but we should write some convenient test for
missing interrupts.

I'm thinking of a sender which does a loop: blasts 1MB of UDP packets,
then prints the time and sleep(1).  The receiver would print the time
every 1MB of received data.  The two times should almost exactly correspond.

Assuming that the network doesn't overflow and lose stuff, this should
identify any missing wakeup/interrupts (depending on direction used).
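
Something like this, say (an untested sketch in Python; address and port
are placeholders):

    # sender: blast 1 MB of UDP, stamp the time, sleep 1s, repeat
    import socket, time
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    chunk = "\0" * 1024
    while True:
        for _ in range(1024):                 # 1024 * 1 KB = 1 MB burst
            s.sendto(chunk, ("guest-addr", 5001))
        print time.time()
        time.sleep(1)

    # receiver: print the time every 1 MB of received data
    import socket, time
    r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    r.bind(("", 5001))
    got = 0
    while True:
        got += len(r.recv(2048))
        if got >= 1 << 20:
            print time.time()
            got = 0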

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/9] KVM MMU: allow more shadow pages become asynchronous

2010-05-06 Thread Xiao Guangrong
Hi Avi, Marcelo,

Patches 5 and 6 can't be applied to the current kvm tree; I'll
rebase those two patches.

Marcelo, does this patchset fix your issue? I have tested it with
Fedora12/Ubuntu/CentOS 32/64 guests, and it works well.

Thanks,
Xiao
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 5/9] KVM MMU: rename 'root_count' to 'active_count'

2010-05-06 Thread Xiao Guangrong
Rename 'root_count' to 'active_count' in kvm_mmu_page, since unsync pages
will also use it in a later patch.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |8 +++-
 arch/x86/kvm/mmu.c  |   14 +++---
 arch/x86/kvm/mmutrace.h |6 +++---
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ed48904..2f61543 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -203,7 +203,13 @@ struct kvm_mmu_page {
DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
bool multimapped; /* More than one parent_pte? */
bool unsync;
-   int root_count;  /* Currently serving as active root */
+   /*
+* if active_count > 0, this page is not freed immediately;
+* it is in use as an active root or by unsync pages which are
+* currently out of kvm->mmu_lock's protection.
+*/
+   int active_count;
+
unsigned int unsync_children;
union {
u64 *parent_pte;   /* !multimapped */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e202b0d..4077a9c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1539,7 +1539,7 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
unaccount_shadowed(kvm, sp->gfn);
if (sp->unsync)
kvm_unlink_unsync_page(kvm, sp);
-   if (!sp->root_count) {
+   if (!sp->active_count) {
/* Count self */
ret++;
hlist_del(&sp->hash_link);
@@ -2061,8 +2061,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
hpa_t root = vcpu->arch.mmu.root_hpa;
 
sp = page_header(root);
-   --sp->root_count;
-   if (!sp->root_count && sp->role.invalid)
+   --sp->active_count;
+   if (!sp->active_count && sp->role.invalid)
kvm_mmu_zap_page(vcpu->kvm, sp);
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -2074,8 +2074,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
if (root) {
root &= PT64_BASE_ADDR_MASK;
sp = page_header(root);
-   --sp->root_count;
-   if (!sp->root_count && sp->role.invalid)
+   --sp->active_count;
+   if (!sp->active_count && sp->role.invalid)
kvm_mmu_zap_page(vcpu->kvm, sp);
}
vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
@@ -2121,7 +2121,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
  PT64_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
-   ++sp->root_count;
+   ++sp->active_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
@@ -2151,7 +2151,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
  PT32_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
-   ++sp->root_count;
+   ++sp->active_count;
spin_unlock(&vcpu->kvm->mmu_lock);
 
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 42f07b1..8c8d265 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -10,13 +10,13 @@
 #define KVM_MMU_PAGE_FIELDS \
__field(__u64, gfn) \
__field(__u32, role) \
-   __field(__u32, root_count) \
+   __field(__u32, active_count) \
__field(bool, unsync)
 
 #define KVM_MMU_PAGE_ASSIGN(sp) \
__entry->gfn = sp->gfn;  \
__entry->role = sp->role.word;   \
-   __entry->root_count = sp->root_count;\
+   __entry->active_count = sp->active_count;\
__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({   \
@@ -37,7 +37,7 @@
 access_str[role.access],   \
 role.invalid ? " invalid" : "",\
 role.nxe ? "" : "!",   \
-__entry->root_count,   \
+__entry->active_count, \
 __entry->unsync ? "unsync" : "sync", 0);   \
ret;\
})
-- 
1.6.1.2


--
To unsubscribe from this list: send t

[PATCH v5 6/9] KVM MMU: support keeping sp live while it's out of protection

2010-05-06 Thread Xiao Guangrong
If we want to keep an sp live while it's out of kvm->mmu_lock protection,
we can increase sp->active_count.

Then, an invalid page is no longer only an active root but may also be an
unsync sp, so we should filter those out when we make a page unsync.

Also move 'hlist_del(&sp->hash_link)' into kvm_mmu_free_page() so that we
can free an invalid unsync page by calling kvm_mmu_free_page() directly.
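
In pattern form, the intended usage is (a sketch of the idea, not code
from the series):

    sp->active_count++;                 /* pin the sp */
    spin_unlock(&kvm->mmu_lock);
    /* ... work on sp without holding mmu_lock ... */
    spin_lock(&kvm->mmu_lock);
    if (!--sp->active_count && sp->role.invalid)
        kvm_mmu_free_page(kvm, sp);     /* last user frees it */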

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4077a9c..2d3347c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -894,6 +894,7 @@ static int is_empty_shadow_page(u64 *spt)
 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp->spt));
+   hlist_del(&sp->hash_link);
list_del(&sp->link);
__free_page(virt_to_page(sp->spt));
__free_page(virt_to_page(sp->gfns));
@@ -1542,12 +1543,13 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
if (!sp->active_count) {
/* Count self */
ret++;
-   hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
} else {
sp->role.invalid = 1;
list_move(&sp->link, &kvm->arch.active_mmu_pages);
-   kvm_reload_remote_mmus(kvm);
+   /* No need to reload the mmu if an unsync page was zapped */
+   if (sp->role.level != PT_PAGE_TABLE_LEVEL)
+   kvm_reload_remote_mmus(kvm);
}
kvm_mmu_reset_last_pte_updated(kvm);
return ret;
@@ -1782,7 +1784,8 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  
gfn_t gfn)
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
 
hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
-   if (s->gfn != gfn || s->role.direct || s->unsync)
+   if (s->gfn != gfn || s->role.direct || s->unsync ||
+ s->role.invalid)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
__kvm_unsync_page(vcpu, s);
@@ -1807,7 +1810,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, 
gfn_t gfn,
if (s->role.level != PT_PAGE_TABLE_LEVEL)
return 1;
 
-   if (!need_unsync && !s->unsync) {
+   if (!need_unsync && !s->unsync && !s->role.invalid) {
if (!can_unsync || !oos_shadow)
return 1;
need_unsync = true;
-- 
1.6.1.2


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html