[COMMIT master] vhost-net: fix reversed logic in mask notifiers

2010-05-27 Thread Avi Kivity
From: Michael S. Tsirkin m...@redhat.com

When guest notifier is assigned, we set mask notifier,
which will assign kvm irqfd.
When guest notifier is unassigned, mask notifier is unset,
which should unassign kvm irqfd.

The way to do this is to call mask notifier telling it to mask the vector.
This, unless vector is already masked which unassigns irqfd already.

The logic in unassign was reversed, which left kvm irqfd assigned.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Acked-by: Gerd Hoffmann kra...@redhat.com
Acked-by: Amit Shah amit.s...@redhat.com
Reported-by: Amit Shah amit.s...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/msix.c b/hw/msix.c
index 8f9a621..1398680 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -617,6 +617,7 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
     assert(opaque);
     assert(!dev->msix_mask_notifier_opaque[vector]);
 
+    /* Unmask the new notifier unless vector is masked. */
     if (msix_is_masked(dev, vector)) {
         return 0;
     }
@@ -638,12 +639,13 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
     assert(dev->msix_mask_notifier);
     assert(dev->msix_mask_notifier_opaque[vector]);
 
+    /* Mask the old notifier unless it is already masked. */
     if (msix_is_masked(dev, vector)) {
         return 0;
     }
     r = dev->msix_mask_notifier(dev, vector,
                                 dev->msix_mask_notifier_opaque[vector],
-                                msix_is_masked(dev, vector));
+                                !msix_is_masked(dev, vector));
     if (r < 0) {
         return r;
     }
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] device-assignment: don't truncate MSIX capabilities table size

2010-05-27 Thread Avi Kivity
From: Alex Williamson alex.william...@redhat.com

PCI_MSIX_TABSIZE is 0x07ff

Reported-by: Juan Quintela quint...@redhat.com
Signed-off-by: Alex Williamson alex.william...@redhat.com
Acked-by: Chris Wright chr...@redhat.com
Acked-by: Juan Quintela quint...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index d8e7cb4..e254203 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1073,7 +1073,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     else
         pos = pci_dev->cap.start;
 
-    entries_max_nr = pci_dev->config[pos + 2];
+    entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
     entries_max_nr &= PCI_MSIX_TABSIZE;
     entries_max_nr += 1;
 
@@ -1255,8 +1255,8 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     entry_nr = assigned_dev_pci_read_word(pci_dev, pos + 2) &
                PCI_MSIX_TABSIZE;
     pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] = 0x11;
-    pci_dev->config[pci_dev->cap.start +
-                    pci_dev->cap.length + 2] = entry_nr;
+    *(uint16_t *)(pci_dev->config + pci_dev->cap.start +
+                  pci_dev->cap.length + 2) = entry_nr;
     msix_table_entry = assigned_dev_pci_read_long(pci_dev,
                                                   pos + PCI_MSIX_TABLE);
     *(uint32_t *)(pci_dev->config + pci_dev->cap.start +


[COMMIT master] KVM: MMU: don't check PT_WRITABLE_MASK directly

2010-05-27 Thread Avi Kivity
From: Gui Jianfeng guijianf...@cn.fujitsu.com

Since we have is_writable_pte(), make use of it.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d92984d..136e160 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2965,7 +2965,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
		pt = sp->spt;
		for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
			/* avoid RMW */
-			if (pt[i] & PT_WRITABLE_MASK)
+			if (is_writable_pte(pt[i]))
				pt[i] &= ~PT_WRITABLE_MASK;
	}
	kvm_flush_remote_tlbs(kvm);
@@ -3400,7 +3400,7 @@ void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
	struct kvm_mmu_page *rev_sp;
	gfn_t gfn;
 
-	if (*sptep & PT_WRITABLE_MASK) {
+	if (is_writable_pte(*sptep)) {
		rev_sp = page_header(__pa(sptep));
		gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
 
@@ -3449,7 +3449,7 @@ static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
 
		if (!(ent & PT_PRESENT_MASK))
			continue;
-		if (!(ent & PT_WRITABLE_MASK))
+		if (!is_writable_pte(ent))
			continue;
		inspect_spte_has_rmap(vcpu->kvm, pt[i]);
	}
@@ -3483,7 +3483,7 @@ static void audit_write_protection(struct kvm_vcpu *vcpu)
 
	spte = rmap_next(vcpu->kvm, rmapp, NULL);
	while (spte) {
-		if (*spte & PT_WRITABLE_MASK)
+		if (is_writable_pte(*spte))
			printk(KERN_ERR "%s: (%s) shadow page has "
			       "writable mappings: gfn %lx role %x\n",
			       __func__, audit_msg, sp->gfn,


[COMMIT master] KVM: SVM: Fix EFER.LME being stripped

2010-05-27 Thread Avi Kivity
From: Zachary Amsden zams...@redhat.com

Must set VCPU register to be the guest notion of EFER even if that
setting is not valid on hardware.  This was masked by the set in
set_efer until 7657fd5ace88e8092f5f3a84117e093d7b893f26 broke that.
Fix is simply to set the VCPU register before stripping bits.

Signed-off-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5b313e9..298918e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -286,11 +286,11 @@ static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
 
 static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
+	vcpu->arch.efer = efer;
	if (!npt_enabled && !(efer & EFER_LMA))
		efer &= ~EFER_LME;
 
	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
-	vcpu->arch.efer = efer;
 }
 
 static int is_external_interrupt(u32 info)


[COMMIT master] KVM: SVM: fix compiling warning from erratum 383 fix

2010-05-27 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

fix:

arch/x86/kvm/svm.c: In function ‘svm_handle_mce’:
arch/x86/kvm/svm.c:1490: warning: unused variable ‘kvm_run’

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 298918e..9c68a65 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1487,8 +1487,6 @@ static void svm_handle_mce(struct vcpu_svm *svm)
	 * Erratum 383 triggered. Guest state is corrupt so kill the
	 * guest.
	 */
-	struct kvm_run *kvm_run = svm->vcpu.run;
-
	pr_err("KVM: Guest triggered AMD Erratum 383\n");
 
	set_bit(KVM_REQ_TRIPLE_FAULT, &svm->vcpu.requests);


[COMMIT master] KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode

2010-05-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte
permissions.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 136e160..39dd8d3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1881,7 +1881,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
	spte |= (u64)pfn << PAGE_SHIFT;
 
	if ((pte_access & ACC_WRITE_MASK)
-	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
+	    || (!tdp_enabled && write_fault && !is_write_protection(vcpu)
+		&& !user_fault)) {
 
		if (level > PT_PAGE_TABLE_LEVEL &&
		    has_wrprotected_page(vcpu->kvm, gfn, level)) {


[COMMIT master] KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page

2010-05-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
We do that by setting spte.w=1, since the host cr0.wp must remain set so the
host can write protect pages.  Once we allow write access, we must remove
user access otherwise we mistakenly allow the user to write the page.

Reviewed-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 39dd8d3..56f8c3c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1894,6 +1894,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
	spte |= PT_WRITABLE_MASK;
 
+	if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
+		spte &= ~PT_USER_MASK;
+
/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection


[COMMIT master] KVM: MMU: Document cr0.wp emulation

2010-05-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 2201dcb..1e7ecdd 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -298,6 +298,25 @@ Host translation updates:
   - look up affected sptes through reverse map
   - drop (or update) translations
 
+Emulating cr0.wp
+================
+
+If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
+works for the guest kernel, not guest userspace.  When the guest
+cr0.wp=1, this does not present a problem.  However when the guest cr0.wp=0,
+we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte (the
+semantics require allowing any guest kernel access plus user read access).
+
+We handle this by mapping the permissions to two possible sptes, depending
+on fault type:
+
+- kernel write fault: spte.u=0, spte.w=1 (allows full kernel access,
+  disallows user access)
+- read fault: spte.u=1, spte.w=0 (allows full read access, disallows kernel
+  write access)
+
+(user write faults generate a #PF)
+
 Further reading
 ===
 


[COMMIT master] KVM: Fix order passed to iommu_unmap

2010-05-27 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

This is obviously a left-over from the old interface taking the
size. Apparently a mostly harmless issue with the current iommu_unmap
implementation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Acked-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 5adc578..673c88a 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -273,7 +273,7 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
	pfn  = phys >> PAGE_SHIFT;
 
	/* Unmap address from IO address space */
-	order       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
+	order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
	unmap_pages = 1ULL << order;
 
/* Unpin all pages we just unmapped to not leak any memory */


[COMMIT master] KVM: ia64: Add missing spin_unlock in kvm_arch_hardware_enable()

2010-05-27 Thread Avi Kivity
From: Julia Lawall ju...@diku.dk

Add a spin_unlock missing on the error path.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  +... when != E1
  if (...) {
... when != E1
*   return ...;
  }
  ...+
* spin_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall ju...@diku.dk
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 29f6075..91760e8 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -144,6 +144,7 @@ int kvm_arch_hardware_enable(void *garbage)
			VP_INIT_ENV : VP_INIT_ENV_INITALIZE,
			__pa(kvm_vm_buffer), KVM_VM_BUFFER_BASE, &tmp_base);
	if (status != 0) {
+		spin_unlock(&vp_lock);
		printk(KERN_WARNING "kvm: Failed to Enable VT Support\n");
		return -EINVAL;
	}


RE: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Wang, Shane
Jan Kiszka wrote:
 Wang, Shane wrote:
 Avi Kivity wrote:
 On 05/26/2010 10:25 AM, Jan Kiszka wrote:
 This is for CONFIG_INTEL_TXT enabled? Good point but needs to be
 solved differently. tboot, the variable that is checked by the
 original header, is not exported to modules. I wonder how this
 worked out for you... 
 
 Solution should be: hack tboot_enabled to kvm_tboot_enabled and
 unconditionally define that to 0 for older kernels. If tboot is
 actually enabled in hardware, KVM may not load but I'm unsure if
 it's OK to assume tboot == 1 for that case or if that will cause
 breakages if it's off instead - CC'ing the KVM patch author.
 
 Worst case it doesn't load.  I don't think it's a problem since
 enabling tboot will be rare for older kernels.
 
 tboot is not 0 if tboot module is run before kernel.
 If tboot is enabled in hardware (I assume you mean if Intel TXT is
 enabled in hardware) but tboot module is not run or old kernels
 don't support tboot module, 
 we still have outside_smx bit in feature msr. Why might KVM not load?
 
 If we have to hard-wire tboot_enabled in kvm-kmod to 0, KVM may not
 test all required bits and erroneously assume VTX would be disabled.
 
 So I wondered what would happen if we hard-wired it to 1, pretending
 that the tboot module is loaded. Would we gain something without
 losing on some other end? If not, I would simply leave things as they
 are now (i.e. always assuming tboot absence).
 
 Thanks,
 Jan

Why is VTX assumed to be disabled?
tboot_enabled == 0 but (msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) == 1
if you have VT enabled.
If you have VT enabled, VMX outside SMX is 1 always.

Shane


About upgrade KVM

2010-05-27 Thread satimis

Hi folks,

KVM -virtualizer
host - Debian 5.0

This virtual machine has been running for some time.  IIRC the KVM was
downloaded from the KVM website.  Its version is quite old.



$ sudo kvm | head -1
QEMU PC emulator version 0.9.1 (kvm-72), Copyright (c) 2003-2008  
Fabrice Bellard



$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: (none)
  Candidate: 0.12.4+dfsg-1~bpo50+1
  Version table:
 0.12.4+dfsg-1~bpo50+1 0
  1 http://www.backports.org lenny-backports/main Packages


I tried to upgrade it via System -> Administration -> Synaptic Package Manager;
- qemu-kvm
- qemu-kvm-dbg


On marking them for installation, the following warning popped up:
Mark additional required changes?

To be removed
kvm

To be added
libasyncns0
libcap1
libpulse0
- end -


My questions are:

1)
Will deleting the old package affect the running virtual machine?


2)
Is there any way to roll back in case of problems?


Please advise.  TIA


B.R.
Stephen L



RE: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Wang, Shane
Wang, Shane wrote:

 Why is VTX assumed to be disabled?
 tboot_enabled == 0 but (msr &
 FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) == 1 if you have VT
 enabled. If you have VT enabled, VMX outside SMX is 1 always.
 
 Shane

BTW:
In hardware,
VT is enabled, TXT is enabled, then outside = 1, inside = 1;
VT is enabled, TXT is disabled, then outside = 1, inside = 0;
VT is disabled, then outside = 0;


Re: [PATCH] device-assignment: don't truncate MSIX capabilities table size

2010-05-27 Thread Jes Sorensen
On 05/26/10 14:48, Avi Kivity wrote:
 On 05/26/2010 03:27 AM, Juan Quintela wrote:
 BTW, I also noticed the lack of pci_set_long() and friend functions, but
 arrived to the same conclusion that you: all the device assignment
 assumes that the world is x86_64 :)

 
 IIRC it used to work on ia64 as well.
 

Same endianness, unless you run HPUX on the CPU.

Jes


[PATCH] kvm mmu: optimizations when tdp is in use

2010-05-27 Thread Gui Jianfeng
When tdp is in use, there is no need to check for write-protected pages,
and the quadrant does not need to be calculated either.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0bb9f17..ce4bbd3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -495,10 +495,13 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
		kvm_x86_ops->get_lpage_level() : host_level;
 
+	if (tdp_enabled)
+		goto done;
+
	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
			break;
-
+done:
	return level - 1;
 }
 
@@ -1346,7 +1349,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
	if (role.direct)
		role.cr4_pae = 0;
	role.access = access;
-	if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
+	if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
		role.quadrant = quadrant;
-- 
1.6.5.2


Re: [PATCH] KVM test: Add perfmon into the guest tests

2010-05-27 Thread Jes Sorensen
On 05/25/10 05:05, Chen Cao wrote:
 perfmon2 API provides access to the hardware performance counters of
 modern processors.
 
 Dependency,
 To compile the source code of the test, the following packages should
 be installed,
 glibc-static-2.11.1-6.x86_64
 glibc-headers-2.11.1-6.x86_64
 glibc-common-2.11.1-6.x86_64
 glibc-devel-2.11.1-6.x86_64
 glibc-2.11.1-6.x86_64
 
 Note,
 1. libpfm uses the Performance Monitor Unit (PMU) on the processors,
 but this unit is not provided by kvm currently, i.e. the test should
 fail in kvm guests.
 2. According to the README file of perfmon-tests-0.3, a 2.6.24 or higher
 Linux kernel (with perfmon v2.8 or higher) is needed to run the tests.

I thought perfmon2 was deprecated in favor of perf_event.c ? The only
reference left for perfmon2 in the kernel is in the ia64 tree, and
KVM/ia64 seems to be pretty dead these days.

Cheers,
Jes


[PATCH] kvm mmu: don't check PT_WRITABLE_MASK directly

2010-05-27 Thread Gui Jianfeng
Since we have is_writable_pte(), make use of it.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ce4bbd3..441a5d8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2923,7 +2923,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
		pt = sp->spt;
		for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
			/* avoid RMW */
-			if (pt[i] & PT_WRITABLE_MASK)
+			if (is_writable_pte(pt[i]))
				pt[i] &= ~PT_WRITABLE_MASK;
	}
	kvm_flush_remote_tlbs(kvm);
@@ -3358,7 +3358,7 @@ void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
	struct kvm_mmu_page *rev_sp;
	gfn_t gfn;
 
-	if (*sptep & PT_WRITABLE_MASK) {
+	if (is_writable_pte(*sptep)) {
		rev_sp = page_header(__pa(sptep));
		gfn = rev_sp->gfns[sptep - rev_sp->spt];
 
@@ -3408,7 +3408,7 @@ static void check_writable_mappings_rmap(struct kvm_vcpu *vcpu)
 
		if (!(ent & PT_PRESENT_MASK))
			continue;
-		if (!(ent & PT_WRITABLE_MASK))
+		if (!is_writable_pte(ent))
			continue;
		inspect_spte_has_rmap(vcpu->kvm, pt[i]);
	}
@@ -3442,7 +3442,7 @@ static void audit_write_protection(struct kvm_vcpu *vcpu)
 
		spte = rmap_next(vcpu->kvm, rmapp, NULL);
		while (spte) {
-			if (*spte & PT_WRITABLE_MASK)
+			if (is_writable_pte(*spte))
				printk(KERN_ERR "%s: (%s) shadow page has "
				       "writable mappings: gfn %lx role %x\n",
				       __func__, audit_msg, sp->gfn,
-- 
1.6.5.2



Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Avi Kivity

On 05/26/2010 09:04 PM, Glauber Costa wrote:

This patch adds a file that documents the usage of KVM-specific
MSRs.

   


Looks good.  A few comments:


+
+Custom MSR list
+---------------
+
+The current supported Custom MSR list is:
+
+MSR_KVM_WALL_CLOCK:  0x11
+
+   data: physical address of a memory area.


Which must be in guest RAM (i.e., don't point it somewhere random and 
expect the hypervisor to allocate it for you).


Must be aligned to 4 bytes (we don't enforce it though).


+
+MSR_KVM_SYSTEM_TIME: 0x12
+
+   data: physical address of a memory area. This memory is expected to
+   hold a copy of the following structure:
   


In guest RAM.  What are the alignment restrictions?  I don't think we 
can restrict to less than 4 bytes without breaking guests retroactively.



+
+   struct pvclock_vcpu_time_info {
+   u32   version;
+   u32   pad0;
+   u64   tsc_timestamp;
+   u64   system_time;
+   u32   tsc_to_system_mul;
+   s8tsc_shift;
+   u8flags;
+   u8pad[2];
+   } __attribute__((__packed__)); /* 32 bytes */
+
+   whose data will be filled in by the hypervisor periodically. Only one
+   write, or registration, is needed for each VCPU. The interval between
+   updates of this structure is arbitrary, and implementation-dependent.
+
+   version: guest has to check version before and after grabbing
+   time information, and check that they are both equal and even.
+   An odd version indicates an in-progress update.
+
+   tsc_timestamp: the tsc value at the current VCPU, at the time
+   of the update of this structure. Guests can subtract this value
+   from current tsc to derive a notion of elapsed time since the
+   structure update.
+
+   system_time: the current system time at the time this structure
+   was last updated. Unit is nanoseconds.
   


What is the baseline for system_time?  Guest boot?


+
+   tsc_to_system_mul: a function of the tsc frequency. One has
+   to multiply any tsc-related quantity by this value to get
+   a value in nanoseconds, besides dividing by 2^tsc_shift
+
+   tsc_shift: cycle to nanosecond divider, as a power of two, to
+   allow for shift rights. One has to shift right any tsc-related
+   quantity by this value to get a value in nanoseconds, besides
+   multiplying by tsc_to_system_mul.
   


Would be good to write down the formula for calculating time here.


+
+   flags: bits in this field indicate extended capabilities
+   coordinated between the guest and the hypervisor. Availability
+   of specific flags has to be checked in the 0x40000001 cpuid leaf.
+   Refer to cpuid.txt for details.
   


Need to document bit 0 here for completeness.  cpuid.txt documents how 
to discover it, here we document how to use it.



+
+   Availability of this MSR must be checked via bit 0 in the 0x40000001 cpuid
+   leaf prior to usage.
+
+   This MSR falls outside the reserved KVM range, and may be removed in the
+   future. Its usage is deprecated.
+
+MSR_KVM_WALL_CLOCK_NEW:   0x4b564d00
+
+   data and functioning: same as MSR_KVM_WALL_CLOCK. Use this instead.
+
+   Availability of this MSR must be checked via bit 3 in the 0x40000001 cpuid
+   leaf prior to usage.
+
+MSR_KVM_SYSTEM_TIME_NEW:  0x4b564d01
+
+   data and functioning: same as MSR_KVM_SYSTEM_TIME. Use this instead.
+
+   Availability of this MSR must be checked via bit 3 in the 0x40000001 cpuid
+   leaf prior to usage.
+
   


Please detail the suggested algorithm for using the msrs (new then old).



--
error compiling committee.c: too many arguments to function



Re: issue about virtio-net driver to support vhost mergeable buffer with zero-copy to support PS mode

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 09:21:02AM +0800, Xin, Xiaohui wrote:
 Michael,
 I'm now looking into the vhost mergeable buffer, and I tried to use it to 
 support PS mode with zero-copy. And I found an issue there that I have to 
 modify the guest virtio-net driver.
 
 When guest virtio-net driver submits mergeable buffers, it submits multiple 
 pages outside. In zero-copy case, vhost cannot know which page is used to put 
 header, and which page is used to put payload. Then vhost can only reserves 
 12 bytes for each page. That means, the page_offset of the payload DMAed into 
 the guest buffer is always 12 bytes. But guest virtio-net driver always use 
 offset 0 to put the data (See receive_mergeable()). That's where the 
 zero-copy use mergeable buffer must modify.
 
 Have I missed something here? And how do you think about it?
 
 Thanks 
 Xiaohui 

Maybe you can teach the hardware to skip the first 12 bytes: qemu will
call an ioctl telling the hardware what the virtio header size is.
This is how we plan to do it for tap.

Alternatively, buffers can be used in any order.
So we can have hardware use N buffers for the packet, and then
have vhost put the header in buffer N+1.

-- 
MST


Re: About upgrade KVM

2010-05-27 Thread Thomas Mueller

 
 
 On marking them to install following warning popup:- Mark additional
 required changes?
 
 To be removed
 kvm
 
 To be added
 libasyncns0
 libcap1
 libpulse0
 - end -
 
 
 My questions are;
 
 1)
 On deleting the old package whether it would affect the running virtual
 machine?

IMHO they won't be affected until shutdown of the guest and/or the host.


 
 2)
 Is there any way to come back in case of problem?

Back up the old kvm package (look in /var/cache/apt/archives) before
upgrading.

Be aware that some command line switches changed between 0.9.1 and 0.12.4,
and upgrade your host kernel too (2.6.32 is in backports) for a newer
version of the kvm kernel module (I don't know if kvm-72 works with 0.12.4).



- Thomas



Re: [PATCH] kvm mmu: optimizations when tdp is in use

2010-05-27 Thread Avi Kivity

On 05/27/2010 11:06 AM, Gui Jianfeng wrote:

In case of using tdp, checking write protected page isn't needed and
quadrant also no need to be calculated.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
  arch/x86/kvm/mmu.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0bb9f17..ce4bbd3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -495,10 +495,13 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
		kvm_x86_ops->get_lpage_level() : host_level;
 
+	if (tdp_enabled)
+		goto done;
+
	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
			break;
-
+done:
	return level - 1;
  }

   


We also use ->write_count to prevent mapping the end of a 
non-large-page-aligned memslot with a large spte.


Undocumented in mmu.txt, I'll post a patch.

--
error compiling committee.c: too many arguments to function



RE: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Wang, Shane
Jan Kiszka wrote:
 If TXT is on and VT is locked but KVM sees tboot_enabled == 0, it
 won't check for FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX during setup
 and may consider VT unavailable.

If vt is locked, txt is on, tboot_enabled = 0, then it will check 
VMXON_OUTSIDE_SMX.
But at this point, if vt is on (still locked), the fn will return 0, which 
means vmx is not disabled by bios, correct?


 Moreover, if VT is not locked in that case, KVM will also not set
 FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX during hardware_enable,
 likely leaving VT off then, no? 

Sure, KVM will not set VMXON_INSIDE_SMX, but will set VMXON_OUTSIDE_SMX.
In that case, this means vt is on.

 
 So my question is: Would it cause any harm to assume TXT being always
 on, even if it wasn't?

A bit confused.
Do you mean hardware TXT always on, i.e. set 
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX = 1 always?
That's fine. No problem. No harm.
Or, do you mean set tboot_enabled = 1 always? if so, in case that the hardware 
TXT is disabled
(FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX = 0), then KVM will think vmx is 
disabled if feature msr is locked.

Shane


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Michael S. Tsirkin
On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
 Add a new kernel API to create a singlethread workqueue and attach it's
 task to current task's cgroup and cpumask.
 
 Signed-off-by: Sridhar Samudrala s...@us.ibm.com

Could someone familiar with workqueue code please comment on whether
this patch is suitable for 2.6.35?

It is needed to fix the case where vhost user might cause a kernel
thread to consume more CPU than allowed by the cgroup.
Should I merge it through the vhost tree?
Ack for this?

Thanks!

 diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
 index 9466e86..6d6f301 100644
 --- a/include/linux/workqueue.h
 +++ b/include/linux/workqueue.h
 @@ -211,6 +211,7 @@ __create_workqueue_key(const char *name, int singlethread,
  #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0)
  #define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0, 0)
  
 +extern struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name);
  extern void destroy_workqueue(struct workqueue_struct *wq);
  
  extern int queue_work(struct workqueue_struct *wq, struct work_struct *work);
 diff --git a/kernel/workqueue.c b/kernel/workqueue.c
 index 5bfb213..6ba226e 100644
 --- a/kernel/workqueue.c
 +++ b/kernel/workqueue.c
 @@ -35,6 +35,7 @@
  #include <linux/lockdep.h>
  #define CREATE_TRACE_POINTS
  #include <trace/events/workqueue.h>
 +#include <linux/cgroup.h>
  
  /*
   * The per-CPU workqueue (if single thread, we always use the first
 @@ -193,6 +194,45 @@ static const struct cpumask *cpu_singlethread_map 
 __read_mostly;
   */
  static cpumask_var_t cpu_populated_map __read_mostly;
  
 +static struct task_struct *get_singlethread_wq_task(struct workqueue_struct 
 *wq)
 +{
 + return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
 +}
 +
 +/* Create a singlethread workqueue, attach its task to the current task's
 + * cgroup, and set its cpumask to the current task's cpumask.
 + */
 +struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char 
 *name)
 +{
 + struct workqueue_struct *wq;
 + struct task_struct *task;
 + cpumask_var_t mask;
 +
 + wq = create_singlethread_workqueue(name);
 + if (!wq)
 + goto out;
 +
 + if (!alloc_cpumask_var(mask, GFP_KERNEL))
 + goto err;
 + 
 + if (sched_getaffinity(current->pid, mask))
 + goto err;
 +
 + task = get_singlethread_wq_task(wq);
 + if (sched_setaffinity(task->pid, mask))
 + goto err;
 +
 + if (cgroup_attach_task_current_cg(task))
 + goto err;
 +out: 
 + return wq;
 +err:
 + destroy_workqueue(wq);
 + wq = NULL;
 + goto out;
 +}
 +EXPORT_SYMBOL_GPL(create_singlethread_workqueue_in_current_cg);
 +
  /* If it's single threaded, it isn't in the list of workqueues. */
  static inline int is_wq_single_threaded(struct workqueue_struct *wq)
  {
   


Re: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Jan Kiszka
Wang, Shane wrote:
 Jan Kiszka wrote:
 If TXT is on and VT is locked but KVM sees tboot_enabled == 0, it
 won't check for FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX during setup
 and may consider VT unavailable.
 
 If vt is locked, txt is on, tboot_enabled = 0, then it will check 
 VMXON_OUTSIDE_SMX.
 But at this point, if vt is on (still locked), the fn will return 0, which 
 means vmx is not disabled by bios, correct?
 
 
 Moreover, if VT is not locked in that case, KVM will also not set
 FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX during hardware_enable,
 likely leaving VT off then, no? 
 
 Sure, KVM will not set VMXON_INSIDE_SMX, but will set VMXON_OUTSIDE_SMX.
 In that case, this means vt is on.
 
 So my question is: Would it cause any harm to assume TXT being always
 on, even if it wasn't?
 
 A bit confused.
 Do you mean hardware TXT always on, i.e. set 
 FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX = 1 always?
 That's fine. No problem. No harm.
 Or, do you mean set tboot_enabled = 1 always? 

The latter. As we have no clue about the actual state (tboot is not
exported on older kernels), we are forced to assume some reasonable state.

 if so, in case that the hardware TXT is disabled
 (FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX = 0), then KVM will think vmx is 
 disabled if feature msr is locked.

Then let's leave it as it was before the tboot changes to VMX: assume
!tboot_enabled().

Thanks for explaining,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: Copy and paste feature across guest and host

2010-05-27 Thread Tomasz Chmielewski

Just installed Fedora 13 as a guest on KVM.  However there is no
cross-platform copy and paste feature.  I trust I have set up this
feature on another guest sometime before.  Unfortunately I can't find
the relevant document.  Could you please shed some light on this?  A
pointer would be appreciated.  TIA


Did you try;

# modprobe virtio-copypaste

?


Seriously, qemu does not make it easy (well, its GUI does not make most 
things easy) and you'll need a tool which synchronizes the clipboard 
between two machines (google for qemu copy paste?).



--
Tomasz Chmielewski
http://wpkg.org



RE: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Wang, Shane
Jan Kiszka wrote:
 The latter. As we have no clue about the actual state (tboot is not
 exported on older kernels), we are forced to assume some reasonable
 state. 

Are you trying to load the latest KVM on the older kernels?

Shane


[Autotest PATCH - updated with current tree] KVM test: Add perfmon into the guest tests

2010-05-27 Thread Chen Cao
perfmon2 API provides access to the hardware performance counters of
modern processors.

Dependency,
To compile the source code of the test, the following packages should
be installed,
glibc-static-2.11.1-6.x86_64
glibc-headers-2.11.1-6.x86_64
glibc-common-2.11.1-6.x86_64
glibc-devel-2.11.1-6.x86_64
glibc-2.11.1-6.x86_64

Note,
1. libpfm uses the Performance Monitor Unit (PMU) on the processors,
but this unit is not provided by kvm currently, i.e. the test should
fail in kvm guests. This test can thus serve as a reminder that kvm
still lacks PMU virtualization.
2. According to the README file of perfmon-tests-0.3, a 2.6.24 or higher
Linux kernel (with perfmon v2.8 or higher) is needed to run the tests.

Signed-off-by: Chen Cao k...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/autotest_control/perfmon.control 
b/client/tests/kvm/autotest_control/perfmon.control
new file mode 100644
index 000..d3f5190
--- /dev/null
+++ b/client/tests/kvm/autotest_control/perfmon.control
@@ -0,0 +1,16 @@
+TIME="SHORT"
+AUTHOR = "Stephane Eranian eran...@google.com"
+DOC = """
+This is a simple series of tests for the perfmon2 API which
+provides access to the hardware performance counters of modern
+processors.
+
+Information about perfmon2 at:
+http://perfmon2.sf.net
+"""
+NAME = 'perfmon'
+TEST_CLASS = 'kernel'
+TEST_CATEGORY = 'Functional'
+TEST_TYPE = 'client'
+
+job.run_test('perfmon')
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 5349034..599e4c3 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -154,6 +154,8 @@ variants:
 test_control_file = hwclock.control
 - rtc:
 test_control_file = rtc.control
+- perfmon:
+test_control_file = perfmon.control
 
 - linux_s3: install setup unattended_install
 type = linux_s3



Re: [PATCH] Fix SVM longmode guests

2010-05-27 Thread Avi Kivity

On 05/27/2010 04:09 AM, Zachary Amsden wrote:
In recent testing, I discovered guests failed to boot on my AMD box.  
Bisecting revealed an EFER related change caused the problem; here is 
the fix.




Yikes, applied, thanks.

--
error compiling committee.c: too many arguments to function



[Autotest PATCH] client test - perfmon: Patch the source code of perfmon to remove the -Werror switch

2010-05-27 Thread Chen Cao
On some OSes, the compilation will fail because of this switch:

...
gcc -O2 -g -Wall -Werror
+-I/home/kcao/projects/autotest/client/tests/perfmon/perfmon-tests-0.3/libpfm-3.
+52/lib/../include -D_REENTRANT -DCONFIG_PFMLIB_ARCH_X86_64 -I. -c
+pfmlib_os_linux.c
cc1: warnings being treated as errors
pfmlib_os_linux.c: In function ‘pfm_init_syscalls_sysfs’:
pfmlib_os_linux.c:398: error: ignoring return value of ‘fscanf’, declared with
+attribute warn_unused_result
make[2]: *** [pfmlib_os_linux.o] Error 1
make[2]: Leaving directory
+`/home/kcao/projects/autotest/client/tests/perfmon/perfmon-tests-0.3/libpfm-3.5
+2/lib'
make[1]: *** [lib] Error 2


Signed-off-by: Chen Cao k...@redhat.com
---
 client/tests/perfmon/perfmon.py |8 +++
 client/tests/perfmon/remove-werror-switch.patch |   55 +++
 2 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/perfmon/remove-werror-switch.patch

diff --git a/client/tests/perfmon/perfmon.py b/client/tests/perfmon/perfmon.py
index ec1145f..39c6fb2 100644
--- a/client/tests/perfmon/perfmon.py
+++ b/client/tests/perfmon/perfmon.py
@@ -10,6 +10,14 @@ class perfmon(test.test):
 tarball = utils.unmap_url(self.bindir, tarball, self.tmpdir)
 utils.extract_tarball_to_dir(tarball, self.srcdir)
 os.chdir(self.srcdir)
+# Apply the patch to remove the -Werror switch:
+# the build produces warnings, and with -Werror the
+# compiler treats them as errors and the source build
+# fails, although it seems that fedora and rhel ignore
+# these warnings.
+p1 = 'patch -p1 < ../remove-werror-switch.patch'
+utils.system(p1)
+
 utils.system('make')
 
 
diff --git a/client/tests/perfmon/remove-werror-switch.patch 
b/client/tests/perfmon/remove-werror-switch.patch
new file mode 100644
index 000..6a5062c
--- /dev/null
+++ b/client/tests/perfmon/remove-werror-switch.patch
@@ -0,0 +1,55 @@
+diff --git a/Makefile b/Makefile
+index 6ba6fba..884274d 100644
+--- a/Makefile
++++ b/Makefile
+@@ -28,6 +28,7 @@ TOPDIR  := $(shell if [ $$PWD !=  ]; then echo $$PWD; 
else pwd; fi)
+ include config.mk
+ 
+ DIRS=tests
++PATCHNAME=remove-libpfm-werror.patch
+ 
+ all:  libpfm
+   @echo Compiling for \'$(ARCH)\' target
+@@ -36,6 +37,7 @@ all:  libpfm
+ libpfm:
+   @echo Compiling $(LIBPFM)
+   @tar zxf $(LIBPFM).tar.gz
++  @(cd $(LIBPFM) && patch -p1 < ../$(PATCHNAME) && cd ..)
+   @ln -sf $(LIBPFM) libpfm
+   @$(MAKE) -C $(LIBPFM) lib
+ clean: 
+diff --git a/config.mk b/config.mk
+index a110111..7321f2d 100644
+--- a/config.mk
++++ b/config.mk
+@@ -163,7 +163,7 @@ INSTALL=install
+ LN=ln -sf
+ PFMINCDIR=$(TOPDIR)/libpfm/include
+ PFMLIBDIR=$(TOPDIR)/libpfm/lib
+-DBG?=-g -Wall -Werror
++DBG?=-g -Wall
+ # gcc/mips64 bug
+ ifeq ($(CONFIG_PFMLIB_ARCH_SICORTEX),y)
+ OPTIM?=-O
+diff --git a/remove-libpfm-werror.patch b/remove-libpfm-werror.patch
+new file mode 100644
+index 000..252aaa0
+--- /dev/null
++++ b/remove-libpfm-werror.patch
+@@ -0,0 +1,13 @@
++diff --git a/config.mk b/config.mk
++index 8c76f59..bbf1fc0 100644
++--- a/config.mk
+++++ b/config.mk
++@@ -164,7 +164,7 @@ INSTALL=install
++ LN=ln -sf
++ PFMINCDIR=$(TOPDIR)/include
++ PFMLIBDIR=$(TOPDIR)/lib
++-DBG?=-g -Wall -Werror
+++DBG?=-g -Wall
++ # gcc/mips64 bug
++ ifeq ($(CONFIG_PFMLIB_ARCH_SICORTEX),y)
++ OPTIM?=-O
+-- 
+1.7.0.4
+



[PATCH v6] KVM: x86: Enable XSAVE/XRSTOR for guest

2010-05-27 Thread Sheng Yang
From: Dexuan Cui dexuan@intel.com

This patch enables the guest to use the XSAVE/XRSTOR instructions.

We assume that host_xcr0 would use all possible bits that the OS supports.

And we load xcr0 in the same way we handle the fpu - do it as late as we can.

Signed-off-by: Dexuan Cui dexuan@intel.com
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/kvm_cache_regs.h   |6 ++
 arch/x86/kvm/vmx.c  |   16 +
 arch/x86/kvm/x86.c  |  120 --
 include/linux/kvm_host.h|2 +-
 6 files changed, 139 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d08bb4a..b16356b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -302,6 +302,7 @@ struct kvm_vcpu_arch {
} update_pte;
 
struct fpu guest_fpu;
+   u64 xcr0;
 
gva_t mmio_fault_cr2;
struct kvm_pio_request pio;
@@ -605,6 +606,7 @@ int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long 
*val);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
+void kvm_set_xcr0(struct kvm_vcpu *vcpu, u64 xcr0);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 9e6779f..346ea66 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -266,6 +266,7 @@ enum vmcs_field {
 #define EXIT_REASON_EPT_VIOLATION   48
 #define EXIT_REASON_EPT_MISCONFIG   49
 #define EXIT_REASON_WBINVD 54
+#define EXIT_REASON_XSETBV 55
 
 /*
  * Interruption-information format
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index d2a98f8..6491ac8 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -71,4 +71,10 @@ static inline ulong kvm_read_cr4(struct kvm_vcpu *vcpu)
return kvm_read_cr4_bits(vcpu, ~0UL);
 }
 
+static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
+{
+   return (kvm_register_read(vcpu, VCPU_REGS_RAX) & -1u)
+   | ((u64)(kvm_register_read(vcpu, VCPU_REGS_RDX) & -1u) << 32);
+}
+
 #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 99ae513..c55d57d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -36,6 +36,8 @@
 #include <asm/vmx.h>
 #include <asm/virtext.h>
 #include <asm/mce.h>
+#include <asm/i387.h>
+#include <asm/xcr.h>
 
 #include "trace.h"
 
@@ -3354,6 +3356,19 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
return 1;
 }
 
+static int handle_xsetbv(struct kvm_vcpu *vcpu)
+{
+   u64 new_bv = kvm_read_edx_eax(vcpu);
+
+   if (kvm_register_read(vcpu, VCPU_REGS_RCX) != 0) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+   kvm_set_xcr0(vcpu, new_bv);
+   skip_emulated_instruction(vcpu);
+   return 1;
+}
+
 static int handle_apic_access(struct kvm_vcpu *vcpu)
 {
return emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DONE;
@@ -3632,6 +3647,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu) = {
[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD]  = handle_wbinvd,
+   [EXIT_REASON_XSETBV]  = handle_xsetbv,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
[EXIT_REASON_MCE_DURING_VMENTRY]  = handle_machine_check,
[EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7be1d36..e7acc9d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -64,6 +64,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR  \
+ | X86_CR4_OSXSAVE \
  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
@@ -149,6 +150,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+u64 __read_mostly host_xcr0;
+
+static inline u32 bit(int bitno)
+{
+   return 1 << (bitno & 31);
+}
+
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
unsigned slot;
@@ -473,6 +481,52 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 }
 EXPORT_SYMBOL_GPL(kvm_lmsw);
 
+int __kvm_set_xcr0(struct kvm_vcpu *vcpu, u64 xcr0)
+{
+   if (kvm_x86_ops->get_cpl(vcpu) != 0)
+   return 1;
+   if (!(xcr0 & XSTATE_FP))
+   return 1;
+   if 

[PATCH] KVM: x86: XSAVE/XRSTOR live migration support

2010-05-27 Thread Sheng Yang
This patch enables save/restore of xsave state.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm.h |   29 
 arch/x86/kvm/x86.c |   79 
 include/linux/kvm.h|6 +++
 3 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index ff90055..d3f4d9f 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -22,6 +22,7 @@
 #define __KVM_HAVE_XEN_HVM
 #define __KVM_HAVE_VCPU_EVENTS
 #define __KVM_HAVE_DEBUGREGS
+#define __KVM_HAVE_XSAVE
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -299,4 +300,32 @@ struct kvm_debugregs {
__u64 reserved[9];
 };
 
+/* for KVM_CAP_XSAVE */
+struct kvm_xsave {
+   struct {
+   __u16 cwd;
+   __u16 swd;
+   __u16 twd;
+   __u16 fop;
+   __u64 rip;
+   __u64 rdp;
+   __u32 mxcsr;
+   __u32 mxcsr_mask;
+   __u32 st_space[32];
+   __u32 xmm_space[64];
+   __u32 padding[12];
+   __u32 sw_reserved[12];
+   } i387;
+   struct {
+   __u64 xstate_bv;
+   __u64 reserved1[2];
+   __u64 reserved2[5];
+   } xsave_hdr;
+   struct {
+   __u32 ymmh_space[64];
+   } ymmh;
+   __u64 xcr0;
+   __u32 padding[256];
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7acc9d..5badba2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1711,6 +1711,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_MCE:
r = KVM_MAX_MCE_BANKS;
break;
+   case KVM_CAP_XSAVE:
+   r = cpu_has_xsave;
+   break;
default:
r = 0;
break;
@@ -2363,6 +2366,59 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
kvm_vcpu *vcpu,
return 0;
 }
 
+static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+   struct kvm_xsave *guest_xsave)
+{
+   struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+
+   if (!cpu_has_xsave)
+   return;
+
+   guest_xsave->i387.cwd = xsave->i387.cwd;
+   guest_xsave->i387.swd = xsave->i387.swd;
+   guest_xsave->i387.twd = xsave->i387.twd;
+   guest_xsave->i387.fop = xsave->i387.fop;
+   guest_xsave->i387.rip = xsave->i387.rip;
+   guest_xsave->i387.rdp = xsave->i387.rdp;
+   memcpy(guest_xsave->i387.st_space, xsave->i387.st_space, 128);
+   memcpy(guest_xsave->i387.xmm_space, xsave->i387.xmm_space,
+   sizeof guest_xsave->i387.xmm_space);
+
+   guest_xsave->xsave_hdr.xstate_bv = xsave->xsave_hdr.xstate_bv;
+   memcpy(guest_xsave->ymmh.ymmh_space, xsave->ymmh.ymmh_space,
+   sizeof xsave->ymmh.ymmh_space);
+
+   guest_xsave->xcr0 = vcpu->arch.xcr0;
+}
+
+static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
+   struct kvm_xsave *guest_xsave)
+{
+   struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+
+   if (!cpu_has_xsave)
+   return -EINVAL;
+
+   xsave->i387.cwd = guest_xsave->i387.cwd;
+   xsave->i387.swd = guest_xsave->i387.swd;
+   xsave->i387.twd = guest_xsave->i387.twd;
+   xsave->i387.fop = guest_xsave->i387.fop;
+   xsave->i387.rip = guest_xsave->i387.rip;
+   xsave->i387.rdp = guest_xsave->i387.rdp;
+   memcpy(xsave->i387.st_space, guest_xsave->i387.st_space, 128);
+   memcpy(xsave->i387.xmm_space, guest_xsave->i387.xmm_space,
+   sizeof guest_xsave->i387.xmm_space);
+
+   xsave->xsave_hdr.xstate_bv = guest_xsave->xsave_hdr.xstate_bv;
+   memcpy(xsave->ymmh.ymmh_space, guest_xsave->ymmh.ymmh_space,
+   sizeof guest_xsave->ymmh.ymmh_space);
+
+   /* set_xsave may override the initial value of xcr0... */
+   if (guest_xsave->xcr0 != 0)
+   kvm_set_xcr0(vcpu, guest_xsave->xcr0);
+   return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
@@ -2564,6 +2620,29 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
	r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs);
break;
}
+   case KVM_GET_XSAVE: {
+   struct kvm_xsave xsave;
+
+   kvm_vcpu_ioctl_x86_get_xsave(vcpu, &xsave);
+
+   r = -EFAULT;
+   if (copy_to_user(argp, &xsave,
+sizeof(struct kvm_xsave)))
+   break;
+   r = 0;
+   break;
+   }
+   case KVM_SET_XSAVE: {
+   struct kvm_xsave xsave;
+
+   r = -EFAULT;
+   if (copy_from_user(&xsave, argp,
+  

[PATCH] qemu: Enable XSAVE related CPUID

2010-05-27 Thread Sheng Yang
We can support it in KVM now. The initial values are the minimal
requirements of an XSAVE-capable processor.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 target-i386/cpuid.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index eebf038..cbf5595 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1067,6 +1067,38 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx = 0;
 *edx = 0;
 break;
+case 0xD:
+/* Processor Extended State */
+if (!(env->cpuid_ext_features & CPUID_EXT_XSAVE)) {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+break;
+}
+if (count == 0) {
+*eax = 0x7; /* FP | SSE | YMM */
+*ebx = 0x340; /* FP + SSE + YMM size */
+*ecx = 0x340; /* FP + SSE + YMM size */
+*edx = 0;
+} else if (count == 1) {
+/* eax = 1, so we can continue with others */
+*eax = 1;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+} else if (count == 2) {
+*eax = 0x100; /* YMM size */
+*ebx = 0x240; /* YMM offset */
+*ecx = 0;
+*edx = 0;
+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
+break;
 case 0x80000000:
 *eax = env->cpuid_xlevel;
 *ebx = env->cpuid_vendor1;
-- 
1.7.0.1



[PATCH] qemu: kvm: Enable XSAVE live migration support

2010-05-27 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm-x86.c|   77 +
 qemu-kvm.c|   12 +++
 qemu-kvm.h|   14 +
 target-i386/cpu.h |5 +++
 target-i386/machine.c |   20 +
 5 files changed, 109 insertions(+), 19 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 73b4af7..a2e2896 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -776,6 +776,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+struct kvm_xsave xsave;
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 int rc, n, i;
@@ -806,16 +807,35 @@ void kvm_arch_load_regs(CPUState *env, int level)
 
 kvm_set_regs(env, &regs);
 
-memset(&fpu, 0, sizeof fpu);
-fpu.fsw = env->fpus & ~(7 << 11);
-fpu.fsw |= (env->fpstt & 7) << 11;
-fpu.fcw = env->fpuc;
-for (i = 0; i < 8; ++i)
-   fpu.ftwx |= (!env->fptags[i]) << i;
-memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
-memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
-fpu.mxcsr = env->mxcsr;
-kvm_set_fpu(env, &fpu);
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+memset(&xsave, 0, sizeof xsave);
+xsave.i387.swd = env->fpus & ~(7 << 11);
+xsave.i387.swd |= (env->fpstt & 7) << 11;
+xsave.i387.cwd = env->fpuc;
+for (i = 0; i < 8; ++i)
+xsave.i387.twd |= (!env->fptags[i]) << i;
+memcpy(xsave.i387.st_space, env->fpregs,
+sizeof env->fpregs);
+memcpy(xsave.i387.xmm_space, env->xmm_regs,
+sizeof env->xmm_regs);
+xsave.i387.mxcsr = env->mxcsr;
+xsave.xsave_hdr.xstate_bv = env->xstate_bv;
+memcpy(xsave.ymmh.ymmh_space, env->ymmh_regs,
+sizeof env->ymmh_regs);
+xsave.xcr0 = env->xcr0;
+kvm_set_xsave(env, &xsave);
+} else {
+memset(&fpu, 0, sizeof fpu);
+fpu.fsw = env->fpus & ~(7 << 11);
+fpu.fsw |= (env->fpstt & 7) << 11;
+fpu.fcw = env->fpuc;
+for (i = 0; i < 8; ++i)
+fpu.ftwx |= (!env->fptags[i]) << i;
+memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
+memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
+fpu.mxcsr = env->mxcsr;
+kvm_set_fpu(env, &fpu);
+}
 
 memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
 if (env->interrupt_injected >= 0) {
@@ -938,6 +958,7 @@ void kvm_arch_save_regs(CPUState *env)
 {
 struct kvm_regs regs;
 struct kvm_fpu fpu;
+struct kvm_xsave xsave;
 struct kvm_sregs sregs;
 struct kvm_msr_entry msrs[100];
 uint32_t hflags;
@@ -969,15 +990,33 @@ void kvm_arch_save_regs(CPUState *env)
 env->eflags = regs.rflags;
 env->eip = regs.rip;
 
-kvm_get_fpu(env, &fpu);
-env->fpstt = (fpu.fsw >> 11) & 7;
-env->fpus = fpu.fsw;
-env->fpuc = fpu.fcw;
-for (i = 0; i < 8; ++i)
-   env->fptags[i] = !((fpu.ftwx >> i) & 1);
-memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
-memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
-env->mxcsr = fpu.mxcsr;
+if (kvm_check_extension(kvm_state, KVM_CAP_XSAVE)) {
+kvm_get_xsave(env, &xsave);
+env->fpstt = (xsave.i387.swd >> 11) & 7;
+env->fpus = xsave.i387.swd;
+env->fpuc = xsave.i387.cwd;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((xsave.i387.twd >> i) & 1);
+memcpy(env->fpregs, xsave.i387.st_space,
+sizeof env->fpregs);
+memcpy(env->xmm_regs, xsave.i387.xmm_space,
+sizeof env->xmm_regs);
+env->mxcsr = xsave.i387.mxcsr;
+env->xstate_bv = xsave.xsave_hdr.xstate_bv;
+memcpy(env->ymmh_regs, xsave.ymmh.ymmh_space,
+sizeof env->ymmh_regs);
+env->xcr0 = xsave.xcr0;
+} else {
+kvm_get_fpu(env, &fpu);
+env->fpstt = (fpu.fsw >> 11) & 7;
+env->fpus = fpu.fsw;
+env->fpuc = fpu.fcw;
+for (i = 0; i < 8; ++i)
+env->fptags[i] = !((fpu.ftwx >> i) & 1);
+memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
+memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
+env->mxcsr = fpu.mxcsr;
+}
 
 kvm_get_sregs(env, &sregs);
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 35a4c8a..c472b96 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -485,6 +485,18 @@ int kvm_set_mpstate(CPUState *env, struct kvm_mp_state 
*mp_state)
 }
 #endif
 
+#ifdef KVM_CAP_XSAVE
+int kvm_get_xsave(CPUState *env, struct kvm_xsave *xsave)
+{
+return kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
+}
+
+int kvm_set_xsave(CPUState *env, struct kvm_xsave *xsave)
+{
+return kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
+}
+#endif
+
 static int handle_mmio(CPUState *env)
 {
 unsigned long addr = env->kvm_run->mmio.phys_addr;
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6f6c6d8..0fb4638 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -300,6 +300,20 @@ int kvm_get_mpstate(CPUState *env, struct kvm_mp_state 
*mp_state);
 

Re: [PATCH v6] KVM: x86: Enable XSAVE/XRSTOR for guest

2010-05-27 Thread Avi Kivity

On 05/27/2010 12:47 PM, Sheng Yang wrote:

From: Dexuan Cui dexuan@intel.com

This patch enables the guest to use the XSAVE/XRSTOR instructions.

We assume that host_xcr0 would use all possible bits that the OS supports.

And we load xcr0 in the same way we handle the fpu - do it as late as we can.



Looks good.

What's the outlook on a unit test?

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM test: Add perfmon into the guest tests

2010-05-27 Thread Chen Cao
On Thu, May 27, 2010 at 10:08:49AM +0200, Jes Sorensen wrote:
 On 05/25/10 05:05, Chen Cao wrote:
  perfmon2 API provides access to the hardware performance counters of
  modern processors.
  
  Dependency,
  To compile the source code of the test, the following packages should
  be installed,
  glibc-static-2.11.1-6.x86_64
  glibc-headers-2.11.1-6.x86_64
  glibc-common-2.11.1-6.x86_64
  glibc-devel-2.11.1-6.x86_64
  glibc-2.11.1-6.x86_64
  
  Note,
  1. libpfm uses the Performance Monitor Unit (PMU) on the processors,
  but this unit is not provided by kvm currently, i.e. the test should
  fail in kvm guests.
  2. According to the README file of perfmon-tests-0.3, a 2.6.24 or higher
  Linux kernel (with perfmon v2.8 or higher) is needed to run the tests.
 
 I thought perfmon2 was deprecated in favor of perf_event.c ? The only
 reference left for perfmon2 in the kernel is in the ia64 tree, and
 KVM/ia64 seems to be pretty dead these days.
 

Jes,

Thank you for the reminder.
I had not noticed your mail this afternoon and resent the patches for
the perfmon test; it seems I may have made a mistake.

By the way, could you tell me how to verify that perfmon2 was
deprecated in favor of perf_event.c, other than by looking into the
kernel code?


Regards,

Cao, Chen
2010/05/27


[PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly

2010-05-27 Thread Takuya Yoshikawa
copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check whether it is non-zero, and in that case we should
return the error number -EFAULT rather than directly returning the
return value from copy_to/from_user().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 drivers/vhost/vhost.c |   51 ++--
 1 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c9c657..9633a3c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
 {
struct vhost_memory mem, *newmem, *oldmem;
unsigned long size = offsetof(struct vhost_memory, regions);
-   long r;
-   r = copy_from_user(&mem, m, size);
-   if (r)
-   return r;
+   if (copy_from_user(&mem, m, size))
+   return -EFAULT;
if (mem.padding)
return -EOPNOTSUPP;
	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
@@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
return -ENOMEM;
 
	memcpy(newmem, &mem, size);
-   r = copy_from_user(&newmem->regions, m->regions,
-  mem.nregions * sizeof *m->regions);
-   if (r) {
+   if (copy_from_user(&newmem->regions, m->regions,
+  mem.nregions * sizeof *m->regions)) {
kfree(newmem);
-   return r;
+   return -EFAULT;
}
 
if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
@@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
r = -EBUSY;
break;
}
-   r = copy_from_user(&s, argp, sizeof s);
-   if (r < 0)
+   if (copy_from_user(&s, argp, sizeof s)) {
+   r = -EFAULT;
	break;
+   }
	if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
r = -EINVAL;
break;
@@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
r = -EBUSY;
break;
}
-   r = copy_from_user(&s, argp, sizeof s);
-   if (r < 0)
+   if (copy_from_user(&s, argp, sizeof s)) {
+   r = -EFAULT;
	break;
+   }
	if (s.num > 0xffff) {
r = -EINVAL;
break;
@@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
case VHOST_GET_VRING_BASE:
s.index = idx;
s.num = vq-last_avail_idx;
-   r = copy_to_user(argp, &s, sizeof s);
+   if (copy_to_user(argp, &s, sizeof s))
+   r = -EFAULT;
break;
case VHOST_SET_VRING_ADDR:
-   r = copy_from_user(&a, argp, sizeof a);
-   if (r < 0)
+   if (copy_from_user(&a, argp, sizeof a)) {
+   r = -EFAULT;
	break;
+   }
	if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
r = -EOPNOTSUPP;
break;
@@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
	vq->used = (void __user *)(unsigned long)a.used_user_addr;
break;
case VHOST_SET_VRING_KICK:
-   r = copy_from_user(&f, argp, sizeof f);
-   if (r < 0)
+   if (copy_from_user(&f, argp, sizeof f)) {
+   r = -EFAULT;
	break;
+   }
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
if (IS_ERR(eventfp)) {
r = PTR_ERR(eventfp);
@@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
filep = eventfp;
break;
case VHOST_SET_VRING_CALL:
-   r = copy_from_user(&f, argp, sizeof f);
-   if (r < 0)
+   if (copy_from_user(&f, argp, sizeof f)) {
+   r = -EFAULT;
+   }
eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
if (IS_ERR(eventfp)) {
r = PTR_ERR(eventfp);
@@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int 
ioctl, void __user *argp)
filep = eventfp;
break;
case VHOST_SET_VRING_ERR:
-   r = copy_from_user(&f, argp, sizeof f);
-   if (r  

[PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly

2010-05-27 Thread Takuya Yoshikawa
copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check whether it is non-zero, and in that case we should
return the error number -EFAULT rather than directly returning the
return value from copy_to/from_user().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 drivers/vhost/net.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index aa88911..0f41c91 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
	int r;
	switch (ioctl) {
	case VHOST_NET_SET_BACKEND:
-		r = copy_from_user(&backend, argp, sizeof backend);
-		if (r < 0)
-			return r;
+		if (copy_from_user(&backend, argp, sizeof backend))
+			return -EFAULT;
		return vhost_net_set_backend(n, backend.index, backend.fd);
	case VHOST_GET_FEATURES:
		features = VHOST_FEATURES;
-		return copy_to_user(featurep, &features, sizeof features);
+		if (copy_to_user(featurep, &features, sizeof features))
+			return -EFAULT;
+		return 0;
	case VHOST_SET_FEATURES:
-		r = copy_from_user(&features, featurep, sizeof features);
-		if (r < 0)
-			return r;
+		if (copy_from_user(&features, featurep, sizeof features))
+			return -EFAULT;
		if (features & ~VHOST_FEATURES)
			return -EOPNOTSUPP;
		return vhost_net_set_features(n, features);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: XSAVE/XRSTOR live migration support

2010-05-27 Thread Avi Kivity

On 05/27/2010 12:48 PM, Sheng Yang wrote:

This patch enables save/restore of xsave state.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
  arch/x86/include/asm/kvm.h |   29 
  arch/x86/kvm/x86.c         |   79 
  include/linux/kvm.h        |    6 +++
   


Documentation/kvm/api.txt +



+/* for KVM_CAP_XSAVE */
+struct kvm_xsave {
+   struct {
+   __u16 cwd;
+   __u16 swd;
+   __u16 twd;
+   __u16 fop;
+   __u64 rip;
+   __u64 rdp;
+   __u32 mxcsr;
+   __u32 mxcsr_mask;
+   __u32 st_space[32];
+   __u32 xmm_space[64];
+   __u32 padding[12];
+   __u32 sw_reserved[12];
+   } i387;
+   struct {
+   __u64 xstate_bv;
+   __u64 reserved1[2];
+   __u64 reserved2[5];
+   } xsave_hdr;
+   struct {
+   __u32 ymmh_space[64];
+   } ymmh;
+   __u64 xcr0;
+   __u32 padding[256];
+};
   


Need to reserve way more space here for future xsave growth.  I think at 
least 4K.  LRB was 32x512bit = 1K (though it probably isn't a candidate 
for vmx).  Would be good to get an opinion from your processor architects.


I don't think we need to detail the contents of the structures since 
they're described by the SDM; so we can have just a large array that is 
1:1 with the xsave as saved by the fpu.


If we do that then xcr0 needs to be in a separate structure, say 
kvm_xcr, with a flags field and reserved space of its own for future xcr 
growth.



@@ -2363,6 +2366,59 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct 
kvm_vcpu *vcpu,
return 0;
  }

+static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+					struct kvm_xsave *guest_xsave)
+{
+	struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+
+	if (!cpu_has_xsave)
+		return;
   


Hm, it would be nice to make it backward compatible and return the 
legacy fpu instead.  I think the layouts are compatible?



+
+	guest_xsave->i387.cwd = xsave->i387.cwd;
+	guest_xsave->i387.swd = xsave->i387.swd;
+	guest_xsave->i387.twd = xsave->i387.twd;
+	guest_xsave->i387.fop = xsave->i387.fop;
+	guest_xsave->i387.rip = xsave->i387.rip;
+	guest_xsave->i387.rdp = xsave->i387.rdp;
+	memcpy(guest_xsave->i387.st_space, xsave->i387.st_space, 128);
+	memcpy(guest_xsave->i387.xmm_space, xsave->i387.xmm_space,
+			sizeof guest_xsave->i387.xmm_space);
+
+	guest_xsave->xsave_hdr.xstate_bv = xsave->xsave_hdr.xstate_bv;
+	memcpy(guest_xsave->ymmh.ymmh_space, xsave->ymmh.ymmh_space,
+			sizeof xsave->ymmh.ymmh_space);
   


And we can do a big memcpy here.  But we need to limit it to what the 
host actually allocated.



+
+	guest_xsave->xcr0 = vcpu->arch.xcr0;
+}
+

  long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
  {
@@ -2564,6 +2620,29 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
	r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs);
break;
}
+   case KVM_GET_XSAVE: {
+   struct kvm_xsave xsave;
   


Too big for stack (especially if we reserve room for growth).


diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 23ea022..5006761 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -524,6 +524,9 @@ struct kvm_enable_cap {
  #define KVM_CAP_PPC_OSI 52
  #define KVM_CAP_PPC_UNSET_IRQ 53
  #define KVM_CAP_ENABLE_CAP 54
+#ifdef __KVM_HAVE_XSAVE
+#define KVM_CAP_XSAVE 55
+#endif
   


Might make sense to have a separate KVM_CAP_XCR, just for consistency.


--
error compiling committee.c: too many arguments to function



[PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails

2010-05-27 Thread Takuya Yoshikawa
We need to free newmem when vhost_set_memory() fails to complete.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 drivers/vhost/vhost.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9633a3c..1241a22 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
		return -EFAULT;
	}
 
-	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
+	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
+		kfree(newmem);
		return -EFAULT;
+	}
	oldmem = d->memory;
	rcu_assign_pointer(d->memory, newmem);
	synchronize_rcu();
-- 
1.7.0.4



Re: [PATCH] KVM test: Add perfmon into the guest tests

2010-05-27 Thread Jes Sorensen
On 05/27/10 11:53, Chen Cao wrote:
 Jes,
 
 Thank you for the reminder.
 I had not noticed your mail this afternoon and resent the patches for
 the perfmon test, so it seems I may have made a mistake.
 
 By the way, could you tell me how to verify that perfmon2 was
 deprecated in favor of perf_event.c, other than by looking into the kernel
 code?

Hi Chen,

I went and checked with people in the community, and I can confirm that
perfmon2 is deprecated for everything but ia64. Since KVM/ia64 is pretty
stale and I doubt it even works, I would not recommend for pushing
perfmon2 tests into autotest at this point.

As for how to see it, then I don't think there really is a way except
for looking at the code.

Cheers,
Jes

PS: Please don't CC closed mailing lists when posting to public lists
like the KVM list.


Re: [PATCH v2] KVM: Fix order passed to iommu_unmap

2010-05-27 Thread Roedel, Joerg
On Wed, May 26, 2010 at 03:36:33PM -0400, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 This is obviously a left-over from the old interface taking the
 size. Apparently a mostly harmless issue with the current iommu_unmap
 implementation.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Uh, good catch.

Acked-by: Joerg Roedel joerg.roe...@amd.com

 ---
 
 I think this one is even closer to the right order. :)
 
  virt/kvm/iommu.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
 index 5adc578..673c88a 100644
 --- a/virt/kvm/iommu.c
 +++ b/virt/kvm/iommu.c
 @@ -273,7 +273,7 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 	pfn  = phys >> PAGE_SHIFT;
 
 	/* Unmap address from IO address space */
 -	order       = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE);
 +	order       = iommu_unmap(domain, gfn_to_gpa(gfn), 0);
 	unmap_pages = 1ULL << order;
 
 	/* Unpin all pages we just unmapped to not leak any memory */
 



Re: [PATCH 2/2] KVM: MMU: fix relaxing permission

2010-05-27 Thread Avi Kivity

On 05/26/2010 05:46 AM, Xiao Guangrong wrote:

There is a relaxing permission operation in set_spte():

if the guest's CR0.WP is not set and a R/W #PF occurs at supervisor level,
the mapping may be made writable, and then the user is allowed to write.

@@ -1859,8 +1859,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

	spte |= (u64)pfn << PAGE_SHIFT;

-	if ((pte_access & ACC_WRITE_MASK)
-	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
+	if (pte_access & ACC_WRITE_MASK) {

   


The host always sets cr0.wp (in shadow mode) so we can write protect 
page tables.  So when the guest clears cr0.wp, we emulate a gpte with 
gpte.w=0 and gpte.u=1 in two ways:


- spte.w=1, spte.u=0: this will allow the guest kernel to write but trap 
on guest user access
- spte.w=0, spte.u=1: allows guest user access but traps on guest kernel 
writes


If the guest attempts an access that is currently disallowed, we switch 
to the other spte encoding.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] KVM: SVM: fix compiling warning

2010-05-27 Thread Avi Kivity

On 05/26/2010 05:44 AM, Xiao Guangrong wrote:

fix:

arch/x86/kvm/svm.c: In function ‘is_erratum_383’:
arch/x86/kvm/svm.c:1459: warning: integer constant is too large for ‘long’ type
arch/x86/kvm/svm.c: In function ‘svm_handle_mce’:
arch/x86/kvm/svm.c:1490: warning: unused variable ‘kvm_run’
   


Applied (partially, got the other part from Jan already), thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/4] Fix tboot enabled macro

2010-05-27 Thread Avi Kivity

On 05/27/2010 12:27 PM, Wang, Shane wrote:

Jan Kiszka wrote:
   

The latter. As we have no clue about the actual state (tboot is not
exported on older kernels), we are forced to assume some reasonable
state.
 

Are you trying to load the latest KVM on the older kernels?
   


He is, look at kvm-kmod:

  
http://www.linux-kvm.org/page/Code#building_an_external_module_with_older_kernels


(Jan was tricked into becoming the kvm-kmod maintainer)

--
error compiling committee.c: too many arguments to function



Re: [PATCH v6] KVM: x86: Enable XSAVE/XRSTOR for guest

2010-05-27 Thread Sheng Yang
On Thursday 27 May 2010 17:52:36 Avi Kivity wrote:
 On 05/27/2010 12:47 PM, Sheng Yang wrote:
  From: Dexuan Cui dexuan@intel.com
  
  This patch enables the guest to use XSAVE/XRSTOR instructions.
  
  We assume that host_xcr0 would use all possible bits that OS supported.
  
  And we loaded xcr0 in the same way we handled fpu - do it as late as we
  can.
 
 Looks good.
 
 What's the outlook on a unit test?

Working on it now...

--
regards
Yang, Sheng


Re: [PATCH] KVM: x86: XSAVE/XRSTOR live migration support

2010-05-27 Thread Sheng Yang
On Thursday 27 May 2010 18:02:31 Avi Kivity wrote:
 On 05/27/2010 12:48 PM, Sheng Yang wrote:
  This patch enables save/restore of xsave state.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/include/asm/kvm.h |   29 
   arch/x86/kvm/x86.c         |   79 
   include/linux/kvm.h        |    6 +++
 
 Documentation/kvm/api.txt +

Yes...
 
  +/* for KVM_CAP_XSAVE */
  +struct kvm_xsave {
  +   struct {
  +   __u16 cwd;
  +   __u16 swd;
  +   __u16 twd;
  +   __u16 fop;
  +   __u64 rip;
  +   __u64 rdp;
  +   __u32 mxcsr;
  +   __u32 mxcsr_mask;
  +   __u32 st_space[32];
  +   __u32 xmm_space[64];
  +   __u32 padding[12];
  +   __u32 sw_reserved[12];
  +   } i387;
  +   struct {
  +   __u64 xstate_bv;
  +   __u64 reserved1[2];
  +   __u64 reserved2[5];
  +   } xsave_hdr;
  +   struct {
  +   __u32 ymmh_space[64];
  +   } ymmh;
  +   __u64 xcr0;
  +   __u32 padding[256];
  +};
 
 Need to reserve way more space here for future xsave growth.  I think at
  least 4K.  LRB was 32x512bit = 1K (though it probably isn't a candidate
 for vmx).  Would be good to get an opinion from your processor architects.

Would check it.
 
 I don't think we need to detail the contents of the structures since
 they're described by the SDM; so we can have just a large array that is
 1:1 with the xsave as saved by the fpu.

Um, I've tried that, but failed mysteriously... Would check what's wrong.
 
 If we do that then xcr0 needs to be in a separate structure, say
 kvm_xcr, with a flags field and reserved space of its own for future xcr
 growth.

I meant to put it into sregs, but found it's already full... How about 
extended 
sregs?
 
  @@ -2363,6 +2366,59 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct
  kvm_vcpu *vcpu,
  
  return 0;

}
  
  +static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
  +   struct kvm_xsave *guest_xsave)
  +{
  +	struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
  +
  +   if (!cpu_has_xsave)
  +   return;
 
 Hm, it would be nice to make it backward compatible and return the
 legacy fpu instead.  I think the layouts are compatible?

Sound good.  But seems we still need KVM_CAP_XSAVE to use this interface, and 
other processors would still go FPU interface. Seems didn't improve much?
 
  +
  +	guest_xsave->i387.cwd = xsave->i387.cwd;
  +	guest_xsave->i387.swd = xsave->i387.swd;
  +	guest_xsave->i387.twd = xsave->i387.twd;
  +	guest_xsave->i387.fop = xsave->i387.fop;
  +	guest_xsave->i387.rip = xsave->i387.rip;
  +	guest_xsave->i387.rdp = xsave->i387.rdp;
  +	memcpy(guest_xsave->i387.st_space, xsave->i387.st_space, 128);
  +	memcpy(guest_xsave->i387.xmm_space, xsave->i387.xmm_space,
  +			sizeof guest_xsave->i387.xmm_space);
  +
  +	guest_xsave->xsave_hdr.xstate_bv = xsave->xsave_hdr.xstate_bv;
  +	memcpy(guest_xsave->ymmh.ymmh_space, xsave->ymmh.ymmh_space,
  +			sizeof xsave->ymmh.ymmh_space);
 
 And we can do a big memcpy here.  But we need to limit it to what the
 host actually allocated.

Would try.
 
  +
  +	guest_xsave->xcr0 = vcpu->arch.xcr0;
  +}
  +
  
long kvm_arch_vcpu_ioctl(struct file *filp,

   unsigned int ioctl, unsigned long arg)

{
  
  @@ -2564,6 +2620,29 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
  	r = kvm_vcpu_ioctl_x86_set_debugregs(vcpu, &dbgregs);
  break;
  
  }
  
  +   case KVM_GET_XSAVE: {
  +   struct kvm_xsave xsave;
 
 Too big for stack (especially if we reserve room for growth).

Oops...
 
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 23ea022..5006761 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -524,6 +524,9 @@ struct kvm_enable_cap {
  
#define KVM_CAP_PPC_OSI 52
#define KVM_CAP_PPC_UNSET_IRQ 53
#define KVM_CAP_ENABLE_CAP 54
  
  +#ifdef __KVM_HAVE_XSAVE
  +#define KVM_CAP_XSAVE 55
  +#endif
 
 Might make sense to have a separate KVM_CAP_XCR, just for consistency.

Maybe EXTENDED_SREGS? But still every future field in the struct need a CAP...

--
regards
Yang, Sheng


Re: [PATCH 1/3] vhost: fix to check the return value of copy_to/from_user() correctly

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 06:58:03PM +0900, Takuya Yoshikawa wrote:
 copy_to/from_user() returns the number of bytes that could not be copied.
 
 So we need to check if it is not zero, and in that case, we should return
 the error number -EFAULT rather than directly return the return value from
 copy_to/from_user().
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Thanks, applied.

 ---
  drivers/vhost/vhost.c |   51 ++--
  1 files changed, 28 insertions(+), 23 deletions(-)
 
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 5c9c657..9633a3c 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -320,10 +320,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
  {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
 -	long r;
 -	r = copy_from_user(&mem, m, size);
 -	if (r)
 -		return r;
 +	if (copy_from_user(&mem, m, size))
 +		return -EFAULT;
 	if (mem.padding)
 		return -EOPNOTSUPP;
 	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
 @@ -333,11 +331,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -ENOMEM;
 
 	memcpy(newmem, mem, size);
 -	r = copy_from_user(newmem->regions, m->regions,
 -			   mem.nregions * sizeof *m->regions);
 -	if (r) {
 +	if (copy_from_user(newmem->regions, m->regions,
 +			   mem.nregions * sizeof *m->regions)) {
 		kfree(newmem);
 -		return r;
 +		return -EFAULT;
 	}
 
 	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
 @@ -389,9 +386,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		r = -EBUSY;
 		break;
 	}
 -	r = copy_from_user(&s, argp, sizeof s);
 -	if (r < 0)
 +	if (copy_from_user(&s, argp, sizeof s)) {
 +		r = -EFAULT;
 		break;
 +	}
 	if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) {
 		r = -EINVAL;
 		break;
 @@ -405,9 +403,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		r = -EBUSY;
 		break;
 	}
 -	r = copy_from_user(&s, argp, sizeof s);
 -	if (r < 0)
 +	if (copy_from_user(&s, argp, sizeof s)) {
 +		r = -EFAULT;
 		break;
 +	}
 	if (s.num > 0xffff) {
 		r = -EINVAL;
 		break;
 @@ -419,12 +418,14 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 	case VHOST_GET_VRING_BASE:
 		s.index = idx;
 		s.num = vq->last_avail_idx;
 -		r = copy_to_user(argp, &s, sizeof s);
 +		if (copy_to_user(argp, &s, sizeof s))
 +			r = -EFAULT;
 		break;
 	case VHOST_SET_VRING_ADDR:
 -		r = copy_from_user(&a, argp, sizeof a);
 -		if (r < 0)
 +		if (copy_from_user(&a, argp, sizeof a)) {
 +			r = -EFAULT;
 			break;
 +		}
 		if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) {
 			r = -EOPNOTSUPP;
 			break;
 @@ -477,9 +478,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		vq->used = (void __user *)(unsigned long)a.used_user_addr;
 		break;
 	case VHOST_SET_VRING_KICK:
 -		r = copy_from_user(&f, argp, sizeof f);
 -		if (r < 0)
 +		if (copy_from_user(&f, argp, sizeof f)) {
 +			r = -EFAULT;
 			break;
 +		}
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
 @@ -492,9 +494,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		filep = eventfp;
 		break;
 	case VHOST_SET_VRING_CALL:
 -		r = copy_from_user(&f, argp, sizeof f);
 -		if (r < 0)
 +		if (copy_from_user(&f, argp, sizeof f)) {
 +			r = -EFAULT;
 			break;
 +		}
 		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
 		if (IS_ERR(eventfp)) {
 			r = PTR_ERR(eventfp);
 @@ -510,9 +513,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 		filep = eventfp;
 		break;
 	case VHOST_SET_VRING_ERR:
 -		r = copy_from_user(&f, argp, sizeof 

Re: [PATCH 2/3] vhost-net: fix to check the return value of copy_to/from_user() correctly

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 07:01:58PM +0900, Takuya Yoshikawa wrote:
 copy_to/from_user() returns the number of bytes that could not be copied.
 
 So we need to check if it is not zero, and in that case, we should return
 the error number -EFAULT rather than directly return the return value from
 copy_to/from_user().
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Thanks, applied.

 ---
  drivers/vhost/net.c |   14 +++---
  1 files changed, 7 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index aa88911..0f41c91 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -593,17 +593,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 	int r;
 	switch (ioctl) {
 	case VHOST_NET_SET_BACKEND:
 -		r = copy_from_user(&backend, argp, sizeof backend);
 -		if (r < 0)
 -			return r;
 +		if (copy_from_user(&backend, argp, sizeof backend))
 +			return -EFAULT;
 		return vhost_net_set_backend(n, backend.index, backend.fd);
 	case VHOST_GET_FEATURES:
 		features = VHOST_FEATURES;
 -		return copy_to_user(featurep, &features, sizeof features);
 +		if (copy_to_user(featurep, &features, sizeof features))
 +			return -EFAULT;
 +		return 0;
 	case VHOST_SET_FEATURES:
 -		r = copy_from_user(&features, featurep, sizeof features);
 -		if (r < 0)
 -			return r;
 +		if (copy_from_user(&features, featurep, sizeof features))
 +			return -EFAULT;
 		if (features & ~VHOST_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
 -- 
 1.7.0.4


Re: [PATCH 3/3] vhost: fix the memory leak which will happen when memory_access_ok fails

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 07:03:56PM +0900, Takuya Yoshikawa wrote:
 We need to free newmem when vhost_set_memory() fails to complete.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 ---

Thanks, applied.

  drivers/vhost/vhost.c |4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 9633a3c..1241a22 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -337,8 +337,10 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EFAULT;
 	}
 
 -	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL)))
 +	if (!memory_access_ok(d, newmem, vhost_has_feature(d, VHOST_F_LOG_ALL))) {
 +		kfree(newmem);
 		return -EFAULT;
 +	}
 	oldmem = d->memory;
 	rcu_assign_pointer(d->memory, newmem);
 	synchronize_rcu();
 -- 
 1.7.0.4


[GIT PULL net-next-2.6] vhost-net cleanups

2010-05-27 Thread Michael S. Tsirkin
David,
The following tree is on top of net-next-2.6.
Please merge it for 2.6.36.
Thanks!

The following changes since commit 7a9b149212f3716c598afe973b6261fd58453b7a:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 
(2010-05-20 21:26:12 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next

Jeff Dike (1):
  vhost-net: minor cleanup

Michael S. Tsirkin (1):
  vhost: whitespace fix

Tobias Klauser (1):
  vhost: Storage class should be before const qualifier

 drivers/vhost/net.c   |   13 ++---
 drivers/vhost/vhost.c |4 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)
-- 
MST


Re: [PATCH 2/2] KVM: MMU: fix relaxing permission

2010-05-27 Thread Xiao Guangrong


Avi Kivity wrote:
 On 05/26/2010 05:46 AM, Xiao Guangrong wrote:
 There is a relaxing permission operation in set_spte():

  if the guest's CR0.WP is not set and a R/W #PF occurs at supervisor level,
  the mapping may be made writable, and then the user is allowed to write.
 
  @@ -1859,8 +1859,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
 -	if ((pte_access & ACC_WRITE_MASK)
 -	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 +	if (pte_access & ACC_WRITE_MASK) {


 
 The host always sets cr0.wp (in shadow mode) so we can write protect
 page tables.  So when the guest clears cr0.wp, we emulate a gpte with
 gpte.w=0 and gpte.u=1 in two ways:
 
 - spte.w=1, spte.u=0: this will allow the guest kernel to write but trap
 on guest user access
 - spte.w=0, spte.u=1: allows guest user access but traps on guest kernel
 writes
 
 If the guest attempts an access that is currently disallowed, we switch
 to the other spte encoding.

Avi,

Thanks for your explanation, but I don't see where this is implemented;
could you please point it out for me? :-(

And I think using 'spte.w=1, spte.u=0' to emulate 'guest cr0.wp=0 and gpte.w=0'
is not a good approach, since it completely blocks user-process access. In this
case the user process usually reads while the kernel writes lazily (as with the
vdso), so it will generate a lot of #PFs.


Re: [PATCH 2/2] KVM: MMU: fix relaxing permission

2010-05-27 Thread Avi Kivity

On 05/27/2010 02:00 PM, Xiao Guangrong wrote:



The host always sets cr0.wp (in shadow mode) so we can write protect
page tables.  So when the guest clears cr0.wp, we emulate a gpte with
gpte.w=0 and gpte.u=1 in two ways:

- spte.w=1, spte.u=0: this will allow the guest kernel to write but trap
on guest user access
- spte.w=0, spte.u=1: allows guest user access but traps on guest kernel
writes

If the guest attempts an access that is currently disallowed, we switch
to the other spte encoding.
 

Avi,

Thanks for your explanation, but i not see where to implement what you say,
could you please point it out for me? :-(
   


b70ccb0b3fd removed it accidentally:


-	} else
-		/*
-		 * Kernel mode access.  Fail if it's a read-only page and
-		 * supervisor write protection is enabled.
-		 */
-		if (!writable_shadow) {
-			if (is_write_protection(vcpu))
-				return 0;
-			*shadow_ent &= ~PT_USER_MASK;
-		}


:(


And I think using 'spte.w=1, spte.u=0' to emulate 'guest cr0.wp=0 and gpte.w=0'
is not a good approach, since it completely blocks user-process access. In this
case the user process usually reads while the kernel writes lazily (as with the
vdso), so it will generate a lot of #PFs.
   


As soon as the guest kernel stops writing we switch back to 
gpte.w=gpte.u=1 and the guest can access it completely.  For the case 
where both the kernel and userspace use interleaved access, you are 
right, but I don't see a better way, do you?



--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: x86: XSAVE/XRSTOR live migration support

2010-05-27 Thread Avi Kivity

On 05/27/2010 01:33 PM, Sheng Yang wrote:



If we do that then xcr0 needs to be in a separate structure, say
kvm_xcr, with a flags field and reserved space of its own for future xcr
growth.
 

I meant to put it into sregs, but found it's already full... How about extended
sregs?
   


Isn't this what xcr means?  xtended control register?


+static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+   struct kvm_xsave *guest_xsave)
+{
+	struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+
+   if (!cpu_has_xsave)
+   return;
   

Hm, it would be nice to make it backward compatible and return the
legacy fpu instead.  I think the layouts are compatible?
 

Sound good.  But seems we still need KVM_CAP_XSAVE to use this interface, and
other processors would still go FPU interface. Seems didn't improve much?
   


I would like the new interface to be used in all cases, this way we can 
deprecate the old one in a few years.



diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 23ea022..5006761 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -524,6 +524,9 @@ struct kvm_enable_cap {

   #define KVM_CAP_PPC_OSI 52
   #define KVM_CAP_PPC_UNSET_IRQ 53
   #define KVM_CAP_ENABLE_CAP 54

+#ifdef __KVM_HAVE_XSAVE
+#define KVM_CAP_XSAVE 55
+#endif
   

Might make sense to have a separate KVM_CAP_XCR, just for consistency.
 

Maybe EXTENDED_SREGS? But still every future field in the struct need a CAP...
   


Might do

struct kvm_xcr {
__u32 xcr;
__u32 reserved;
__u64 value;
};

struct kvm_xcrs {
__u32 nr_xcrs;
__u32 flags;
struct kvm_xcr xcrs[KVM_MAX_XCRS];
... reserved;
};

which would allow new xcrs to be added easily.

You'll need to change kvm_set_xcr0() to kvm_set_xcr() for this to work 
though.


--
error compiling committee.c: too many arguments to function



[PATCH 1/3] KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode

2010-05-27 Thread Avi Kivity
When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte
permissions.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 136e160..39dd8d3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1881,7 +1881,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
	spte |= (u64)pfn << PAGE_SHIFT;
 
	if ((pte_access & ACC_WRITE_MASK)
-	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
+	    || (!tdp_enabled && write_fault && !is_write_protection(vcpu)
+		&& !user_fault)) {
 
		if (level > PT_PAGE_TABLE_LEVEL &&
		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
-- 
1.7.1



[PATCH 0/3] Fix cr0.wp=0 emulation

2010-05-27 Thread Avi Kivity
Xiao didn't find the nice cr0.wp=0 trick because it wasn't there, removed
accidentally by b70ccb0b3fd. Restore the trick and document it.

Avi Kivity (3):
  KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow
mode
  KVM: MMU: Remove user access when allowing kernel access to gpte.w=0
page
  KVM: MMU: Document cr0.wp emulation

 Documentation/kvm/mmu.txt |   18 ++
 arch/x86/kvm/mmu.c|6 +-
 2 files changed, 23 insertions(+), 1 deletions(-)



[PATCH 2/3] KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page

2010-05-27 Thread Avi Kivity
If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
We do that by setting spte.w=1, since the host cr0.wp must remain set so the
host can write protect pages.  Once we allow write access, we must remove
user access otherwise we mistakenly allow the user to write the page.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/mmu.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 39dd8d3..56f8c3c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1894,6 +1894,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
spte |= PT_WRITABLE_MASK;
 
+	if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
+		spte &= ~PT_USER_MASK;
+
/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection
-- 
1.7.1



[PATCH 3/3] KVM: MMU: Document cr0.wp emulation

2010-05-27 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 Documentation/kvm/mmu.txt |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 2201dcb..6528ad9 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -298,6 +298,24 @@ Host translation updates:
   - look up affected sptes through reverse map
   - drop (or update) translations
 
+Emulating cr0.wp
+================
+
+If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
+works for the guest kernel, not guest userspace.  When the guest
+cr0.wp=1, this does not present a problem.  However when the guest cr0.wp=0,
+we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte.
+
+We handle this by mapping the permissions to two possible sptes, depending
+on fault type:
+
+- kernel write fault: spte.u=0, spte.w=1 (allows full kernel access,
+  disallows user access)
+- read fault: spte.u=1, spte.w=0 (allows full read access, disallows kernel
+  write access)
+
+(user write faults generate a #PF)
+
 Further reading
 ===
 
-- 
1.7.1

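The fault-type mapping the document above describes can be sketched as a tiny standalone function. This is a toy model, not KVM code: the bit names are illustrative, not the real spte masks.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative spte permission bits, not the real KVM masks. */
#define SPTE_USER  (1u << 0)
#define SPTE_WRITE (1u << 1)

/*
 * Sketch of the documented cr0.wp=0 trick for a gpte with gpte.u=1,
 * gpte.w=0 (shadow paging only): the spte instantiated depends on the
 * fault that triggered it.  A user write fault is not mapped at all --
 * it is reflected into the guest as a #PF.
 */
static unsigned int wp0_spte_perms(bool kernel_write_fault)
{
    if (kernel_write_fault)
        return SPTE_WRITE;  /* spte.w=1, spte.u=0: kernel writes allowed,
                               user access removed */
    return SPTE_USER;       /* spte.u=1, spte.w=0: full read access,
                               kernel writes fault and remap */
}
```

Either spte satisfies the guest's cr0.wp=0 semantics for the access that caused the fault, while the host keeps cr0.wp=1 throughout.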


Re: [PATCH 2/2] KVM: MMU: fix relaxing permission

2010-05-27 Thread Avi Kivity

On 05/27/2010 02:18 PM, Avi Kivity wrote:


And, i think using 'spte.w=1, spte.u=0' to emulate 'guest cr0.wp=0 and
gpte.w=0' is not a good way since it can completely stop user process
access; but in this case, the user process usually reads and the kernel
lazily writes, just like the vdso,

so it will generate a lot of #PFs


As soon as the guest kernel stops writing we switch back to 
gpte.w=gpte.u=1 and the guest can access it completely.  For the case 
where both the kernel and userspace use interleaved access, you are 
right, but I don't see a better way, do you?


To expand, we only set spte.w=1 on write faults.  So if the guest only 
reads the page, we'll instantiate an spte with u=1 and w=0.


--
error compiling committee.c: too many arguments to function



[GIT PULL net-2.6] vhost-net: error handling fixes

2010-05-27 Thread Michael S. Tsirkin
David,
The following tree includes fixes dealing with error handling
in vhost-net. It is on top of net-2.6.
Please merge it for 2.6.35.
Thanks!

The following changes since commit 8a74ad60a546b13bd1096b2a61a7a5c6fd9ae17c:

  net: fix lock_sock_bh/unlock_sock_bh (2010-05-27 00:30:53 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Krishna Kumar (1):
  vhost: Fix host panic if ioctl called with wrong index

Takuya Yoshikawa (3):
  vhost: fix to check the return value of copy_to/from_user() correctly
  vhost-net: fix to check the return value of copy_to/from_user() correctly
  vhost: fix the memory leak which will happen when memory_access_ok fails

 drivers/vhost/net.c   |   14 ++--
 drivers/vhost/vhost.c |   57 +++-
 2 files changed, 39 insertions(+), 32 deletions(-)

-- 
MST


Re: [PATCH] device-assignment: don't truncate MSIX capabilities table size

2010-05-27 Thread Avi Kivity

On 05/26/2010 02:08 AM, Alex Williamson wrote:

PCI_MSIX_TABSIZE is 0x07ff

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCHv2-RFC 0/2] virtio: put last seen used index into ring itself

2010-05-27 Thread Avi Kivity

On 05/26/2010 10:50 PM, Michael S. Tsirkin wrote:

Here's a rewrite of the original patch with a new layout.
I haven't tested it yet so no idea how this performs, but
I think this addresses the cache bounce issue raised by Avi.
Posting for early flames/comments.

Generally, the Host end of the virtio ring doesn't need to see where
Guest is up to in consuming the ring.  However, to completely understand
what's going on from the outside, this information must be exposed.
For example, host can reduce the number of interrupts by detecting
that the guest is currently handling previous buffers.

We add a feature bit so the guest can tell the host that it's writing
out the current value there, if it wants to use that.

This differs from original approach in that the used index
is put after avail index (they are typically written out together).
To avoid cache bounces on descriptor access,
and make future extensions easier, we put the ring itself at start of
page, and move the control after it.
   


I missed the spec patch, can you repost it?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page

2010-05-27 Thread Xiao Guangrong


Avi Kivity wrote:
 If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
 We do that by setting spte.w=1, since the host cr0.wp must remain set so the
 host can write protect pages.  Once we allow write access, we must remove
 user access otherwise we mistakenly allow the user to write the page.
 

Yeah, it's really a nice way :-)

Reviewed-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/mmu.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 39dd8d3..56f8c3c 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1894,6 +1894,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
  
   spte |= PT_WRITABLE_MASK;
  
 + if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
 + spte &= ~PT_USER_MASK;
 +
   /*
* Optimization: for pte sync, if spte was writable the hash
* lookup is unnecessary (and expensive). Write protection


Re: [PATCH] KVM: x86: Propagate fpu_alloc errors

2010-05-27 Thread Avi Kivity

On 05/25/2010 05:01 PM, Jan Kiszka wrote:

Memory allocation may fail. Propagate such errors.

   


Applied, thanks.


Signed-off-by: Jan Kiszkajan.kis...@siemens.com
---
  arch/x86/include/asm/kvm_host.h |2 +-
  arch/x86/kvm/svm.c  |7 ++-
  arch/x86/kvm/vmx.c  |4 +++-
  arch/x86/kvm/x86.c  |   11 +--
   


An indication that fpu init should be in common code...

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm mmu: optimizations when tdp is in use

2010-05-27 Thread Marcelo Tosatti
On Thu, May 27, 2010 at 04:06:34PM +0800, Gui Jianfeng wrote:
 In case of using tdp, checking write protected page isn't needed and
 quadrant also no need to be calculated.
 
 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 0bb9f17..ce4bbd3 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -495,10 +495,13 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
 		kvm_x86_ops->get_lpage_level() : host_level;
  
 + if (tdp_enabled)
 + goto done;
 +

This is wrong. write_count is initialized for alignment purposes, not 
only write protected pages. See __kvm_set_memory_region in
virt/kvm/kvm_main.c.

 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
 		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
 			break;
 -
 +done:
   return level - 1;
  }
  
 @@ -1346,7 +1349,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
 kvm_vcpu *vcpu,
   if (role.direct)
   role.cr4_pae = 0;
   role.access = access;
 - if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
 + if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
 	quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
 	quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
   role.quadrant = quadrant;
 -- 
 1.6.5.2


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Oleg Nesterov
On 05/27, Michael S. Tsirkin wrote:

 On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
  Add a new kernel API to create a singlethread workqueue and attach its
  task to current task's cgroup and cpumask.
 
  Signed-off-by: Sridhar Samudrala s...@us.ibm.com

 Could someone familiar with workqueue code please comment on whether
 this patch is suitable for 2.6.35?

 It is needed to fix the case where vhost user might cause a kernel
 thread to consume more CPU than allowed by the cgroup.
 Should I merge it through the vhost tree?
 Ack for this?

I don't understand the reasons for this patch, but this doesn't matter.

I don't really see any need to change workqueue.c,

  +static struct task_struct *get_singlethread_wq_task(struct workqueue_struct *wq)
  +{
  +   return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
  +}

(Not sure this trivial static helper with the single caller makes sense, but
 see below)

  +/* Create a singlethread workqueue and attach its task to the current task's
  + * cgroup and set its cpumask to the current task's cpumask.
  + */
  +struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
  +{
  +   struct workqueue_struct *wq;
  +   struct task_struct *task;
  +   cpumask_var_t mask;
  +
  +   wq = create_singlethread_workqueue(name);
  +   if (!wq)
  +   goto out;
  +
  +   if (!alloc_cpumask_var(&mask, GFP_KERNEL))
  +   goto err;
  +   
  +   if (sched_getaffinity(current->pid, mask))
  +   goto err;
  +
  +   task = get_singlethread_wq_task(wq);
  +   if (sched_setaffinity(task->pid, mask))
  +   goto err;
  +
  +   if (cgroup_attach_task_current_cg(task))
  +   goto err;
  +out:   
  +   return wq;
  +err:
  +   destroy_workqueue(wq);
  +   wq = NULL;
  +   goto out;
  +}

Instead, cgroup.c (or whoever needs this) can do

struct move_struct {
struct work_struct work;
int ret;
};

static void move_func(struct work_struct *work)
{
struct move_struct *move = container_of(...);

if (cgroup_attach_task_current_cg(current))
        move->ret = -EANY;
}

static struct workqueue_struct *create_singlethread_workqueue_in_current_cg(char *name)
{
struct workqueue_struct *wq;
struct move_struct move = {
.work = __WORK_INITIALIZER(move_func);
};

wq = create_singlethread_workqueue(name);
if (!wq)
return NULL;

        queue_work(wq, &move.work);
        flush_work(&move.work);

if (move.ret) {
destroy_workqueue(wq);
wq = NULL;
}

return wq;
}

Or. Just export wq_per_cpu() from workqueue.c (probably with a better name) and
use it like the patch does.

But, imho, create_singlethread_workqueue_in_current_cg() does not belong
to  workqueue.c.

Oleg.



Re: [PATCH] qemu: Enable XSAVE related CPUID

2010-05-27 Thread Avi Kivity

On 05/27/2010 12:50 PM, Sheng Yang wrote:

We can support it in KVM now. The initial values are the minimal requirement
of XSAVE capable processor.

Signed-off-by: Sheng Yangsh...@linux.intel.com
---
  target-i386/cpuid.c |   32 
  1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index eebf038..cbf5595 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1067,6 +1067,38 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  *ecx = 0;
  *edx = 0;
  break;
+case 0xD:
+/* Processor Extended State */
+if (!(env->cpuid_ext_features & CPUID_EXT_XSAVE)) {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+break;
+}
+if (count == 0) {
+*eax = 0x7; /* FP | SSE | YMM */
+*ebx = 0x340; /* FP + SSE + YMM size */
+*ecx = 0x340; /* FP + SSE + YMM size */
   


For -cpu host, we should pick these from KVM_GET_SUPPORTED_CPUID.  For 
canned cpu types (e.g. qemu64), we need to return what we always did.


We can also add a new cpu type that has them built in (there's no cpu on 
the market with avx, right?)



+*edx = 0;
+} else if (count == 1) {
+/* eax = 1, so we can continue with others */
+*eax = 1;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+} else if (count == 2) {
+*eax = 0x100; /* YMM size */
+*ebx = 0x240; /* YMM offset */
+*ecx = 0;
+*edx = 0;
   


These, too.


+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
+break;
   case 0x80000000:
   *eax = env->cpuid_xlevel;
   *ebx = env->cpuid_vendor1;
   



--
error compiling committee.c: too many arguments to function



Re: Copy and paste feature across guest and host

2010-05-27 Thread Dor Laor

On 05/27/2010 12:17 PM, Tomasz Chmielewski wrote:

Just installed Fedora13 as guest on KVM.  However there is no
cross-platform copy and paste feature. I trust I have setup this
feature on other guest sometime before. Unfortunately I can't find the
relevant document. Could you please shed me some light. Pointer
would be appreciated. TIA


Did you try;

# modprobe virtio-copypaste

?


Seriously, qemu does not make it easy (well, its GUI does not make most
things easy) and you'll need a tool which synchronizes the clipboard
between two machines (google for qemu copy paste?).


There is no cutpaste at the moment. The plan is to enable it through 
virtio-serial and have spice  vnc use it. Cannot guarantee a date but 
it shouldn't be too long.





Re: [PATCH 10/17] arch/ia64/kvm: Add missing spin_unlock

2010-05-27 Thread Avi Kivity

On 05/26/2010 06:57 PM, Julia Lawall wrote:

From: Julia Lawallju...@diku.dk

Add a spin_unlock missing on the error path.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

//smpl
@@
expression E1;
@@

* spin_lock(E1,...);
   +... when != E1
   if (...) {
 ... when != E1
*   return ...;
   }
   ...+
* spin_unlock(E1,...);
///smpl

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



[PATCH] KVM: MMU: Document large pages

2010-05-27 Thread Avi Kivity
Signed-off-by: Avi Kivity a...@redhat.com
---
 Documentation/kvm/mmu.txt |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 1e7ecdd..6a0cab2 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -317,6 +317,29 @@ on fault type:
 
 (user write faults generate a #PF)
 
+Large pages
+===
+
+The mmu supports all combinations of large and small guest and host pages.
+Supported page sizes include 4k, 2M, 4M, and 1G.  4M pages are treated as
+two separate 2M pages, on both guest and host, since the mmu always uses PAE
+paging.
+
+To instantiate a large spte, four constraints must be satisfied:
+
+- the spte must point to a large host page
+- the guest pte must be a large pte of at least equivalent size (if tdp is
+  enabled, there is no guest pte and this condition is satisfied)
+- if the spte will be writeable, the large page frame may not overlap any
+  write-protected pages
+- the guest pte must be wholly contained by a single memory slot
+
+To check the last two conditions, the mmu maintains a ->write_count set of
+arrays for each memory slot and large page size.  Every write protected page
+causes its write_count to be incremented, thus preventing instantiation of
+a large spte.  The frames at the end of an unaligned memory slot have
+artificially inflated ->write_counts so they can never be instantiated.
+
 Further reading
 ===
 
-- 
1.7.1

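A minimal sketch of the ->write_count bookkeeping described above. This is illustrative only: real KVM keeps one array per memory slot and per large page size, and NFRAMES, the function names, and the edge inflation here are stand-ins.

```c
#include <assert.h>

#define NFRAMES 8  /* illustrative number of large-page frames in one slot */

static int write_count[NFRAMES];

static void account_shadowed(int frame)   { write_count[frame]++; }
static void unaccount_shadowed(int frame) { write_count[frame]--; }

/* A writable large spte may be instantiated only when no page inside the
 * large frame is write protected, i.e. its write_count is zero. */
static int can_instantiate_large(int frame)
{
    return write_count[frame] == 0;
}

/* Frames at the unaligned edges of a slot get an artificially inflated
 * count at setup time, so they can never be mapped with a large spte. */
static void inflate_unaligned_edges(void)
{
    write_count[0]           += 1;
    write_count[NFRAMES - 1] += 1;
}
```

Write protecting any small page inside a frame flips `can_instantiate_large()` to false until the protection is dropped, which is exactly the constraint the document states.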


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Oleg Nesterov
On 05/27, Michael S. Tsirkin wrote:

 On Thu, May 27, 2010 at 02:44:48PM +0200, Oleg Nesterov wrote:
  On 05/27, Michael S. Tsirkin wrote:
  
   On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
    Add a new kernel API to create a singlethread workqueue and attach its
task to current task's cgroup and cpumask.
   
Signed-off-by: Sridhar Samudrala s...@us.ibm.com
  
   Could someone familiar with workqueue code please comment on whether
   this patch is suitable for 2.6.35?
  
   It is needed to fix the case where vhost user might cause a kernel
   thread to consume more CPU than allowed by the cgroup.
   Should I merge it through the vhost tree?
   Ack for this?
 
  I don't understand the reasons for this patch, but this doesn't matter.

 Depending on userspace application, driver can create a lot of work
 for a workqueue to handle. By making the workqueue thread
 belong in a cgroup, we make it possible to limit the CPU and other
 resources thus consumed.

OK, I see, thanks for your explanation.


in case I wasn't clear... I didn't mean I dislike this idea, only the
the implementation of create_singlethread_workqueue_in_current_cg(),
it doesn't belong to workqueue.c imho.

Oleg.



Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-27 Thread Avi Kivity

On 05/25/2010 05:00 PM, Michael S. Tsirkin wrote:

When guest notifier is assigned, we set mask notifier,
which will assign kvm irqfd.
When guest notifier is unassigned, mask notifier is unset,
which should unassign kvm irqfd.

The way to do this is to call mask notifier telling it to mask the vector.
This, unless vector is already masked which unassigns irqfd already.

The logic in unassign was reversed, which left kvm irqfd assigned.

This patch is qemu-kvm only as irqfd is not upstream.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function

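The symmetric contract in the commit message — unsetting the notifier must mask the vector so the irqfd is released, unless it is already masked — can be sketched as a toy model. All names here are illustrative, not QEMU's API; the point is only the negation that the fix restores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: telling the notifier "masked" unassigns the kvm irqfd,
 * "unmasked" assigns it. */
static bool irqfd_assigned;

static int mask_notifier(bool masked)
{
    irqfd_assigned = !masked;
    return 0;
}

/* On unset, call the notifier telling it to mask the vector -- unless the
 * vector is already masked, in which case the irqfd was unassigned when
 * it got masked.  The bug was passing is_masked instead of !is_masked. */
static int unset_mask_notifier(bool vector_is_masked)
{
    if (vector_is_masked)
        return 0;
    return mask_notifier(!vector_is_masked); /* argument is true: mask it */
}
```

With the reversed logic, the notifier would have been called with `false` (unmask), leaving the irqfd assigned after the guest notifier was gone.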


Re: [PATCH] qemu: Enable XSAVE related CPUID

2010-05-27 Thread Sheng Yang
On Thursday 27 May 2010 20:56:17 Avi Kivity wrote:
 On 05/27/2010 12:50 PM, Sheng Yang wrote:
  We can support it in KVM now. The initial values are the minimal
  requirement of XSAVE capable processor.
  
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
  
target-i386/cpuid.c |   32 
1 files changed, 32 insertions(+), 0 deletions(-)
  
  diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
  index eebf038..cbf5595 100644
  --- a/target-i386/cpuid.c
  +++ b/target-i386/cpuid.c
  @@ -1067,6 +1067,38 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
  index, uint32_t count,
  
*ecx = 0;
*edx = 0;
break;
  
  +case 0xD:
  +/* Processor Extended State */
  +if (!(env->cpuid_ext_features & CPUID_EXT_XSAVE)) {
  +*eax = 0;
  +*ebx = 0;
  +*ecx = 0;
  +*edx = 0;
  +break;
  +}
  +if (count == 0) {
  +*eax = 0x7; /* FP | SSE | YMM */
  +*ebx = 0x340; /* FP + SSE + YMM size */
  +*ecx = 0x340; /* FP + SSE + YMM size */
 
 For -cpu host, we should pick these from KVM_GET_SUPPORTED_CPUID.  For
 canned cpu types (e.g. qemu64), we need to return what we always did.

Yes, I also prefer this way. I didn't do this because it's somewhat outside the
current QEmu CPUID setting mechanism.
 
 We can also add a new cpu type that has them built in (there's no cpu on
 the market with avx, right?)

Right... Let's use -cpu host now.

 
  +*edx = 0;
  +} else if (count == 1) {
  +/* eax = 1, so we can continue with others */
  +*eax = 1;
  +*ebx = 0;
  +*ecx = 0;
  +*edx = 0;
  +} else if (count == 2) {
  +*eax = 0x100; /* YMM size */
  +*ebx = 0x240; /* YMM offset */
  +*ecx = 0;
  +*edx = 0;
 
 These, too.

Sure.

--
regards
Yang, Sheng

 
  +} else {
  +*eax = 0;
  +*ebx = 0;
  +*ecx = 0;
  +*edx = 0;
  +}
  +break;
  
  case 0x80000000:
  *eax = env->cpuid_xlevel;
  *ebx = env->cpuid_vendor1;


Re: [PATCHv2-RFC 0/2] virtio: put last seen used index into ring itself

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 03:07:52PM +0300, Avi Kivity wrote:
 I missed the spec patch, can you repost it?

Still work in progress, but here it is.
Note I am still debating with myself whether we should split
avail idx and flags into separate cache lines.

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index ed35893..150e5a8 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1803,6 +1803,36 @@ next
 \emph default
  descriptor entry (modulo the ring size).
  This starts at 0, and increases.
+\change_inserted 0 1274966643
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1274968378
+When PUBLISH_USED feature flag has 
+\emph on
+not
+\emph default
+ been negotiated, the ring follows the 
+\begin_inset Quotes eld
+\end_inset
+
+flags
+\begin_inset Quotes erd
+\end_inset
+
+ and the 
+\begin_inset Quotes eld
+\end_inset
+
+idx
+\begin_inset Quotes erd
+\end_inset
+
+ fields:
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
@@ -1845,7 +1875,134 @@ struct vring_avail {
 
 \end_layout
 
+\begin_layout Standard
+
+\change_inserted 0 1274968432
+\begin_inset CommandInset label
+LatexCommand label
+name PUBLISH_USED-feature
+
+\end_inset
+
+When PUBLISH_USED feature flag has been negotiated, the control structure
+ including the 
+\begin_inset Quotes eld
+\end_inset
+
+flags and the 
+\begin_inset Quotes eld
+\end_inset
+
+idx
+\begin_inset Quotes erd
+\end_inset
+
+ fields follows the ring.
+ This leaves the room for the 
+\begin_inset Quotes eld
+\end_inset
+
+last_seen_used_idx
+\begin_inset Quotes erd
+\end_inset
+
+ field, which indicates the most recent 
+\begin_inset Quotes eld
+\end_inset
+
+idx
+\begin_inset Quotes erd
+\end_inset
+
+ value observed by guest in the used ring (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference sub:Used-Ring
+
+\end_inset
+
+ below):
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1274967396
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967404
+
+struct vring_avail {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967405
+
+   u16 ring[qsz]; /* qsz is the Queue Size field read from device */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967533
+
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967533
+
+   u16 flags;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967533
+
+   u16 idx;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274968345
+
+   u16 last_seen_used_idx;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274967396
+
+}; 
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 0 1274967715
+If the ring is large enough, the second layout maintains the control and
+ ring structures on separate cache lines.
+\end_layout
+
 \begin_layout Subsection
+
+\change_inserted 0 1274968415
+\begin_inset CommandInset label
+LatexCommand label
+name sub:Used-Ring
+
+\end_inset
+
+
+\change_unchanged
 Used Ring
 \end_layout
 
@@ -2391,12 +2548,20 @@ status open
 
 \begin_layout Plain Layout
 
-while (vq->last_seen_used != vring->used.idx) {
+while (vq->last_seen_used
+\change_inserted 0 1274968316
+_idx
+\change_unchanged
+ != vring->used.idx) {
 \end_layout
 
 \begin_layout Plain Layout
 
-struct vring_used_elem *e = vring.used->ring[vq->last_seen_used%vsz];
+struct vring_used_elem *e = vring.used->ring[vq->last_seen_used
+\change_inserted 0 1274968326
+_idx
+\change_unchanged
+%vsz];
 \end_layout
 
 \begin_layout Plain Layout
@@ -2406,7 +2571,11 @@ while (vq->last_seen_used != vring->used.idx) {
 
 \begin_layout Plain Layout
 
-vq->last_seen_used++;
+vq->last_seen_used
+\change_inserted 0 1274968321
+_idx
+\change_unchanged
+++;
 \end_layout
 
 \begin_layout Plain Layout
@@ -2419,6 +2588,13 @@ while (vq-last_seen_used != vring-used.idx) {
 
 \end_layout
 
+\begin_layout Standard
+
+\change_inserted 0 1274968252
+If PUBLISH_USED feature is negotiated, last_seen_used value should be published
+ to the device in the avail ring.
+\end_layout
+
 \begin_layout Subsection
 Dealing With Configuration Changes
 \end_layout
@@ -2986,6 +3162,47 @@ struct vring_avail {
 \begin_layout Plain Layout
 
 };
+\change_inserted 0 1274966477
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274966484
+
+struct vring_avail_ctrl {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274966489
+
+__u16 flags;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274966494
+
+__u16 idx;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274966499
+
+__u16 last_used_idx;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1274966474
+
+};
 \end_layout
 
 \begin_layout Plain Layout
@@ -3349,6 +3566,28 @@ reference sub:Indirect-Descriptors
 \end_inset
 
 .
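Assuming the layout the spec patch describes — the ring first, control fields after it — the PUBLISH_USED avail structure can be sketched in plain C. QSZ stands in for the Queue Size field read from the device; the struct name is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QSZ 256  /* illustrative Queue Size read from the device */

/* With PUBLISH_USED negotiated, the ring comes first and the control
 * fields follow it.  This keeps the control fields off the cache lines
 * holding ring entries and leaves room for last_seen_used_idx. */
struct vring_avail_published {
    uint16_t ring[QSZ];
    uint16_t flags;              /* e.g. VRING_AVAIL_F_NO_INTERRUPT */
    uint16_t idx;
    uint16_t last_seen_used_idx; /* most recent used idx seen by guest */
};
```

Since every member is a u16, the layout is padding-free: the control fields start at byte 2*QSZ regardless of ring size.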

Re: Copy and paste feature across guest and host

2010-05-27 Thread Markus Breitländer
Am 27.05.2010 15:19, schrieb Dor Laor:
 On 05/27/2010 12:17 PM, Tomasz Chmielewski wrote:
 Just installed Fedora13 as guest on KVM.  However there is no
 cross-platform copy and paste feature. I trust I have setup this
 feature on other guest sometime before. Unfortunately I can't find the
 relevant document. Could you please shed me some light. Pointer
 would be appreciated. TIA

 Did you try;

 # modprobe virtio-copypaste

 ?


 Seriously, qemu does not make it easy (well, its GUI does not make most
 things easy) and you'll need a tool which synchronizes the clipboard
 between two machines (google for qemu copy paste?).
 
 There is no cutpaste at the moment. The plan is to enable it through
 virtio-serial and have spice  vnc use it. Cannot guarantee a date but
 it shouldn't be too long.
 
 

Maybe NX / FreeNX will suit your needs.

Regards,
Markus



Re: [Qemu-devel] [PATCH 0/2] Fix scsi-generic breakage in upstream qemu-kvm.git

2010-05-27 Thread Nicholas A. Bellinger
On Thu, 2010-05-20 at 15:18 +0200, Kevin Wolf wrote:
 Am 17.05.2010 18:45, schrieb Nicholas A. Bellinger:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  Greetings,
  
  Attached are the updated patches following hch's comments to fix 
  scsi-generic
  device breakage with find_image_format() and refresh_total_sectors().
  
  These are being resent as the last attachments where in MBOX format from 
  git-format-patch.
  
  Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
 
 Thanks, applied all to the block branch, even though I forgot to reply here.
 
 Kevin

Hi Kevin,

Thanks for accepting the series.  There is one more piece of breakage
that Chris Krumme found in block.c:find_image_format() in the original
patch.  Please apply the patch to add the missing bdrv_delete() for the
SG_IO case below.

Thanks for pointing this out Chris!

Best,

--nab

[PATCH] [block]: Add missing bdrv_delete() for SG_IO BlockDriver in 
find_image_format()

This patch adds a missing bdrv_delete() call in find_image_format() so that a
SG_IO BlockDriver properly releases the temporary BlockDriverState *bs created
from bdrv_file_open()

Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
Reported-by: Chris Krumme chris.kru...@windriver.com
---
 block.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 7a379dc..88dbc00 100644
--- a/block.c
+++ b/block.c
@@ -334,8 +334,10 @@ static BlockDriver *find_image_format(const char *filename)
 return NULL;

 /* Return the raw BlockDriver * to scsi-generic devices */
-if (bs->sg)
+if (bs->sg) {
+bdrv_delete(bs);
 return bdrv_find_format("raw");
+}

 ret = bdrv_pread(bs, 0, buf, sizeof(buf));
 bdrv_delete(bs);
-- 
1.5.6.5






Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Glauber Costa
On Thu, May 27, 2010 at 11:15:43AM +0300, Avi Kivity wrote:
 On 05/26/2010 09:04 PM, Glauber Costa wrote:
 This patch adds a file that documents the usage of KVM-specific
 MSRs.
 
 
 Looks good.  A few comments:
 
 +
 +Custom MSR list
 +
 +
 +The current supported Custom MSR list is:
 +
 +MSR_KVM_WALL_CLOCK:  0x11
 +
 +data: physical address of a memory area.
 
 Which must be in guest RAM (i.e., don't point it somewhere random
 and expect the hypervisor to allocate it for you).
 
 Must be aligned to 4 bytes (we don't enforce it though).
I don't see the reason for it.

If this is a requirement, our own implementation
is failing to meet it.




Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello,

On 05/27/2010 03:12 PM, Michael S. Tsirkin wrote:
 I don't understand the reasons for this patch, but this doesn't matter.
 
  Depending on userspace application, driver can create a lot of work
  for a workqueue to handle. By making the workqueue thread
  belong in a cgroup, we make it possible to limit the CPU and other
  resources thus consumed.

Hmmm I don't really get it.  The unit of scheduling in workqueue
is a work.  Unless you're gonna convert every driver to use this
special kind of workqueue (and what happens when multiple tasks from
different cgroups share the driver?), I can't see how this is gonna be
useful.  If you really wanna impose cgroup control on workqueue items,
you'll have to do it per work item which might lead to the problem of
priority inversion.  Can you please describe what you're trying to do
in more detail?

Thank you.

-- 
tejun


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Sridhar Samudrala
On Thu, 2010-05-27 at 14:44 +0200, Oleg Nesterov wrote:
 On 05/27, Michael S. Tsirkin wrote:
 
  On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
   Add a new kernel API to create a singlethread workqueue and attach it's
   task to current task's cgroup and cpumask.
  
   Signed-off-by: Sridhar Samudrala s...@us.ibm.com
 
  Could someone familiar with workqueue code please comment on whether
  this patch is suitable for 2.6.35?
 
  It is needed to fix the case where vhost user might cause a kernel
  thread to consume more CPU than allowed by the cgroup.
  Should I merge it through the vhost tree?
  Ack for this?
 
 I don't understand the reasons for this patch, but this doesn't matter.
 
 I don't really see any need to change workqueue.c,
 
   +static struct task_struct *get_singlethread_wq_task(struct 
   workqueue_struct *wq)
   +{
   + return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
   +}
 
 (Not sure this trivial static helper with the single caller makes sense, but
  see below)
 
   +/* Create a singlethread workqueue and attach it's task to the current 
   task's
   + * cgroup and set it's cpumask to the current task's cpumask.
   + */
   +struct workqueue_struct 
   *create_singlethread_workqueue_in_current_cg(char *name)
   +{
   + struct workqueue_struct *wq;
   + struct task_struct *task;
   + cpumask_var_t mask;
   +
   + wq = create_singlethread_workqueue(name);
   + if (!wq)
   + goto out;
   +
   + if (!alloc_cpumask_var(mask, GFP_KERNEL))
   + goto err;
   + 
   + if (sched_getaffinity(current->pid, mask))
   + goto err;
   +
   + task = get_singlethread_wq_task(wq);
   + if (sched_setaffinity(task->pid, mask))
   + goto err;
   +
   + if (cgroup_attach_task_current_cg(task))
   + goto err;
   +out: 
   + return wq;
   +err:
   + destroy_workqueue(wq);
   + wq = NULL;
   + goto out;
   +}
 
 Instead, cgroup.c (or whoever needs this) can do
 
   struct move_struct {
   struct work_struct work;
   int ret;
   };
 
   static void move_func(struct work_struct *work)
   {
   struct move_struct *move = container_of(...);
 
   if (cgroup_attach_task_current_cg(current))

We are trying to attach the task associated with workqueue to the
current task's cgroup. So what we need is 
   cgroup_attach_task_current_cg(wq->task);

However there is no interface currently that exports the task associated
with a workqueue. It is hidden in cpu_workqueue_struct and is only 
accessible within workqueue.c.


   ret = -EANY;
   }
 
   static struct workqueue_struct 
 *create_singlethread_workqueue_in_current_cg(char *name)
   {
   struct workqueue_struct *wq;
   struct move_struct move = {
   .work = __WORK_INITIALIZER(move_func);
   };
 
   wq = create_singlethread_workqueue(name);
   if (!wq)
   return NULL;
 
   queue_work(wq, &move.work);
   flush_work(&move.work);
 
   if (move.ret) {
   destroy_workqueue(wq);
   wq = NULL;
   }
 
   return wq;
   }
 
 Or. Just export wq_per_cpu() from workqueue.c (probably with a better name) 
 and
 use it like the patch does.
This requires that struct cpu_workqueue_struct and struct
workqueue_struct are made externally visible by moving them to
kernel/workqueue.h.

Instead what about adding the simple helper get_singlethread_wq_task()
in workqueue.c and exporting it.
I can add create_singlethread_workqueue_in_current_cg() to cgroup.c
using this helper routine.

Thanks
Sridhar




Re: [PATCH 3/3] KVM test: Add implementation of network based unattended installation

2010-05-27 Thread Lucas Meneghel Rodrigues
On Wed, 2010-05-19 at 17:20 +0800, Jason Wang wrote:
 This patch could let the unattended installation to be done through
 the following method:
 
 - cdrom: the original method which does the installation from cdrom
 - url: installing the linux guest from http or ftp, tree url was specified
 through url
 - nfs: installing the linux guest from nfs. the server address was
 specified through nfs_server, and the directory was specified through
 nfs_dir
 
 For url and nfs installation, the extra_params need to be configurated
 to specify the location of unattended files:
 
 - If the unattended file in the tree is used, extra_params= append
   ks=floppy and unattended_file params need to be specified in the
   configuration file.
 - If the unattended file located at remote server is used,
   unattended_file option must be none and extra_params= append
   ks=http://xxx; need to be specified in the configuration file and
   don't forget to add the finish notification part.
 
 The --kernel and --initrd were used directly for the network
 installation instead of the tftp/bootp param because the user mode
 network is too slow to do this.

Hi Jason, I had reviewed your patchset, implemented the suggestions I
pointed out, and I have one more thing we need to work out before we
commit this - We absolutely need to provide examples on the config file,
decently commented. So, could you please provide examples:

 * HTTP install using remote kickstart
 * NFS install using local kickstart

And re-send the patchset? I will send the patchset with my modifications
directly to you and will wait on your follow up.

Thanks!




Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 06:15:54PM +0200, Tejun Heo wrote:
 Hello,
 
 On 05/27/2010 03:12 PM, Michael S. Tsirkin wrote:
  I don't understand the reasons for this patch, but this doesn't matter.
  
  Depending on userspace application, driver can create a lot of work
  for a workqueue to handle. By making the workqueue thread
  belong in a cgroup, we make it possible to account for the CPU and other
  resources thus consumed.
 
 Hmmm I don't really get it.  The unit of scheduling in workqueue
 is a work.

Yes. However, we use cgroups to limit when the workqueue itself is scheduled.
This affects all of work done on this workqueue, so it's a bit
of a blunt intrument. Thus we are not trying to apply this
to all drivers, we intend to start with vhost-net.

 Unless you're gonna convert every driver to use this
 special kind of workqueue (and what happens when multiple tasks from
 different cgroups share the driver?),

We'll then create a workqueue per task. Each workqueue will have the
right cgroup. But we are not trying to solve the problem for
every driver.

 I can't see how this is gonna be
 useful.  If you really wanna impose cgroup control on workqueue items,
 you'll have to do it per work item which might lead to the problem of
 priority inversion.

Exactly. cgroup is per-workqueue not per work item.
If driver wants to let administrators control priority
for different kinds of items separately, driver will have
to submit them to separate workqueues.

  Can you please describe what you're trying to do
 in more detail?
 
 Thank you.

vhost-net driver is under control from userspace,
it queues potentially a lot of work into the workqueue, which
might load the system beyond the cgroup limits.
And staying within cgroups limits is important for virtualization
where vhost is used.


 -- 
 tejun


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 09:24:18AM -0700, Sridhar Samudrala wrote:
 On Thu, 2010-05-27 at 14:44 +0200, Oleg Nesterov wrote:
  On 05/27, Michael S. Tsirkin wrote:
  
   On Tue, May 18, 2010 at 05:04:51PM -0700, Sridhar Samudrala wrote:
Add a new kernel API to create a singlethread workqueue and attach it's
task to current task's cgroup and cpumask.
   
Signed-off-by: Sridhar Samudrala s...@us.ibm.com
  
   Could someone familiar with workqueue code please comment on whether
   this patch is suitable for 2.6.35?
  
   It is needed to fix the case where vhost user might cause a kernel
   thread to consume more CPU than allowed by the cgroup.
   Should I merge it through the vhost tree?
   Ack for this?
  
  I don't understand the reasons for this patch, but this doesn't matter.
  
  I don't really see any need to change workqueue.c,
  
+static struct task_struct *get_singlethread_wq_task(struct 
workqueue_struct *wq)
+{
 +   return (per_cpu_ptr(wq->cpu_wq, singlethread_cpu))->thread;
+}
  
  (Not sure this trivial static helper with the single caller makes sense, but
   see below)
  
+/* Create a singlethread workqueue and attach it's task to the current 
task's
+ * cgroup and set it's cpumask to the current task's cpumask.
+ */
+struct workqueue_struct 
*create_singlethread_workqueue_in_current_cg(char *name)
+{
+   struct workqueue_struct *wq;
+   struct task_struct *task;
+   cpumask_var_t mask;
+
+   wq = create_singlethread_workqueue(name);
+   if (!wq)
+   goto out;
+
+   if (!alloc_cpumask_var(mask, GFP_KERNEL))
+   goto err;
+   
 +   if (sched_getaffinity(current->pid, mask))
+   goto err;
+
+   task = get_singlethread_wq_task(wq);
 +   if (sched_setaffinity(task->pid, mask))
+   goto err;
+
+   if (cgroup_attach_task_current_cg(task))
+   goto err;
+out:   
+   return wq;
+err:
+   destroy_workqueue(wq);
+   wq = NULL;
+   goto out;
+}
  
  Instead, cgroup.c (or whoever needs this) can do
  
  struct move_struct {
  struct work_struct work;
  int ret;
  };
  
  static void move_func(struct work_struct *work)
  {
  struct move_struct *move = container_of(...);
  
  if (cgroup_attach_task_current_cg(current))
 
 We are trying to attach the task associated with workqueue to the
 current task's cgroup. So what we need is 
   cgroup_attach_task_current_cg(wq->task);
 
 However there is no interface currently that exports the task associated
 with a workqueue. It is hidden in cpu_workqueue_struct and is only 
 accessible within workqueue.c.
 
 
  ret = -EANY;
  }
  
  static struct workqueue_struct 
  *create_singlethread_workqueue_in_current_cg(char *name)
  {
  struct workqueue_struct *wq;
  struct move_struct move = {
  .work = __WORK_INITIALIZER(move_func);
  };
  
  wq = create_singlethread_workqueue(name);
  if (!wq)
  return NULL;
  
  queue_work(wq, &move.work);
  flush_work(&move.work);
  
  if (move.ret) {
  destroy_workqueue(wq);
  wq = NULL;
  }
  
  return wq;
  }
  
  Or. Just export wq_per_cpu() from workqueue.c (probably with a better name) 
  and
  use it like the patch does.
 This requires that struct cpu_workqueue_struct and struct
 workqueue_struct are made externally visible by moving them to
 kernel/workqueue.h.
 
 Instead what about adding the simple helper get_singlethread_wq_task()
 in workqueue.c and exporting it.
 I can add create_singlethread_workqueue_in_current_cg() to cgroup.c
 using this helper routine.

Or to our driver, if that's more palatable.

 Thanks
 Sridhar
 


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello,

On 05/27/2010 06:39 PM, Michael S. Tsirkin wrote:
 Unless you're gonna convert every driver to use this
 special kind of workqueue (and what happens when multiple tasks from
 different cgroups share the driver?),
 
 We'll then create a workqueue per task. Each workqueue will have the
 right cgroup. But we are not trying to solve the problem for
 every driver.

Ah... I see.  You're gonna use multiple workqueues.  One concern that
I have is that this is abuse of workqueue interface to certain level
and depends on the implementation detail of workqueue rather than its
intended usage model.  stop_machine() was a similar case and in the
end it was better served by a different mechanism built on kthread
directly (cpu_stop).  Wouldn't it be cleaner to use kthread directly
for your case too?  You're basically trying to use workqueue as a
frontend to kthread, so...

Thanks.

-- 
tejun


Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Oleg Nesterov
What I am actually worried about is Tejun's rework; I am not sure
cmwq has the "this thread services that wq" property...

On 05/27, Sridhar Samudrala wrote:

 On Thu, 2010-05-27 at 14:44 +0200, Oleg Nesterov wrote:
 
  Instead, cgroup.c (or whoever needs this) can do
 
  struct move_struct {
  struct work_struct work;
  int ret;
  };
 
  static void move_func(struct work_struct *work)
  {
  struct move_struct *move = container_of(...);
 
  if (cgroup_attach_task_current_cg(current))

 We are trying to attach the task associated with workqueue to the
 current task's cgroup. So what we need is
   cgroup_attach_task_current_cg(wq->task);

I do not see cgroup_attach_task_current_cg() in Linus's tree and thus I do
not know what exactly it does, and of course the code above is only a template.

But I think this is easy. Just add struct cgroup *cgrp into move_struct
and then move_func() can do cgroup_attach_task(move->cgrp, current) ?

  Or. Just export wq_per_cpu() from workqueue.c (probably with a better name) 
  and
  use it like the patch does.
 This requires that struct cpu_workqueue_struct and struct
 workqueue_struct are made externally visible by moving them to
 kernel/workqueue.h.

no, no,

 Instead what about adding the simple helper get_singlethread_wq_task()
 in workqueue.c and exporting it.

Indeed, this is what I meant.

But. I disagree with get_singlethread_wq_task(). If we add this helper,
it should work with the multi-threaded wq's too, and needs the int cpu
parameter, ignored when is_wq_single_threaded().

So. Either rename wq_per_cpu() and export it (once again, I do not
mean we should move the body to workqueue.h!), or create the new
helper which just calls wq_per_cpu().

 I can add create_singlethread_workqueue_in_current_cg() to cgroup.c
 using this helper routine.

Imho, this is better.

But please note that it is possible to do without any changes in
workqueue.[ch] afaics, see above.

Oleg.



Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Michael S. Tsirkin
On Thu, May 27, 2010 at 06:56:20PM +0200, Tejun Heo wrote:
 Hello,
 
 On 05/27/2010 06:39 PM, Michael S. Tsirkin wrote:
  Unless you're gonna convert every driver to use this
  special kind of workqueue (and what happens when multiple tasks from
  different cgroups share the driver?),
  
  We'll then create a workqueue per task. Each workqueue will have the
  right cgroup. But we are not trying to solve the problem for
  every driver.
 
 Ah... I see.  You're gonna use multiple workqueues.  One concern that
 I have is that this is abuse of workqueue interface to certain level
 and depends on the implementation detail of workqueue rather than its
 intended usage model.

Well, this is why I proposed adding a new API for creating
workqueue within workqueue.c, rather than exposing the task
and attaching it to cgroups in our driver: so that workqueue
maintainers can fix the implementation if it ever changes.

And after all, it's an internal API, we can always change
it later if we need.

 stop_machine() was a similar case and in the
 end it was better served by a different mechanism built on kthread
 directly (cpu_stop).  Wouldn't it be cleaner to use kthread directly
 for your case too?  You're basically trying to use workqueue as a
 frontend to kthread, so...
 
 Thanks.

Well, yes but we are using APIs like flush_work etc. These are very
handy.  It seems much easier than rolling our own queue on top of kthread.

Makes sense?

 -- 
 tejun


[PATCH v3] Add Documentation/kvm/msr.txt

2010-05-27 Thread Glauber Costa
This patch adds a file that documents the usage of KVM-specific
MSRs.

[ v2: added comments from Randy ]
[ v3: added comments from Avi ]

Signed-off-by: Glauber Costa glom...@redhat.com
---
 Documentation/kvm/msr.txt |  152 +
 1 files changed, 152 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/msr.txt

diff --git a/Documentation/kvm/msr.txt b/Documentation/kvm/msr.txt
new file mode 100644
index 000..8cdcb6d
--- /dev/null
+++ b/Documentation/kvm/msr.txt
@@ -0,0 +1,152 @@
+KVM-specific MSRs.
+Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
+=
+
+KVM makes use of some custom MSRs to service some requests.
+At present, this facility is only used by kvmclock.
+
+Custom MSRs have a range reserved for them, that goes from
+0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
+but they are deprecated and their use is discouraged.
+
+Custom MSR list
+
+
+The current supported Custom MSR list is:
+
+MSR_KVM_WALL_CLOCK_NEW:   0x4b564d00
+
+   data: physical address of a memory area which must be in guest RAM.
+   This memory is expected to hold a copy of the following structure:
+
+   struct pvclock_wall_clock {
+   u32   version;
+   u32   sec;
+   u32   nsec;
+   } __attribute__((__packed__));
+
+   whose data will be filled in by the hypervisor. The hypervisor is only 
+   guaranteed to update this data at the moment of MSR write.
+   Users that want to reliably query this information more than once have
+   to write more than once to this MSR. Fields have the following meanings:
+
+   version: guest has to check version before and after grabbing
+   time information and check that they are both equal and even.
+   An odd version indicates an in-progress update.
+
+   sec: number of seconds for wallclock.
+
+   nsec: number of nanoseconds for wallclock.
+
+   Note that although MSRs are per-CPU entities, the effect of this
+   particular MSR is global.
+
+	Availability of this MSR must be checked via bit 3 in 0x40000001 cpuid
+	leaf prior to usage.
+
+MSR_KVM_SYSTEM_TIME_NEW:  0x4b564d01
+
+   data: 4-byte aligned physical address of a memory area which must be in
+   guest RAM, plus an enable bit in bit 0. This memory is expected to hold
+   a copy of the following structure:
+
+   struct pvclock_vcpu_time_info {
+   u32   version;
+   u32   pad0;
+   u64   tsc_timestamp;
+   u64   system_time;
+   u32   tsc_to_system_mul;
+   s8tsc_shift;
+   u8flags;
+   u8pad[2];
+   } __attribute__((__packed__)); /* 32 bytes */
+
+   whose data will be filled in by the hypervisor periodically. Only one
+   write, or registration, is needed for each VCPU. The interval between
+   updates of this structure is arbitrary and implementation-dependent.
+   The hypervisor may update this structure at any time it sees fit until
+   anything with bit0 == 0 is written to it.
+
+   Fields have the following meanings:
+
+   version: guest has to check version before and after grabbing
+   time information and check that they are both equal and even.
+   An odd version indicates an in-progress update.
+
+   tsc_timestamp: the tsc value at the current VCPU at the time
+   of the update of this structure. Guests can subtract this value
+   from current tsc to derive a notion of elapsed time since the
+   structure update.
+
+   system_time: a host notion of monotonic time, including sleep
+   time at the time this structure was last updated. Unit is
+   nanoseconds.
+
+   tsc_to_system_mul: a function of the tsc frequency. One has
+   to multiply any tsc-related quantity by this value to get
+   a value in nanoseconds, besides dividing by 2^tsc_shift
+
+   tsc_shift: cycle to nanosecond divider, as a power of two, to
+   allow for shift rights. One has to shift right any tsc-related
+   quantity by this value to get a value in nanoseconds, besides
+   multiplying by tsc_to_system_mul.
+
+   With this information, guests can derive per-CPU time by
+   doing:
+
+   time = (current_tsc - tsc_timestamp)
+	time = (time * tsc_to_system_mul) >> tsc_shift
+   time = time + system_time
+
+   flags: bits in this field indicate extended capabilities
+   coordinated between the guest and the hypervisor. Availability
+	of specific flags has to be checked in 0x40000001 cpuid leaf.
+   
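The per-CPU time derivation quoted in the patch can be written out as a small helper. This is a user-space sketch that paraphrases the document's formula, not the kernel's pvclock code; the 128-bit intermediate is our own precaution against multiply overflow, not something the document mandates:

```c
#include <stdint.h>

/* User-space sketch of the time derivation described above; real guests
 * use the kernel's pvclock helpers.  The 128-bit intermediate guards
 * against overflow of (delta * tsc_to_system_mul). */
static uint64_t guest_time_ns(uint64_t current_tsc, uint64_t tsc_timestamp,
                              uint32_t tsc_to_system_mul, int8_t tsc_shift,
                              uint64_t system_time)
{
    unsigned __int128 t = current_tsc - tsc_timestamp;

    t *= tsc_to_system_mul;
    if (tsc_shift >= 0)
        t >>= tsc_shift;          /* divide by 2^tsc_shift */
    else
        t <<= -tsc_shift;         /* a negative shift means multiply */
    return (uint64_t)t + system_time;
}
```

Note the guest must also perform the version check described above (re-read until the version field is even and unchanged) before trusting the snapshot these parameters come from.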

Re: Clock jumps

2010-05-27 Thread john stultz
On Thu, May 27, 2010 at 11:32 AM, Bernhard Schmidt be...@birkenwald.de wrote:
 Alexander Graf ag...@suse.de wrote:
 Do you have ntpd running inside the guest? I have a bug report lying
 around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
 https://bugzilla.novell.com/show_bug.cgi?id=582260

 I want to chime in here, I have a very similar problem, but not with
 ntpd in the guest.

 The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
 was a Debian Lenny with a custom 2.6.33 host kernel and a custom
 qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.

 The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
 (2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
 kernels, but one system has i386 userland.

 With this setup once in a while (maybe every other week) one VM would
 have a sudden clock jump, 6-12 hours into the future. No kernel messages
 or other log entries than applications complaining about the clock jump
 after the fact. Other VMs were unaffected.

 Yesterday I did an upgrade to Debian Squeeze. This involved a new
 qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
 kernels from 2.6.33 to 2.6.33.4.

 First of all, after the reboot the host clock was totally unreliable. I
 had a constant skew of up to five seconds per minute in the host clock,
 which of course affected the VMs as well.  This problem went away when I
 changed from tsc into hpet on the host. The host does CPU frequency
 scaling which is, as far as I know, known to affect TSC stability. I
 think I remember messages about tsc being unstable in earlier boots,
 maybe the detection did just not work this time.

I'd be very interested in hearing more about the host side issue. So
this happened with the same kernel that you were using before, with no
trouble?

Could you also send dmesg output from this boot? And if you can find
any older dmesg logs to compare with, send those too?

thanks
-john


[PATCH 0/1] ceph/rbd block driver for qemu-kvm (v2)

2010-05-27 Thread Christian Brunner
Hi,

Based on the review notes Blue Swirl sent us after my last mail, Yehuda
cleaned up the header files. The patch is much smaller now and I hope
that you accept it for inclusion.

To build it, you will need the testing (or unstable) git head of ceph 
now. The required header files will be part of the next release of 
ceph (0.20.3).

In case you didn't read my last posting, here is the short description
again:

This patch is a block driver for the distributed file system Ceph
(http://ceph.newdream.net/). Ceph was included in the Linux v2.6.34
kernel. However, this driver uses librados (which is part of the Ceph
server) for direct access to the Ceph object store and is running entirely
in userspace. Therefore it is called rbd - rados block device.

The basic idea is to stripe a VM block device over (by default) 4MB objects
stored in the Ceph distributed object store.  This is very similar to what
the sheepdog project is doing, but uses the ceph server as a storage backend.
If you don't plan on using the entire ceph filesystem you may leave out the
metadata service of ceph.
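The striping described above amounts to simple offset arithmetic; a sketch with a hypothetical helper (not a function from the posted driver):

```c
#include <stdint.h>

/* Hypothetical helper, not from the posted driver: map a linear byte
 * offset in the block device to (object index, offset within object)
 * for a given object size (4MB by default, per the description above). */
#define RBD_DEFAULT_OBJ_SIZE (4ULL * 1024 * 1024)

static void rbd_map_offset(uint64_t offset, uint64_t obj_size,
                           uint64_t *obj_index, uint64_t *obj_offset)
{
    *obj_index  = offset / obj_size;
    *obj_offset = offset % obj_size;
}
```

So a write at byte offset 9MB lands 1MB into the third 4MB object; each object is then stored and replicated independently by the rados object store.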

Yehuda Sadeh helped me with the implementation and put some additional
usage information on the Ceph-Wiki (http://ceph.newdream.net/wiki/Kvm-rbd).
He has also written a Linux kernel driver to make an rbd image accessible as
a block device.

Regards,
Christian

---
 Makefile  |3 +
 Makefile.objs |1 +
 block/rbd.c   |  584 +
 block/rbd_types.h |   52 +
 configure |   27 +++
 5 files changed, 667 insertions(+), 0 deletions(-)
 create mode 100644 block/rbd.c
 create mode 100644 block/rbd_types.h


Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Zachary Amsden

On 05/27/2010 06:02 AM, Glauber Costa wrote:
 On Thu, May 27, 2010 at 11:15:43AM +0300, Avi Kivity wrote:
  On 05/26/2010 09:04 PM, Glauber Costa wrote:
   This patch adds a file that documents the usage of KVM-specific
   MSRs.
  
  Looks good.  A few comments:
  
  +
  +Custom MSR list
  +
  +
  +The current supported Custom MSR list is:
  +
  +MSR_KVM_WALL_CLOCK:  0x11
  +
  +	data: physical address of a memory area.
  
  Which must be in guest RAM (i.e., don't point it somewhere random
  and expect the hypervisor to allocate it for you).
  
  Must be aligned to 4 bytes (we don't enforce it though).
 
 I don't see the reason for it.
 
 If this is a requirement, our own implementation
 is failing to meet it.

It's so the atomic write actually is atomic.  Stating a 4-byte
alignment requirement prevents the wall clock from crossing a page
boundary.


Zach



Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Glauber Costa
On Thu, May 27, 2010 at 10:13:12AM -1000, Zachary Amsden wrote:
 On 05/27/2010 06:02 AM, Glauber Costa wrote:
 On Thu, May 27, 2010 at 11:15:43AM +0300, Avi Kivity wrote:
 On 05/26/2010 09:04 PM, Glauber Costa wrote:
 This patch adds a file that documents the usage of KVM-specific
 MSRs.
 
 Looks good.  A few comments:
 
 +
 +Custom MSR list
 +
 +
 +The current supported Custom MSR list is:
 +
 +MSR_KVM_WALL_CLOCK:  0x11
 +
 +  data: physical address of a memory area.
 Which must be in guest RAM (i.e., don't point it somewhere random
 and expect the hypervisor to allocate it for you).
 
 Must be aligned to 4 bytes (we don't enforce it though).
 I don't see the reason for it.
 
 If this is a requirement, our own implementation
 is failing to meet it.
 
 It's so the atomic write actually is atomic. 
Which atomic write? This is the wallclock, we do no atomic writes for
querying it. Not to confuse with system time (the other msr).

 Stating a 4 -byte
 alignment requirement prevents the wall clock from crossing a page
 boundary.

Yes, but why require it?

reading the wallclock is not a hot path for anybody, is usually done
just once, and crossing a page boundary here does not pose any correctness
issue.


Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Zachary Amsden

On 05/27/2010 10:36 AM, Glauber Costa wrote:
 On Thu, May 27, 2010 at 10:13:12AM -1000, Zachary Amsden wrote:
  On 05/27/2010 06:02 AM, Glauber Costa wrote:
   On Thu, May 27, 2010 at 11:15:43AM +0300, Avi Kivity wrote:
    On 05/26/2010 09:04 PM, Glauber Costa wrote:
     This patch adds a file that documents the usage of KVM-specific
     MSRs.
    
    Looks good.  A few comments:
    
    +
    +Custom MSR list
    +
    +
    +The current supported Custom MSR list is:
    +
    +MSR_KVM_WALL_CLOCK:  0x11
    +
    +	data: physical address of a memory area.
    
    Which must be in guest RAM (i.e., don't point it somewhere random
    and expect the hypervisor to allocate it for you).
    
    Must be aligned to 4 bytes (we don't enforce it though).
   
   I don't see the reason for it.
   
   If this is a requirement, our own implementation
   is failing to meet it.
  
  It's so the atomic write actually is atomic.
 
 Which atomic write? This is the wallclock, we do no atomic writes for
 querying it. Not to confuse with system time (the other msr).
 
  Stating a 4-byte
  alignment requirement prevents the wall clock from crossing a page
  boundary.
 
 Yes, but why require it?
 
 reading the wallclock is not a hot path for anybody, is usually done
 just once, and crossing a page boundary here does not pose any correctness
 issue.

Little-endian non-atomic page crossing writes will write the small part
of the wallclock first, so another CPU may observe the following
wallclock sequence:

0x01ff .. 0x0100 .. 0x0200

Big-endian writes also have similar failure:

0x01ff .. 0x02ff .. 0x0200

This won't happen if there is a single instruction write of the wall
clock word.
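The intermediate values in Zachary's example can be reproduced by composing the two halves explicitly. A user-space sketch of the torn-write window, using the byte halves of a 16-bit value for illustration:

```c
#include <stdint.h>

/* Sketch of the torn-write window described above: between the two stores
 * of a non-atomic update, an observer sees one old half and one new half.
 * Byte halves of a 16-bit value are used here purely for illustration. */
static uint16_t torn_le(uint16_t old_val, uint16_t new_val)
{
    /* little-endian: low byte stored first, high byte still old */
    return (uint16_t)((old_val & 0xff00u) | (new_val & 0x00ffu));
}

static uint16_t torn_be(uint16_t old_val, uint16_t new_val)
{
    /* big-endian: high byte stored first, low byte still old */
    return (uint16_t)((new_val & 0xff00u) | (old_val & 0x00ffu));
}
```

For the update 0x01ff -> 0x0200 this yields the intermediate 0x0100 (little-endian) or 0x02ff (big-endian) from the mail, which is exactly what a 4-byte-aligned single-word store rules out.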



Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello, Michael.

On 05/27/2010 07:32 PM, Michael S. Tsirkin wrote:
 Well, this is why I proposed adding a new API for creating
 workqueue within workqueue.c, rather than exposing the task
 and attaching it to cgroups in our driver: so that workqueue
 maintainers can fix the implementation if it ever changes.
 
 And after all, it's an internal API, we can always change
 it later if we need.
...
 Well, yes but we are using APIs like flush_work etc. These are very
 handy.  It seems much easier than rolling our own queue on top of kthread.

The thing is that this kind of one-off usage becomes problemetic when
you're trying to change the implementation detail.  All current
workqueue users don't care which thread they run on and they shouldn't
as each work owns the context only for the duration the work is
executing.  If this sort of fundamental guidelines are followed, the
implementation can be improved in pretty much transparent way but when
you start depending on specific implementation details, things become
messy pretty quickly.

If this type of usage were more common, adding a proper way to account
work usage according to cgroups would make sense, but that's not the
case here, and I removed the only exception case recently while trying
to implement cmwq.  So, if this is added, it would be the only one
which makes such extra assumptions in the whole kernel.  One way
or the other, workqueue needs to be improved and I don't really think
adding the single exception at this point is a good idea.

The thing I realized after stop_machine conversion was that there was
no reason to use workqueue there at all.  There already are more than
enough not-too-difficult synchronization constructs and if you're
using a thread for dedicated purposes, code complexity isn't that
different either way.  Plus, it would also be clearer that dedicated
threads are required there for what reason.  So, I strongly suggest
using a kthread.  If there are issues which are noticeably difficult
to solve with kthread, we can definitely talk about that and think
about things again.

Thank you.

-- 
tejun


Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 08:32 AM, Bernhard Schmidt wrote:

Alexander Graf ag...@suse.de wrote:

Hi,


Do you have ntpd running inside the guest? I have a bug report lying
around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
https://bugzilla.novell.com/show_bug.cgi?id=582260
 

I want to chime in here, I have a very similar problem, but not with
ntpd in the guest.

The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
was a Debian Lenny with a custom 2.6.33 host kernel and a custom
qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.

The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
(2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
kernels, but one system has i386 userland.

With this setup once in a while (maybe every other week) one VM would
have a sudden clock jump, 6-12 hours into the future. No kernel messages
or other log entries than applications complaining about the clock jump
after the fact. Other VMs were unaffected.

Yesterday I did an upgrade to Debian Squeeze. This involved a new
qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
kernels from 2.6.33 to 2.6.33.4.

First of all, after the reboot the host clock was totally unreliable. I
had a constant skew of up to five seconds per minute in the host clock,
which of course affected the VMs as well.  This problem went away when I
changed the host clocksource from tsc to hpet. The host does CPU frequency
scaling which is, as far as I know, known to affect TSC stability. I
think I remember messages about tsc being unstable in earlier boots,
maybe the detection did just not work this time.

Worse, the clock jump issues in the guest appeared much more often than
before. The higher loaded VMs did not survive ten minutes without
jumping several hours ahead.

Situation has stabilized after setting clocksource hpet in the guest
immediately after boot. So it seems kvm-clock has some issues here.

I've seen a preliminary patch floating around on the ML by Zachary
Amsden, but I haven't tried it yet. It talks of backward warps, but so
far I've only seen forward warps (the VM time suddenly jumps into the
future), so it might be unrelated.


I have an AMD Turion TL-52 machine with unreliable TSC.  It varies with 
CPU frequency, which is okay, we can compensate for that, but worse, it 
has broken clocking when in C1E idle.  Apparently, it divides down the 
clock input to an idle core, so it only runs at 1/16 or whatever of the 
rate, and adds a multiplier to the TSC increment, so it scales by 16 
instead of by 1 (whatever the actual numbers are I don't know, but this 
illustrates the point).  When it wakes up to service a cache probe from 
another core, it now runs with full clock rate ... and still uses the 
multiplier for the TSC increment.


The effect is that idle CPUs have TSC which may increase faster than 
running CPUs.  Given time, this delta can add to a very large number (in 
theory, it's a random walk, but it can go very far off).  If a VM is 
running on this CPU and happens to match the idle pattern without 
switching CPUs, time can effectively run accelerated on that VM, and 
very rapidly things are going to get confused.


Newer kernels should detect the host clock being unreliable quite 
quickly; my F13 machine detects it right away at boot.


I have server side fixes for this kvm-clock which seem to give me a 
stable clock on this machine, but for true SMP stability, you will need 
Glauber's guest side changes to kvmclock as well.  It is impossible to 
guarantee strictly monotonic clocksource across multiple CPUs when 
frequency is dynamically changing (and also because of the C1E idle 
problems).


There is one remaining problem to fix, the reset of TSC on reboot in SMP 
will destabilize the TSCs again, but now I've actually got VMs running 
again (different bug), that shouldn't be long.


Zach


Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt

On 27.05.2010 21:08, john stultz wrote:

Hi John,


I'd be very interested in hearing more about the host side issue. So
this happened with the same kernel that you were using before, with no
trouble?


Correct.


Could you also send dmesg output from this boot? And if you can find
any older dmesg logs to compare with, send those too?


See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
on the same kernel binary the initrd changed greatly when upgrading, so 
ordering/timing between those two is off.


Note that the dmesg output is captured right after boot. I think I 
remember seeing a TSC unstable message pretty soon after boot, but I 
might be mixing it up with my other AMD-based KVM server. I don't hold 
normal (non-boot) logs that long, so I can't tell for sure.


If you need any more info feel free to contact me.

Bernhard


Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using kvm-clock.


Anyway, can you send me both patches? I'd like to try it, but I have 
completely lost track where the up-to-date patches are.


Bernhard


Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using 
kvm-clock.


Anyway, can you send me both patches? I'd like to try it, but I have 
completely lost track where the up-to-date patches are.


There's more than two, there's quite a bit, I'll send it to the list soon.

Zach


Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using 
kvm-clock.


It doesn't matter what the host uses (although the host on TSC with 
unstable TSC can make things worse), tsc and kvmclock sources in the 
guest will be unstable regardless.



Re: [GIT PULL net-2.6] vhost-net: error handling fixes

2010-05-27 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 27 May 2010 14:57:14 +0300

 David,
 The following tree includes fixes dealing with error handling
 in vhost-net. It is on top of net-2.6.
 Please merge it for 2.6.35.
 Thanks!
 
 The following changes since commit 8a74ad60a546b13bd1096b2a61a7a5c6fd9ae17c:
 
   net: fix lock_sock_bh/unlock_sock_bh (2010-05-27 00:30:53 -0700)
 
 are available in the git repository at:
   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Pulled, thanks Michael.


Re: Clock jumps

2010-05-27 Thread john stultz
On Thu, 2010-05-27 at 23:48 +0200, Bernhard Schmidt wrote:
 On 27.05.2010 21:08, john stultz wrote:
  I'd be very interested in hearing more about the host side issue. So
  this happened with the same kernel that you were using before, with no
  trouble?
 
 Correct.
 
  Could you also send dmesg output from this boot? And if you can find
  any older dmesg logs to compare with, send those too?
 
 See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
 http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
 on the same kernel binary the initrd changed greatly when upgrading, so 
 ordering/timing between those two is off.
 
 Note that the dmesg output is captured right after boot. I think I 
 remember seeing a TSC unstable message pretty soon after boot, but I 
 might be mixing it up with my other AMD-based KVM server. I don't hold 
 normal (non-boot) logs that long, so I can't tell for sure.

Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.0 -0700
+++ dmesg-squeeze   2010-05-27 16:46:14.0 -0700
@@ -132,8 +132,8 @@
 console [ttyS1] enabled
 hpet clockevent registered
 Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
 Security Framework initialized
 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
 Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
 CPU0: Intel(R) Xeon(R) CPU3075  @ 2.66GHz stepping 0b
 Booting Node   0, Processors  #1
 Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 PCI: MMCONFIG for domain  [bus 00-ff] at [mem 0xe000-0xefff] (base 0xe000)

So you can see in the above the during the second boot the TSC
calibration was badly mis-calculated. This was the cause of the skew.

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those VMs, this machine isn't rebooted
frequently. So could you look at the logs to see how much the "Detected
xxxx.yyy MHz processor." line varies across a few other boots (if
they still exist)?

Ingo/Thomas: Any thoughts, should we be considering dropping the
quick_pit_calibrate() code and always do the slower route?

thanks
-john



Re: [PATCH] Add Documentation/kvm/msr.txt

2010-05-27 Thread Glauber Costa
On Thu, May 27, 2010 at 11:02:35AM -1000, Zachary Amsden wrote:
 On 05/27/2010 10:36 AM, Glauber Costa wrote:
 On Thu, May 27, 2010 at 10:13:12AM -1000, Zachary Amsden wrote:
 On 05/27/2010 06:02 AM, Glauber Costa wrote:
 On Thu, May 27, 2010 at 11:15:43AM +0300, Avi Kivity wrote:
 On 05/26/2010 09:04 PM, Glauber Costa wrote:
 This patch adds a file that documents the usage of KVM-specific
 MSRs.
 
 Looks good.  A few comments:
 
 +
 +Custom MSR list
 +
 +
 +The current supported Custom MSR list is:
 +
 +MSR_KVM_WALL_CLOCK:  0x11
 +
 +data: physical address of a memory area.
 Which must be in guest RAM (i.e., don't point it somewhere random
 and expect the hypervisor to allocate it for you).
 
 Must be aligned to 4 bytes (we don't enforce it though).
 I don't see the reason for it.
 
 If this is a requirement, our own implementation
 is failing to meet it.
 It's so the atomic write actually is atomic.
 Which atomic write? This is the wallclock, we do no atomic writes for
 querying it. Not to confuse with system time (the other msr).
 
 Stating a 4-byte
 alignment requirement prevents the wall clock from crossing a page
 boundary.
 Yes, but why require it?
 
 reading the wallclock is not a hot path for anybody; it is usually done
 just once, and crossing a page boundary here does not pose any
 correctness issue.
 
 Little-endian non-atomic page crossing writes will write the small
 part of the wallclock first, so another CPU may observe the
 following wallclock sequence:
 
 0x01ff .. 0x0100 .. 0x0200
 
 Big-endian writes also have similar failure:
 
 0x01ff .. 0x02ff .. 0x0200
 
 This won't happen if there is a single instruction write of the wall
 clock word.

We already specify that users can only trust the value of the wallclock
after we have an even version field.
When we start the update, and during the time of all writes to it,
it is odd, and thus, invalid.
The ABI guarantees to the guest that we'll only bump version
after we're done updating.

So why bother?


Re: [PATCH] kvm mmu: optimizations when tdp is in use

2010-05-27 Thread Gui Jianfeng
Marcelo Tosatti wrote:
 On Thu, May 27, 2010 at 04:06:34PM +0800, Gui Jianfeng wrote:
 In case of using tdp, checking write protected page isn't needed and
 quadrant also no need to be calculated.

 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
 ---
  arch/x86/kvm/mmu.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 0bb9f17..ce4bbd3 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
  @@ -495,10 +495,13 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
   	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
   		kvm_x86_ops->get_lpage_level() : host_level;
   
  +	if (tdp_enabled)
  +		goto done;
 +
 
 This is wrong. write_count is initialized for alignment purposes, not 
 only write protected pages. See __kvm_set_memory_region in
 virt/kvm/kvm_main.c.

thanks, avi also pointed this out.

Gui

 
   	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
   		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
   			break;
  -
  +done:
   	return level - 1;
   }
  
  @@ -1346,7 +1349,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
   	if (role.direct)
   		role.cr4_pae = 0;
   	role.access = access;
  -	if (vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
  +	if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
   		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
   		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
   		role.quadrant = quadrant;
 -- 
 1.6.5.2
 
 

-- 
Regards
Gui Jianfeng

