Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread padmanabh ratnakar
On Tue, Jun 7, 2011 at 4:04 AM, Chris Wright chr...@sous-sol.org wrote:
 * Alex Williamson (alex.william...@redhat.com) wrote:
 On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote:
  Hi,
          I am using Linux kernel 2.6.39 on an IBM x3650 M3 system.
  I used the following boot options:
  intel_iommu=on iommu=pt
 
  I was repeatedly loading and unloading my NIC driver (be2net) with num_vfs=7.
 
  After some iterations I get the following DMAR errors:
  Jun  4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason 2d on CPU 0.
  Jun  4 03:50:20 rhel6 kernel: Do you have a strange power saving mode enabled?
  Jun  4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
  Jun  4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
  Jun  4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2] fault addr 78077000
  Jun  4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in context entry is clear
 
  I was trying to debug this, but I don't understand the iommu code well.
  The physical address belongs to the printed PCI function, so there should
  not have been an error.
 
  I do not see the pci_dev (pdev) of the VFs being removed from the
  si_domain->devices list (intel-iommu.c) when the driver is unloaded and
  calls pci_disable_sriov(), which frees the VF pdevs.
  It looks like the issue happens when a freed pdev is allocated again:
  because it is already in the list, the required initializations don't happen.
 
  I don't know if my understanding is correct. Can anyone point me to
  what the issue may be?

 Yes, that's correct.  The (now replaced) check identity_mapping()
 will succeed when the pci_dev is recycled (it's freed, but never
 removed from the list; this is an issue with passthrough mode and device
 creation/destruction).  This false match happens w/ a brand new pci_dev
 which still has the default 32-bit DMA mask, so it is removed from the pt
 domain.  During removal, the domain_remove_one_dev_info() test that matches
 only on bus/devfn (now also segment) will match despite the fact that
 info->pdev != pdev->dev.archdata.iommu.  Then...Oops
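
 The stale-entry problem described above can be sketched in userspace C.
 This is only a model: the fake_* names and structures are made up for
 illustration and are not the intel-iommu code, and it assumes the membership
 check compares device pointers. If an entry is not unlinked when its device
 is freed, a new device that reuses the same address is reported as already
 tracked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct fake_pdev {
	unsigned dma_mask_bits;
};

struct fake_tracked {
	const struct fake_pdev *pdev;	/* device this entry was created for */
	struct fake_tracked *next;
};

/* Pointer-identity lookup: a recycled address matches a stale entry. */
static bool fake_is_tracked(const struct fake_tracked *head,
			    const struct fake_pdev *pdev)
{
	for (; head; head = head->next)
		if (head->pdev == pdev)
			return true;
	return false;
}

/* Unlink the entry for pdev; meant to run before the device is freed. */
static void fake_untrack(struct fake_tracked **head,
			 const struct fake_pdev *pdev)
{
	for (struct fake_tracked **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->pdev == pdev) {
			*pp = (*pp)->next;
			return;
		}
	}
}
```

 Calling something like fake_untrack() on the device-removal path is the
 step that appears to be missing in passthrough mode.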

 Typically devices are removed from the domain via
 drivers/pci/intel-iommu.c:device_notifier(), which is called as the
 device is unbound from the driver.  However, this seems to get skipped
 when running in passthrough mode, so I'm not sure where that's supposed
 to occur.  Does it happen w/o passthrough?

I had tried without passthrough on the RHEL 6.1 GA kernel and was seeing
hangs and panics. I will check whether non-passthrough mode works on the
latest kernel.

 If you blacklist the driver then a create/delete may do similar (haven't
 tested that idea).

 Also note that some
 intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update
 and see if anything is better there.  Thanks,

 The change in identity_mapping() means we won't demote to 32-bit DMA
 (drop out of pt domain), so I don't think we'll see the same issue.

For testing I had made a hack in the 2.6.39 kernel that prevents
demoting to a 32-bit DMA mask and thereby prevents
domain_remove_one_dev_info() from being called for the
specific VF device I was using, and it worked.
So, as you said, I may not hit the issue in the latest kernel. I will try that.

 thanks,
 -chris


Thanks for the response and suggestions.
Padmanabh
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Fix build warnings

2011-06-07 Thread Borislav Petkov
On Tue, May 31, 2011 at 12:26:55PM +0200, Ingo Molnar wrote:
 
 * Avi Kivity a...@redhat.com wrote:
 
  On 05/31/2011 10:38 AM, Ingo Molnar wrote:
  * Borislav Petkov b...@alien8.de  wrote:
  
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -121,7 +121,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 				    gva_t addr, u32 access)
 {
 	pt_element_t pte;
-	pt_element_t __user *ptep_user;
+	pt_element_t __user *uninitialized_var(ptep_user);
  
  Note that doing this is actually actively dangerous for two reasons.
  
  
  
  snip lots of good advice
  
   Please fix it instead.
  
  s/instead/in addition/; while all those changes are good, they are 
  much too large for 3.0.  Let's push the simple fix for 3.0 and 
  queue the bigger refactoring to 3.1.
 
 Yeah, that's probably wise, this is a tricky function.

So, any progress on this front? Warning is still there in -rc2.

Thanks.

-- 
Regards/Gruss,
Boris.


Re: [PATCH] kvm: Fix build warnings

2011-06-07 Thread Avi Kivity

On 06/07/2011 10:28 AM, Borislav Petkov wrote:

So, any progress on this front? Warning is still there in -rc2.



Thanks for the reminder, applied and queued.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [SeaBIOS] Graphics card pass-through working with two pass pci-initialization

2011-06-07 Thread Jan Kiszka
On 2011-06-06 08:30, Gerd Hoffmann wrote:
Hi,
 
 As Jan points out though, is a dynamic PCI region really needed?
 Those that need a large PCI region are also likely to need a large
 amount of memory.  Maybe the space for PCI should just be increased.
 
 Just changing it will not work as it will break live migration.

Changing logic in the BIOS won't break migration (the active BIOS is
included in the migration of RAM, current mappings are part of the
device states). Changing the 4G mapping in qemu's hw/pc_piix.c would
break it and needs to be coupled to the machine version.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it

2011-06-07 Thread Avi Kivity

On 06/07/2011 01:04 AM, Jan Kiszka wrote:

On 2011-06-06 23:48, Alex Williamson wrote:
  On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
  From: Jan Kiszka jan.kis...@siemens.com

  At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
  reset on an assigned device and corrupt its config space. Prevent
  this by checking for a host kernel with the required support, tagged by
  the to-be-introduced KVM_CAP_DEVICE_RESET.

  Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
  we don't have an option to do that for .38 since stable is done there,
  but there are also some intel-iommu breakages that won't make stable for
  that release.  It seems like the userspace invoked reset resolves known,
  demonstrable issues of devices continuing to DMA into guest memory while
  ed78661f is mostly a theoretical change.

Easier would be this patch. But I don't mind reverting the problematic
commit in 39, whatever is preferred. We should just resolve the issue
finally.


Kernel problems should be solved in the kernel (with exceptions of 
course, but don't see the need here).



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it

2011-06-07 Thread Jan Kiszka
On 2011-06-07 10:06, Avi Kivity wrote:
 On 06/07/2011 01:04 AM, Jan Kiszka wrote:
 On 2011-06-06 23:48, Alex Williamson wrote:
   On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
   From: Jan Kiszka jan.kis...@siemens.com
 
   At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
   reset on an assigned device and corrupt its config space. Prevent
   this by checking for a host kernel with the required support, tagged by
   the to-be-introduced KVM_CAP_DEVICE_RESET.
 
   Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I guess
   we don't have an option to do that for .38 since stable is done there,
   but there are also some intel-iommu breakages that won't make stable for
   that release.  It seems like the userspace invoked reset resolves known,
   demonstrable issues of devices continuing to DMA into guest memory while
   ed78661f is mostly a theoretical change.

 Easier would be this patch. But I don't mind reverting the problematic
 commit in 39, whatever is preferred. We should just resolve the issue
 finally.
 
 Kernel problems should be solved in the kernel (with exceptions of
 course, but don't see the need here).

Then please file a revert for stable ASAP.

Jan





Re: KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS

2011-06-07 Thread Avi Kivity

On 06/06/2011 08:27 PM, Marcelo Tosatti wrote:

Only decache guest CR3 value if vcpu->arch.cr3 is stale.
Fixes loadvm with live guest.


@@ -2049,7 +2049,9 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 					unsigned long cr0,
 					struct kvm_vcpu *vcpu)
 {
-	vmx_decache_cr3(vcpu);
+
+	if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+		vmx_decache_cr3(vcpu);
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,


Applied and queued, but I think there is something rotten here.  How 
does arch.cr3 get into GUEST_CR3 after KVM_SET_SREGS?  arch.cr3 is 
supposed to be a write-through cache - it only has a bit in regs_avail, 
not regs_dirty.
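
For readers following along, the guard added by the patch can be modelled
in plain C. This is a sketch under assumptions, not the VMX code: the fake_*
names are hypothetical, and hw_cr3 merely stands in for the value held by
hardware. Refreshing the cached copy unconditionally would discard a value
that was just written into the cache, which is what the availability check
avoids.

```c
#include <stdint.h>

enum { FAKE_REG_CR3 = 0 };	/* hypothetical register index */

struct fake_vcpu {
	uint64_t cr3;			/* cached copy */
	unsigned long regs_avail;	/* bit set: cached copy is valid */
	uint64_t hw_cr3;		/* stands in for the hardware-held value */
};

/* Refresh the cached copy from "hardware" and mark it valid. */
static void fake_decache_cr3(struct fake_vcpu *v)
{
	v->cr3 = v->hw_cr3;
	v->regs_avail |= 1ul << FAKE_REG_CR3;
}

/* Models a userspace register load that fills the cache directly. */
static void fake_set_cr3(struct fake_vcpu *v, uint64_t value)
{
	v->cr3 = value;
	v->regs_avail |= 1ul << FAKE_REG_CR3;
}

/* Only refresh when the cached copy is stale. */
static uint64_t fake_read_cr3(struct fake_vcpu *v)
{
	if (!(v->regs_avail & (1ul << FAKE_REG_CR3)))
		fake_decache_cr3(v);
	return v->cr3;
}
```

The model does not answer the question of how the cached value reaches
the hardware field; it only shows why the unconditional refresh loses it.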



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH] virtio-spec: Fix wrong bit number of device status

2011-06-07 Thread akong
From: Amos Kong ak...@redhat.com

qemu-kvm/hw/virtio_config.h:
 #define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
 #define VIRTIO_CONFIG_S_DRIVER  2
 #define VIRTIO_CONFIG_S_DRIVER_OK   4
 #define VIRTIO_CONFIG_S_FAILED  0x80

virtio-spec:
ACKNOWLEDGE(1) :
DRIVER(2)  :
DRIVER_OK(3)   :
FAILED(128):

The spec refers to bit numbers while the headers use absolute values,
so the two are not consistent.

It should be 'FAILED(8)'.
2^(8-1) = 128
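
The arithmetic above can be checked with a few lines of C. The helper name
is made up for this example; it assumes the one-based bit numbering used in
the list above (ACKNOWLEDGE is bit 1), so valid inputs are 1 through 8.

```c
#include <stdint.h>

/*
 * Convert a one-based bit number, as counted in the spec text quoted
 * above, to the absolute value used by the header. Valid for 1..8.
 */
static uint8_t status_value_from_bit(unsigned bit_number)
{
	return (uint8_t)(1u << (bit_number - 1));
}
```

With this numbering, bit 3 gives 4 (DRIVER_OK) and bit 8 gives 0x80
(FAILED). Note that 1 << 7 is also 128, so 7 names the same bit when
counting from zero.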

Signed-off-by: Amos Kong ak...@redhat.com
---
 virtio-spec.lyx |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 448af76..41b7657 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1552,7 +1552,7 @@ FAILED
 \begin_inset space ~
 \end_inset
 
-(128) Indicates that something went wrong in the guest, and it has given
+(7) Indicates that something went wrong in the guest, and it has given
  up on the device.
  This could be an internal error, or the driver didn't like the device for
  some reason, or even a fatal error during device operation.
-- 
1.7.1



[PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Xiao Guangrong
The idea of this patchset is from Avi:
| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)
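
As a rough illustration of the idea quoted above, the sketch below tags a
non-present entry with a marker bit and stores the gfn next to it, so a
later fault on the same page can be recognised without searching the slots.
The bit position, the layout and all fake_* names are assumptions made for
this example; they are not the encoding used by the patches.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PRESENT	(1ull << 0)
#define FAKE_MMIO_MARK	(1ull << 62)	/* hypothetical reserved bit */
#define FAKE_GFN_SHIFT	12

/* Build a non-present entry that remembers "this gfn is mmio". */
static uint64_t fake_make_mmio_spte(uint64_t gfn)
{
	return FAKE_MMIO_MARK | (gfn << FAKE_GFN_SHIFT);
}

/* An entry is a cached mmio miss if it is marked and not present. */
static bool fake_is_mmio_spte(uint64_t spte)
{
	return (spte & (FAKE_MMIO_MARK | FAKE_PRESENT)) == FAKE_MMIO_MARK;
}

/* Recover the gfn stored by fake_make_mmio_spte(). */
static uint64_t fake_mmio_spte_gfn(uint64_t spte)
{
	return (spte & ~FAKE_MMIO_MARK) >> FAKE_GFN_SHIFT;
}
```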

The aim of this patchset is to support fast mmio emulation. It reduces the
search for the mmio gfn in the memslots, which is very expensive since we need
to walk all slots for an mmio gfn. The other advantage is that we can reduce
guest page table walking for the soft mmu.

A lockless walk of the shadow page table is introduced in this patchset. It is
a lightweight way to check whether the page fault is a real mmio page fault or
something unexpected.

If shadow_notrap_nonpresent_pte is enabled (bypass_guest_pf=1), mmio page
faults and normal page faults are mixed (the reserved bit is set for all page
faults), which causes a small regression. If the box generates lots of mmio
accesses, for example a network server, it can disable
shadow_notrap_nonpresent_pte and enable mmio pf; after all, we can
enable/disable mmio pf at runtime.

The performance test result:

Netperf (TCP_RR):
===
ept is enabled:

  Before After
1st   709.58 734.60
2nd   715.40 723.75
3rd   713.45 724.22

ept=0 bypass_guest_pf=0:

  Before After
1st   706.10 709.63
2nd   709.38 715.80
3rd   695.90 710.70

Kernbench (do not redirect output to /dev/null)
==
ept is enabled:

  Before After
1st   2m34.749s  2m33.482s
2nd   2m34.651s  2m33.161s
3rd   2m34.543s  2m34.271s

ept=0 bypass_guest_pf=0:

  Before After
1st   4m43.467s  4m41.873s
2nd   4m45.225s  4m41.668s
3rd   4m47.029s  4m40.128s



[PATCH 01/15] KVM: MMU: fix walking shadow page table

2011-06-07 Thread Xiao Guangrong
Properly check the last mapping, and do not walk to the next level if the
last spte is met
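
A simplified model of the intended iterator behaviour, for illustration
only: the fake_* names and the rule used by fake_is_last() (leaf level, or
page-size bit set) are assumptions for this sketch and not the definition of
is_last_spte().

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_LARGE	(1ull << 7)	/* page-size bit */
#define FAKE_LEAF_LEVEL	1

struct fake_walk {
	int level;
	uint64_t spte;
};

/* Assumed rule: the lowest level, or a large mapping, ends the walk. */
static bool fake_is_last(uint64_t spte, int level)
{
	return level == FAKE_LEAF_LEVEL || (spte & FAKE_LARGE);
}

static bool fake_walk_okay(const struct fake_walk *w)
{
	return w->level >= FAKE_LEAF_LEVEL;
}

/* Stop by dropping the level to 0 instead of descending past a last spte. */
static void fake_walk_next(struct fake_walk *w)
{
	if (fake_is_last(w->spte, w->level)) {
		w->level = 0;
		return;
	}
	--w->level;
}
```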

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2d14434..cda666a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1517,10 +1517,6 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 	if (iterator->level < PT_PAGE_TABLE_LEVEL)
 		return false;
 
-	if (iterator->level == PT_PAGE_TABLE_LEVEL)
-		if (is_large_pte(*iterator->sptep))
-			return false;
-
 	iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
 	iterator->sptep	= ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
 	return true;
@@ -1528,6 +1524,11 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 
 static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 {
+	if (is_last_spte(*iterator->sptep, iterator->level)) {
+		iterator->level = 0;
+		return;
+	}
+
 	iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
 	--iterator->level;
 }
-- 
1.7.4.4



[PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

2011-06-07 Thread Xiao Guangrong
Set slot bitmap only if the spte is present

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cda666a..125f78d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	struct kvm_mmu_page *sp;
 	unsigned long *rmapp;
 
-	if (!is_rmap_spte(*spte))
-		return 0;
-
 	sp = page_header(__pa(spte));
 	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
 	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
@@ -2078,11 +2075,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!was_rmapped && is_large_pte(*sptep))
 		++vcpu->kvm->stat.lpages;
 
-	page_header_update_slot(vcpu->kvm, sptep, gfn);
-	if (!was_rmapped) {
-		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-			rmap_recycle(vcpu, sptep, gfn);
+	if (is_shadow_present_pte(*sptep)) {
+		page_header_update_slot(vcpu->kvm, sptep, gfn);
+		if (!was_rmapped) {
+			rmap_count = rmap_add(vcpu, sptep, gfn);
+			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+				rmap_recycle(vcpu, sptep, gfn);
+		}
 	}
 	kvm_release_pfn_clean(pfn);
 	if (speculative) {
-- 
1.7.4.4



[PATCH 03/15] KVM: x86: avoid unnecessarily guest page table walking

2011-06-07 Thread Xiao Guangrong
We already get the guest physical address, so use it to read guest data
directly to avoid walking guest page table again

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 694538a..8be9ff6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3930,8 +3930,7 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
 	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto mmio;
 
-	if (kvm_read_guest_virt(ctxt, addr, val, bytes, exception)
-				== X86EMUL_CONTINUE)
+	if (!kvm_read_guest(vcpu->kvm, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
 mmio:
-- 
1.7.4.4



[PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-07 Thread Xiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info. Later, we do
not need to walk the guest page table and can quickly tell that it is an mmio
fault while we emulate the mmio instruction
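
The caching scheme can be pictured with a small userspace model. This is a
hedged sketch: the struct, the fake_* helpers and the page-granular match
are assumptions for illustration and are not the helpers added by this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PAGE_MASK	(~0xfffull)

/* Hypothetical per-vcpu cache of the last mmio fault. */
struct fake_mmio_cache {
	uint64_t gva;		/* page-aligned faulting address */
	uint64_t gfn;
	unsigned access;
	bool valid;
};

/* Remember the translation seen on the page fault path. */
static void fake_cache_mmio(struct fake_mmio_cache *c, uint64_t gva,
			    uint64_t gfn, unsigned access)
{
	c->gva = gva & FAKE_PAGE_MASK;
	c->gfn = gfn;
	c->access = access;
	c->valid = true;
}

/* Drop the cached entry, for example when mappings may have changed. */
static void fake_clear_mmio(struct fake_mmio_cache *c)
{
	c->valid = false;
}

/* The emulator can skip the guest page table walk on a hit. */
static bool fake_match_mmio_gva(const struct fake_mmio_cache *c, uint64_t gva)
{
	return c->valid && c->gva == (gva & FAKE_PAGE_MASK);
}
```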

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |5 +++
 arch/x86/kvm/mmu.c  |   21 +--
 arch/x86/kvm/mmu.h  |   23 +
 arch/x86/kvm/paging_tmpl.h  |   21 ++-
 arch/x86/kvm/x86.c  |   52 ++
 arch/x86/kvm/x86.h  |   36 +++
 6 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d167039..326af42 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
 	u64 mcg_ctl;
 	u64 *mce_banks;
 
+	/* Cache MMIO info */
+	u64 mmio_gva;
+	unsigned access;
+	gfn_t mmio_gfn;
+
 	/* used for guest single stepping over the given code position */
 	unsigned long singlestep_rip;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 125f78d..415030e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -217,11 +217,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
-static bool is_write_protection(struct kvm_vcpu *vcpu)
-{
-	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
-}
-
 static int is_cpuid_PSE36(void)
 {
 	return 1;
@@ -243,11 +238,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_writable_pte(unsigned long pte)
-{
-	return pte & PT_WRITABLE_MASK;
-}
-
 static int is_dirty_gpte(unsigned long pte)
 {
 	return pte & PT_DIRTY_MASK;
@@ -2238,15 +2228,17 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
 	send_sig_info(SIGBUS, &info, tsk);
 }
 
-static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva,
+			       unsigned access, gfn_t gfn, pfn_t pfn)
 {
 	kvm_release_pfn_clean(pfn);
 	if (is_hwpoison_pfn(pfn)) {
-		kvm_send_hwpoison_signal(gfn_to_hva(kvm, gfn), current);
+		kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
 		return 0;
 	} else if (is_fault_pfn(pfn))
 		return -EFAULT;
 
+	vcpu_cache_mmio_info(vcpu, gva, gfn, access);
 	return 1;
 }
 
@@ -2328,7 +2320,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
 
 	/* mmio */
 	if (is_error_pfn(pfn))
-		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
+		return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn);
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -2555,6 +2547,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
+	vcpu_clear_mmio_info(vcpu, ~0ull);
 	trace_kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
 	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
@@ -2701,7 +2694,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 
 	/* mmio */
 	if (is_error_pfn(pfn))
-		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
+		return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn);
 	spin_lock(&vcpu->kvm->mmu_lock);
 	if (mmu_notifier_retry(vcpu, mmu_seq))
 		goto out_unlock;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7086ca8..05310b1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,4 +76,27 @@ static inline int is_present_gpte(unsigned long pte)
 	return pte & PT_PRESENT_MASK;
 }
 
+static inline int is_writable_pte(unsigned long pte)
+{
+	return pte & PT_WRITABLE_MASK;
+}
+
+static inline bool is_write_protection(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
+}
+
+static inline bool check_write_user_access(struct kvm_vcpu *vcpu,
+					   bool write_fault, bool user_fault,
+					   unsigned long pte)
+{
+	if (unlikely(write_fault && !is_writable_pte(pte)
+	      && (user_fault || is_write_protection(vcpu))))
+		return false;
+
+	if (unlikely(user_fault && !(pte & PT_USER_MASK)))
+		return false;
+
+	return true;
+}
 #endif
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c4dc01..b0c8184 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -201,11 +201,8 @@ walk:
break;
}
 
-   if (unlikely(write_fault  

[PATCH 05/15] KVM: MMU: optimize to handle dirty bit

2011-06-07 Thread Xiao Guangrong
If the dirty bit is not set, we can make the pte access read-only to avoid
handling the dirty bit everywhere
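
In other words, the dirty bit is folded into the access computation once.
A minimal sketch of that computation, with hypothetical FAKE_* names; the
bit positions follow the x86 page table entry layout (writable = bit 1,
user = bit 2, dirty = bit 6), and the NX handling of the real function is
left out.

```c
#include <stdint.h>

#define FAKE_PT_WRITABLE	(1u << 1)
#define FAKE_PT_USER		(1u << 2)
#define FAKE_PT_DIRTY		(1u << 6)
#define FAKE_ACC_EXEC		1u
#define FAKE_ACC_WRITE		FAKE_PT_WRITABLE

/* Derive access rights from a guest pte; a clean pte is mapped read-only. */
static unsigned fake_gpte_access(uint64_t gpte)
{
	unsigned access;

	access = (unsigned)(gpte & (FAKE_PT_WRITABLE | FAKE_PT_USER))
		 | FAKE_ACC_EXEC;
	if (!(gpte & FAKE_PT_DIRTY))
		access &= ~FAKE_ACC_WRITE;
	return access;
}
```

The first write to a clean page then faults, which gives the fault path
a chance to set the dirty bit before write access is granted.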

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   13 ++---
 arch/x86/kvm/paging_tmpl.h |   30 ++
 2 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 415030e..a10afd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1923,7 +1923,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int user_fault,
-		    int write_fault, int dirty, int level,
+		    int write_fault, int level,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
 		    bool can_unsync, bool host_writable)
 {
@@ -1938,8 +1938,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	spte = PT_PRESENT_MASK;
 	if (!speculative)
 		spte |= shadow_accessed_mask;
-	if (!dirty)
-		pte_access &= ~ACC_WRITE_MASK;
+
 	if (pte_access & ACC_EXEC_MASK)
 		spte |= shadow_x_mask;
 	else
@@ -2014,7 +2013,7 @@ done:
 
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 unsigned pt_access, unsigned pte_access,
-			 int user_fault, int write_fault, int dirty,
+			 int user_fault, int write_fault,
 			 int *ptwrite, int level, gfn_t gfn,
 			 pfn_t pfn, bool speculative,
 			 bool host_writable)
@@ -2050,7 +2049,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
-		      dirty, level, gfn, pfn, speculative, true,
+		      level, gfn, pfn, speculative, true,
 		      host_writable)) {
 		if (write_fault)
 			*ptwrite = 1;
@@ -2120,7 +2119,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 
 	for (i = 0; i < ret; i++, gfn++, start++)
 		mmu_set_spte(vcpu, start, ACC_ALL,
-			     access, 0, 0, 1, NULL,
+			     access, 0, 0, NULL,
 			     sp->role.level, gfn,
 			     page_to_pfn(pages[i]), true, true);
 
@@ -2184,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			unsigned pte_access = ACC_ALL;
 
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
-				     0, write, 1, &pt_write,
+				     0, write, &pt_write,
 				     level, gfn, pfn, prefault, map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b0c8184..67971da 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+	if (!is_dirty_gpte(gpte))
+		access &= ~ACC_WRITE_MASK;
+
 #if PTTYPE == 64
 	if (vcpu->arch.mmu.nx)
 		access &= ~(gpte >> PT64_NX_SHIFT);
@@ -378,7 +381,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
 	 */
 	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
-		     is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL,
+		     NULL, PT_PAGE_TABLE_LEVEL,
 		     gpte_to_gfn(gpte), pfn, true, true);
 }
 
@@ -429,7 +432,6 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		unsigned pte_access;
 		gfn_t gfn;
 		pfn_t pfn;
-		bool dirty;
 
 		if (spte == sptep)
 			continue;
@@ -444,16 +446,15 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 
 		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
 		gfn = gpte_to_gfn(gpte);
-		dirty = is_dirty_gpte(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
-				      (pte_access & ACC_WRITE_MASK) && dirty);
+				      pte_access & ACC_WRITE_MASK);
 		if (is_error_pfn(pfn)) {
 			kvm_release_pfn_clean(pfn);
 			break;
 		}
 
 		mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
-   

[PATCH 06/15] KVM: MMU: cleanup for FNAME(fetch)

2011-06-07 Thread Xiao Guangrong
gw->pte_access is the final access permission, since it is unified with
gw->pt_access when we walked the guest page table:

FNAME(walk_addr_generic):
	pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
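
The reason the extra masking is redundant is that AND is idempotent: once
pte_access has been computed as pt_access & x, masking it with pt_access
again cannot clear any further bits. A tiny check, where the helper name is
made up for this example:

```c
/* Models the unification done while walking the guest page table. */
static unsigned fake_walk_pte_access(unsigned pt_access, unsigned gpte_access)
{
	return pt_access & gpte_access;
}
```

For every pair of inputs, pt_access & fake_walk_pte_access(pt_access, g)
equals fake_walk_pte_access(pt_access, g).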

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 67971da..95da29e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -477,7 +477,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	if (!is_present_gpte(gw->ptes[gw->level - 1]))
 		return NULL;
 
-	direct_access = gw->pt_access & gw->pte_access;
+	direct_access = gw->pte_access;
 
 	top_level = vcpu->arch.mmu.root_level;
 	if (top_level == PT32E_ROOT_LEVEL)
@@ -535,7 +535,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		link_shadow_page(it.sptep, sp);
 	}
 
-	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
+	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
 		     user_fault, write_fault, ptwrite, it.level,
 		     gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
-- 
1.7.4.4



[PATCH 07/15] KVM: MMU: rename 'pt_write' to 'emulate'

2011-06-07 Thread Xiao Guangrong
If 'pt_write' is true, we need to emulate the fault. In a later patch, we
need to emulate the fault even though it is not a pt_write event, so rename
it to better fit the meaning

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   10 +-
 arch/x86/kvm/paging_tmpl.h |   16 
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a10afd4..05e604d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2014,7 +2014,7 @@ done:
 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault,
-			 int *ptwrite, int level, gfn_t gfn,
+			 int *emulate, int level, gfn_t gfn,
 			 pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
@@ -2052,7 +2052,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		      level, gfn, pfn, speculative, true,
 		      host_writable)) {
 		if (write_fault)
-			*ptwrite = 1;
+			*emulate = 1;
 		kvm_mmu_flush_tlb(vcpu);
 	}
 
@@ -2175,7 +2175,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
-	int pt_write = 0;
+	int emulate = 0;
 	gfn_t pseudo_gfn;
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
@@ -2183,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			unsigned pte_access = ACC_ALL;
 
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
-				     0, write, &pt_write,
+				     0, write, &emulate,
 				     level, gfn, pfn, prefault, map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
@@ -2211,7 +2211,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 				   | shadow_accessed_mask);
 		}
 	}
-	return pt_write;
+	return emulate;
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 95da29e..8353b69 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -465,7 +465,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
 			 int user_fault, int write_fault, int hlevel,
-			 int *ptwrite, pfn_t pfn, bool map_writable,
+			 int *emulate, pfn_t pfn, bool map_writable,
 			 bool prefault)
 {
 	unsigned access = gw->pt_access;
@@ -536,7 +536,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}
 
 	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
-		     user_fault, write_fault, ptwrite, it.level,
+		     user_fault, write_fault, emulate, it.level,
 		     gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
 
@@ -570,7 +570,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	int user_fault = error_code & PFERR_USER_MASK;
 	struct guest_walker walker;
 	u64 *sptep;
-	int write_pt = 0;
+	int emulate = 0;
 	int r;
 	pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
@@ -631,19 +631,19 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	if (!force_pt_level)
 		transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
 	sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
-			     level, &write_pt, pfn, map_writable, prefault);
+			     level, &emulate, pfn, map_writable, prefault);
 	(void)sptep;
-	pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
-		 sptep, *sptep, write_pt);
+	pgprintk("%s: shadow pte %p %llx emulate %d\n", __func__,
+		 sptep, *sptep, emulate);
 
-	if (!write_pt)
+	if (!emulate)
 		vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
 
 	++vcpu->stat.pf_fixed;
 	trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
-	return write_pt;
+	return emulate;
 
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
-- 
1.7.4.4


[PATCH 08/15] KVM: MMU: count used shadow pages on preparing path

2011-06-07 Thread Xiao Guangrong
Move counting used shadow pages from committing path to preparing path to
reduce tlb flush on some paths
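
A toy model of why the accounting has to move: the shrink loops test the
used-page count, so if that count only drops at commit time, the commit
(and its flush) must stay inside the loop. Once the count drops at prepare
time, a single commit after the loop is enough. All fake_* names are
hypothetical, and the model simply assumes one flush per non-empty commit.

```c
/* Hypothetical model: counters only, no real page tables. */
struct fake_mmu {
	int used_pages;
	int pending;	/* pages prepared but not yet committed */
	int flushes;	/* number of (modelled) tlb flushes */
};

/* Prepare path: account for the page immediately. */
static void fake_prepare_zap(struct fake_mmu *m)
{
	m->used_pages--;
	m->pending++;
}

/* Commit path: one flush covers everything prepared so far. */
static void fake_commit_zap(struct fake_mmu *m)
{
	if (!m->pending)
		return;
	m->flushes++;
	m->pending = 0;
}

/* Old shape: commit inside the loop, one flush per page. */
static void fake_shrink_per_page(struct fake_mmu *m, int goal)
{
	while (m->used_pages > goal) {
		fake_prepare_zap(m);
		fake_commit_zap(m);
	}
}

/* New shape: the loop terminates on its own, commit once afterwards. */
static void fake_shrink_batched(struct fake_mmu *m, int goal)
{
	while (m->used_pages > goal)
		fake_prepare_zap(m);
	fake_commit_zap(m);
}
```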

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 05e604d..43e7ca1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1039,7 +1039,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm 
*kvm, int nr)
percpu_counter_add(kvm_total_used_mmu_pages, nr);
 }
 
-static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp->spt));
hlist_del(&sp->hash_link);
@@ -1048,7 +1048,6 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
if (!sp->role.direct)
free_page((unsigned long)sp->gfns);
kmem_cache_free(mmu_page_header_cache, sp);
-   kvm_mod_used_mmu_pages(kvm, -1);
 }
 
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
@@ -1655,6 +1654,7 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, 
struct kvm_mmu_page *sp,
/* Count self */
ret++;
list_move(&sp->link, invalid_list);
+   kvm_mod_used_mmu_pages(kvm, -1);
} else {
list_move(&sp->link, &kvm->arch.active_mmu_pages);
kvm_reload_remote_mmus(kvm);
@@ -1678,7 +1678,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
do {
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
WARN_ON(!sp->role.invalid || sp->root_count);
-   kvm_mmu_free_page(kvm, sp);
+   kvm_mmu_free_page(sp);
} while (!list_empty(invalid_list));
 
 }
@@ -1704,8 +1704,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned 
int goal_nr_mmu_pages)
page = container_of(kvm->arch.active_mmu_pages.prev,
struct kvm_mmu_page, link);
kvm_mmu_prepare_zap_page(kvm, page, &invalid_list);
-   kvm_mmu_commit_zap_page(kvm, &invalid_list);
}
+   kvm_mmu_commit_zap_page(kvm, &invalid_list);
goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
}
 
@@ -3290,9 +3290,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
  struct kvm_mmu_page, link);
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
-   kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
++vcpu->kvm->stat.mmu_recycled;
}
+   kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 }
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
-- 
1.7.4.4



[PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

2011-06-07 Thread Xiao Guangrong
Split kvm_mmu_free_page into kvm_mmu_free_lock_parts and
kvm_mmu_free_unlock_parts.

The first frees the parts that must be freed under the mmu lock; the second
frees the parts that can be freed outside of the mmu lock.

This split is used by a later patch.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 43e7ca1..9f3a746 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1039,17 +1039,27 @@ static inline void kvm_mod_used_mmu_pages(struct kvm 
*kvm, int nr)
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
 }
 
-static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+static void kvm_mmu_free_lock_parts(struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp->spt));
hlist_del(&sp->hash_link);
-   list_del(&sp->link);
-   free_page((unsigned long)sp->spt);
if (!sp->role.direct)
free_page((unsigned long)sp->gfns);
+}
+
+static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
+{
+   list_del(&sp->link);
+   free_page((unsigned long)sp->spt);
kmem_cache_free(mmu_page_header_cache, sp);
 }
 
+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+{
+   kvm_mmu_free_lock_parts(sp);
+   kvm_mmu_free_unlock_parts(sp);
+}
+
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
 {
return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
-- 
1.7.4.4



[PATCH 10/15] KVM: MMU: lockless walking shadow page table

2011-06-07 Thread Xiao Guangrong
Use RCU to protect shadow page tables from being freed while they are being
walked, so that we can walk them safely without holding the mmu lock. The walk
should be fast, and it is needed by the mmio page fault path.
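
The interaction between the lockless walker and the freeing path can be sketched in userspace C11. This is only an illustration with hypothetical names: real RCU grace periods are replaced by a plain "deferred" counter, and the kernel's call_rcu() is mentioned only in a comment. It shows the decision the commit path makes when a reader may be active.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace sketch only; the real code relies on RCU grace periods. */
struct toy_kvm {
    atomic_int reader_counter; /* lockless walkers currently active */
    int freed_now;             /* pages freed immediately */
    int freed_deferred;        /* pages handed to a deferred-free path */
};

static void toy_kvm_init(struct toy_kvm *k)
{
    atomic_init(&k->reader_counter, 0);
    k->freed_now = 0;
    k->freed_deferred = 0;
}

/* A walker announces itself before touching the shadow page table. */
static void toy_walk_begin(struct toy_kvm *k)
{
    /* Sequentially consistent increment: ordered before later reads. */
    atomic_fetch_add(&k->reader_counter, 1);
}

static void toy_walk_end(struct toy_kvm *k)
{
    assert(atomic_load(&k->reader_counter) > 0);
    atomic_fetch_sub(&k->reader_counter, 1);
}

/* Returns true if freeing had to be deferred because a walker is active. */
static bool toy_commit_zap(struct toy_kvm *k, int nr_pages)
{
    if (atomic_load(&k->reader_counter) > 0) {
        k->freed_deferred += nr_pages; /* kernel: call_rcu() on the list */
        return true;
    }
    k->freed_now += nr_pages;          /* no reader: free right away */
    return false;
}
```

The counter keeps the common case cheap: when no walker is active, pages are freed directly and no deferred work is queued.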

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |4 ++
 arch/x86/kvm/mmu.c  |   79 ++-
 arch/x86/kvm/mmu.h  |4 +-
 arch/x86/kvm/vmx.c  |2 +-
 4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 326af42..260582b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -232,6 +232,8 @@ struct kvm_mmu_page {
unsigned int unsync_children;
unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+   struct rcu_head rcu;
 };
 
 struct kvm_pv_mmu_op_buffer {
@@ -478,6 +480,8 @@ struct kvm_arch {
u64 hv_guest_os_id;
u64 hv_hypercall;
 
+   atomic_t reader_counter;
+
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
#endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f3a746..52d4682 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1675,6 +1675,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, 
struct kvm_mmu_page *sp,
return ret;
 }
 
+static void free_mmu_pages_unlock_parts(struct list_head *invalid_list)
+{
+   struct kvm_mmu_page *sp;
+
+   list_for_each_entry(sp, invalid_list, link)
+   kvm_mmu_free_lock_parts(sp);
+}
+
+static void free_invalid_pages_rcu(struct rcu_head *head)
+{
+   struct kvm_mmu_page *next, *sp;
+
+   sp = container_of(head, struct kvm_mmu_page, rcu);
+   while (sp) {
+   if (!list_empty(&sp->link))
+   next = list_first_entry(&sp->link,
+ struct kvm_mmu_page, link);
+   else
+   next = NULL;
+   kvm_mmu_free_unlock_parts(sp);
+   sp = next;
+   }
+}
+
 static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
 {
@@ -1685,6 +1709,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
kvm_flush_remote_tlbs(kvm);
 
+   if (atomic_read(&kvm->arch.reader_counter)) {
+   free_mmu_pages_unlock_parts(invalid_list);
+   sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+   list_del_init(invalid_list);
+   call_rcu(&sp->rcu, free_invalid_pages_rcu);
+   return;
+   }
+
do {
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
WARN_ON(!sp->role.invalid || sp->root_count);
@@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu 
*vcpu, gva_t vaddr,
return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
 }
 
+int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+ u64 sptes[4])
+{
+   struct kvm_shadow_walk_iterator iterator;
+   int nr_sptes = 0;
+
+   rcu_read_lock();
+
+   atomic_inc(&vcpu->kvm->arch.reader_counter);
+   /* Increase the counter before walking shadow page table */
+   smp_mb__after_atomic_inc();
+
+   for_each_shadow_entry(vcpu, addr, iterator) {
+   sptes[iterator.level-1] = *iterator.sptep;
+   nr_sptes++;
+   if (!is_shadow_present_pte(*iterator.sptep))
+   break;
+   }
+
+   /* Decrease the counter after walking shadow page table finished */
+   smp_mb__before_atomic_dec();
+   atomic_dec(&vcpu->kvm->arch.reader_counter);
+
+   rcu_read_unlock();
+
+   return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_walk_shadow_page_lockless);
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
 {
@@ -3684,24 +3745,6 @@ out:
return r;
 }
 
-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
-{
-   struct kvm_shadow_walk_iterator iterator;
-   int nr_sptes = 0;
-
-   spin_lock(&vcpu->kvm->mmu_lock);
-   for_each_shadow_entry(vcpu, addr, iterator) {
-   sptes[iterator.level-1] = *iterator.sptep;
-   nr_sptes++;
-   if (!is_shadow_present_pte(*iterator.sptep))
-   break;
-   }
-   spin_unlock(&vcpu->kvm->mmu_lock);
-
-   return nr_sptes;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
-
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
ASSERT(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 05310b1..e7725c4 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,9 @@
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
 
-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 

[PATCH 11/15] KVM: MMU: filter out the mmio pfn from the fault pfn

2011-06-07 Thread Xiao Guangrong
If the page fault is caused by mmio, the gfn cannot be found in the memslots,
and 'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to
identify an mmio page fault.

Also, to keep the meaning of the mmio pfn clear, we return the fault page
instead of the bad page when the gfn is not allowed to be prefetched.
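
The resulting three-way classification can be sketched as follows. The sentinel values and the toy_ names are hypothetical (the kernel derives bad_pfn, fault_pfn and hwpoison_pfn from real pages); only the comparison logic mirrors what the description says.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pfn_t;

/* Hypothetical sentinel values, chosen only for this sketch. */
enum {
    TOY_BAD_PFN      = 0x1000, /* gfn not backed by any memslot: mmio */
    TOY_HWPOISON_PFN = 0x1001, /* backing page is hardware-poisoned */
    TOY_FAULT_PFN    = 0x1002  /* lookup failed for another reason */
};

enum toy_pfn_kind { TOY_PFN_NORMAL, TOY_PFN_MMIO, TOY_PFN_INVALID };

static bool toy_is_mmio_pfn(pfn_t pfn)
{
    return pfn == TOY_BAD_PFN;
}

static bool toy_is_invalid_pfn(pfn_t pfn)
{
    return pfn == TOY_HWPOISON_PFN || pfn == TOY_FAULT_PFN;
}

/* Invalid pfns are reported as errors; mmio pfns lead to emulation. */
static enum toy_pfn_kind toy_classify_pfn(pfn_t pfn)
{
    if (toy_is_invalid_pfn(pfn))
        return TOY_PFN_INVALID;
    if (toy_is_mmio_pfn(pfn))
        return TOY_PFN_MMIO;
    return TOY_PFN_NORMAL;
}
```

Returning the fault pfn rather than the bad pfn on the prefetch path keeps a refused prefetch from being mistaken for an mmio access.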

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c   |4 ++--
 include/linux/kvm_host.h |5 +
 virt/kvm/kvm_main.c  |   16 ++--
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 52d4682..7286d2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2133,8 +2133,8 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu 
*vcpu, gfn_t gfn,
 
slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
if (!slot) {
-   get_page(bad_page);
-   return page_to_pfn(bad_page);
+   get_page(fault_page);
+   return page_to_pfn(fault_page);
}
 
hva = gfn_to_hva_memslot(slot, gfn);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9c3299..16d6d3f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -326,12 +326,17 @@ static inline struct kvm_memslots *kvm_memslots(struct 
kvm *kvm)
 static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
 
 extern struct page *bad_page;
+extern struct page *fault_page;
+
 extern pfn_t bad_pfn;
+extern pfn_t fault_pfn;
 
 int is_error_page(struct page *page);
 int is_error_pfn(pfn_t pfn);
 int is_hwpoison_pfn(pfn_t pfn);
 int is_fault_pfn(pfn_t pfn);
+int is_mmio_pfn(pfn_t pfn);
+int is_invalid_pfn(pfn_t pfn);
 int kvm_is_error_hva(unsigned long addr);
 int kvm_set_memory_region(struct kvm *kvm,
  struct kvm_userspace_memory_region *mem,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..93a1ce1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -97,8 +97,8 @@ static bool largepages_enabled = true;
 static struct page *hwpoison_page;
 static pfn_t hwpoison_pfn;
 
-static struct page *fault_page;
-static pfn_t fault_pfn;
+struct page *fault_page;
+pfn_t fault_pfn;
 
 inline int kvm_is_mmio_pfn(pfn_t pfn)
 {
@@ -926,6 +926,18 @@ int is_fault_pfn(pfn_t pfn)
 }
 EXPORT_SYMBOL_GPL(is_fault_pfn);
 
+int is_mmio_pfn(pfn_t pfn)
+{
+   return pfn == bad_pfn;
+}
+EXPORT_SYMBOL_GPL(is_mmio_pfn);
+
+int is_invalid_pfn(pfn_t pfn)
+{
+   return pfn == hwpoison_pfn || pfn == fault_pfn;
+}
+EXPORT_SYMBOL_GPL(is_invalid_pfn);
+
 static inline unsigned long bad_hva(void)
 {
return PAGE_OFFSET;
-- 
1.7.4.4



[PATCH 12/15] KVM: MMU: abstract some functions to handle fault pfn

2011-06-07 Thread Xiao Guangrong
Introduce handle_abnormal_pfn to handle the fault pfn on the page fault path,
and introduce mmu_invalid_pfn to handle the fault pfn on the prefetch path.

This is preparatory work for mmio page fault support.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   47 ---
 arch/x86/kvm/paging_tmpl.h |   12 +-
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7286d2a..4f475ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2269,18 +2269,15 @@ static void kvm_send_hwpoison_signal(unsigned long 
address, struct task_struct *
send_sig_info(SIGBUS, &info, tsk);
 }
 
-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva,
-  unsigned access, gfn_t gfn, pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 {
kvm_release_pfn_clean(pfn);
if (is_hwpoison_pfn(pfn)) {
kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
return 0;
-   } else if (is_fault_pfn(pfn))
-   return -EFAULT;
+   }
 
-   vcpu_cache_mmio_info(vcpu, gva, gfn, access);
-   return 1;
+   return -EFAULT;
 }
 
 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
@@ -2325,6 +2322,33 @@ static void transparent_hugepage_adjust(struct kvm_vcpu 
*vcpu,
}
 }
 
+static bool mmu_invalid_pfn(pfn_t pfn)
+{
+   return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+}
+
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+   pfn_t pfn, unsigned access, int *ret_val)
+{
+   bool ret = true;
+
+   /* The pfn is invalid, report the error! */
+   if (unlikely(is_invalid_pfn(pfn))) {
+   *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+   goto exit;
+   }
+
+   if (unlikely(is_mmio_pfn(pfn))) {
+   vcpu_cache_mmio_info(vcpu, gva, gfn, ACC_ALL);
+   *ret_val = 1;
+   goto exit;
+   }
+
+   ret = false;
+exit:
+   return ret;
+}
+
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 gva_t gva, pfn_t *pfn, bool write, bool *writable);
 
@@ -2359,9 +2383,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
int write, gfn_t gfn,
if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
return 0;
 
-   /* mmio */
-   if (is_error_pfn(pfn))
-   return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn);
+   if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r))
+   return r;
 
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -2762,9 +2785,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
return 0;
 
-   /* mmio */
-   if (is_error_pfn(pfn))
-   return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn);
+   if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r))
+   return r;
+
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8353b69..4f960b2 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -371,7 +371,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
-   if (is_error_pfn(pfn)) {
+   if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
return;
}
@@ -448,7 +448,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, 
struct guest_walker *gw,
gfn = gpte_to_gfn(gpte);
pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
  pte_access & ACC_WRITE_MASK);
-   if (is_error_pfn(pfn)) {
+   if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
break;
}
@@ -618,10 +618,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
 &map_writable))
return 0;
 
-   /* mmio */
-   if (is_error_pfn(pfn))
-   return kvm_handle_bad_page(vcpu, mmu_is_nested(vcpu) ? 0 :
- addr, walker.pte_access, walker.gfn, pfn);
+   if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
+   walker.gfn, pfn, walker.pte_access, &r))
+   return r;
+

[PATCH 13/15] KVM: VMX: modify the default value of nontrap shadow pte

2011-06-07 Thread Xiao Guangrong
Modify the default value to distinguish the nontrap shadow pte from the mmio
shadow pte, which will be introduced in a later patch.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 20dbf7f..8c3d343 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7110,7 +7110,7 @@ static int __init vmx_init(void)
kvm_disable_tdp();
 
if (bypass_guest_pf)
-   kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
+   kvm_mmu_set_nonpresent_ptes(0xfull << 49 | 1ull, 0ull);
 
return 0;
 
-- 
1.7.4.4



[PATCH 14/15] KVM: MMU: mmio page fault support

2011-06-07 Thread Xiao Guangrong
The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When a page fault is caused by mmio, we cache the info in the shadow page
table and also set the reserved bits in the shadow page table entry, so if the
same mmio access faults again, we can quickly identify it and emulate it
directly.

Searching for an mmio gfn in the memslots is expensive since we need to walk
all memslots. This feature reduces that cost, and it also avoids walking the
guest page table for soft mmu.

This feature can be disabled/enabled at runtime. If
shadow_notrap_nonpresent_pte is enabled, PFERR.RSVD is always set and we would
need to walk the shadow page table for every page fault, so this feature is
disabled if shadow_notrap_nonpresent_pte is enabled.
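
The encoding of an mmio spte can be sketched as standalone C. The mask value (0xff << 49 | 1) and the pack/unpack arithmetic follow the patch below; the toy_ names, the 4 KiB page size, and the access bit values (write = 0x2, user = 0x4) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  (~((UINT64_C(1) << TOY_PAGE_SHIFT) - 1))
#define TOY_ACC_WRITE  UINT64_C(0x2)  /* assumed bit position */
#define TOY_ACC_USER   UINT64_C(0x4)  /* assumed bit position */
/* Reserved bits 49..56 plus the present bit mark an mmio spte. */
#define TOY_MMIO_MASK  ((UINT64_C(0xff) << 49) | UINT64_C(1))

/* Pack gfn and access into an spte tagged with the mmio mask. */
static uint64_t toy_make_mmio_spte(uint64_t gfn, unsigned access)
{
    uint64_t acc = access & (TOY_ACC_WRITE | TOY_ACC_USER);

    /* The gfn must stay below bit 49 after shifting. */
    assert((gfn << TOY_PAGE_SHIFT) < (UINT64_C(1) << 49));
    return TOY_MMIO_MASK | acc | (gfn << TOY_PAGE_SHIFT);
}

static bool toy_is_mmio_spte(uint64_t spte)
{
    return (spte & TOY_MMIO_MASK) == TOY_MMIO_MASK;
}

static uint64_t toy_mmio_spte_gfn(uint64_t spte)
{
    return (spte & ~TOY_MMIO_MASK) >> TOY_PAGE_SHIFT;
}

static unsigned toy_mmio_spte_access(uint64_t spte)
{
    return (unsigned)((spte & ~TOY_MMIO_MASK) & ~TOY_PAGE_MASK);
}
```

Because the gfn and access bits live in the spte itself, a repeated fault on the same page can recover both without searching the memslots.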

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |  149 ---
 arch/x86/kvm/mmu.h |4 +-
 arch/x86/kvm/paging_tmpl.h |   32 +-
 arch/x86/kvm/vmx.c |   12 +++-
 4 files changed, 180 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4f475ab..227cf10 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -91,6 +91,9 @@ module_param(dbg, bool, 0644);
 static int oos_shadow = 1;
 module_param(oos_shadow, bool, 0644);
 
+static int __read_mostly mmio_pf = 1;
+module_param(mmio_pf, bool, 0644);
+
 #ifndef MMU_DEBUG
 #define ASSERT(x) do { } while (0)
 #else
@@ -193,6 +196,44 @@ static u64 __read_mostly shadow_x_mask;/* mutual 
exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mmio_mask = (0xffull << 49 | 1ULL);
+
+static void __set_spte(u64 *sptep, u64 spte)
+{
+   set_64bit(sptep, spte);
+}
+
+static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
+{
+   access &= ACC_WRITE_MASK | ACC_USER_MASK;
+
+   __set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
+}
+
+static bool is_mmio_spte(u64 spte)
+{
+   return (spte & shadow_mmio_mask) == shadow_mmio_mask;
+}
+
+static gfn_t get_mmio_spte_gfn(u64 spte)
+{
+   return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
+}
+
+static unsigned get_mmio_spte_access(u64 spte)
+{
+   return (spte & ~shadow_mmio_mask) & ~PAGE_MASK;
+}
+
+static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
+{
+   if (unlikely(is_mmio_pfn(pfn))) {
+   mark_mmio_spte(sptep, gfn, access);
+   return true;
+   }
+
+   return false;
+}
 
 static inline u64 rsvd_bits(int s, int e)
 {
@@ -203,6 +244,8 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 
notrap_pte)
 {
shadow_trap_nonpresent_pte = trap_pte;
shadow_notrap_nonpresent_pte = notrap_pte;
+   if (trap_pte != notrap_pte)
+   mmio_pf = 0;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
 
@@ -230,7 +273,8 @@ static int is_nx(struct kvm_vcpu *vcpu)
 static int is_shadow_present_pte(u64 pte)
 {
return pte != shadow_trap_nonpresent_pte
-&& pte != shadow_notrap_nonpresent_pte;
+&& pte != shadow_notrap_nonpresent_pte
+&& !is_mmio_spte(pte);
 }
 
 static int is_large_pte(u64 pte)
@@ -269,11 +313,6 @@ static gfn_t pse36_gfn_delta(u32 gpte)
return (gpte & PT32_DIR_PSE36_MASK) << shift;
 }
 
-static void __set_spte(u64 *sptep, u64 spte)
-{
-   set_64bit(sptep, spte);
-}
-
 static u64 __xchg_spte(u64 *sptep, u64 new_spte)
 {
 #ifdef CONFIG_X86_64
@@ -1972,6 +2011,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
u64 spte, entry = *sptep;
int ret = 0;
 
+   if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+   return 0;
+
/*
 * We don't set the accessed bit, since we sometimes want to see
 * whether the guest actually used the pte (in order to detect
@@ -2098,6 +2140,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
kvm_mmu_flush_tlb(vcpu);
}
 
+   if (unlikely(is_mmio_spte(*sptep) && emulate))
+   *emulate = 1;
+
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
 is_large_pte(*sptep)? "2MB" : "4kB",
@@ -2324,7 +2369,10 @@ static void transparent_hugepage_adjust(struct kvm_vcpu 
*vcpu,
 
 static bool mmu_invalid_pfn(pfn_t pfn)
 {
-   return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+   if (unlikely(!mmio_pf && is_mmio_pfn(pfn)))
+   return true;
+
+   return unlikely(is_invalid_pfn(pfn));
 }
 
 static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
@@ -2340,8 +2388,10 @@ 

[PATCH 15/15] KVM: MMU: trace mmio page fault

2011-06-07 Thread Xiao Guangrong
Add tracepoints to trace mmio page fault

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |4 +++
 arch/x86/kvm/mmutrace.h|   48 
 arch/x86/kvm/x86.c |5 +++-
 include/trace/events/kvm.h |   24 ++
 4 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 227cf10..aff8f52 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -207,6 +207,7 @@ static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned 
access)
 {
access &= ACC_WRITE_MASK | ACC_USER_MASK;
 
+   trace_mark_mmio_spte(sptep, gfn, access);
__set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
 }
 
@@ -1752,6 +1753,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
free_mmu_pages_unlock_parts(invalid_list);
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
list_del_init(invalid_list);
+   trace_kvm_mmu_delay_free_pages(sp);
call_rcu(&sp->rcu, free_invalid_pages_rcu);
return;
}
@@ -2765,6 +2767,8 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, 
u64 addr,
 
if (direct)
addr = 0;
+
+   trace_handle_mmio_page_fault(addr, gfn, access);
vcpu_cache_mmio_info(vcpu, addr, gfn, access);
return 1;
}
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b60b4fd..eed67f3 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -196,6 +196,54 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
TP_ARGS(sp)
 );
 
+DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
+   TP_PROTO(struct kvm_mmu_page *sp),
+
+   TP_ARGS(sp)
+);
+
+TRACE_EVENT(
+   mark_mmio_spte,
+   TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
+   TP_ARGS(sptep, gfn, access),
+
+   TP_STRUCT__entry(
+   __field(void *, sptep)
+   __field(gfn_t, gfn)
+   __field(unsigned, access)
+   ),
+
+   TP_fast_assign(
+   __entry->sptep = sptep;
+   __entry->gfn = gfn;
+   __entry->access = access;
+   ),
+
+   TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
+ __entry->access)
+);
+
+TRACE_EVENT(
+   handle_mmio_page_fault,
+   TP_PROTO(u64 addr, gfn_t gfn, unsigned access),
+   TP_ARGS(addr, gfn, access),
+
+   TP_STRUCT__entry(
+   __field(u64, addr)
+   __field(gfn_t, gfn)
+   __field(unsigned, access)
+   ),
+
+   TP_fast_assign(
+   __entry->addr = addr;
+   __entry->gfn = gfn;
+   __entry->access = access;
+   ),
+
+   TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
+ __entry->access)
+);
+
 TRACE_EVENT(
kvm_mmu_audit,
TP_PROTO(struct kvm_vcpu *vcpu, int audit_point),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a136181..c75f845 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3914,6 +3914,7 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, 
unsigned long gva,
  vcpu->arch.access)) {
*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
(gva & (PAGE_SIZE - 1));
+   trace_vcpu_match_mmio(gva, *gpa, write, false);
return 1;
}
 
@@ -3929,8 +3930,10 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, 
unsigned long gva,
if ((*gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
return 1;
 
-   if (vcpu_match_mmio_gpa(vcpu, *gpa))
+   if (vcpu_match_mmio_gpa(vcpu, *gpa)) {
+   trace_vcpu_match_mmio(gva, *gpa, write, true);
return 1;
+   }
 
return 0;
 }
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 46e3cd8..571e972 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -306,6 +306,30 @@ TRACE_EVENT(
 
 #endif
 
+TRACE_EVENT(
+   vcpu_match_mmio,
+   TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match),
+   TP_ARGS(gva, gpa, write, gpa_match),
+
+   TP_STRUCT__entry(
+   __field(gva_t, gva)
+   __field(gpa_t, gpa)
+   __field(bool, write)
+   __field(bool, gpa_match)
+   ),
+
+   TP_fast_assign(
+   __entry->gva = gva;
+   __entry->gpa = gpa;
+   __entry->write = write;
+   __entry->gpa_match = gpa_match
+   ),
+
+   TP_printk("gva %#lx gpa %#llx %s %s", __entry->gva, __entry->gpa,
+ __entry->write ? "Write" : "Read",
+ __entry->gpa_match ? "GPA" : "GVA")
+);
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
-- 
1.7.4.4


[PATCH v2] virtio-spec: Fix wrong bit number of device status

2011-06-07 Thread Amos Kong

qemu-kvm/hw/virtio_config.h:
 #define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
 #define VIRTIO_CONFIG_S_DRIVER  2
 #define VIRTIO_CONFIG_S_DRIVER_OK   4
 #define VIRTIO_CONFIG_S_FAILED  0x80

virtio-spec:
ACKNOWLEDGE(1) :
DRIVER(2)  :
DRIVER_OK(3)   :
FAILED(128):

The spec refers to bit numbers while the headers use absolute values, so
they are not consistent.

It should be 'FAILED(8)':
2^(8-1) = 128
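
The relationship between the spec's 1-based bit numbers and the header values can be checked with a small helper. The function name is hypothetical; the values it is compared against are the ones quoted from virtio_config.h above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 1-based bit number, as used in the spec text, to the value
 * used in the header: value = 2^(bit_number - 1). */
static uint8_t status_bit_to_value(unsigned bit_number)
{
    assert(bit_number >= 1 && bit_number <= 8);
    return (uint8_t)(1u << (bit_number - 1));
}
```

With this mapping, bit 1 gives 1 (ACKNOWLEDGE), bit 2 gives 2 (DRIVER), bit 3 gives 4 (DRIVER_OK), and bit 8 gives 0x80 (FAILED), which is why the spec entry should read FAILED(8) rather than FAILED(128).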

Changes from V1:
- Fix wrong patch body

Signed-off-by: Amos Kong ak...@redhat.com
---
 virtio-spec.lyx |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 448af76..1fc3e59 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1552,7 +1552,7 @@ FAILED
 \begin_inset space ~
 \end_inset
 
-(128) Indicates that something went wrong in the guest, and it has given
+(8) Indicates that something went wrong in the guest, and it has given
  up on the device.
  This could be an internal error, or the driver didn't like the device for
  some reason, or even a fatal error during device operation.
-- 
1.7.1



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Brad Campbell

On 07/06/11 04:22, Eric Dumazet wrote:


Could you please try latest linux-2.6 tree ?

We fixed many networking bugs that could explain your crash.






No good I'm afraid.

[  543.040056] 
=
[  543.040136] BUG ip_dst_cache: Padding overwritten. 
0x8803e4217ffe-0x8803e4217fff
[  543.040194] 
-

[  543.040198]
[  543.040298] INFO: Slab 0xea000d9e74d0 objects=25 used=25 fp=0x 
   (null) flags=0x80004081

[  543.040364] Pid: 4576, comm: kworker/1:2 Not tainted 3.0.0-rc2 #1
[  543.040415] Call Trace:
[  543.040472]  [810b9c1d] ? slab_err+0xad/0xd0
[  543.040528]  [8102e034] ? check_preempt_wakeup+0xa4/0x160
[  543.040595]  [810ba206] ? slab_pad_check+0x126/0x170
[  543.040650]  [8133045b] ? dst_destroy+0x8b/0x110
[  543.040701]  [810ba29a] ? check_slab+0x4a/0xc0
[  543.040753]  [810baf2d] ? free_debug_processing+0x2d/0x250
[  543.040808]  [810bb27b] ? __slab_free+0x12b/0x140
[  543.040862]  [810bbe99] ? kmem_cache_free+0x99/0xa0
[  543.040915]  [8133045b] ? dst_destroy+0x8b/0x110
[  543.040967]  [813307f6] ? dst_gc_task+0x196/0x1f0
[  543.041021]  [8104e954] ? queue_delayed_work_on+0x154/0x160
[  543.041081]  [813066fe] ? do_dbs_timer+0x20e/0x3d0
[  543.041133]  [81330660] ? dst_alloc+0x180/0x180
[  543.041187]  [8104f28b] ? process_one_work+0xfb/0x3b0
[  543.041242]  [8104f964] ? worker_thread+0x144/0x3d0
[  543.041296]  [8102cc10] ? __wake_up_common+0x50/0x80
[  543.041678]  [8104f820] ? rescuer_thread+0x2e0/0x2e0
[  543.041729]  [8104f820] ? rescuer_thread+0x2e0/0x2e0
[  543.041782]  [81053436] ? kthread+0x96/0xa0
[  543.041835]  [813e1d14] ? kernel_thread_helper+0x4/0x10
[  543.041890]  [810533a0] ? kthread_worker_fn+0x120/0x120
[  543.041944]  [813e1d10] ? gs_change+0xb/0xb
[  543.041993]  Padding 0x8803e4217f40:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.042718]  Padding 0x8803e4217f50:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.043433]  Padding 0x8803e4217f60:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.044155]  Padding 0x8803e4217f70:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.044866]  Padding 0x8803e4217f80:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.045590]  Padding 0x8803e4217f90:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.046311]  Padding 0x8803e4217fa0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.047034]  Padding 0x8803e4217fb0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.047755]  Padding 0x8803e4217fc0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.048474]  Padding 0x8803e4217fd0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.049203]  Padding 0x8803e4217fe0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 5a 5a 
[  543.049909]  Padding 0x8803e4217ff0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
5a 5a 5a 5a 5a 00 00 ZZ..
[  543.050021] FIX ip_dst_cache: Restoring 
0x8803e4217f40-0x8803e4217fff=0x5a

[  543.050021]

Dropped -mm, Hugh and Andrea from CC as this does not appear to be mm or 
ksm related.


I'll pare down the firewall and see if I can make it break easier with a 
smaller test set.



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Patrick McHardy
On 07.06.2011 05:33, Brad Campbell wrote:
 On 07/06/11 04:10, Bart De Schuymer wrote:
 Hi Brad,

 This has probably nothing to do with ebtables, so please rmmod in case
 it's loaded.
 A few questions I didn't directly see an answer to in the threads I
 scanned...
 I'm assuming you actually use the bridging firewall functionality. So,
 what iptables modules do you use? Can you reduce your iptables rules to
 a core that triggers the bug?
 Or does it get triggered even with an empty set of firewall rules?
 Are you using a stock .35 kernel or is it patched?
 Is this something I can trigger on a poor guy's laptop or does it
 require specialized hardware (I'm catching up on qemu/kvm...)?
 
 Not specialised hardware as such, I've just not been able to reproduce
 it outside of this specific operating scenario.

The last similar problem we've had was related to the 32/64 bit compat
code. Are you running 32 bit userspace on a 64 bit kernel?

 I can't trigger it with empty firewall rules as it relies on a DNAT to
 occur. If I try it directly to the internal IP address (as I have to
 without netfilter loaded) then of course nothing fails.
 
 It's a pain in the bum as a fault, but it's one I can easily reproduce
 as long as I use the same set of circumstances.
 
 I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
 on that then I'll attempt to pare down the IPTABLES rules to a bare
 minimum.
 
 It is nothing to do with ebtables as I don't compile it. I'm not really
 sure about bridging firewall functionality. I just use a couple of
 hand coded bash scripts to set the tables up.

From one of your previous mails:

 # CONFIG_BRIDGE_NF_EBTABLES is not set

How about CONFIG_BRIDGE_NETFILTER?


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Eric Dumazet
On Tuesday 07 June 2011 at 21:27 +0800, Brad Campbell wrote:
 On 07/06/11 04:22, Eric Dumazet wrote:
 
  Could you please try latest linux-2.6 tree ?
 
  We fixed many networking bugs that could explain your crash.
 
 
 
 
 
 No good I'm afraid.
 
 [  543.040056] 
 =
 [  543.040136] BUG ip_dst_cache: Padding overwritten. 
 0x8803e4217ffe-0x8803e4217fff
 [  543.040194] 

That's pretty strange: these are the last two bytes of a page, set to
0x0000 (a 16-bit value).

There is no way a dst field could actually sit at this location (it's
padding), since a dst is a bit less than 256 bytes (0xe8), and each
entry is aligned on a 64-byte address.

grep dst /proc/slabinfo 

ip_dst_cache       32823  62944    256   32    2 : tunables    0    0    0 : slabdata   1967   1967      0

sizeof(struct rtable)=0xe8


 -
 [  543.040198]
 [  543.040298] INFO: Slab 0xea000d9e74d0 objects=25 used=25 fp=0x 
 (null) flags=0x80004081
 [  543.040364] Pid: 4576, comm: kworker/1:2 Not tainted 3.0.0-rc2 #1
 [  543.040415] Call Trace:
 [  543.040472]  [810b9c1d] ? slab_err+0xad/0xd0
 [  543.040528]  [8102e034] ? check_preempt_wakeup+0xa4/0x160
 [  543.040595]  [810ba206] ? slab_pad_check+0x126/0x170
 [  543.040650]  [8133045b] ? dst_destroy+0x8b/0x110
 [  543.040701]  [810ba29a] ? check_slab+0x4a/0xc0
 [  543.040753]  [810baf2d] ? free_debug_processing+0x2d/0x250
 [  543.040808]  [810bb27b] ? __slab_free+0x12b/0x140
 [  543.040862]  [810bbe99] ? kmem_cache_free+0x99/0xa0
 [  543.040915]  [8133045b] ? dst_destroy+0x8b/0x110
 [  543.040967]  [813307f6] ? dst_gc_task+0x196/0x1f0
 [  543.041021]  [8104e954] ? queue_delayed_work_on+0x154/0x160
 [  543.041081]  [813066fe] ? do_dbs_timer+0x20e/0x3d0
 [  543.041133]  [81330660] ? dst_alloc+0x180/0x180
 [  543.041187]  [8104f28b] ? process_one_work+0xfb/0x3b0
 [  543.041242]  [8104f964] ? worker_thread+0x144/0x3d0
 [  543.041296]  [8102cc10] ? __wake_up_common+0x50/0x80
 [  543.041678]  [8104f820] ? rescuer_thread+0x2e0/0x2e0
 [  543.041729]  [8104f820] ? rescuer_thread+0x2e0/0x2e0
 [  543.041782]  [81053436] ? kthread+0x96/0xa0
 [  543.041835]  [813e1d14] ? kernel_thread_helper+0x4/0x10
 [  543.041890]  [810533a0] ? kthread_worker_fn+0x120/0x120
 [  543.041944]  [813e1d10] ? gs_change+0xb/0xb
 [  543.041993]  Padding 0x8803e4217f40:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.042718]  Padding 0x8803e4217f50:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.043433]  Padding 0x8803e4217f60:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.044155]  Padding 0x8803e4217f70:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.044866]  Padding 0x8803e4217f80:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.045590]  Padding 0x8803e4217f90:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.046311]  Padding 0x8803e4217fa0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.047034]  Padding 0x8803e4217fb0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.047755]  Padding 0x8803e4217fc0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.048474]  Padding 0x8803e4217fd0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.049203]  Padding 0x8803e4217fe0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 5a 5a 
 [  543.049909]  Padding 0x8803e4217ff0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 
 5a 5a 5a 5a 5a 00 00 ZZ..
 [  543.050021] FIX ip_dst_cache: Restoring 
 0x8803e4217f40-0x8803e4217fff=0x5a
 [  543.050021]
 
 Dropped -mm, Hugh and Andrea from CC as this does not appear to be mm or 
 ksm related.
 
 I'll pare down the firewall and see if I can make it break easier with a 
 smaller test set.

Hmm, not sure now :(

Could you reproduce another bug please ?
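
For reference, the numbers in the SLUB report quoted above are self-consistent, which can be checked with a little arithmetic. The following is a minimal sketch, not kernel code: the helper names are made up, the slab is assumed to span two 4 KiB pages, and only the low 16 bits of the logged addresses are used.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size */

/* Bytes from the start of the slab to the first padding byte. */
static uint32_t slab_used_bytes(uint32_t pad_start, uint32_t slab_bytes)
{
    assert((slab_bytes & (slab_bytes - 1)) == 0); /* must be a power of two */
    return pad_start & (slab_bytes - 1);
}

/* Per-object footprint, or 0 if the objects do not tile the used area. */
static uint32_t object_stride(uint32_t used, uint32_t objects)
{
    if (objects == 0 || used % objects != 0)
        return 0;
    return used / objects;
}

/* Length of the trailing padding region, inclusive of pad_end. */
static uint32_t padding_bytes(uint32_t pad_start, uint32_t pad_end)
{
    return pad_end - pad_start + 1;
}
```

With pad_start = 0x7f40, pad_end = 0x7fff and objects=25 this gives 8000 used bytes, a 320-byte stride per object and 192 bytes of padding, so the overwritten bytes at 0x7ffe-0x7fff are the last two bytes of the slab. Under a one-page assumption the 25 objects do not tile the used area, so the two-page assumption fits the log better. The stride being larger than 256 bytes would be consistent with debug metadata, though the log does not say so.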





Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* padmanabh ratnakar (pratnaka...@gmail.com) wrote:
 On Tue, Jun 7, 2011 at 4:04 AM, Chris Wright chr...@sous-sol.org wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
  On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote:
   Hi,
           I am using linux kernel 2.6.39. I have a IBM x3650 M3 system.
   I have used following boot options -
   intel_iommu=on iommu=pt
  
   I was loading/unloading my NIC driver(be2net) with num_vfs=7.
  
   After some iterations I get following DMAR errors -
   Jun  4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason
   2d on CPU 0.
   Jun  4 03:50:20 rhel6 kernel: Do you have a strange power saving mode 
   enabled?
   Jun  4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
   Jun  4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
   Jun  4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2]
   fault addr 78077000
   Jun  4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in
   context entry is clear
  
   I was trying to debug this. I dont understand iommu code much.
   The physical address belongs the printed PCI function and there should
   not have been an error.
  
   I am unable to see pci_dev(pdev) of VFs getting removed from
   si_domain->devices list(intel-iommu.c)
   when driver gets unloaded calling pci_disable_sriov() freeing VF pdevs.
   Looks like issue happens when freed pdev is allocated again and
   as it is already in list,
   required initializations don't happen.
  
   I dont know if my understanding is correct. Can anyone point me to
   what the issue may be?
 
  Yes, that's correct.  The (now replaced) check identity_mapping()
  will succeed when the pci_dev is recycled (it's freed, but never
  removed from the list; this is an issue with passthrough mode and device
  creation/destruction).  This false match happens w/ a brand new pci_dev
  which still has the default 32-bit DMA mask, so it is removed from the pt
  domain.  During removal, the domain_remove_one_dev_info() test that matches
  only on bus/devfn (now also segment) will match despite the fact that the
  info->pdev != pdev->dev.archdata.iommu.  Then...Oops
 
  Typically devices are removed from the domain via
  drivers/pci/intel-iommu.c:device_notifier(), which is called as the
  device is unbound from the driver.  However, this seems to get skipped
  when running in passthrough mode, so I'm not sure where that's supposed
  to occur.  Does it happen w/o passthrough?
 
 I had tried without passthrough on RHEL 6.1 GA kernel. Was seeing
 hangs and panics. Will check if non passthrough mode works on latest kernel.
 
  If you blacklist the driver then a create/delete may do similar (haven't
  tested that idea).
 
  Also note that some
  intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update
  and see if anything is better there.  Thanks,
 
  The change in identity_mapping() means we won't demote to 32-bit DMA
  (drop out of pt domain), so I don't think we'll see the same issue.
 
 For testing I had made a hack in 2.6.39 kernel which will prevent
 demoting to 32bit DMA mask
 and thereby prevent calling of domain_remove_one_dev_info() for the
 specific VF device I was using
 and it had worked.
 So as you said I may not hit the issue in latest kernel. Will try that.

I think we still leak the list entry though.  Bottom line is that we
need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.

thanks,
-chris
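
To illustrate the leak discussed above, here is a toy model in plain C. It is only a sketch under stated assumptions: toy_dev, toy_info and the toy_* functions are hypothetical stand-ins and do not correspond to the real intel-iommu structures or notifier API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct pci_dev: only the address matters here. */
struct toy_dev {
    uint8_t bus;
    uint8_t devfn;
};

/* Toy stand-in for a per-device entry on a domain's device list. */
struct toy_info {
    uint8_t bus;
    uint8_t devfn;
    const struct toy_dev *pdev; /* goes stale if the entry outlives the device */
    int in_use;
};

#define TOY_MAX_INFO 8
static struct toy_info toy_list[TOY_MAX_INFO];

/* Lookup by address only, like a bus/devfn match. */
static struct toy_info *toy_find(uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < TOY_MAX_INFO; i++) {
        struct toy_info *info = &toy_list[i];

        if (info->in_use && info->bus == bus && info->devfn == devfn)
            return info;
    }
    return NULL;
}

/* "ADD_DEVICE": record the device; 0 on success, -1 if the list is full. */
static int toy_add_device(const struct toy_dev *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < TOY_MAX_INFO; i++) {
        if (!toy_list[i].in_use) {
            toy_list[i] = (struct toy_info){ dev->bus, dev->devfn, dev, 1 };
            return 0;
        }
    }
    return -1;
}

/* "DEL_DEVICE": unlink the entry so no stale pointer is left behind. */
static void toy_del_device(const struct toy_dev *dev)
{
    struct toy_info *info = toy_find(dev->bus, dev->devfn);

    if (info && info->pdev == dev)
        info->in_use = 0;
}
```

If a device at 1a:00.2 is added and then destroyed without toy_del_device(), a new device created at the same address still matches the old entry by bus/devfn even though the stored pointer refers to the old object. Calling toy_del_device() on removal makes the later lookup return NULL.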


virtio scsi host draft specification, v3

2011-06-07 Thread Paolo Bonzini
Hi all,

after some preliminary discussion on the QEMU mailing list, I present a
draft specification for a virtio-based SCSI host (controller, HBA, you
name it).

The virtio SCSI host is the basis of an alternative storage stack for
KVM. This stack would overcome several limitations of the current
solution, virtio-blk:

1) scalability limitations: virtio-blk-over-PCI puts a strong upper
limit on the number of devices that can be added to a guest. Common
configurations have a limit of ~30 devices. While this can be worked
around by implementing a PCI-to-PCI bridge, or by using multifunction
virtio-blk devices, these solutions either have not been implemented
yet, or introduce management restrictions. On the other hand, the SCSI
architecture is well known for its scalability and virtio-scsi supports
advanced features such as multiqueueing.

2) limited flexibility: virtio-blk does not support all possible storage
scenarios. For example, it does not allow SCSI passthrough or persistent
reservations. In principle, virtio-scsi provides anything that the
underlying SCSI target (be it physical storage, iSCSI or the in-kernel
target) supports.

3) limited extensibility: over time, many features have been added
to virtio-blk. Each such change requires modifications to the virtio
specification, to the guest drivers, and to the device model in the
host. The virtio-scsi spec has been written to follow SAM conventions,
and exposing new features to the guest will only require changes to the
host's SCSI target implementation.


Comments are welcome.

Paolo 

--- 8< ---


Virtio SCSI Host Device Spec
============================


The virtio SCSI host device groups together one or more simple virtual
devices (i.e. disks), and allows communicating with these devices using the
SCSI protocol.  An instance of the device represents a SCSI host with
possibly many buses, targets and LUNs attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v1:
First public version

v2:
Merged all virtqueues into one, removed separate TARGET fields

v3:
Added configuration information and reworked descriptor structure.
Added back multiqueue on Avi's request, while still leaving TARGET
fields out.  Added dummy event and clarified some aspects of the
event protocol.  First version sent to a wider audience (linux-kernel
and virtio lists).

Configuration
-------------

Subsystem Device ID
TBD

Virtqueues
0:controlq
1:eventq
2..n:request queues

Feature bits
VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
read-only and write-only data buffers.

Device configuration layout
struct virtio_scsi_config {
    u32 num_queues;
    u32 event_info_size;
    u32 sense_size;
    u32 cdb_size;
}

num_queues is the total number of virtqueues exposed by the
device.  The driver is free to use only one request queue, or
it can use more to achieve better performance.

event_info_size is the maximum size that the device will fill
for buffers that the driver places in the eventq.  The
driver should always put buffers at least of this size.

sense_size is the maximum size of the sense data that the device
will write.  The default value is written by the device and
will always be 96, but the driver can modify it.

cdb_size is the maximum size of the CDB that the driver
will write.  The default value is written by the device and
will always be 32, but the driver can likewise modify it.
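
As a rough illustration of the configuration fields described above, here is a sketch in plain C with fixed-width types. The two helpers are hypothetical and not part of the draft; the request-queue count assumes that num_queues includes the controlq and the eventq, which is one reading of "total number of virtqueues". Endianness and the transport are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the draft's configuration layout with fixed-width types. */
struct virtio_scsi_config {
    uint32_t num_queues;      /* total virtqueues exposed by the device */
    uint32_t event_info_size; /* max bytes the device fills per event buffer */
    uint32_t sense_size;      /* max sense data bytes; default 96 */
    uint32_t cdb_size;        /* max CDB bytes; default 32 */
};

/* Hypothetical helper: queues left over after controlq (0) and eventq (1). */
static uint32_t request_queue_count(const struct virtio_scsi_config *cfg)
{
    assert(cfg != NULL);
    return cfg->num_queues > 2 ? cfg->num_queues - 2 : 0;
}

/* Hypothetical helper: the default values the draft says the device writes. */
static struct virtio_scsi_config default_config(uint32_t num_queues,
                                                uint32_t event_info_size)
{
    struct virtio_scsi_config cfg = { num_queues, event_info_size, 96, 32 };

    return cfg;
}
```

A device exposing three virtqueues would, under that reading, offer a single request queue.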

Device initialization
---------------------

The initialization routine should first of all discover the device's
virtqueues.

The driver should then place at least one buffer in the eventq.
Buffers returned by the device on the eventq may be referred
to as events in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).

Device operation: request queues
--------------------------------

The driver queues requests to an arbitrary request queue, and they are
used by the device on that same queue.

Requests have the following format:

struct virtio_scsi_req_cmd {
    u8 lun[8];
    u64 id;
    u8 task_attr;
    u8 prio;
    u8 crn;
    char cdb[cdb_size];
    char dataout[];

    u8 sense[sense_size];
    u32 sense_len;
    u32 residual;
    u16 status_qualifier;
    u8 status;
    u8 response;
    char datain[];
};
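
The fixed part of a request on either side of the data buffers can be sized as in the sketch below. Two assumptions are made here that the text above does not spell out: the fields are packed with no padding, and the blank line in the struct separates what the driver writes (up to dataout) from what the device writes (from sense onwards). The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Driver-written fixed fields: lun[8] + id + task_attr + prio + crn + cdb[]. */
static size_t req_cmd_out_header_size(uint32_t cdb_size)
{
    assert(sizeof(uint64_t) == 8); /* id is a 64-bit field */
    return 8 + sizeof(uint64_t) + 3 + (size_t)cdb_size;
}

/*
 * Device-written fixed fields: sense[] + sense_len + residual +
 * status_qualifier + status + response.
 */
static size_t req_cmd_in_header_size(uint32_t sense_size)
{
    return (size_t)sense_size + 4 + 4 + 2 + 1 + 1;
}
```

With the default cdb_size of 32 and sense_size of 96 this comes to 51 and 108 bytes respectively, excluding dataout and datain.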

/* command-specific response values */
#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define 

Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread David Woodhouse
On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
 I think we still leak the list entry though.  Bottom line is that we
 need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
 happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 

Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
should figure out the matching DMAR unit directly from the ACPI table at
ADD_DEVICE time, and store it in pdev->archdata.iommu.

I saw patches which were going in that direction...

-- 
dwmw2



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Brad Campbell

On 07/06/11 21:30, Patrick McHardy wrote:

On 07.06.2011 05:33, Brad Campbell wrote:

On 07/06/11 04:10, Bart De Schuymer wrote:

Hi Brad,

This has probably nothing to do with ebtables, so please rmmod in case
it's loaded.
A few questions I didn't directly see an answer to in the threads I
scanned...
I'm assuming you actually use the bridging firewall functionality. So,
what iptables modules do you use? Can you reduce your iptables rules to
a core that triggers the bug?
Or does it get triggered even with an empty set of firewall rules?
Are you using a stock .35 kernel or is it patched?
Is this something I can trigger on a poor guy's laptop or does it
require specialized hardware (I'm catching up on qemu/kvm...)?


Not specialised hardware as such, I've just not been able to reproduce
it outside of this specific operating scenario.


The last similar problem we've had was related to the 32/64 bit compat
code. Are you running 32 bit userspace on a 64 bit kernel?


No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel.

Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is 
current git




I can't trigger it with empty firewall rules as it relies on a DNAT to
occur. If I try it directly to the internal IP address (as I have to
without netfilter loaded) then of course nothing fails.

It's a pain in the bum as a fault, but it's one I can easily reproduce
as long as I use the same set of circumstances.

I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
on that then I'll attempt to pare down the IPTABLES rules to a bare
minimum.

It is nothing to do with ebtables as I don't compile it. I'm not really
sure about bridging firewall functionality. I just use a couple of
hand coded bash scripts to set the tables up.


 From one of your previous mails:


# CONFIG_BRIDGE_NF_EBTABLES is not set


How about CONFIG_BRIDGE_NETFILTER?



It was compiled in.

With the following table set I was able to reproduce the problem on 
3.0-rc2. Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified


root@srv:~# iptables-save
# Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
*filter
:INPUT ACCEPT [978:107619]
:FORWARD ACCEPT [142:7068]
:OUTPUT ACCEPT [1659:291870]
-A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT
-A INPUT -i ppp0 -j DROP
COMMIT
# Completed on Tue Jun  7 22:11:30 2011
# Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
*nat
:PREROUTING ACCEPT [813:49170]
:INPUT ACCEPT [91:7090]
:OUTPUT ACCEPT [267:20731]
:POSTROUTING ACCEPT [296:22281]
-A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443 
-j DNAT --to-destination 192.168.253.198

COMMIT
# Completed on Tue Jun  7 22:11:30 2011
# Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
*mangle
:PREROUTING ACCEPT [2729:274392]
:INPUT ACCEPT [2508:262976]
:FORWARD ACCEPT [142:7068]
:OUTPUT ACCEPT [1674:293701]
:POSTROUTING ACCEPT [2131:346411]
-A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss 
1400:1536 -j TCPMSS --clamp-mss-to-pmtu

COMMIT
# Completed on Tue Jun  7 22:11:30 2011

I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access 
the address the way I was doing it, so that's a no-go for me.





Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* David Woodhouse (dw...@infradead.org) wrote:
 On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
  I think we still leak the list entry though.  Bottom line is that we
  need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
  happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
 
 Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
 should figure out the matching DMAR unit directly from the ACPI table at
  ADD_DEVICE time, and store it in pdev->archdata.iommu.
 
 I saw patches which were going in that direction...

Cool, where are they?  I'm working on something similar, and missed them.

thanks,
-chris


[PATCH] kvm tools, ui: Add simple keyboard support to SDL UI

2011-06-07 Thread Pekka Enberg
This patch wires up hw/i8042.c to the SDL UI for simple guest keyboard support.

Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: John Floren j...@jfloren.net
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/kvm-run.c |1 +
 tools/kvm/ui/sdl.c  |   76 +++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index 8398287..b688ef7 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -643,6 +643,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix)
}
 
if (sdl) {
+   kbd__init(kvm);
if (fb)
sdl__init(fb);
}
diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index bc69ed9..878df1d 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -1,6 +1,7 @@
 #include "kvm/sdl.h"
 
 #include "kvm/framebuffer.h"
+#include "kvm/i8042.h"
 #include "kvm/util.h"
 
 #include <SDL/SDL.h>
@@ -13,6 +14,63 @@ static void sdl__write(struct framebuffer *fb, u64 addr, u8 *data, u32 len)
memcpy(&fb->mem[addr - fb->mem_addr], data, len);
 }
 
+static u8 keymap[255] = {
+   [10]= 0x16, /* 1 */
+   [11]= 0x1e, /* 2 */
+   [12]= 0x26, /* 3 */
+   [13]= 0x25, /* 4 */
+   [14]= 0x27, /* 5 */
+   [15]= 0x36, /* 6 */
+   [16]= 0x3d, /* 7 */
+   [17]= 0x3e, /* 8 */
+   [18]= 0x46, /* 9 */
+   [19]= 0x45, /* 0 */
+
+   [22]= 0x66, /* backspace */
+
+   [24]= 0x15, /* q */
+   [25]= 0x1d, /* w */
+   [26]= 0x24, /* e */
+   [27]= 0x2d, /* r */
+   [28]= 0x2c, /* t */
+   [29]= 0x35, /* y */
+   [30]= 0x3c, /* u */
+   [31]= 0x43, /* i */
+   [32]= 0x44, /* o */
+   [33]= 0x4d, /* p */
+
+   [36]= 0x5a, /* enter */
+
+   [38]= 0x1c, /* a */
+   [39]= 0x1b, /* s */
+   [40]= 0x23, /* d */
+   [41]= 0x2b, /* f */
+   [42]= 0x34, /* g */
+   [43]= 0x33, /* h */
+   [44]= 0x3b, /* j */
+   [45]= 0x42, /* k */
+   [46]= 0x4b, /* l */
+
+   [50]= 0x12, /* left shift */
+
+   [52]= 0x1a, /* z */
+   [53]= 0x22, /* x */
+   [54]= 0x21, /* c */
+   [55]= 0x2a, /* v */
+   [56]= 0x32, /* b */
+   [57]= 0x31, /* n */
+   [58]= 0x3a, /* m */
+
+   [61]= 0x4e, /* - */
+   [62]= 0x59, /* right shift */
+   [65]= 0x29, /* space */
+};
+
+static u8 to_code(u8 scancode)
+{
+   return keymap[scancode];
+}
+
 static void *sdl__thread(void *p)
 {
Uint32 rmask, gmask, bmask, amask;
@@ -43,12 +101,30 @@ static void *sdl__thread(void *p)
for (;;) {
SDL_BlitSurface(guest_screen, NULL, screen, NULL);
SDL_UpdateRect(screen, 0, 0, 0, 0);
+
while (SDL_PollEvent(&ev)) {
switch (ev.type) {
+   case SDL_KEYDOWN: {
+   u8 code = to_code(ev.key.keysym.scancode);
+   if (code)
+   kbd_queue(code);
+   else
+   pr_warning("key '%d' not found in keymap", ev.key.keysym.scancode);
+   break;
+   }
+   case SDL_KEYUP: {
+   u8 code = to_code(ev.key.keysym.scancode);
+   if (code) {
+   kbd_queue(0xf0);
+   kbd_queue(code);
+   }
+   break;
+   }
case SDL_QUIT:
goto exit;
}
}
+
SDL_Delay(1000 / FRAME_RATE);
}
 exit:
-- 
1.7.0.4
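
The key-down/key-up handling in the patch above boils down to a table lookup plus a 0xf0 prefix on release. A standalone sketch of that idea follows; keymap_sketch and encode_key_event are illustrative names, the table holds only three of the patch's entries, and nothing here is the actual kvm tools code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256 entries so that any 8-bit index is in bounds. */
static const uint8_t keymap_sketch[256] = {
    [24] = 0x15, /* q */
    [36] = 0x5a, /* enter */
    [38] = 0x1c, /* a */
};

/*
 * Writes the bytes to queue for one key event into out, which must hold
 * at least 2 bytes. Returns the number of bytes written, or 0 if the key
 * is not in the table.
 */
static size_t encode_key_event(uint8_t scancode, int pressed, uint8_t *out)
{
    uint8_t code = keymap_sketch[scancode];

    assert(out != NULL);
    if (!code)
        return 0;
    if (pressed) {
        out[0] = code;
        return 1;
    }
    out[0] = 0xf0; /* release prefix, as in the SDL_KEYUP case */
    out[1] = code;
    return 2;
}
```

The table is sized 256 rather than 255 because a u8 index can take the value 255, which would fall outside a 255-element array.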



Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread David Woodhouse
On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
 * David Woodhouse (dw...@infradead.org) wrote:
  On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
   I think we still leak the list entry though.  Bottom line is that we
   need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
   happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
  
  Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
  should figure out the matching DMAR unit directly from the ACPI table at
   ADD_DEVICE time, and store it in pdev->archdata.iommu.
  
  I saw patches which were going in that direction...
 
 Cool, where are they?  I'm working on something similar, and missed them.

[PATCH] pci, dmar: Update dmar units devices list during hotplug

Alex was working on it.

-- 
dwmw2



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Patrick McHardy
On 07.06.2011 16:40, Brad Campbell wrote:
 On 07/06/11 21:30, Patrick McHardy wrote:
 On 07.06.2011 05:33, Brad Campbell wrote:
 On 07/06/11 04:10, Bart De Schuymer wrote:
 Hi Brad,

 This has probably nothing to do with ebtables, so please rmmod in case
 it's loaded.
 A few questions I didn't directly see an answer to in the threads I
 scanned...
 I'm assuming you actually use the bridging firewall functionality. So,
 what iptables modules do you use? Can you reduce your iptables rules to
 a core that triggers the bug?
 Or does it get triggered even with an empty set of firewall rules?
 Are you using a stock .35 kernel or is it patched?
 Is this something I can trigger on a poor guy's laptop or does it
 require specialized hardware (I'm catching up on qemu/kvm...)?

 Not specialised hardware as such, I've just not been able to reproduce
 it outside of this specific operating scenario.

 The last similar problem we've had was related to the 32/64 bit compat
 code. Are you running 32 bit userspace on a 64 bit kernel?
 
 No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit kernel.
 
 Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is
 current git
 
 
 I can't trigger it with empty firewall rules as it relies on a DNAT to
 occur. If I try it directly to the internal IP address (as I have to
 without netfilter loaded) then of course nothing fails.

 It's a pain in the bum as a fault, but it's one I can easily reproduce
 as long as I use the same set of circumstances.

 I'll try using 3.0-rc2 (current git) tonight, and if I can reproduce it
 on that then I'll attempt to pare down the IPTABLES rules to a bare
 minimum.

 It is nothing to do with ebtables as I don't compile it. I'm not really
 sure about bridging firewall functionality. I just use a couple of
 hand coded bash scripts to set the tables up.

  From one of your previous mails:

 # CONFIG_BRIDGE_NF_EBTABLES is not set

 How about CONFIG_BRIDGE_NETFILTER?

 
 It was compiled in.
 
 With the following table set I was able to reproduce the problem on
 3.0-rc2. Replaced my IP with xxx.xxx.xxx.xxx, but otherwise unmodified

Which kernel was the last version without this problem?

 root@srv:~# iptables-save
 # Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
 *filter
 :INPUT ACCEPT [978:107619]
 :FORWARD ACCEPT [142:7068]
 :OUTPUT ACCEPT [1659:291870]
 -A INPUT -i ppp0 -m state --state RELATED,ESTABLISHED -j ACCEPT
 -A INPUT ! -i ppp0 -m state --state NEW -j ACCEPT
 -A INPUT -i ppp0 -j DROP
 COMMIT
 # Completed on Tue Jun  7 22:11:30 2011
 # Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
 *nat
 :PREROUTING ACCEPT [813:49170]
 :INPUT ACCEPT [91:7090]
 :OUTPUT ACCEPT [267:20731]
 :POSTROUTING ACCEPT [296:22281]
 -A PREROUTING -d xxx.xxx.xxx.xxx/32 ! -i ppp0 -p tcp -m tcp --dport 443
 -j DNAT --to-destination 192.168.253.198
 COMMIT
 # Completed on Tue Jun  7 22:11:30 2011
 # Generated by iptables-save v1.4.10 on Tue Jun  7 22:11:30 2011
 *mangle
 :PREROUTING ACCEPT [2729:274392]
 :INPUT ACCEPT [2508:262976]
 :FORWARD ACCEPT [142:7068]
 :OUTPUT ACCEPT [1674:293701]
 :POSTROUTING ACCEPT [2131:346411]
 -A FORWARD -o ppp0 -p tcp -m tcp --tcp-flags SYN,RST SYN -m tcpmss --mss
 1400:1536 -j TCPMSS --clamp-mss-to-pmtu
 COMMIT
 # Completed on Tue Jun  7 22:11:30 2011

The main suspects would be NAT and TCPMSS. Did you also try whether
the crash occurs with only one of these rules?

 I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
 the address the way I was doing it, so that's a no-go for me.

That's really weird since you're apparently not using any bridge
netfilter features. It shouldn't have any effect besides changing
at which point ip_tables is invoked. How are your network devices
configured (specifically any bridges)?


Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* David Woodhouse (dw...@infradead.org) wrote:
 On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
  * David Woodhouse (dw...@infradead.org) wrote:
   On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
I think we still leak the list entry though.  Bottom line is that we
need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
   
   Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
   should figure out the matching DMAR unit directly from the ACPI table at
   ADD_DEVICE time, and store it in pdev->archdata.iommu.
   
   I saw patches which were going in that direction...
  
  Cool, where are they?  I'm working on something similar, and missed them.
 
 [PATCH] pci, dmar: Update dmar units devices list during hotplug

Oh yeah, thanks for the reminder.

thanks,
-chris


Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Alex Williamson
On Tue, 2011-06-07 at 16:33 +0100, David Woodhouse wrote:
 On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
  * David Woodhouse (dw...@infradead.org) wrote:
   On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
I think we still leak the list entry though.  Bottom line is that we
need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
   
   Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
   should figure out the matching DMAR unit directly from the ACPI table at
   ADD_DEVICE time, and store it in pdev->archdata.iommu.
   
   I saw patches which were going in that direction...
  
  Cool, where are they?  I'm working on something similar, and missed them.
 
 [PATCH] pci, dmar: Update dmar units devices list during hotplug
 
 Alex was working on it.

Nope, I had a wip patch that did an on-the-fly lookup, that I handed off
to Yinghai, but it didn't actually work.  That's when the suggestion was
made to do it at hotplug, but I'm not pursuing that right now, maybe
Yinghai is?  Thanks,

Alex




Re: [PATCHv2 RFC 4/4] Revert virtio: make add_buf return capacity remaining:

2011-06-07 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 06:43:25PM +0300, Michael S. Tsirkin wrote:
 This reverts commit 3c1b27d5043086a485f8526353ae9fe37bfa1065.
 The only user was virtio_net, and it switched to
 min_capacity instead.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

It turns out another place in virtio_net: receive
buf processing - relies on the old behaviour:

try_fill_recv:
        do {
                if (vi->mergeable_rx_bufs)
                        err = add_recvbuf_mergeable(vi, gfp);
                else if (vi->big_packets)
                        err = add_recvbuf_big(vi, gfp);
                else
                        err = add_recvbuf_small(vi, gfp);

                oom = err == -ENOMEM;
                if (err < 0)
                        break;
                ++vi->num;
        } while (err > 0);

The point is to avoid allocating a buf if
the ring is out of space and we are sure
add_buf will fail.

It works well for mergeable buffers and for big
packets if we are not OOM. small packets and
oom will do extra get_page/put_page calls
(but maybe we don't care).

So this is RX, I intend to drop it from this patchset and focus on the
TX side for starters.
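
To see why a fill loop of the shape "do { add; } while (err > 0)" depends on the return convention, here is a toy model. It is only a sketch: toy_ring and both add functions are hypothetical and merely imitate the two conventions (remaining capacity vs. 0 on success), not the real virtio API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ring {
    int num_free; /* free descriptors left */
};

/* Old convention: return remaining capacity, or a negative error. */
static int add_buf_capacity(struct toy_ring *r)
{
    assert(r != NULL);
    if (r->num_free == 0)
        return -ENOSPC;
    return --r->num_free;
}

/* Convention after the revert: return 0 on success, or a negative error. */
static int add_buf_zero(struct toy_ring *r)
{
    assert(r != NULL);
    if (r->num_free == 0)
        return -ENOSPC;
    --r->num_free;
    return 0;
}

/* Same loop shape as the receive fill path; returns buffers added. */
static int fill(struct toy_ring *r, int (*add)(struct toy_ring *))
{
    int err, num = 0;

    do {
        err = add(r);
        if (err < 0)
            break;
        ++num;
    } while (err > 0);
    return num;
}
```

Starting from four free slots, the old convention fills the ring completely and stops without a failed add, while the 0-on-success convention leaves the loop after a single buffer.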

 ---
  drivers/virtio/virtio_ring.c |2 +-
  include/linux/virtio.h   |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 23422f1..a6c21eb 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -233,7 +233,7 @@ add_head:
   pr_debug("Added buffer head %i to %p\n", head, vq);
   END_USE(vq);
  
 - return vq->num_free;
 + return 0;
  }
  EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
  
 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index 209220d..63c4908 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -34,7 +34,7 @@ struct virtqueue {
   *   in_num: the number of sg which are writable (after readable ones)
   *   data: the token identifying the buffer.
   *   gfp: how to do memory allocations (if necessary).
 - *  Returns remaining capacity of queue (sg segments) or a negative 
 error.
 + *  Returns 0 on success or a negative error.
   * virtqueue_kick: update after add_buf
   *   vq: the struct virtqueue
   *   After one or more add_buf calls, invoke this to kick the other side.
 -- 
 1.7.5.53.gc233e


Re: [PATCHv2 RFC 3/4] virtio_net: limit xmit polling

2011-06-07 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 06:43:17PM +0300, Michael S. Tsirkin wrote:
 Current code might introduce a lot of latency variation
 if there are many pending bufs at the time we
 attempt to transmit a new one. This is bad for
 real-time applications and can't be good for TCP either.
 
 Free up just enough to both clean up all buffers
 eventually and to be able to xmit the next packet.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com


I've been testing this patch and it seems to work fine
so far. The following fixups are needed to make it
build though:


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b25db1c..77cdf34 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -529,11 +529,8 @@ static bool free_old_xmit_skb(struct virtnet_info *vi)
  * virtqueue_add_buf will succeed. */
 static bool free_xmit_capacity(struct virtnet_info *vi)
 {
-   struct sk_buff *skb;
-   unsigned int len;
-
while (virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2)
-   if (unlikely(!free_old_xmit_skb))
+   if (unlikely(!free_old_xmit_skb(vi)))
return false;
return true;
 }
@@ -628,7 +625,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 * Doing this after kick means there's a chance we'll free
 * the skb we have just sent, which is hot in cache. */
for (i = 0; i < 2; i++)
-   free_old_xmit_skb(v);
+   free_old_xmit_skb(vi);
 
if (likely(free_xmit_capacity(vi)))
return NETDEV_TX_OK;
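
The "free just enough" idea behind free_xmit_capacity can be shown with a toy queue. This is a sketch under simplifying assumptions (each reclaimed buffer frees exactly one descriptor; toy_txq and both functions are invented names), not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_txq {
    int capacity;  /* free descriptors */
    int completed; /* buffers the device has finished with */
};

/* Reclaims one completed buffer; false if there is nothing to reclaim. */
static bool free_one_completed(struct toy_txq *q)
{
    assert(q != NULL);
    if (q->completed == 0)
        return false;
    --q->completed;
    ++q->capacity;
    return true;
}

/* Reclaims only as many buffers as needed to reach 'needed' descriptors. */
static bool free_capacity(struct toy_txq *q, int needed)
{
    while (q->capacity < needed)
        if (!free_one_completed(q))
            return false;
    return true;
}
```

Note that in C a bare function name in a condition evaluates to a non-null pointer, so writing !free_one_completed without the call parentheses would always be false and the loop would never give up; this is why the call in the fixup above matters.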


Re: [PATCHv2 RFC 0/4] virtio and vhost-net capacity handling

2011-06-07 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 06:42:35PM +0300, Michael S. Tsirkin wrote:
 OK, here's a new attempt to use the new capacity api.  I also added more
 comments to clarify the logic.  Hope this is more readable.  Let me know
 pls.
 
 This is on top of the patches applied by Rusty.
 
 Warning: untested. Posting now to give people chance to
 comment on the API.

OK, this seems to have survived some testing so far,
after I dropped patch 4 and fixed build for patch 3
(build fixup patch sent in reply to the original).

I'll be mostly offline until Sunday, would appreciate
testing reports.

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
virtio-net-xmit-polling-v2
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git
virtio-net-event-idx-v3

Thanks!

 Changes from v1:
 - fix comment in patch 2 to correct confusion noted by Rusty
 - rewrite patch 3 along the lines suggested by Rusty
   note: it's not exactly the same but I hope it's close
   enough, the main difference is that mine does limited
   polling even in the unlikely xmit failure case.
 - added a patch to not return capacity from add_buf
   it always looked like a weird hack
 
 Michael S. Tsirkin (4):
   virtio_ring: add capacity check API
   virtio_net: fix tx capacity checks using new API
   virtio_net: limit xmit polling
   Revert virtio: make add_buf return capacity remaining:
 
  drivers/net/virtio_net.c |  111 
 ++
  drivers/virtio/virtio_ring.c |   10 +++-
  include/linux/virtio.h   |7 ++-
  3 files changed, 84 insertions(+), 44 deletions(-)
 
 -- 
 1.7.5.53.gc233e


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Bart De Schuymer

On 7/06/2011 16:40, Brad Campbell wrote:

On 07/06/11 21:30, Patrick McHardy wrote:

On 07.06.2011 05:33, Brad Campbell wrote:

On 07/06/11 04:10, Bart De Schuymer wrote:

Hi Brad,

This has probably nothing to do with ebtables, so please rmmod in case
it's loaded.
A few questions I didn't directly see an answer to in the threads I
scanned...
I'm assuming you actually use the bridging firewall functionality. So,
what iptables modules do you use? Can you reduce your iptables 
rules to

a core that triggers the bug?
Or does it get triggered even with an empty set of firewall rules?
Are you using a stock .35 kernel or is it patched?
Is this something I can trigger on a poor guy's laptop or does it
require specialized hardware (I'm catching up on qemu/kvm...)?


Not specialised hardware as such, I've just not been able to reproduce
it outside of this specific operating scenario.


The last similar problem we've had was related to the 32/64 bit compat
code. Are you running 32 bit userspace on a 64 bit kernel?


No, 32 bit Guest OS, but a completely 64 bit userspace on a 64 bit 
kernel.


Userspace is current Debian Stable. Kernel is Vanilla and qemu-kvm is 
current git


If the bug is easily triggered with your guest os, then you could try to 
capture the traffic with wireshark (or something else) in a 
configuration that doesn't crash your system. Save the traffic in a pcap 
file. Then you can see if resending that traffic in the vulnerable 
configuration triggers the bug (I don't know if something in Windows 
exists, but tcpreplay should work for Linux). Once you have such a 
capture, chances are the bug is even easily reproducible by us (unless 
it's hardware-specific). Success isn't guaranteed, but I think it's 
worth a shot...


cheers,
Bart


--
Bart De Schuymer
www.artinalgorithms.be



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Eric Dumazet
On Tuesday 07 June 2011 at 17:35 +0200, Patrick McHardy wrote:

 The main suspects would be NAT and TCPMSS. Did you also try whether
 the crash occurs with only one of these rules?
 
  I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
  the address the way I was doing it, so that's a no-go for me.
 
 That's really weird since you're apparently not using any bridge
 netfilter features. It shouldn't have any effect besides changing
 at which point ip_tables is invoked. How are your network devices
 configured (specifically any bridges)?

Something in the kernel does 

u16 *ptr = addr (given by kmalloc())

ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.

I checked arch/x86/lib/memmove_64.S and it seems fine.
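
As an illustration of the kind of bug described above (not a claim about where the actual kernel bug is), a copy loop whose start index is off by one will write to the 16-bit slot just before the buffer. The sketch below is hypothetical user-space C; shift_left_buggy(), shift_left_fixed() and guard_clobbered() are made-up names, and the payload sits inside a larger array so that the stray write is well defined and can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Move every element one slot toward the front of the buffer. */
static void shift_left_buggy(uint16_t *buf, size_t n)
{
	ptrdiff_t i;

	/* BUG: starting at 0 makes the first iteration write buf[-1]. */
	for (i = 0; i < (ptrdiff_t)n; i++)
		buf[i - 1] = buf[i];
}

static void shift_left_fixed(uint16_t *buf, size_t n)
{
	ptrdiff_t i;

	/* Starting at 1 keeps every write inside buf[0..n-2]. */
	for (i = 1; i < (ptrdiff_t)n; i++)
		buf[i - 1] = buf[i];
}

/* Returns 1 if the guard word in front of the payload was overwritten. */
static int guard_clobbered(void (*shift)(uint16_t *, size_t))
{
	/* backing[0] is the guard; the payload starts at backing[1]. */
	uint16_t backing[5] = { 0xBEEF, 0, 2, 3, 4 };

	shift(&backing[1], 4);
	return backing[0] != 0xBEEF;
}
```

With the payload starting with a zero, the buggy variant has exactly the effect of ptr[-1] = 0 on the word in front of the allocation.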





Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it

2011-06-07 Thread Alex Williamson
On Tue, 2011-06-07 at 10:14 +0200, Jan Kiszka wrote:
 On 2011-06-07 10:06, Avi Kivity wrote:
  On 06/07/2011 01:04 AM, Jan Kiszka wrote:
  On 2011-06-06 23:48, Alex Williamson wrote:
On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
From: Jan Kiszka jan.kis...@siemens.com
  
At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
reset on an assigned device and corrupt its config space. Prevent
this by checking for a host kernel with the required support,
  tagged by
the to-be-introduced KVM_CAP_DEVICE_RESET.
  
Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I
  guess
we don't have an option to do that for .38 since stable is done there,
but there are also some intel-iommu breakages that won't make
  stable for
that release.  It seems like the userspace invoked reset resolves
  known,
demonstrable issues of devices continuing to DMA into guest memory
  while
ed78661f is mostly a theoretical change.
 
  Easier would be this patch. But I don't mind reverting the problematic
  commit in 39, whatever is preferred. We should just resolve the issue
  finally.
  
  Kernel problems should be solved in the kernel (with exceptions of
  course, but don't see the need here).
 
 Then please file a revert for stable ASAP.

How's this?  For stable only, of course.  Thanks,

Alex

Revert "KVM: Save/restore state of assigned PCI device"

From: Alex Williamson alex.william...@redhat.com

This reverts ed78661f2614d3c9f69c23e280db3bafdabdf5bb as it assumes
the saved PCI state will remain valid for the entire length of time
that it is attached to a guest.  This fails when userspace makes use
of the pci-sysfs reset interface, which invalidates the saved device
state, leaving nothing to be restored after the device is reset on
de-assignment.  This leaves the device in an unusable state.

3.0.0 will add an interface for KVM to save the PCI state in a
buffer unaffected by other callers of pci_reset_function(), but the
most appropriate stable fix seems to be reverting this change since
the original assumption about the device saved state persisting is
incorrect.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 virt/kvm/assigned-dev.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)


diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ae72ae6..e3f1235 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,8 +197,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
 	kvm_free_assigned_irq(kvm, assigned_dev);
 
-	__pci_reset_function(assigned_dev->dev);
-	pci_restore_state(assigned_dev->dev);
+	pci_reset_function(assigned_dev->dev);
 
 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
@@ -515,7 +514,6 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	}
 
 	pci_reset_function(dev);
-	pci_save_state(dev);
 
 	match->assigned_dev_id = assigned_dev->assigned_dev_id;
 	match->host_segnr = assigned_dev->segnr;
@@ -546,7 +544,6 @@ out:
 	mutex_unlock(&kvm->lock);
 	return r;
 out_list_del:
-	pci_restore_state(dev);
 	list_del(&match->list);
 	pci_release_regions(dev);
 out_disable:
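
The failure mode described in the commit message can be modelled in a few lines of user-space C. This is a sketch under assumptions, not kernel code: struct fake_dev and the fake_* helpers are hypothetical, and the rule that a restore consumes the single saved-state slot is my reading of "invalidates the saved device state" above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: one live config value, one saved-state slot. */
struct fake_dev {
	int config;
	int saved_config;
	bool state_saved;
};

static void fake_save_state(struct fake_dev *d)
{
	d->saved_config = d->config;
	d->state_saved = true;
}

/* Restoring consumes the slot; without a saved state it does nothing. */
static void fake_restore_state(struct fake_dev *d)
{
	if (!d->state_saved)
		return;
	d->config = d->saved_config;
	d->state_saved = false;
}

/* The bare reset wipes the live config. */
static void fake_raw_reset(struct fake_dev *d)
{
	d->config = 0;
}

/* Self-contained reset: save, reset, restore in one call. */
static void fake_reset_function(struct fake_dev *d)
{
	fake_save_state(d);
	fake_raw_reset(d);
	fake_restore_state(d);
}

/* Pattern being reverted: save at assignment, restore at de-assignment. */
static int deassign_with_cached_state(bool userspace_reset_in_between)
{
	struct fake_dev d = { .config = 42 };

	fake_reset_function(&d);
	fake_save_state(&d);             /* assignment */
	if (userspace_reset_in_between)
		fake_reset_function(&d); /* e.g. reset via sysfs */
	fake_raw_reset(&d);              /* de-assignment */
	fake_restore_state(&d);
	return d.config;
}

/* Pattern after the revert: each reset saves and restores on its own. */
static int deassign_with_plain_reset(bool userspace_reset_in_between)
{
	struct fake_dev d = { .config = 42 };

	fake_reset_function(&d);         /* assignment */
	if (userspace_reset_in_between)
		fake_reset_function(&d);
	fake_reset_function(&d);         /* de-assignment */
	return d.config;
}
```

In this model the cached-state variant ends with a wiped config (0) as soon as another reset happens while the device is assigned, whereas the plain-reset variant always ends with the original value (42).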





Re: [Qemu-devel] [PATCH v3] Add an isa device for SGA

2011-06-07 Thread Anthony Liguori

On 05/16/2011 01:45 PM, Glauber Costa wrote:

This patch adds a dummy legacy ISA device whose responsibility is to
deploy sgabios, an option rom for a serial graphics adapter.
The proposal is that this device is always-on when -nographics,
but can otherwise be enabled in any setup when -device sga is used.

[v2: suggestions on qdev by Markus ]
[v3: cleanups and documentation, per list suggestions ]

Signed-off-by: Glauber Costa glom...@redhat.com


Applied.  But I'd like to figure out what to do about sgabios.bin.  I 
think we should ship a copy.


Regards,

Anthony Liguori


---
  Makefile.target |2 +-
  hw/pc.c |9 
  hw/sga.c|   56 +++
  3 files changed, 66 insertions(+), 1 deletions(-)
  create mode 100644 hw/sga.c

diff --git a/Makefile.target b/Makefile.target
index fdbdc6c..004ea7e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -224,7 +224,7 @@ obj-$(CONFIG_KVM) += ivshmem.o
  # Hardware support
  obj-i386-y += vga.o
  obj-i386-y += mc146818rtc.o i8259.o pc.o
-obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
  obj-i386-y += vmport.o
  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
  obj-i386-y += extboot.o
diff --git a/hw/pc.c b/hw/pc.c
index 8d351ba..5a8e00a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1096,6 +1096,15 @@ void pc_vga_init(PCIBus *pci_bus)
  isa_vga_init();
  }
  }
+
+/*
+ * sga does not suppress normal vga output. So a machine can have both a
+ * vga card and sga manually enabled. Output will be seen on both.
+ * For nographic case, sga is enabled at all times
+ */
+if (display_type == DT_NOGRAPHIC) {
+        isa_create_simple("sga");
+}
  }

  static void cpu_request_exit(void *opaque, int irq, int level)
diff --git a/hw/sga.c b/hw/sga.c
new file mode 100644
index 000..7ef750a
--- /dev/null
+++ b/hw/sga.c
@@ -0,0 +1,56 @@
+/*
+ * QEMU dummy ISA device for loading sgabios option rom.
+ *
+ * Copyright (c) 2011 Glauber Costa, Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * sgabios code originally available at code.google.com/p/sgabios
+ *
+ */
+#include "pci.h"
+#include "pc.h"
+#include "loader.h"
+#include "sysemu.h"
+
+#define SGABIOS_FILENAME "sgabios.bin"
+
+typedef struct ISAGAState {
+    ISADevice dev;
+} ISASGAState;
+
+static int isa_cirrus_vga_initfn(ISADevice *dev)
+{
+    rom_add_vga(SGABIOS_FILENAME);
+    return 0;
+}
+
+static ISADeviceInfo sga_info = {
+    .qdev.name    = "sga",
+    .qdev.desc    = "Serial Graphics Adapter",
+    .qdev.size    = sizeof(ISASGAState),
+    .init         = isa_cirrus_vga_initfn,
+};
+
+static void sga_register(void)
+{
+      isa_qdev_register(&sga_info);
+}
+
+device_init(sga_register);




[patch 4/5] kvm tools: Get rid of spaces in ld script

2011-06-07 Thread Cyrill Gorcunov
Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/bios/rom.ld.S |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.git/tools/kvm/bios/rom.ld.S
===
--- linux-2.6.git.orig/tools/kvm/bios/rom.ld.S
+++ linux-2.6.git/tools/kvm/bios/rom.ld.S
@@ -11,7 +11,7 @@ PHDRS {
 }
 
 SECTIONS {
-   . = 0;
-   .text : { *(.text) } :text = 0x9090
+   . = 0;
+   .text : { *(.text) } :text = 0x9090
 }
 



[patch 5/5] kvm tools: Reform bios make rules

2011-06-07 Thread Cyrill Gorcunov
Put bios code into bios.s and adjust makefile
rules accordingly. It's more natural than bios-rom.S
(which is now simply a container over real bios code).

Also improve bios deps in Makefile.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/Makefile|   29 +++-
 tools/kvm/bios/bios-rom.S |   95 +++---
 tools/kvm/bios/bios.S |   95 ++
 tools/kvm/bios/gen-offsets.sh |3 -
 4 files changed, 115 insertions(+), 107 deletions(-)

Index: linux-2.6.git/tools/kvm/Makefile
===
--- linux-2.6.git.orig/tools/kvm/Makefile
+++ linux-2.6.git/tools/kvm/Makefile
@@ -82,7 +82,7 @@ DEPS  := $(patsubst %.o,%.d,$(OBJS))
 
 # Exclude BIOS object files from header dependencies.
 OBJS   += bios.o
-OBJS   += bios/bios.o
+OBJS   += bios/bios-rom.o
 
 LIBS   += -lrt
 LIBS   += -lpthread
@@ -165,20 +165,27 @@ BIOS_CFLAGS += -m32
 BIOS_CFLAGS += -march=i386
 BIOS_CFLAGS += -mregparm=3
 
-bios.o: bios/bios-rom.bin
-bios/bios.o: bios/bios.S bios/bios-rom.bin
-   $(E)   CC   $@
-   $(Q) $(CC) -c $(CFLAGS) bios/bios.S -o bios/bios.o
-   
-bios/bios-rom.bin: bios/bios-rom.S bios/e820.c
-   $(E)   CC   $@
+bios.o: bios/bios.bin bios/bios-rom.h
+
+bios/bios.bin.elf: bios/bios.S bios/e820.c bios/int10.c bios/rom.ld.S
+   $(E)   CC   bios/e820.o
$(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s 
bios/e820.c -o bios/e820.o
+   $(E)   CC   bios/int10.o
$(Q) $(CC) -include code16gcc.h $(CFLAGS) $(BIOS_CFLAGS) -c -s 
bios/int10.c -o bios/int10.o
-   $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios-rom.S -o 
bios/bios-rom.o
+   $(E)   CC   bios/bios.o
+   $(Q) $(CC) $(CFLAGS) $(BIOS_CFLAGS) -c -s bios/bios.S -o bios/bios.o
$(E)   LD   $@
-   $(Q) ld -T bios/rom.ld.S -o bios/bios-rom.bin.elf bios/bios-rom.o 
bios/e820.o bios/int10.o
+   $(Q) ld -T bios/rom.ld.S -o bios/bios.bin.elf bios/bios.o bios/e820.o 
bios/int10.o
+
+bios/bios.bin: bios/bios.bin.elf
$(E)   OBJCOPY  $@
-   $(Q) objcopy -O binary -j .text bios/bios-rom.bin.elf bios/bios-rom.bin
+   $(Q) objcopy -O binary -j .text bios/bios.bin.elf bios/bios.bin
+
+bios/bios-rom.o: bios/bios-rom.S bios/bios.bin bios/bios-rom.h
+   $(E)   CC   $@
+   $(Q) $(CC) -c $(CFLAGS) bios/bios-rom.S -o bios/bios-rom.o
+
+bios/bios-rom.h: bios/bios.bin.elf
$(E)   NM   $@
 	$(Q) cd bios && sh gen-offsets.sh > bios-rom.h && cd ..
 
Index: linux-2.6.git/tools/kvm/bios/bios-rom.S
===
--- linux-2.6.git.orig/tools/kvm/bios/bios-rom.S
+++ linux-2.6.git/tools/kvm/bios/bios-rom.S
@@ -1,89 +1,12 @@
-/*
- * Our pretty trivial BIOS emulation
- */
-
-#include kvm/bios.h
 #include kvm/assembly.h
 
.org 0
-   .code16gcc
-
-#include macro.S
-
-/*
- * fake interrupt handler, nothing can be faster ever
- */
-ENTRY(bios_intfake)
-   IRET
-ENTRY_END(bios_intfake)
-
-/*
- * int 10 - video - service
- */
-ENTRY(bios_int10)
-   pushw   %fs
-   pushl   %es
-   pushl   %edi
-   pushl   %esi
-   pushl   %ebp
-   pushl   %esp
-   pushl   %edx
-   pushl   %ecx
-   pushl   %ebx
-   pushl   %eax
-
-   movl%esp, %eax
-   /* this is way easier than doing it in assembly */
-   /* just push all the regs and jump to a C handler */
-   callint10_handler
-
-   popl%eax
-   popl%ebx
-   popl%ecx
-   popl%edx
-   popl%esp
-   popl%ebp
-   popl%esi
-   popl%edi
-   popl%es
-   popw%fs
-
-   IRET
-ENTRY_END(bios_int10)
-
-#define EFLAGS_CF		(1 << 0)
-
-ENTRY(bios_int15)
-   cmp $0xE820, %eax
-   jne 1f
-
-   pushw   %fs
-
-   pushl   %edx
-   pushl   %ecx
-   pushl   %edi
-   pushl   %ebx
-   pushl   %eax
-
-   movl%esp, %eax  # it's bioscall case
-   calle820_query_map
-
-   popl%eax
-   popl%ebx
-   popl%edi
-   popl%ecx
-   popl%edx
-
-   popw%fs
-
-   /* Clear CF */
-   andl$~EFLAGS_CF, 0x4(%esp)
-1:
-   IRET
-ENTRY_END(bios_int15)
-
-GLOBAL(__locals)
-
-#include local.S
-
-END(__locals)
+#ifdef CONFIG_X86_64
+   .code64
+#else
+   .code32
+#endif
+
+GLOBAL(bios_rom)
+   .incbin bios/bios.bin
+END(bios_rom)
Index: linux-2.6.git/tools/kvm/bios/bios.S
===
--- linux-2.6.git.orig/tools/kvm/bios/bios.S
+++ linux-2.6.git/tools/kvm/bios/bios.S
@@ -1,12 +1,89 @@
+/*
+ * Our pretty trivial BIOS emulation
+ */
+
+#include kvm/bios.h
 #include kvm/assembly.h
 
.org 0
-#ifdef CONFIG_X86_64
-   .code64
-#else
-   .code32
-#endif
-
-GLOBAL(bios_rom)
-   .incbin bios/bios-rom.bin

[patch 1/5] kvm tools: Options parser to handle hex numbers

2011-06-07 Thread Cyrill Gorcunov
Some kernel parameters are more convenient to pass in
hex form, so our options parser should handle that
form of input as well.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/util/parse-options.c |  102 -
 1 file changed, 82 insertions(+), 20 deletions(-)

Index: linux-2.6.git/tools/kvm/util/parse-options.c
===
--- linux-2.6.git.orig/tools/kvm/util/parse-options.c
+++ linux-2.6.git/tools/kvm/util/parse-options.c
@@ -39,6 +39,84 @@ static int get_arg(struct parse_opt_ctx_
return 0;
 }
 
+#define numvalue(c)					\
+	((c) >= 'a' ? (c) - 'a' + 10 :			\
+	 (c) >= 'A' ? (c) - 'A' + 10 : (c) - '0')
+
+static u64 readhex(const char *str, bool *error)
+{
+	char *pos = strchr(str, 'x') + 1;
+	u64 res = 0;
+
+	while (*pos) {
+		unsigned int v = numvalue(*pos);
+		if (v > 16) {
+			*error = true;
+			return 0;
+		}
+
+		res = (res * 16) + v;
+		pos++;
+	}
+
+	*error = false;
+	return res;
+}
+
+static int readnum(const struct option *opt, int flags,
+		   const char *str, char **end)
+{
+	if (strchr(str, 'x')) {
+		bool error;
+		u64 value;
+
+		value = readhex(str, &error);
+		if (error)
+			goto enotnum;
+
+		switch (opt->type) {
+		case OPTION_INTEGER:
+			*(int *)opt->value = value;
+			break;
+		case OPTION_UINTEGER:
+			*(unsigned int *)opt->value = value;
+			break;
+		case OPTION_LONG:
+			*(long *)opt->value = value;
+			break;
+		case OPTION_U64:
+			*(u64 *)opt->value = value;
+			break;
+		default:
+			goto invcall;
+		}
+	} else {
+		switch (opt->type) {
+		case OPTION_INTEGER:
+			*(int *)opt->value = strtol(str, end, 10);
+			break;
+		case OPTION_UINTEGER:
+			*(unsigned int *)opt->value = strtol(str, end, 10);
+			break;
+		case OPTION_LONG:
+			*(long *)opt->value = strtol(str, end, 10);
+			break;
+		case OPTION_U64:
+			*(u64 *)opt->value = strtoull(str, end, 10);
+			break;
+		default:
+			goto invcall;
+		}
+	}
+
+	return 0;
+
+enotnum:
+	return opterror(opt, "expects a numerical value", flags);
+invcall:
+	return opterror(opt, "invalid numeric conversion", flags);
+}
+
 static int get_value(struct parse_opt_ctx_t *p,
const struct option *opt, int flags)
 {
@@ -131,11 +209,7 @@ static int get_value(struct parse_opt_ct
 		}
 		if (get_arg(p, opt, flags, &arg))
 			return -1;
-		*(int *)opt->value = strtol(arg, (char **)&s, 10);
-		if (*s)
-			return opterror(opt, "expects a numerical value",
-					flags);
-		return 0;
+		return readnum(opt, flags, arg, (char **)&s);
 
 	case OPTION_UINTEGER:
 		if (unset) {
@@ -148,11 +222,7 @@ static int get_value(struct parse_opt_ct
 		}
 		if (get_arg(p, opt, flags, &arg))
 			return -1;
-		*(unsigned int *)opt->value = strtol(arg, (char **)&s, 10);
-		if (*s)
-			return opterror(opt,
-					"expects a numerical value", flags);
-		return 0;
+		return readnum(opt, flags, arg, (char **)&s);
 
 	case OPTION_LONG:
 		if (unset) {
@@ -165,11 +235,7 @@ static int get_value(struct parse_opt_ct
 		}
 		if (get_arg(p, opt, flags, &arg))
 			return -1;
-		*(long *)opt->value = strtol(arg, (char **)&s, 10);
-		if (*s)
-			return opterror(opt,
-					"expects a numerical value", flags);
-		return 0;
+		return readnum(opt, flags, arg, (char **)&s);
 
 	case OPTION_U64:
 		if (unset) {
@@ -182,11 +248,7 @@ static int get_value(struct parse_opt_ct
 		}
 		if (get_arg(p, opt, flags, &arg))
 			return -1;
-		*(u64 *)opt->value = strtoull(arg, (char **)&s, 10);
-		if (*s)
-			return opterror(opt,
-   
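
For comparison, a stricter standalone hex reader might look like the sketch below. It is not the patch's readhex(): hex_digit() and parse_hex() are hypothetical names, and the extra rules (mandatory 0x prefix, explicit per-digit range check, overflow rejection) are my assumptions about what a robust version would want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map one character to its hex digit value, or -1 if it is not one. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse a string of the form 0x... or 0X... into a 64-bit value.
 * On malformed input, *error is set to true and 0 is returned.
 */
static uint64_t parse_hex(const char *str, bool *error)
{
	uint64_t res = 0;
	const char *pos;

	*error = true;
	if (str[0] != '0' || (str[1] != 'x' && str[1] != 'X') ||
	    str[2] == '\0')
		return 0;

	for (pos = str + 2; *pos; pos++) {
		int v = hex_digit(*pos);

		if (v < 0)
			return 0;
		if (res >> 60) /* another digit would not fit in 64 bits */
			return 0;
		res = (res << 4) | (uint64_t)v;
	}

	*error = false;
	return res;
}
```

Checking each character against the three valid ranges avoids relying on an arithmetic mapping of arbitrary characters, so input such as 0xg is rejected rather than converted.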

[patch 3/5] kvm tools: Delete dangling cursor from int10

2011-06-07 Thread Cyrill Gorcunov
No one uses it anymore. Also clean up the comment on
int10, since the int10_handler routine does all
the hard work.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/bios/bios-rom.S |   14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

Index: linux-2.6.git/tools/kvm/bios/bios-rom.S
===
--- linux-2.6.git.orig/tools/kvm/bios/bios-rom.S
+++ linux-2.6.git/tools/kvm/bios/bios-rom.S
@@ -18,13 +18,7 @@ ENTRY(bios_intfake)
 ENTRY_END(bios_intfake)
 
 /*
- * int 10 - video - write character and advance cursor (tty write)
- * ah = 0eh
- * al = character
- * bh = display page (alpha modes)
- * bl = foreground color (graphics modes)
- *
- * We ignore bx settings
+ * int 10 - video - service
  */
 ENTRY(bios_int10)
pushw   %fs
@@ -55,12 +49,6 @@ ENTRY(bios_int10)
popw%fs
 
IRET
-
-
-/*
- * private IRQ data
- */
-cursor:.long 0
 ENTRY_END(bios_int10)
 
 #define EFLAGS_CF		(1 << 0)



[patch 2/5] kvm tools: Introduce vidmode parameter

2011-06-07 Thread Cyrill Gorcunov
Usually this might be set by the loader, but since
we're the loader, let's allow specifying the VESA
mode as well.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/kvm-run.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

Index: linux-2.6.git/tools/kvm/kvm-run.c
===
--- linux-2.6.git.orig/tools/kvm/kvm-run.c
+++ linux-2.6.git/tools/kvm/kvm-run.c
@@ -80,6 +80,7 @@ extern int  active_console;
 bool do_debug_print = false;
 
 static int nrcpus;
+static int vidmode = 0x312;
 
 static const char * const run_usage[] = {
kvm run [options] [kernel image],
@@ -139,6 +140,10 @@ static const struct option options[] = {
 	OPT_STRING('\0', "tapscript", &script, "Script path",
 		     "Assign a script to process created tap device"),
 
+	OPT_GROUP("BIOS options:"),
+	OPT_INTEGER('\0', "vidmode", &vidmode,
+		    "Video mode"),
+
 	OPT_GROUP("Debug options:"),
 	OPT_BOOLEAN('\0', "debug", &do_debug_print,
 			"Enable debug messages"),
@@ -434,7 +439,6 @@ int kvm_cmd_run(int argc, const char **a
struct framebuffer *fb = NULL;
unsigned int nr_online_cpus;
int exit_code = 0;
-   u16 vidmode = 0;
int max_cpus;
char *hi;
int i;
@@ -541,12 +545,10 @@ int kvm_cmd_run(int argc, const char **a
 
 	memset(real_cmdline, 0, sizeof(real_cmdline));
 	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1");
-	if (vnc || sdl) {
+	if (vnc || sdl)
 		strcat(real_cmdline, " video=vesafb console=tty0");
-		vidmode = 0x312;
-	} else {
+	else
 		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial");
-	}
 	strcat(real_cmdline, " ");
 	if (kernel_cmdline)
 		strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));



[patch 0/5] kvm tools: A few fixes

2011-06-07 Thread Cyrill Gorcunov
Nothing serious, please review. Thanks.

Cyrill


Re: [patch 2/5] kvm tools: Introduce vidmode parameter

2011-06-07 Thread Pekka Enberg

On Tue, 7 Jun 2011, Cyrill Gorcunov wrote:

Usually this might be set by loader but since
we're the loader lets allow to specify vesa
mode as well.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com


This patch causes 'make check' to go crazy and print out a bunch of these:

Warning: Ignoring MMIO write at d0031f40 (length 4)
Warning: Ignoring MMIO write at d0031f44 (length 4)
Warning: Ignoring MMIO write at d0031f48 (length 4)
Warning: Ignoring MMIO write at d0031f4c (length 4)
Warning: Ignoring MMIO write at d0031f50 (length 4)
Warning: Ignoring MMIO write at d0031f54 (length 4)
Warning: Ignoring MMIO write at d0031f58 (length 4)
Warning: Ignoring MMIO write at d0031f5c (length 4)
Warning: Ignoring MMIO write at d0031f60 (length 4)
Warning: Ignoring MMIO write at d0031f64 (length 4)
Warning: Ignoring MMIO write at d0031f68 (length 4)
Warning: Ignoring MMIO write at d0031f6c (length 4)
Warning: Ignoring MMIO write at d0031f70 (length 4)
Warning: Ignoring MMIO write at d0031f74 (length 4)
Warning: Ignoring MMIO write at d0031f78 (length 4)
Warning: Ignoring MMIO write at d0031f7c (length 4)



Re: [patch 2/5] kvm tools: Introduce vidmode parameter

2011-06-07 Thread Cyrill Gorcunov
On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote:
 On Tue, 7 Jun 2011, Cyrill Gorcunov wrote:
 Usually this might be set by loader but since
 we're the loader lets allow to specify vesa
 mode as well.
 
 Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
 
 This patch causes 'make check' to go crazy and print out bunch of these:
 
 Warning: Ignoring MMIO write at d0031f40 (length 4)
 

Hmm, weird...

Cyrill


Re: [patch 2/5] kvm tools: Introduce vidmode parameter

2011-06-07 Thread Cyrill Gorcunov
On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote:
 On Tue, 7 Jun 2011, Cyrill Gorcunov wrote:
 Usually this might be set by loader but since
 we're the loader lets allow to specify vesa
 mode as well.
 
 Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
 
 This patch causes 'make check' to go crazy and print out bunch of these:


Pekka, are you sure it's because of _this_ particular patch?

Cyrill


Re: [patch 2/5] kvm tools: Introduce vidmode parameter

2011-06-07 Thread Cyrill Gorcunov
On Wed, Jun 08, 2011 at 12:10:30AM +0400, Cyrill Gorcunov wrote:
 On Tue, Jun 07, 2011 at 10:53:28PM +0300, Pekka Enberg wrote:
  On Tue, 7 Jun 2011, Cyrill Gorcunov wrote:
  Usually this might be set by loader but since
  we're the loader lets allow to specify vesa
  mode as well.
  
  Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
  
  This patch causes 'make check' to go crazy and print out bunch of these:
 
 
 Pekka, are you sure it's because of _this_ particular patch?
 
   Cyrill

This one should do the trick. I can't say I like it; we probably need
default values from the options parser, i.e. we need to extend it.

Cyrill
---
kvm tools: Introduce vidmode parameter v2

Usually this might be set by the loader, but since
we're the loader, let's allow specifying the VESA
mode as well.

v2: Pekka spotted the default value was being compromised,
so revert it back and set only if specified.

Signed-off-by: Cyrill Gorcunov gorcu...@gmail.com
---
 tools/kvm/kvm-run.c |   20 
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: linux-2.6.git/tools/kvm/kvm-run.c
===
--- linux-2.6.git.orig/tools/kvm/kvm-run.c
+++ linux-2.6.git/tools/kvm/kvm-run.c
@@ -80,6 +80,7 @@ extern int  active_console;
 bool do_debug_print = false;
 
 static int nrcpus;
+static int vidmode = -1;
 
 static const char * const run_usage[] = {
kvm run [options] [kernel image],
@@ -139,6 +140,10 @@ static const struct option options[] = {
 	OPT_STRING('\0', "tapscript", &script, "Script path",
 		     "Assign a script to process created tap device"),
 
+	OPT_GROUP("BIOS options:"),
+	OPT_INTEGER('\0', "vidmode", &vidmode,
+		    "Video mode"),
+
 	OPT_GROUP("Debug options:"),
 	OPT_BOOLEAN('\0', "debug", &do_debug_print,
 			"Enable debug messages"),
@@ -434,7 +439,6 @@ int kvm_cmd_run(int argc, const char **a
struct framebuffer *fb = NULL;
unsigned int nr_online_cpus;
int exit_code = 0;
-   u16 vidmode = 0;
int max_cpus;
char *hi;
int i;
@@ -539,14 +543,22 @@ int kvm_cmd_run(int argc, const char **a
 
 	kvm->nrcpus = nrcpus;
 
+	/*
+	 * vidmode should be either specified
+	 * or set by default
+	 */
+	if (vnc || sdl) {
+		if (vidmode == -1)
+			vidmode = 0x312;
+	} else
+		vidmode = 0;
+
 	memset(real_cmdline, 0, sizeof(real_cmdline));
 	strcpy(real_cmdline, "notsc noapic noacpi pci=conf1");
 	if (vnc || sdl) {
 		strcat(real_cmdline, " video=vesafb console=tty0");
-		vidmode = 0x312;
-	} else {
+	} else
 		strcat(real_cmdline, " console=ttyS0 earlyprintk=serial");
-	}
 	strcat(real_cmdline, " ");
 	if (kernel_cmdline)
 		strlcat(real_cmdline, kernel_cmdline, sizeof(real_cmdline));
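
The v2 logic (use a sentinel for "not given on the command line" and resolve it after parsing) can be isolated into a small helper. A minimal sketch, assuming hypothetical names VIDMODE_UNSET, VIDMODE_DEFAULT and resolve_vidmode(); the values -1, 0x312 and 0 are the ones used in the patch above.

```c
#include <assert.h>
#include <stdbool.h>

#define VIDMODE_UNSET   (-1)  /* sentinel: option was not passed */
#define VIDMODE_DEFAULT 0x312 /* default taken from the patch above */

/*
 * Decide the final video mode once option parsing is done.
 * requested: parsed option value, or VIDMODE_UNSET
 * graphical: true when a graphical frontend (vnc or sdl) is in use
 */
static int resolve_vidmode(int requested, bool graphical)
{
	if (!graphical)
		return 0;
	return requested == VIDMODE_UNSET ? VIDMODE_DEFAULT : requested;
}
```

Keeping the decision in one function makes the rule explicit: without a graphical frontend the mode is always 0, which is what the pre-patch code did implicitly.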


[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Alexander Graf
KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.

Signed-off-by: Alexander Graf ag...@suse.de
---
 kernel/compat.c |1 +
 virt/kvm/kvm_main.c |   50 +-
 2 files changed, 50 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..f03db82 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,8 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
   unsigned long arg);
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+ unsigned long arg);
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1587,9 @@ static int kvm_vcpu_release(struct inode *inode, struct 
file *filp)
 static struct file_operations kvm_vcpu_fops = {
.release= kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
-   .compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
.mmap   = kvm_vcpu_mmap,
.llseek = noop_llseek,
 };
@@ -1874,6 +1878,50 @@ out:
return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3] Add an isa device for SGA

2011-06-07 Thread Glauber Costa

On 06/07/2011 04:17 PM, Anthony Liguori wrote:

On 05/16/2011 01:45 PM, Glauber Costa wrote:

This patch adds a dummy legacy ISA device whose responsibility is to
deploy sgabios, an option rom for a serial graphics adapter.
The proposal is that this device is always-on when -nographics,
but can otherwise be enabled in any setup when -device sga is used.

[v2: suggestions on qdev by Markus ]
[v3: cleanups and documentation, per list suggestions ]

Signed-off-by: Glauber Costa glom...@redhat.com


Applied. But I'd like to figure out what to do about sgabios.bin. I
think we should ship a copy.


Agree.


Regards,

Anthony Liguori


---
Makefile.target | 2 +-
hw/pc.c | 9 
hw/sga.c | 56 +++
3 files changed, 66 insertions(+), 1 deletions(-)
create mode 100644 hw/sga.c

diff --git a/Makefile.target b/Makefile.target
index fdbdc6c..004ea7e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -224,7 +224,7 @@ obj-$(CONFIG_KVM) += ivshmem.o
# Hardware support
obj-i386-y += vga.o
obj-i386-y += mc146818rtc.o i8259.o pc.o
-obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
obj-i386-y += vmport.o
obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
obj-i386-y += extboot.o
diff --git a/hw/pc.c b/hw/pc.c
index 8d351ba..5a8e00a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1096,6 +1096,15 @@ void pc_vga_init(PCIBus *pci_bus)
             isa_vga_init();
         }
     }
+
+    /*
+     * sga does not suppress normal vga output. So a machine can have both a
+     * vga card and sga manually enabled. Output will be seen on both.
+     * For nographic case, sga is enabled at all times
+     */
+    if (display_type == DT_NOGRAPHIC) {
+        isa_create_simple("sga");
+    }
 }

static void cpu_request_exit(void *opaque, int irq, int level)
diff --git a/hw/sga.c b/hw/sga.c
new file mode 100644
index 0000000..7ef750a
--- /dev/null
+++ b/hw/sga.c
@@ -0,0 +1,56 @@
+/*
+ * QEMU dummy ISA device for loading sgabios option rom.
+ *
+ * Copyright (c) 2011 Glauber Costa, Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * sgabios code originally available at code.google.com/p/sgabios
+ *
+ */
+#include "pci.h"
+#include "pc.h"
+#include "loader.h"
+#include "sysemu.h"
+
+#define SGABIOS_FILENAME "sgabios.bin"
+
+typedef struct ISAGAState {
+    ISADevice dev;
+} ISASGAState;
+
+static int isa_cirrus_vga_initfn(ISADevice *dev)
+{
+    rom_add_vga(SGABIOS_FILENAME);
+    return 0;
+}
+
+static ISADeviceInfo sga_info = {
+    .qdev.name = "sga",
+    .qdev.desc = "Serial Graphics Adapter",
+    .qdev.size = sizeof(ISASGAState),
+    .init = isa_cirrus_vga_initfn,
+};
+
+static void sga_register(void)
+{
+    isa_qdev_register(&sga_info);
+}
+
+device_init(sga_register);






Re: [PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Arnd Bergmann
On Tuesday 07 June 2011 22:25:15 Alexander Graf wrote:
 +static long kvm_vcpu_compat_ioctl(struct file *filp,
 + unsigned int ioctl, unsigned long arg)
 +{
 +   struct kvm_vcpu *vcpu = filp->private_data;
 +   void __user *argp = (void __user *)arg;

Converting a compat user argument into a pointer should use the
compat_ptr() function to do the right thing on s390. Otherwise
your patch looks good.

Arnd
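
To illustrate the compat_ptr() remark, here is a minimal user-space sketch, not kernel code. All sketch_* names are hypothetical. It assumes only that a compat user pointer arrives as a 32-bit value and that s390 compat tasks use 31-bit addressing, so the top bit has to be masked off before the value is used as an address; a plain cast would keep that bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit compat user pointer. */
typedef uint32_t sketch_compat_uptr_t;

/* Generic model: the 32-bit value is simply zero-extended. */
static inline uint64_t sketch_compat_ptr_generic(sketch_compat_uptr_t uptr)
{
    return (uint64_t)uptr;
}

/* s390-style model: 31-bit addressing, so the top bit is dropped. */
static inline uint64_t sketch_compat_ptr_s390(sketch_compat_uptr_t uptr)
{
    return (uint64_t)(uptr & 0x7fffffffu);
}

/* Usage example: the same argument yields two different addresses. */
static inline void sketch_compat_ptr_demo(void)
{
    assert(sketch_compat_ptr_generic(0x80001000u) == 0x80001000ull);
    assert(sketch_compat_ptr_s390(0x80001000u) == 0x00001000ull);
}
```

The point of routing the conversion through one helper is that the architecture-specific rule lives in a single place instead of in every ioctl handler.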


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Patrick McHardy
On 07.06.2011 20:31, Eric Dumazet wrote:
 Le mardi 07 juin 2011 à 17:35 +0200, Patrick McHardy a écrit :
 
 The main suspects would be NAT and TCPMSS. Did you also try whether
 the crash occurs with only one of these rules?

 I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
 the address the way I was doing it, so that's a no-go for me.

 That's really weird since you're apparently not using any bridge
 netfilter features. It shouldn't have any effect besides changing
 at which point ip_tables is invoked. How are your network devices
 configured (specifically any bridges)?
 
 Something in the kernel does 
 
 u16 *ptr = addr (given by kmalloc())
 
 ptr[-1] = 0;
 
 Could be an off-by-one error in a memmove()/memcpy() or loop...
 
 I can't see a network issue here.

So far me neither, but netfilter appears to trigger the bug.

 I checked arch/x86/lib/memmove_64.S and it seems fine.

I was thinking it might be a missing skb_make_writable() combined
with vhost_net specifics in the netfilter code (TCPMSS and NAT are
both suspect), but was unable to find something. I also went
through the dst_metrics() conversion to see whether anything could
cause problems with the bridge fake_rttable, but also nothing
so far.


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Brad Campbell

On 08/06/11 02:04, Bart De Schuymer wrote:


If the bug is easily triggered with your guest os, then you could try to
capture the traffic with wireshark (or something else) in a
configuration that doesn't crash your system. Save the traffic in a pcap
file. Then you can see if resending that traffic in the vulnerable
configuration triggers the bug (I don't know if something in Windows
exists, but tcpreplay should work for Linux). Once you have such a
capture, chances are the bug is even easily reproducible by us (unless
it's hardware-specific). Success isn't guaranteed, but I think it's
worth a shot...


The issue with this is I don't have a configuration that does not crash 
the system. This only happens under the specific circumstance that 
traffic from VM A is being DNAT'd to VM B. If I disable 
CONFIG_BRIDGE_NETFILTER, or I leave out the DNAT then I can't replicate 
the problem as I don't seem to be able to get the packets to go where I 
want them to go.


Let me try and explain it a little more clearly with made up IP 
addresses to illustrate the problem.


I have VM A (1.1.1.2) and VM B (1.1.1.3) on br1 (1.1.1.1)
I have public IP on ppp0 (2.2.2.2).

VM B can talk to VM A using its host address (1.1.1.2) and there is no 
problem.


The DNAT says anything destined for PPP0 that is on port 443 and coming 
from anywhere other than PPP0 (ie inside the network) is to be DNAT'd to 
1.1.1.3.


So VM B (1.1.1.3) tries to connect to ppp0 (2.2.2.2) on port 443, and 
this is redirected to VM B on 1.1.1.2.


Only under this specific circumstance does the problem occur. I can get 
VM B (1.1.1.3) to talk directly to VM A (1.1.1.2) all day long and there 
is no problem, it's only when VM B tries to talk to ppp0 that there is 
an issue (and it happens within seconds of the initial connection).


All these tests have been performed with VM B being a Windows XP guest. 
Tonight I'll try it with a Linux guest and see if I can make it happen. 
If that works I might be able to come up with some reproducible test 
case for you. I have a desktop machine that has Intel VT extensions, so 
I'll work toward making a portable test case.



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Brad Campbell

On 08/06/11 06:57, Patrick McHardy wrote:

On 07.06.2011 20:31, Eric Dumazet wrote:

Le mardi 07 juin 2011 à 17:35 +0200, Patrick McHardy a écrit :


The main suspects would be NAT and TCPMSS. Did you also try whether
the crash occurs with only one of these rules?


I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
the address the way I was doing it, so that's a no-go for me.


That's really weird since you're apparently not using any bridge
netfilter features. It shouldn't have any effect besides changing
at which point ip_tables is invoked. How are your network devices
configured (specifically any bridges)?


Something in the kernel does

u16 *ptr = addr (given by kmalloc())

ptr[-1] = 0;

Could be an off-by-one error in a memmove()/memcpy() or loop...

I can't see a network issue here.


So far me neither, but netfilter appears to trigger the bug.


Would it help if I tried some older kernels? This issue only surfaced 
for me recently as I only installed the VM's in question about 12 weeks 
ago and have only just started really using them in anger. I could try 
reproducing it on progressively older kernels to see if I can find one 
that works and then bisect from there.



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Brad Campbell

On 07/06/11 23:35, Patrick McHardy wrote:


The main suspects would be NAT and TCPMSS. Did you also try whether
the crash occurs with only one of these rules?


To be honest I'm actually having trouble finding where TCPMSS is 
actually set in that ruleset. This is a production machine so I can only 
take it down after about 9PM at night. I'll have another crack at it 
tonight.



I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
the address the way I was doing it, so that's a no-go for me.


That's really weird since you're apparently not using any bridge
netfilter features. It shouldn't have any effect besides changing
at which point ip_tables is invoked. How are your network devices
configured (specifically any bridges)?



I have one bridge with all my virtual machines on it.

In this particular instance the packets leave VM A destined for the IP 
address of ppp0 (the external interface). This is intercepted by the 
DNAT PREROUTING rule above and shunted back to VM B.


The VM's are on br1 and the external address is ppp0. Without 
CONFIG_BRIDGE_NETFILTER compiled in I can see the traffic entering and 
leaving VM B with tcpdump, but the packets never seem to get back to VM A.


VM A is XP 32 bit, VM B is Linux. I have some other Linux VM's, so I'll 
do some more testing tonight between those to see where the packets are 
going without CONFIG_BRIDGE_NETFILTER set.



[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Alexander Graf
KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.
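
As a rough illustration of why a conversion is needed, here is a user-space sketch with hypothetical sketch_* types. It assumes a 64-signal mask stored as two 32-bit words in the compat layout and as one 64-bit word in the native layout. On a big endian machine a byte-for-byte copy would put the first 32-bit word into the high half of the 64-bit word, so the words have to be combined explicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layouts: two 32-bit words (compat) versus one 64-bit word (native). */
struct sketch_compat_sigset { uint32_t sig[2]; };
struct sketch_sigset { uint64_t sig[1]; };

/* Combine the words by value, so the result does not depend on byte order. */
static inline void sketch_sigset_from_compat(struct sketch_sigset *set,
                                             const struct sketch_compat_sigset *compat)
{
    set->sig[0] = (uint64_t)compat->sig[0] | ((uint64_t)compat->sig[1] << 32);
}

/* Usage example: signal 1 and signal 64 end up in the lowest and highest bit. */
static inline void sketch_sigset_demo(void)
{
    struct sketch_compat_sigset c = { { 0x00000001u, 0x80000000u } };
    struct sketch_sigset s;

    sketch_sigset_from_compat(&s, &c);
    assert(s.sig[0] == 0x8000000000000001ull);
}
```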

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - use compat_ptr
  - only declare compat call with CONFIG_COMPAT
---
 kernel/compat.c |1 +
 virt/kvm/kvm_main.c |   52 ++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..04dfce9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,10 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
   unsigned long arg);
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+ unsigned long arg);
+#endif
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1589,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 static struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
-	.compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
 	.mmap           = kvm_vcpu_mmap,
 	.llseek         = noop_llseek,
 };
@@ -1874,6 +1880,50 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = compat_ptr(arg);
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2



Re: [PATCHv2 RFC 4/4] Revert virtio: make add_buf return capacity remaining:

2011-06-07 Thread Rusty Russell
On Tue, 7 Jun 2011 18:54:57 +0300, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Jun 02, 2011 at 06:43:25PM +0300, Michael S. Tsirkin wrote:
  This reverts commit 3c1b27d5043086a485f8526353ae9fe37bfa1065.
  The only user was virtio_net, and it switched to
  min_capacity instead.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 It turns out another place in virtio_net: receive
 buf processing - relies on the old behaviour:
 
 try_fill_recv:
   do {
           if (vi->mergeable_rx_bufs)
                   err = add_recvbuf_mergeable(vi, gfp);
           else if (vi->big_packets)
                   err = add_recvbuf_big(vi, gfp);
           else
                   err = add_recvbuf_small(vi, gfp);
 
           oom = err == -ENOMEM;
           if (err < 0)
                   break;
           ++vi->num;
   } while (err > 0);
 
 The point is to avoid allocating a buf if
 the ring is out of space and we are sure
 add_buf will fail.
 
 It works well for mergeable buffers and for big
 packets if we are not OOM. small packets and
 oom will do extra get_page/put_page calls
 (but maybe we don't care).
 
 So this is RX, I intend to drop it from this patchset and focus on the
 TX side for starters.

We could do some hack where we get the capacity, and estimate how many
packets we need to fill it, then try to do that many.

I say hack, because knowing whether we're doing indirect buffers is a
layering violation.  But that's life when you're trying to do
microoptimizations.
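
A rough sketch of that estimate, in user-space C. The helper name and its parameters are hypothetical and not part of any virtio API. It assumes that with indirect descriptors each buffer takes a single ring slot, and that otherwise a buffer consumes descs_per_buf slots:

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate how many receive buffers fit in the ring.
 *   capacity      - free ring descriptors (assumed to be known to the caller)
 *   descs_per_buf - descriptors one buffer consumes when chained directly
 *   indirect      - nonzero if indirect descriptors are assumed to be in use
 */
static inline unsigned int sketch_estimate_fill(unsigned int capacity,
                                                unsigned int descs_per_buf,
                                                int indirect)
{
    assert(descs_per_buf > 0);
    if (indirect)
        return capacity;            /* one ring slot per buffer */
    return capacity / descs_per_buf; /* round down: partial buffers do not fit */
}
```

For example, sketch_estimate_fill(256, 2, 0) gives 128 buffers, while the same ring with indirect descriptors would take 256. The indirect flag is exactly the piece of knowledge that makes this a layering violation.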

Cheers,
Rusty.


Re: [PATCH v2] virtio-spec: Fix wrong bit number of device status

2011-06-07 Thread Rusty Russell
On Tue, 7 Jun 2011 21:09:42 +0800, Amos Kong ak...@redhat.com wrote:
 
 qemu-kvm/hw/virtio_config.h:
  #define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
  #define VIRTIO_CONFIG_S_DRIVER  2
  #define VIRTIO_CONFIG_S_DRIVER_OK   4
  #define VIRTIO_CONFIG_S_FAILED  0x80
 
 virtio-spec:
 ACKNOWLEDGE(1) :
 DRIVER(2)  :
 DRIVER_OK(3)   :
 FAILED(128):
 
 The spec refers to bit numbers and the headers use absolute numbers,
 they are not consistent.
 
 It should be 'FAILED(8)'.
 2^(8-1) = 128
 
 Changes from V1:
 - Fix wrong patch body
 
 Signed-off-by: Amos Kong ak...@redhat.com

Thanks, applied!
Rusty.
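
For reference, the relationship between the spec's bit numbers and the header values can be sketched as below. The helper name is hypothetical, and the 1-based counting is taken from the quoted list, where ACKNOWLEDGE(1) corresponds to the value 1 and DRIVER_OK(3) to the value 4:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 1-based bit number, as the spec text counts them, to a mask. */
static inline uint8_t sketch_status_mask(unsigned int bit_number)
{
    assert(bit_number >= 1 && bit_number <= 8);
    return (uint8_t)(1u << (bit_number - 1));
}
```

With this mapping, bit 8 gives 0x80, matching VIRTIO_CONFIG_S_FAILED in the header.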


Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Takuya Yoshikawa
On Tue, 07 Jun 2011 20:58:06 +0800
Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

 The performance test result:
 
 Netperf (TCP_RR):
 ===
 ept is enabled:
 
   Before After
 1st   709.58 734.60
 2nd   715.40 723.75
 3rd   713.45 724.22
 
 ept=0 bypass_guest_pf=0:
 
   Before After
 1st   706.10 709.63
 2nd   709.38 715.80
 3rd   695.90 710.70
 

Under what conditions does TCP_RR perform so badly?

On 1Gbps network, directly connecting two Intel servers,
I got 20 times better result before.

Even when I used a KVM guest as the netperf client,
I got more than 10 times better result.

Could you tell me a bit more details of your test?


 Kernbench (do not redirect output to /dev/null)
 ==
 ept is enabled:
 
   Before After
 1st   2m34.749s  2m33.482s
 2nd   2m34.651s  2m33.161s
 3rd   2m34.543s  2m34.271s
 
 ept=0 bypass_guest_pf=0:
 
   Before After
 1st   4m43.467s  4m41.873s
 2nd   4m45.225s  4m41.668s
 3rd   4m47.029s  4m40.128s
 


Re: [PATCH 05/15] KVM: MMU: optimize to handle dirty bit

2011-06-07 Thread Xiao Guangrong
On 06/07/2011 09:01 PM, Xiao Guangrong wrote:
 If the dirty bit is not set, we can make the pte access read-only to avoid
 handling the dirty bit everywhere

 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index b0c8184..67971da 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
  	unsigned access;
  
  	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
 +	if (!is_dirty_gpte(gpte))
 +		access &= ~ACC_WRITE_MASK;
 +
 +

Sorry, this can break something if the gpte is not on the last level and the
dirty bit is set later. The patch below should fix it; I'll merge it into the
next version.

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4287dc8..6ceb5fd 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -101,12 +101,13 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return (ret != orig_pte);
 }
 
-static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
+static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
+				   bool last)
 {
 	unsigned access;
 
 	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	if (!is_dirty_gpte(gpte))
+	if (last && !is_dirty_gpte(gpte))
 		access &= ~ACC_WRITE_MASK;
 
 #if PTTYPE == 64
@@ -230,8 +231,6 @@ walk:
 			pte |= PT_ACCESSED_MASK;
 		}
 
-		pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
-
 		walker->ptes[walker->level - 1] = pte;
 
 		if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
@@ -266,7 +265,7 @@ walk:
 			break;
 		}
 
-		pt_access = pte_access;
+		pt_access = FNAME(gpte_access)(vcpu, pte, false);
 		--walker->level;
 	}
 
@@ -290,6 +289,7 @@ walk:
 		walker->ptes[walker->level - 1] = pte;
 	}
 
+	pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);
 	walker->pt_access = pt_access;
 	walker->pte_access = pte_access;
 	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
@@ -369,7 +369,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		return;
 
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte, true);
 	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
 	if (mmu_invalid_pfn(pfn)) {
 		kvm_release_pfn_clean(pfn);
@@ -444,7 +444,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 			continue;
 
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+								  true);
 		gfn = gpte_to_gfn(gpte);
 		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 				      pte_access & ACC_WRITE_MASK);
@@ -790,7 +791,7 @@ static bool FNAME(sync_mmio_spte)(struct kvm_vcpu *vcpu,
 	if (unlikely(is_mmio_spte(*sptep))) {
 		gfn_t gfn = gpte_to_gfn(gpte);
 		unsigned access = sp->role.access & FNAME(gpte_access)(vcpu,
-								gpte);
+								gpte, true);
 
 		if (gfn != get_mmio_spte_gfn(*sptep)) {
 			__set_spte(sptep, shadow_trap_nonpresent_pte);
@@ -868,7 +869,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		}
 
 		nr_present++;
-		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+								  true);
 		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
 
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
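
The intent of the new "last" argument can be sketched in plain user-space C as follows. The SK_* constants are assumptions that mirror the usual x86 bit positions (writable = bit 1, user = bit 2, dirty = bit 6), the function name is hypothetical, and the NX handling of the real function is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 page-table bits; prefixed to mark them as part of the sketch. */
#define SK_PT_WRITABLE_MASK (1ull << 1)
#define SK_PT_USER_MASK     (1ull << 2)
#define SK_PT_DIRTY_MASK    (1ull << 6)
#define SK_ACC_EXEC_MASK    1u
#define SK_ACC_WRITE_MASK   2u

static inline unsigned sketch_gpte_access(uint64_t gpte, bool last)
{
    unsigned access;

    access = (unsigned)(gpte & (SK_PT_WRITABLE_MASK | SK_PT_USER_MASK)) |
             SK_ACC_EXEC_MASK;
    /* Drop write access only for a last-level entry whose dirty bit is clear. */
    if (last && !(gpte & SK_PT_DIRTY_MASK))
        access &= ~SK_ACC_WRITE_MASK;
    return access;
}

/* Usage example: a clean, writable, user-accessible entry. */
static inline void sketch_gpte_access_demo(void)
{
    uint64_t clean = SK_PT_WRITABLE_MASK | SK_PT_USER_MASK;

    assert(sketch_gpte_access(clean, true) == 5u);  /* write access removed */
    assert(sketch_gpte_access(clean, false) == 7u); /* upper level: unchanged */
    assert(sketch_gpte_access(clean | SK_PT_DIRTY_MASK, true) == 7u);
}
```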


Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Xiao Guangrong
On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
 On Tue, 07 Jun 2011 20:58:06 +0800
 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:
 
 The performance test result:

 Netperf (TCP_RR):
 ===
 ept is enabled:

   Before After
 1st   709.58 734.60
 2nd   715.40 723.75
 3rd   713.45 724.22

 ept=0 bypass_guest_pf=0:

   Before After
 1st   706.10 709.63
 2nd   709.38 715.80
 3rd   695.90 710.70

 
 In what condition, does TCP_RR perform so bad?
 
 On 1Gbps network, directly connecting two Intel servers,
 I got 20 times better result before.
 
 Even when I used a KVM guest as the netperf client,
 I got more than 10 times better result.
 

Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?

 Could you tell me a bit more details of your test?
 

Sure. The KVM guest is the client; it uses an e1000 NIC and connects to the
netperf server through a NAT network. The bandwidth of our network is
100 Mbit/s.



Guest to host communication in kvm

2011-06-07 Thread Vaibhav Nipunage
Hello,

I am trying to understand the kvm code. I am writing simple code in
which I want to send some message or notification from the guest to
host (qemu-kvm).

I thought of implementing a hypercall that the guest invokes under some
condition and that is then handled in qemu-kvm, but I don't understand how
to handle it in qemu-kvm.

Or is there a better way to do this?

Please help me.

Thanks in advance.

Thanks,
Vaibhav


Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Xiao Guangrong
On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
 On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
 On Tue, 07 Jun 2011 20:58:06 +0800
 Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

 The performance test result:

 Netperf (TCP_RR):
 ===
 ept is enabled:

   Before After
 1st   709.58 734.60
 2nd   715.40 723.75
 3rd   713.45 724.22

 ept=0 bypass_guest_pf=0:

   Before After
 1st   706.10 709.63
 2nd   709.38 715.80
 3rd   695.90 710.70


 In what condition, does TCP_RR perform so bad?

 On 1Gbps network, directly connecting two Intel servers,
 I got 20 times better result before.

 Even when I used a KVM guest as the netperf client,
 I got more than 10 times better result.

 
 Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?
 
 Could you tell me a bit more details of your test?

 
 Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
 network connect to the netperf server, the bandwidth of our network
 is 100M.
 

And this is my test script:

#!/bin/sh

echo 3 > /proc/sys/vm/drop_caches
./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60



Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Takuya Yoshikawa
On Wed, 08 Jun 2011 11:32:12 +0800
Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:

 On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
  On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
  On Tue, 07 Jun 2011 20:58:06 +0800
  Xiao Guangrong xiaoguangr...@cn.fujitsu.com wrote:
 
  The performance test result:
 
  Netperf (TCP_RR):
  ===
  ept is enabled:
 
Before After
  1st   709.58 734.60
  2nd   715.40 723.75
  3rd   713.45 724.22
 
  ept=0 bypass_guest_pf=0:
 
Before After
  1st   706.10 709.63
  2nd   709.38 715.80
  3rd   695.90 710.70
 
 
  In what condition, does TCP_RR perform so bad?
 
  On 1Gbps network, directly connecting two Intel servers,
  I got 20 times better result before.
 
  Even when I used a KVM guest as the netperf client,
  I got more than 10 times better result.
 
  
  Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?
  

ept = 1 only.

  Could you tell me a bit more details of your test?
 
  
  Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
  network connect to the netperf server, the bandwidth of our network
  is 100M.
  

I see the reason, thank you!

I used virtio-net and you used e1000.
You are using e1000 to see the MMIO performance change, right?

  Takuya

 
 And this is my test script:
 
 #!/bin/sh
 
 echo 3 > /proc/sys/vm/drop_caches
 ./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60
 


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-07 Thread Eric Dumazet
Le mercredi 08 juin 2011 à 08:18 +0800, Brad Campbell a écrit :
 On 08/06/11 06:57, Patrick McHardy wrote:
  On 07.06.2011 20:31, Eric Dumazet wrote:
  Le mardi 07 juin 2011 à 17:35 +0200, Patrick McHardy a écrit :
 
  The main suspects would be NAT and TCPMSS. Did you also try whether
  the crash occurs with only one of these rules?
 
  I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
  the address the way I was doing it, so that's a no-go for me.
 
  That's really weird since you're apparently not using any bridge
  netfilter features. It shouldn't have any effect besides changing
  at which point ip_tables is invoked. How are your network devices
  configured (specifically any bridges)?
 
  Something in the kernel does
 
  u16 *ptr = addr (given by kmalloc())
 
  ptr[-1] = 0;
 
  Could be an off-by-one error in a memmove()/memcpy() or loop...
 
  I can't see a network issue here.
 
  So far me neither, but netfilter appears to trigger the bug.
 
 Would it help if I tried some older kernels? This issue only surfaced 
 for me recently as I only installed the VM's in question about 12 weeks 
 ago and have only just started really using them in anger. I could try 
 reproducing it on progressively older kernels to see if I can find one 
 that works and then bisect from there.

Well, a bisection definitely should help, but needs a lot of time in
your case.

Could you try the following patch? This is the 'usual suspect' I had
yesterday:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 46cbd28..9f548f9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}
 
+#if 0
 	if (fastpath &&
 	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
 		memmove(skb->head + size, skb_shinfo(skb),
@@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		off = nhead;
 		goto adjust_others;
 	}
-
+#endif
 	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
 	if (!data)
 		goto nodata;
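
For readers following along, the condition that the #if 0 block disables can be modelled in user space roughly as below. The function and parameter names are hypothetical, and whether this path is actually related to the crash is not established in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the fast-path test: the existing buffer is reused in
 * place only if the caller holds the only reference (fastpath) and the new
 * data size plus the trailing shared-info block fits in the memory that was
 * actually allocated.
 */
static inline bool sketch_can_expand_in_place(bool fastpath, size_t size,
                                              size_t shinfo_size,
                                              size_t allocated)
{
    return fastpath && size + shinfo_size <= allocated;
}

/* Usage example: an exact fit passes, one byte more does not. */
static inline void sketch_expand_demo(void)
{
    assert(sketch_can_expand_in_place(true, 100, 28, 128));
    assert(!sketch_can_expand_in_place(true, 101, 28, 128));
    assert(!sketch_can_expand_in_place(false, 100, 28, 128));
}
```

With the block compiled out, every call falls through to the kmalloc() path, which makes it possible to tell whether the in-place memmove() is involved.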




Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-07 Thread Xiao Guangrong
On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

 Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
 network connect to the netperf server, the bandwidth of our network
 is 100M.

 
 I see the reason, thank you!
 
 I used virtio-net and you used e1000.
 You are using e1000 to see the MMIO performance change, right?
 

Hi Takuya,

Please apply my fix patch when you test again, thanks! :-)
(http://www.spinics.net/lists/kvm/msg56017.html)

Just now, to confirm the performance result, I tested again without using
our office network (there are too many boxes on it). I booted two guests,
one running the netperf server and one running the netperf client; both use
e1000 and NAT networking.

I'll test the performance of virtio-net!

This is the result:

ept = 1:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1182.27
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1185.84
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1181.58
16384  87380 

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1205.65
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1216.06
16384  87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1215.70
16384  87380 
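
As a rough way to read the three ept = 1 runs listed above, the transaction rates can be averaged and compared. This is only a sketch: the helper names mean() and percent_gain() and the array names are made up for illustration, and three runs per case is too small a sample to say much about variance.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change from 'before' to 'after', in percent. */
static double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Transaction rates (per second) copied from the ept = 1 runs above. */
static const double ept1_before[] = { 1182.27, 1185.84, 1181.58 };
static const double ept1_after[]  = { 1205.65, 1216.06, 1215.70 };
```

Calling percent_gain(mean(ept1_before, 3), mean(ept1_after, 3)) gives an improvement of roughly 2.5% for the ept = 1 case.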


ept = 0, bypass_guest_pf=0:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1169.70
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1160.82
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1168.01
16384  87380 

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1266.28
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    1268.16

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.

[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Alexander Graf
KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.
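
A minimal user-space sketch of the layout problem described above, not the kernel code itself. It assumes a 64-signal mask held as two 32-bit words by 32-bit user space and as one 64-bit word by a 64-bit kernel. All model_* names and host_is_little_endian() are hypothetical stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types (illustration only). */
typedef struct { uint32_t sig[2]; } model_compat_sigset_t; /* 32-bit view */
typedef struct { uint64_t sig[1]; } model_sigset_t;        /* 64-bit view */

/* Combine the two 32-bit words into one 64-bit word: word 0 becomes the
 * low half and word 1 the high half, regardless of host byte order. */
static void model_sigset_from_compat(model_sigset_t *set,
				     const model_compat_sigset_t *compat)
{
	assert(set != NULL && compat != NULL);
	set->sig[0] = compat->sig[0] | ((uint64_t)compat->sig[1] << 32);
}

/* What a plain byte copy would produce instead. On a big endian host
 * word 0 lands in the HIGH half of the 64-bit value, so the signal
 * bits end up in the wrong place. */
static void model_sigset_memcpy(model_sigset_t *set,
				const model_compat_sigset_t *compat)
{
	memcpy(set, compat, sizeof(*set));
}

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

Both structures are 8 bytes, which matches the remark that the size is identical. On a little endian host the byte copy and the conversion agree; on a big endian host they do not, which is the case the compat wrapper is meant to handle.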

Signed-off-by: Alexander Graf ag...@suse.de
---
 kernel/compat.c |1 +
 virt/kvm/kvm_main.c |   50 +-
 2 files changed, 50 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
 	}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..f03db82 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,8 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
   unsigned long arg);
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+ unsigned long arg);
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1587,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 static struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
-	.compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
 	.mmap           = kvm_vcpu_mmap,
 	.llseek		= noop_llseek,
 };
@@ -1874,6 +1878,50 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Arnd Bergmann
On Tuesday 07 June 2011 22:25:15 Alexander Graf wrote:
 +static long kvm_vcpu_compat_ioctl(struct file *filp,
 + unsigned int ioctl, unsigned long arg)
 +{
 +   struct kvm_vcpu *vcpu = filp->private_data;
 +   void __user *argp = (void __user *)arg;

Converting a compat user argument into a pointer should use the
compat_ptr() function to do the right thing on s390. Otherwise
your patch looks good.

Arnd
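
The point about compat_ptr() can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: the real helper returns a void __user pointer, while the hypothetical model_* functions below return the resulting address as an integer so it can be inspected. The model assumes that most architectures simply zero-extend the 32-bit value and that s390 compat tasks use 31-bit addresses, so the top bit has to be cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the common case: zero-extend the 32-bit user value. */
static uintptr_t model_compat_ptr_generic(uint32_t uptr)
{
	return (uintptr_t)uptr;
}

/* Model of the s390 case: compat addresses are 31 bits wide, so the
 * most significant bit is not part of the address and is masked off. */
static uintptr_t model_compat_ptr_s390(uint32_t uptr)
{
	return (uintptr_t)(uptr & 0x7fffffffu);
}

/* Self-check showing where the two models agree and where they differ. */
static void model_compat_ptr_check(void)
{
	/* Top bit clear: both models yield the same address. */
	assert(model_compat_ptr_generic(0x00001000u) ==
	       model_compat_ptr_s390(0x00001000u));
	/* Top bit set: a plain cast keeps it, the s390 model drops it. */
	assert(model_compat_ptr_generic(0x80001000u) !=
	       model_compat_ptr_s390(0x80001000u));
}
```

A plain (void __user *) cast behaves like the generic model everywhere, which is why it can produce a wrong address on s390 when the top bit happens to be set.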


[PATCH] KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK

2011-06-07 Thread Alexander Graf
KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - use compat_ptr
  - only declare compat call with CONFIG_COMPAT
---
 kernel/compat.c |1 +
 virt/kvm/kvm_main.c |   52 ++-
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 9214dcd..506e176 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -882,6 +882,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
 	case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
 	}
 }
+EXPORT_SYMBOL_GPL(sigset_from_compat);
 
 asmlinkage long
 compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..04dfce9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -84,6 +84,10 @@ struct dentry *kvm_debugfs_dir;
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
   unsigned long arg);
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
+ unsigned long arg);
+#endif
 static int hardware_enable_all(void);
 static void hardware_disable_all(void);
 
@@ -1585,7 +1589,9 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 static struct file_operations kvm_vcpu_fops = {
 	.release        = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
-	.compat_ioctl   = kvm_vcpu_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl   = kvm_vcpu_compat_ioctl,
+#endif
 	.mmap           = kvm_vcpu_mmap,
 	.llseek		= noop_llseek,
 };
@@ -1874,6 +1880,50 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+static long kvm_vcpu_compat_ioctl(struct file *filp,
+				  unsigned int ioctl, unsigned long arg)
+{
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = compat_ptr(arg);
+	int r;
+
+	if (vcpu->kvm->mm != current->mm)
+		return -EIO;
+
+	switch (ioctl) {
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		compat_sigset_t csigset;
+		sigset_t sigset;
+
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof csigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&csigset, sigmask_arg->sigset,
+					   sizeof csigset))
+				goto out;
+		}
+		sigset_from_compat(&sigset, &csigset);
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
+	default:
+		r = kvm_vcpu_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static long kvm_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2
