[PATCH] KVM : VMX: disable apicv by default

2013-02-08 Thread Yang Zhang
From: Yang Zhang 

Without Posted Interrupt, current code is broken. Just disable by
default until Posted Interrupt is ready.

Signed-off-by: Yang Zhang 
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe9a9cf..27233a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -84,7 +84,7 @@ module_param(vmm_exclusive, bool, S_IRUGO);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, S_IRUGO);
 
-static bool __read_mostly enable_apicv_reg_vid = 1;
+static bool __read_mostly enable_apicv_reg_vid;
 module_param(enable_apicv_reg_vid, bool, S_IRUGO);
 
 /*
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Remove user_alloc from struct kvm_memory_slot

2013-02-08 Thread Marcelo Tosatti
On Thu, Feb 07, 2013 at 06:55:57PM +0900, Takuya Yoshikawa wrote:
> This field was needed to differentiate memory slots created by the new
> API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent,
> KVM_SET_MEMORY_REGION, whose support was dropped long before:
> 
>   commit b74a07beed0e64bfba413dcb70dd6749c57f43dc
>   KVM: Remove kernel-allocated memory regions
> 
> Although we also have private memory slots to which KVM allocates
> memory with vm_mmap(), !user_alloc slots in other words, the slot id
> should be enough for differentiating them.
> 
> Note: corresponding function parameters will be removed later.
> 
> Signed-off-by: Takuya Yoshikawa 

Reviewed-by: Marcelo Tosatti 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Q: Why not use struct mm_struct to manage guest physical addresses in new port?

2013-02-08 Thread David Daney

On 02/08/2013 02:11 PM, Marcelo Tosatti wrote:

On Tue, Feb 05, 2013 at 11:02:32AM -0800, David Daney wrote:

Hi,

I am starting to working on a port of KVM to an architecture that
has a dual TLB.  The Guest Virtual Addresses (GVA) are translated to
Guest Physical Addresses (GPA) by the first TLB, then a second TLB
translates the GPA to a Root Physical Address (RPA).  For the sake
of this question, we will ignore the GVA->GPA TLB and consider only
the GPA->RPA TLB.

I seems that most existing ports have a bunch of custom code that
manages the GPA->RPA TLB and page tables.

Here is what I would like to try:  Create a mm for the GPA->RPA
mappings each vma would have a fault handler that calls gfn_to_pfn()
to look up the proper page.  In kvm_arch_vcpu_ioctl_run() we would
call switch_mm() to this new 'gva_mm'.


gfn_to_pfn uses the address space of the controlling process. GPA->RPA
translation does:

1) Find 'memory slot' (indexed by gfn)
2) From 'memory slot', find virtual address (relative to controlling
process).
3) Walk pagetable of controlling process and page retrieve physical address.


Actually, it kind of works.  Here is the vm_operations_struct for the 
VMAs in the guest MM using this technique:



static int kvm_mipsvz_host_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)

{
struct page *page[1];
unsigned long addr;
int npages;
struct kvm *kvm = vma->vm_private_data;
gfn_t gfn = vmf->pgoff + (vma->vm_start >> PAGE_SHIFT);

addr = gfn_to_hva(kvm, gfn);
if (kvm_is_error_hva(addr))
return VM_FAULT_SIGBUS;

npages = get_user_pages(current, kvm->arch.host_mm, addr, 1, 1, 0, page,
NULL);
if (unlikely(npages != 1))
return VM_FAULT_SIGBUS;

vmf->page = page[0];
return 0;
}

static const struct vm_operations_struct kvm_mipsvz_host_ops = {
.fault =  kvm_mipsvz_host_fault
};

Most likely this screws up the page reference counts in a manner that 
will leak pages.  But the existing mm infrastructure is managing the 
page tables so that the pages show up in the proper place in the guest.


That said, I think I will switch to a more conventional approach where 
the guest page tables are manages outside of the kernel's struct 
mm_struct framework.  What I did, works for memory, but I think it will 
be very difficult to implement trap-and-emulate on memory references 
this way.





  Upon exiting guest mode we
would switch back to the original mm of the controlling process.
For me the benefit of this approach is that all the code that
manages the TLB is already implemented and works well for struct
mm_struct.  The only thing I need to do is write a vma fault
handler.  That is a lot easier and less error prone than maintaining
a parallel TLB management framework and making sure it interacts
properly with the existing TLB code for 'normal' processes.


Q1: Am I crazy for wanting to try this?


You need the mm_struct of the controlling to be active, when doing
GPA->RPA translations.


Q2: Have others tried this and rejected it?  What were the reasons?


I think you'll have to switch_mm back to the controlling process mm on
every page fault (and then back to gva_mm).



Thanks in advance,
David Daney
Cavium, Inc.


'vma' `is a process


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Q: Why not use struct mm_struct to manage guest physical addresses in new port?

2013-02-08 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 11:02:32AM -0800, David Daney wrote:
> Hi,
> 
> I am starting to working on a port of KVM to an architecture that
> has a dual TLB.  The Guest Virtual Addresses (GVA) are translated to
> Guest Physical Addresses (GPA) by the first TLB, then a second TLB
> translates the GPA to a Root Physical Address (RPA).  For the sake
> of this question, we will ignore the GVA->GPA TLB and consider only
> the GPA->RPA TLB.
> 
> I seems that most existing ports have a bunch of custom code that
> manages the GPA->RPA TLB and page tables.
> 
> Here is what I would like to try:  Create a mm for the GPA->RPA
> mappings each vma would have a fault handler that calls gfn_to_pfn()
> to look up the proper page.  In kvm_arch_vcpu_ioctl_run() we would
> call switch_mm() to this new 'gva_mm'.

gfn_to_pfn uses the address space of the controlling process. GPA->RPA
translation does:

1) Find 'memory slot' (indexed by gfn)
2) From 'memory slot', find virtual address (relative to controlling
process).
3) Walk pagetable of controlling process and page retrieve physical address.

>  Upon exiting guest mode we
> would switch back to the original mm of the controlling process.
> For me the benefit of this approach is that all the code that
> manages the TLB is already implemented and works well for struct
> mm_struct.  The only thing I need to do is write a vma fault
> handler.  That is a lot easier and less error prone than maintaining
> a parallel TLB management framework and making sure it interacts
> properly with the existing TLB code for 'normal' processes.
> 
> 
> Q1: Am I crazy for wanting to try this?

You need the mm_struct of the controlling to be active, when doing
GPA->RPA translations.

> Q2: Have others tried this and rejected it?  What were the reasons?

I think you'll have to switch_mm back to the controlling process mm on
every page fault (and then back to gva_mm).

> 
> Thanks in advance,
> David Daney
> Cavium, Inc.

'vma' `is a process 

> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] KVM: MMU: simple cleanups

2013-02-08 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 03:26:21PM +0800, Xiao Guangrong wrote:
> There are the simple cleanups for MMU, no function / logic changed.
> 
> Marcelo, Gleb, please apply them after applying
> "[PATCH v3] KVM: MMU: lazily drop large spte"
> 
> Changelog:
> no change, just split them from the previous patchset for good review.
> 
> Thanks!

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/5] KVM: MMU: introduce mmu_spte_establish

2013-02-08 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 04:53:19PM +0800, Xiao Guangrong wrote:
> There is little different between walking parent pte and walking ramp:
> all spte in rmap must be present but this is not true on parent pte list,
> in kvm_mmu_alloc_page, we always link the parent list before set the spte
> to present
> 
> This patch introduce mmu_spte_establish which set the spte before linking
> it to parent list to eliminates the different then it is possible to unify
> the code of walking pte list
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   81 ++-
>  arch/x86/kvm/paging_tmpl.h |   16 -
>  2 files changed, 48 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 8041454..68d4d5f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1482,9 +1482,6 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
>  static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
>   struct kvm_mmu_page *sp, u64 *parent_pte)
>  {
> - if (!parent_pte)
> - return;
> -
>   pte_list_add(vcpu, parent_pte, &sp->parent_ptes);
>  }
> 
> @@ -1502,7 +1499,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
>  }
> 
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
> -u64 *parent_pte, int direct)
> +int direct)
>  {
>   struct kvm_mmu_page *sp;
>   sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> @@ -1512,7 +1509,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
> kvm_vcpu *vcpu,
>   set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>   list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>   sp->parent_ptes = 0;
> - mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>   kvm_mod_used_mmu_pages(vcpu->kvm, +1);
>   return sp;
>  }
> @@ -1845,8 +1841,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
> kvm_vcpu *vcpu,
>gva_t gaddr,
>unsigned level,
>int direct,
> -  unsigned access,
> -  u64 *parent_pte)
> +  unsigned access)
>  {
>   union kvm_mmu_page_role role;
>   unsigned quadrant;
> @@ -1876,19 +1871,15 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
> kvm_vcpu *vcpu,
>   if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>   break;
> 
> - mmu_page_add_parent_pte(vcpu, sp, parent_pte);
> - if (sp->unsync_children) {
> + if (sp->unsync_children)
>   kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> - kvm_mmu_mark_parents_unsync(sp);
> - } else if (sp->unsync)
> - kvm_mmu_mark_parents_unsync(sp);
> 
>   __clear_sp_write_flooding_count(sp);
>   trace_kvm_mmu_get_page(sp, false);
>   return sp;
>   }
>   ++vcpu->kvm->stat.mmu_cache_miss;
> - sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
> + sp = kvm_mmu_alloc_page(vcpu, direct);
>   if (!sp)
>   return sp;
>   sp->gfn = gfn;
> @@ -1908,6 +1899,35 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
> kvm_vcpu *vcpu,
>   return sp;
>  }
> 
> +static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
> +{
> + u64 spte;
> +
> + spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
> +shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
> +
> + mmu_spte_set(sptep, spte);
> +}
> +
> +static struct kvm_mmu_page *
> +mmu_spte_establish(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn, gva_t gaddr,
> +unsigned level, int direct, unsigned access)
> +{
> + struct kvm_mmu_page *sp;
> +
> + WARN_ON(is_shadow_present_pte(*spte));
> +
> + sp = kvm_mmu_get_page(vcpu, gfn, gaddr, level, direct, access);
> +
> + link_shadow_page(spte, sp);
> + mmu_page_add_parent_pte(vcpu, sp, spte);
> +
> + if (sp->unsync_children || sp->unsync)
> + kvm_mmu_mark_parents_unsync(sp);
> +
> + return sp;
> +}
> +
>  static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
>struct kvm_vcpu *vcpu, u64 addr)
>  {
> @@ -1957,16 +1977,6 @@ static void shadow_walk_next(struct 
> kvm_shadow_walk_iterator *iterator)
>   return __shadow_walk_next(iterator, *iterator->sptep);
>  }
> 
> -static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
> -{
> - u64 spte;
> -
> - spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
> -shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
> -
> - mmu_spte_set(sptep, spte);
> -}
> -

Re: [PATCH v3 5/5] KVM: MMU: fast drop all spte on the pte_list

2013-02-08 Thread Marcelo Tosatti
On Tue, Feb 05, 2013 at 04:55:37PM +0800, Xiao Guangrong wrote:
> If a shadow page is being zapped or a host page is going to be freed, kvm
> will drop all the reverse-mappings on the shadow page or the gfn. Currently,
> it drops the reverse-mapping one by one - it deletes the first reverse 
> mapping,
> then moves other reverse-mapping between the description-table. When the
> last description-table become empty, it will be freed.
> 
> It works well if we only have a few reverse-mappings, but some pte_lists are
> very long, during my tracking, i saw some gfns have more than 200 sptes listed
> on its pte-list (1G memory in guest on softmmu). Optimization for dropping 
> such
> long pte-list is worthwhile, at lease it is good for deletion memslots and
> ksm/thp merge pages.
> 
> This patch introduce a better way to optimize for this case, it walks all the
> reverse-mappings and clear them, then free all description-tables together.
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   36 +++-
>  1 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 58f813a..aa7a887 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -945,6 +945,25 @@ static void pte_list_remove(u64 *spte, unsigned long 
> *pte_list)
>   }
>  }
> 
> +static void pte_list_destroy(unsigned long *pte_list)
> +{
> + struct pte_list_desc *desc;
> + unsigned long list_value = *pte_list;
> +
> + *pte_list = 0;
> +
> + if (!(list_value & 1))
> + return;
> +
> + desc = (struct pte_list_desc *)(list_value & ~1ul);
> + while (desc) {
> + struct pte_list_desc *next_desc = desc->more;
> +
> + mmu_free_pte_list_desc(desc);
> + desc = next_desc;
> + }
> +}
> +
>  /*
>   * Used by the following functions to iterate through the sptes linked by a
>   * pte_list.  All fields are private and not assumed to be used outside.
> @@ -1183,17 +1202,17 @@ static bool rmap_write_protect(struct kvm *kvm, u64 
> gfn)
>  static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  struct kvm_memory_slot *slot, unsigned long data)
>  {
> - u64 *sptep;
>   struct pte_list_iterator iter;
> + u64 *sptep;
>   int need_tlb_flush = 0;
> 
> -restart:
>   for_each_spte_in_rmap(*rmapp, iter, sptep) {
> - drop_spte(kvm, sptep);
> + mmu_spte_clear_track_bits(sptep);
>   need_tlb_flush = 1;
> - goto restart;
>   }
> 
> + pte_list_destroy(rmapp);
> +
>   return need_tlb_flush;
>  }
> 
> @@ -2016,11 +2035,10 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, 
> struct kvm_mmu_page *sp)
>   u64 *sptep;
>   struct pte_list_iterator iter;
> 
> -restart:
> - for_each_spte_in_pte_list(sp->parent_ptes, iter, sptep) {
> - drop_parent_pte(sp, sptep);
> - goto restart;
> - }
> + for_each_spte_in_pte_list(sp->parent_ptes, iter, sptep)
> + mmu_spte_clear_no_track(sptep);
> +
> + pte_list_destroy(&sp->parent_ptes);
>  }

Do we lose the crash information of pte_list_remove? 
It has been shown to be useful in several cases.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Added one_reg interface for timer registers

2013-02-08 Thread Scott Wood

On 02/08/2013 04:06:14 AM, Bharat Bhushan wrote:

If userspace wants to change some specific bits of TSR
(timer status register) then it uses GET/SET_SREGS ioctl interface.
So the steps will be:
  i)   user-space will make get ioctl,
  ii)  change TSR in userspace
  iii) then make set ioctl.
It can happen that TSR gets changed by kernel after step i) and
before step iii).

To avoid this we have added below one_reg ioctls for oring and  
clearing

specific bits in TSR. This patch adds one registerface for:
 1) setting specific bit in TSR (timer status register)
 2) clearing specific bit in TSR (timer status register)
 3) setting the TCR register. There are cases where we want to  
only

change TCR and not TSR. Although we can uses SREGS without
KVM_SREGS_E_UPDATE_TSR flag but I think one reg is better. I  
am open

if someone feels we should use SREGS only here.

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/include/uapi/asm/kvm.h |4 
 arch/powerpc/kvm/booke.c|   18 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h  
b/arch/powerpc/include/uapi/asm/kvm.h

index 16064d0..b4ad5d4 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -417,4 +417,8 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_EPCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x85)
 #define KVM_REG_PPC_EPR		(KVM_REG_PPC | KVM_REG_SIZE_U32  
| 0x86)


+/* Timer Status Register OR/CLEAR interface */
+#define KVM_REG_PPC_OR_TSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x87)
+#define KVM_REG_PPC_CLEAR_TSR	(KVM_REG_PPC | KVM_REG_SIZE_U32  
| 0x88)

+#define KVM_REG_PPC_SET_TCR(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x89)
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 020923e..983c06f 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1481,6 +1481,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu  
*vcpu, struct kvm_one_reg *reg)

break;
}
 #endif
+   case KVM_REG_PPC_OR_TSR: {
+   u32 tsr_bits;
+   r = get_user(tsr_bits, (u32 __user *)(long)reg->addr);
+   kvmppc_set_tsr_bits(vcpu, tsr_bits);
+   break;
+   }
+   case KVM_REG_PPC_CLEAR_TSR: {
+   u32 tsr_bits;
+   r = get_user(tsr_bits, (u32 __user *)(long)reg->addr);
+   kvmppc_clr_tsr_bits(vcpu, tsr_bits);
+   break;
+   }
+   case KVM_REG_PPC_SET_TCR: {
+   u32 tcr;
+   r = get_user(tcr, (u32 __user *)(long)reg->addr);
+   kvmppc_set_tcr(vcpu, tcr);
+   break;
+   }


KVM_REG_PPC_SET_TCR should just be KVM_REG_PPC_TCR -- we should be able  
to "GET" it too...  Might as well add a KVM_REG_PPC_TSR as well that  
has normal read/write semantics (in addition to the CLEAR and OR  
versions), if we're going to do it for TCR.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][v4] PPC: add paravirt idle loop for 64-bit book E

2013-02-08 Thread Stuart Yoder
From: Stuart Yoder 

Signed-off-by: Stuart Yoder 
---

-removed KVM prefix to patch subject, patch is not KVM specific

 arch/powerpc/kernel/epapr_hcalls.S |2 ++
 arch/powerpc/kernel/idle_book3e.S  |   32 ++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/epapr_hcalls.S 
b/arch/powerpc/kernel/epapr_hcalls.S
index 62c0dc2..9f1ebf7 100644
--- a/arch/powerpc/kernel/epapr_hcalls.S
+++ b/arch/powerpc/kernel/epapr_hcalls.S
@@ -17,6 +17,7 @@
 #include 
 #include 
 
+#ifndef CONFIG_PPC64
 /* epapr_ev_idle() was derived from e500_idle() */
 _GLOBAL(epapr_ev_idle)
CURRENT_THREAD_INFO(r3, r1)
@@ -42,6 +43,7 @@ epapr_ev_idle_start:
 * _TLF_NAPPING.
 */
b   idle_loop
+#endif
 
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
diff --git a/arch/powerpc/kernel/idle_book3e.S 
b/arch/powerpc/kernel/idle_book3e.S
index 4c7cb400..bfb73cc 100644
--- a/arch/powerpc/kernel/idle_book3e.S
+++ b/arch/powerpc/kernel/idle_book3e.S
@@ -16,11 +16,13 @@
 #include 
 #include 
 #include 
+#include 
 
 /* 64-bit version only for now */
 #ifdef CONFIG_PPC64
 
-_GLOBAL(book3e_idle)
+.macro BOOK3E_IDLE name loop
+_GLOBAL(\name)
/* Save LR for later */
mflrr0
std r0,16(r1)
@@ -67,7 +69,33 @@ _GLOBAL(book3e_idle)
 
/* We can now re-enable hard interrupts and go to sleep */
wrteei  1
-1: PPC_WAIT(0)
+   \loop
+
+.endm
+
+.macro BOOK3E_IDLE_LOOP
+1:
+   PPC_WAIT(0)
b   1b
+.endm
+
+/* epapr_ev_idle_start below is patched with the proper hcall
+   opcodes during kernel initialization */
+.macro EPAPR_EV_IDLE_LOOP
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+   b   idle_loop
+.endm
+
+BOOK3E_IDLE epapr_ev_idle EPAPR_EV_IDLE_LOOP
+
+BOOK3E_IDLE book3e_idle BOOK3E_IDLE_LOOP
 
 #endif /* CONFIG_PPC64 */
-- 
1.7.9.7


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Eduardo Habkost
On Fri, Feb 08, 2013 at 05:54:50PM +0100, Andreas Färber wrote:
> Am 08.02.2013 15:52, schrieb Eduardo Habkost:
> > On Fri, Feb 08, 2013 at 01:58:42PM +0100, Igor Mammedov wrote:
> >> On Fri, 08 Feb 2013 12:16:17 +0100
> >> Andreas Färber  wrote:
> >>> Am 08.02.2013 10:03, schrieb Igor Mammedov:
>  On Thu, 7 Feb 2013 13:08:19 -0200
>  Eduardo Habkost  wrote:
> 
> > On Tue, Feb 05, 2013 at 05:39:22PM +0100, Igor Mammedov wrote:
> >> @@ -2236,6 +2083,44 @@ static void x86_cpu_initfn(Object *obj)
> >>  }
> >>  }
> >>  
> >> +static void x86_cpu_def_class_init(ObjectClass *oc, void *data)
> >> +{
> >> +X86CPUClass *xcc = X86_CPU_CLASS(oc);
> >> +ObjectClass *hoc = object_class_by_name(TYPE_HOST_X86_CPU);
> >> +X86CPUClass *hostcc;
> >> +x86_def_t *def = data;
> >> +int i;
> >> +static const char *versioned_models[] = { "qemu32", "qemu64", 
> >> "athlon" };
> >> +
> >> +memcpy(&xcc->info, def, sizeof(x86_def_t));
> >> +
> >> +/* host cpu class is available if KVM is enabled,
> >> + * get kvm overrides from it */
> >> +if (hoc) {
> >> +hostcc = X86_CPU_CLASS(hoc);
> >> +/* sysenter isn't supported in compatibility mode on AMD,
> >> + * syscall isn't supported in compatibility mode on Intel.
> >> + * Normally we advertise the actual CPU vendor, but you can
> >> + * override this using the 'vendor' property if you want to 
> >> use
> >> + * KVM's sysenter/syscall emulation in compatibility mode and
> >> + * when doing cross vendor migration
> >> + */
> >> +memcpy(xcc->info.vendor, hostcc->info.vendor,
> >> +   sizeof(xcc->info.vendor));
> >> +}
> >
> > Again, we have the same problem we had before, but now in the non-host
> > classes. What if class_init is called before KVM is initialized? I
> > believe we will be forced to move this hack to the instance init
> > function.
>  I believe, the in the case where non-host CPU classes might be 
>  initialized
>  before KVM "-cpu ?" we do not care what their defaults are, since we only
>  would use class names there and then exit.
> 
>  For case where classes could be inspected over QMP, OQM, KVM would be 
>  already
>  initialized if enabled and we would get proper initialization order 
>  without
>  hack.
> > 
> > Who guarantees that KVM will be already initialized when we get a QMP
> > monitor? We can't do that today because of limitations in the QEMU main
> > code, but I believe we want to get rid of this limitation eventually,
> > instead of making it harder to get rid of.
> > 
> > If we could initialize KVM before QMP is initialized, we could simply
> > initialize KVM before class_init is called, instead. It would be easier
> > to reason about, and it would make the limitations of our code very
> > clear to anybody reading the code in main().
> 
> That wouldn't work (currently) due to -device and -object being command
> line options just like -enable-kvm, -disable-kvm and -machine accel=.

Well, we could loop over the command-line options twice.

It is just an alternative that would be better than making class_init
unreliable. I don't think it would be a great solution anyway.

> 
> >>>
> >>> I think you're missing Eduardo's and my point:
> >>>
> >>> diff --git a/vl.c b/vl.c
> >>> index a8dc73d..6b9378e 100644
> >>> --- a/vl.c
> >>> +++ b/vl.c
> >>> @@ -2844,6 +2844,7 @@ int main(int argc, char **argv, char **envp)
> >>>  }
> >>>
> >>>  module_call_init(MODULE_INIT_QOM);
> >>> +object_class_foreach(walkerfn, TYPE_OBJECT, false, NULL);
> >>>
> >>>  qemu_add_opts(&qemu_drive_opts);
> >>>  qemu_add_opts(&qemu_chardev_opts);
> >>>
> >>> Anyone may iterate over QOM classes at any time after their type
> >>> registration, which is before the first round of option parsing.
> >>> Sometime later, after option parsing, there's the -cpu ? handling in
> >>> vl.c:3854, then vl.c:4018:configure_accelerator().
> >>>
> >>> Like I said, mostly a theoretical issue today.
> >> Question is if we should drop this theoretical issue for 1.5?
> > 
> > I wouldn't call it just theoretical. It is something that will surely
> > hit us back. The people working on QMP or on the main() code 6 months
> > from now will no idea that our class_init code is broken and will
> > explode if class_init is called too early.
> 
> We should try to find a reliable solution here or at least add
> appropriate comments to the module_call_init() call in vl.c.

Agreed. But I believe there are tons of other solutions that don't
require making class_init broken.


> 
> >>> Originally I had considered making kvm_init() re-entrant and calling it
> >>> from the offending class_init. But we must support the distro case of
> >>> compiling with CONFIG_KVM but the user's hardw

Re: [Qemu-devel] [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Andreas Färber
Am 08.02.2013 15:52, schrieb Eduardo Habkost:
> On Fri, Feb 08, 2013 at 01:58:42PM +0100, Igor Mammedov wrote:
>> On Fri, 08 Feb 2013 12:16:17 +0100
>> Andreas Färber  wrote:
>>> Am 08.02.2013 10:03, schrieb Igor Mammedov:
 On Thu, 7 Feb 2013 13:08:19 -0200
 Eduardo Habkost  wrote:

> On Tue, Feb 05, 2013 at 05:39:22PM +0100, Igor Mammedov wrote:
>> @@ -2236,6 +2083,44 @@ static void x86_cpu_initfn(Object *obj)
>>  }
>>  }
>>  
>> +static void x86_cpu_def_class_init(ObjectClass *oc, void *data)
>> +{
>> +X86CPUClass *xcc = X86_CPU_CLASS(oc);
>> +ObjectClass *hoc = object_class_by_name(TYPE_HOST_X86_CPU);
>> +X86CPUClass *hostcc;
>> +x86_def_t *def = data;
>> +int i;
>> +static const char *versioned_models[] = { "qemu32", "qemu64", 
>> "athlon" };
>> +
>> +memcpy(&xcc->info, def, sizeof(x86_def_t));
>> +
>> +/* host cpu class is available if KVM is enabled,
>> + * get kvm overrides from it */
>> +if (hoc) {
>> +hostcc = X86_CPU_CLASS(hoc);
>> +/* sysenter isn't supported in compatibility mode on AMD,
>> + * syscall isn't supported in compatibility mode on Intel.
>> + * Normally we advertise the actual CPU vendor, but you can
>> + * override this using the 'vendor' property if you want to use
>> + * KVM's sysenter/syscall emulation in compatibility mode and
>> + * when doing cross vendor migration
>> + */
>> +memcpy(xcc->info.vendor, hostcc->info.vendor,
>> +   sizeof(xcc->info.vendor));
>> +}
>
> Again, we have the same problem we had before, but now in the non-host
> classes. What if class_init is called before KVM is initialized? I
> believe we will be forced to move this hack to the instance init
> function.
 I believe, the in the case where non-host CPU classes might be initialized
 before KVM "-cpu ?" we do not care what their defaults are, since we only
 would use class names there and then exit.

 For case where classes could be inspected over QMP, OQM, KVM would be 
 already
 initialized if enabled and we would get proper initialization order without
 hack.
> 
> Who guarantees that KVM will be already initialized when we get a QMP
> monitor? We can't do that today because of limitations in the QEMU main
> code, but I believe we want to get rid of this limitation eventually,
> instead of making it harder to get rid of.
> 
> If we could initialize KVM before QMP is initialized, we could simply
> initialize KVM before class_init is called, instead. It would be easier
> to reason about, and it would make the limitations of our code very
> clear to anybody reading the code in main().

That wouldn't work (currently) due to -device and -object being command
line options just like -enable-kvm, -disable-kvm and -machine accel=.

>>>
>>> I think you're missing Eduardo's and my point:
>>>
>>> diff --git a/vl.c b/vl.c
>>> index a8dc73d..6b9378e 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -2844,6 +2844,7 @@ int main(int argc, char **argv, char **envp)
>>>  }
>>>
>>>  module_call_init(MODULE_INIT_QOM);
>>> +object_class_foreach(walkerfn, TYPE_OBJECT, false, NULL);
>>>
>>>  qemu_add_opts(&qemu_drive_opts);
>>>  qemu_add_opts(&qemu_chardev_opts);
>>>
>>> Anyone may iterate over QOM classes at any time after their type
>>> registration, which is before the first round of option parsing.
>>> Sometime later, after option parsing, there's the -cpu ? handling in
>>> vl.c:3854, then vl.c:4018:configure_accelerator().
>>>
>>> Like I said, mostly a theoretical issue today.
>> Question is if we should drop this theoretical issue for 1.5?
> 
> I wouldn't call it just theoretical. It is something that will surely
> hit us back. The people working on QMP or on the main() code 6 months
> from now will no idea that our class_init code is broken and will
> explode if class_init is called too early.

We should try to find a reliable solution here or at least add
appropriate comments to the module_call_init() call in vl.c.

>>> Originally I had considered making kvm_init() re-entrant and calling it
>>> from the offending class_init. But we must support the distro case of
>>> compiling with CONFIG_KVM but the user's hardware not supporting KVM or
>>> kvm module not being loaded or the user having insufficient priviledges
>>> to access /dev/kvm.
>> working without KVM is not issue, it just works with normal defaults. 
>> Applying
>> KVM specific defaults to already initialized classes is.

Right, but applying KVM-specific defaults is much easier once KVM is
initialized. :)

> 
> My big question is: why exactly we want to initialize this stuff inside
> class_init? Can't we (please!) put the KVM-specific logic inside
> instance_init?

Then we're pretty much back to my v3 plus Igor's error handling chan

Re: [Qemu-devel] [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Eduardo Habkost
On Fri, Feb 08, 2013 at 12:52:31PM -0200, Eduardo Habkost wrote:
> On Fri, Feb 08, 2013 at 01:58:42PM +0100, Igor Mammedov wrote:
[...]
> > Continuing on theoretical issue:
> > > We could add an inited field to X86CPUClass that gets checked at initfn
> > > time (only ever getting set to true by the pre-defined models). Then it
> > > would be per model. And if we add a prototype for the ..._class_init we
> > > could even call it late as proposed for -cpu host if we take some
> >   is a tricky part, for global properties to work it
> > would require, calling this hook after kvm_init() is succeeds and before
> > any initfn() is called in general or as minimum before Device.initfn(). And 
> > it
> > anyway will not make all CPU classes to have correct defaults in KVM mode,
> > since only CPU class of created CPU instance will be fixed up.
> > 
> > 1. One way to make sure that built-in CPU classes have fixed up defaults is 
> > to
> > iterate over them in kvm_arch_init() and possibly calling their class_init()
> > again to reinitialize. It's still hack (due fixing something up), but it 
> > would
> > give at least correct KVM mode defaults, regardless of the order classes are
> > initialized.
> 
> Can't we do that more easily with the tcg-vendor/vendor properties?
> 
> It looks we are burning too much brain cycles trying to force a model
> that's really unintuitive to the outside, where the default-value of a
> class property depends on the options given to the QEMU command-line. We
> don't have to do that.
> 
> The point of initializing stuff in class_init is to make introspection
> easy. If we make the classes change how they look like depending on the
> command-line configuration, the classes and the class introspection
> system get less useful.
> 

Sorry for replying to myself, but extending my answer:

> > 
> > 2. But more correct way from POV of OOP would be one without any fixups, 
> > i.e.
> > create extra KVM-builtin-CPU-classes that are derived from host class.
> > and in object_class_by_name() lookup for them if kvm is enabled. But we 
> > could
> > do this as follow-up to #1.

Solution #2 would be 100% correct, strictly speaking, but isn't it
overkill to create separate classes if we could just add one additional
property in X86CPUClass, and let the CPU object choose which property is
important for the CPUID setup, depending if KVM is enabled or not?

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Eduardo Habkost
On Fri, Feb 08, 2013 at 01:58:42PM +0100, Igor Mammedov wrote:
> On Fri, 08 Feb 2013 12:16:17 +0100
> Andreas Färber  wrote:
> 
> > Am 08.02.2013 10:03, schrieb Igor Mammedov:
> > > On Thu, 7 Feb 2013 13:08:19 -0200
> > > Eduardo Habkost  wrote:
> > > 
> > >> On Tue, Feb 05, 2013 at 05:39:22PM +0100, Igor Mammedov wrote:
> > >>> From: Andreas Färber 
> > >>>
> > >>> Move x86_def_t definition to header and embed into X86CPUClass.
> > >>> Register types per built-in model definition.
> > >>>
> > >>> Move version initialization from x86_cpudef_setup() to class_init.
> > >>>
> > >>> Inline cpu_x86_register() into the X86CPU initfn.
> > >>> Since instance_init cannot reports errors, drop error handling.
> > >>>
> > >>> Replace cpu_x86_find_by_name() with x86_cpu_class_by_name().
> > >>> Move handling of KVM host vendor override from cpu_x86_find_by_name()
> > >>> to the kvm_arch_init() and class_init(). Use TYPE_X86_CPU class to
> > >>> communicate kvm specific defaults to other sub-classes.
> > >>>
> > >>> Register host-{i386,x86_64}-cpu type from KVM code to avoid #ifdefs
> > >>> and only when KVM is enabled to avoid hacks in CPU code.
> > >>> Make kvm_cpu_fill_host() into a host specific class_init and inline
> > >>> cpu_x86_fill_model_id().
> > >>>
> > >>> Let kvm_check_features_against_host() obtain host-{i386,86_64}-cpu for
> > >>> comparison.
> > >>>
> > >>> Signed-off-by: Andreas Färber 
> > >>> Signed-off-by: Igor Mammedov 
> > >>> ---
> > >>> v4:
> > >>>   * set error if cpu model is not found and goto out;
> > >>>   * copy vendor override from 'host' CPU class in sub-class'es
> > >>> class_init() if 'host' CPU class is available.
> > >>>   * register type TYPE_HOST_X86_CPU in kvm_arch_init(), this type
> > >>> should be available only in KVM mode and we haven't printed it in
> > >>> -cpu ? output so far, so we can continue doing so. It's not
> > >>> really confusing to show 'host' cpu (even if we do it) when KVM
> > >>> is not enabled.
> > >>>   * remove special case for 'host' CPU check in x86_cpu_class_by_name(),
> > >>> due to 'host' CPU will not find anything if not in KVM mode or
> > >>> return 'host' CPU class in KVM mode, i.e. treat it as regular CPUs.
> > >>> ---
> > >>>  target-i386/cpu-qom.h |   24 
> > >>>  target-i386/cpu.c |  331 
> > >>> ++---
> > >>>  target-i386/cpu.h |5 +-
> > >>>  target-i386/kvm.c |   72 +++
> > >>>  4 files changed, 217 insertions(+), 215 deletions(-)
> > >>>
> > >>> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
> > >>> index 48e6b54..80bf72d 100644
> > >>> --- a/target-i386/cpu-qom.h
> > >>> +++ b/target-i386/cpu-qom.h
> > >>> @@ -30,6 +30,27 @@
> > >>>  #define TYPE_X86_CPU "i386-cpu"
> > >>>  #endif
> > >>>  
> > >>> +#define TYPE_HOST_X86_CPU "host-" TYPE_X86_CPU
> > >>
> > >> Can we introduce a X86_CPU_CLASS_NAME() macro to help us make the rules
> > >> to generate CPU class names clearer?
> > >>
> > >> e.g.:
> > >>
> > >> #define X86_CPU_CLASS_NAME(s) s "-" TYPE_X86_CPU
> > >> [...]
> > >> #define TYPE_HOST_X86_CPU X86_CPU_CLASS_NAME("host")
> > >> [...]
> > >>   /* (at the class lookup code) */
> > >>   typename = g_strdup_printf(X86_CPU_CLASS_NAME("%s"), name);
> > > I kind of like Andreas' variant not hiding format string in macro, for
> > > one doesn't have look-up what macro does to see how name will look.
> > > Especially since it's called in only 2 places.

"2 places" is already too much duplication to my taste. :-)

But I am not completely against the current version.


> > > 
> > >>
> > >>
> > >>
> > >>> +
> > >>> +typedef struct x86_def_t {
> > >>> +const char *name;
> > >>> +uint32_t level;
> > >>> +/* vendor is zero-terminated, 12 character ASCII string */
> > >>> +char vendor[CPUID_VENDOR_SZ + 1];
> > >>> +int family;
> > >>> +int model;
> > >>> +int stepping;
> > >>> +uint32_t features, ext_features, ext2_features, ext3_features;
> > >>> +uint32_t kvm_features, svm_features;
> > >>> +uint32_t xlevel;
> > >>> +char model_id[48];
> > >>> +/* Store the results of Centaur's CPUID instructions */
> > >>> +uint32_t ext4_features;
> > >>> +uint32_t xlevel2;
> > >>> +/* The feature bits on CPUID[EAX=7,ECX=0].EBX */
> > >>> +uint32_t cpuid_7_0_ebx_features;
> > >>> +} x86_def_t;
> > >>> +
> > >>>  #define X86_CPU_CLASS(klass) \
> > >>>  OBJECT_CLASS_CHECK(X86CPUClass, (klass), TYPE_X86_CPU)
> > >>>  #define X86_CPU(obj) \
> > >>> @@ -41,6 +62,7 @@
> > >>>   * X86CPUClass:
> > >>>   * @parent_realize: The parent class' realize handler.
> > >>>   * @parent_reset: The parent class' reset handler.
> > >>> + * @info: Model-specific data.
> > >>>   *
> > >>>   * An x86 CPU model or family.
> > >>>   */
> > >>> @@ -51,6 +73,8 @@ typedef struct X86CPUClass {
> > >>>  
> > >>>  DeviceRealize parent_realize;
> > >>>  void (*parent_reset)(CPUState *cpu);
> > >>> +
> > >>> +x

Re: [PATCH] kvm-s390: Bugfix for handling of iscs.

2013-02-08 Thread Christian Borntraeger
On 07/02/13 13:20, Cornelia Huck wrote:
> Hi,
> 
> here's another kvm-s390 bugfix for kvm-next, this time fixing a
> thinko in the I/O interrupt injection.
> 
> Please apply.
> 
> Cornelia Huck (1):
>   KVM: s390: Fix handling of iscs.
> 
>  arch/s390/kvm/interrupt.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)

Would be good if this could be queued for 3.9 together with the
other virtio-ccw patches to ensure that the first version of 
virtio-ccw in Linus tree works as expected.

Thanks

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-08 Thread Marcelo Tosatti
On Fri, Feb 08, 2013 at 02:28:44PM +0200, Gleb Natapov wrote:
> On Thu, Feb 07, 2013 at 07:49:47PM -0200, Marcelo Tosatti wrote:
> > On Thu, Feb 07, 2013 at 04:01:11PM +0200, Gleb Natapov wrote:
> > > On Wed, Feb 06, 2013 at 08:49:23PM -0200, Marcelo Tosatti wrote:
> > > > > Second is that interrupt may be
> > > > > reported as delivered, but it will be coalesced (possible only with 
> > > > > the self
> > > > > IPI with the same vector):
> > > > > 
> > > > > Starting condition: PIR=0, IRR=0 vcpu is in a guest mode
> > > > > 
> > > > > io thread |   vcpu
> > > > > accept_apic_interrupt()   |
> > > > >  PIR and IRR is zero  |
> > > > >  set PIR  |
> > > > >  return delivered |
> > > > >   |  self IPI
> > > > >   |  set IRR
> > > > >   | merge PIR to IRR (*)
> > > > > 
> > > > > At (*) interrupt that was reported as delivered is coalesced.
> > > > 
> > > > Only vcpu itself should send self-IPI, so its fine.
> > > > 
> > > It is fine only because this is not happening in practice (I hope) for 
> > > single interrupt
> > > we care about. Otherwise this is serious problem.
> > 
> > Coalesced information is only interesting for non IPI cases, that
> > is, device emulation (at the moment, at least).
> > 
> And incorrect result will be returned for an interrupt injected by an 
> emulated device
> in the scenario above.
>
> > The above cause can happen when loading APIC registers, but delivered
> > is not interesting in that case. Good to document, however.
> > 
> > > > > > Or:
> > > > > > 
> > > > > > apic_accept_interrupt() {
> > > > > > 
> > > > > > 1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
> > > > > > Never set IRR when HWAPIC enabled, even if outside of guest mode.
> > > > > > 2. Set PIR and let HW or SW VM-entry transfer it to IRR.
> > > > > > 3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
> > > > > > }
> > > > > > 
> > > > > This can report interrupt as coalesced, but it will be eventually 
> > > > > delivered
> > > > > as separate interrupt:
> > > > > 
> > > > > Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
> > > > > 
> > > > >  io thread  | vcpu
> > > > > | 
> > > > > accept_apic_interrupt() |
> > > > > ORIG_PIR=0, ORIG_IRR=1  |
> > > > > |EOI
> > > > > |clear IRR, set ISR
> > > > > set PIR |
> > > > > return coalesced|
> > > > > |clear PIR, set IRR
> > > > > |EOI
> > > > > |clear IRR, set ISR (*)
> > > > > 
> > > > > At (*) interrupt that was reported as coalesced is delivered.
> > > > > 
> > > > > 
> > > > > So still no perfect solution. But first one has much less serious
> > > > > problems for our practical needs.
> > > > > 
> > > > > > Two or more concurrent set_irq can race with each other, though. Can
> > > > > > either document the race or add a lock.
> > > > > > 
> > > > > 
> > > > > --
> > > > >   Gleb.
> > > > 
> > > > Ok, then:
> > > > 
> > > > accept_apic_irq:
> > > > 1. coalesced = test_and_set_bit(PIR)
> > > > 2. set KVM_REQ_EVENT bit(*)
> > > > 3. if (vcpu->in_guest_mode)
> > > > 4.  if (test_and_set_bit(pir notification bit))
> > > > 5.  send PIR IPI
> > > > 6. return coalesced
> > > Do not see how it will help.
> > > 
> > > Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
> > > 
> > > io thread  |  vcpu
> > > accept_apic_interrupt()|
> > > coalesced = 0, PIR=1   |
> > > vcpu in a guest mode:  |
> > > send PIR IPI   |
> > >|  receive PIR IPI
> > >|  merge PIR to IRR (*)
> > > return not coalesced   |
> > > 
> > > At (*) interrupt that was reported as delivered is coalesced.
> > 
> > Of course!
> > 
> > > The point is that we need to check PIR and IRR atomically and this is
> > > impossible.
> > 
> > Ok, next try:
> > 
> > 1. orig_irr = read irr from vapic page
> > 2. if (orig_irr == 0)
> > 3.  return test_and_test_bit(pir)
> > 4. return 0
> > 
> I think this is exactly same solution we are discussing above:
> 
> apic_accept_interrupt() {
>  if (PIR || IRR)
>return coalesced;
>  else
>set PIR;
> }

Its not exactly the same. Without test_and_set_bit(PIR) two vcpus can race.

> with the same self-IPI problem. IMO this is the best we can do and will
> work correctly for RTC interrupt re-injection case.

Yes.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Igor Mammedov
On Fri, 08 Feb 2013 12:16:17 +0100
Andreas Färber  wrote:

> Am 08.02.2013 10:03, schrieb Igor Mammedov:
> > On Thu, 7 Feb 2013 13:08:19 -0200
> > Eduardo Habkost  wrote:
> > 
> >> On Tue, Feb 05, 2013 at 05:39:22PM +0100, Igor Mammedov wrote:
> >>> From: Andreas Färber 
> >>>
> >>> Move x86_def_t definition to header and embed into X86CPUClass.
> >>> Register types per built-in model definition.
> >>>
> >>> Move version initialization from x86_cpudef_setup() to class_init.
> >>>
> >>> Inline cpu_x86_register() into the X86CPU initfn.
> >>> Since instance_init cannot reports errors, drop error handling.
> >>>
> >>> Replace cpu_x86_find_by_name() with x86_cpu_class_by_name().
> >>> Move handling of KVM host vendor override from cpu_x86_find_by_name()
> >>> to the kvm_arch_init() and class_init(). Use TYPE_X86_CPU class to
> >>> communicate kvm specific defaults to other sub-classes.
> >>>
> >>> Register host-{i386,x86_64}-cpu type from KVM code to avoid #ifdefs
> >>> and only when KVM is enabled to avoid hacks in CPU code.
> >>> Make kvm_cpu_fill_host() into a host specific class_init and inline
> >>> cpu_x86_fill_model_id().
> >>>
> >>> Let kvm_check_features_against_host() obtain host-{i386,86_64}-cpu for
> >>> comparison.
> >>>
> >>> Signed-off-by: Andreas Färber 
> >>> Signed-off-by: Igor Mammedov 
> >>> ---
> >>> v4:
> >>>   * set error if cpu model is not found and goto out;
> >>>   * copy vendor override from 'host' CPU class in sub-class'es
> >>> class_init() if 'host' CPU class is available.
> >>>   * register type TYPE_HOST_X86_CPU in kvm_arch_init(), this type
> >>> should be available only in KVM mode and we haven't printed it in
> >>> -cpu ? output so far, so we can continue doing so. It's not
> >>> really confusing to show 'host' cpu (even if we do it) when KVM
> >>> is not enabled.
> >>>   * remove special case for 'host' CPU check in x86_cpu_class_by_name(),
> >>> due to 'host' CPU will not find anything if not in KVM mode or
> >>> return 'host' CPU class in KVM mode, i.e. treat it as regular CPUs.
> >>> ---
> >>>  target-i386/cpu-qom.h |   24 
> >>>  target-i386/cpu.c |  331 
> >>> ++---
> >>>  target-i386/cpu.h |5 +-
> >>>  target-i386/kvm.c |   72 +++
> >>>  4 files changed, 217 insertions(+), 215 deletions(-)
> >>>
> >>> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
> >>> index 48e6b54..80bf72d 100644
> >>> --- a/target-i386/cpu-qom.h
> >>> +++ b/target-i386/cpu-qom.h
> >>> @@ -30,6 +30,27 @@
> >>>  #define TYPE_X86_CPU "i386-cpu"
> >>>  #endif
> >>>  
> >>> +#define TYPE_HOST_X86_CPU "host-" TYPE_X86_CPU
> >>
> >> Can we introduce a X86_CPU_CLASS_NAME() macro to help us make the rules
> >> to generate CPU class names clearer?
> >>
> >> e.g.:
> >>
> >> #define X86_CPU_CLASS_NAME(s) s "-" TYPE_X86_CPU
> >> [...]
> >> #define TYPE_HOST_X86_CPU X86_CPU_CLASS_NAME("host")
> >> [...]
> >>   /* (at the class lookup code) */
> >>   typename = g_strdup_printf(X86_CPU_CLASS_NAME("%s"), name);
> > I kind of like Andreas' variant not hiding format string in macro, for
> > one doesn't have look-up what macro does to see how name will look.
> > Especially since it's called in only 2 places.
> > 
> >>
> >>
> >>
> >>> +
> >>> +typedef struct x86_def_t {
> >>> +const char *name;
> >>> +uint32_t level;
> >>> +/* vendor is zero-terminated, 12 character ASCII string */
> >>> +char vendor[CPUID_VENDOR_SZ + 1];
> >>> +int family;
> >>> +int model;
> >>> +int stepping;
> >>> +uint32_t features, ext_features, ext2_features, ext3_features;
> >>> +uint32_t kvm_features, svm_features;
> >>> +uint32_t xlevel;
> >>> +char model_id[48];
> >>> +/* Store the results of Centaur's CPUID instructions */
> >>> +uint32_t ext4_features;
> >>> +uint32_t xlevel2;
> >>> +/* The feature bits on CPUID[EAX=7,ECX=0].EBX */
> >>> +uint32_t cpuid_7_0_ebx_features;
> >>> +} x86_def_t;
> >>> +
> >>>  #define X86_CPU_CLASS(klass) \
> >>>  OBJECT_CLASS_CHECK(X86CPUClass, (klass), TYPE_X86_CPU)
> >>>  #define X86_CPU(obj) \
> >>> @@ -41,6 +62,7 @@
> >>>   * X86CPUClass:
> >>>   * @parent_realize: The parent class' realize handler.
> >>>   * @parent_reset: The parent class' reset handler.
> >>> + * @info: Model-specific data.
> >>>   *
> >>>   * An x86 CPU model or family.
> >>>   */
> >>> @@ -51,6 +73,8 @@ typedef struct X86CPUClass {
> >>>  
> >>>  DeviceRealize parent_realize;
> >>>  void (*parent_reset)(CPUState *cpu);
> >>> +
> >>> +x86_def_t info;
> >>
> >> I thought you had suggesting making it a pointer. If you made it a
> >> pointer, you wouldn't need to move the x86_def_t declaration to
> >> cpu-qom.h.
> > 
> > x86_def_t is needed in kvm.c for host class. So there is no much point
> > in changing info into pointer, considering it's temporary solution.
> 
> The main reason I did it this way was to avoid a g_malloc0() in a
> non-failing clas

Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-08 Thread Gleb Natapov
On Thu, Feb 07, 2013 at 07:49:47PM -0200, Marcelo Tosatti wrote:
> On Thu, Feb 07, 2013 at 04:01:11PM +0200, Gleb Natapov wrote:
> > On Wed, Feb 06, 2013 at 08:49:23PM -0200, Marcelo Tosatti wrote:
> > > > Second is that interrupt may be
> > > > reported as delivered, but it will be coalesced (possible only with the 
> > > > self
> > > > IPI with the same vector):
> > > > 
> > > > Starting condition: PIR=0, IRR=0 vcpu is in a guest mode
> > > > 
> > > > io thread |   vcpu
> > > > accept_apic_interrupt()   |
> > > >  PIR and IRR is zero  |
> > > >  set PIR  |
> > > >  return delivered |
> > > >   |  self IPI
> > > >   |  set IRR
> > > >   | merge PIR to IRR (*)
> > > > 
> > > > At (*) interrupt that was reported as delivered is coalesced.
> > > 
> > > Only vcpu itself should send self-IPI, so its fine.
> > > 
> > It is fine only because this is not happening in practice (I hope) for 
> > single interrupt
> > we care about. Otherwise this is serious problem.
> 
> Coalesced information is only interesting for non IPI cases, that
> is, device emulation (at the moment, at least).
> 
And incorrect result will be returned for an interrupt injected by an emulated 
device
in the scenario above.

> The above cause can happen when loading APIC registers, but delivered
> is not interesting in that case. Good to document, however.
> 
> > > > > Or:
> > > > > 
> > > > > apic_accept_interrupt() {
> > > > > 
> > > > > 1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
> > > > > Never set IRR when HWAPIC enabled, even if outside of guest mode.
> > > > > 2. Set PIR and let HW or SW VM-entry transfer it to IRR.
> > > > > 3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
> > > > > }
> > > > > 
> > > > This can report interrupt as coalesced, but it will be eventually 
> > > > delivered
> > > > as separate interrupt:
> > > > 
> > > > Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
> > > > 
> > > >  io thread  | vcpu
> > > > | 
> > > > accept_apic_interrupt() |
> > > > ORIG_PIR=0, ORIG_IRR=1  |
> > > > |EOI
> > > > |clear IRR, set ISR
> > > > set PIR |
> > > > return coalesced|
> > > > |clear PIR, set IRR
> > > > |EOI
> > > > |clear IRR, set ISR (*)
> > > > 
> > > > At (*) interrupt that was reported as coalesced is delivered.
> > > > 
> > > > 
> > > > So still no perfect solution. But first one has much less serious
> > > > problems for our practical needs.
> > > > 
> > > > > Two or more concurrent set_irq can race with each other, though. Can
> > > > > either document the race or add a lock.
> > > > > 
> > > > 
> > > > --
> > > > Gleb.
> > > 
> > > Ok, then:
> > > 
> > > accept_apic_irq:
> > > 1. coalesced = test_and_set_bit(PIR)
> > > 2. set KVM_REQ_EVENT bit  (*)
> > > 3. if (vcpu->in_guest_mode)
> > > 4.if (test_and_set_bit(pir notification bit))
> > > 5.send PIR IPI
> > > 6. return coalesced
> > Do not see how it will help.
> > 
> > Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
> > 
> > io thread  |  vcpu
> > accept_apic_interrupt()|
> > coalesced = 0, PIR=1   |
> > vcpu in a guest mode:  |
> > send PIR IPI   |
> >|  receive PIR IPI
> >|  merge PIR to IRR (*)
> > return not coalesced   |
> > 
> > At (*) interrupt that was reported as delivered is coalesced.
> 
> Of course!
> 
> > The point is that we need to check PIR and IRR atomically and this is
> > impossible.
> 
> Ok, next try:
> 
> 1. orig_irr = read irr from vapic page
> 2. if (orig_irr == 0)
> 3.return test_and_test_bit(pir)
> 4. return 0
> 
I think this is exactly same solution we are discussing above:

apic_accept_interrupt() {
 if (PIR || IRR)
   return coalesced;
 else
   set PIR;
}

with the same self-IPI problem. IMO this is the best we can do and will
work correctly for RTC interrupt re-injection case.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-08 Thread Gleb Natapov
On Fri, Feb 08, 2013 at 12:07:36AM -0200, Marcelo Tosatti wrote:
> On Thu, Feb 07, 2013 at 03:52:24PM +0200, Gleb Natapov wrote:
> > > Its not a bad idea to have a new KVM_REQ_ bit for PIR processing (just
> > > as the current patches do).
> > Without the numbers I do not see why.
> 
> KVM_REQ_EVENT already means... counting... many things. Its a well
Exactly my point. KVM_REQ_EVENT means many things, all of them event
injection related. It can be split may be to 2/3 smaller request, but
it will complicate the code and why would we do that without actually
able to show performance improvements that split provides?

> defined request, to sync PIR->VIRR, don't see your point about
> performance.
And it is just one more things that needs to be done during event
injection. Without providing a good reason no need to handle it
specially. Performance is convincing enough reason, but I what to
see numbers.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM performance Java server/MySQL...

2013-02-08 Thread David Cruz
Other optimizations people are testing out there.

- use "nohz=off" in the kernel loading line y menu.lst
- Disable Cgroups completely. Using cgclear, and turning off cgred
cg-config daemons.

And from a Personal point of view, we've always tried to use MySQL in
a different server from JBoss.
99% of the times is far better for performance and tuning.

David

2013/2/8 Erik Brakkee :
> 
>> 
>>> On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
 Hi,


 We have been benchmarking a java server application (java 6 update 29)
 that requires a mysql database. The scenario is quite simple. We open a
 web page which displays a lot of search results. To get the content of
 the
 page one big query is done with many smaller queries to retrieve the
 data.
 The test from the java side is single threaded.

 We have used the following deployment scenarios:
 1. JBoss in VM, MySql in separate VM
 2. JBoss in VM, MySQL native
 3. JBoss native, MySQL in vm.
 4. JBoss native and MySQL native on the same physical machine
 5. JBoss and MySQL virtualized on the same VM.

 What we see is that the performance (time to execute) is practically
 the
 same for all scenarios (approx. 30 seconds), except for scenario 4 that
 takes approx. 21 seconds. This difference is quite large and contrasts
 many other test on the internet and other benchmarks we did previously.

 We have tried pinning the VMs, turning hyperthreading off, varying the
 CPU
 model (including host-passthrough), but this did not have any
 significant
 impact.

 The hardware on which we are running is a dual socket E5-2650 machine
 with
 64 GB memory. The server is a Dell poweredge R720 server with SAS
 disks,
 RAID controller with battery backup (writeback cache). Transparent huge
 pages is turned on.

 We are at a loss to explain the differences in the test. In particular,
 we
 would have expected the least performance when both were running
 virtualized and we would have expected a better performance when JBoss
 and
 MySQL were running virtualized in the same VM as compared to JBoss and
 MySQL both running in different virtual machines. It looks like we are
 dealing with multiple issues here and not just one.

 Right now we have a 30% penalty for running virtualized which is too
 much
 for us; 10% would be allright. What would you suggest to do to
 troubleshoot this further?

>>>
>>> What is you kernel/qemu versions and command line you are using to start
>>> a VM?
>>
>> Centos 6.3, 2.6.32-279.22.1.el6.x86_64
>>> rpm -qf /usr/libexec/qemu-kvm
>> qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64
>>
>> The guest is also running centos 6.3 with the same settings. Settings that
>> can influence Java performance (such as transparent huge pages) are turned
>> on on both the host and guest (see the remark on hugepages below).
>>
>> The command-line is as follows:
>>
>> /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
>> -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
>> master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
>> -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
>> -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
>> if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
>> -drive
>> file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
>> -drive
>> file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
>> -drive
>> file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
>> -drive
>> file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
>> -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>>
>>
>>
>> This virtual machine has three virtio disks, and one file based disk. The
>> last disk is about 100MB in size and is used only during startup (contain

Re: [RFC PATCH 0/8] virtio: new API for addition of buffers, scatterlist changes

2013-02-08 Thread Jens Axboe
On Fri, Feb 08 2013, Rusty Russell wrote:
> Paolo Bonzini  writes:
> > The virtqueue_add_buf function has two limitations:
> >
> > 1) it requires the caller to provide all the buffers in a single call;
> >
> > 2) it does not support chained scatterlists: the buffers must be
> > provided as an array of struct scatterlist.
> >
> > Because of these limitations, virtio-scsi has to copy each request into
> > a scatterlist internal to the driver.  It cannot just use the one that
> > was prepared by the upper SCSI layers.
> 
> Hi Paulo,
> 
> Note that you've defined your problem in terms of your solution
> here.  For clarity:
> 
> The problem: we want to prepend and append to a scatterlist.  We can't
> append, because the chained scatterlist implementation requires
> an element to be appended to join two scatterlists together.
> 
> The solution: fix scatterlists by introducing struct sg_ring:
> struct sg_ring {
> struct list_head ring;
>   unsigned int nents;
>   unsigned int orig_nents; /* do we want to replace sg_table? */
> struct scatterlist *sg;
> };

This would definitely be more flexible than the current chaining.
However:

> The workaround: make virtio accept multiple scatterlists for a single
> buffer.
> 
> There's nothing wrong with your workaround, but if other subsystems have
> the same problem we do, perhaps we should consider a broader solution?

Do other use cases actually exist? I don't think I've come across this
requirement before, since it was introduced (6 years ago, from a cursory
look at the git logs!).

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] target-i386: Slim conversion to X86CPU subclasses

2013-02-08 Thread Andreas Färber
Am 08.02.2013 10:03, schrieb Igor Mammedov:
> On Thu, 7 Feb 2013 13:08:19 -0200
> Eduardo Habkost  wrote:
> 
>> On Tue, Feb 05, 2013 at 05:39:22PM +0100, Igor Mammedov wrote:
>>> From: Andreas Färber 
>>>
>>> Move x86_def_t definition to header and embed into X86CPUClass.
>>> Register types per built-in model definition.
>>>
>>> Move version initialization from x86_cpudef_setup() to class_init.
>>>
>>> Inline cpu_x86_register() into the X86CPU initfn.
>>> Since instance_init cannot reports errors, drop error handling.
>>>
>>> Replace cpu_x86_find_by_name() with x86_cpu_class_by_name().
>>> Move handling of KVM host vendor override from cpu_x86_find_by_name()
>>> to the kvm_arch_init() and class_init(). Use TYPE_X86_CPU class to
>>> communicate kvm specific defaults to other sub-classes.
>>>
>>> Register host-{i386,x86_64}-cpu type from KVM code to avoid #ifdefs
>>> and only when KVM is enabled to avoid hacks in CPU code.
>>> Make kvm_cpu_fill_host() into a host specific class_init and inline
>>> cpu_x86_fill_model_id().
>>>
>>> Let kvm_check_features_against_host() obtain host-{i386,86_64}-cpu for
>>> comparison.
>>>
>>> Signed-off-by: Andreas Färber 
>>> Signed-off-by: Igor Mammedov 
>>> ---
>>> v4:
>>>   * set error if cpu model is not found and goto out;
>>>   * copy vendor override from 'host' CPU class in sub-class'es
>>> class_init() if 'host' CPU class is available.
>>>   * register type TYPE_HOST_X86_CPU in kvm_arch_init(), this type
>>> should be available only in KVM mode and we haven't printed it in
>>> -cpu ? output so far, so we can continue doing so. It's not
>>> really confusing to show 'host' cpu (even if we do it) when KVM
>>> is not enabled.
>>>   * remove special case for 'host' CPU check in x86_cpu_class_by_name(),
>>> due to 'host' CPU will not find anything if not in KVM mode or
>>> return 'host' CPU class in KVM mode, i.e. treat it as regular CPUs.
>>> ---
>>>  target-i386/cpu-qom.h |   24 
>>>  target-i386/cpu.c |  331 
>>> ++---
>>>  target-i386/cpu.h |5 +-
>>>  target-i386/kvm.c |   72 +++
>>>  4 files changed, 217 insertions(+), 215 deletions(-)
>>>
>>> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
>>> index 48e6b54..80bf72d 100644
>>> --- a/target-i386/cpu-qom.h
>>> +++ b/target-i386/cpu-qom.h
>>> @@ -30,6 +30,27 @@
>>>  #define TYPE_X86_CPU "i386-cpu"
>>>  #endif
>>>  
>>> +#define TYPE_HOST_X86_CPU "host-" TYPE_X86_CPU
>>
>> Can we introduce a X86_CPU_CLASS_NAME() macro to help us make the rules
>> to generate CPU class names clearer?
>>
>> e.g.:
>>
>> #define X86_CPU_CLASS_NAME(s) s "-" TYPE_X86_CPU
>> [...]
>> #define TYPE_HOST_X86_CPU X86_CPU_CLASS_NAME("host")
>> [...]
>>   /* (at the class lookup code) */
>>   typename = g_strdup_printf(X86_CPU_CLASS_NAME("%s"), name);
> I kind of like Andreas' variant not hiding format string in macro, for
> one doesn't have look-up what macro does to see how name will look.
> Especially since it's called in only 2 places.
> 
>>
>>
>>
>>> +
>>> +typedef struct x86_def_t {
>>> +const char *name;
>>> +uint32_t level;
>>> +/* vendor is zero-terminated, 12 character ASCII string */
>>> +char vendor[CPUID_VENDOR_SZ + 1];
>>> +int family;
>>> +int model;
>>> +int stepping;
>>> +uint32_t features, ext_features, ext2_features, ext3_features;
>>> +uint32_t kvm_features, svm_features;
>>> +uint32_t xlevel;
>>> +char model_id[48];
>>> +/* Store the results of Centaur's CPUID instructions */
>>> +uint32_t ext4_features;
>>> +uint32_t xlevel2;
>>> +/* The feature bits on CPUID[EAX=7,ECX=0].EBX */
>>> +uint32_t cpuid_7_0_ebx_features;
>>> +} x86_def_t;
>>> +
>>>  #define X86_CPU_CLASS(klass) \
>>>  OBJECT_CLASS_CHECK(X86CPUClass, (klass), TYPE_X86_CPU)
>>>  #define X86_CPU(obj) \
>>> @@ -41,6 +62,7 @@
>>>   * X86CPUClass:
>>>   * @parent_realize: The parent class' realize handler.
>>>   * @parent_reset: The parent class' reset handler.
>>> + * @info: Model-specific data.
>>>   *
>>>   * An x86 CPU model or family.
>>>   */
>>> @@ -51,6 +73,8 @@ typedef struct X86CPUClass {
>>>  
>>>  DeviceRealize parent_realize;
>>>  void (*parent_reset)(CPUState *cpu);
>>> +
>>> +x86_def_t info;
>>
>> I thought you had suggesting making it a pointer. If you made it a
>> pointer, you wouldn't need to move the x86_def_t declaration to
>> cpu-qom.h.
> 
> x86_def_t is needed in kvm.c for host class. So there is no much point
> in changing info into pointer, considering it's temporary solution.

The main reason I did it this way was to avoid a g_malloc0() in a
non-failing class_init. Also it is closer to having class fields sit
directly in X86CPUClass.

>>>  } X86CPUClass;
>>>  
>>>  /**
>>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
>>> index 1aee097..62fdc84 100644
>>> --- a/target-i386/cpu.c
>>> +++ b/target-i386/cpu.c
>>> @@ -47,8 +47,8 @@
> [...]
> 
>>> @@ -2195,6 +201

Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee

> 
>> On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
>>> Hi,
>>>
>>>
>>> We have been benchmarking a java server application (java 6 update 29)
>>> that requires a mysql database. The scenario is quite simple. We open a
>>> web page which displays a lot of search results. To get the content of
>>> the
>>> page one big query is done with many smaller queries to retrieve the
>>> data.
>>> The test from the java side is single threaded.
>>>
>>> We have used the following deployment scenarios:
>>> 1. JBoss in VM, MySql in separate VM
>>> 2. JBoss in VM, MySQL native
>>> 3. JBoss native, MySQL in vm.
>>> 4. JBoss native and MySQL native on the same physical machine
>>> 5. JBoss and MySQL virtualized on the same VM.
>>>
>>> What we see is that the performance (time to execute) is practically
>>> the
>>> same for all scenarios (approx. 30 seconds), except for scenario 4 that
>>> takes approx. 21 seconds. This difference is quite large and contrasts
>>> many other test on the internet and other benchmarks we did previously.
>>>
>>> We have tried pinning the VMs, turning hyperthreading off, varying the
>>> CPU
>>> model (including host-passthrough), but this did not have any
>>> significant
>>> impact.
>>>
>>> The hardware on which we are running is a dual socket E5-2650 machine
>>> with
>>> 64 GB memory. The server is a Dell poweredge R720 server with SAS
>>> disks,
>>> RAID controller with battery backup (writeback cache). Transparent huge
>>> pages is turned on.
>>>
>>> We are at a loss to explain the differences in the test. In particular,
>>> we
>>> would have expected the least performance when both were running
>>> virtualized and we would have expected a better performance when JBoss
>>> and
>>> MySQL were running virtualized in the same VM as compared to JBoss and
>>> MySQL both running in different virtual machines. It looks like we are
>>> dealing with multiple issues here and not just one.
>>>
>>> Right now we have a 30% penalty for running virtualized which is too
>>> much
>>> for us; 10% would be allright. What would you suggest to do to
>>> troubleshoot this further?
>>>
>>
>> What is you kernel/qemu versions and command line you are using to start
>> a VM?
>
> Centos 6.3, 2.6.32-279.22.1.el6.x86_64
>> rpm -qf /usr/libexec/qemu-kvm
> qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64
>
> The guest is also running centos 6.3 with the same settings. Settings that
> can influence Java performance (such as transparent huge pages) are turned
> on on both the host and guest (see the remark on hugepages below).
>
> The command-line is as follows:
>
> /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
> -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
> master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
> -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
> -drive
> file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
> -drive
> file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
> -drive
> file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
> -drive
> file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
> -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>
>
>
> This virtual machine has three virtio disks, and one file based disk. The
> last disk is about 100MB in size and is used only during startup (contains
> configurationd data for initializing the vm) and is only read and never
> written. It has one CDrom which is not used. It also uses old-style
> hugepages. Apparently this did not have any significant effect on
> performance over transparent hugepages (as would be expected). We
> configured these old style hugepages just to rule out any issue with
> transparent hugepages.
>
> Initially we got 30% performance penalty with 16 processors, but in the
> current setting of using 4 proces

[PATCH] Added one_reg interface for timer registers

2013-02-08 Thread Bharat Bhushan
If userspace wants to change some specific bits of TSR
(timer status register) then it uses GET/SET_SREGS ioctl interface.
So the steps will be:
  i)   user-space will make get ioctl,
  ii)  change TSR in userspace
  iii) then make set ioctl.
It can happen that TSR gets changed by kernel after step i) and
before step iii).

To avoid this we have added below one_reg ioctls for oring and clearing
specific bits in TSR. This patch adds one registerface for:
 1) setting specific bit in TSR (timer status register)
 2) clearing specific bit in TSR (timer status register)
 3) setting the TCR register. There are cases where we want to only
change TCR and not TSR. Although we can uses SREGS without
KVM_SREGS_E_UPDATE_TSR flag but I think one reg is better. I am open
if someone feels we should use SREGS only here.

Signed-off-by: Bharat Bhushan 
---
 arch/powerpc/include/uapi/asm/kvm.h |4 
 arch/powerpc/kvm/booke.c|   18 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 16064d0..b4ad5d4 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -417,4 +417,8 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_EPCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x85)
 #define KVM_REG_PPC_EPR(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x86)
 
+/* Timer Status Register OR/CLEAR interface */
+#define KVM_REG_PPC_OR_TSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x87)
+#define KVM_REG_PPC_CLEAR_TSR  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x88)
+#define KVM_REG_PPC_SET_TCR(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x89)
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 020923e..983c06f 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1481,6 +1481,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
break;
}
 #endif
+   case KVM_REG_PPC_OR_TSR: {
+   u32 tsr_bits;
+   r = get_user(tsr_bits, (u32 __user *)(long)reg->addr);
+   kvmppc_set_tsr_bits(vcpu, tsr_bits);
+   break;
+   }
+   case KVM_REG_PPC_CLEAR_TSR: {
+   u32 tsr_bits;
+   r = get_user(tsr_bits, (u32 __user *)(long)reg->addr);
+   kvmppc_clr_tsr_bits(vcpu, tsr_bits);
+   break;
+   }
+   case KVM_REG_PPC_SET_TCR: {
+   u32 tcr;
+   r = get_user(tcr, (u32 __user *)(long)reg->addr);
+   kvmppc_set_tcr(vcpu, tcr);
+   break;
+   }
default:
break;
}
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee

> The IO scheduler on the host and on the guest is CFS. We also tried with
> deadline scheduler on the host but this did not make any measurable
> difference. We did not try no-op on the host.

I mean of course that we did not try no-op on the guest (not on the host).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee

> On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
>> Hi,
>>
>>
>> We have been benchmarking a java server application (java 6 update 29)
>> that requires a mysql database. The scenario is quite simple. We open a
>> web page which displays a lot of search results. To get the content of
>> the
>> page one big query is done with many smaller queries to retrieve the
>> data.
>> The test from the java side is single threaded.
>>
>> We have used the following deployment scenarios:
>> 1. JBoss in VM, MySql in separate VM
>> 2. JBoss in VM, MySQL native
>> 3. JBoss native, MySQL in vm.
>> 4. JBoss native and MySQL native on the same physical machine
>> 5. JBoss and MySQL virtualized on the same VM.
>>
>> What we see is that the performance (time to execute) is practically the
>> same for all scenarios (approx. 30 seconds), except for scenario 4 that
>> takes approx. 21 seconds. This difference is quite large and contrasts
>> many other test on the internet and other benchmarks we did previously.
>>
>> We have tried pinning the VMs, turning hyperthreading off, varying the
>> CPU
>> model (including host-passthrough), but this did not have any
>> significant
>> impact.
>>
>> The hardware on which we are running is a dual socket E5-2650 machine
>> with
>> 64 GB memory. The server is a Dell poweredge R720 server with SAS disks,
>> RAID controller with battery backup (writeback cache). Transparent huge
>> pages is turned on.
>>
>> We are at a loss to explain the differences in the test. In particular,
>> we
>> would have expected the least performance when both were running
>> virtualized and we would have expected a better performance when JBoss
>> and
>> MySQL were running virtualized in the same VM as compared to JBoss and
>> MySQL both running in different virtual machines. It looks like we are
>> dealing with multiple issues here and not just one.
>>
>> Right now we have a 30% penalty for running virtualized which is too
>> much
>> for us; 10% would be allright. What would you suggest to do to
>> troubleshoot this further?
>>
>
> What is you kernel/qemu versions and command line you are using to start
> a VM?

Centos 6.3, 2.6.32-279.22.1.el6.x86_64
> rpm -qf /usr/libexec/qemu-kvm
qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64

The guest is also running centos 6.3 with the same settings. Settings that
can influence Java performance (such as transparent huge pages) are turned
on on both the host and guest (see the remark on hugepages below).

The command-line is as follows:

/usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
-mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-drive
file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive
file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
-drive
file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
-drive
file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6



This virtual machine has three virtio disks, and one file based disk. The
last disk is about 100MB in size and is used only during startup (contains
configurationd data for initializing the vm) and is only read and never
written. It has one CDrom which is not used. It also uses old-style
hugepages. Apparently this did not have any significant effect on
performance over transparent hugepages (as would be expected). We
configured these old style hugepages just to rule out any issue with
transparent hugepages.

Initially we got 30% performance penalty with 16 processors, but in the
current setting of using 4 processors we see a reduced performance penalty
of 15-20%. Also on the physical host, we are not running the numad daemon
at the moment. Also, we tried disabling hyperthre