Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-11 Thread Amit Shah
On (Mon) 11 Aug 2014 [15:11:03], H. Peter Anvin wrote:
> On 08/11/2014 11:49 AM, Amit Shah wrote:
> > The khwrngd thread is started when a hwrng device of sufficient
> > quality is registered.  The virtio-rng device is backed by the
> > hypervisor, and we trust the hypervisor to provide real entropy.  A
> > malicious hypervisor is a scenario that's ruled out, so we are certain
> > the quality of randomness we receive is perfectly trustworthy.  Hence,
> > we use 100% for the factor, indicating maximum confidence in the source.
> > 
> > Signed-off-by: Amit Shah 
> 
> It isn't "ruled out", it is just irrelevant: if the hypervisor is
> malicious, the quality of your random number source is the least of your
> problems.

Yea; I meant ruled out in that sense.  Should the commit msg be more
verbose?

Amit


Re: [PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-08-11 Thread Madhavan Srinivasan
On Monday 11 August 2014 02:45 PM, Alexander Graf wrote:
> 
> On 11.08.14 10:51, Benjamin Herrenschmidt wrote:
>> On Mon, 2014-08-11 at 09:26 +0200, Alexander Graf wrote:
 diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
 index da86d9b..d95014e 100644
 --- a/arch/powerpc/kvm/emulate.c
 +++ b/arch/powerpc/kvm/emulate.c
>>> This should be book3s_emulate.c.
>> Any reason we can't make that 0000 opcode as breakpoint common to
>> all powerpc variants ?
> 
> I can't think of a good reason. We use a hypercall on booke (which traps
> into an illegal instruction for pr) today, but I don't think it has to
> be that way.
> 
> Given that the user space API allows us to change it dynamically, there
> should be nothing blocking us from going with 0000 always.
> 
Kindly correct me if I am wrong. We can still have common code in
emulate.c to set up the environment for both HV and PR in case of an
illegal instruction (I will rebase to the latest source). But with the
suggestion here to use 0000, will the current path on embedded,
kvmppc_handle_exit (booke.c) -> BOOKE_INTERRUPT_HV_PRIV -> emulation_exit ->
kvmppc_emulate_instruction, change to kvmppc_handle_exit (booke.c) ->
BOOKE_INTERRUPT_PROGRAM -> if it is the debug instruction, call
emulation_exit, else send the program interrupt to the guest?
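Something like the following sketch is what I have in mind (hypothetical;
kvmppc_is_debug_inst() is just an assumed helper name, not existing code):

case BOOKE_INTERRUPT_PROGRAM:
	if (kvmppc_is_debug_inst(vcpu))	/* assumed helper: is this the 0000 sw breakpoint? */
		r = emulation_exit(run, vcpu);
	else
		kvmppc_core_queue_program(vcpu, ESR_PIL);	/* reflect to guest */
	break;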

Thanks for the review.
Regards,
Maddy

> 
> Alex
> 



[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-11 Thread Xiao Guangrong
We may cache stale memslot info together with the current mmio
generation number into an spte, as in this scenario:

   CPU 0                                  CPU 1
page fault:                            add a new memslot
  read memslot and detect it is
  an mmio access
                                       update memslots
                                       update generation number
  read generation number
  cache the gpa and current gen
  number into spte

So, if the guest accesses the gpa later, it will generate an incorrect
mmio exit.

This patch fixes the problem by updating the generation number after
synchronize_srcu_expedited(), which ensures the generation number is
advanced only once the memslots update has finished.

Cc: sta...@vger.kernel.org
Cc: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 virt/kvm/kvm_main.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..ca3cdac 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new->npages != npages)
sort_memslots(slots);
}
-
-   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm->memslots;
 
-   update_memslots(slots, new, kvm->memslots->generation);
+   update_memslots(slots, new);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;
 
kvm_arch_memslots_updated(kvm);
 
-- 
1.8.3.1



[PATCH 2/2] kvm: x86: fix stale mmio cache bug

2014-08-11 Thread Xiao Guangrong
From: David Matlack 

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

[ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]
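
For context, the fast path that consults the per-vcpu cache looks roughly
like this (a simplified sketch of quickly_check_mmio_pf(), not part of
this diff):

static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
	/*
	 * With this patch, both matchers below additionally require that
	 * the cached mmio_gen equals the current memslot generation.
	 */
	if (direct)
		return vcpu_match_mmio_gpa(vcpu, addr);

	return vcpu_match_mmio_gva(vcpu, addr);
}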

Cc: sta...@vger.kernel.org
Signed-off-by: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c  |  4 ++--
 arch/x86/kvm/mmu.h  |  2 ++
 arch/x86/kvm/x86.h  | 19 +++++++++++++++----
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..58fa3ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ struct kvm_vcpu_arch {
u64 mmio_gva;
unsigned access;
gfn_t mmio_gfn;
+   unsigned int mmio_gen;
 
struct kvm_pmu pmu;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..e00fbfe 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,7 +234,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
@@ -3163,7 +3163,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
 
-   vcpu_clear_mmio_info(vcpu, ~0ul);
+   vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b982112..e2d902a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,6 +76,8 @@ enum {
 };
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
+unsigned int kvm_current_mmio_generation(struct kvm *kvm);
+
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
bool execonly);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 306a1b7..ffd03b7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -3,6 +3,7 @@
 
 #include 
 #include "kvm_cache_regs.h"
+#include "mmu.h"
 
 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
@@ -88,15 +89,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu 
*vcpu,
vcpu->arch.mmio_gva = gva & PAGE_MASK;
vcpu->arch.access = access;
vcpu->arch.mmio_gfn = gfn;
+   vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
+}
+
+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.mmio_gen == kvm_current_mmio_generation(vcpu->kvm);
 }
 
 /*
  * Clear the mmio cache info for the given gva,
- * specially, if gva is ~0ul, we clear all mmio cache info.
+ * specially, if gva is MMIO_GVA_ANY, we clear all mmio cache info.
  */
+#define MMIO_GVA_ANY   ~((gva_t)0)
+
 static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 {
-   if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+   if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
return;
 
vcpu->arch.mmio_gva = 0;
@@ -104,7 +113,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu 
*vcpu, gva_t gva)
 
 static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long 
gva)
 {
-   if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+   if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
+ vcpu->arch.mmio_gva == (gva & PAGE_MASK))
return true;
 
return false;
@@ -112,7 +122,8 @@ static inline bool vcpu_match_mmio_gva(struct kvm_vcpu 
*vcpu, unsigned long gva)
 
 static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
-   if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+   if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
+ vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
return true;

return false;
}

The status of vhost-net on kvm-arm?

2014-08-11 Thread Li Liu
Hi all,

Can anyone tell me the current status of vhost-net on kvm-arm?

Half a year has passed since Isa Ansharullah asked this question:
http://www.spinics.net/lists/kvm-arm/msg08152.html

I have found two patches which provide kvm-arm support for eventfd
and irqfd:

1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html

2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
https://patches.linaro.org/32261/

And there's a rough patch for qemu to support eventfd from Ying-Shiuan Pan:

[Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

But there are no comments on this patch, and I can find nothing about
qemu support for irqfd. Have I lost track?

If nobody is working on this, we plan to complete irqfd and multiqueue
support for virtio-mmio.


Re: [PATCH v9 2/4] arm: ARMv7 dirty page logging initial mem region write protect (w/no huge PUD support)

2014-08-11 Thread Mario Smarduch
On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> Remove the parenthesis from the subject line.
> 
> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>> Patch adds  support for initial write protection VM memlsot. This patch 
>> series
> ^^^
> stray whitespace
> 
> 
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> may be worth mentioning that this is always valid on ARMv7.
> 
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h   |1 +
>>  arch/arm/include/asm/kvm_mmu.h|   20 ++
>>  arch/arm/include/asm/pgtable-3level.h |1 +
>>  arch/arm/kvm/arm.c|9 +++
>>  arch/arm/kvm/mmu.c|  128 
>> +
>>  5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end) \
>>  ({  u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h 
>> b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6)   /* 
>> HAP[1]   */
>>  #define L_PTE_S2_RDWR   (_AT(pteval_t, 3) << 6)   /* 
>> HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY (_AT(pteval_t, 1) << 6)   /* 
>> HAP[1]   */
>>  #define L_PMD_S2_RDWR   (_AT(pmdval_t, 3) << 6)   /* 
>> HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> const struct kvm_memory_slot *old,
>> enum kvm_mr_change change)
>>  {
>> +#ifdef CONFIG_ARM
>> +/*
>> + * At this point memslot has been committed and there is an
>> + * allocated dirty_bitmap[], dirty pages will be be tracked while the
>> + * memory slot is write protected.
>> + */
>> +if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +kvm_mmu_wp_memory_region(kvm, mem->slot);
>> +#endif
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 35254c6..7bfc792 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
>> phys_addr_t *ipap)
>>  return false;
>>  }
>>  
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:pointer to pmd entry
>> + * @addr:   range start address
>> + * @end:range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t 
>> end)
>> +{
>> +pte_t *pte;
>> +
>> +pte = pte_offset_kernel(pmd, addr);
>> +do {
>> +if (!pte_none(*pte)) {
>> +if (!kvm_s2pte_readonly(pte))
>> +kvm_set_s2pte_readonly(pte);
>> +}
>> +} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> +
>> +/**
>> + * stage2_wp_pmd_range - write protect PMD range
>> + * @pud:pointer to pud entry
>> + * @addr:   range start address

Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support

2014-08-11 Thread Mario Smarduch
On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and dissolves huge pages to page tables.
>> In case migration is canceled huge pages will be used again.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/kvm/mmu.c |   31 +--
>>  1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index ca84331..a17812a 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct 
>> kvm_mmu_memory_cache
>>  }
>>  
>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache 
>> *cache,
>> -  phys_addr_t addr, const pte_t *new_pte, bool iomap)
>> +  phys_addr_t addr, const pte_t *new_pte, bool iomap,
>> +  bool logging_active)
>>  {
>>  pmd_t *pmd;
>>  pte_t *pte, old_pte;
>> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct 
>> kvm_mmu_memory_cache *cache,
>>  return 0;
>>  }
>>  
>> +/*
>> + * While dirty memory logging, clear PMD entry for huge page and split
>> + * into smaller pages, to track dirty memory at page granularity.
>> + */
>> +if (logging_active && kvm_pmd_huge(*pmd)) {
>> +phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
>> +clear_pmd_entry(kvm, pmd, ipa);
> 
> clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
> definitely not the right thing to call.

I don't see that in 3.15rc1/rc4 -

static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
{
if (kvm_pmd_huge(*pmd)) {
pmd_clear(pmd);
kvm_tlb_flush_vmid_ipa(kvm, addr);
} else {
  []
}

I thought the purpose of this function was to clear the PMD entry. I
also ran hundreds of tests with no problems. Hmmm, confused.

> 
>> +}
>> +
>>  /* Create stage-2 page mappings - Level 2 */
>>  if (pmd_none(*pmd)) {
>>  if (!cache)
>> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
>> guest_ipa,
>>  if (ret)
>>  goto out;
>>  spin_lock(&kvm->mmu_lock);
>> -ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
>> +ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>>  spin_unlock(&kvm->mmu_lock);
>>  if (ret)
>>  goto out;
>> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  struct vm_area_struct *vma;
>>  pfn_t pfn;
>> +/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>> +#ifdef CONFIG_ARM
>> +bool logging_active = !!memslot->dirty_bitmap;
>> +#else
>> +bool logging_active = false;
>> +#endif
> 
> can you make this an inline in the header files for now please?

Yes definitely.
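
Something along these lines, I assume (a sketch; the helper name is just
a placeholder):

/* e.g. in arch/arm/include/asm/kvm_host.h */
static inline bool kvm_dirty_logging_active(struct kvm_memory_slot *memslot)
{
#ifdef CONFIG_ARM
	/* dirty_bitmap is only allocated while dirty logging is enabled */
	return !!memslot->dirty_bitmap;
#else
	return false;	/* not wired up on arm64 yet */
#endif
}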

> 
>>  
>>  write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  if (fault_status == FSC_PERM && !write_fault) {
>> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  /* Let's check if we will get back a huge page backed by hugetlbfs */
>>  down_read(¤t->mm->mmap_sem);
>>  vma = find_vma_intersection(current->mm, hva, hva + 1);
>> -if (is_vm_hugetlb_page(vma)) {
>> +if (is_vm_hugetlb_page(vma) && !logging_active) {
>>  hugetlb = true;
>>  gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>>  } else {
>> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  spin_lock(&kvm->mmu_lock);
>>  if (mmu_notifier_retry(kvm, mmu_seq))
>>  goto out_unlock;
>> -if (!hugetlb && !force_pte)
>> +if (!hugetlb && !force_pte && !logging_active)
>>  hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>>  if (hugetlb) {
>> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  kvm_set_pfn_dirty(pfn);
>>  }
>>  coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
>> -ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>> +ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
>> +logging_active);
>>  }
>>  
>> +if (write_fault)
>> +mark_page_dirty(kvm, gfn);
>>  
>>  out_unlock:
>>  spin_unlock(&kvm->mmu_lock);
>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, 
>> gpa_t gpa, void *data)
>>  {
>>  pte_t *pte = (pte_t *)data;
>>  
>> -st

Re: [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific

2014-08-11 Thread Mario Smarduch
On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:07PM -0700, Mario Smarduch wrote:
>> This patch adds support for keeping track of VM dirty pages. As dirty page 
>> log
>> is retrieved, the pages that have been written are write protected again for
>> next write and log read.
>>
>> The dirty log read function is generic for armv7 and x86, and arch specific
>> for arm64, ia64, mips, powerpc, s390.
> 
> So I would also split up this patch.  One that only modifies the
> existing functionality, but does not introduce any new functionality for
> ARM.  Put this first patch in the beginning of the patch series with the
> other prepatory patch, so that you get something like this:
> 
> [PATCH 1/X] KVM: Add architecture-specific TLB flush implementations
> [PATCH 2/X] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log
> [PATCH 3/X] arm: KVM: Add ARMv7 API to flush TLBs
> [PATCH 4/X] arm: KVM: Add initial dirty page locking infrastructure
> ...

Yes definitely; thanks for the advice, it makes the patch series easier
to review.

> 
> That will make it easier to get the patches accepted and for us to
> review...
> 
> 
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/kvm/arm.c  |8 +++-
>>  arch/arm/kvm/mmu.c  |   22 +
>>  arch/arm64/include/asm/kvm_host.h   |2 +
>>  arch/arm64/kvm/Kconfig  |1 +
>>  arch/ia64/include/asm/kvm_host.h|1 +
>>  arch/ia64/kvm/Kconfig   |1 +
>>  arch/ia64/kvm/kvm-ia64.c|2 +-
>>  arch/mips/include/asm/kvm_host.h|2 +-
>>  arch/mips/kvm/Kconfig   |1 +
>>  arch/mips/kvm/kvm_mips.c|2 +-
>>  arch/powerpc/include/asm/kvm_host.h |2 +
>>  arch/powerpc/kvm/Kconfig|1 +
>>  arch/powerpc/kvm/book3s.c   |2 +-
>>  arch/powerpc/kvm/booke.c|2 +-
>>  arch/s390/include/asm/kvm_host.h|2 +
>>  arch/s390/kvm/Kconfig   |1 +
>>  arch/s390/kvm/kvm-s390.c|2 +-
>>  arch/x86/kvm/x86.c  |   86 -
>>  include/linux/kvm_host.h|3 ++
>>  virt/kvm/Kconfig|3 ++
>>  virt/kvm/kvm_main.c |   90 
>> +++
>>  21 files changed, 143 insertions(+), 93 deletions(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index e11c2dd..f7739a0 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>  }
>>  }
>>  
>> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>> +#ifdef CONFIG_ARM64
>> +/*
>> + * For now features not supported on ARM64, the #ifdef is added to make that
>> + * clear but not needed since ARM64 Kconfig selects function in generic 
>> code.
>> + */
> 
> I don't think this comment is needed, but if you really want it, it
> should be something like:
> 
> /*
>  * ARM64 does not support dirty logging and therefore selects
>  * CONFIG_HAVE_KVM_ARCH_DIRTY_LOG.  Provide a -EINVAL stub.
>  */

I think it could go since I'm doing arm64 now.

> 
>> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
>> *log)
>>  {
>>  return -EINVAL;
>>  }
>> +#endif
>>  
>>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>>  struct kvm_arm_device_addr *dev_addr)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7bfc792..ca84331 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>>  kvm_flush_remote_tlbs(kvm);
>>  spin_unlock(&kvm->mmu_lock);
>>  }
>> +
>> +/**
>> + * kvm_mmu_write_protect_pt_masked() - write protect dirty pages set in 
>> mask
>> + * @kvm:The KVM pointer
>> + * @slot:   The memory slot associated with mask
>> + * @gfn_offset: The gfn offset in memory slot
>> + * @mask:   The mask of dirty pages at offset 'gfn_offset' in this memory
>> + *  slot to be write protected
>> + *
>> + * Walks bits set in mask write protects the associated pte's. Caller must
>> + * acquire kvm_mmu_lock.
>> + */
>> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> +struct kvm_memory_slot *slot,
>> +gfn_t gfn_offset, unsigned long mask)
>> +{
>> +phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
>> +phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>> +phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
> 
> __fls(x) + 1 is the same as fls(x)

For me, __fls(x) + 1 makes it easier to see the covered range. Unless
it really breaks convention, I'd prefer to keep the '+1'. Either way,
no problem.
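
A concrete case, for reference:

unsigned long mask = 0x28;	/* bits 3 and 5 set */
__fls(mask);			/* 5: zero-based index of the highest set bit */
fls(mask);			/* 6: one-based position, i.e. __fls(mask) + 1 */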

>> +
>> +stage2_wp_range(kvm, start, end);
>> +}
>>  #endif
>>  
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> diff --git a/arch/arm64/include

Re: [PATCH v9 2/4] arm: ARMv7 dirty page logging inital mem region write protect (w/no huge PUD support)

2014-08-11 Thread Mario Smarduch
On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> Remove the parenthesis from the subject line.

Hmmm, have to check this; I don't see it in my patch file.
> 
> On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
>> Patch adds  support for initial write protection VM memlsot. This patch 
>> series
> ^^^
> stray whitespace
> 
Need to watch out for these; they add delays to the review cycle.
> 
>> assumes that huge PUDs will not be used in 2nd stage tables.
> 
> may be worth mentioning that this is always valid on ARMv7.
> 

Yep definitely.

>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h   |1 +
>>  arch/arm/include/asm/kvm_mmu.h|   20 ++
>>  arch/arm/include/asm/pgtable-3level.h |1 +
>>  arch/arm/kvm/arm.c|9 +++
>>  arch/arm/kvm/mmu.c|  128 
>> +
>>  5 files changed, 159 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 042206f..6521a2d 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>  void kvm_arch_flush_remote_tlbs(struct kvm *);
>> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f..08ab5e8 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>>  pmd_val(*pmd) |= L_PMD_S2_RDWR;
>>  }
>>  
>> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
>> +{
>> +pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pte_readonly(pte_t *pte)
>> +{
>> +return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>> +}
>> +
>> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> +}
>> +
>> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>> +{
>> +return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>> +}
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  #define kvm_pgd_addr_end(addr, end) \
>>  ({  u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;\
>> diff --git a/arch/arm/include/asm/pgtable-3level.h 
>> b/arch/arm/include/asm/pgtable-3level.h
>> index 85c60ad..d8bb40b 100644
>> --- a/arch/arm/include/asm/pgtable-3level.h
>> +++ b/arch/arm/include/asm/pgtable-3level.h
>> @@ -129,6 +129,7 @@
>>  #define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6)   /* 
>> HAP[1]   */
>>  #define L_PTE_S2_RDWR   (_AT(pteval_t, 3) << 6)   /* 
>> HAP[2:1] */
>>  
>> +#define L_PMD_S2_RDONLY (_AT(pteval_t, 1) << 6)   /* 
>> HAP[1]   */
>>  #define L_PMD_S2_RDWR   (_AT(pmdval_t, 3) << 6)   /* 
>> HAP[2:1] */
>>  
>>  /*
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 3c82b37..e11c2dd 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> const struct kvm_memory_slot *old,
>> enum kvm_mr_change change)
>>  {
>> +#ifdef CONFIG_ARM
>> +/*
>> + * At this point memslot has been committed and there is an
>> + * allocated dirty_bitmap[], dirty pages will be be tracked while the
>> + * memory slot is write protected.
>> + */
>> +if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +kvm_mmu_wp_memory_region(kvm, mem->slot);
>> +#endif
>>  }
>>  
>>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 35254c6..7bfc792 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
>> phys_addr_t *ipap)
>>  return false;
>>  }
>>  
>> +#ifdef CONFIG_ARM
>> +/**
>> + * stage2_wp_pte_range - write protect PTE range
>> + * @pmd:pointer to pmd entry
>> + * @addr:   range start address
>> + * @end:range end address
>> + */
>> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t 
>> end)
>> +{
>> +pte_t *pte;
>> +
>> +pte = pte_offset_kernel(pmd, addr);
>> +do {
>> +if (!pte_none(*pte)) {
>> +if (!kvm_s2pte_readonly(pte))
>> +kvm_set_s2pte_readonly(pte);
>> +}
>> +} while (pte++, addr += PAGE_SIZE, addr != end);
>> +}
>> 

Re: [PATCH 7/7 v3] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-08-11 Thread Scott Wood
On Wed, 2014-08-06 at 12:08 +0530, Bharat Bhushan wrote:
> @@ -1249,6 +1284,7 @@ int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
>   setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
>   (unsigned long)vcpu);
>  
> + kvmppc_clear_dbsr();
>   return 0;

This could use a comment for why we're doing this.  Also, I'm a bit
uneasy about clearing the whole DBSR here, where we haven't yet switched
the debug registers to guest context.  It shouldn't actually matter
except for deferred debug exceptions which are not actually useful (in
fact e6500 removed support for them), but still...
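
(For reference, the helper presumably amounts to clearing the whole
register, DBSR being write-one-to-clear:

static void kvmppc_clear_dbsr(void)
{
	mtspr(SPRN_DBSR, mfspr(SPRN_DBSR));	/* writing set bits back clears them */
}

which is why calling it here also wipes any status the host might still
care about.)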

-Scott




Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Scott Wood
On Tue, 2014-08-12 at 01:53 +0200, Alexander Graf wrote:
> 
> > On 12.08.2014 at 01:36, Scott Wood wrote:
> > 
> >> On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote:
> >> @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct 
> >> kvm_vcpu *vcpu)
> >> 
> >> static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)
> >> {
> >> -int lpid;
> >> +int i, lpid;
> >> 
> >> -lpid = kvmppc_alloc_lpid();
> >> -if (lpid < 0)
> >> -return lpid;
> >> +/* The lpid pool supports only 2 entries now */
> >> +if (threads_per_core > 2)
> >> +return -ENOMEM;
> >> +
> >> +/* Each VM allocates one LPID per HW thread index */
> >> +for (i = 0; i < threads_per_core; i++) {
> >> +lpid = kvmppc_alloc_lpid();
> >> +if (lpid < 0)
> >> +return lpid;
> >> +
> >> +kvm->arch.lpid_pool[i] = lpid;
> >> +}
> > 
> > Wouldn't it be simpler to halve the size of the lpid pool that the
> > allocator sees, and just OR in the high bit based on the low bit of the
> > cpu number?
> 
> Heh, I wrote the same and then removed the section from my reply again. It 
> wouldn't really make that much of a difference if you think it through 
> completely.
> 
> But yes, it certainly would be quite a bit more natural. I'm ok either way.

It's not a huge difference, but it would at least get rid of some of the
ifdeffing in the headers.  It'd also be nicer when debugging to have the
LPIDs correlated.
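
Something like this sketch is what I have in mind (field and macro names
assumed, not existing code):

/*
 * Pool halved: kvmppc_alloc_lpid() hands out one LPID per VM, and the
 * per-HW-thread value is derived on the fly.
 */
static inline int kvmppc_vm_thread_lpid(struct kvm *kvm, int cpu)
{
	/* the LPID's high bit selects the HW thread within the core */
	return kvm->arch.lpid | ((cpu & 1) << LPID_HIGH_BIT);
}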

-Scott




Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush

2014-08-11 Thread Mario Smarduch
On 08/11/2014 12:12 PM, Christoffer Dall wrote:
> On Thu, Jul 24, 2014 at 05:56:05PM -0700, Mario Smarduch wrote:
>> Patch adds HYP interface for global VM TLB invalidation without address
>> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_asm.h  |1 +
>>  arch/arm/include/asm/kvm_host.h |1 +
>>  arch/arm/kvm/Kconfig|1 +
>>  arch/arm/kvm/interrupts.S   |   12 
>>  arch/arm/kvm/mmu.c  |   17 +
>>  virt/kvm/Kconfig|3 +++
>>  virt/kvm/kvm_main.c |4 
>>  7 files changed, 39 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 53b3c4a..21bc519 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>>  
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  #endif
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index 193ceaf..042206f 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>>  
>>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>>  
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 466bd29..44d3b6f 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -22,6 +22,7 @@ config KVM
>>  select ANON_INODES
>>  select HAVE_KVM_CPU_RELAX_INTERCEPT
>>  select KVM_MMIO
>> +select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>>  select KVM_ARM_HOST
>>  depends on ARM_VIRT_EXT && ARM_LPAE
>>  ---help---
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 0d68d40..1258d46 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>>  bx  lr
>>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>>  
>> +/**
>> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
>> + *
>> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
>> + * parameter
>> + */
>> +
>> +ENTRY(__kvm_tlb_flush_vmid)
>> +b   __kvm_tlb_flush_vmid_ipa
>> +ENDPROC(__kvm_tlb_flush_vmid)
>> +
>> +
>>  /
>>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>>   * domain, for all VMIDs
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 2ac9588..35254c6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 
>> phys_addr_t ipa)
>>  kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>>  }
>>  
>> +#ifdef CONFIG_ARM
> 
> I assume this is here because of arm vs. arm64, use static inlines in
> the header files to differentiate instead.
Yes that's right, will move it.
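
Roughly like this, I assume (sketch):

/* arch/arm/include/asm/kvm_host.h */
static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
{
	if (kvm)
		kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
}

with arm64 supplying its own variant (or a stub) in its kvm_host.h.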
> 
>> +/**
>> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
>> + * @kvm:   pointer to kvm structure.
>> + *
>> + * Interface to HYP function to flush all VM TLB entries without address
>> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
>> + * kvm_tlb_flush_vmid_ipa().
> 
> remove the last sentence from here, it's repetitive.
Ok.
> 
>> + */
>> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
>> +{
>> +if (kvm)
>> +kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
>> +}
>> +
>> +#endif
>> +
>>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
>>int min, int max)
>>  {
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 13f2d19..f1efaa5 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>>  
>>  config KVM_VFIO
>> bool
>> +
>> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> +   bool
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index fa70c6e..258f3d9 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, 
>> unsigned int req)
>>  
>>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>>  {
>> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
>> +kvm_arch_flush_remote_tlbs(kvm);
>> +#else
>>  long dirty_count = kvm->tlbs_dirty;
>>  
>>  smp_mb();
>>  if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>>  ++kvm->stat.remote_tlb_flush;
>>  cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>> +#endif
> 
> I would s

Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Alexander Graf


> On 12.08.2014 at 01:36, Scott Wood wrote:
> 
>> On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote:
>> @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct 
>> kvm_vcpu *vcpu)
>> 
>> static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)
>> {
>> -int lpid;
>> +int i, lpid;
>> 
>> -lpid = kvmppc_alloc_lpid();
>> -if (lpid < 0)
>> -return lpid;
>> +/* The lpid pool supports only 2 entries now */
>> +if (threads_per_core > 2)
>> +return -ENOMEM;
>> +
>> +/* Each VM allocates one LPID per HW thread index */
>> +for (i = 0; i < threads_per_core; i++) {
>> +lpid = kvmppc_alloc_lpid();
>> +if (lpid < 0)
>> +return lpid;
>> +
>> +kvm->arch.lpid_pool[i] = lpid;
>> +}
> 
> Wouldn't it be simpler to halve the size of the lpid pool that the
> allocator sees, and just OR in the high bit based on the low bit of the
> cpu number?

Heh, I wrote the same and then removed the section from my reply again. It 
wouldn't really make that much of a difference if you think it through 
completely.

But yes, it certainly would be quite a bit more natural. I'm ok either way.


Alex



Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Scott Wood
On Wed, 2014-08-06 at 19:33 +0300, Mihai Caraman wrote:
> @@ -390,19 +400,30 @@ static void kvmppc_core_vcpu_free_e500mc(struct 
> kvm_vcpu *vcpu)
>  
>  static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)
>  {
> - int lpid;
> + int i, lpid;
>  
> - lpid = kvmppc_alloc_lpid();
> - if (lpid < 0)
> - return lpid;
> + /* The lpid pool supports only 2 entries now */
> + if (threads_per_core > 2)
> + return -ENOMEM;
> +
> + /* Each VM allocates one LPID per HW thread index */
> + for (i = 0; i < threads_per_core; i++) {
> + lpid = kvmppc_alloc_lpid();
> + if (lpid < 0)
> + return lpid;
> +
> + kvm->arch.lpid_pool[i] = lpid;
> + }

Wouldn't it be simpler to halve the size of the lpid pool that the
allocator sees, and just OR in the high bit based on the low bit of the
cpu number?

-Scott




[RFC PATCH] arm64: KVM: add irqfd support

2014-08-11 Thread Joel Schopp
Depends on Eric Auger's "ARM: KVM: add irqfd support" patch.

Enable VFIO for platform devices on ARM64.  This patch fixes the ARM64
compile; however, it has only been compile tested.  It seemed worth
sharing, as it will allow us to carry both the ARM and ARM64 patches
together as we do more testing.

Cc: Eric Auger 
Signed-off-by: Joel Schopp 
---
 Documentation/virtual/kvm/api.txt |2 +-
 arch/arm64/include/uapi/asm/kvm.h |4 ++++
 arch/arm64/kvm/Kconfig|4 +++-
 arch/arm64/kvm/Makefile   |2 +-
 drivers/vfio/platform/Kconfig |2 +-
 5 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 04310d9..bc64ce9 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2132,7 +2132,7 @@ into the hash PTE second double word).
 4.75 KVM_IRQFD
 
 Capability: KVM_CAP_IRQFD
-Architectures: x86 s390 arm
+Architectures: x86 s390 arm arm64
 Type: vm ioctl
 Parameters: struct kvm_irqfd (in)
 Returns: 0 on success, -1 on error
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index e633ff8..3df8baa 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -180,6 +180,10 @@ struct kvm_arch_memory_slot {
 /* Highest supported SPI, from VGIC_NR_IRQS */
 #define KVM_ARM_IRQ_GIC_MAX127
 
+/* One single KVM irqchip, ie. the VGIC */
+#define KVM_NR_IRQCHIPS  1
+
+
 /* PSCI interface */
 #define KVM_PSCI_FN_BASE   0x95c1ba5e
 #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ba85e9..cbd3525 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
select KVM_ARM_HOST
select KVM_ARM_VGIC
select KVM_ARM_TIMER
+   select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines.
 
@@ -50,13 +51,14 @@ config KVM_ARM_MAX_VCPUS
 config KVM_ARM_VGIC
bool
depends on KVM_ARM_HOST && OF
-   select HAVE_KVM_IRQCHIP
+   select HAVE_KVM_IRQFD
---help---
  Adds support for a hardware assisted, in-kernel GIC emulation.
 
 config KVM_ARM_TIMER
bool
depends on KVM_ARM_VGIC
+   select HAVE_KVM_IRQCHIP
---help---
  Adds support for the Architected Timers in virtual machines.
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 72a9fd5..40b9970 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
 
-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index c51af17..43ee890 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -1,6 +1,6 @@
 config VFIO_PLATFORM
tristate "VFIO support for platform devices"
-   depends on VFIO && EVENTFD && ARM
+   depends on VFIO && EVENTFD && (ARM || ARM64)
help
  Support for platform devices with VFIO. This is required to make
  use of platform devices present on the system using the VFIO



Re: [PATCH 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-11 Thread H. Peter Anvin
On 08/11/2014 11:49 AM, Amit Shah wrote:
> The khwrngd thread is started when a hwrng device of sufficient
> quality is registered.  The virtio-rng device is backed by the
> hypervisor, and we trust the hypervisor to provide real entropy.  A
> malicious hypervisor is a scenario that's ruled out, so we are certain
> the quality of randomness we receive is perfectly trustworthy.  Hence,
> we use 100% for the factor, indicating maximum confidence in the source.
> 
> Signed-off-by: Amit Shah 

It isn't "ruled out", it is just irrelevant: if the hypervisor is
malicious, the quality of your random number source is the least of your
problems.
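
For reference, the derating factor lands in the hwrng quality field; a
minimal sketch of what the patch amounts to, assuming the core's
per-mille scale (1000 == 100%):

static struct hwrng vi_hwrng = {
	.name    = "virtio",
	.read    = virtio_read,
	.cleanup = virtio_cleanup,
	.quality = 1000,	/* full entropy credit for host-provided data */
};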

-hpa




[PATCH v4] arm64: fix VTTBR_BADDR_MASK

2014-08-11 Thread Joel Schopp
The current VTTBR_BADDR_MASK only masks 39 bits, which is broken on current
systems.  Rather than just add a bit, it seems like a good time to also set
things at run time instead of compile time, to accommodate more hardware.

This patch sets TCR_EL2.PS, VTCR_EL2.T0SZ and vttbr_baddr_mask at run time,
not compile time.

In ARMv8, the EL2 physical address size (TCR_EL2.PS) and stage2 input address
size (VTCR_EL2.T0SZ) cannot be determined at compile time, since they depend
on hardware capability.

According to Table D4-23 and Table D4-25 in ARM DDI 0487A.b document,
vttbr_x is calculated using different fixed values with consideration
of T0SZ, granule size and the level of translation tables. Therefore,
vttbr_baddr_mask should be determined dynamically.

Changes since v3:
Another rebase
Addressed minor comments from v2

Changes since v2:
Rebased on https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
next branch

Changes since v1:
Rebased fix on Jungseok Lee's patch https://lkml.org/lkml/2014/5/12/189 to
provide better long term fix.  Updated that patch to log error instead of
silently fail on unaligned vttbr.

Cc: Christoffer Dall 
Cc: Sungjinn Chung 
Signed-off-by: Jungseok Lee 
Signed-off-by: Joel Schopp 
---
 arch/arm/kvm/arm.c   |  116 +-
 arch/arm64/include/asm/kvm_arm.h |   17 +-
 arch/arm64/kvm/hyp-init.S|   20 +--
 3 files changed, 131 insertions(+), 22 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..b4859fa 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,6 +62,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static u64 vttbr_baddr_mask;
+
 static bool vgic_present;
 
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
@@ -412,6 +415,103 @@ static bool need_new_vmid_gen(struct kvm *kvm)
return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
 }
 
+
+
+   /*
+* ARMv8 64K architecture limitations:
+* 16 <= T0SZ <= 21 is valid under 3 level of translation tables
+* 18 <= T0SZ <= 34 is valid under 2 level of translation tables
+* 31 <= T0SZ <= 39 is valid under 1 level of translation tables
+*
+* ARMv8 4K architecture limitations:
+* 16 <= T0SZ <= 24 is valid under 4 level of translation tables
+* 21 <= T0SZ <= 30 is valid under 3 level of translation tables
+* 30 <= T0SZ <= 39 is valid under 2 level of translation tables
+*
+* 
+* We further limit T0SZ in ARM64 Linux by not supporting 1 level 
+* translation tables at all, not supporting 2 level translation 
+* tables with 4k pages, not supporting different levels of translation
+* tables in stage 1 vs stage 2, not supporting different page sizes in
+* stage 1 vs stage 2, not supporting less than 40 bit address space 
+* with 64k pages, and not supporting less than 32 bit address space 
+* with 4K pages.
+*
+* See Table D4-23 and Table D4-25 in ARM DDI 0487A.b to figure out
+* the origin of the hardcoded values, 38 and 37.
+*/
+
+#ifdef CONFIG_ARM64_64K_PAGES
+static inline int t0sz_to_vttbr_x(int t0sz){
+   if (t0sz < 16 || t0sz > 24) {
+   kvm_err("Cannot support %d-bit address space\n", 64 - t0sz);
+   return -EINVAL;
+   }
+
+   return 38 - t0sz;
+}
+#elif CONFIG_ARM64 && !CONFIG_ARM64_64K_PAGES
+static inline int t0sz_to_vttbr_x(int t0sz){
+   if (t0sz < 16 || t0sz > 32) {
+   kvm_err("Cannot support %d-bit address space\n", 64 - t0sz);
+   return -EINVAL;
+   }
+   return 37 - t0sz;
+}
+#endif
+
+
+/**
+ * set_vttbr_baddr_mask - set mask value for vttbr base address
+ *
+ * In ARMv8, vttbr_baddr_mask cannot be determined in compile time since the
+ * stage2 input address size depends on hardware capability. Thus, we first
+ * need to read ID_AA64MMFR0_EL1.PARange first and then set vttbr_baddr_mask
+ * with consideration of both granule size and the level of translation tables.
+ */
+#ifndef CONFIG_ARM64
+static int set_vttbr_baddr_mask(void)
+{
+   vttbr_baddr_mask = VTTBR_BADDR_MASK;
+   return 0;
+}
+#else
+static int set_vttbr_baddr_mask(void)
+{
+  int pa_range, t0sz, vttbr_x;
+
+   pa_range = read_cpuid(ID_AA64MMFR0_EL1) & 0xf;
+
+   switch (pa_range) {
+   case 0:
+   t0sz = VTCR_EL2_T0SZ(32);
+   break;
+   case 1:
+   t0sz = VTCR_EL2_T0SZ(36);
+   break;
+   case 2:
+   t0sz = VTCR_EL2_T0SZ(40);
+   break;
+   case 3:
+   t0sz = VTCR_EL2_T0SZ(42);
+   break;
+   case 4:
+   t0sz = VTCR_EL2_T0SZ(44);
+   break;
+   cas

Re: [PATCH v3] arm64: fix VTTBR_BADDR_MASK

2014-08-11 Thread Joel Schopp

>>> That said, I don't think this is doing the right thing.  I think you
>>> want to refuse running the VM and avoid any stage-2 entried being
>>> created if this is not the case (actually, we may want to check this
>>> after set_vttbr_baddr_mask() or right aftert allocating the stage-2
>>> pgd), because otherwise I think we may be overwriting memory not
>>> belonging to us with concatenated page tables in a 42-bit 4KB system,
>>> for example.
>> My experience here was that the hardware actually catches the error on
>> the first instruction load of the guest kernel and does a stage 2
>> translation abort.  However, to be extra safe we could just log the
>> error with the address of the vttbr and then zero out the pgd_phys part
>> of vttbr altogether, leaving only the vmid.  The guest would then die of
>> natural causes and we wouldn't have to worry about the outside
>> possibility of memory getting overwritten.
> uh, putting zero in the pgd_phys part will just point to random memory
> if you happen to have memory based at address 0 though, right?
>
> I think we should check when we allocate the pgd that it is indeed of
> the right size and alignment, and if it isn't at this point, it truly is
> a BUG() and your kernel is terribly busted.
If I can't rely on 0 to be an invalid address I can't think of what I
could rely on to be invalid.  I'll just change this to BUG_ON(pgd_phys &
~vttbr_baddr_mask); and give up on my dream of the host kernel surviving
the bug.



Re: [PATCH] vhost: Add polling mode

2014-08-11 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Sun, 10 Aug 2014 21:45:59 +0200

> On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote:
 ...
> And, did your tests actually produce 100% load on both host CPUs?
 ...

Michael, please do not quote an entire patch just to ask a one line
question.

I truly, truly, wish it was simpler in modern email clients to delete
the unrelated quoted material because I bet when people do this they
are simply being lazy.

Thank you.


Re: [PATCH v3] arm64: fix VTTBR_BADDR_MASK

2014-08-11 Thread Christoffer Dall
On Mon, Aug 11, 2014 at 10:20:41AM -0500, Joel Schopp wrote:
> Thanks for the detailed review.
> > the last case would be case 5 and the default case would be a BUG().
> I agree with the case, but rather than do a BUG() I'm going to print an
> error and return -EINVAL.  Not worth stopping the host kernel just
> because kvm is messed up when we can gracefully exit from it.

agreed

> >
> >> +
> >> +  /*
> >> +   * See Table D4-23 and Table D4-25 in ARM DDI 0487A.b to figure out
> >> +   * the origin of the hardcoded values, 38 and 37.
> >> +   */
> >> +#ifdef CONFIG_ARM64_64K_PAGES
> >> +  /*
> >> +   * 16 <= T0SZ <= 21 is valid under 3 level of translation tables
> >> +   * 18 <= T0SZ <= 34 is valid under 2 level of translation tables
> >> +   * 31 <= T0SZ <= 39 is valid under 1 level of translation tables
> >> +   */
> > so this scheme is with concatenated initial level stage-2 page tables.
> >
> > But we only ever allocate the amount of pages for our pgd according to
> > what the host has, so I think this allocation needs to be locked down
> > more tight, because the host is always using the appropriate amount for
> > 39 bits virtual addresses.
> I'll narrow the sanity check of the range.  I'll narrow it based on a 39
> - 48 bit VA host range in anticipation of the 4 level 4k and 3 level 64k
> host patches going in.
> 
> >> +  kvm_err("Cannot support %d-bit address space\n", 64 - t0sz);
> >> +  return -EINVAL;
> >> +  }
> >> +  vttbr_x = 37 - t0sz;
> >> +#endif
> >> +  vttbr_baddr_mask = (((1LLU << (48 - vttbr_x)) - 1) << (vttbr_x - 1));
> >> +#endif
> > This nested ifdef is really quite horrible.  Can you either factor these
> > out into some static inlines in arch/arm[64]/include/asm/kvm_mmu.h or
> > provide a per-architecture implementation in a .c file?
> I'll factor it out in the file to make it more readable and do away with
> the nested ifdef.  My theory on putting things into .h files is to only
> do it if there is actually another file that uses it.

that's not really the design principle behind the split of kvm/arm and
kvm/arm64, but I'll let you choose your preferred method when writing up
the patches.

> >
> >> +  return 0;
> >> +}
> >> +
> >> +/**
> >>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >>   * @kvm   The guest that we are about to run
> >>   *
> >> @@ -429,8 +502,16 @@ static void update_vttbr(struct kvm *kvm)
> >>/* update vttbr to be used with the new vmid */
> >>pgd_phys = virt_to_phys(kvm->arch.pgd);
> >>vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >> -  kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >> -  kvm->arch.vttbr |= vmid;
> >> +
> >> +  /*
> >> +   * If the VTTBR isn't aligned there is something wrong with the system
> >> +   * or kernel.  It is better to just fail and not mask it. But no need
> >> +   * to panic the host kernel with a BUG_ON(), instead just log the error.
> >> +   */
> > These last two sentences are not very helpful, because they don't
> > describe the rationale for what you're doing, only what you are (and are
> > not) doing.
> I'll reword the comment.
> >
> > That said, I don't think this is doing the right thing.  I think you
> > want to refuse running the VM and avoid any stage-2 entried being
> > created if this is not the case (actually, we may want to check this
> > after set_vttbr_baddr_mask() or right aftert allocating the stage-2
> > pgd), because otherwise I think we may be overwriting memory not
> > belonging to us with concatenated page tables in a 42-bit 4KB system,
> > for example.
> My experience here was that the hardware actually catches the error on
> the first instruction load of the guest kernel and does a stage 2
> translation abort.  However, to be extra safe we could just log the
> error with the address of the vttbr and then zero out the pgd_phys part
> of vttbr altogether, leaving only the vmid.  The guest would then die of
> natural causes and we wouldn't have to worry about the outside
> possibility of memory getting overwritten.

uh, putting zero in the pgd_phys part will just point to random memory
if you happen to have memory based at address 0 though, right?

I think we should check when we allocate the pgd that it is indeed of
the right size and alignment, and if it isn't at this point, it truly is
a BUG() and your kernel is terribly busted.
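
That is, something along these lines (a sketch, using the names from
this thread):

/* right after allocating the stage-2 pgd, e.g. in kvm_alloc_stage2_pgd() */
pgd_phys = virt_to_phys(kvm->arch.pgd);
BUG_ON(pgd_phys & ~vttbr_baddr_mask);	/* a misallocated pgd is a real bug */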

> 
> I don't like the option of just calling BUG() because stopping the host
> kernel from running just because we can't run a guest seems a bit
> extreme.  On the other hand adding a return code to update_vttbr and
> checking it (even with unlikely) in the kvm_arch_vcpu_ioctl_run() loop
> doesn't thrill me either just from a wasted cpu cycles point of view. 

Agreed that we shouldn't call BUG on something that could happen.  We
only should if we really do have a BUG.  If we have a programming error
in setting our stage-2 page tables, then we're hosed, and I think
crashing the kernel 

Re: [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific

2014-08-11 Thread Christoffer Dall
On Thu, Jul 24, 2014 at 05:56:07PM -0700, Mario Smarduch wrote:
> This patch adds support for keeping track of VM dirty pages. As dirty page log
> is retrieved, the pages that have been written are write protected again for
> next write and log read.
> 
> The dirty log read function is generic for armv7 and x86, and arch specific
> for arm64, ia64, mips, powerpc, s390.

So I would also split up this patch.  One that only modifies the
existing functionality, but does not introduce any new functionality for
ARM.  Put this first patch in the beginning of the patch series with the
other prepatory patch, so that you get something like this:

[PATCH 1/X] KVM: Add architecture-specific TLB flush implementations
[PATCH 2/X] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log
[PATCH 3/X] arm: KVM: Add ARMv7 API to flush TLBs
[PATCH 4/X] arm: KVM: Add initial dirty page locking infrastructure
...

That will make it easier to get the patches accepted and for us to
review...


> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/kvm/arm.c  |8 +++-
>  arch/arm/kvm/mmu.c  |   22 +
>  arch/arm64/include/asm/kvm_host.h   |2 +
>  arch/arm64/kvm/Kconfig  |1 +
>  arch/ia64/include/asm/kvm_host.h|1 +
>  arch/ia64/kvm/Kconfig   |1 +
>  arch/ia64/kvm/kvm-ia64.c|2 +-
>  arch/mips/include/asm/kvm_host.h|2 +-
>  arch/mips/kvm/Kconfig   |1 +
>  arch/mips/kvm/kvm_mips.c|2 +-
>  arch/powerpc/include/asm/kvm_host.h |2 +
>  arch/powerpc/kvm/Kconfig|1 +
>  arch/powerpc/kvm/book3s.c   |2 +-
>  arch/powerpc/kvm/booke.c|2 +-
>  arch/s390/include/asm/kvm_host.h|2 +
>  arch/s390/kvm/Kconfig   |1 +
>  arch/s390/kvm/kvm-s390.c|2 +-
>  arch/x86/kvm/x86.c  |   86 -
>  include/linux/kvm_host.h|3 ++
>  virt/kvm/Kconfig|3 ++
>  virt/kvm/kvm_main.c |   90 
> +++
>  21 files changed, 143 insertions(+), 93 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e11c2dd..f7739a0 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -783,10 +783,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   }
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +#ifdef CONFIG_ARM64
> +/*
> + * For now features not supported on ARM64, the #ifdef is added to make that
> + * clear but not needed since ARM64 Kconfig selects function in generic code.
> + */

I don't think this comment is needed, but if you really want it, it
should be something like:

/*
 * ARM64 does not support dirty logging and therefore selects
 * CONFIG_HAVE_KVM_ARCH_DIRTY_LOG.  Provide a -EINVAL stub.
 */

> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
> *log)
>  {
>   return -EINVAL;
>  }
> +#endif
>  
>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>   struct kvm_arm_device_addr *dev_addr)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7bfc792..ca84331 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -889,6 +889,28 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>   kvm_flush_remote_tlbs(kvm);
>   spin_unlock(&kvm->mmu_lock);
>  }
> +
> +/**
> + * kvm_mmu_write_protected_pt_masked() - write protect dirty pages set in 
> mask
> + * @kvm: The KVM pointer
> + * @slot:The memory slot associated with mask
> + * @gfn_offset:  The gfn offset in memory slot
> + * @mask:The mask of dirty pages at offset 'gfn_offset' in this memory
> + *   slot to be write protected
> + *
> + * Walks bits set in mask write protects the associated pte's. Caller must
> + * acquire kvm_mmu_lock.
> + */
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> + gfn_t gfn_offset, unsigned long mask)
> +{
> + phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
> + phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
> + phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

__fls(x) + 1 is the same as fls(x)
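For example, with mask = 0b0110, __fls(mask) + 1 = 2 + 1 = 3, which is
exactly fls(mask).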
> +
> + stage2_wp_range(kvm, start, end);
> +}
>  #endif
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 92242ce..b4a280b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -200,4 +200,6 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
> boot_pgd_ptr,
>hyp_stack_ptr, vector_ptr);
>  }
>  
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
> *log);
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/

Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support

2014-08-11 Thread Christoffer Dall
On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> This patch adds support for handling 2nd stage page faults during migration,
> it disables faulting in huge pages, and dissolves huge pages to page tables.
> In case migration is canceled huge pages will be used again.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/kvm/mmu.c |   31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index ca84331..a17812a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -642,7 +642,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct 
> kvm_mmu_memory_cache
>  }
>  
>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache 
> *cache,
> -   phys_addr_t addr, const pte_t *new_pte, bool iomap)
> +   phys_addr_t addr, const pte_t *new_pte, bool iomap,
> +   bool logging_active)
>  {
>   pmd_t *pmd;
>   pte_t *pte, old_pte;
> @@ -657,6 +658,15 @@ static int stage2_set_pte(struct kvm *kvm, struct 
> kvm_mmu_memory_cache *cache,
>   return 0;
>   }
>  
> + /*
> +  * While dirty memory logging, clear PMD entry for huge page and split
> +  * into smaller pages, to track dirty memory at page granularity.
> +  */
> + if (logging_active && kvm_pmd_huge(*pmd)) {
> + phys_addr_t ipa = pmd_pfn(*pmd) << PAGE_SHIFT;
> + clear_pmd_entry(kvm, pmd, ipa);

clear_pmd_entry has a VM_BUG_ON(kvm_pmd_huge(*pmd)) so that is
definitely not the right thing to call.
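Something like the below is probably what you want instead (rough sketch,
untested):

	/* dissolve the huge pmd so a pte table gets installed below */
	if (logging_active && kvm_pmd_huge(*pmd)) {
		pmd_clear(pmd);
		kvm_tlb_flush_vmid_ipa(kvm, addr);
		put_page(virt_to_page(pmd));
	}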

> + }
> +
>   /* Create stage-2 page mappings - Level 2 */
>   if (pmd_none(*pmd)) {
>   if (!cache)
> @@ -709,7 +719,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
> guest_ipa,
>   if (ret)
>   goto out;
>   spin_lock(&kvm->mmu_lock);
> - ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
> + ret = stage2_set_pte(kvm, &cache, addr, &pte, true, false);
>   spin_unlock(&kvm->mmu_lock);
>   if (ret)
>   goto out;
> @@ -926,6 +936,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>   struct vm_area_struct *vma;
>   pfn_t pfn;
> + /* Get logging status, if dirty_bitmap is not NULL then logging is on */
> + #ifdef CONFIG_ARM
> + bool logging_active = !!memslot->dirty_bitmap;
> + #else
> + bool logging_active = false;
> + #endif

can you make this an inline in the header files for now please?
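That is, something along these lines (sketch; pick whatever name fits):

	/* arch/arm/include/asm/kvm_host.h */
	static inline bool kvm_dirty_log_enabled(struct kvm_memory_slot *slot)
	{
		return !!slot->dirty_bitmap;
	}

	/* arch/arm64/include/asm/kvm_host.h */
	static inline bool kvm_dirty_log_enabled(struct kvm_memory_slot *slot)
	{
		return false;	/* dirty logging not supported on arm64 yet */
	}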

>  
>   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>   if (fault_status == FSC_PERM && !write_fault) {
> @@ -936,7 +952,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   /* Let's check if we will get back a huge page backed by hugetlbfs */
>   down_read(¤t->mm->mmap_sem);
>   vma = find_vma_intersection(current->mm, hva, hva + 1);
> - if (is_vm_hugetlb_page(vma)) {
> + if (is_vm_hugetlb_page(vma) && !logging_active) {
>   hugetlb = true;
>   gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
>   } else {
> @@ -979,7 +995,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   spin_lock(&kvm->mmu_lock);
>   if (mmu_notifier_retry(kvm, mmu_seq))
>   goto out_unlock;
> - if (!hugetlb && !force_pte)
> + if (!hugetlb && !force_pte && !logging_active)
>   hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>  
>   if (hugetlb) {
> @@ -998,9 +1014,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   kvm_set_pfn_dirty(pfn);
>   }
>   coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
> - ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
> + ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false,
> + logging_active);
>   }
>  
> + if (write_fault)
> + mark_page_dirty(kvm, gfn);
>  
>  out_unlock:
>   spin_unlock(&kvm->mmu_lock);
> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t 
> gpa, void *data)
>  {
>   pte_t *pte = (pte_t *)data;
>  
> - stage2_set_pte(kvm, NULL, gpa, pte, false);
> + stage2_set_pte(kvm, NULL, gpa, pte, false, false);

why is logging never active if we are called from MMU notifiers?

>  }
>  
>  
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush

2014-08-11 Thread Christoffer Dall
On Thu, Jul 24, 2014 at 05:56:05PM -0700, Mario Smarduch wrote:
> Patch adds HYP interface for global VM TLB invalidation without address
> parameter. Generic VM TLB flush calls ARMv7 arch defined TLB flush function.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_asm.h  |1 +
>  arch/arm/include/asm/kvm_host.h |1 +
>  arch/arm/kvm/Kconfig|1 +
>  arch/arm/kvm/interrupts.S   |   12 
>  arch/arm/kvm/mmu.c  |   17 +
>  virt/kvm/Kconfig|3 +++
>  virt/kvm/kvm_main.c |4 
>  7 files changed, 39 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 53b3c4a..21bc519 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -78,6 +78,7 @@ extern char __kvm_hyp_code_end[];
>  
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  #endif
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 193ceaf..042206f 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -230,5 +230,6 @@ int kvm_perf_teardown(void);
>  
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> +void kvm_arch_flush_remote_tlbs(struct kvm *);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 466bd29..44d3b6f 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>   select ANON_INODES
>   select HAVE_KVM_CPU_RELAX_INTERCEPT
>   select KVM_MMIO
> + select HAVE_KVM_ARCH_TLB_FLUSH_ALL
>   select KVM_ARM_HOST
>   depends on ARM_VIRT_EXT && ARM_LPAE
>   ---help---
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 0d68d40..1258d46 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -66,6 +66,18 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
>   bx  lr
>  ENDPROC(__kvm_tlb_flush_vmid_ipa)
>  
> +/**
> + * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
> + *
> + * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
> + * parameter
> + */
> +
> +ENTRY(__kvm_tlb_flush_vmid)
> + b   __kvm_tlb_flush_vmid_ipa
> +ENDPROC(__kvm_tlb_flush_vmid)
> +
> +
>  /********************************************************************
>   * Flush TLBs and instruction caches of all CPUs inside the inner-shareable
>   * domain, for all VMIDs
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 2ac9588..35254c6 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -56,6 +56,23 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 
> phys_addr_t ipa)
>   kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>  }
>  
> +#ifdef CONFIG_ARM

I assume this is here because of arm vs. arm64, use static inlines in
the header files to differentiate instead.
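That is, something like (sketch):

	/* arch/arm/include/asm/kvm_host.h */
	static inline void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
	{
		if (kvm)
			kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
	}

and the #ifdef disappears from mmu.c.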

> +/**
> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries
> + * @kvm:   pointer to kvm structure.
> + *
> + * Interface to HYP function to flush all VM TLB entries without address
> + * parameter. In HYP mode reuses __kvm_tlb_flush_vmid_ipa() function used by
> + * kvm_tlb_flush_vmid_ipa().

remove the last sentence from here, it's repetitive.

> + */
> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> +{
> + if (kvm)
> + kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +}
> +
> +#endif
> +
>  static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
> int min, int max)
>  {
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 13f2d19..f1efaa5 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  config KVM_VFIO
> bool
> +
> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +   bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fa70c6e..258f3d9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, 
> unsigned int req)
>  
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> + kvm_arch_flush_remote_tlbs(kvm);
> +#else
>   long dirty_count = kvm->tlbs_dirty;
>  
>   smp_mb();
>   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>   ++kvm->stat.remote_tlb_flush;
>   cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> +#endif

I would split this into two patches, one trivial one for the KVM generic
solution, and one to add the arm-specific part.

That will make your commit text and title much nicer to read too.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 2/4] arm: ARMv7 dirty page logging inital mem region write protect (w/no huge PUD support)

2014-08-11 Thread Christoffer Dall
Remove the parenthesis from the subject line.

On Thu, Jul 24, 2014 at 05:56:06PM -0700, Mario Smarduch wrote:
> Patch adds  support for initial write protection VM memlsot. This patch series
            ^^^
     stray whitespace


> assumes that huge PUDs will not be used in 2nd stage tables.

may be worth mentioning that this is always valid on ARMv7.

> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h   |1 +
>  arch/arm/include/asm/kvm_mmu.h|   20 ++
>  arch/arm/include/asm/pgtable-3level.h |1 +
>  arch/arm/kvm/arm.c|9 +++
>  arch/arm/kvm/mmu.c|  128 
> +
>  5 files changed, 159 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 042206f..6521a2d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -231,5 +231,6 @@ int kvm_perf_teardown(void);
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  void kvm_arch_flush_remote_tlbs(struct kvm *);
> +void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5cc0b0f..08ab5e8 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
>   pmd_val(*pmd) |= L_PMD_S2_RDWR;
>  }
>  
> +static inline void kvm_set_s2pte_readonly(pte_t *pte)
> +{
> + pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pte_readonly(pte_t *pte)
> +{
> + return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> +}
> +
> +static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> +{
> + pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> +}
> +
> +static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> +{
> + return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> +}
> +
>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  #define kvm_pgd_addr_end(addr, end)  \
>  ({   u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;\
> diff --git a/arch/arm/include/asm/pgtable-3level.h 
> b/arch/arm/include/asm/pgtable-3level.h
> index 85c60ad..d8bb40b 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -129,6 +129,7 @@
>  #define L_PTE_S2_RDONLY  (_AT(pteval_t, 1) << 6)   /* 
> HAP[1]   */
>  #define L_PTE_S2_RDWR(_AT(pteval_t, 3) << 6)   /* 
> HAP[2:1] */
>  
> +#define L_PMD_S2_RDONLY  (_AT(pteval_t, 1) << 6)   /* 
> HAP[1]   */
>  #define L_PMD_S2_RDWR(_AT(pmdval_t, 3) << 6)   /* 
> HAP[2:1] */
>  
>  /*
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..e11c2dd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -242,6 +242,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  const struct kvm_memory_slot *old,
>  enum kvm_mr_change change)
>  {
> +#ifdef CONFIG_ARM
> + /*
> +  * At this point memslot has been committed and there is an
> +  * allocated dirty_bitmap[], dirty pages will be tracked while the
> +  * memory slot is write protected.
> +  */
> + if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
> + kvm_mmu_wp_memory_region(kvm, mem->slot);
> +#endif
>  }
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 35254c6..7bfc792 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -763,6 +763,134 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
> phys_addr_t *ipap)
>   return false;
>  }
>  
> +#ifdef CONFIG_ARM
> +/**
> + * stage2_wp_pte_range - write protect PTE range
> + * @pmd: pointer to pmd entry
> + * @addr:range start address
> + * @end: range end address
> + */
> +static void stage2_wp_pte_range(pmd_t *pmd, phys_addr_t addr, phys_addr_t 
> end)
> +{
> + pte_t *pte;
> +
> + pte = pte_offset_kernel(pmd, addr);
> + do {
> + if (!pte_none(*pte)) {
> + if (!kvm_s2pte_readonly(pte))
> + kvm_set_s2pte_readonly(pte);
> + }
> + } while (pte++, addr += PAGE_SIZE, addr != end);
> +}
> +
> +/**
> + * stage2_wp_pmd_range - write protect PMD range
> + * @pud: pointer to pud entry
> + * @addr:range start address
> + * @end: range end address
> + */
> +static void stage2_wp_pmd_range(pud_t *pud, phys_addr_t addr, phys_addr_t 
> end)
> +{
> + pmd_t *pmd;
> +

[PATCH 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-11 Thread Amit Shah
The khwrngd thread is started when a hwrng device of sufficient
quality is registered.  The virtio-rng device is backed by the
hypervisor, and we trust the hypervisor to provide real entropy.  A
malicious hypervisor is a scenario that's ruled out, so we are certain
the quality of randomness we receive is perfectly trustworthy.  Hence,
we use 100% for the factor, indicating maximum confidence in the source.
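The hwrng core uses the quality value as a derating factor when crediting
entropy for data read from the device; the scale is per mille, so 1000
means the data we feed in is credited essentially at face value.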

Signed-off-by: Amit Shah 

---
Pretty small and contained patch; would be great if it is picked up for
3.17.
---
 drivers/char/hw_random/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 0027137..2e3139e 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -116,6 +116,7 @@ static int probe_common(struct virtio_device *vdev)
.cleanup = virtio_cleanup,
.priv = (unsigned long)vi,
.name = vi->name,
+   .quality = 1000,
};
vdev->priv = vi;
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call for agenda for 2014-08-19

2014-08-11 Thread Joel Schopp

On 08/11/2014 08:09 AM, Juan Quintela wrote:
> Hi
>
> Please, send any topic that you are interested in covering.
>
> People have complained on the past that I don't cancel the call until
> the very last minute.  So, what do you think that deadline for
> submitting topics is 23:00UTC on Monday?
I like the deadline. 
>
> Call details:
>
>  15:00 CEST
>  13:00 UTC
>  09:00 EDT
>
> Every two weeks
>
> By popular demand, a google calendar public entry with it
>
>  
> https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
>
>   (Let me know if you have any problems with the calendar entry)
>
> If you need phone number details,  contact me privately
>
> Thanks, Juan.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] arm64: fix VTTBR_BADDR_MASK

2014-08-11 Thread Joel Schopp
Thanks for the detailed review.
> the last case would be case 5 and the default case would be a BUG().
I agree with the case, but rather than do a BUG() I'm going to print an
error and return -EINVAL.  Not worth stopping the host kernel just
because kvm is messed up when we can gracefully exit from it.
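So the default case in set_vttbr_baddr_mask() will end up as something
like (sketch):

	default:
		kvm_err("Invalid T0SZ, cannot compute VTTBR mask\n");
		return -EINVAL;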
>
>> +
>> +/*
>> + * See Table D4-23 and Table D4-25 in ARM DDI 0487A.b to figure out
>> + * the origin of the hardcoded values, 38 and 37.
>> + */
>> +#ifdef CONFIG_ARM64_64K_PAGES
>> +/*
>> + * 16 <= T0SZ <= 21 is valid under 3 level of translation tables
>> + * 18 <= T0SZ <= 34 is valid under 2 level of translation tables
>> + * 31 <= T0SZ <= 39 is valid under 1 level of translation tables
>> + */
> so this scheme is with concatenated initial level stage-2 page tables.
>
> But we only ever allocate the amount of pages for our pgd according to
> what the host has, so I think this allocation needs to be locked down
> more tight, because the host is always using the appropriate amount for
> 39 bits virtual addresses.
I'll narrow the sanity check of the range.  I'll narrow it based on a 39
- 48 bit VA host range in anticipation of the 4 level 4k and 3 level 64k
host patches going in.
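(With t0sz = 64 - VA bits, that 39- to 48-bit range corresponds to
16 <= t0sz <= 25.)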

>> +kvm_err("Cannot support %d-bit address space\n", 64 - t0sz);
>> +return -EINVAL;
>> +}
>> +vttbr_x = 37 - t0sz;
>> +#endif
>> +vttbr_baddr_mask = (((1LLU << (48 - vttbr_x)) - 1) << (vttbr_x - 1));
>> +#endif
> This nested ifdef is really quite horrible.  Can you either factor these
> out into some static inlines in arch/arm[64]/include/asm/kvm_mmu.h or
> provide a per-architecture implementation in a .c file?
I'll factor it out in the file to make it more readable and do away with
the nested ifdef.  My theory on putting things into .h files is to only
do it if there is actually another file that uses it.
>
>> +return 0;
>> +}
>> +
>> +/**
>>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>>   * @kvm The guest that we are about to run
>>   *
>> @@ -429,8 +502,16 @@ static void update_vttbr(struct kvm *kvm)
>>  /* update vttbr to be used with the new vmid */
>>  pgd_phys = virt_to_phys(kvm->arch.pgd);
>>  vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>> -kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>> -kvm->arch.vttbr |= vmid;
>> +
>> +/*
>> + * If the VTTBR isn't aligned there is something wrong with the system
>> + * or kernel.  It is better to just fail and not mask it. But no need
>> + * to panic the host kernel with a BUG_ON(), instead just log the error.
>> + */
> These last two sentences are not very helpful, because they don't
> describe the rationale for what you're doing, only what you are (and are
> not) doing.
I'll reword the comment.
>
> That said, I don't think this is doing the right thing.  I think you
> want to refuse running the VM and avoid any stage-2 entries being
> created if this is not the case (actually, we may want to check this
> after set_vttbr_baddr_mask() or right aftert allocating the stage-2
> pgd), because otherwise I think we may be overwriting memory not
> belonging to us with concatenated page tables in a 42-bit 4KB system,
> for example.
My experience here was that the hardware actually catches the error on
the first instruction load of the guest kernel and does a stage 2
translation abort.  However, to be extra safe we could just log the
error with the address of the vttbr and then zero out the pgd_phys part
of vttbr altogether, leaving only the vmid.  The guest would then die of
natural causes and we wouldn't have to worry about the outside
possibility of memory getting overwritten.
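Roughly (untested):

	if (pgd_phys & ~vttbr_baddr_mask) {
		kvm_err("VTTBR not aligned, expect guest to fail\n");
		pgd_phys = 0;
	}
	kvm->arch.vttbr = pgd_phys | vmid;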

I don't like the option of just calling BUG() because stopping the host
kernel from running just because we can't run a guest seems a bit
extreme.  On the other hand adding a return code to update_vttbr and
checking it (even with unlikely) in the kvm_arch_vcpu_ioctl_run() loop
doesn't thrill me either just from a wasted cpu cycles point of view. 
>
>> +if (pgd_phys & ~vttbr_baddr_mask)
>> +kvm_err("VTTBR not aligned, expect guest to fail");
>> +
>> +kvm->arch.vttbr = pgd_phys | vmid;
>>  
>>  spin_unlock(&kvm_vmid_lock);
>>  }
>> @@ -1015,6 +1096,12 @@ int kvm_arch_init(void *opaque)
>>  }
>>  }
>>  
>> +err = set_vttbr_baddr_mask();
>> +if (err) {
>> +kvm_err("Cannot set vttbr_baddr_mask\n");
>> +return -EINVAL;
>> +}
>> +
>>  cpu_notifier_register_begin();
>>  
>>  err = init_hyp_mode();
>> diff --git a/arch/arm64/include/asm/kvm_arm.h 
>> b/arch/arm64/include/asm/kvm_arm.h
>> index cc83520..ff4a4fa 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -95,7 +95,6 @@
>>  /* TCR_EL2 Registers bits */
>>  #define TCR_EL2_TBI (1 << 20)
>>  #define TCR_EL2_PS  (7 << 16)
>> -#define TCR_EL2_PS_40B  (2 << 16)
>> 

[PATCH 0/5] watchdog: various fixes

2014-08-11 Thread Don Zickus
Just respinning these patches with my sign-off.  I keep forgetting which is
easier for Andrew to digest (this way or just me replying with an ack).

Ulrich Obergfell (3):
  watchdog: fix print-once on enable
  watchdog: control hard lockup detection default
  kvm: ensure hard lockup detection is disabled by default

chai wen (2):
  watchdog: remove unnecessary head files
  softlockup: make detector be aware of task switch of processes hogging cpu

 arch/x86/kernel/kvm.c |8 +
 include/linux/nmi.h   |9 +
 kernel/watchdog.c |   78 +++-
 3 files changed, 86 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] watchdog: control hard lockup detection default

2014-08-11 Thread Don Zickus
From: Ulrich Obergfell 

In some cases we don't want hard lockup detection enabled by default.
An example is when running as a guest. Introduce

  watchdog_enable_hardlockup_detector(bool)

allowing those cases to disable hard lockup detection. This must be
executed early by the boot processor from e.g. smp_prepare_boot_cpu,
in order to allow kernel command line arguments to override it, as
well as to avoid hard lockup detection being enabled before we've
had a chance to indicate that it's unwanted. In summary,

  initial boot: default=enabled
  smp_prepare_boot_cpu
watchdog_enable_hardlockup_detector(false): default=disabled
  cmdline has 'nmi_watchdog=1': default=enabled

The running kernel still has the ability to enable/disable at any
time with /proc/sys/kernel/nmi_watchdog as usual. However even
when the default has been overridden /proc/sys/kernel/nmi_watchdog
will initially show '1'. To truly turn it on one must disable/enable
it, i.e.
  echo 0 > /proc/sys/kernel/nmi_watchdog
  echo 1 > /proc/sys/kernel/nmi_watchdog

This patch will be immediately useful for KVM with the next patch
of this series. Other hypervisor guest types may find it useful as
well.

Signed-off-by: Ulrich Obergfell 
Signed-off-by: Andrew Jones 
Signed-off-by: Don Zickus 
---
 include/linux/nmi.h |9 +
 kernel/watchdog.c   |   50 --
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 447775e..72aacf4 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -17,11 +17,20 @@
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 #include 
 extern void touch_nmi_watchdog(void);
+extern void watchdog_enable_hardlockup_detector(bool val);
+extern bool watchdog_hardlockup_detector_is_enabled(void);
 #else
 static inline void touch_nmi_watchdog(void)
 {
touch_softlockup_watchdog();
 }
+static inline void watchdog_enable_hardlockup_detector(bool val)
+{
+}
+static inline bool watchdog_hardlockup_detector_is_enabled(void)
+{
+   return true;
+}
 #endif
 
 /*
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0838685..8cb24dc 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -59,6 +59,25 @@ static unsigned long soft_lockup_nmi_warn;
 static int hardlockup_panic =
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
 
+static bool hardlockup_detector_enabled = true;
+/*
+ * We may not want to enable hard lockup detection by default in all cases,
+ * for example when running the kernel as a guest on a hypervisor. In these
+ * cases this function can be called to disable hard lockup detection. This
+ * function should only be executed once by the boot processor before the
+ * kernel command line parameters are parsed, because otherwise it is not
+ * possible to override this in hardlockup_panic_setup().
+ */
+void watchdog_enable_hardlockup_detector(bool val)
+{
+   hardlockup_detector_enabled = val;
+}
+
+bool watchdog_hardlockup_detector_is_enabled(void)
+{
+   return hardlockup_detector_enabled;
+}
+
 static int __init hardlockup_panic_setup(char *str)
 {
if (!strncmp(str, "panic", 5))
@@ -67,6 +86,14 @@ static int __init hardlockup_panic_setup(char *str)
hardlockup_panic = 0;
else if (!strncmp(str, "0", 1))
watchdog_user_enabled = 0;
+   else if (!strncmp(str, "1", 1) || !strncmp(str, "2", 1)) {
+   /*
+* Setting 'nmi_watchdog=1' or 'nmi_watchdog=2' (legacy option)
+* has the same effect.
+*/
+   watchdog_user_enabled = 1;
+   watchdog_enable_hardlockup_detector(true);
+   }
return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -462,6 +489,15 @@ static int watchdog_nmi_enable(unsigned int cpu)
struct perf_event_attr *wd_attr;
struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
+   /*
+* Some kernels need to default hard lockup detection to
+* 'disabled', for example a guest on a hypervisor.
+*/
+   if (!watchdog_hardlockup_detector_is_enabled()) {
+   event = ERR_PTR(-ENOENT);
+   goto handle_err;
+   }
+
/* is it already setup and enabled? */
if (event && event->state > PERF_EVENT_STATE_OFF)
goto out;
@@ -476,6 +512,7 @@ static int watchdog_nmi_enable(unsigned int cpu)
/* Try to register using hardware perf events */
event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, 
watchdog_overflow_callback, NULL);
 
+handle_err:
/* save cpu0 error for future comparision */
if (cpu == 0 && IS_ERR(event))
cpu0_err = PTR_ERR(event);
@@ -621,11 +658,13 @@ int proc_dowatchdog(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
 {
i

[PATCH 1/5] watchdog: remove unnecessary head files

2014-08-11 Thread Don Zickus
From: chai wen 

Signed-off-by: chai wen 
Signed-off-by: Don Zickus 
---
 kernel/watchdog.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c3319bd..4c2e11c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -15,11 +15,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-11 Thread Don Zickus
From: chai wen 

For now, the soft lockup detector warns once for each case of process
softlockup.  But the 'watchdog/n' thread may not always get the cpu in the
time slot between the task switch of two processes hogging that cpu, and so
never gets to reset soft_watchdog_warn.

An example would be two processes hogging the cpu.  Process A causes the
softlockup warning and is killed manually by a user.  Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu.  Resolve this by
saving/checking the pid of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen 
[modified the comment and changelog to be more specific]
Signed-off-by: Don Zickus 
---
 kernel/watchdog.c |   20 ++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4c2e11c..6d0a891 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
 */
duration = is_softlockup(touch_ts);
if (unlikely(duration)) {
+   pid_t pid = task_pid_nr(current);
+
/*
 * If a virtual machine is stopped by the host it can look to
 * the watchdog like a soft lockup, check to see if the host
@@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
return HRTIMER_RESTART;
 
/* only warn once */
-   if (__this_cpu_read(soft_watchdog_warn) == true)
+   if (__this_cpu_read(soft_watchdog_warn) == true) {
+
+   /*
+* Handle the case where multiple processes are
+* causing softlockups but the duration is small
+* enough, the softlockup detector can not reset
+* itself in time.  Use pids to detect this.
+*/
+   if (__this_cpu_read(softlockup_warn_pid_saved) != pid) {
+   __this_cpu_write(soft_watchdog_warn, false);
+   __touch_watchdog();
+   }
return HRTIMER_RESTART;
+   }
 
if (softlockup_all_cpu_backtrace) {
/* Prevent multiple soft-lockup reports if one cpu is 
already
@@ -342,7 +357,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
 
printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! 
[%s:%d]\n",
smp_processor_id(), duration,
-   current->comm, task_pid_nr(current));
+   current->comm, pid);
+   __this_cpu_write(softlockup_warn_pid_saved, pid);
print_modules();
print_irqtrace_events(current);
if (regs)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] watchdog: fix print-once on enable

2014-08-11 Thread Don Zickus
From: Ulrich Obergfell 

This patch avoids printing the message 'enabled on all CPUs, ...'
multiple times. For example, the issue can occur in the following
scenario:

1) watchdog_nmi_enable() fails to enable PMU counters and sets
   cpu0_err.

2) 'echo [0|1] > /proc/sys/kernel/nmi_watchdog' is executed to
   disable and re-enable the watchdog mechanism 'on the fly'.

3) If watchdog_nmi_enable() succeeds to enable PMU counters, each
   CPU will print the message because step1 left behind a non-zero
   cpu0_err.

   if (!IS_ERR(event)) {
   if (cpu == 0 || cpu0_err)
   pr_info("enabled on all CPUs, ...")

The patch avoids this by clearing cpu0_err in watchdog_nmi_disable().

Signed-off-by: Ulrich Obergfell 
Signed-off-by: Andrew Jones 
Signed-off-by: Don Zickus 
---
 kernel/watchdog.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6d0a891..0838685 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -522,6 +522,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
/* should be in cleanup, but blocks oprofile */
perf_event_release_kernel(event);
}
+   if (cpu == 0)
+   /* watchdog_nmi_enable() expects this to be zero initially. */
+   cpu0_err = 0;
return;
 }
 #else
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] kvm: ensure hard lockup detection is disabled by default

2014-08-11 Thread Don Zickus
From: Ulrich Obergfell 

Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false. It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.

Signed-off-by: Ulrich Obergfell 
Signed-off-by: Andrew Jones 
Signed-off-by: Don Zickus 
---
 arch/x86/kernel/kvm.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..95c3cb1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -499,6 +500,13 @@ void __init kvm_guest_init(void)
 #else
kvm_guest_cpu_init();
 #endif
+
+   /*
+* Hard lockup detection is enabled by default. Disable it, as guests
+* can get false positives too easily, for example if the host is
+* overcommitted.
+*/
+   watchdog_enable_hardlockup_detector(false);
 }
 
 static noinline uint32_t __kvm_cpuid_base(void)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core

2014-08-11 Thread Alexander Graf


On 06.08.14 18:33, Mihai Caraman wrote:

ePAPR represents hardware threads as cpu node properties in device tree.
So with existing QEMU, hardware threads are simply exposed as vcpus with
one hardware thread.

The e6500 core shares TLBs between hardware threads. Without a tlb write
conditional instruction, the Linux kernel uses per-core mechanisms to
protect against duplicate TLB entries.

The guest is unable to detect real sibling threads, so it can't use a
TLB protection mechanism. An alternative solution is to use the hypervisor
to allocate different lpids to the guest's vcpus running simultaneously on real
sibling threads. This patch moves lpid to vcpu level and allocates a pool
of lpids (equal to the number of threads per core) per VM.

Signed-off-by: Mihai Caraman 
---
  Please rebase this patch before
 [PATCH v3 5/5] KVM: PPC: Book3E: Enable e6500 core
  to proper handle SMP guests.

  arch/powerpc/include/asm/kvm_host.h |  5 
  arch/powerpc/kernel/asm-offsets.c   |  4 +++
  arch/powerpc/kvm/e500_mmu_host.c| 15 +-
  arch/powerpc/kvm/e500mc.c   | 55 +
  4 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..1b0bb4a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,11 @@ struct kvm_arch_memory_slot {
  };
  
  struct kvm_arch {

+#ifdef CONFIG_KVM_BOOKE_HV
+   unsigned int lpid_pool[2];
+#else
unsigned int lpid;
+#endif
  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
unsigned long hpt_virt;
struct revmap_entry *revmap;
@@ -435,6 +439,7 @@ struct kvm_vcpu_arch {
u32 eplc;
u32 epsc;
u32 oldpir;
+   u32 lpid;
  #endif
  
  #if defined(CONFIG_BOOKE)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index ab9ae04..5a30b87 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -483,7 +483,11 @@ int main(void)
DEFINE(VCPU_SHARED_MAS6, offsetof(struct kvm_vcpu_arch_shared, mas6));
  
  	DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm));

+#ifdef CONFIG_KVM_BOOKE_HV
+   DEFINE(KVM_LPID, offsetof(struct kvm_vcpu, arch.lpid));


This is a recipe for confusion. Please use a name that indicates that 
we're looking at the vcpu - VCPU_LPID for example.
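i.e.:

	DEFINE(VCPU_LPID, offsetof(struct kvm_vcpu, arch.lpid));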



+#else
DEFINE(KVM_LPID, offsetof(struct kvm, arch.lpid));
+#endif
  
  	/* book3s */

  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 4150826..a233cc6 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -69,7 +69,7 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
   * writing shadow tlb entry to host TLB
   */
  static inline void __write_host_tlbe(struct kvm_book3e_206_tlb_entry *stlbe,
-uint32_t mas0)
+uint32_t mas0, uint32_t *lpid)


Why a pointer?


  {
unsigned long flags;
  
@@ -80,6 +80,8 @@ static inline void __write_host_tlbe(struct kvm_book3e_206_tlb_entry *stlbe,

mtspr(SPRN_MAS3, (u32)stlbe->mas7_3);
mtspr(SPRN_MAS7, (u32)(stlbe->mas7_3 >> 32));
  #ifdef CONFIG_KVM_BOOKE_HV
+   /* populate mas8 with latest LPID */


What is a "latest LPID"? Really all you're doing is you're populating 
mas8 with the thread-specific lpid.



+   stlbe->mas8 = MAS8_TGS | *lpid;
mtspr(SPRN_MAS8, stlbe->mas8);


Just ignore the value in stlbe and directly write MAS8_TGS | lpid into mas8.
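i.e. simply (with the pointer indirection dropped, as per the comment
above):

	mtspr(SPRN_MAS8, MAS8_TGS | lpid);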



  #endif
asm volatile("isync; tlbwe" : : : "memory");
@@ -129,11 +131,12 @@ static inline void write_host_tlbe(struct 
kvmppc_vcpu_e500 *vcpu_e500,
  
  	if (tlbsel == 0) {

mas0 = get_host_mas0(stlbe->mas2);
-   __write_host_tlbe(stlbe, mas0);
+   __write_host_tlbe(stlbe, mas0, &vcpu_e500->vcpu.arch.lpid);
} else {
__write_host_tlbe(stlbe,
  MAS0_TLBSEL(1) |
- MAS0_ESEL(to_htlb1_esel(sesel)));
+ MAS0_ESEL(to_htlb1_esel(sesel)),
+ &vcpu_e500->vcpu.arch.lpid);
}
  }
  
@@ -318,9 +321,7 @@ static void kvmppc_e500_setup_stlbe(

stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
  
-#ifdef CONFIG_KVM_BOOKE_HV

-   stlbe->mas8 = MAS8_TGS | vcpu->kvm->arch.lpid;
-#endif
+   /* Set mas8 when executing tlbwe since LPID can change dynamically */


Please be more precise in this comment.


  }
  
  static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,

@@ -632,7 +633,7 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum 
instruction_type type,
  
  	local_irq_save(flags);

mtspr(SPRN_MAS6, (vcpu->

KVM call for agenda for 2014-08-19

2014-08-11 Thread Juan Quintela

Hi

Please, send any topic that you are interested in covering.

People have complained on the past that I don't cancel the call until
the very last minute.  So, what do you think that deadline for
submitting topics is 23:00UTC on Monday?

Call details:

 15:00 CEST
 13:00 UTC
 09:00 EDT

Every two weeks

By popular demand, a google calendar public entry with it

 
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

  (Let me know if you have any problems with the calendar entry)

If you need phone number details,  contact me privately

Thanks, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-08-11 Thread Alexander Graf


On 11.08.14 10:51, Benjamin Herrenschmidt wrote:

On Mon, 2014-08-11 at 09:26 +0200, Alexander Graf wrote:

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index da86d9b..d95014e 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c

This should be book3s_emulate.c.

Any reason we can't make that 0000 opcode as breakpoint common to
all powerpc variants ?


I can't think of a good reason. We use a hypercall on booke (which traps 
into an illegal instruction for pr) today, but I don't think it has to 
be that way.


Given that the user space API allows us to change it dynamically, there 
should be nothing blocking us from going with 0000 always.



Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-08-11 Thread Benjamin Herrenschmidt
On Mon, 2014-08-11 at 09:26 +0200, Alexander Graf wrote:
> > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
> > index da86d9b..d95014e 100644
> > --- a/arch/powerpc/kvm/emulate.c
> > +++ b/arch/powerpc/kvm/emulate.c
> 
> This should be book3s_emulate.c.

Any reason we can't make that 0000 opcode as breakpoint common to
all powerpc variants ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Enlarge the delta of TSC match window from one second to five seconds

2014-08-11 Thread xiexiangyou
hi,

In kvm's kvm_write_tsc(), TSCs are only synchronized if the time difference
between vcpu creations is smaller than one second.
However, in my environment the load is high and vcpu creation gets delayed,
so the time between vcpu creations is sometimes more than one second.  In
this case, the TSCs in the VM are not the same as each other when it boots.
(1) To solve the issue, should we enlarge the delta of the TSC match window
from one second to five seconds?

as follows:

 * it's better to try to match offsets from the beginning.
  */
-   if (nsdiff < NSEC_PER_SEC &&
+   if (nsdiff < 5 * NSEC_PER_SEC &&
vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
if (!check_tsc_unstable()) {

(2) Another way to solve the issue: set all VCPUs' tsc_offset equal to the
first boot VCPU's.  Then even in the special case of hotplugging a VCPU, we
can ensure the TSC clocksource stays stable.

Thanks.
xiexiangyou

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] powerpc/kvm: support to handle sw breakpoint

2014-08-11 Thread Alexander Graf


On 01.08.14 06:50, Madhavan Srinivasan wrote:

This patch adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to hypervisor
via Emulation Assistance interrupt, where we check for the illegal instruction
and accordingly we return to Host or Guest. Patch also adds support for
software breakpoint in PR KVM.

Changes v2->v3:
  Changed the debug instructions. Using the all zero opcode in the instruction 
word
   as illegal instruction as mentioned in Power ISA instead of ABS
  Removed reg updated in emulation assist and added a call to
   kvmppc_emulate_instruction for reg update.

Changes v1->v2:

  Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also 
share it.
  Added code to use KVM get one reg infrastructure to get debug opcode.
  Updated emulate.c to include emulation of debug instruction incase of PR_KVM.
  Made changes to commit message.

Signed-off-by: Madhavan Srinivasan 
---
  arch/powerpc/include/asm/kvm_book3s.h |  7 +++
  arch/powerpc/include/asm/ppc-opcode.h |  5 +
  arch/powerpc/kvm/book3s.c |  3 ++-
  arch/powerpc/kvm/book3s_hv.c  | 12 ++--
  arch/powerpc/kvm/book3s_pr.c  |  3 +++
  arch/powerpc/kvm/emulate.c|  9 +
  6 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f52f656..f17e3fd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -24,6 +24,13 @@
  #include 
  #include 
  
+/*

+ * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting Software 
Breakpoint.
+ * Based on PowerISA v2.07, Instruction with opcode 0s will be treated as 
illegal
+ * instruction.
+ */
+#define KVMPPC_INST_BOOK3S_DEBUG   0x0000
+
  struct kvmppc_bat {
u64 raw;
u32 bepi;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3132bb9..56739b3 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -111,6 +111,11 @@
  #define OP_31_XOP_LHBRX 790
  #define OP_31_XOP_STHBRX918
  
+/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction

+ * 0x0000 -- Primary opcode is 0s
+ */
+#define OP_ZERO 0x0
+
  #define OP_LWZ  32
  #define OP_LD   58
  #define OP_LWZU 33
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
  {
-   return -EINVAL;
+   vcpu->guest_debug = dbg->control;
+   return 0;
  }
  
  void kvmppc_decrementer_func(unsigned long data)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..7c16f4f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,8 +725,13 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 * we don't emulate any guest instructions at this stage.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-   r = RESUME_GUEST;
+   if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG) {
+   kvmppc_emulate_instruction(run, vcpu);


I changed the emulation code flow very recently, so while I advised you 
to write it this way this won't work with recent git versions anymore :(.


Please just create a tiny static function that handles this particular 
inst and duplicate the logic in book3s_emulate.c (for PR) as well as 
here (for HV).
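Something like the below, duplicated for PR and HV (rough sketch, not
build-tested, and note kvmppc_get_last_inst() now returns a fetch status
in the new flow):

	static int kvmppc_emulate_debug_inst(struct kvm_run *run,
					     struct kvm_vcpu *vcpu)
	{
		u32 last_inst;

		if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
				EMULATE_DONE) {
			/* Instruction fetch failed, let the guest retry */
			return RESUME_GUEST;
		}

		if (last_inst == KVMPPC_INST_BOOK3S_DEBUG) {
			run->exit_reason = KVM_EXIT_DEBUG;
			run->debug.arch.address = kvmppc_get_pc(vcpu);
			return RESUME_HOST;
		}

		kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
		return RESUME_GUEST;
	}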



+   r = RESUME_HOST;
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   r = RESUME_GUEST;
+   }
break;
/*
 * This occurs if the guest (kernel or userspace), does something that
@@ -831,6 +836,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 
id,
long int i;
  
  	switch (id) {

+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_BOOK3S_DEBUG);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, 0);
break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8eef1e5..27f5234 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1229,6 +1229,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, 
u64 id,
int r = 0;
  
  	switch (id) {

+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_BOOK3S_DEBUG);
+   break;
case KVM_REG_PPC_HIOR: