Re: [PATCH] sched: access local runqueue directly in single_task_running

2015-09-18 Thread Dominik Dingel
On Fri, 18 Sep 2015 13:26:53 +0200
Paolo Bonzini  wrote:

> 
> 
> On 18/09/2015 11:27, Dominik Dingel wrote:
> > Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
> > check if it is the only task running on a cpu") referenced the current
> > runqueue via smp_processor_id().  When CONFIG_DEBUG_PREEMPT is enabled,
> > that is only allowed if preemption is disabled or the current task is
> > bound to the local cpu (e.g. a kernel worker).
> > 
> > With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
> > calls single_task_running.  If CONFIG_DEBUG_PREEMPT is enabled, that
> > generates a lot of kernel messages.
> > 
> > To avoid disabling preemption in those cases, as it would limit the
> > usefulness, we change single_task_running to access the cpu-local
> > runqueue directly.
> > 
> > Cc: Tim Chen 
> > Suggested-by: Peter Zijlstra 
> > Cc:  # 4.2.x
> > Signed-off-by: Dominik Dingel 
> > ---
> >  kernel/sched/core.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 78b4bad10..5bfad0b 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2614,13 +2614,13 @@ unsigned long nr_running(void)
> >  
> >  /*
> >   * Check if only the current task is running on the cpu.
> > + *
> > + * Caution result is subject to time-of-check-to-time-of-use race,
> > + * every caller is responsible to set up additional fences if necessary.
> 
> Let's expand it a bit more:
> 
>  * Caution: this function does not check that the caller has disabled
>  * preemption, thus the result might have a time-of-check-to-time-of-use
>  * race.  The caller is responsible to use this correctly, for example:
>  *
>  * - use it from a non-preemptable section
>  *
>  * - use it from a thread that is bound to a single CPU
>  *
>  * - use it in a loop where each iteration takes very little time
>  *   (e.g. a polling loop)
>  */
> 
> I'll include it in my pull request.

Sounds really good.
Thank you!

> Paolo
> 
> >   */
> >  bool single_task_running(void)
> >  {
> > -   if (cpu_rq(smp_processor_id())->nr_running == 1)
> > -   return true;
> > -   else
> > -   return false;
> > +   return raw_rq()->nr_running == 1;
> >  }
> >  EXPORT_SYMBOL(single_task_running);
> >  
> > 
> 



[PATCH] sched: access local runqueue directly in single_task_running

2015-09-18 Thread Dominik Dingel
Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue via smp_processor_id().  When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the current task is
bound to the local cpu (e.g. a kernel worker).

With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running.  If CONFIG_DEBUG_PREEMPT is enabled, that
generates a lot of kernel messages.

To avoid disabling preemption in those cases, as it would limit the
usefulness, we change single_task_running to access the cpu-local
runqueue directly.

Cc: Tim Chen 
Suggested-by: Peter Zijlstra 
Cc:  # 4.2.x
Signed-off-by: Dominik Dingel 
---
 kernel/sched/core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10..5bfad0b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2614,13 +2614,13 @@ unsigned long nr_running(void)
 
 /*
  * Check if only the current task is running on the cpu.
+ *
+ * Caution result is subject to time-of-check-to-time-of-use race,
+ * every caller is responsible to set up additional fences if necessary.
  */
 bool single_task_running(void)
 {
-   if (cpu_rq(smp_processor_id())->nr_running == 1)
-   return true;
-   else
-   return false;
+   return raw_rq()->nr_running == 1;
 }
 EXPORT_SYMBOL(single_task_running);
 
-- 
2.3.8



Re: single_task_running() vs. preemption warnings (was Re: [PATCH] kvm: fix preemption warnings in kvm_vcpu_block)

2015-09-17 Thread Dominik Dingel
On Thu, 17 Sep 2015 18:45:00 +0200
Paolo Bonzini  wrote:

> 
> 
> On 17/09/2015 18:27, Dominik Dingel wrote:
> > +   preempt_disable();
> > +   solo = single_task_running();
> > +   preempt_enable();
> > +
> > cur = ktime_get();
> > -   } while (single_task_running() && ktime_before(cur, stop));
> 
> That's the obvious way to fix it, but the TOCTTOU problem (which was in
> the buggy code too) is obvious too. :)  And the only other user of
> single_task_running() in drivers/crypto/mcryptd.c has the same issue.

Right, worst case we go another round.

I am not sure about the case for mcryptd.c. I think it might be that the worker
there is bound to one cpu and will not be migrated.

I really need to look more into the details of what is happening with that worker.

> In fact, because of the way the function is used ("maybe I can do a
> little bit of work before going to sleep") it will likely be called many
> times in a loop.  This in turn means that:
> 
> - any wrong result due to a concurrent process migration would be
> rectified very soon
> 
> - preempt_disable()/preempt_enable() can actually be just as expensive
> or more expensive than single_task_running() itself.
> 
> Therefore, I wonder if single_task_running() should just use
> raw_smp_processor_id().  At least the TOCTTOU issue can be clearly
> documented in the function comment, instead of being hidden behind each
> of the callers.

Yes, to be useful it should probably call raw_smp_processor_id(),
and as a lot of code already does just that, I do not really see much
downside.

@Tim, would it be okay if I change single_task_running and add a specific
comment on top?

> Thanks,
> 
> Paolo
> 
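
For illustration only -- a plain userspace sketch, not kernel code and not part
of the original thread -- the time-of-check-to-time-of-use window described
above can be shown with sched_getcpu(); the usleep() below merely stands in for
whatever work happens between check and use:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* time of check: which CPU is this task on right now? */
        int checked = sched_getcpu();

        /* between check and use the scheduler is free to migrate us */
        usleep(1000);

        /* time of use: the earlier answer may already be stale */
        int used = sched_getcpu();

        printf("checked on CPU %d, used on CPU %d%s\n", checked, used,
               checked == used ? "" : " (migrated in between!)");
        return 0;
}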



[PATCH] kvm: fix preemption warnings in kvm_vcpu_block

2015-09-17 Thread Dominik Dingel
Commit f78195129963 ("kvm: add halt_poll_ns module parameter") calls
single_task_running with preemption enabled. When CONFIG_DEBUG_PREEMPT is
enabled, that results in debug_smp_processor_id() warnings.

Cc:   # 4.2.x
Signed-off-by: Dominik Dingel 
---
 virt/kvm/kvm_main.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54534de..ce67dd6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1971,6 +1971,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 
start = cur = ktime_get();
if (vcpu->halt_poll_ns) {
+   bool solo;
ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
do {
@@ -1982,8 +1983,13 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_successful_poll;
goto out;
}
+
+   preempt_disable();
+   solo = single_task_running();
+   preempt_enable();
+
cur = ktime_get();
-   } while (single_task_running() && ktime_before(cur, stop));
+   } while (solo && ktime_before(cur, stop));
}
 
for (;;) {
-- 
2.3.8
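
As a rough userspace analogy of the polling loop above (illustration only, not
the kernel code; the 0.5 ms budget and the sched_getcpu()-based predicate are
invented stand-ins for halt_poll_ns and single_task_running()), the shape is
"poll until a monotonic deadline while a cheap, possibly stale condition
holds", so a stale answer costs at most one extra short iteration:

#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* cheap, racy predicate standing in for single_task_running() */
static bool still_on_cpu(int cpu)
{
        return sched_getcpu() == cpu;
}

int main(void)
{
        int cpu = sched_getcpu();
        long long stop = now_ns() + 500000;     /* poll for 0.5 ms */
        long long iterations = 0;

        do {
                iterations++;   /* each pass is short, so staleness self-corrects */
        } while (still_on_cpu(cpu) && now_ns() < stop);

        printf("polled %lld iterations on CPU %d\n", iterations, cpu);
        return 0;
}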



[PATCH] KVM: trivial fix comment regarding __kvm_set_memory_region

2014-10-27 Thread Dominik Dingel
Commit 72dc67a69690 ("KVM: remove the usage of the mmap_sem for the protection
of the memory slots.") changed which lock is taken. This should be reflected
in the function comment.

Signed-off-by: Dominik Dingel 
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d82ec25..8b13607 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -738,7 +738,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm 
*kvm,
  *
  * Discontiguous memory is allowed, mostly for framebuffers.
  *
- * Must be called holding mmap_sem for write.
+ * Must be called holding kvm->slots_lock for write.
  */
 int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem)
-- 
1.8.5.5



Re: [PATCH 2/4] mm: introduce mm_forbids_zeropage function

2014-10-22 Thread Dominik Dingel
On Wed, 22 Oct 2014 12:22:23 -0700
Andrew Morton  wrote:

> On Wed, 22 Oct 2014 13:09:28 +0200 Dominik Dingel  
> wrote:
> 
> > Add a new function stub to allow architectures to disable, for
> > an mm_struct, the backing of non-present, anonymous pages with
> > read-only empty zero pages.
> > 
> > ...
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
> >  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
> >  #endif
> >  
> > +#ifndef mm_forbids_zeropage
> > +#define mm_forbids_zeropage(X)  (0)
> > +#endif
> 
> Can we document this please?  What it does, why it does it.  We should
> also specify precisely which arch header file is responsible for
> defining mm_forbids_zeropage.
> 

I will add a comment like:

/*
 * To prevent common memory management code from establishing
 * a zero page mapping on a read fault.
 * This function should be implemented within <asm/pgtable.h>.
 * s390 does this to prevent multiplexing of hardware bits
 * related to the physical page in case of virtualization.
 */

Okay?
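
As a self-contained mock of the stub/override pattern being documented here
(illustration only, not the real kernel headers; the struct and its use_skey
field are invented for the example), generic code always calls
mm_forbids_zeropage(), the default stub returns 0, and an architecture header
can supply its own definition:

#include <stdio.h>

/* invented stand-in for the kernel's struct mm_struct */
struct mm_struct {
        int use_skey;
};

/* "architecture" override -- in the real tree this would come from the
 * architecture's pgtable.h, e.g. keyed off the mm context on s390 */
#define mm_forbids_zeropage(mm) ((mm)->use_skey)

/* generic fallback -- only takes effect when no override was defined above */
#ifndef mm_forbids_zeropage
#define mm_forbids_zeropage(mm) (0)
#endif

/* sketch of a call site similar to the read-fault paths in mm/ */
static void handle_read_fault(struct mm_struct *mm)
{
        if (!mm_forbids_zeropage(mm))
                printf("back the fault with the shared zero page\n");
        else
                printf("allocate a fresh anonymous page instead\n");
}

int main(void)
{
        struct mm_struct plain = { .use_skey = 0 };
        struct mm_struct skey_guest = { .use_skey = 1 };

        handle_read_fault(&plain);
        handle_read_fault(&skey_guest);
        return 0;
}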





[PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process

2014-10-22 Thread Dominik Dingel
s390 has the special notion of storage keys which are some sort of page flags
associated with physical pages and live outside of direct addressable memory.
These storage keys can be queried and changed with a special set of 
instructions.
The mentioned instructions behave quite nicely under virtualization, if there is:
- an invalid pte, then the instructions will work on memory in the host page
  table
- a valid pte, then the instructions will work with the real storage key

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions, as a software-based
approach is taken instead; distributions in the wild, however, are still
using them.

However, for virtualized guests we still have a problem with guest pages
mapped to zero pages and with kernel samepage merging (KSM).
In both cases multiple guest pages will point to the same physical page
and share the same storage key.

Let's fix this by introducing a new function which s390 will define to
forbid new zero page mappings.  If the guest issues a storage key related 
instruction we flag the mm_struct, drop existing zero page mappings
and unmerge the guest memory.

v2 -> v3:
 - Clearing up patch description Patch 3/4
 - removing unnecessary flag in mmu_context (Paolo)

v1 -> v2: 
 - Following Dave and Paolo suggestion removing the vma flag

Dominik Dingel (4):
  s390/mm: refactor global pgste updates
  mm: introduce mm_forbids_zeropage function
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   8 +-
 arch/s390/kvm/kvm-s390.c|   2 +-
 arch/s390/kvm/priv.c|  17 ++--
 arch/s390/mm/pgtable.c  | 180 ++--
 include/linux/mm.h  |   4 +
 mm/huge_memory.c|   2 +-
 mm/memory.c |   2 +-
 8 files changed, 106 insertions(+), 111 deletions(-)

-- 
1.8.5.5



[PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys

2014-10-22 Thread Dominik Dingel
As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page,
      overwriting the storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed, as every fault would need to take the mmap_sem (read).

Other vCPUs executing storage key instructions will get a one-time interception
and will also be serialized by the mmap_sem.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/pgtable.h |  5 +
 arch/s390/mm/pgtable.c  | 13 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1e991f6a..0da98d6 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -481,6 +481,11 @@ static inline int mm_has_pgste(struct mm_struct *mm)
return 0;
 }
 
+/*
+ * In the case that a guest uses storage keys
+ * faults should no longer be backed by zero pages
+ */
+#define mm_forbids_zeropage mm_use_skey
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..58d7eb2 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
pgste_t pgste;
 
pgste = pgste_get_lock(pte);
+   /*
+* Remove all zero page mappings,
+* after establishing a policy to forbid zero page mappings
+* following faults for that page will get fresh anonymous pages
+*/
+   if (is_zero_pfn(pte_pfn(*pte))) {
+   ptep_flush_direct(walk->mm, addr, pte);
+   pte_val(*pte) = _PAGE_INVALID;
+   }
/* Clear storage key */
pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
  PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1327,9 +1336,11 @@ void s390_enable_skey(void)
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
+
+   mm->context.use_skey = 1;
+
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk);
-   mm->context.use_skey = 1;
 
 out_up:
up_write(&mm->mmap_sem);
-- 
1.8.5.5
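
To make the corruption scenario in the commit message above concrete, here is a
toy userspace model (illustration only; the frame numbers and key values are
invented): storage keys are kept per physical frame, so two guest pages that
are both backed by the one shared zero page end up sharing, and overwriting, a
single key slot:

#include <stdio.h>

#define NR_FRAMES       4
#define ZERO_PAGE_FRAME 0       /* the one shared, read-only zero page */

/* one storage key per *physical* frame, as on s390 */
static unsigned char storage_key[NR_FRAMES];

static void guest_set_key(int frame, unsigned char key)
{
        storage_key[frame] = key;
}

int main(void)
{
        /* guest pages X and Y are read before ever being written, so both
         * end up backed by the shared zero page (steps 3 and 5 above) */
        int frame_x = ZERO_PAGE_FRAME;
        int frame_y = ZERO_PAGE_FRAME;

        guest_set_key(frame_x, 0x10);   /* key the guest set for page X */
        guest_set_key(frame_y, 0x20);   /* key for page Y overwrites it */

        printf("key observed for X: 0x%02x (guest expected 0x10)\n",
               storage_key[frame_x]);
        return 0;
}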



[PATCH 1/4] s390/mm: refactor global pgste updates

2014-10-22 Thread Dominik Dingel
Replace the s390 specific page table walker for the pgste updates
with a call to the common code walk_page_range function.
There are now two pte modification functions, one for the reset
of the CMMA state and another one for the initialization of the
storage keys.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   1 +
 arch/s390/kvm/kvm-s390.c|   2 +-
 arch/s390/mm/pgtable.c  | 153 ++--
 4 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 9e18a61..120e126 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned 
long);
 void page_table_free(struct mm_struct *, unsigned long *);
 void page_table_free_rcu(struct mmu_gather *, unsigned long *);
 
-void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long,
-   bool init_skey);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
  unsigned long key, bool nq);
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5efb2fe..1e991f6a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned 
long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern void s390_enable_skey(void);
+extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
  * No page table caches to initialise
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 81b0e11..7a33c11 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct 
kvm_device_attr *attr)
case KVM_S390_VM_MEM_CLR_CMMA:
mutex_lock(&kvm->lock);
idx = srcu_read_lock(&kvm->srcu);
-   page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false);
+   s390_reset_cmma(kvm->arch.gmap->mm);
srcu_read_unlock(&kvm->srcu, idx);
mutex_unlock(&kvm->lock);
ret = 0;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5404a62..ab55ba8 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long 
*table)
__free_page(page);
 }
 
-static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t 
*pmd,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   pte_t *start_pte, *pte;
-   spinlock_t *ptl;
-   pgste_t pgste;
-
-   start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-   pte = start_pte;
-   do {
-   pgste = pgste_get_lock(pte);
-   pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
-   if (init_skey) {
-   unsigned long address;
-
-   pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
- PGSTE_GR_BIT | PGSTE_GC_BIT);
-
-   /* skip invalid and not writable pages */
-   if (pte_val(*pte) & _PAGE_INVALID ||
-   !(pte_val(*pte) & _PAGE_WRITE)) {
-   pgste_set_unlock(pte, pgste);
-   continue;
-   }
-
-   address = pte_val(*pte) & PAGE_MASK;
-   page_set_storage_key(address, PAGE_DEFAULT_KEY, 1);
-   }
-   pgste_set_unlock(pte, pgste);
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   pte_unmap_unlock(start_pte, ptl);
-
-   return addr;
-}
-
-static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t 
*pud,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   unsigned long next;
-   pmd_t *pmd;
-
-   pmd = pmd_offset(pud, addr);
-   do {
-   next = pmd_addr_end(addr, end);
-   if (pmd_none_or_clear_bad(pmd))
-   continue;
-   next = page_table_reset_pte(mm, pmd, addr, next, init_skey);
-   } while (pmd++, addr = next, addr != end);
-
-   return addr;
-}
-
-static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t 
*pgd,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   unsigned long next;
-   pud_t *pud;
-
-   pud = pud_offset(pgd, addr);
-   do {
-   next = pud_addr_end(addr, end);
-   if (pud_none_or_clear_bad(pud))
-   continue;

[PATCH 4/4] s390/mm: disable KSM for storage key enabled pages

2014-10-22 Thread Dominik Dingel
When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
---
 arch/s390/include/asm/pgtable.h |  2 +-
 arch/s390/kvm/priv.c| 17 -
 arch/s390/mm/pgtable.c  | 16 +++-
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0da98d6..dfb38af 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1754,7 +1754,7 @@ static inline pte_t mk_swap_pte(unsigned long type, 
unsigned long offset)
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
-extern void s390_enable_skey(void);
+extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f89c1cd..e0967fd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static void __skey_check_enable(struct kvm_vcpu *vcpu)
+static int __skey_check_enable(struct kvm_vcpu *vcpu)
 {
+   int rc = 0;
if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
-   return;
+   return rc;
 
-   s390_enable_skey();
+   rc = s390_enable_skey();
trace_kvm_s390_skey_related_inst(vcpu);
vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
+   return rc;
 }
 
 
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
 
+   if (rc)
+   return rc;
vcpu->stat.instruction_storage_key++;
 
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
}
 
if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
+
+   if (rc)
+   return rc;
if (set_guest_storage_key(current->mm, useraddr,
vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 58d7eb2..82aa528 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1328,22 +1330,34 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
return 0;
 }
 
-void s390_enable_skey(void)
+int s390_enable_skey(void)
 {
struct mm_walk walk = { .pte_entry = __s390_enable_skey };
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
+   int rc = 0;
 
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
 
mm->context.use_skey = 1;
+   for (vma = mm->mmap; vma; vma = vma->vm_next) {
+   if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+   MADV_UNMERGEABLE, &vma->vm_flags)) {
+   mm->context.use_skey = 0;
+   rc = -ENOMEM;
+   goto out_up;
+   }
+   }
+   mm->def_flags &= ~VM_MERGEABLE;
 
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk);
 
 out_up:
up_write(&mm->mmap_sem);
+   return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
-- 
1.8.5.5
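
For reference, a small self-contained userspace sketch of the same advice
values that ksm_madvise() applies per VMA in the patch above (illustration
only, not part of the patch; on kernels built without CONFIG_KSM both
madvise() calls fail with EINVAL):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 16 * 4096;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(buf, 0x5a, len);         /* identical pages: a KSM candidate */

        /* opt the range in to merging, then back out again */
        if (madvise(buf, len, MADV_MERGEABLE))
                perror("madvise(MADV_MERGEABLE)");
        if (madvise(buf, len, MADV_UNMERGEABLE))
                perror("madvise(MADV_UNMERGEABLE)");

        munmap(buf, len);
        return 0;
}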



[PATCH 2/4] mm: introduce mm_forbids_zeropage function

2014-10-22 Thread Dominik Dingel
Add a new function stub to allow architectures to disable, for
an mm_struct, the backing of non-present, anonymous pages with
read-only empty zero pages.

Signed-off-by: Dominik Dingel 
---
 include/linux/mm.h | 4 
 mm/huge_memory.c   | 2 +-
 mm/memory.c| 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..0a2022e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
 #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
 
+#ifndef mm_forbids_zeropage
+#define mm_forbids_zeropage(X)  (0)
+#endif
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..357a381 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
return VM_FAULT_OOM;
if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
return VM_FAULT_OOM;
-   if (!(flags & FAULT_FLAG_WRITE) &&
+   if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm) &&
transparent_hugepage_use_zero_page()) {
spinlock_t *ptl;
pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..f275a9d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
return VM_FAULT_SIGBUS;
 
/* Use the zero-page for reads */
-   if (!(flags & FAULT_FLAG_WRITE)) {
+   if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm)) {
entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
vma->vm_page_prot));
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-- 
1.8.5.5



Re: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys

2014-10-22 Thread Dominik Dingel
On Wed, 22 Oct 2014 12:09:31 +0200
Paolo Bonzini  wrote:

> On 10/22/2014 10:30 AM, Dominik Dingel wrote:
> > As use_skey is already the condition on which we call s390_enable_skey
> > we need to introduce a new flag for the mm->context on which we decide
> > if zero page mapping is allowed.
> 
> Can you explain better why "mm->context.use_skey = 1" cannot be done
> before the walk_page_range?  Where does the walk or __s390_enable_skey
> or (after the next patch) ksm_madvise rely on
> "mm->context.forbids_zeropage && !mm->context.use_skey"?

I can't, my reasoning there is wrong.
I remembered incorrectly that we use mm_use_skey in arch/s390/kvm/priv.c to
check if we need to call s390_enable_skey, but that does happen
with the interception bits.

So every vCPU which gets an interception for a storage key instruction
will call s390_enable_skey and wait there for the mmap_sem.

> The only reason I can think of, is that the next patch does not reset
> "mm->context.forbids_zeropage" to 0 if the ksm_madvise fails.  Why
> doesn't it do that---or is it a bug?

You are right, this is a bug, where we will drop to userspace with -ENOMEM.

I will fix this as well. 


> Thanks, and sorry for the flurry of questions! :)

I really appreciate your questions and remarks. Thank you!

> Paolo
> 





[PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys

2014-10-22 Thread Dominik Dingel
As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page,
      overwriting the storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed, as every fault would need to take the mmap_sem (read).
As sske and host large pages are also mutually exclusive we do not even
need to retry the fixup_user_fault.

As use_skey is already the condition on which we call s390_enable_skey
we need to introduce a new flag for the mm->context on which we decide
if zero page mapping is allowed.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/mmu.h |  2 ++
 arch/s390/include/asm/pgtable.h | 14 ++
 arch/s390/mm/pgtable.c  | 12 
 3 files changed, 28 insertions(+)

diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index a5e6562..0f38469 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -18,6 +18,8 @@ typedef struct {
unsigned int has_pgste:1;
/* The mmu context uses storage keys. */
unsigned int use_skey:1;
+   /* The mmu context forbids zeropage mappings. */
+   unsigned int forbids_zeropage:1;
 } mm_context_t;
 
 #define INIT_MM_CONTEXT(name)\
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1e991f6a..fe3cfdf 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -481,6 +481,20 @@ static inline int mm_has_pgste(struct mm_struct *mm)
return 0;
 }
 
+/*
+ * In the case that a guest uses storage keys
+ * faults should no longer be backed by zero pages
+ */
+#define mm_forbids_zeropage mm_forbids_zeropage
+static inline int mm_forbids_zeropage(struct mm_struct *mm)
+{
+#ifdef CONFIG_PGSTE
+   if (mm->context.forbids_zeropage)
+   return 1;
+#endif
+   return 0;
+}
+
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..1e06fbc 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
pgste_t pgste;
 
pgste = pgste_get_lock(pte);
+   /*
+* Remove all zero page mappings,
+* after establishing a policy to forbid zero page mappings
+* following faults for that page will get fresh anonymous pages
+*/
+   if (is_zero_pfn(pte_pfn(*pte))) {
+   ptep_flush_direct(walk->mm, addr, pte);
+   pte_val(*pte) = _PAGE_INVALID;
+   }
/* Clear storage key */
pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
  PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1327,6 +1336,9 @@ void s390_enable_skey(void)
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
+
+   mm->context.forbids_zeropage = 1;
+
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk);
mm->context.use_skey = 1;
-- 
1.8.5.5



[PATCH 4/4] s390/mm: disable KSM for storage key enabled pages

2014-10-22 Thread Dominik Dingel
When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
---
 arch/s390/include/asm/pgtable.h |  2 +-
 arch/s390/kvm/priv.c| 17 -
 arch/s390/mm/pgtable.c  | 15 ++-
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index fe3cfdf..20f3186 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1763,7 +1763,7 @@ static inline pte_t mk_swap_pte(unsigned long type, 
unsigned long offset)
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
-extern void s390_enable_skey(void);
+extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f89c1cd..e0967fd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static void __skey_check_enable(struct kvm_vcpu *vcpu)
+static int __skey_check_enable(struct kvm_vcpu *vcpu)
 {
+   int rc = 0;
if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
-   return;
+   return rc;
 
-   s390_enable_skey();
+   rc = s390_enable_skey();
trace_kvm_s390_skey_related_inst(vcpu);
vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
+   return rc;
 }
 
 
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
 
+   if (rc)
+   return rc;
vcpu->stat.instruction_storage_key++;
 
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
}
 
if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
+
+   if (rc)
+   return rc;
if (set_guest_storage_key(current->mm, useraddr,
vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 1e06fbc..798ab49 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1328,16 +1330,26 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
return 0;
 }
 
-void s390_enable_skey(void)
+int s390_enable_skey(void)
 {
struct mm_walk walk = { .pte_entry = __s390_enable_skey };
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
+   int rc = 0;
 
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
 
mm->context.forbids_zeropage = 1;
+   for (vma = mm->mmap; vma; vma = vma->vm_next) {
+   if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+   MADV_UNMERGEABLE, &vma->vm_flags)) {
+   rc = -ENOMEM;
+   goto out_up;
+   }
+   }
+   mm->def_flags &= ~VM_MERGEABLE;
 
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk);
@@ -1345,6 +1357,7 @@ void s390_enable_skey(void)
 
 out_up:
up_write(&mm->mmap_sem);
+   return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
-- 
1.8.5.5



[PATCH v2 0/4] mm: new function to forbid zeropage mappings for a process

2014-10-22 Thread Dominik Dingel
s390 has the special notion of storage keys which are some sort of page flags
associated with physical pages and live outside of direct addressable memory.
These storage keys can be queried and changed with a special set of 
instructions.
The mentioned instructions behave quite nicely under virtualization, if there is:
- an invalid pte, then the instructions will work on memory in the host page
  table
- a valid pte, then the instructions will work with the real storage key

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions, as a software-based
approach is taken instead; distributions in the wild, however, are still
using them.

However, for virtualized guests we still have a problem with guest pages
mapped to zero pages and with kernel samepage merging (KSM).
In both cases multiple guest pages will point to the same physical page
and share the same storage key.

Let's fix this by introducing a new function which s390 will define to
forbid new zero page mappings.  If the guest issues a storage key related 
instruction we flag the mm_struct, drop existing zero page mappings
and unmerge the guest memory.

v1 -> v2: 
 - Following Dave and Paolo suggestion removing the vma flag

Dominik Dingel (4):
  s390/mm: refactor global pgste updates
  mm: introduce mm_forbids_zeropage function
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/include/asm/mmu.h |   2 +
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |  17 +++-
 arch/s390/kvm/kvm-s390.c|   2 +-
 arch/s390/kvm/priv.c|  17 ++--
 arch/s390/mm/pgtable.c  | 180 ++--
 include/linux/mm.h  |   4 +
 mm/huge_memory.c|   2 +-
 mm/memory.c |   2 +-
 9 files changed, 117 insertions(+), 111 deletions(-)

-- 
1.8.5.5




Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag

2014-10-21 Thread Dominik Dingel
On Tue, 21 Oct 2014 10:11:43 +0200
Paolo Bonzini  wrote:

> 
> 
> On 10/21/2014 08:11 AM, Martin Schwidefsky wrote:
> >> I agree with Dave (I thought I disagreed, but I changed my mind while
> >> writing down my thoughts).  Just define mm_forbids_zeropage in
> >> arch/s390/include/asm, and make it return mm->context.use_skey---with a
> >> comment explaining how this is only for processes that use KVM, and then
> >> only for guests that use storage keys.
> >
> > The mm_forbids_zeropage() sure will work for now, but I think a vma flag
> > is the better solution. This is analog to VM_MERGEABLE or VM_NOHUGEPAGE,
> > the best solution would be to only mark those vmas that are mapped to
> > the guest. That we have not found a way to do that yet in a sensible way
> > does not change the fact that "no-zero-page" is a per-vma property, no?
> 
> I agree it should be per-VMA.  However, right now the code is 
> complicated unnecessarily by making it a per-VMA flag.  Also, setting 
> the flag per VMA should probably be done in 
> kvm_arch_prepare_memory_region together with some kind of storage key 
> notifier.  This is not very much like Dominik's patch.  All in all, 
> mm_forbids_zeropage() provides a non-intrusive and non-controversial way 
> to fix the bug.  Later on, switching to vma_forbids_zeropage() will be 
> trivial as far as mm/ code is concerned.
> 

Thank you for all the feedback, will cook up a new version.



Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag

2014-10-18 Thread Dominik Dingel
On Fri, 17 Oct 2014 15:04:21 -0700
Dave Hansen  wrote:
 
> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
> status?  Reading the patches, it _looks_ like it might be an all or
> nothing thing.

Currently it is an all or nothing thing, but for a future change we might want
to just tag the guest memory instead of the complete user address space.

> Full disclosure: I've got an x86-specific feature I want to steal a flag
> for.  Maybe we should just define another VM_ARCH bit.
> 

So you think of something like:

#if defined(CONFIG_S390)
# define VM_NOZEROPAGE  VM_ARCH_1
#endif

#ifndef VM_NOZEROPAGE
# define VM_NOZEROPAGE  VM_NONE
#endif

right?




[PATCH 4/4] s390/mm: disable KSM for storage key enabled pages

2014-10-17 Thread Dominik Dingel
When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/pgtable.h |  2 +-
 arch/s390/kvm/priv.c| 17 -
 arch/s390/mm/pgtable.c  | 15 +--
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1e991f6a..a5362e4 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1749,7 +1749,7 @@ static inline pte_t mk_swap_pte(unsigned long type, 
unsigned long offset)
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
-extern void s390_enable_skey(void);
+extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f89c1cd..e0967fd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static void __skey_check_enable(struct kvm_vcpu *vcpu)
+static int __skey_check_enable(struct kvm_vcpu *vcpu)
 {
+   int rc = 0;
if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
-   return;
+   return rc;
 
-   s390_enable_skey();
+   rc = s390_enable_skey();
trace_kvm_s390_skey_related_inst(vcpu);
vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
+   return rc;
 }
 
 
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
 
+   if (rc)
+   return rc;
vcpu->stat.instruction_storage_key++;
 
if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
}
 
if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
-   __skey_check_enable(vcpu);
+   int rc = __skey_check_enable(vcpu);
+
+   if (rc)
+   return rc;
if (set_guest_storage_key(current->mm, useraddr,
vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 6321692..b3311c1 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1328,18 +1330,26 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
return 0;
 }
 
-void s390_enable_skey(void)
+int s390_enable_skey(void)
 {
struct mm_walk walk = { .pte_entry = __s390_enable_skey };
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   int rc = 0;
 
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
 
-   for (vma = mm->mmap; vma; vma = vma->vm_next)
+   for (vma = mm->mmap; vma; vma = vma->vm_next) {
+   if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+   MADV_UNMERGEABLE, &vma->vm_flags)) {
+   rc = -ENOMEM;
+   goto out_up;
+   }
vma->vm_flags |= VM_NOZEROPAGE;
+   }
+   mm->def_flags &= ~VM_MERGEABLE;
mm->def_flags |= VM_NOZEROPAGE;
 
walk.mm = mm;
@@ -1348,6 +1358,7 @@ void s390_enable_skey(void)
 
 out_up:
up_write(&mm->mmap_sem);
+   return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
-- 
1.8.5.5



[PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys

2014-10-17 Thread Dominik Dingel
As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise the following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will go to storage key for zero page,
      overwriting the storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed. As sske and host large pages are also mutually exclusive
we do not even need to retry the fixup_user_fault.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig  |  3 +++
 arch/s390/mm/pgtable.c | 15 +++
 2 files changed, 18 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 05c78bb..4e04e63 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -1,6 +1,9 @@
 config MMU
def_bool y
 
+config NOZEROPAGE
+   def_bool y
+
 config ZONE_DMA
def_bool y
 
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..6321692 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long 
addr,
pgste_t pgste;
 
pgste = pgste_get_lock(pte);
+   /*
+* Remove all zero page mappings,
+* after establishing a policy to forbid zero page mappings
+* following faults for that page will get fresh anonymous pages
+*/
+   if (is_zero_pfn(pte_pfn(*pte))) {
+   ptep_flush_direct(walk->mm, addr, pte);
+   pte_val(*pte) = _PAGE_INVALID;
+   }
/* Clear storage key */
pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
  PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1323,10 +1332,16 @@ void s390_enable_skey(void)
 {
struct mm_walk walk = { .pte_entry = __s390_enable_skey };
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
 
down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
goto out_up;
+
+   for (vma = mm->mmap; vma; vma = vma->vm_next)
+   vma->vm_flags |= VM_NOZEROPAGE;
+   mm->def_flags |= VM_NOZEROPAGE;
+
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk);
mm->context.use_skey = 1;
-- 
1.8.5.5



[PATCH 0/4] mm: new flag to forbid zero page mappings for a vma

2014-10-17 Thread Dominik Dingel
s390 has the special notion of storage keys which are some sort of page flags
associated with physical pages and live outside of direct addressable memory.
These storage keys can be queried and changed with a special set of 
instructions.
The mentioned instructions behave quite nicely under virtualization, if there is:
- an invalid pte, then the instructions will work on some memory reserved in
  the host page table
- a valid pte, then the instructions will work with the real storage key

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions, as a software-based
approach is taken instead; distributions in the wild, however, are still
using them.

However, for virtualized guests we still have a problem with guest pages mapped
to zero pages and with kernel samepage merging (KSM).  In both cases multiple
guest pages will point to the same physical page and share the same storage key.

Let's fix this by introducing a new flag which will forbid new zero page
mappings.  If the guest issues a storage key related instruction we flag all
vmas and drop existing zero page mappings and unmerge the guest memory.

Dominik Dingel (4):
  s390/mm: refactor global pgste updates
  mm: introduce new VM_NOZEROPAGE flag
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/Kconfig   |   3 +
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   3 +-
 arch/s390/kvm/kvm-s390.c|   2 +-
 arch/s390/kvm/priv.c|  17 ++--
 arch/s390/mm/pgtable.c  | 181 ++--
 include/linux/mm.h  |  13 ++-
 mm/huge_memory.c|   2 +-
 mm/memory.c |   2 +-
 9 files changed, 112 insertions(+), 113 deletions(-)

-- 
1.8.5.5



[PATCH 1/4] s390/mm: refactor global pgste updates

2014-10-17 Thread Dominik Dingel
Replace the s390 specific page table walker for the pgste updates
with a call to the common code walk_page_range function.
There are now two pte modification functions, one for the reset
of the CMMA state and another one for the initialization of the
storage keys.

Signed-off-by: Dominik Dingel 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   1 +
 arch/s390/kvm/kvm-s390.c|   2 +-
 arch/s390/mm/pgtable.c  | 153 ++--
 4 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 9e18a61..120e126 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned 
long);
 void page_table_free(struct mm_struct *, unsigned long *);
 void page_table_free_rcu(struct mmu_gather *, unsigned long *);
 
-void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long,
-   bool init_skey);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
  unsigned long key, bool nq);
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5efb2fe..1e991f6a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned 
long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern void s390_enable_skey(void);
+extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
  * No page table caches to initialise
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 81b0e11..7a33c11 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct 
kvm_device_attr *attr)
case KVM_S390_VM_MEM_CLR_CMMA:
mutex_lock(&kvm->lock);
idx = srcu_read_lock(&kvm->srcu);
-   page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false);
+   s390_reset_cmma(kvm->arch.gmap->mm);
srcu_read_unlock(&kvm->srcu, idx);
mutex_unlock(&kvm->lock);
ret = 0;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5404a62..ab55ba8 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long 
*table)
__free_page(page);
 }
 
-static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t 
*pmd,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   pte_t *start_pte, *pte;
-   spinlock_t *ptl;
-   pgste_t pgste;
-
-   start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-   pte = start_pte;
-   do {
-   pgste = pgste_get_lock(pte);
-   pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
-   if (init_skey) {
-   unsigned long address;
-
-   pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
- PGSTE_GR_BIT | PGSTE_GC_BIT);
-
-   /* skip invalid and not writable pages */
-   if (pte_val(*pte) & _PAGE_INVALID ||
-   !(pte_val(*pte) & _PAGE_WRITE)) {
-   pgste_set_unlock(pte, pgste);
-   continue;
-   }
-
-   address = pte_val(*pte) & PAGE_MASK;
-   page_set_storage_key(address, PAGE_DEFAULT_KEY, 1);
-   }
-   pgste_set_unlock(pte, pgste);
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   pte_unmap_unlock(start_pte, ptl);
-
-   return addr;
-}
-
-static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t *pud,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   unsigned long next;
-   pmd_t *pmd;
-
-   pmd = pmd_offset(pud, addr);
-   do {
-   next = pmd_addr_end(addr, end);
-   if (pmd_none_or_clear_bad(pmd))
-   continue;
-   next = page_table_reset_pte(mm, pmd, addr, next, init_skey);
-   } while (pmd++, addr = next, addr != end);
-
-   return addr;
-}
-
-static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t *pgd,
-   unsigned long addr, unsigned long end, bool init_skey)
-{
-   unsigned long next;
-   pud_t *pud;
-
-   pud = pud_offset(pgd, addr);
-   do {
-   next = pud_addr_end(addr, end);
-   if (pud_none_or_clear_
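
As a rough illustration of the conversion the commit message describes -- replacing the
open-coded walker above with walk_page_range() and a pte_entry callback -- a minimal
sketch follows. The s390_reset_cmma() name comes from the header hunk of this patch,
but the callback name, its body and the locking are assumptions for illustration, not
the actual patch code.

/*
 * Illustrative sketch only: the general shape of a pgste update built
 * on walk_page_range() with a pte_entry callback.
 */
static int __reset_cmma_pte(pte_t *pte, unsigned long addr,
			    unsigned long next, struct mm_walk *walk)
{
	pgste_t pgste;

	/* drop the CMMA usage state kept in the pgste of this pte */
	pgste = pgste_get_lock(pte);
	pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
	pgste_set_unlock(pte, pgste);
	return 0;
}

void s390_reset_cmma(struct mm_struct *mm)
{
	struct mm_walk walk = {
		.pte_entry	= __reset_cmma_pte,
		.mm		= mm,
	};

	down_write(&mm->mmap_sem);
	walk_page_range(0, TASK_SIZE, &walk);
	up_write(&mm->mmap_sem);
}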

[PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag

2014-10-17 Thread Dominik Dingel
Add a new vma flag to allow an architecture to disable the backing
of non-present, anonymous pages with the read-only empty zero page.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
 include/linux/mm.h | 13 +++--
 mm/huge_memory.c   |  2 +-
 mm/memory.c|  2 +-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..8f09c91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_GROWSDOWN   0x0100  /* general info on the segment */
 #define VM_PFNMAP  0x0400  /* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE   0x0800  /* ETXTBSY on write attempts.. */
-
+#define VM_NOZEROPAGE  0x1000  /* forbid new zero page mappings */
 #define VM_LOCKED  0x2000
 #define VM_IO   0x4000 /* Memory mapped I/O or similar */
 
@@ -179,7 +179,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
 
 /* This mask defines which mm->def_flags a process can inherit its parent */
-#define VM_INIT_DEF_MASK   VM_NOHUGEPAGE
+#define VM_INIT_DEF_MASK   (VM_NOHUGEPAGE | VM_NOZEROPAGE)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
@@ -1293,6 +1293,15 @@ static inline int stack_guard_page_end(struct vm_area_struct *vma,
!vma_growsup(vma->vm_next, addr);
 }
 
+static inline int vma_forbids_zeropage(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_NOZEROPAGE
+   return vma->vm_flags & VM_NOZEROPAGE;
+#else
+   return 0;
+#endif
+}
+
 extern struct task_struct *task_of_stack(struct task_struct *task,
struct vm_area_struct *vma, bool in_group);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..c271265 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
return VM_FAULT_OOM;
if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
return VM_FAULT_OOM;
-   if (!(flags & FAULT_FLAG_WRITE) &&
+   if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma) &&
transparent_hugepage_use_zero_page()) {
spinlock_t *ptl;
pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..1859b2b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
return VM_FAULT_SIGBUS;
 
/* Use the zero-page for reads */
-   if (!(flags & FAULT_FLAG_WRITE)) {
+   if (!(flags & FAULT_FLAG_WRITE) && !vma_forbids_zeropage(vma)) {
entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
vma->vm_page_prot));
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-- 
1.8.5.5
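
One thing the diff itself does not show is how an architecture is expected to switch the
flag on. Since VM_NOZEROPAGE is part of VM_INIT_DEF_MASK, setting it in mm->def_flags
makes every mapping created afterwards carry the flag (and lets a child mm inherit it);
existing VMAs would still need their vm_flags updated separately. A hypothetical sketch,
with a made-up function name and assuming the architecture also selects CONFIG_NOZEROPAGE
so that vma_forbids_zeropage() actually tests the bit:

/*
 * Hypothetical example -- not part of this series.  Marks the whole mm
 * so that anonymous mappings are no longer backed by the empty zero page.
 */
static void mm_forbid_zeropage(struct mm_struct *mm)
{
	struct vm_area_struct *vma;

	down_write(&mm->mmap_sem);
	/* new mappings pick this up via mm->def_flags ... */
	mm->def_flags |= VM_NOZEROPAGE;
	/* ... existing mappings have to be tagged explicitly */
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		vma->vm_flags |= VM_NOZEROPAGE;
	up_write(&mm->mmap_sem);
}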



[PATCH 4/4] PF: Async page fault support on s390

2013-07-10 Thread Dominik Dingel
This patch enables async page faults for s390 kvm guests.
It provides the userspace API to enable, disable or get the status of this
feature. Also it includes the diagnose code, called by the guest to enable
async page faults.

The async page faults will use an already existing guest interface for this
purpose, as described in "CP Programming Services (SC24-6084)".

Signed-off-by: Dominik Dingel 
---
 Documentation/s390/kvm.txt   |  24 +
 arch/s390/include/asm/kvm_host.h |  22 
 arch/s390/include/uapi/asm/kvm.h |  10 
 arch/s390/kvm/Kconfig|   2 +
 arch/s390/kvm/Makefile   |   2 +-
 arch/s390/kvm/diag.c |  63 +++
 arch/s390/kvm/interrupt.c|  43 +---
 arch/s390/kvm/kvm-s390.c | 107 ++-
 arch/s390/kvm/kvm-s390.h |   4 ++
 arch/s390/kvm/sigp.c |   6 +++
 include/uapi/linux/kvm.h |   2 +
 11 files changed, 276 insertions(+), 9 deletions(-)

diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
index 85f3280..707b7e9 100644
--- a/Documentation/s390/kvm.txt
+++ b/Documentation/s390/kvm.txt
@@ -70,6 +70,30 @@ floating interrupts are:
 KVM_S390_INT_VIRTIO
 KVM_S390_INT_SERVICE
 
+ioctl:  KVM_S390_APF_ENABLE:
+args:   none
+This ioctl is used to enable the async page fault interface. So in a
+host page fault case the host can now submit pfault tokens to the guest.
+
+ioctl:  KVM_S390_APF_DISABLE:
+args:   none
+This ioctl is used to disable the async page fault interface. From this point
+on no new pfault tokens will be issued to the guest. Already existing async
+page faults are not covered by this and will be normally handled.
+
+ioctl:  KVM_S390_APF_STATUS:
+args:   none
+This ioctl allows the userspace to get the current status of the APF feature.
+The main purpose for this, is to ensure that no pfault tokens will be lost
+during live migration or similar management operations.
+The possible return values are:
+KVM_S390_APF_DISABLED_NON_PENDING
+KVM_S390_APF_DISABLED_PENDING
+KVM_S390_APF_ENABLED_NON_PENDING
+KVM_S390_APF_ENABLED_PENDING
+Caution: if KVM_S390_APF is enabled the PENDING status could be already changed
+as soon as the ioctl returns to userspace.
+
 3. ioctl calls to the kvm-vcpu file descriptor
 KVM does support the following ioctls on s390 that are common with other
 architectures and do behave the same:
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index cd30c3d..e8012fc 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -257,6 +257,10 @@ struct kvm_vcpu_arch {
u64 stidp_data;
};
struct gmap *gmap;
+#define KVM_S390_PFAULT_TOKEN_INVALID  (-1UL)
+   unsigned long pfault_token;
+   unsigned long pfault_select;
+   unsigned long pfault_compare;
 };
 
 struct kvm_vm_stat {
@@ -282,6 +286,24 @@ static inline bool kvm_is_error_hva(unsigned long addr)
return addr == KVM_HVA_ERR_BAD;
 }
 
+#define ASYNC_PF_PER_VCPU  64
+struct kvm_vcpu;
+struct kvm_async_pf;
+struct kvm_arch_async_pf {
+   unsigned long pfault_token;
+};
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d25da59..b6c83e0 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -57,4 +57,14 @@ struct kvm_sync_regs {
 #define KVM_REG_S390_EPOCHDIFF (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
 #define KVM_REG_S390_CPU_TIMER  (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
 #define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
+
+/* ioctls used for setting/getting status of APF on s390x */
+#define KVM_S390_APF_ENABLE1
+#define KVM_S390_APF_DISABLE   2
+#define KVM_S390_APF_STATUS3
+#define KVM_S390_APF_DISABLED_NON_PENDING  0
+#define KVM_S390_APF_DISABLED_PENDING  1
+#define KVM_S390_APF_ENABLED_NON_PENDING   2
+#define KVM_S390_APF_ENABLED_PENDING   3
+
 #endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 70b46ea..4993eed 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -23,6 +23,8 @@ config KVM
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_EVENTFD
+   select KVM_ASYNC_PF
+   select KVM_ASYNC_PF_DIRECT
---help---
  Support hosting paravirtualized guest mach
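
From the userspace side, the documentation hunk above implies a simple flow around live
migration: disable the interface, then poll the status until no pfault token is pending
any more. The sketch below assumes these ioctls are issued on the VM file descriptor, as
the placement in the documentation suggests; the constants mirror the proposed uapi
header, while the helper itself is purely illustrative.

#include <sys/ioctl.h>

/* mirrored from the proposed arch/s390/include/uapi/asm/kvm.h */
#define KVM_S390_APF_ENABLE		1
#define KVM_S390_APF_DISABLE		2
#define KVM_S390_APF_STATUS		3
#define KVM_S390_APF_DISABLED_PENDING	1

/* illustrative helper: quiesce async page faults before migration */
static int apf_quiesce(int vm_fd)
{
	/* stop handing out new pfault tokens to the guest */
	if (ioctl(vm_fd, KVM_S390_APF_DISABLE) < 0)
		return -1;

	/* wait until no previously issued token is outstanding */
	while (ioctl(vm_fd, KVM_S390_APF_STATUS) == KVM_S390_APF_DISABLED_PENDING)
		; /* poll; a real caller would sleep between retries */

	return 0;
}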

[PATCH 3/4] PF: Provide additional direct page notification

2013-07-10 Thread Dominik Dingel
By setting a Kconfig option, the architecture can control when
guest notifications will be presented by the apf backend.
There is the default batch mechanism, working as before, where the vcpu
thread pulls in this information. In addition there is now a direct
mechanism that pushes the information to the guest right away.
This way s390 can use an already existing architecture interface.

Still the vcpu thread should call check_completion to clean up leftovers,
which leaves most of the common code untouched.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
---
 arch/x86/kvm/mmu.c   |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/Kconfig |  4 
 virt/kvm/async_pf.c  | 22 +++---
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0d094da..b8632e9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3343,7 +3343,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
-   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+   return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
 }
 
 static bool can_do_async_pf(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 92e8f64..fe87e46 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -191,7 +191,7 @@ struct kvm_async_pf {
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 779262f..0774495 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,10 @@ config KVM_MMIO
 config KVM_ASYNC_PF
bool
 
+# Toggle to switch between direct notification and batch job
+config KVM_ASYNC_PF_SYNC
+   bool
+
 config HAVE_KVM_MSI
bool
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index ea475cd..cfa9366 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -28,6 +28,21 @@
 #include "async_pf.h"
 #include 
 
+static inline void kvm_async_page_present_sync(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work)
+{
+#ifdef CONFIG_KVM_ASYNC_PF_SYNC
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+static inline void kvm_async_page_present_async(struct kvm_vcpu *vcpu,
+   struct kvm_async_pf *work)
+{
+#ifndef CONFIG_KVM_ASYNC_PF_SYNC
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+
 static struct kmem_cache *async_pf_cache;
 
 int kvm_async_pf_init(void)
@@ -70,6 +85,7 @@ static void async_pf_execute(struct work_struct *work)
down_read(&mm->mmap_sem);
get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
up_read(&mm->mmap_sem);
+   kvm_async_page_present_sync(vcpu, apf);
unuse_mm(mm);
 
spin_lock(&vcpu->async_pf.lock);
@@ -134,7 +150,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
if (work->page)
kvm_arch_async_page_ready(vcpu, work);
-   kvm_arch_async_page_present(vcpu, work);
+   kvm_async_page_present_async(vcpu, work);
 
list_del(&work->queue);
vcpu->async_pf.queued--;
@@ -144,7 +160,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
}
 }
 
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch)
 {
struct kvm_async_pf *work;
@@ -166,7 +182,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
work->done = false;
work->vcpu = vcpu;
work->gva = gva;
-   work->addr = gfn_to_hva(vcpu->kvm, gfn);
+   work->addr = hva;
work->arch = *arch;
work->mm = current->mm;
atomic_inc(&work->mm->mm_count);
-- 
1.8.2.2
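
For an architecture that selects the new option, the practical difference is only where
kvm_arch_async_page_present() runs: with CONFIG_KVM_ASYNC_PF_SYNC it is called from
async_pf_execute() right after get_user_pages() has resolved the fault, otherwise it
stays in kvm_check_async_pf_completion() on the vcpu thread as before. A sketch of the
hooks an architecture backend supplies follows; the stub bodies and comments are
illustrative, not s390 code.

/* typically called when a guest fault is turned into an async page fault */
void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
				     struct kvm_async_pf *work)
{
	/* notify the guest that the page is not available yet */
}

/* with CONFIG_KVM_ASYNC_PF_SYNC this runs in the worker thread */
void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
				 struct kvm_async_pf *work)
{
	/*
	 * notify the guest that the fault has been resolved; in the
	 * sync/direct case this must not assume vcpu thread context
	 */
}

/* called from the vcpu thread when completions are reaped */
void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
			       struct kvm_async_pf *work)
{
	/* re-run the fault for work->gva now that the page is there */
}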



[PATCH v4 0/4] Enable async page faults on s390

2013-07-10 Thread Dominik Dingel
Gleb, Paolo, 

based on the work from Martin and Carsten, this implementation enables async
page faults.
To the guest it will provide the pfault interface, but internally it uses the
async page fault common code.

The initial submission and its discussion can be followed on
http://www.mail-archive.com/kvm@vger.kernel.org/msg63359.html .

There is a slight modification for common code to move from a pull to a
push-based approach on s390.
On s390 we don't want to wait until we leave the guest state to queue the
notification interrupts.

To use this feature the controlling userspace has to enable the capability.
With that knob we can later on disable this feature for live migration.

v3 -> v4
 - Change "done" interrupts from local to floating
 - Add a comment for clarification
 - Change KVM_HVA_ERR_BAD to move the s390 implementation to the s390 backend

v2 -> v3
 - Reworked the architecture specific parts, to only provide one additional
   implementation
 - Renamed function to kvm_async_page_present_(sync|async)
 - Fixing KVM_HVA_ERR_BAD handling

v1 -> v2:
 - Adding other architecture backends
 - Adding documentation for the ioctl
 - Improving the overall error handling
 - Reducing the needed modifications on the common code

Dominik Dingel (4):
  PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  PF: Make KVM_HVA_ERR_BAD usable on s390
  PF: Provide additional direct page notification
  PF: Async page fault support on s390

 Documentation/s390/kvm.txt|  24 
 arch/s390/include/asm/kvm_host.h  |  30 ++
 arch/s390/include/asm/pgtable.h   |   2 +
 arch/s390/include/asm/processor.h |   1 +
 arch/s390/include/uapi/asm/kvm.h  |  10 
 arch/s390/kvm/Kconfig |   2 +
 arch/s390/kvm/Makefile|   2 +-
 arch/s390/kvm/diag.c  |  63 
 arch/s390/kvm/interrupt.c |  43 +++---
 arch/s390/kvm/kvm-s390.c  | 118 ++
 arch/s390/kvm/kvm-s390.h  |   4 ++
 arch/s390/kvm/sigp.c  |   6 ++
 arch/s390/mm/fault.c  |  26 +++--
 arch/x86/kvm/mmu.c|   2 +-
 include/linux/kvm_host.h  |  10 +++-
 include/uapi/linux/kvm.h  |   2 +
 virt/kvm/Kconfig  |   4 ++
 virt/kvm/async_pf.c   |  22 ++-
 18 files changed, 354 insertions(+), 17 deletions(-)

-- 
1.8.2.2



[PATCH 1/4] PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault

2013-07-10 Thread Dominik Dingel
In case of a fault retry, exit sie64() with the gmap_fault indication set for
the running thread. This makes it possible to handle async page faults
without the need for mm notifiers.

Based on a patch from Martin Schwidefsky.

Signed-off-by: Dominik Dingel 
Acked-by: Christian Borntraeger 
---
 arch/s390/include/asm/pgtable.h   |  2 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kvm/kvm-s390.c  | 13 +
 arch/s390/mm/fault.c  | 26 ++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0ea4e59..4a4cc64 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -740,6 +740,7 @@ static inline void pgste_set_pte(pte_t *ptep, pte_t entry)
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @crst_list: list of all crst tables used in the guest address space
+ * @pfault_enabled: defines if pfaults are applicable for the guest
  */
 struct gmap {
struct list_head list;
@@ -748,6 +749,7 @@ struct gmap {
unsigned long asce;
void *private;
struct list_head crst_list;
+   unsigned long pfault_enabled;
 };
 
 /**
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6b49987..4fa96ca 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -77,6 +77,7 @@ struct thread_struct {
 unsigned long ksp;  /* kernel stack pointer */
mm_segment_t mm_segment;
unsigned long gmap_addr;/* address of last gmap fault. */
+   unsigned int gmap_pfault;   /* signal of a pending guest pfault */
struct per_regs per_user;   /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags;/* Flags to control debug behavior */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba694d2..702daca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -682,6 +682,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static void kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
+{
+   hva_t fault = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+   struct mm_struct *mm = current->mm;
+   down_read(&mm->mmap_sem);
+   get_user_pages(current, mm, fault, 1, 1, 0, NULL, NULL);
+   up_read(&mm->mmap_sem);
+}
+
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
int rc;
@@ -715,6 +724,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (rc < 0) {
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = SIE_INTERCEPT_UCONTROL;
+   } else if (current->thread.gmap_pfault) {
+   kvm_arch_fault_in_sync(vcpu);
+   current->thread.gmap_pfault = 0;
+   rc = 0;
} else {
VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
trace_kvm_s390_sie_fault(vcpu);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 047c3e4..7d4c4b1 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -50,6 +50,7 @@
 #define VM_FAULT_BADMAP0x02
 #define VM_FAULT_BADACCESS 0x04
 #define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
 
 static unsigned long store_indication __read_mostly;
 
@@ -232,6 +233,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
return;
}
case VM_FAULT_BADCONTEXT:
+   case VM_FAULT_PFAULT:
do_no_context(regs);
break;
case VM_FAULT_SIGNAL:
@@ -269,6 +271,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
  */
 static inline int do_exception(struct pt_regs *regs, int access)
 {
+#ifdef CONFIG_PGSTE
+   struct gmap *gmap;
+#endif
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
@@ -307,9 +312,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
down_read(&mm->mmap_sem);
 
 #ifdef CONFIG_PGSTE
-   if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
-   address = __gmap_fault(address,
-(struct gmap *) S390_lowcore.gmap);
+   gmap = (struct gmap *)
+   ((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
+   if (gmap) {
+   address = __gmap_fault(address, gmap);
if (address == -EFAULT) {
fault = VM_FAULT_BADMAP;
goto out_up;
@@ -318,6 +324,8 @@ static inline int do_excep
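
Putting the new pieces together (this is an assumed illustration, not text from the
patch): with pfault enabled on the gmap, do_exception() would request a non-blocking
retry and report a pending pfault back to the sie loop instead of sleeping in
handle_mm_fault(), roughly:

	/* assumed illustration of how the new pieces fit together */
	if (gmap && gmap->pfault_enabled)
		flags |= FAULT_FLAG_RETRY_NOWAIT;

	fault = handle_mm_fault(mm, vma, address, flags);

	if ((fault & VM_FAULT_RETRY) && (flags & FAULT_FLAG_RETRY_NOWAIT)) {
		/* let __vcpu_run() see a pending pfault instead of blocking here */
		current->thread.gmap_pfault = 1;
		fault = VM_FAULT_PFAULT;
		goto out_up;
	}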

[PATCH 2/4] PF: Make KVM_HVA_ERR_BAD usable on s390

2013-07-10 Thread Dominik Dingel
Current common code uses PAGE_OFFSET to indicate a bad host virtual address.
As this check won't work on architectures that don't map kernel and user memory
into the same address space (e.g. s390), such architectures can now provide
their own KVM_HVA_ERR_BAD defines.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/kvm_host.h | 8 
 include/linux/kvm_host.h | 8 
 2 files changed, 16 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3238d40..cd30c3d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -274,6 +274,14 @@ struct kvm_arch{
int css_support;
 };
 
+#define KVM_HVA_ERR_BAD(-1UL)
+#define KVM_HVA_ERR_RO_BAD (-1UL)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr == KVM_HVA_ERR_BAD;
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a63d83e..92e8f64 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -85,6 +85,12 @@ static inline bool is_noslot_pfn(pfn_t pfn)
return pfn == KVM_PFN_NOSLOT;
 }
 
+/*
+ * architectures with KVM_HVA_ERR_BAD other than PAGE_OFFSET (e.g. s390)
+ * provide own defines and kvm_is_error_hva
+ */
+#ifndef KVM_HVA_ERR_BAD
+
 #define KVM_HVA_ERR_BAD(PAGE_OFFSET)
 #define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
 
@@ -93,6 +99,8 @@ static inline bool kvm_is_error_hva(unsigned long addr)
return addr >= PAGE_OFFSET;
 }
 
+#endif
+
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
 
 static inline bool is_error_page(struct page *page)
-- 
1.8.2.2



[PATCH 1/4] PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault

2013-07-09 Thread Dominik Dingel
In case of a fault retry, exit sie64() with the gmap_fault indication set for
the running thread. This makes it possible to handle async page faults
without the need for mm notifiers.

Based on a patch from Martin Schwidefsky.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/pgtable.h   |  2 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kvm/kvm-s390.c  | 13 +
 arch/s390/mm/fault.c  | 26 ++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0ea4e59..4a4cc64 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -740,6 +740,7 @@ static inline void pgste_set_pte(pte_t *ptep, pte_t entry)
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @crst_list: list of all crst tables used in the guest address space
+ * @pfault_enabled: defines if pfaults are applicable for the guest
  */
 struct gmap {
struct list_head list;
@@ -748,6 +749,7 @@ struct gmap {
unsigned long asce;
void *private;
struct list_head crst_list;
+   unsigned long pfault_enabled;
 };
 
 /**
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6b49987..4fa96ca 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -77,6 +77,7 @@ struct thread_struct {
 unsigned long ksp;  /* kernel stack pointer */
mm_segment_t mm_segment;
unsigned long gmap_addr;/* address of last gmap fault. */
+   unsigned int gmap_pfault;   /* signal of a pending guest pfault */
struct per_regs per_user;   /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags;/* Flags to control debug behavior */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba694d2..702daca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -682,6 +682,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static void kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
+{
+   hva_t fault = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+   struct mm_struct *mm = current->mm;
+   down_read(&mm->mmap_sem);
+   get_user_pages(current, mm, fault, 1, 1, 0, NULL, NULL);
+   up_read(&mm->mmap_sem);
+}
+
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
int rc;
@@ -715,6 +724,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (rc < 0) {
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = SIE_INTERCEPT_UCONTROL;
+   } else if (current->thread.gmap_pfault) {
+   kvm_arch_fault_in_sync(vcpu);
+   current->thread.gmap_pfault = 0;
+   rc = 0;
} else {
VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
trace_kvm_s390_sie_fault(vcpu);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 047c3e4..7d4c4b1 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -50,6 +50,7 @@
 #define VM_FAULT_BADMAP0x02
 #define VM_FAULT_BADACCESS 0x04
 #define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
 
 static unsigned long store_indication __read_mostly;
 
@@ -232,6 +233,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
return;
}
case VM_FAULT_BADCONTEXT:
+   case VM_FAULT_PFAULT:
do_no_context(regs);
break;
case VM_FAULT_SIGNAL:
@@ -269,6 +271,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
  */
 static inline int do_exception(struct pt_regs *regs, int access)
 {
+#ifdef CONFIG_PGSTE
+   struct gmap *gmap;
+#endif
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
@@ -307,9 +312,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
down_read(&mm->mmap_sem);
 
 #ifdef CONFIG_PGSTE
-   if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
-   address = __gmap_fault(address,
-(struct gmap *) S390_lowcore.gmap);
+   gmap = (struct gmap *)
+   ((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
+   if (gmap) {
+   address = __gmap_fault(address, gmap);
if (address == -EFAULT) {
fault = VM_FAULT_BADMAP;
goto out_up;
@@ -318,6 +324,8 @@ static inline int do_exception(struct pt_regs *regs, int access)

[PATCH 3/4] PF: Provide additional direct page notification

2013-07-09 Thread Dominik Dingel
By setting a Kconfig option, the architecture can control when
guest notifications will be presented by the apf backend.
There is the default batch mechanism, working as before, where the vcpu
thread pulls in this information. In addition there is now a direct
mechanism that pushes the information to the guest right away.

Still the vcpu thread should call check_completion to clean up leftovers,
which leaves most of the common code untouched.

Signed-off-by: Dominik Dingel 
---
 arch/x86/kvm/mmu.c   |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/Kconfig |  4 
 virt/kvm/async_pf.c  | 22 +++---
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0d094da..b8632e9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3343,7 +3343,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
-   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+   return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
 }
 
 static bool can_do_async_pf(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3c04e7..465c639 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -197,7 +197,7 @@ struct kvm_async_pf {
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 779262f..0774495 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,10 @@ config KVM_MMIO
 config KVM_ASYNC_PF
bool
 
+# Toggle to switch between direct notification and batch job
+config KVM_ASYNC_PF_SYNC
+   bool
+
 config HAVE_KVM_MSI
bool
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index ea475cd..cfa9366 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -28,6 +28,21 @@
 #include "async_pf.h"
 #include 
 
+static inline void kvm_async_page_present_sync(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work)
+{
+#ifdef CONFIG_KVM_ASYNC_PF_SYNC
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+static inline void kvm_async_page_present_async(struct kvm_vcpu *vcpu,
+   struct kvm_async_pf *work)
+{
+#ifndef CONFIG_KVM_ASYNC_PF_SYNC
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+
 static struct kmem_cache *async_pf_cache;
 
 int kvm_async_pf_init(void)
@@ -70,6 +85,7 @@ static void async_pf_execute(struct work_struct *work)
down_read(&mm->mmap_sem);
get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
up_read(&mm->mmap_sem);
+   kvm_async_page_present_sync(vcpu, apf);
unuse_mm(mm);
 
spin_lock(&vcpu->async_pf.lock);
@@ -134,7 +150,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
if (work->page)
kvm_arch_async_page_ready(vcpu, work);
-   kvm_arch_async_page_present(vcpu, work);
+   kvm_async_page_present_async(vcpu, work);
 
list_del(&work->queue);
vcpu->async_pf.queued--;
@@ -144,7 +160,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
}
 }
 
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch)
 {
struct kvm_async_pf *work;
@@ -166,7 +182,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
work->done = false;
work->vcpu = vcpu;
work->gva = gva;
-   work->addr = gfn_to_hva(vcpu->kvm, gfn);
+   work->addr = hva;
work->arch = *arch;
work->mm = current->mm;
atomic_inc(&work->mm->mm_count);
-- 
1.8.2.2



[PATCH 2/4] PF: Make KVM_HVA_ERR_BAD usable on s390

2013-07-09 Thread Dominik Dingel
Current common code uses PAGE_OFFSET to indicate a bad host virtual address.
As this check won't work on architectures that don't map kernel and user memory
into the same address space (e.g. s390), an additional implementation is made
available in the case that PAGE_OFFSET == 0.

Signed-off-by: Dominik Dingel 
---
 include/linux/kvm_host.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a63d83e..f3c04e7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -85,6 +85,18 @@ static inline bool is_noslot_pfn(pfn_t pfn)
return pfn == KVM_PFN_NOSLOT;
 }
 
+#if (PAGE_OFFSET == 0)
+
+#define KVM_HVA_ERR_BAD(-1UL)
+#define KVM_HVA_ERR_RO_BAD (-1UL)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr == KVM_HVA_ERR_BAD;
+}
+
+#else
+
 #define KVM_HVA_ERR_BAD(PAGE_OFFSET)
 #define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
 
@@ -93,6 +105,8 @@ static inline bool kvm_is_error_hva(unsigned long addr)
return addr >= PAGE_OFFSET;
 }
 
+#endif
+
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
 
 static inline bool is_error_page(struct page *page)
-- 
1.8.2.2



[PATCH v3 0/4] Enable async page faults on s390

2013-07-09 Thread Dominik Dingel
Gleb, Paolo, 

based on the work from Martin and Carsten, this implementation enables async
page faults.
To the guest it will provide the pfault interface, but internally it uses the
async page fault common code.

The initial submission and its discussion can be followed on
http://www.mail-archive.com/kvm@vger.kernel.org/msg63359.html .

There is a slight modification for common code to move from a pull to a
push-based approach on s390.
On s390 we don't want to wait until we leave the guest state to queue the
notification interrupts.

To use this feature the controlling userspace has to enable the capability.
With that knob we can later on disable this feature for live migration.

v2 -> v3
 - Reworked the architecture specific parts, to only provide one additional
   implementation
 - Renamed function to kvm_async_page_present_(sync|async)
 - Fixing KVM_HVA_ERR_BAD handling

v1 -> v2:
 - Adding other architecture backends
 - Adding documentation for the ioctl
 - Improving the overall error handling
 - Reducing the needed modifications on the common code


Dominik Dingel (4):
  PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  PF: Make KVM_HVA_ERR_BAD usable on s390
  PF: Provide additional direct page notification
  PF: Async page fault support on s390

 Documentation/s390/kvm.txt|  24 +
 arch/s390/include/asm/kvm_host.h  |  22 
 arch/s390/include/asm/pgtable.h   |   2 +
 arch/s390/include/asm/processor.h |   1 +
 arch/s390/include/uapi/asm/kvm.h  |  10 
 arch/s390/kvm/Kconfig |   2 +
 arch/s390/kvm/Makefile|   2 +-
 arch/s390/kvm/diag.c  |  57 
 arch/s390/kvm/interrupt.c |  38 ++---
 arch/s390/kvm/kvm-s390.c  | 111 ++
 arch/s390/kvm/kvm-s390.h  |   4 ++
 arch/s390/kvm/sigp.c  |   2 +
 arch/s390/mm/fault.c  |  26 +++--
 arch/x86/kvm/mmu.c|   2 +-
 include/linux/kvm_host.h  |  16 +-
 include/uapi/linux/kvm.h  |   2 +
 virt/kvm/Kconfig  |   4 ++
 virt/kvm/async_pf.c   |  22 ++--
 18 files changed, 330 insertions(+), 17 deletions(-)

-- 
1.8.2.2



[PATCH 4/4] PF: Async page fault support on s390

2013-07-09 Thread Dominik Dingel
This patch enables async page faults for s390 kvm guests.
It provides the userspace API to enable, disable or get the status of this
feature. Also it includes the diagnose code, called by the guest to enable
async page faults.

The async page faults will use an already existing guest interface for this
purpose, as described in "CP Programming Services (SC24-6084)".

Signed-off-by: Dominik Dingel 
---
 Documentation/s390/kvm.txt   |  24 ++
 arch/s390/include/asm/kvm_host.h |  22 +
 arch/s390/include/uapi/asm/kvm.h |  10 
 arch/s390/kvm/Kconfig|   2 +
 arch/s390/kvm/Makefile   |   2 +-
 arch/s390/kvm/diag.c |  57 ++
 arch/s390/kvm/interrupt.c|  38 ---
 arch/s390/kvm/kvm-s390.c | 100 ++-
 arch/s390/kvm/kvm-s390.h |   4 ++
 arch/s390/kvm/sigp.c |   2 +
 include/uapi/linux/kvm.h |   2 +
 11 files changed, 254 insertions(+), 9 deletions(-)

diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
index 85f3280..707b7e9 100644
--- a/Documentation/s390/kvm.txt
+++ b/Documentation/s390/kvm.txt
@@ -70,6 +70,30 @@ floating interrupts are:
 KVM_S390_INT_VIRTIO
 KVM_S390_INT_SERVICE
 
+ioctl:  KVM_S390_APF_ENABLE:
+args:   none
+This ioctl is used to enable the async page fault interface. So in a
+host page fault case the host can now submit pfault tokens to the guest.
+
+ioctl:  KVM_S390_APF_DISABLE:
+args:   none
+This ioctl is used to disable the async page fault interface. From this point
+on no new pfault tokens will be issued to the guest. Already existing async
+page faults are not covered by this and will be normally handled.
+
+ioctl:  KVM_S390_APF_STATUS:
+args:   none
+This ioctl allows the userspace to get the current status of the APF feature.
+The main purpose for this, is to ensure that no pfault tokens will be lost
+during live migration or similar management operations.
+The possible return values are:
+KVM_S390_APF_DISABLED_NON_PENDING
+KVM_S390_APF_DISABLED_PENDING
+KVM_S390_APF_ENABLED_NON_PENDING
+KVM_S390_APF_ENABLED_PENDING
+Caution: if KVM_S390_APF is enabled the PENDING status could be already changed
+as soon as the ioctl returns to userspace.
+
 3. ioctl calls to the kvm-vcpu file descriptor
 KVM does support the following ioctls on s390 that are common with other
 architectures and do behave the same:
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3238d40..ae75104 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -257,6 +257,10 @@ struct kvm_vcpu_arch {
u64 stidp_data;
};
struct gmap *gmap;
+#define KVM_S390_PFAULT_TOKEN_INVALID  (-1UL)
+   unsigned long pfault_token;
+   unsigned long pfault_select;
+   unsigned long pfault_compare;
 };
 
 struct kvm_vm_stat {
@@ -274,6 +278,24 @@ struct kvm_arch{
int css_support;
 };
 
+#define ASYNC_PF_PER_VCPU  64
+struct kvm_vcpu;
+struct kvm_async_pf;
+struct kvm_arch_async_pf {
+   unsigned long pfault_token;
+};
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d25da59..b6c83e0 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -57,4 +57,14 @@ struct kvm_sync_regs {
 #define KVM_REG_S390_EPOCHDIFF (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
 #define KVM_REG_S390_CPU_TIMER  (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
 #define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
+
+/* ioctls used for setting/getting status of APF on s390x */
+#define KVM_S390_APF_ENABLE1
+#define KVM_S390_APF_DISABLE   2
+#define KVM_S390_APF_STATUS3
+#define KVM_S390_APF_DISABLED_NON_PENDING  0
+#define KVM_S390_APF_DISABLED_PENDING  1
+#define KVM_S390_APF_ENABLED_NON_PENDING   2
+#define KVM_S390_APF_ENABLED_PENDING   3
+
 #endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 70b46ea..4993eed 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -23,6 +23,8 @@ config KVM
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_EVENTFD
+   select KVM_ASYNC_PF
+   select KVM_ASYNC_PF_DIRECT
---help---
  Support hosting paravirtualized guest machines using the SIE
  virtualization capabil

[PATCH 3/4] PF: Provide additional direct page notification

2013-07-05 Thread Dominik Dingel
By setting a Kconfig option, the architecture can control when
guest notifications will be presented by the apf backend.
There is the default batch mechanism, working as before, where the vcpu
thread pulls in this information. In addition there is now a direct
mechanism that pushes the information to the guest right away.

Still the vcpu thread should call check_completion to clean up leftovers,
which leaves most of the common code untouched.

Signed-off-by: Dominik Dingel 
---
 arch/x86/kvm/mmu.c   |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/Kconfig |  4 
 virt/kvm/async_pf.c  | 22 +++---
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0d094da..b8632e9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3343,7 +3343,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
-   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+   return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
 }
 
 static bool can_do_async_pf(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 210f493..969d575 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -175,7 +175,7 @@ struct kvm_async_pf {
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 779262f..715e6b5 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,10 @@ config KVM_MMIO
 config KVM_ASYNC_PF
bool
 
+# Toggle to switch between direct notification and batch job
+config KVM_ASYNC_PF_DIRECT
+   bool
+
 config HAVE_KVM_MSI
bool
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index ea475cd..b8df37a 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -28,6 +28,21 @@
 #include "async_pf.h"
 #include 
 
+static inline void kvm_async_page_direct_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work)
+{
+#ifdef CONFIG_KVM_ASYNC_PF_DIRECT
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+static inline void kvm_async_page_batch_present(struct kvm_vcpu *vcpu,
+   struct kvm_async_pf *work)
+{
+#ifndef CONFIG_KVM_ASYNC_PF_DIRECT
+   kvm_arch_async_page_present(vcpu, work);
+#endif
+}
+
 static struct kmem_cache *async_pf_cache;
 
 int kvm_async_pf_init(void)
@@ -70,6 +85,7 @@ static void async_pf_execute(struct work_struct *work)
down_read(&mm->mmap_sem);
get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
up_read(&mm->mmap_sem);
+   kvm_async_page_direct_present(vcpu, apf);
unuse_mm(mm);
 
spin_lock(&vcpu->async_pf.lock);
@@ -134,7 +150,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
if (work->page)
kvm_arch_async_page_ready(vcpu, work);
-   kvm_arch_async_page_present(vcpu, work);
+   kvm_async_page_batch_present(vcpu, work);
 
list_del(&work->queue);
vcpu->async_pf.queued--;
@@ -144,7 +160,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
}
 }
 
-int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
   struct kvm_arch_async_pf *arch)
 {
struct kvm_async_pf *work;
@@ -166,7 +182,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
work->done = false;
work->vcpu = vcpu;
work->gva = gva;
-   work->addr = gfn_to_hva(vcpu->kvm, gfn);
+   work->addr = hva;
work->arch = *arch;
work->mm = current->mm;
atomic_inc(&work->mm->mm_count);
-- 
1.8.2.2



[PATCH 1/4] PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault

2013-07-05 Thread Dominik Dingel
In case of a fault retry, exit sie64() with the gmap_fault indication set for
the running thread. This makes it possible to handle async page faults
without the need for mm notifiers.

Based on a patch from Martin Schwidefsky.

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/pgtable.h   |  2 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kvm/kvm-s390.c  | 13 +
 arch/s390/mm/fault.c  | 26 ++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0ea4e59..4a4cc64 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -740,6 +740,7 @@ static inline void pgste_set_pte(pte_t *ptep, pte_t entry)
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @crst_list: list of all crst tables used in the guest address space
+ * @pfault_enabled: defines if pfaults are applicable for the guest
  */
 struct gmap {
struct list_head list;
@@ -748,6 +749,7 @@ struct gmap {
unsigned long asce;
void *private;
struct list_head crst_list;
+   unsigned long pfault_enabled;
 };
 
 /**
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6b49987..4fa96ca 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -77,6 +77,7 @@ struct thread_struct {
 unsigned long ksp;  /* kernel stack pointer */
mm_segment_t mm_segment;
unsigned long gmap_addr;/* address of last gmap fault. */
+   unsigned int gmap_pfault;   /* signal of a pending guest pfault */
struct per_regs per_user;   /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags;/* Flags to control debug behavior */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba694d2..702daca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -682,6 +682,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static void kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
+{
+   hva_t fault = gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+   struct mm_struct *mm = current->mm;
+   down_read(&mm->mmap_sem);
+   get_user_pages(current, mm, fault, 1, 1, 0, NULL, NULL);
+   up_read(&mm->mmap_sem);
+}
+
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
int rc;
@@ -715,6 +724,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (rc < 0) {
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = SIE_INTERCEPT_UCONTROL;
+   } else if (current->thread.gmap_pfault) {
+   kvm_arch_fault_in_sync(vcpu);
+   current->thread.gmap_pfault = 0;
+   rc = 0;
} else {
VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
trace_kvm_s390_sie_fault(vcpu);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 047c3e4..7d4c4b1 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -50,6 +50,7 @@
 #define VM_FAULT_BADMAP0x02
 #define VM_FAULT_BADACCESS 0x04
 #define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
 
 static unsigned long store_indication __read_mostly;
 
@@ -232,6 +233,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
return;
}
case VM_FAULT_BADCONTEXT:
+   case VM_FAULT_PFAULT:
do_no_context(regs);
break;
case VM_FAULT_SIGNAL:
@@ -269,6 +271,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
  */
 static inline int do_exception(struct pt_regs *regs, int access)
 {
+#ifdef CONFIG_PGSTE
+   struct gmap *gmap;
+#endif
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
@@ -307,9 +312,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
down_read(&mm->mmap_sem);
 
 #ifdef CONFIG_PGSTE
-   if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
-   address = __gmap_fault(address,
-(struct gmap *) S390_lowcore.gmap);
+   gmap = (struct gmap *)
+   ((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
+   if (gmap) {
+   address = __gmap_fault(address, gmap);
if (address == -EFAULT) {
fault = VM_FAULT_BADMAP;
goto out_up;
@@ -318,6 +324,8 @@ static inline int do_exception(struct pt_regs *regs, int access)

[PATCH 4/4] PF: Async page fault support on s390

2013-07-05 Thread Dominik Dingel
This patch enables async page faults for s390 kvm guests.
It provides the userspace API to enable, disable or get the status of this
feature. Also it includes the diagnose code, called by the guest to enable
async page faults.

The async page faults will use an already existing guest interface for this
purpose, as described in "CP Programming Services (SC24-6084)".

Signed-off-by: Dominik Dingel 
---
 Documentation/s390/kvm.txt   |  24 ++
 arch/s390/include/asm/kvm_host.h |  22 +
 arch/s390/include/uapi/asm/kvm.h |  10 
 arch/s390/kvm/Kconfig|   2 +
 arch/s390/kvm/Makefile   |   2 +-
 arch/s390/kvm/diag.c |  57 ++
 arch/s390/kvm/interrupt.c|  38 ---
 arch/s390/kvm/kvm-s390.c | 100 ++-
 arch/s390/kvm/kvm-s390.h |   4 ++
 arch/s390/kvm/sigp.c |   2 +
 include/uapi/linux/kvm.h |   2 +
 11 files changed, 254 insertions(+), 9 deletions(-)

diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
index 85f3280..707b7e9 100644
--- a/Documentation/s390/kvm.txt
+++ b/Documentation/s390/kvm.txt
@@ -70,6 +70,30 @@ floating interrupts are:
 KVM_S390_INT_VIRTIO
 KVM_S390_INT_SERVICE
 
+ioctl:  KVM_S390_APF_ENABLE:
+args:   none
+This ioctl is used to enable the async page fault interface. So in a
+host page fault case the host can now submit pfault tokens to the guest.
+
+ioctl:  KVM_S390_APF_DISABLE:
+args:   none
+This ioctl is used to disable the async page fault interface. From this point
+on no new pfault tokens will be issued to the guest. Already existing async
+page faults are not covered by this and will be normally handled.
+
+ioctl:  KVM_S390_APF_STATUS:
+args:   none
+This ioctl allows the userspace to get the current status of the APF feature.
+The main purpose for this, is to ensure that no pfault tokens will be lost
+during live migration or similar management operations.
+The possible return values are:
+KVM_S390_APF_DISABLED_NON_PENDING
+KVM_S390_APF_DISABLED_PENDING
+KVM_S390_APF_ENABLED_NON_PENDING
+KVM_S390_APF_ENABLED_PENDING
+Caution: if KVM_S390_APF is enabled the PENDING status could be already changed
+as soon as the ioctl returns to userspace.
+
 3. ioctl calls to the kvm-vcpu file descriptor
 KVM does support the following ioctls on s390 that are common with other
 architectures and do behave the same:
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 152..ed57362 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -257,6 +257,10 @@ struct kvm_vcpu_arch {
u64 stidp_data;
};
struct gmap *gmap;
+#define KVM_S390_PFAULT_TOKEN_INVALID  (-1UL)
+   unsigned long pfault_token;
+   unsigned long pfault_select;
+   unsigned long pfault_compare;
 };
 
 struct kvm_vm_stat {
@@ -277,6 +281,24 @@ struct kvm_arch{
 #define KVM_HVA_ERR_BAD(-1UL)
 #define KVM_HVA_ERR_RO_BAD (-1UL)
 
+#define ASYNC_PF_PER_VCPU  64
+struct kvm_vcpu;
+struct kvm_async_pf;
+struct kvm_arch_async_pf {
+   unsigned long pfault_token;
+};
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
 static inline bool kvm_is_error_hva(unsigned long addr)
 {
/*
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d25da59..b6c83e0 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -57,4 +57,14 @@ struct kvm_sync_regs {
 #define KVM_REG_S390_EPOCHDIFF (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
 #define KVM_REG_S390_CPU_TIMER  (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
 #define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
+
+/* ioctls used for setting/getting status of APF on s390x */
+#define KVM_S390_APF_ENABLE1
+#define KVM_S390_APF_DISABLE   2
+#define KVM_S390_APF_STATUS3
+#define KVM_S390_APF_DISABLED_NON_PENDING  0
+#define KVM_S390_APF_DISABLED_PENDING  1
+#define KVM_S390_APF_ENABLED_NON_PENDING   2
+#define KVM_S390_APF_ENABLED_PENDING   3
+
 #endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 70b46ea..4993eed 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -23,6 +23,8 @@ config KVM
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_EVENTFD
+   select KVM_ASYNC_PF
+   select KVM_ASYNC_PF_DIRECT
---help---
  Support hosting paravirtualized

[PATCH 2/4] PF: Move architecture specifics to the backends

2013-07-05 Thread Dominik Dingel
Current common code uses PAGE_OFFSET to indicate a bad host virtual address.
As this check won't work on architectures that don't map kernel and user memory
into the same address space (e.g. s390), it is moved into architecture-specific
code.

Signed-off-by: Dominik Dingel 
---
 arch/arm/include/asm/kvm_host.h |  8 
 arch/ia64/include/asm/kvm_host.h|  3 +++
 arch/mips/include/asm/kvm_host.h|  6 ++
 arch/powerpc/include/asm/kvm_host.h |  8 
 arch/s390/include/asm/kvm_host.h| 12 
 arch/x86/include/asm/kvm_host.h |  8 
 include/linux/kvm_host.h|  8 
 7 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7d22517..557c2a1 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -74,6 +74,14 @@ struct kvm_arch {
struct vgic_distvgic;
 };
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr >= PAGE_OFFSET;
+}
+
 #define KVM_NR_MEM_OBJS 40
 
 /*
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 989dd3f..d3afa6f 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -486,6 +486,9 @@ struct kvm_arch {
unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
 };
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
+
 union cpuid3_t {
u64 value;
struct {
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 4d6fa0b..3a0a3f7 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -34,7 +34,13 @@
 #define KVM_NR_PAGE_SIZES  1
 #define KVM_PAGES_PER_HPAGE(x) 1
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
 
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr >= PAGE_OFFSET;
+}
 
 /* Special address that contains the comm page, used for reducing # of traps */
 #define KVM_GUEST_COMMPAGE_ADDR 0x0
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..be5d7f4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -273,6 +273,14 @@ struct kvm_arch {
 #endif
 };
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr >= PAGE_OFFSET;
+}
+
 /*
  * Struct for a virtual core.
  * Note: entry_exit_count combines an entry count in the bottom 8 bits
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3238d40..152 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -274,6 +274,18 @@ struct kvm_arch{
int css_support;
 };
 
+#define KVM_HVA_ERR_BAD(-1UL)
+#define KVM_HVA_ERR_RO_BAD (-1UL)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   /*
+* on s390, this check is not needed as kernel and user memory
+* is not mapped into the same address space
+*/
+   return false;
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f87f7fc..07e8570 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -96,6 +96,14 @@
 
 #define ASYNC_PF_PER_VCPU 64
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr >= PAGE_OFFSET;
+}
+
 struct kvm_vcpu;
 struct kvm;
 struct kvm_async_pf;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a63d83e..210f493 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -85,14 +85,6 @@ static inline bool is_noslot_pfn(pfn_t pfn)
return pfn == KVM_PFN_NOSLOT;
 }
 
-#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
-#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
-
-static inline bool kvm_is_error_hva(unsigned long addr)
-{
-   return addr >= PAGE_OFFSET;
-}
-
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
 
 static inline bool is_error_page(struct page *page)
-- 
1.8.2.2



[RFC PATCH v2 0/4] Enable async page faults on s390

2013-07-05 Thread Dominik Dingel
Gleb, Paolo, 

based on the work from Martin and Carsten, this implementation enables async
page faults.
To the guest it will provide the pfault interface, but internally it uses the
async page fault common code.

The initial submission and its discussion can be followed on
http://www.mail-archive.com/kvm@vger.kernel.org/msg63359.html .

There is a slight modification for common code to move from a pull to a
push-based approach on s390.
On s390 we don't want to wait until we leave the guest state to queue the
notification interrupts.

To use this feature the controlling userspace has to enable the capability.
With that knob we can later on disable this feature for live migration.

v1 -> v2:
 - Adding other architecture backends
 - Adding documentation for the ioctl
 - Improving the overall error handling
 - Reducing the needed modifications on the common code

Dominik Dingel (4):
  PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  PF: Move architecture specifics to the backends
  PF: Provide additional direct page notification
  PF: Async page fault support on s390

 Documentation/s390/kvm.txt  |  24 
 arch/arm/include/asm/kvm_host.h |   8 +++
 arch/ia64/include/asm/kvm_host.h|   3 +
 arch/mips/include/asm/kvm_host.h|   6 ++
 arch/powerpc/include/asm/kvm_host.h |   8 +++
 arch/s390/include/asm/kvm_host.h|  34 +++
 arch/s390/include/asm/pgtable.h |   2 +
 arch/s390/include/asm/processor.h   |   1 +
 arch/s390/include/uapi/asm/kvm.h|  10 
 arch/s390/kvm/Kconfig   |   2 +
 arch/s390/kvm/Makefile  |   2 +-
 arch/s390/kvm/diag.c|  57 ++
 arch/s390/kvm/interrupt.c   |  38 +---
 arch/s390/kvm/kvm-s390.c| 111 
 arch/s390/kvm/kvm-s390.h|   4 ++
 arch/s390/kvm/sigp.c|   2 +
 arch/s390/mm/fault.c|  26 +++--
 arch/x86/include/asm/kvm_host.h |   8 +++
 arch/x86/kvm/mmu.c  |   2 +-
 include/linux/kvm_host.h|  10 +---
 include/uapi/linux/kvm.h|   2 +
 virt/kvm/Kconfig|   4 ++
 virt/kvm/async_pf.c |  22 ++-
 23 files changed, 361 insertions(+), 25 deletions(-)

-- 
1.8.2.2



[PATCH 2/4] PF: Move architecture specifics to the backends

2013-06-10 Thread Dominik Dingel
Current common code uses PAGE_OFFSET to indicate a bad host virtual address.
This works for x86 but not necessarily on other architectures, so the check
is moved into architecture-specific code.

Todo:
 - apply to other architectures when applicable
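
For context, a caller-side sketch (illustrative only, the function name is
made up): common code keeps using the helper exactly as before, only the
definition of what counts as an error value becomes architecture specific:

/*
 * Illustrative caller in common KVM code; gfn_to_hva() and
 * kvm_is_error_hva() are existing interfaces, this patch merely moves
 * the definition of the latter into the arch headers.
 */
static int example_read_guest_gfn(struct kvm *kvm, gfn_t gfn)
{
        unsigned long hva = gfn_to_hva(kvm, gfn);

        if (kvm_is_error_hva(hva))
                return -EFAULT;  /* gfn is not backed by a memslot */

        /* ... access guest memory through hva ... */
        return 0;
}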

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/kvm_host.h | 12 
 arch/x86/include/asm/kvm_host.h  |  8 
 include/linux/kvm_host.h |  8 
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index deb1990..e014bba 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -277,6 +277,18 @@ struct kvm_arch{
int css_support;
 };
 
+#define KVM_HVA_ERR_BAD(-1UL)
+#define KVM_HVA_ERR_RO_BAD (-1UL)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   /*
+* on s390, this check is not needed as kernel and user memory
+* is not mapped into the same address space
+*/
+   return false;
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern unsigned long sie_exit_addr;
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4979778..5ed7c83 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -94,6 +94,14 @@
 
 #define ASYNC_PF_PER_VCPU 64
 
+#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
+#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
+
+static inline bool kvm_is_error_hva(unsigned long addr)
+{
+   return addr >= PAGE_OFFSET;
+}
+
 extern raw_spinlock_t kvm_lock;
 extern struct list_head vm_list;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c139582..9bd29ef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -84,14 +84,6 @@ static inline bool is_noslot_pfn(pfn_t pfn)
return pfn == KVM_PFN_NOSLOT;
 }
 
-#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
-#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
-
-static inline bool kvm_is_error_hva(unsigned long addr)
-{
-   return addr >= PAGE_OFFSET;
-}
-
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
 
 static inline bool is_error_page(struct page *page)
-- 
1.8.1.6



[PATCH 4/4] PF: Initial async page fault support on s390x

2013-06-10 Thread Dominik Dingel
This patch adds the handling of async page faults to the s390x code.
It provides the userspace API to enable, disable or query the status of
this feature.
It also includes the diagnose code, called by the guest, to enable or
disable async page faults via the pfault interface.
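
For illustration, a hedged userspace sketch of how this interface could be
driven.  The actual ioctl request number lives in the kvm-s390.c part of the
patch and is not visible in this excerpt, so KVM_S390_APF_CONTROL below is a
placeholder, and targeting the vcpu fd is an assumption; the APF command
values are the ones added to the uapi header by this patch:

#include <sys/ioctl.h>

#define KVM_S390_APF_CONTROL    0       /* placeholder request number */
#define KVM_S390_APF_ENABLE     1
#define KVM_S390_APF_STATUS     3

static int s390_apf_enable_and_query(int vcpu_fd)
{
        if (ioctl(vcpu_fd, KVM_S390_APF_CONTROL, KVM_S390_APF_ENABLE) < 0)
                return -1;

        /* e.g. before live migration: query whether faults are still pending */
        return ioctl(vcpu_fd, KVM_S390_APF_CONTROL, KVM_S390_APF_STATUS);
}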

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/kvm_host.h | 22 ++
 arch/s390/include/uapi/asm/kvm.h | 10 +
 arch/s390/kvm/Kconfig|  1 +
 arch/s390/kvm/Makefile   |  2 +-
 arch/s390/kvm/diag.c | 46 
 arch/s390/kvm/interrupt.c| 40 ++
 arch/s390/kvm/kvm-s390.c | 90 +++-
 arch/s390/kvm/kvm-s390.h |  4 ++
 include/uapi/linux/kvm.h |  2 +
 9 files changed, 207 insertions(+), 10 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index e014bba..18b5492 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -260,6 +260,10 @@ struct kvm_vcpu_arch {
u64 stidp_data;
};
struct gmap *gmap;
+#define KVM_S390_PFAULT_TOKEN_INVALID (-1UL)
+   unsigned long pfault_token;
+   unsigned long pfault_select;
+   unsigned long pfault_compare;
 };
 
 struct kvm_vm_stat {
@@ -280,6 +284,24 @@ struct kvm_arch{
 #define KVM_HVA_ERR_BAD(-1UL)
 #define KVM_HVA_ERR_RO_BAD (-1UL)
 
+#define ASYNC_PF_PER_VCPU  64
+struct kvm_vcpu;
+struct kvm_async_pf;
+struct kvm_arch_async_pf {
+   unsigned long pfault_token;
+};
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work);
+
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+struct kvm_async_pf *work);
+
 static inline bool kvm_is_error_hva(unsigned long addr)
 {
/*
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index d25da59..b995abe 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -57,4 +57,14 @@ struct kvm_sync_regs {
 #define KVM_REG_S390_EPOCHDIFF (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x2)
 #define KVM_REG_S390_CPU_TIMER  (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x3)
 #define KVM_REG_S390_CLOCK_COMP (KVM_REG_S390 | KVM_REG_SIZE_U64 | 0x4)
+
+/* ioctls used by userspace for setting/getting status of APF on s390x */
+#define KVM_S390_APF_ENABLE1
+#define KVM_S390_APF_DISABLE   2
+#define KVM_S390_APF_STATUS3
+#define KVM_S390_APF_DISABLED_NON_PENDING  0
+#define KVM_S390_APF_DISABLED_PENDING  1
+#define KVM_S390_APF_ENABLED_NON_PENDING   2
+#define KVM_S390_APF_ENABLED_PENDING   3
+
 #endif
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 60f9f8a..67f154e 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_CPU_RELAX_INTERCEPT
+   select KVM_ASYNC_PF
---help---
  Support hosting paravirtualized guest machines using the SIE
  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 3975722..5bcea24 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -6,7 +6,7 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o async_pf.o)
 
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 744cd9c..2c56824 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -17,6 +17,7 @@
 #include "kvm-s390.h"
 #include "trace.h"
 #include "trace-s390.h"
+#include "gaccess.h"
 
 static int diag_release_pages(struct kvm_vcpu *vcpu)
 {
@@ -107,6 +108,49 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
return -EREMOTE;
 }
 
+static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
+{
+   struct prs_parm {
+   u16 code;
+   u16 subcode;
+   u16 parm_len;
+   u16 parm_version;
+   u64 token_addr;
+   u64 select_mask;
+   u64 compare_mask;
+   u64 zarch;
+   };
+   struct prs_parm parm;
+   int rc;
+   u16 rx = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
+   u16 ry = (vcpu->arch.sie_block->ipa & 0x0f);
+   if (copy_from_guest_absolute(vcpu, &parm, vcpu->run->s.regs.gprs[rx],
+sizeof(parm)))
+ 

[PATCH 3/4] PF: Additional flag for direct page fault inject

2013-06-10 Thread Dominik Dingel
On some architectures, such as s390x, we may want to directly inject the
notification into the guest once the page has been swapped in. Also, on s390x
there is no need to go from gfn to hva, as by calling gmap_fault we already
have the needed address.

Due to a possible race, we now always have to insert the work item into the
queue before scheduling it. If we are then unable to schedule the async page
fault, we have to remove it from the list again. As this only happens when we
also have to page in synchronously, the overhead does not really matter.
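
Reduced to its core, the resulting pattern in kvm_setup_async_pf() looks like
this (illustrative condensation of the hunks below):

        /* Publish the work item before scheduling it, so the direct
         * injection completion path never sees an item that is not
         * accounted on the queue; if scheduling fails, unpublish it
         * again under the same lock.
         */
        spin_lock(&vcpu->async_pf.lock);
        list_add_tail(&work->queue, &vcpu->async_pf.queue);
        vcpu->async_pf.queued++;
        spin_unlock(&vcpu->async_pf.lock);

        if (!schedule_work(&work->work)) {
                spin_lock(&vcpu->async_pf.lock);
                list_del(&work->queue);
                vcpu->async_pf.queued--;
                spin_unlock(&vcpu->async_pf.lock);
                /* caller falls back to paging in synchronously */
        }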

Signed-off-by: Dominik Dingel 
---
 arch/x86/kvm/mmu.c   |  2 +-
 include/linux/kvm_host.h |  3 ++-
 virt/kvm/async_pf.c  | 33 +++--
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 956ca35..02a49a9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3223,7 +3223,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
 
-   return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
+   return kvm_setup_async_pf(vcpu, gva, gfn, &arch, false);
 }
 
 static bool can_do_async_pf(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9bd29ef..a798deb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -165,12 +165,13 @@ struct kvm_async_pf {
struct kvm_arch_async_pf arch;
struct page *page;
bool done;
+   bool direct_inject;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-  struct kvm_arch_async_pf *arch);
+  struct kvm_arch_async_pf *arch, bool is_direct);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index ea475cd..a4a6483 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -73,9 +73,17 @@ static void async_pf_execute(struct work_struct *work)
unuse_mm(mm);
 
spin_lock(&vcpu->async_pf.lock);
-   list_add_tail(&apf->link, &vcpu->async_pf.done);
apf->page = page;
apf->done = true;
+   if (apf->direct_inject) {
+   kvm_arch_async_page_present(vcpu, apf);
+   list_del(&apf->queue);
+   vcpu->async_pf.queued--;
+   kvm_release_page_clean(apf->page);
+   kmem_cache_free(async_pf_cache, apf);
+   } else {
+   list_add_tail(&apf->link, &vcpu->async_pf.done);
+   }
spin_unlock(&vcpu->async_pf.lock);
 
/*
@@ -145,7 +153,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 }
 
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-  struct kvm_arch_async_pf *arch)
+  struct kvm_arch_async_pf *arch, bool is_direct)
 {
struct kvm_async_pf *work;
 
@@ -165,13 +173,24 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
work->page = NULL;
work->done = false;
work->vcpu = vcpu;
-   work->gva = gva;
-   work->addr = gfn_to_hva(vcpu->kvm, gfn);
+   if (gfn == -1) {
+   work->gva = -1;
+   work->addr = gva;
+   } else {
+   work->gva = gva;
+   work->addr = gfn_to_hva(vcpu->kvm, gfn);
+   }
+   work->direct_inject = is_direct;
work->arch = *arch;
work->mm = current->mm;
atomic_inc(&work->mm->mm_count);
kvm_get_kvm(work->vcpu->kvm);
 
+   spin_lock(&vcpu->async_pf.lock);
+   list_add_tail(&work->queue, &vcpu->async_pf.queue);
+   vcpu->async_pf.queued++;
+   spin_unlock(&vcpu->async_pf.lock);
+
/* this can't really happen otherwise gfn_to_pfn_async
   would succeed */
if (unlikely(kvm_is_error_hva(work->addr)))
@@ -181,11 +200,13 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
if (!schedule_work(&work->work))
goto retry_sync;
 
-   list_add_tail(&work->queue, &vcpu->async_pf.queue);
-   vcpu->async_pf.queued++;
kvm_arch_async_page_not_present(vcpu, work);
return 1;
 retry_sync:
+   spin_lock(&vcpu->async_pf.lock);
+   list_del(&work->queue);
+   vcpu->async_pf.queued--;
+   spin_unlock(&vcpu->async_pf.lock);
kvm_put_kvm(work->vcpu->kvm);
mmdrop(work->mm);
kmem_cache_free(async_pf_cache, work);
-- 
1.8.1.6



[PATCH 1/4] PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault

2013-06-10 Thread Dominik Dingel
In case of a fault retry, exit sie64a() with the gmap_fault indication set.
This makes it possible to handle async page faults without the need for mm 
notifiers.

Based on a patch from Martin Schwidefsky.

Todo:
 - Add access to distinguish fault types to prevent double fault
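
Condensed, the control flow these changes add looks roughly like this
(illustrative summary of the hunks below, not additional code):

/*
 * 1. do_exception() adds FAULT_FLAG_RETRY_NOWAIT to the fault flags while
 *    the vcpu has pfault enabled (PFAULT_EN), so a guest fault that would
 *    have to wait for I/O can return early instead of blocking the host.
 * 2. The fault path records the pending guest fault and sie64a() returns
 *    with PFAULT_PEND set in current->thread.gmap_pfault.
 * 3. __vcpu_run() then resolves the fault synchronously and re-enters
 *    the guest:
 */
        if (rc < 0 && test_bit(PFAULT_PEND, &current->thread.gmap_pfault)) {
                kvm_arch_fault_in_sync(vcpu);   /* page it in via get_user_pages() */
                rc = 0;                         /* and retry guest entry */
        }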

Signed-off-by: Dominik Dingel 
---
 arch/s390/include/asm/processor.h |  7 +++
 arch/s390/kvm/kvm-s390.c  | 15 +++
 arch/s390/mm/fault.c  | 29 +
 arch/s390/mm/pgtable.c|  1 +
 4 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 6b49987..938d92c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -77,6 +77,13 @@ struct thread_struct {
 unsigned long ksp;  /* kernel stack pointer */
mm_segment_t mm_segment;
unsigned long gmap_addr;/* address of last gmap fault. */
+#define PFAULT_EN  1
+#define PFAULT_PEND2
+   unsigned long gmap_pfault;  /*
+* indicator if pfault is enabled for a
+* guest and if a guest pfault is
+* pending
+*/
struct per_regs per_user;   /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags;/* Flags to control debug behavior */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a44c0dc..c2ae2c4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -706,6 +706,17 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static void kvm_arch_fault_in_sync(struct kvm_vcpu *vcpu)
+{
+   hva_t fault_addr;
+   /* TODO let current->thread.gmap_pfault indicate read or write fault */
+   struct mm_struct *mm = current->mm;
+   down_read(&mm->mmap_sem);
+   fault_addr = __gmap_fault(current->thread.gmap_addr, vcpu->arch.gmap);
+   get_user_pages(current, mm, fault_addr, 1, 1, 0, NULL, NULL);
+   up_read(&mm->mmap_sem);
+}
+
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
int rc;
@@ -739,6 +750,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (rc < 0) {
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = SIE_INTERCEPT_UCONTROL;
+   } else if (test_bit(PFAULT_PEND,
+   &current->thread.gmap_pfault)) {
+   kvm_arch_fault_in_sync(vcpu);
+   rc = 0;
} else {
VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
trace_kvm_s390_sie_fault(vcpu);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c5cfb6f..61b1644 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -50,6 +50,7 @@
 #define VM_FAULT_BADMAP0x02
 #define VM_FAULT_BADACCESS 0x04
 #define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
 
 static unsigned long store_indication __read_mostly;
 
@@ -226,6 +227,7 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
return;
}
case VM_FAULT_BADCONTEXT:
+   case VM_FAULT_PFAULT:
do_no_context(regs);
break;
case VM_FAULT_SIGNAL:
@@ -263,6 +265,9 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault)
  */
 static inline int do_exception(struct pt_regs *regs, int access)
 {
+#ifdef CONFIG_PGSTE
+   struct gmap *gmap;
+#endif
struct task_struct *tsk;
struct mm_struct *mm;
struct vm_area_struct *vma;
@@ -301,9 +306,10 @@ static inline int do_exception(struct pt_regs *regs, int access)
down_read(&mm->mmap_sem);
 
 #ifdef CONFIG_PGSTE
-   if ((current->flags & PF_VCPU) && S390_lowcore.gmap) {
-   address = __gmap_fault(address,
-(struct gmap *) S390_lowcore.gmap);
+   gmap = (struct gmap *)
+   ((current->flags & PF_VCPU) ? S390_lowcore.gmap : 0);
+   if (gmap) {
+   address = __gmap_fault(address, gmap);
if (address == -EFAULT) {
fault = VM_FAULT_BADMAP;
goto out_up;
@@ -312,6 +318,8 @@ static inline int do_exception(struct pt_regs *regs, int access)
fault = VM_FAULT_OOM;
goto out_up;
}
+   if (test_bit(PFAULT_EN, &current->thread.gmap_pfault))
+   flags |= FAULT_FLAG_RETRY_NOWAIT;
}
 #endif
 
@@ -368,9 +376,22 @@ retry:
  

[RFC PATCH 0/4] Enable async page faults on s390

2013-06-10 Thread Dominik Dingel
Gleb, Paolo, 

based on the work from Martin and Carsten, this implementation enables async 
page faults.
To the guest it will provide the pfault interface, but internally it uses the
async page fault common code. 

The initial submission and its discussion can be followed at 
http://www.mail-archive.com/kvm@vger.kernel.org/msg63359.html .

There is a slight modification to common code to move from a pull-based to a 
push-based approach on s390. 
On s390 we don't want to wait until we leave guest state to queue the 
notification interrupts.

To use this feature the controlling userspace has to enable the capability.
With that knob we can later disable this feature for live migration.

Dominik Dingel (4):
  PF: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
  PF: Move architecture specifics to the backends
  PF: Additional flag for direct page fault inject
  PF: Initial async page fault support on s390x

 arch/s390/include/asm/kvm_host.h  |  34 +
 arch/s390/include/asm/processor.h |   7 +++
 arch/s390/include/uapi/asm/kvm.h  |  10 
 arch/s390/kvm/Kconfig |   1 +
 arch/s390/kvm/Makefile|   2 +-
 arch/s390/kvm/diag.c  |  46 +
 arch/s390/kvm/interrupt.c |  40 ---
 arch/s390/kvm/kvm-s390.c  | 101 ++
 arch/s390/kvm/kvm-s390.h  |   4 ++
 arch/s390/mm/fault.c  |  29 +--
 arch/s390/mm/pgtable.c|   1 +
 arch/x86/include/asm/kvm_host.h   |   8 +++
 arch/x86/kvm/mmu.c|   2 +-
 include/linux/kvm_host.h  |  11 +
 include/uapi/linux/kvm.h  |   2 +
 virt/kvm/async_pf.c   |  33 ++---
 16 files changed, 303 insertions(+), 28 deletions(-)

-- 
1.8.1.6
