[v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread Tiejun Chen
We should ensure that preemption cannot occur while calling get_paca()
inside hard_irq_disable(), otherwise the paca_struct may be the
wrong one just afterwards. Note that we may also update timing stats in that case.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
 arch/powerpc/kvm/booke.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index dcc94f0..9dae25d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
WARN_ON(local_paca->irq_happened != 0);
 #endif
 
+   preempt_disable();
/*
 * We enter with interrupts disabled in hardware, but
 * we need to call hard_irq_disable anyway to ensure that
@@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
+   preempt_enable();
 
/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);
-- 
1.7.9.5
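The hazard the patch guards against can be modelled with a small self-contained C sketch. All names below are illustrative stand-ins, not the real powerpc implementation: on powerpc, get_paca() reads per-CPU data, so a preemption-driven migration between two uses can hand back two different paca_structs.

```c
#include <assert.h>

/*
 * Toy model of the race: get_paca() returns the per-CPU paca_struct,
 * so a task migrated between two calls may see two different structs.
 * Everything here is a stand-in for illustration only.
 */
struct paca_struct { int cpu_id; };

static struct paca_struct pacas[2] = { { .cpu_id = 0 }, { .cpu_id = 1 } };
static int current_cpu;        /* CPU the task currently runs on */
static int preempt_count;      /* models preempt_disable() nesting */

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* A migration can only happen while preemption is enabled. */
static void maybe_migrate(void)
{
        if (preempt_count == 0)
                current_cpu ^= 1;   /* worst case: migrate at every chance */
}

static struct paca_struct *get_paca(void) { return &pacas[current_cpu]; }

/* Two paca lookups with a potential preemption point in between. */
static int lookups_match(int guarded)
{
        struct paca_struct *a, *b;

        if (guarded)
                preempt_disable();
        a = get_paca();
        maybe_migrate();            /* no-op while preemption is off */
        b = get_paca();
        if (guarded)
                preempt_enable();
        return a == b;
}
```

With the preempt_disable()/preempt_enable() pair the patch adds, both lookups see the same struct; without it the second lookup can land on another CPU's paca.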

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-07-10 Thread tiejun.chen

On 07/10/2013 05:33 AM, Alexander Graf wrote:


On 09.07.2013, at 11:09, tiejun.chen wrote:


On 07/09/2013 04:53 PM, Alexander Graf wrote:



Am 09.07.2013 um 09:21 schrieb tiejun.chen tiejun.c...@windriver.com:


On 06/07/2013 08:16 AM, Scott Wood wrote:

Most of these have been posted before, but I grouped them together as
there are some contextual dependencies between them.

Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
if there's no objection over the next few days?


Scott and Alex,

Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as 
well?


Which commits exactly are you missing there?


Alex,

Four patches,

kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
kvm/ppc/booke: Hold srcu lock when calling gfn functions
kvm/ppc/booke64: Disable e6500 support  

are already in Linus's tree,

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=author&q=Scott+Wood

So should we sync these commits into your tree?


They should already be in there?


Yes, I already see this.

Thanks,

Tiejun



Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 08:02, Tiejun Chen wrote:

 We should ensure that preemption cannot occur while calling get_paca()
 inside hard_irq_disable(), otherwise the paca_struct may be the
 wrong one just afterwards. Note that we may also update timing stats in that case.
 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
 arch/powerpc/kvm/booke.c |2 ++
 1 file changed, 2 insertions(+)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index dcc94f0..9dae25d 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
   WARN_ON(local_paca->irq_happened != 0);
 #endif
 
 + preempt_disable();
   /*
* We enter with interrupts disabled in hardware, but
* we need to call hard_irq_disable anyway to ensure that
 @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 + preempt_enable();

All of the code here is already called with interrupts disabled. I don't see 
how we could preempt then?


Alex

 
   /* restart interrupts if they were meant for the host */
   kvmppc_restart_interrupt(vcpu, exit_nr);
 -- 
 1.7.9.5
 



Re: [PATCH 6/8] KVM: PPC: Add support for multiple-TCE hcalls

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 07:00, Alexey Kardashevskiy wrote:

 On 07/10/2013 03:02 AM, Alexander Graf wrote:
 On 07/06/2013 05:07 PM, Alexey Kardashevskiy wrote:
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.
 
 We don't mention QEMU explicitly in KVM code usually.
 
 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in the virtual
 mode in the case if the real mode handler failed for some reason.
 
 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.
 
 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.
 
 Same as above. But really you're only giving recommendations here. What's
 the point? Please describe what the benefit of this patch is, not what some
 other random subsystem might do with the benefits it brings.
 
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 Changelog:
 2013/07/06:
 * fixed number of wrong get_page()/put_page() calls
 
 2013/06/27:
 * fixed clear of BUSY bit in kvmppc_lookup_pte()
 * H_PUT_TCE_INDIRECT does realmode_get_page() now
 * KVM_CAP_SPAPR_MULTITCE now depends on CONFIG_PPC_BOOK3S_64
 * updated doc
 
 2013/06/05:
 * fixed mistype about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number
 
 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with fallthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  Documentation/virtual/kvm/api.txt   |  25 +++
  arch/powerpc/include/asm/kvm_host.h |   9 ++
  arch/powerpc/include/asm/kvm_ppc.h  |  16 +-
  arch/powerpc/kvm/book3s_64_vio.c| 154 ++-
  arch/powerpc/kvm/book3s_64_vio_hv.c | 260
 
  arch/powerpc/kvm/book3s_hv.c|  41 -
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +
  arch/powerpc/kvm/book3s_pr_papr.c   |  37 -
  arch/powerpc/kvm/powerpc.c  |   3 +
  9 files changed, 517 insertions(+), 34 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt
 b/Documentation/virtual/kvm/api.txt
 index 6365fef..762c703 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,31 @@ calls by the guest for that service will be passed
 to userspace to be
  handled.
 
 
 +4.86 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability means the kernel is capable of handling hypercalls
 +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
 +space. This significanly accelerates DMA operations for PPC KVM guests.
 
 significanly? Please run this through a spell checker.
 
 +The user space should expect that its handlers for these hypercalls
 
 s/The//
 
 +are not going to be called.
 
 Is user space guaranteed they will not be called? Or can it still happen?
 
 ... if user space previously registered LIOBN in KVM (via
 KVM_CREATE_SPAPR_TCE or similar calls).
 
 ok?

How about this?

The hypercalls mentioned above may or may not be processed successfully in the 
kernel based fast path. If they can not be handled by the kernel, they will get 
passed on to user space. So user space still has to have an implementation for 
these despite the in kernel acceleration.

---

The target audience for this documentation is user space KVM API users. Someone 
developing kvm tool for example. They want to know implications specific CAPs 
have.
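The contract described here, a kernel fast path that may punt the hcall to user space, boils down to a simple dispatch pattern. A hedged sketch follows; the return constants and handler names are illustrative stand-ins, not the exact kernel API:

```c
#include <assert.h>

/* Illustrative return codes; the kernel uses its own H_* constants. */
enum { H_SUCCESS = 0, H_TOO_HARD = -1 };

/* Models the in-kernel real-mode handler: it may succeed on the fast
 * path, or bail out when the request cannot be finished in kernel. */
static int kernel_put_tce_indirect(int fast_path_ok)
{
        return fast_path_ok ? H_SUCCESS : H_TOO_HARD;
}

/* User space must keep its own full implementation around. */
static int userspace_put_tce_indirect(void)
{
        return H_SUCCESS;
}

/* The flow a VMM sees: try the kernel first, fall back to itself. */
static int handle_hcall(int fast_path_ok)
{
        int ret = kernel_put_tce_indirect(fast_path_ok);

        if (ret == H_TOO_HARD)      /* kernel punted to user space */
                ret = userspace_put_tce_indirect();
        return ret;
}
```

Either way the guest sees a successful hcall, which is exactly why user space cannot drop its handlers just because the capability is present.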

 
 There is also KVM_CREATE_SPAPR_TCE_IOMMU but it is not in the kernel yet
 and may never get there.
 
 
 +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
 +the user space might have to advertise it for the guest. For example,
 +IBM pSeries guest starts using them if hcall-multi-tce is present in
 +the ibm,hypertas-functions 

Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 02:06, Scott Wood wrote:

 On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
 On 09.07.2013, at 20:46, Scott Wood wrote:
  I suspect that tlbsx is faster, or at worst similar.  And unlike comparing 
  tlbsx to lwepx (not counting a fix for the threading problem), we don't 
  already have code to search the guest TLB, so testing would be more work.
 We have code to walk the guest TLB for TLB misses. This really is just the 
 TLB miss search without host TLB injection.
So let's say we're using the shadow TLB. The guest always has its, say, 64 TLB 
 entries that it can count on - we never evict anything by accident, because 
 we store all of the 64 entries in our guest TLB cache. When the guest faults 
 at an address, the first thing we do is we check the cache whether we have 
 that page already mapped.
 However, with this method we now have 2 enumeration methods for guest TLB 
 searches. We have the tlbsx one which searches the host TLB and we have our 
 guest TLB cache. The guest TLB cache might still contain an entry for an 
 address that we already invalidated on the host. Would that impose a problem?
 I guess not because we're swizzling the exit code around to instead be an 
 instruction miss which means we restore the TLB entry into our host's TLB so 
 that when we resume, we land here and the tlbsx hits. But it feels backwards.
 
 Any better way?  Searching the guest TLB won't work for the LRAT case, so 
 we'd need to have this logic around anyway.  We shouldn't add a second 
 codepath unless it's a clear performance gain -- and again, I suspect it 
 would be the opposite, especially if the entry is not in TLB0 or in one of 
 the first few entries searched in TLB1.  The tlbsx miss case is not what we 
 should optimize for.

Hrm.

So let's redesign this thing theoretically. We would have an exit that requires 
an instruction fetch. We would override kvmppc_get_last_inst() to always do 
kvmppc_ld_inst(). That one can fail because it can't find the TLB entry in the 
host TLB. When it fails, we have to abort the emulation and resume the guest at 
the same IP.

Now the guest gets the TLB miss, we populate, go back into the guest. The guest 
hits the emulation failure again. We go back to kvmppc_ld_inst() which succeeds 
this time and we can emulate the instruction.

I think this works. Just make sure that the gateway to the instruction fetch is 
kvmppc_get_last_inst() and make that failable. Then the difference between 
looking for the TLB entry in the host's TLB or in the guest's TLB cache is 
hopefully negligible.
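The retry flow sketched above can be written down as a minimal C model. The names mirror the KVM ones but the bodies are stand-ins, assuming a failable kvmppc_get_last_inst() that asks the caller to resume the guest when the host TLB entry is missing:

```c
#include <assert.h>

/* Toy model of the failable-fetch retry loop; not the real kernel code. */
enum { EMULATE_DONE, EMULATE_AGAIN };

static int host_tlb_populated;      /* does the host TLB map the guest IP? */

/* Failable instruction fetch: succeeds only once the host TLB entry
 * exists, otherwise asks the caller to re-enter the guest. */
static int kvmppc_get_last_inst(unsigned int *inst)
{
        if (!host_tlb_populated)
                return EMULATE_AGAIN;
        *inst = 0x12345678;         /* placeholder guest opcode */
        return EMULATE_DONE;
}

/* Resuming the guest at the same IP triggers the ITLB miss handler,
 * which populates the host TLB before we exit back to emulation. */
static void reenter_guest_at_same_ip(void)
{
        host_tlb_populated = 1;
}

static unsigned int emulate_current_inst(void)
{
        unsigned int inst;

        while (kvmppc_get_last_inst(&inst) == EMULATE_AGAIN)
                reenter_guest_at_same_ip();
        return inst;
}
```

The key design point is that kvmppc_get_last_inst() is the single gateway to the instruction fetch, so whether the miss is resolved via the host TLB or the guest TLB cache stays an implementation detail behind it.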

 
 At least this code has to become something more generic, such as 
 kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu 
 implementation, as it's 100% host mmu specific.
 
 I agree that e500_mmu_host.c is a better place for it (with an ifdef for 
 BOOKEHV), but supporting anything other than instruction fetches could wait 
 until we have a user for it (it means extra code to figure out if permissions 
 are correct).

Works for me, as long as it's either documented through BUG_ON/WARN_ON's or an 
explicit naming convention.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 02:12, Scott Wood wrote:

 On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
 On 28.06.2013, at 11:20, Mihai Caraman wrote:
  +  /* Get page size */
  +  if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
  +  psize_shift = PAGE_SHIFT;
  +  else
  +  psize_shift = MAS1_GET_TSIZE(mas1) + 10;
  +
  +  mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
  +  mfspr(SPRN_MAS3);
  +  addr = (mas7_mas3 & (~0ULL << psize_shift)) |
  + (geaddr & ((1ULL << psize_shift) - 1ULL));
  +
  +  /* Map a page and get guest's instruction */
  +  page = pfn_to_page(addr >> PAGE_SHIFT);
 While looking at this I just realized that you're missing a check here. What 
 if our IP is in some PCI BAR? Or can't we execute from those?
 
 We at least need to check pfn_valid() first.  That'll just keep us from 
 accessing a bad pointer in the host kernel, though -- it won't make the 
 emulation actually work.  If we need that, we'll probably need to create a 
 temporary TLB entry manually.

ioremap()?

However, if we were walking the guest TLB cache instead we would get a guest 
physical address which we can always resolve to a host virtual address.

I'm not sure how important that whole use case is though. Maybe we should just 
error out to the guest for now.


Alex
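The arithmetic in the quoted hunk splices the real page number from MAS7||MAS3 together with the in-page offset of the guest effective address. A standalone sketch of just that computation (the constants in the usage below are made-up test values, not real register contents):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the quoted hunk's address computation: psize_shift is
 * log2 of the page size (TSIZE + 10 in the power-of-two encoding, or
 * PAGE_SHIFT for TLB0). MAS7||MAS3 carries the real page number in
 * its high bits and attribute/permission flags in the low MAS3 bits,
 * which the mask strips before the offset is spliced in.
 */
static uint64_t guest_inst_addr(uint32_t mas7, uint32_t mas3,
                                unsigned int psize_shift, uint64_t geaddr)
{
        uint64_t mas7_mas3 = ((uint64_t)mas7 << 32) | mas3;

        return (mas7_mas3 & (~0ULL << psize_shift)) |
               (geaddr & ((1ULL << psize_shift) - 1ULL));
}
```

For a 4K page (psize_shift = 12) the low 12 bits of MAS3 are masked away and replaced by the offset bits of geaddr; the resulting addr then feeds pfn_to_page(addr >> PAGE_SHIFT), which is where the pfn_valid() check Scott mentions has to happen before anything is dereferenced.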



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 00:26, Scott Wood wrote:

 On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
 On 09.07.2013, at 23:54, Scott Wood wrote:
  On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
  Not sure I understand. What the timing stats do is that they measure the 
  time between [exit ... entry], right? We'd do the same thing, just all in 
  C code. That means we would become slightly less accurate, but gain 
  dynamic enabling of the traces and get rid of all the timing stat asm 
  code.
 
  Compile-time enabling bothers me less than a loss of accuracy (not just a 
  small loss by moving into C code, but a potential for a large loss if we 
  overflow the buffer)
 Then don't overflow the buffer. Make it large enough.
 
 How large is that?  Does the tool recognize and report when overflow happens?
 
 How much will the overhead of running some python script on the host, 
 consuming a large volume of data, affect the results?
 
 IIRC ftrace improved recently to dynamically increase the buffer size too.
 Steven, do I remember correctly here?
 
 Yay more complexity.
 
 So now we get to worry about possible memory allocations happening when we 
 try to log something?  Or if there is a way to do an atomic log, we're back 
 to the buffer might be full situation.
 
  and a dependency on a userspace tool
 We already have that for kvm_stat. It's a simple python script - and you 
 surely have python on your rootfs, no?
  (both in terms of the tool needing to be written, and in the hassle of 
  ensuring that it's present in the root filesystem of whatever system I'm 
  testing).  And the whole mechanism will be more complicated.
 It'll also be more flexible at the same time. You could take the logs and 
 actually check what's going on to debug issues that you're encountering for 
 example.
 We could even go as far as sharing the same tool with other architectures, 
 so that we only have to learn how to debug things once.
 
 Have you encountered an actual need for this flexibility, or is it 
 theoretical?

Yeah, first thing I did back then to actually debug kvm failures was to add 
trace points.

 Is there common infrastructure for dealing with measuring intervals and 
 tracking statistics thereof, rather than just tracking points and letting 
 userspace connect the dots (though it could still do that as an option)?  
 Even if it must be done in userspace, it doesn't seem like something that 
 should be KVM-specific.

Would you like to have different ways of measuring mm subsystem overhead? I 
don't :). The same goes for KVM really. If we could converge towards a single 
user space interface to get exit timings, it'd make debugging a lot easier.

We already have this for the debugfs counters btw. And the timing framework 
does break kvm_stat today already, as it emits textual stats rather than 
numbers which all of the other debugfs stats do. But at least I can take the 
x86 kvm_stat tool and run it on ppc just fine to see exit stats.

 
  Lots of debug options are enabled at build time; why must this be 
  different?
 Because I think it's valuable as debug tool for cases where compile time 
 switches are not the best way of debugging things. It's not a high profile 
 thing to tackle for me tbh, but I don't really think working heavily on the 
 timing stat thing is the correct path to walk along.
 
 Adding new exit types isn't working heavily on it.

No, but the fact that the first patch is a fix to add exit stats for exits that 
we missed out before doesn't give me a lot of confidence that lots of people 
use timing stats. And I am always very weary of #ifdef'ed code, as it blows up 
the test matrix heavily.


Alex



Re: [PATCH 2/8] KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 01:35, Alexey Kardashevskiy wrote:

 On 07/10/2013 01:35 AM, Alexander Graf wrote:
 On 06/27/2013 07:02 AM, Alexey Kardashevskiy wrote:
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  include/uapi/linux/kvm.h |2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 970b1f5..0865c01 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_RTAS 91
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_SPAPR_MULTITCE 93
 +#define KVM_CAP_SPAPR_TCE_IOMMU 94
 
  #ifdef KVM_CAP_IRQ_ROUTING
 
 @@ -923,6 +924,7 @@ struct kvm_s390_ucas_mapping {
  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
  #define KVM_PPC_ALLOCATE_HTAB  _IOWR(KVMIO, 0xa7, __u32)
  #define KVM_CREATE_SPAPR_TCE  _IOW(KVMIO,  0xa8, struct
 kvm_create_spapr_tce)
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xaf, struct
 kvm_create_spapr_tce_iommu)
 
 Please order them by number.
 
 Oh. Again :( We have had this discussion with Scott Wood here already.
 Where _exactly_ do you want me to put it?

8 lines further down. With a comment saying when it's available. Also why is it 
af, not ad?

 Many sections, not really ordered. Thank you.

They should all be ordered inside of their own categories.


Alex



Re: [PATCH 8/8] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-07-10 Thread Benjamin Herrenschmidt
On Wed, 2013-07-10 at 12:33 +0200, Alexander Graf wrote:
 
 It's not exactly obvious that you're calling it with writing == 1 :).
 Can you create a new local variable is_write in the calling
 function, set that to 1 before the call to get_user_pages_fast and
 pass it in instead of the 1? The compiler should easily optimize all
 of that away, but it makes the code by far easier to read.

Ugh ?

Nobody else does that  (look at futex :-)

Ben.




Re: [PATCH 8/8] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 12:39, Benjamin Herrenschmidt wrote:

 On Wed, 2013-07-10 at 12:33 +0200, Alexander Graf wrote:
 
 It's not exactly obvious that you're calling it with writing == 1 :).
 Can you create a new local variable is_write in the calling
 function, set that to 1 before the call to get_user_pages_fast and
 pass it in instead of the 1? The compiler should easily optimize all
 of that away, but it makes the code by far easier to read.
 
 Ugh ?
 
 Nobody else does that  (look at futex :-)

Yeah, that's fortunately code that I don't have to read :).


Alex



Re: [PATCH 8/8] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 12:40, Alexander Graf wrote:

 
 On 10.07.2013, at 12:39, Benjamin Herrenschmidt wrote:
 
 On Wed, 2013-07-10 at 12:33 +0200, Alexander Graf wrote:
 
 It's not exactly obvious that you're calling it with writing == 1 :).
 Can you create a new local variable is_write in the calling
 function, set that to 1 before the call to get_user_pages_fast and
 pass it in instead of the 1? The compiler should easily optimize all
 of that away, but it makes the code by far easier to read.
 
 Ugh ?
 
 Nobody else does that  (look at futex :-)
 
 Yeah, that's fortunately code that I don't have to read :).

The proper alternative would be to pass an enum for read/write into the 
function rather than an int. But that'd be a pretty controversial, big change 
that I'd rather not put on Alexey. With a local variable we're nicely 
self-contained readable ;)


Alex



Re: [PATCH 1/2] KVM: PPC: Book3S HV: Correct tlbie usage

2013-07-10 Thread Alexander Graf

On 08.07.2013, at 12:08, Paul Mackerras wrote:

 This corrects the usage of the tlbie (TLB invalidate entry) instruction
 in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
 On the PPC970, the bit to select large vs. small page is in the instruction,
 not in the RB register value.  This changes the code to use the correct
 form on PPC970.
 
 On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
 field of the RB value incorrectly for 64k pages.  This fixes it.
 
 Since we now have several cases to handle for the tlbie instruction, this
 factors out the code to do a sequence of tlbies into a new function,
 do_tlbies(), and calls that from the various places where the code was
 doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
 use the same global_invalidates() function for determining whether to do
 local or global TLB invalidations as is used in other places, for
 consistency, and also to make sure that kvm-arch.need_tlb_flush gets
 updated properly.
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 Cc: sta...@vger.kernel.org

Thanks, applied both to kvm-ppc-queue.


Alex



Re: [PATCH 2/8] KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO

2013-07-10 Thread Alexey Kardashevskiy
On 07/10/2013 08:27 PM, Alexander Graf wrote:
 
 On 10.07.2013, at 01:35, Alexey Kardashevskiy wrote:
 
 On 07/10/2013 01:35 AM, Alexander Graf wrote:
 On 06/27/2013 07:02 AM, Alexey Kardashevskiy wrote:
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  include/uapi/linux/kvm.h |2 ++
  1 file changed, 2 insertions(+)

 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 970b1f5..0865c01 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_RTAS 91
  #define KVM_CAP_IRQ_XICS 92
  #define KVM_CAP_SPAPR_MULTITCE 93
 +#define KVM_CAP_SPAPR_TCE_IOMMU 94

  #ifdef KVM_CAP_IRQ_ROUTING

 @@ -923,6 +924,7 @@ struct kvm_s390_ucas_mapping {
  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
  #define KVM_PPC_ALLOCATE_HTAB  _IOWR(KVMIO, 0xa7, __u32)
  #define KVM_CREATE_SPAPR_TCE  _IOW(KVMIO,  0xa8, struct
 kvm_create_spapr_tce)
 +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO,  0xaf, struct
 kvm_create_spapr_tce_iommu)

 Please order them by number.

 Oh. Again :( We have had this discussion with Scott Wood here already.
 Where _exactly_ do you want me to put it?
 
 8 lines further down. With a comment saying when it's available. Also why is 
 it af, not ad?


0xad and 0xae are taken.
Where should I have commented this? In the commit message? Or in the patch
itself?


 Many sections, not really ordered. Thank you.
 
 They should all be ordered inside of their own categories.


-- 
Alexey


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Scott Wood

On 07/10/2013 05:23:36 AM, Alexander Graf wrote:


On 10.07.2013, at 00:26, Scott Wood wrote:

 On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
 It'll also be more flexible at the same time. You could take the  
logs and actually check what's going on to debug issues that you're  
encountering for example.
 We could even go as far as sharing the same tool with other  
architectures, so that we only have to learn how to debug things once.


 Have you encountered an actual need for this flexibility, or is it  
theoretical?


Yeah, first thing I did back then to actually debug kvm failures was  
to add trace points.


I meant specifically for handling exit timings this way.

 Is there common infrastructure for dealing with measuring intervals  
and tracking statistics thereof, rather than just tracking points and  
letting userspace connect the dots (though it could still do that as  
an option)?  Even if it must be done in userspace, it doesn't seem  
like something that should be KVM-specific.


Would you like to have different ways of measuring mm subsystem  
overhead? I don't :). The same goes for KVM really. If we could  
converge towards a single user space interface to get exit timings,  
it'd make debugging a lot easier.


I agree -- that's why I said it doesn't seem like something that should  
be KVM-specific.  But that's orthogonal to whether it's done in kernel  
space or user space.  The ability to get begin/end events from  
userspace would be nice when it is specifically requested, but it would  
also be nice if the kernel could track some basic statistics so we  
wouldn't have to ship so much data around to arrive at the same result.


At the very least, I'd like such a tool/infrastructure to exist before  
we start complaining about doing minor maintenance of the current  
mechanism.


We already have this for the debugfs counters btw. And the timing  
framework does break kvm_stat today already, as it emits textual  
stats rather than numbers which all of the other debugfs stats do.  
But at least I can take the x86 kvm_stat tool and run it on ppc just  
fine to see exit stats.


We already have what?  The last two sentences seem contradictory -- can  
you or can't you use kvm_stat as is?  I'm not familiar with kvm_stat.


What does x86 KVM expose in debugfs?

  Lots of debug options are enabled at build time; why must this  
be different?
 Because I think it's valuable as debug tool for cases where  
compile time switches are not the best way of debugging things. It's  
not a high profile thing to tackle for me tbh, but I don't really  
think working heavily on the timing stat thing is the correct path to  
walk along.


 Adding new exit types isn't working heavily on it.

No, but the fact that the first patch is a fix to add exit stats for  
exits that we missed out before doesn't give me a lot of confidence  
that lots of people use timing stats. And I am always very wary of  
#ifdef'ed code, as it blows up the test matrix heavily.


I used it quite a lot when I was doing KVM performance work.  It's just  
been a while since I last did that.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:18:10 AM, Alexander Graf wrote:


On 10.07.2013, at 02:12, Scott Wood wrote:

 On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
 On 28.06.2013, at 11:20, Mihai Caraman wrote:
  + /* Get page size */
  + if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
  + psize_shift = PAGE_SHIFT;
  + else
  + psize_shift = MAS1_GET_TSIZE(mas1) + 10;
  +
  + mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
  + mfspr(SPRN_MAS3);
  + addr = (mas7_mas3 & (~0ULL << psize_shift)) |
  +(geaddr & ((1ULL << psize_shift) - 1ULL));
  +
  + /* Map a page and get guest's instruction */
  + page = pfn_to_page(addr >> PAGE_SHIFT);
 While looking at this I just realized that you're missing a check  
here. What if our IP is in some PCI BAR? Or can't we execute from  
those?


 We at least need to check pfn_valid() first.  That'll just keep us  
from accessing a bad pointer in the host kernel, though -- it won't  
make the emulation actually work.  If we need that, we'll probably  
need to create a temporary TLB entry manually.


ioremap()?


That's a bit heavy... also we'd need to deal with cacheability.  This  
code is already engaged in directly creating TLB entries, so it doesn't  
seem like much of a stretch to create one for this.  It should be  
faster than ioremap() or kmap_atomic().


The one complication is allocating the virtual address space, but maybe  
we could just use the page that kmap_atomic would have used?  Of  
course, if we want to handle execution from other than normal kernel  
memory, we'll need to make sure that the virtual address space is  
allocated even when highmem is not present (e.g. 64-bit).


However, if we were walking the guest TLB cache instead we would get  
a guest physical address which we can always resolve to a host  
virtual address.


I'm not sure how important that whole use case is though. Maybe we  
should just error out to the guest for now.


It's not that important, now that we are using hugetlb rather than  
directly mapping a large hunk of reserved memory.  It would be nice to  
handle it though, if we can do without too much hassle.  And I think  
manually creating a TLB entry could be faster than kmap_atomic(), or  
searching the guest TLB and then doing a reverse memslot lookup.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:15:09 AM, Alexander Graf wrote:


On 10.07.2013, at 02:06, Scott Wood wrote:

 On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
 On 09.07.2013, at 20:46, Scott Wood wrote:
  I suspect that tlbsx is faster, or at worst similar.  And unlike  
comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.
 We have code to walk the guest TLB for TLB misses. This really is  
just the TLB miss search without host TLB injection.
 So let's say we're using the shadow TLB. The guest always has its  
say 64 TLB entries that it can count on - we never evict anything by  
accident, because we store all of the 64 entries in our guest TLB  
cache. When the guest faults at an address, the first thing we do is  
we check the cache whether we have that page already mapped.
 However, with this method we now have 2 enumeration methods for  
guest TLB searches. We have the tlbsx one which searches the host TLB  
and we have our guest TLB cache. The guest TLB cache might still  
contain an entry for an address that we already invalidated on the  
host. Would that impose a problem?
 I guess not because we're swizzling the exit code around to  
instead be an instruction miss which means we restore the TLB entry  
into our host's TLB so that when we resume, we land here and the  
tlbsx hits. But it feels backwards.


 Any better way?  Searching the guest TLB won't work for the LRAT  
case, so we'd need to have this logic around anyway.  We shouldn't  
add a second codepath unless it's a clear performance gain -- and  
again, I suspect it would be the opposite, especially if the entry is  
not in TLB0 or in one of the first few entries searched in TLB1.  The  
tlbsx miss case is not what we should optimize for.


Hrm.

So let's redesign this thing theoretically. We would have an exit  
that requires an instruction fetch. We would override  
kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can  
fail because it can't find the TLB entry in the host TLB. When it  
fails, we have to abort the emulation and resume the guest at the  
same IP.


Now the guest gets the TLB miss, we populate, go back into the guest.  
The guest hits the emulation failure again. We go back to  
kvmppc_ld_inst() which succeeds this time and we can emulate the  
instruction.


That's pretty much what this patch does, except that it goes  
immediately to the TLB miss code rather than having the extra  
round-trip back to the guest.  Is there any benefit from adding that  
extra round-trip?  Rewriting the exit type instead doesn't seem that  
bad...


I think this works. Just make sure that the gateway to the  
instruction fetch is kvmppc_get_last_inst() and make that failable.  
Then the difference between looking for the TLB entry in the host's  
TLB or in the guest's TLB cache is hopefully negligible.


I don't follow here.  What does this have to do with looking in the  
guest TLB?


-Scott


Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread Scott Wood

On 07/10/2013 01:02:19 AM, Tiejun Chen wrote:

We should ensure the preemption cannot occur while calling get_paca()
inside hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this  
case.


The soft-ee mechanism depends on accessing the PACA directly via r13 to  
avoid this.  We probably should be using inline asm to read was_enabled  
rather than hoping the compiler doesn't do anything silly.


Plus what Alex said, regarding this patch specifically.

-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:24, Scott Wood wrote:

 On 07/10/2013 05:23:36 AM, Alexander Graf wrote:
 On 10.07.2013, at 00:26, Scott Wood wrote:
  On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
  It'll also be more flexible at the same time. You could take the logs and 
  actually check what's going on to debug issues that you're encountering 
  for example.
  We could even go as far as sharing the same tool with other 
  architectures, so that we only have to learn how to debug things once.
 
  Have you encountered an actual need for this flexibility, or is it 
  theoretical?
 Yeah, first thing I did back then to actually debug kvm failures was to add 
 trace points.
 
 I meant specifically for handling exit timings this way.

No, but I did encounter the need for debugging exits. And the only thing we 
would need to get exit timing stats once we already have trace points for exit 
types would be to have trace points for guest entry and maybe type specific 
events to indicate what the exit is about as well.

 
  Is there common infrastructure for dealing with measuring intervals and 
  tracking statistics thereof, rather than just tracking points and letting 
  userspace connect the dots (though it could still do that as an option)?  
  Even if it must be done in userspace, it doesn't seem like something that 
  should be KVM-specific.
 Would you like to have different ways of measuring mm subsystem overhead? I 
 don't :). The same goes for KVM really. If we could converge towards a 
 single user space interface to get exit timings, it'd make debugging a lot 
 easier.
 
 I agree -- that's why I said it doesn't seem like something that should be 
 KVM-specific.  But that's orthogonal to whether it's done in kernel space or 
 user space.  The ability to get begin/end events from userspace would be nice 
 when it is specifically requested, but it would also be nice if the kernel 
 could track some basic statistics so we wouldn't have to ship so much data 
 around to arrive at the same result.
 
 At the very least, I'd like such a tool/infrastructure to exist before we 
 start complaining about doing minor maintenance of the current mechanism.

I admit that I don't fully understand qemu/scripts/kvm/kvm_stat, but it seems 
to me as if it already does pretty much what we want. It sets up a filter to 
only get events and their time stamps through.

It does use normal exit trace points on x86 to replace the old debugfs based 
stat counters. And it seems to work reasonably well for that.

 
 We already have this for the debugfs counters btw. And the timing framework 
 does break kvm_stat today already, as it emits textual stats rather than 
 numbers which all of the other debugfs stats do. But at least I can take the 
 x86 kvm_stat tool and run it on ppc just fine to see exit stats.
 
 We already have what?  The last two sentences seem contradictory -- can you 
 or can't you use kvm_stat as is?  I'm not familiar with kvm_stat.

Kvm_stat back in the day used debugfs to give you an idea on what exit event 
happens most often. That mechanism got replaced by trace points later which the 
current kvm_stat uses.

I still have a copy of the old kvm_stat that I always use to get a first 
feeling for what goes wrong if something goes wrong. The original code couldn't 
deal with the fact that we have a debugfs file that contains text though. I 
patched it locally. It also works just fine if you simply disable timing stats, 
since then you won't have the text file.

 What does x86 KVM expose in debugfs?

The same thing it always exposed - exit stats. I am fairly sure Avi wanted to 
completely deprecate that interface in favor of the trace point based approach, 
but I don't think he ever got around to it.


Alex



[PATCH 3/3] kvm/ppc/booke: Don't call kvm_guest_enter twice

2013-07-10 Thread Scott Wood
kvm_guest_enter() was already called by kvmppc_prepare_to_enter().
Don't call it again.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/kvm/booke.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index aa3bc36..57c1462 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -672,8 +672,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
goto out;
}
 
-   kvm_guest_enter();
-
 #ifdef CONFIG_PPC_FPU
/* Save userspace FPU state in stack */
enable_kernel_fp();
-- 
1.8.1.2




[PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Scott Wood
Simplify the handling of lazy EE by going directly from fully-enabled
to hard-disabled.  This replaces the lazy_irq_pending() check
(including its misplaced kvm_guest_exit() call).

As suggested by Tiejun Chen, move the interrupt disabling into
kvmppc_prepare_to_enter() rather than have each caller do it.  Also
move the IRQ enabling on heavyweight exit into
kvmppc_prepare_to_enter().

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/include/asm/kvm_ppc.h |  6 ++
 arch/powerpc/kvm/book3s_pr.c   | 12 +++-
 arch/powerpc/kvm/booke.c   | 11 +++
 arch/powerpc/kvm/powerpc.c | 23 ++-
 4 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 6885846..e4474f8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
trace_hardirqs_on();
 
 #ifdef CONFIG_PPC64
+   /*
+* To avoid races, the caller must have gone directly from having
+* interrupts fully-enabled to hard-disabled.
+*/
+   WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
+
/* Only need to enable IRQs by hard enabling them after this */
	local_paca->irq_happened = 0;
	local_paca->soft_enabled = 1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ddfaf56..c13caa0 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -884,14 +884,11 @@ program_interrupt:
 * and if we really did time things so badly, then we just exit
 * again due to a host external interrupt.
 */
-   local_irq_disable();
s = kvmppc_prepare_to_enter(vcpu);
-   if (s <= 0) {
-   local_irq_enable();
+   if (s <= 0)
r = s;
-   } else {
+   else
kvmppc_fix_ee_before_entry();
-   }
}
 
trace_kvm_book3s_reenter(r, vcpu);
@@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 * really did time things so badly, then we just exit again due to
 * a host external interrupt.
 */
-   local_irq_disable();
ret = kvmppc_prepare_to_enter(vcpu);
-   if (ret <= 0) {
-   local_irq_enable();
+   if (ret <= 0)
goto out;
-   }
 
/* Save FPU state in stack */
	if (current->thread.regs->msr & MSR_FP)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 6e35351..aa3bc36 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -617,7 +617,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
local_irq_enable();
kvm_vcpu_block(vcpu);
	clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
-   local_irq_disable();
+   hard_irq_disable();
 
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
r = 1;
@@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
return -EINVAL;
}
 
-   local_irq_disable();
s = kvmppc_prepare_to_enter(vcpu);
if (s = 0) {
-   local_irq_enable();
ret = s;
goto out;
}
@@ -1162,14 +1160,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * aren't already exiting to userspace for some other reason.
 */
	if (!(r & RESUME_HOST)) {
-   local_irq_disable();
	s = kvmppc_prepare_to_enter(vcpu);
-   if (s <= 0) {
-   local_irq_enable();
+   if (s <= 0)
	r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
-   } else {
+   else
kvmppc_fix_ee_before_entry();
-   }
}
 
return r;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4e05f8c..2f7a221 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 {
int r = 1;
 
-   WARN_ON_ONCE(!irqs_disabled());
+   WARN_ON(irqs_disabled());
+   hard_irq_disable();
+
while (true) {
if (need_resched()) {
local_irq_enable();
cond_resched();
-   local_irq_disable();
+   hard_irq_disable();
continue;
}
 
@@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
local_irq_enable();
trace_kvm_check_requests(vcpu);
r = 

Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:37, Scott Wood wrote:

 On 07/10/2013 05:18:10 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:12, Scott Wood wrote:
  On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
  On 28.06.2013, at 11:20, Mihai Caraman wrote:
   +   /* Get page size */
   +   if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
   +   psize_shift = PAGE_SHIFT;
   +   else
   +   psize_shift = MAS1_GET_TSIZE(mas1) + 10;
   +
   +   mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
   +   mfspr(SPRN_MAS3);
   +   addr = (mas7_mas3 & (~0ULL << psize_shift)) |
   +  (geaddr & ((1ULL << psize_shift) - 1ULL));
   +
   +   /* Map a page and get guest's instruction */
   +   page = pfn_to_page(addr >> PAGE_SHIFT);
  While looking at this I just realized that you're missing a check here. 
  What if our IP is in some PCI BAR? Or can't we execute from those?
 
  We at least need to check pfn_valid() first.  That'll just keep us from 
  accessing a bad pointer in the host kernel, though -- it won't make the 
  emulation actually work.  If we need that, we'll probably need to create a 
  temporary TLB entry manually.
 ioremap()?
 
 That's a bit heavy... also we'd need to deal with cacheability.  This code is 
 already engaged in directly creating TLB entries, so it doesn't seem like 
 much of a stretch to create one for this.  It should be faster than ioremap() 
 or kmap_atomic().

Do we have any guarantees that the TLB entry we create doesn't get evicted by 
another thread by the time we want to use it?


Alex

 The one complication is allocating the virtual address space, but maybe we 
 could just use the page that kmap_atomic would have used?  Of course, if we 
 want to handle execution from other than normal kernel memory, we'll need to 
 make sure that the virtual address space is allocated even when highmem is 
 not present (e.g. 64-bit).
 
 However, if we were walking the guest TLB cache instead we would get a guest 
 physical address which we can always resolve to a host virtual address.
 I'm not sure how important that whole use case is though. Maybe we should 
 just error out to the guest for now.
 
 It's not that important, now that we are using hugetlb rather than directly 
 mapping a large hunk of reserved memory.  It would be nice to handle it 
 though, if we can do without too much hassle.  And I think manually creating 
 a TLB entry could be faster than kmap_atomic(), or searching the guest TLB 
 and then doing a reverse memslot lookup.
 
 -Scott


[PATCH 1/3] kvm/ppc: Call trace_hardirqs_on before entry

2013-07-10 Thread Scott Wood
Currently this is only being done on 64-bit.  Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.

Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.

Signed-off-by: Scott Wood scottw...@freescale.com
---
 arch/powerpc/include/asm/kvm_ppc.h | 11 ---
 arch/powerpc/kvm/book3s_pr.c   |  4 ++--
 arch/powerpc/kvm/booke.c   |  4 ++--
 arch/powerpc/kvm/powerpc.c |  2 --
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..6885846 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
}
 }
 
-/* Please call after prepare_to_enter. This function puts the lazy ee state
-   back to normal mode, without actually enabling interrupts. */
-static inline void kvmppc_lazy_ee_enable(void)
+/*
+ * Please call after prepare_to_enter. This function puts the lazy ee and irq
+ * disabled tracking state back to normal mode, without actually enabling
+ * interrupts.
+ */
+static inline void kvmppc_fix_ee_before_entry(void)
 {
+   trace_hardirqs_on();
+
 #ifdef CONFIG_PPC64
/* Only need to enable IRQs by hard enabling them after this */
	local_paca->irq_happened = 0;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 19498a5..ddfaf56 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -890,7 +890,7 @@ program_interrupt:
local_irq_enable();
r = s;
} else {
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
}
}
 
@@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
if (vcpu-arch.shared-msr  MSR_FP)
kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index dcc94f0..6e35351 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -698,7 +698,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
 #endif
 
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
@@ -1168,7 +1168,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
local_irq_enable();
	r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
} else {
-   kvmppc_lazy_ee_enable();
+   kvmppc_fix_ee_before_entry();
}
}
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..4e05f8c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
kvm_guest_exit();
continue;
}
-
-   trace_hardirqs_on();
 #endif
 
kvm_guest_enter();
-- 
1.8.1.2




Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:42, Scott Wood wrote:

 On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:06, Scott Wood wrote:
  On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
  On 09.07.2013, at 20:46, Scott Wood wrote:
   I suspect that tlbsx is faster, or at worst similar.  And unlike 
   comparing tlbsx to lwepx (not counting a fix for the threading 
   problem), we don't already have code to search the guest TLB, so 
   testing would be more work.
  We have code to walk the guest TLB for TLB misses. This really is just 
  the TLB miss search without host TLB injection.
  So let's say we're using the shadow TLB. The guest always has its say 64 
  TLB entries that it can count on - we never evict anything by accident, 
  because we store all of the 64 entries in our guest TLB cache. When the 
  guest faults at an address, the first thing we do is we check the cache 
  whether we have that page already mapped.
  However, with this method we now have 2 enumeration methods for guest TLB 
  searches. We have the tlbsx one which searches the host TLB and we have 
  our guest TLB cache. The guest TLB cache might still contain an entry for 
  an address that we already invalidated on the host. Would that impose a 
  problem?
  I guess not because we're swizzling the exit code around to instead be an 
  instruction miss which means we restore the TLB entry into our host's TLB 
  so that when we resume, we land here and the tlbsx hits. But it feels 
  backwards.
 
  Any better way?  Searching the guest TLB won't work for the LRAT case, so 
  we'd need to have this logic around anyway.  We shouldn't add a second 
  codepath unless it's a clear performance gain -- and again, I suspect it 
  would be the opposite, especially if the entry is not in TLB0 or in one of 
  the first few entries searched in TLB1.  The tlbsx miss case is not what 
  we should optimize for.
 Hrm.
 So let's redesign this thing theoretically. We would have an exit that 
 requires an instruction fetch. We would override kvmppc_get_last_inst() to 
 always do kvmppc_ld_inst(). That one can fail because it can't find the TLB 
 entry in the host TLB. When it fails, we have to abort the emulation and 
 resume the guest at the same IP.
 Now the guest gets the TLB miss, we populate, go back into the guest. The 
 guest hits the emulation failure again. We go back to kvmppc_ld_inst() which 
 succeeds this time and we can emulate the instruction.
 
 That's pretty much what this patch does, except that it goes immediately to 
 the TLB miss code rather than having the extra round-trip back to the guest.  
 Is there any benefit from adding that extra round-trip?  Rewriting the exit 
 type instead doesn't seem that bad...

It's pretty bad. I want to have code that is easy to follow - and I don't care 
whether the very rare case of a TLB entry getting evicted by a random other 
thread right when we execute the exit path is slower by a few percent if we get 
cleaner code for that.

 
 I think this works. Just make sure that the gateway to the instruction fetch 
 is kvmppc_get_last_inst() and make that failable. Then the difference 
 between looking for the TLB entry in the host's TLB or in the guest's TLB 
 cache is hopefully negligible.
 
 I don't follow here.  What does this have to do with looking in the guest TLB?

I want to hide the fact that we're cheating as much as possible, that's it.


Alex



Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 00:47, Scott Wood wrote:

 Simplify the handling of lazy EE by going directly from fully-enabled
 to hard-disabled.  This replaces the lazy_irq_pending() check
 (including its misplaced kvm_guest_exit() call).
 
 As suggested by Tiejun Chen, move the interrupt disabling into
 kvmppc_prepare_to_enter() rather than have each caller do it.  Also
 move the IRQ enabling on heavyweight exit into
 kvmppc_prepare_to_enter().
 
 Signed-off-by: Scott Wood scottw...@freescale.com

Ben, could you please review this too? :) Thanks a bunch!

 ---
 arch/powerpc/include/asm/kvm_ppc.h |  6 ++
 arch/powerpc/kvm/book3s_pr.c   | 12 +++-
 arch/powerpc/kvm/booke.c   | 11 +++
 arch/powerpc/kvm/powerpc.c | 23 ++-
 4 files changed, 22 insertions(+), 30 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index 6885846..e4474f8 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
   trace_hardirqs_on();
 
 #ifdef CONFIG_PPC64
 + /*
 +  * To avoid races, the caller must have gone directly from having
 +  * interrupts fully-enabled to hard-disabled.
 +  */
 + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);

WARN_ON(lazy_irq_pending()); ?


Alex

 +
   /* Only need to enable IRQs by hard enabling them after this */
   local_paca->irq_happened = 0;
   local_paca->soft_enabled = 1;
 diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
 index ddfaf56..c13caa0 100644
 --- a/arch/powerpc/kvm/book3s_pr.c
 +++ b/arch/powerpc/kvm/book3s_pr.c
 @@ -884,14 +884,11 @@ program_interrupt:
* and if we really did time things so badly, then we just exit
* again due to a host external interrupt.
*/
 - local_irq_disable();
   s = kvmppc_prepare_to_enter(vcpu);
 - if (s <= 0) {
 - local_irq_enable();
 + if (s <= 0)
   r = s;
 - } else {
 + else
   kvmppc_fix_ee_before_entry();
 - }
   }
 
   trace_kvm_book3s_reenter(r, vcpu);
 @@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
 kvm_vcpu *vcpu)
* really did time things so badly, then we just exit again due to
* a host external interrupt.
*/
 - local_irq_disable();
   ret = kvmppc_prepare_to_enter(vcpu);
 - if (ret <= 0) {
 - local_irq_enable();
 + if (ret <= 0)
   goto out;
 - }
 
   /* Save FPU state in stack */
   if (current->thread.regs->msr & MSR_FP)
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 6e35351..aa3bc36 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -617,7 +617,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
   local_irq_enable();
   kvm_vcpu_block(vcpu);
   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 - local_irq_disable();
 + hard_irq_disable();
 
   kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
   r = 1;
 @@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
 kvm_vcpu *vcpu)
   return -EINVAL;
   }
 
 - local_irq_disable();
   s = kvmppc_prepare_to_enter(vcpu);
   if (s <= 0) {
 - local_irq_enable();
   ret = s;
   goto out;
   }
 @@ -1162,14 +1160,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
* aren't already exiting to userspace for some other reason.
*/
   if (!(r & RESUME_HOST)) {
 - local_irq_disable();
   s = kvmppc_prepare_to_enter(vcpu);
 - if (s <= 0) {
 - local_irq_enable();
 + if (s <= 0)
   r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
 - } else {
 + else
   kvmppc_fix_ee_before_entry();
 - }
   }
 
   return r;
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 4e05f8c..2f7a221 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 {
   int r = 1;
 
 - WARN_ON_ONCE(!irqs_disabled());
 + WARN_ON(irqs_disabled());
 + hard_irq_disable();
 +
   while (true) {
   if (need_resched()) {
   local_irq_enable();
   cond_resched();
 - local_irq_disable();
 + hard_irq_disable();
   continue;
   }
 
 @@ -95,7 +97,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
   local_irq_enable();

Re: [PATCH 0/3] kvm/ppc: fixes/cleanup

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 00:47, Scott Wood wrote:

 These are the remaining patches from the set of 8 that was
 originally posted for 3.10, but which weren't deemed urgent enough
 for that.

Thanks, applied 1/3 and 3/3 to kvm-ppc-queue.


Alex

 
 Scott Wood (3):
  kvm/ppc: Call trace_hardirqs_on before entry
  kvm/ppc: IRQ disabling cleanup
  kvm/ppc/booke: Don't call kvm_guest_enter twice
 
 arch/powerpc/include/asm/kvm_ppc.h | 17 ++---
 arch/powerpc/kvm/book3s_pr.c   | 16 +---
 arch/powerpc/kvm/booke.c   | 17 +
 arch/powerpc/kvm/powerpc.c | 25 ++---
 4 files changed, 34 insertions(+), 41 deletions(-)
 
 -- 
 1.8.1.2
 
 



Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Benjamin Herrenschmidt
On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
  #ifdef CONFIG_PPC64
  + /*
  +  * To avoid races, the caller must have gone directly from having
  +  * interrupts fully-enabled to hard-disabled.
  +  */
  + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
 
 WARN_ON(lazy_irq_pending()); ?

Different semantics. What you propose will not catch irq_happened == 0 :-)

Cheers,
Ben.




Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:

 On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
 #ifdef CONFIG_PPC64
 + /*
 +  * To avoid races, the caller must have gone directly from having
 +  * interrupts fully-enabled to hard-disabled.
 +  */
 + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
 
 WARN_ON(lazy_irq_pending()); ?
 
 Different semantics. What you propose will not catch irq_happened == 0 :-)

Right, but we only ever reach here after hard_irq_disable() I think.


Alex



Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Scott Wood

On 07/10/2013 06:04:53 PM, Alexander Graf wrote:


On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:

 On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
 #ifdef CONFIG_PPC64
 + /*
 +  * To avoid races, the caller must have gone directly from  
having

 +  * interrupts fully-enabled to hard-disabled.
 +  */
 + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);

 WARN_ON(lazy_irq_pending()); ?

 Different semantics. What you propose will not catch irq_happened  
== 0 :-)


Right, but we only ever reach here after hard_irq_disable() I think.


And the WARN_ON helps us ensure that it stays that way.

-Scott


Re: [PATCH 2/3] kvm/ppc: IRQ disabling cleanup

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 01:08, Scott Wood wrote:

 On 07/10/2013 06:04:53 PM, Alexander Graf wrote:
 On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:
  On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
  #ifdef CONFIG_PPC64
  + /*
  +  * To avoid races, the caller must have gone directly from having
  +  * interrupts fully-enabled to hard-disabled.
  +  */
  + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
 
  WARN_ON(lazy_irq_pending()); ?
 
  Different semantics. What you propose will not catch irq_happened == 0 :-)
 Right, but we only ever reach here after hard_irq_disable() I think.
 
 And the WARN_ON helps us ensure that it stays that way.

Heh - ok :). Works for me.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Scott Wood

On 07/10/2013 05:50:01 PM, Alexander Graf wrote:


On 10.07.2013, at 20:42, Scott Wood wrote:

 On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
 On 10.07.2013, at 02:06, Scott Wood wrote:
  On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
  On 09.07.2013, at 20:46, Scott Wood wrote:
   I suspect that tlbsx is faster, or at worst similar.  And  
unlike comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.
  We have code to walk the guest TLB for TLB misses. This really  
is just the TLB miss search without host TLB injection.
  So let's say we're using the shadow TLB. The guest always has  
its say 64 TLB entries that it can count on - we never evict anything  
by accident, because we store all of the 64 entries in our guest TLB  
cache. When the guest faults at an address, the first thing we do is  
we check the cache whether we have that page already mapped.
  However, with this method we now have 2 enumeration methods for  
guest TLB searches. We have the tlbsx one which searches the host TLB  
and we have our guest TLB cache. The guest TLB cache might still  
contain an entry for an address that we already invalidated on the  
host. Would that impose a problem?
  I guess not because we're swizzling the exit code around to  
instead be an instruction miss which means we restore the TLB entry  
into our host's TLB so that when we resume, we land here and the  
tlbsx hits. But it feels backwards.

 
  Any better way?  Searching the guest TLB won't work for the LRAT  
case, so we'd need to have this logic around anyway.  We shouldn't  
add a second codepath unless it's a clear performance gain -- and  
again, I suspect it would be the opposite, especially if the entry is  
not in TLB0 or in one of the first few entries searched in TLB1.  The  
tlbsx miss case is not what we should optimize for.

 Hrm.
 So let's redesign this thing theoretically. We would have an exit  
that requires an instruction fetch. We would override  
kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can  
fail because it can't find the TLB entry in the host TLB. When it  
fails, we have to abort the emulation and resume the guest at the  
same IP.
 Now the guest gets the TLB miss, we populate, go back into the  
guest. The guest hits the emulation failure again. We go back to  
kvmppc_ld_inst() which succeeds this time and we can emulate the  
instruction.


 That's pretty much what this patch does, except that it goes  
immediately to the TLB miss code rather than having the extra  
round-trip back to the guest.  Is there any benefit from adding that  
extra round-trip?  Rewriting the exit type instead doesn't seem that  
bad...


It's pretty bad. I want to have code that is easy to follow - and I  
don't care whether the very rare case of a TLB entry getting evicted  
by a random other thread right when we execute the exit path is  
slower by a few percent if we get cleaner code for that.


I guess I just don't see how this is so much harder to follow than  
returning to guest.  I find it harder to follow the flow when there are  
more round trips to the guest involved.  "Treat this as an ITLB miss"  
is simpler than, "Let this fail, and make sure we retry the trapping  
instruction on failure.  Then, an ITLB miss will happen."


Also note that making kvmppc_get_last_inst() able to fail means  
updating several existing callsites, both for the change in function  
signature and to actually handle failures.


I don't care that deeply either way, it just doesn't seem obviously  
better.


 I think this works. Just make sure that the gateway to the  
instruction fetch is kvmppc_get_last_inst() and make that failable.  
Then the difference between looking for the TLB entry in the host's  
TLB or in the guest's TLB cache is hopefully negligible.


 I don't follow here.  What does this have to do with looking in the  
guest TLB?


I want to hide the fact that we're cheating as much as possible,  
that's it.


How are we cheating, and what specifically are you proposing to do to  
hide that?  How is the guest TLB involved at all in the change you're  
asking for?


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-10 Thread Alexander Graf

On 11.07.2013, at 02:15, Scott Wood wrote:

 On 07/10/2013 05:50:01 PM, Alexander Graf wrote:
 On 10.07.2013, at 20:42, Scott Wood wrote:
  On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
  On 10.07.2013, at 02:06, Scott Wood wrote:
   On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
   On 09.07.2013, at 20:46, Scott Wood wrote:
I suspect that tlbsx is faster, or at worst similar.  And unlike 
comparing tlbsx to lwepx (not counting a fix for the threading 
problem), we don't already have code to search the guest TLB, so 
testing would be more work.
   We have code to walk the guest TLB for TLB misses. This really is just 
   the TLB miss search without host TLB injection.
   So let's say we're using the shadow TLB. The guest always has its say 
   64 TLB entries that it can count on - we never evict anything by 
   accident, because we store all of the 64 entries in our guest TLB 
   cache. When the guest faults at an address, the first thing we do is 
   we check the cache whether we have that page already mapped.
   However, with this method we now have 2 enumeration methods for guest 
   TLB searches. We have the tlbsx one which searches the host TLB and we 
   have our guest TLB cache. The guest TLB cache might still contain an 
   entry for an address that we already invalidated on the host. Would 
   that impose a problem?
   I guess not because we're swizzling the exit code around to instead be 
   an instruction miss which means we restore the TLB entry into our 
   host's TLB so that when we resume, we land here and the tlbsx hits. 
   But it feels backwards.
  
   Any better way?  Searching the guest TLB won't work for the LRAT case, 
   so we'd need to have this logic around anyway.  We shouldn't add a 
   second codepath unless it's a clear performance gain -- and again, I 
   suspect it would be the opposite, especially if the entry is not in 
   TLB0 or in one of the first few entries searched in TLB1.  The tlbsx 
   miss case is not what we should optimize for.
  Hrm.
  So let's redesign this thing theoretically. We would have an exit that 
  requires an instruction fetch. We would override kvmppc_get_last_inst() 
  to always do kvmppc_ld_inst(). That one can fail because it can't find 
  the TLB entry in the host TLB. When it fails, we have to abort the 
  emulation and resume the guest at the same IP.
  Now the guest gets the TLB miss, we populate, go back into the guest. The 
  guest hits the emulation failure again. We go back to kvmppc_ld_inst() 
  which succeeds this time and we can emulate the instruction.
 
  That's pretty much what this patch does, except that it goes immediately 
  to the TLB miss code rather than having the extra round-trip back to the 
  guest.  Is there any benefit from adding that extra round-trip?  Rewriting 
  the exit type instead doesn't seem that bad...
 It's pretty bad. I want to have code that is easy to follow - and I don't 
 care whether the very rare case of a TLB entry getting evicted by a random 
 other thread right when we execute the exit path is slower by a few percent 
 if we get cleaner code for that.
 
 I guess I just don't see how this is so much harder to follow than returning 
 to guest.  I find it harder to follow the flow when there are more round 
 trips to the guest involved.  "Treat this as an ITLB miss" is simpler than, 
 "Let this fail, and make sure we retry the trapping instruction on failure.  
 Then, an ITLB miss will happen."
 
 Also note that making kvmppc_get_last_inst() able to fail means updating 
 several existing callsites, both for the change in function signature and to 
 actually handle failures.
 
 I don't care that deeply either way, it just doesn't seem obviously better.
 
  I think this works. Just make sure that the gateway to the instruction 
  fetch is kvmppc_get_last_inst() and make that failable. Then the 
  difference between looking for the TLB entry in the host's TLB or in the 
  guest's TLB cache is hopefully negligible.
 
  I don't follow here.  What does this have to do with looking in the guest 
  TLB?
 I want to hide the fact that we're cheating as much as possible, that's it.
 
 How are we cheating, and what specifically are you proposing to do to hide 
 that?  How is the guest TLB involved at all in the change you're asking for?

It's not involved, but it's basically what we do on book3s pr kvm. There 
kvmppc_ld reads the guest htab, not the host htab. I think it's fine to expose 
both cases as the same thing to the rest of kvm.


Alex



Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread tiejun.chen

On 07/10/2013 05:49 PM, Alexander Graf wrote:


On 10.07.2013, at 08:02, Tiejun Chen wrote:


We should ensure the preemption cannot occur while calling get_paca()
inside hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this case.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
arch/powerpc/kvm/booke.c |2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index dcc94f0..9dae25d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
WARN_ON(local_paca->irq_happened != 0);
#endif

+   preempt_disable();
/*
 * We enter with interrupts disabled in hardware, but
 * we need to call hard_irq_disable anyway to ensure that
@@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,

/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
+   preempt_enable();


All of the code here is already called with interrupts disabled. I don't see 
how we could be preempted then?


But the kernel still checks this in the preemptible case,

#define get_paca()  ((void) debug_smp_processor_id(), local_paca)

which then triggers the known call trace I observed :)

BUG: using smp_processor_id() in preemptible [] code: 
qemu-system-ppc/2065
caller is .kvmppc_handle_exit+0x48/0x810
CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty 
#116
Call Trace:
[c001fc637570] [c000835c] .show_stack+0x7c/0x1f0 (unreliable)
[c001fc637640] [c0673a0c] .dump_stack+0x28/0x3c
[c001fc6376b0] [c02f04d8] .debug_smp_processor_id+0x108/0x120
[c001fc637740] [c00444e8] .kvmppc_handle_exit+0x48/0x810
[c001fc6377f0] [c0047c80] .kvmppc_resume_host+0xa4/0xf8

Tiejun



Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()

2013-07-10 Thread tiejun.chen

On 07/11/2013 03:15 AM, Scott Wood wrote:

On 07/10/2013 01:02:19 AM, Tiejun Chen wrote:

We should ensure the preemption cannot occur while calling get_paca()
inside hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this case.


The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid
this.  We probably should be using inline asm to read was_enabled rather than


Yes.


hoping the compiler doesn't do anything silly.


Do you recommend I should directly replace get_paca() with local_paca inside 
hard_irq_disable()?


#define hard_irq_disable()  do {\
u8 _was_enabled = get_paca()->soft_enabled; \

-   u8 _was_enabled = local_paca->soft_enabled;

But is this safe for all scenarios?

Tiejun



Re: [PATCH 6/8] KVM: PPC: Add support for multiple-TCE hcalls

2013-07-10 Thread Alexey Kardashevskiy
On 07/10/2013 08:05 PM, Alexander Graf wrote:
 
 On 10.07.2013, at 07:00, Alexey Kardashevskiy wrote:
 
 On 07/10/2013 03:02 AM, Alexander Graf wrote:
 On 07/06/2013 05:07 PM, Alexey Kardashevskiy wrote:
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.

 We don't mention QEMU explicitly in KVM code usually.

 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in the virtual
 mode in the case if the real mode handler failed for some reason.

 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.

 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.

 Same as above. But really you're only giving recommendations here. What's
 the point? Please describe what the benefit of this patch is, not what some
 other random subsystem might do with the benefits it brings.


 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changelog:
 2013/07/06:
 * fixed number of wrong get_page()/put_page() calls

 2013/06/27:
 * fixed clear of BUSY bit in kvmppc_lookup_pte()
 * H_PUT_TCE_INDIRECT does realmode_get_page() now
 * KVM_CAP_SPAPR_MULTITCE now depends on CONFIG_PPC_BOOK3S_64
 * updated doc

 2013/06/05:
 * fixed mistype about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number

 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with failthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  Documentation/virtual/kvm/api.txt   |  25 +++
  arch/powerpc/include/asm/kvm_host.h |   9 ++
  arch/powerpc/include/asm/kvm_ppc.h  |  16 +-
  arch/powerpc/kvm/book3s_64_vio.c| 154 ++-
  arch/powerpc/kvm/book3s_64_vio_hv.c | 260
 
  arch/powerpc/kvm/book3s_hv.c|  41 -
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +
  arch/powerpc/kvm/book3s_pr_papr.c   |  37 -
  arch/powerpc/kvm/powerpc.c  |   3 +
  9 files changed, 517 insertions(+), 34 deletions(-)

 diff --git a/Documentation/virtual/kvm/api.txt
 b/Documentation/virtual/kvm/api.txt
 index 6365fef..762c703 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,31 @@ calls by the guest for that service will be passed
 to userspace to be
  handled.


 +4.86 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability means the kernel is capable of handling hypercalls
 +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
 +space. This significanly accelerates DMA operations for PPC KVM guests.

 significanly? Please run this through a spell checker.

 +The user space should expect that its handlers for these hypercalls

 s/The//

 +are not going to be called.

 Is user space guaranteed they will not be called? Or can it still happen?

 ... if user space previously registered LIOBN in KVM (via
 KVM_CREATE_SPAPR_TCE or similar calls).

 ok?
 
 How about this?
 
 The hypercalls mentioned above may or may not be processed successfully in 
 the kernel based fast path. If they can not be handled by the kernel, they 
 will get passed on to user space. So user space still has to have an 
 implementation for these despite the in kernel acceleration.
 
 ---
 
 The target audience for this documentation is user space KVM API users. 
 Someone developing kvm tool for example. They want to know implications 
 specific CAPs have.
 

 There is also KVM_CREATE_SPAPR_TCE_IOMMU but it is not in the kernel yet
 and may never get there.


 +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
 +the user space might have to advertise it for the guest. For example,
 +IBM pSeries guest starts using them if hcall-multi-tce is present in