from:"Thomas Huth"

Re: [PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-16 Thread Thomas Huth

On 16/12/15 06:56, Aravinda Prasad wrote:
> This patch modifies KVM to cause a guest exit with
> KVM_EXIT_NMI instead of immediately delivering a 0x200
> interrupt to guest upon machine check exception in
> guest address. Exiting the guest enables QEMU to build
> error log and deliver machine check exception to guest
> OS (either via guest OS registered machine check
> handler or via 0x200 guest OS interrupt vector).
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier approach
> of KVM directly invoking 0x200 guest interrupt vector.
> In the earlier approach QEMU patched the 0x200 interrupt
> vector during boot. The patched code at 0x200 issued a
> private hcall to pass the control to QEMU to build the
> error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> This patch also introduces a new KVM capability to
> control how KVM behaves on machine check exception.
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.
> 
> Change Log v2:
>   - Added KVM capability
> 
> Signed-off-by: Aravinda Prasad 
> ---
>  arch/powerpc/include/asm/kvm_host.h |1 +
>  arch/powerpc/kernel/asm-offsets.c   |1 +
>  arch/powerpc/kvm/book3s_hv.c|   12 +++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 
> +++
>  arch/powerpc/kvm/powerpc.c  |7 ++
>  include/uapi/linux/kvm.h|1 +
>  6 files changed, 31 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 827a38d..8a652ba 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -243,6 +243,7 @@ struct kvm_arch {
>   int hpt_cma_alloc;
>   struct dentry *debugfs_dir;
>   struct dentry *htab_dentry;
> + u8 fwnmi_enabled;

Here you declare the variable as 8-bits ...

>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..6a4e81a 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -506,6 +506,7 @@ int main(void)
>   DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>   DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>   DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> + DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));

... then define an asm-offset for it ...

>   DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>   DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>   DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..f43c124 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
[...]
> @@ -2381,24 +2377,27 @@ machine_check_realmode:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>   li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   /*
> -  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -  * machine check interrupt (set HSRR0 to 0x200). And for handled
> -  * errors (no-fatal), just go back to guest execution with current
> -  * HSRR0 instead of exiting guest. This new approach will inject
> -  * machine check to guest for fatal error causing guest to crash.
> -  *
> -  * The old code used to return to host for unhandled errors which
> -  * was causing guest to hang with soft lockups inside guest and
> -  * makes it difficult to recover guest instance.
> +  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
> +  * by exiting the guest with KVM_EXIT_NMI exit reason (exit
> +  * reason set later based on trap). For handled errors
> +  * (no-fatal), go back to guest execution with current HSRR0
> +  * instead of exiting the guest. This approach will cause
> +  * the guest to exit in case of fatal machine check error.
>

[PATCH] KVM: PPC: Increase memslots to 512

2015-12-09 Thread Thomas Huth

Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.

x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
total). We should do something similar for powerpc, and since we do
not use private slots here, we can set the value to 512 directly.

While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..2c96a72 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 512
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-09 Thread Thomas Huth

On 09/12/15 04:28, Paul Mackerras wrote:
> On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
>> Only using 32 memslots for KVM on powerpc is way too low, you can
>> nowadays hit this limit quite fast by adding a couple of PCI devices
>> and/or pluggable memory DIMMs to the guest.
>> x86 already increased the limit to 512 in total, to satisfy 256
>> pluggable DIMM slots, 3 private slots and 253 slots for other things
>> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
> 
> I agree with increasing the limit.  Is there a reason we have only 32
> pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
> that limit too?  If so, maybe we should increase the number of memory
> slots to 512?

H ... the comment before the #define in QEMU reads:

/*
 * This defines the maximum number of DIMM slots we can have for sPAPR
 * guest. This is not defined by sPAPR but we are defining it to 32 slots
 * based on default number of slots provided by PowerPC kernel.
 */
#define SPAPR_MAX_RAM_SLOTS 32

So as far as I can see, there's indeed a possibility that we'll
increase this value once the kernel can handle more slots!

So I guess you're right and we should play save and use more slots
right from the start. I'll send a new patch with 512 instead.

>> QEMU, not 256, so we likely do not as much slots as on x86. Thus
> 
> "so we likely do not need as many slots as on x86" would be better
> English.

Oops, I'm sure that was my original intention ... anyway, thanks for
pointing it out, it's always good to get some feedback as a non-native
English speaker.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-07 Thread Thomas Huth

On 20/11/15 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)

Ping?

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-11-20 Thread Thomas Huth

In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..3983b87 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 2: rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
-   rlwimi  r5, r4, 1, DAWRX_WT
+   rlwimi  r5, r4, 2, DAWRX_WT
clrrdi  r4, r4, 3
std r4, VCPU_DAWR(r3)
std r5, VCPU_DAWRX(r3)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/4] KVM: Provide function for VCPU lookup by id

2015-11-19 Thread Thomas Huth

On 19/11/15 09:37, Christian Borntraeger wrote:
> From: David Hildenbrand 
> 
> Let's provide a function to lookup a VCPU by id.
> 
> Reviewed-by: Christian Borntraeger 
> Reviewed-by: Dominik Dingel 
> Signed-off-by: David Hildenbrand 
> Signed-off-by: Christian Borntraeger 
> [split patch from refactoring patch]
> ---
>  include/linux/kvm_host.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5706a21..b9f345f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -460,6 +460,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
> *kvm, int i)
>(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>idx++)
>  
> +static inline struct kvm_vcpu *kvm_lookup_vcpu(struct kvm *kvm, int id)
> +{
> + struct kvm_vcpu *vcpu;
> + int i;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + if (vcpu->vcpu_id == id)
> + return vcpu;
> + return NULL;
> +}
> +

Any chance that you name this function differently? Otherwise we've got
two functions that sound very similar and also have similar prototypes:

- kvm_get_vcpu(struct kvm *kvm, int i)
- kvm_lookup_vcpu(struct kvm *kvm, int id)

I'm pretty sure this will cause confusion in the future!
==> Could you maybe name the new function something like
"kvm_lookup_vcpu_by_id" or "kvm_get_vcpu_by_id" instead?

Also a short comment before the function would be nice to make the
reader aware of the difference (e.g. when hot-plugging) between the
vcpu-id and array-id.

Otherwise: +1 for finally having a proper common function for this!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-18 Thread Thomas Huth

On 04/11/15 10:03, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
> QEMU, not 256, so we likely do not as much slots as on x86. Thus
> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/include/asm/kvm_host.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 887c259..89d387a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -38,8 +38,7 @@
>  
>  #define KVM_MAX_VCPUSNR_CPUS
>  #define KVM_MAX_VCORES   NR_CPUS
> -#define KVM_USER_MEM_SLOTS 32
> -#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
> +#define KVM_USER_MEM_SLOTS 320
>  
>  #ifdef CONFIG_KVM_MMIO
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> 

Ping! ... any comments on this one?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Thomas Huth

On 13/11/15 07:26, Aravinda Prasad wrote:
> 
> On Friday 13 November 2015 07:20 AM, David Gibson wrote:
>> On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:
[...]
>>> So thinking whether qemu should explicitly enable the new NMI
>>> behavior.
>>
>> So, I think the reasoning above tends towards having qemu control the
>> MC behaviour.  If qemu does nothing, MCs are delivered direct to
>> 0x200, if it enables the new handling, they cause a KVM exit and qemu
>> will deliver the MC.
> 
> This essentially requires qemu to control how KVM behaves as KVM does
> the actual redirection of MC either to guest's 0x200 vector or to exit
> guest. So, if we are running new qemu, then KVM should exit guest and if
> we are running old qemu, KVM should redirect MC to 0x200. Is there any
> way to communicate this to KVM? ioctl?

Simply introduce a KVM capability that can be enabled by userspace.
See kvm_vcpu_ioctl_enable_cap() in arch/powerpc/kvm/powerpc.c.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-04 Thread Thomas Huth

Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.
x86 already increased the limit to 512 in total, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
QEMU, not 256, so we likely do not as much slots as on x86. Thus
setting the slot limit to 320 sounds like a good value for the
time being (until we have some code in the future to resize the
memslot array dynamically).
And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..89d387a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 320
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/1] KVM: PPC: Increase memslots

2015-11-04 Thread Thomas Huth

Now that the patch from Nikunj to support the KVM_CAP_NR_MEMSLOTS
capability on powerpc has been merged to kvm/next, we can/should
increase the amount of memslots on ppc, too.
Since nobody else sent a patch yet (as far as I can see), I'd like
to suggest to increase the slot number to 320 now. Why 320? Well,
x86 uses 509 (+3 internal slots), to give enough space for
256 pluggable DIMMs and 253 other slots (for PCI etc.).
On powerpc, QEMU only supports 32 pluggable DIMMs, so I think we
should be fine by using something around 253 + 32 slots + some few
extra ... so 320 sounds like a good value to me for the time
being (but in the long run, we should really make this dynamically
instead, I think).

Thomas Huth (1):
  KVM: PPC: Increase memslots to 320

 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [kvm-unit-tests PATCH 00/14] ppc64: initial drop

2015-11-02 Thread Thomas Huth

On 03/08/15 16:41, Andrew Jones wrote:
> This series is the first series of a series of series that will
> bring support to kvm-unit-tests for ppc64, and eventually ppc64le.

 Hi Andrew,

may I ask about the current state of ppc64 support in the
kvm-unit-tests? Is there a newer version available than the one you
posted three months ago?

 Thanks,
  Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails

2015-11-02 Thread Thomas Huth

On 27/10/15 06:13, Paul Mackerras wrote:
> When handling a hypervisor data or instruction storage interrupt (HDSI
> or HISI), we look up the SLB entry for the address being accessed in
> order to translate the effective address to a virtual address which can
> be looked up in the guest HPT.  This lookup can occasionally fail due
> to the guest replacing an SLB entry without invalidating the evicted
> SLB entry.  In this situation an ERAT (effective to real address
> translation cache) entry can persist and be used by the hardware even
> though there is no longer a corresponding SLB entry.
> 
> Previously we would just deliver a data or instruction storage interrupt
> (DSI or ISI) to the guest in this case.  However, this is not correct
> and has been observed to cause guests to crash, typically with a
> data storage protection interrupt on a store to the vmemmap area.
> 
> Instead, what we do now is to synthesize a data or instruction segment
> interrupt.  That should cause the guest to reload an appropriate entry
> into the SLB and retry the faulting instruction.  If it still faults,
> we should find an appropriate SLB entry next time and be able to handle
> the fault.
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b1dab8d..3c6badc 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1749,7 +1749,8 @@ kvmppc_hdsi:
>   beq 3f
>   clrrdi  r0, r4, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_DATA_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:   std r4, VCPU_FAULT_DAR(r9)
>   stw r6, VCPU_FAULT_DSISR(r9)
>  
> @@ -1768,14 +1769,15 @@ kvmppc_hdsi:
>   cmpdi   r3, -2  /* MMIO emulation; need instr word */
>   beq 2f
>  
> - /* Synthesize a DSI for the guest */
> + /* Synthesize a DSI (or DSegI) for the guest */
>   ld  r4, VCPU_FAULT_DAR(r9)
>   mr  r6, r3
> -1:   mtspr   SPRN_DAR, r4
> +1:   li  r0, BOOK3S_INTERRUPT_DATA_STORAGE
>   mtspr   SPRN_DSISR, r6
> +7:   mtspr   SPRN_DAR, r4

I first though "hey, you're not setting DSISR for the DsegI now" ... but
then I saw in the PowerISA that DSISR is "undefined" for the DSegI, so
this should be fine, I think.

>   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_DATA_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>  fast_interrupt_c_return:
>  6:   ld  r7, VCPU_CTR(r9)
> @@ -1823,7 +1825,8 @@ kvmppc_hisi:
>   beq 3f
>   clrrdi  r0, r10, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_INST_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:
>   /* Search the hash table. */
>   mr  r3, r9  /* vcpu pointer */
> @@ -1840,11 +1843,12 @@ kvmppc_hisi:
>   cmpdi   r3, -1  /* handle in kernel mode */
>   beq guest_exit_cont
>  
> - /* Synthesize an ISI for the guest */
> + /* Synthesize an ISI (or ISegI) for the guest */
>   mr  r11, r3
> -1:   mtspr   SPRN_SRR0, r10
> +1:   li  r0, BOOK3S_INTERRUPT_INST_STORAGE
> +7:   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_INST_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>   b   fast_interrupt_c_return

As far as I can see (but I'm really not an expert here), the patch seems
to be fine. And we have a test running with it for almost 6 days now and
did not see any further guest crashes, so it seems to me like this is
the right fix for this issue! So if you like, you can add my
"Reviewed-by" or "Tested-by" to this patch.

 Thanks a lot for fixing this!
  Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-27 Thread Thomas Huth

On 26/10/15 06:15, Paul Mackerras wrote:
> On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
>> Yes, we'll likely need this soon! 32 slots are not enough...
> 
> Would anyone object if I raised the limit for PPC to 512 slots?

In the long run we should really make this somehow dynamically instead.
But as a first step, that should IMHO be fine. Question is whether we
immediately need 512 on PPC, too? QEMU for x86 features 256 pluggable
memory DIMM slots ("#define ACPI_MAX_RAM_SLOTS 256"), while PPC only has
32 ("#define SPAPR_MAX_RAM_SLOTS 32"). So maybe we are fine with less
memory slots on PPC, e.g. 300 or so?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-15 Thread Thomas Huth

On 16/10/15 06:57, Nikunj A Dadhania wrote:
> QEMU assumes 32 memslots if this extension is not implemented. Although,
> current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
> changes QEMU would take a wrong value.
> 
> Signed-off-by: Nikunj A Dadhania 
> ---
>  arch/powerpc/kvm/powerpc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2e51289..6fd2405 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -559,6 +559,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   else
>   r = num_online_cpus();
>   break;
> + case KVM_CAP_NR_MEMSLOTS:
> + r = KVM_USER_MEM_SLOTS;
> + break;
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;

Yes, we'll likely need this soon! 32 slots are not enough...

Reviewed-by: Thomas Huth 

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-21 Thread Thomas Huth

On 21/09/15 10:01, Greg Kurz wrote:
> On Mon, 21 Sep 2015 12:10:00 +1000
> David Gibson  wrote:
> 
>> On Fri, Sep 18, 2015 at 11:05:52AM +0200, Greg Kurz wrote:
>>> On Thu, 17 Sep 2015 10:49:41 +0200
>>> Thomas Huth  wrote:
>>>
>>>> The PAPR interface defines a hypercall to pass high-quality
>>>> hardware generated random numbers to guests. Recent kernels can
>>>> already provide this hypercall to the guest if the right hardware
>>>> random number generator is available. But in case the user wants
>>>> to use another source like EGD, or QEMU is running with an older
>>>> kernel, we should also have this call in QEMU, so that guests that
>>>> do not support virtio-rng yet can get good random numbers, too.
>>>>
>>>> This patch now adds a new pseudo-device to QEMU that either
>>>> directly provides this hypercall to the guest or is able to
>>>> enable the in-kernel hypercall if available.
...
>>> It is a good thing that the user can choose between in-kernel and backend,
>>> and this patch does the work.
>>>
>>> This being said, I am not sure about the use case where a user has a hwrng
>>> capable platform and wants to run guests without any hwrng support at all is
>>> an appropriate default behavior... I guess we will find more users that want
>>> in-kernel being the default if it is available.
>>>
>>> The patch below modifies yours to do just this: the pseudo-device is only
>>> created if hwrng is present and not already created.
>>
>> I have mixed feelings about this.  On the one hand, I agree that it
>> would be nice to allow H_RANDOM support by default.  On the other hand
>> the patch below leaves no way to turn it off for testing purposes.  It
>> also adds another place where the guest hardware depends on the host
>> configuration, which adds to the already substantial mess of ensuring
>> that source and destination hardware configuration matches for
>> migration.
> 
> Yeah, describing the guest hw is really essential for migration... this
> is best addressed at the libvirt level with a full XML description of
> the machine... but FWIW if we are talking about running pseries on a
> POWER8 or newer host, I am not aware about "hwrng-less" boards... but
> I am probably missing something :)

Maybe it would be at least ok to enable it by default as long as
"-nodefaults" has not been specified as command line option?

> Back to Thomas' patch, it does the job and brings H_RANDOM, which is
> currently missing.
> 
> Acked-by: Greg Kurz 
> 
> I could test both use-kvm and backend flavors (including migration).
> 
> Tested-by: Greg Kurz 

Thanks!

 Thomas




signature.asc
Description: OpenPGP digital signature

Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-20 Thread Thomas Huth

On 21/09/15 03:37, David Gibson wrote:
> On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
>> Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
>> functions) has to be protected via the kvm->srcu lock.
>> The kvmppc_h_logical_ci_load() and -store() functions are missing
>> this lock so far, so let's add it there, too.
>> This fixes the problem that the kernel reports "suspicious RCU usage"
>> when lock debugging is enabled.
>>
>> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
>> Signed-off-by: Thomas Huth 
> 
> Nice catch.  Looks like I missed this because the places
> kvm_io_bus_{read,write}() are called on x86 are buried about 5 layers
> below where the srcu lock is taken :/.

AFAIK the philosophy for taking the srcu lock is completely different
between powerpc and x86. On powerpc it is only taken when needed (and
released immediately afterwards), while the x86 code tries to hold it
the whole time while not being in the guest and not being in userspace.
See vcpu_enter_guest() in the x86 code for example, the lock is dropped
before entering the guest, and taken again before leaving this function.

 Thomas

signature.asc
Description: OpenPGP digital signature

Re: [PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-20 Thread Thomas Huth

On 21/09/15 04:10, David Gibson wrote:
> On Fri, Sep 18, 2015 at 11:05:52AM +0200, Greg Kurz wrote:
>> On Thu, 17 Sep 2015 10:49:41 +0200
>> Thomas Huth  wrote:
>>
>>> The PAPR interface defines a hypercall to pass high-quality
>>> hardware generated random numbers to guests. Recent kernels can
>>> already provide this hypercall to the guest if the right hardware
>>> random number generator is available. But in case the user wants
>>> to use another source like EGD, or QEMU is running with an older
>>> kernel, we should also have this call in QEMU, so that guests that
>>> do not support virtio-rng yet can get good random numbers, too.
>>>
>>> This patch now adds a new pseudo-device to QEMU that either
>>> directly provides this hypercall to the guest or is able to
>>> enable the in-kernel hypercall if available. The in-kernel
>>> hypercall can be enabled with the use-kvm property, e.g.:
>>>
>>>  qemu-system-ppc64 -device spapr-rng,use-kvm=true
>>>
>>> For handling the hypercall in QEMU instead, a "RngBackend" is
>>> required since the hypercall should provide "good" random data
>>> instead of pseudo-random (like from a "simple" library function
>>> like rand() or g_random_int()). Since there are multiple RngBackends
>>> available, the user must select an appropriate back-end via the
>>> "rng" property of the device, e.g.:
>>>
>>>  qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
>>>-device spapr-rng,rng=gid0 ...
>>>
>>> See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
>>> other example of specifying RngBackends.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>
>> It is a good thing that the user can choose between in-kernel and backend,
>> and this patch does the work.
>>
>> This being said, I am not sure about the use case where a user has a hwrng
>> capable platform and wants to run guests without any hwrng support at all is
>> an appropriate default behavior... I guess we will find more users that want
>> in-kernel being the default if it is available.
>>
>> The patch below modifies yours to do just this: the pseudo-device is only
>> created if hwrng is present and not already created.
> 
> I have mixed feelings about this.  On the one hand, I agree that it
> would be nice to allow H_RANDOM support by default.  On the other hand
> the patch below leaves no way to turn it off for testing purposes.  It
> also adds another place where the guest hardware depends on the host
> configuration, which adds to the already substantial mess of ensuring
> that source and destination hardware configuration matches for
> migration.

I thought about this question on the weekend and came to the same
conclusion. I think if we want to enable this by default, it likely
should rather be done at the libvirt level instead?

 Thomas





signature.asc
Description: OpenPGP digital signature

Re: suspicious RCU usage with kvm_pr

2015-09-18 Thread Thomas Huth

On 16/09/15 12:59, Denis Kirjanov wrote:
> On 9/16/15, Thomas Huth  wrote:
>> On 16/09/15 10:51, Denis Kirjanov wrote:
>>> Hi,
>>>
>>> I see the following trace on qemu startup (ps700 blade):
>>>
>>> v4.2-11169-g64d1def
>>>
>>>
>>> [  143.369638] ===
>>> [  143.369640] [ INFO: suspicious RCU usage. ]
>>> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
>>> [  143.369645] ---
>>> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
>>> suspicious rcu_dereference_check() usage!
>>> [  143.369649]
>>> other info that might help us debug this:
>>>
>>> [  143.369652]
>>> rcu_scheduler_active = 1, debug_locks = 1
>>> [  143.369655] 1 lock held by qemu-system-ppc/2292:
>>> [  143.369656]  #0:  (&vcpu->mutex){+.+.+.}, at: []
>>> .vcpu_load+0x2c/0xb0 [kvm]
>>> [  143.369672]
>>> stack backtrace:
>>> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>>>   4.2.0-11169-g64d1def #10
>>> [  143.369677] Call Trace:
>>> [  143.369682] [c001d08bf200] [c0816dd0]
>>> .dump_stack+0x98/0xd4 (unreliable)
>>> [  143.369687] [c001d08bf280] [c00f7058]
>>> .lockdep_rcu_suspicious+0x108/0x170
>>> [  143.369696] [c001d08bf310] [d42296d8]
>>> .kvm_io_bus_read+0x1d8/0x220 [kvm]
>>> [  143.369705] [c001d08bf3c0] [d422f980]
>>> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]
>>
>> Could it be that we need to srcu_read_lock(&vcpu->kvm->srcu) before
>> calling the kvm_io_bus_read/write() function in the
>> kvmppc_h_logical_ci_load/store() function?
> 
> I haven't had time to dig into this. I'll try it.

FYI, I had the same problem with kvm_hv, so I tried to come up with a patch:

https://patchwork.ozlabs.org/patch/519143/

Sorry, forgot to CC: you there, but it would be great if you could give
it a try!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

2015-09-18 Thread Thomas Huth

On 18/09/15 08:57, Paul Mackerras wrote:
> This fixes a bug which results in stale vcore pointers being left in
> the per-cpu preempted vcore lists when a VM is destroyed.  The result
> of the stale vcore pointers is usually either a crash or a lockup
> inside collect_piggybacks() when another VM is run.  A typical
> lockup message looks like:
> 
> [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
> [qemu-system-ppc:7039]
> [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM 
> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT 
> nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
> ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
> nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp 
> rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon 
> i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio 
> dm_multipath [last unloaded: kvm_hv]
> [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ 
> #49
> [  472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: 
> c01e41bfc000
> [  472.162262] NIP: c096b094 LR: c096b08c CTR: 
> c030
> [  472.162337] REGS: c01e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
> [  472.162399] MSR: 90019033   CR: 24848844  
> XER: 
> [  472.162588] CFAR: c096b0ac SOFTE: 1
> GPR00: c070 c01e41bff7a0 c127df00 0001
> GPR04: 0003 0001  00874821
> GPR08: c01e41bff8e0 0001  defde740
> GPR12: c030 cfdae400
> [  472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
> [  472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
> [  472.163179] Call Trace:
> [  472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 
> (unreliable)
> [  472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
> [  472.163375] [c01e41bff830] [defd6fc0] 
> kvmppc_run_core+0x1240/0x1950 [kvm_hv]
> [  472.163465] [c01e41bffa30] [defd8510] 
> kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
> [  472.163559] [c01e41bffb70] [de9318a4] 
> kvmppc_vcpu_run+0x44/0x60 [kvm]
> [  472.163653] [c01e41bffba0] [de92e674] 
> kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
> [  472.163745] [c01e41bffbe0] [de9263a8] 
> kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
> [  472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
> [  472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
> [  472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
> [  472.164060] Instruction dump:
> [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 
> 8bad02e2
> [  472.164224] 7fc3f378 4b6a57c1 6000 7c210b78  89290009 
> 792affe3 40820070
> 
> The bug is that kvmppc_run_vcpu does not correctly handle the case
> where a vcpu task receives a signal while its guest vcpu is executing
> in the guest as a result of being piggy-backed onto the execution of
> another vcore.  In that case we need to wait for the vcpu to finish
> executing inside the guest, and then remove this vcore from the
> preempted vcores list.  That way, we avoid leaving this vcpu's vcore
> on the preempted vcores list when the vcpu gets interrupted.
> 
> Fixes: ec2571650826
> Reported-by: Thomas Huth 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9754e68..2280497 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
> struct kvm_vcpu *vcpu)
>  
>   while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
>  (vc->vcore_state == VCORE_RUNNING ||
> - vc->vcore_state == VCORE_EXITING))
> + vc->vcore_state == VCORE_EXITING ||
> + vc->vcore_state == VCORE_PIGGYBACK))
>   kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
>  
> + if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
> + kvmppc_vcore_end_preempt(vc);
> +
>   if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
>   kvmppc_remove_runnable(vc, vcpu);
>   vcpu->stat.signal_exits++;

Great, this fixes the crash for me! Thanks!

Tested-by: Thomas Huth 


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-17 Thread Thomas Huth

Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.

Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/book3s.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d75bf32..096e5eb 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -828,12 +828,15 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
unsigned long size = kvmppc_get_gpr(vcpu, 4);
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
u64 buf;
+   int srcu_idx;
int ret;
 
if (!is_power_of_2(size) || (size > sizeof(buf)))
return H_TOO_HARD;
 
+   srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+   srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
@@ -868,6 +871,7 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
unsigned long val = kvmppc_get_gpr(vcpu, 6);
u64 buf;
+   int srcu_idx;
int ret;
 
switch (size) {
@@ -891,7 +895,9 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
return H_TOO_HARD;
}
 
+   srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf);
+   srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-17 Thread Thomas Huth

The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.

This patch now adds a new pseudo-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:

 qemu-system-ppc64 -device spapr-rng,use-kvm=true

For handling the hypercall in QEMU instead, a "RngBackend" is
required since the hypercall should provide "good" random data
instead of pseudo-random (like from a "simple" library function
like rand() or g_random_int()). Since there are multiple RngBackends
available, the user must select an appropriate back-end via the
"rng" property of the device, e.g.:

 qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
   -device spapr-rng,rng=gid0 ...

See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.

Signed-off-by: Thomas Huth 
---
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/spapr.c |   8 +++
 hw/ppc/spapr_rng.c | 186 +
 include/hw/ppc/spapr.h |   4 ++
 target-ppc/kvm.c   |   9 +++
 target-ppc/kvm_ppc.h   |   5 ++
 6 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_rng.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..c1ffc77 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..34e7d24 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -768,6 +768,14 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 exit(1);
 }
 
+if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+ret = spapr_rng_populate_dt(fdt);
+if (ret < 0) {
+fprintf(stderr, "could not set up rng device in the fdt\n");
+exit(1);
+}
+}
+
 QLIST_FOREACH(phb, &spapr->phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
 }
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
new file mode 100644
index 000..ed43d5e
--- /dev/null
+++ b/hw/ppc/spapr_rng.c
@@ -0,0 +1,186 @@
+/*
+ * QEMU sPAPR random number generator "device" for H_RANDOM hypercall
+ *
+ * Copyright 2015 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/rng.h"
+#include "hw/ppc/spapr.h"
+#include "kvm_ppc.h"
+
+#define SPAPR_RNG(obj) \
+OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
+
+struct sPAPRRngState {
+/*< private >*/
+DeviceState ds;
+RngBackend *backend;
+bool use_kvm;
+};
+typedef struct sPAPRRngState sPAPRRngState;
+
+struct HRandomData {
+QemuSemaphore sem;
+union {
+uint64_t v64;
+uint8_t v8[8];
+} val;
+int received;
+};
+typedef struct HRandomData HRandomData;
+
+/* Callback function for the RngBackend */
+static void random_recv(void *dest, const void *src, size_t size)
+{
+HRandomData *hrdp = dest;
+
+if (src && size > 0) {
+assert(size + hrdp->received <= sizeof(hrdp->val.v8));
+memcpy(&hrdp->val.v8[hrdp->received], src, size);
+hrdp->received += size;
+}
+
+qemu_sem_post(&hrdp->sem);
+}
+
+/* Handler for the H_RANDOM hypercall */
+

Re: suspicious RCU usage with kvm_pr

2015-09-16 Thread Thomas Huth

On 16/09/15 10:51, Denis Kirjanov wrote:
> Hi,
> 
> I see the following trace on qemu startup (ps700 blade):
> 
> v4.2-11169-g64d1def
> 
> 
> [  143.369638] ===
> [  143.369640] [ INFO: suspicious RCU usage. ]
> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
> [  143.369645] ---
> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
> suspicious rcu_dereference_check() usage!
> [  143.369649]
> other info that might help us debug this:
> 
> [  143.369652]
> rcu_scheduler_active = 1, debug_locks = 1
> [  143.369655] 1 lock held by qemu-system-ppc/2292:
> [  143.369656]  #0:  (&vcpu->mutex){+.+.+.}, at: []
> .vcpu_load+0x2c/0xb0 [kvm]
> [  143.369672]
> stack backtrace:
> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>   4.2.0-11169-g64d1def #10
> [  143.369677] Call Trace:
> [  143.369682] [c001d08bf200] [c0816dd0]
> .dump_stack+0x98/0xd4 (unreliable)
> [  143.369687] [c001d08bf280] [c00f7058]
> .lockdep_rcu_suspicious+0x108/0x170
> [  143.369696] [c001d08bf310] [d42296d8]
> .kvm_io_bus_read+0x1d8/0x220 [kvm]
> [  143.369705] [c001d08bf3c0] [d422f980]
> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]

Could it be that we need to srcu_read_lock(&vcpu->kvm->srcu) before
calling the kvm_io_bus_read/write() function in the
kvmppc_h_logical_ci_load/store() function?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-13 Thread Thomas Huth

On 14/09/15 04:15, David Gibson wrote:
> On Fri, Sep 11, 2015 at 11:17:01AM +0200, Thomas Huth wrote:
>> The PAPR interface defines a hypercall to pass high-quality
>> hardware generated random numbers to guests. Recent kernels can
>> already provide this hypercall to the guest if the right hardware
>> random number generator is available. But in case the user wants
>> to use another source like EGD, or QEMU is running with an older
>> kernel, we should also have this call in QEMU, so that guests that
>> do not support virtio-rng yet can get good random numbers, too.
>>
>> This patch now adds a new pseude-device to QEMU that either
>> directly provides this hypercall to the guest or is able to
>> enable the in-kernel hypercall if available. The in-kernel
>> hypercall can be enabled with the use-kvm property, e.g.:
>>
>>  qemu-system-ppc64 -device spapr-rng,use-kvm=true
>>
>> For handling the hypercall in QEMU instead, a RngBackend is required
>> since the hypercall should provide "good" random data instead of
>> pseudo-random (like from a "simple" library function like rand()
>> or g_random_int()). Since there are multiple RngBackends available,
>> the user must select an appropriate backend via the "backend"
>> property of the device, e.g.:
>>
>>  qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=rng0 \
>>-device spapr-rng,backend=rng0 ...
>>
>> See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
>> other example of specifying RngBackends.
...
>> +
>> +#include "qemu/error-report.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/device_tree.h"
>> +#include "sysemu/rng.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "kvm_ppc.h"
>> +
>> +#define SPAPR_RNG(obj) \
>> +OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
>> +
>> +typedef struct sPAPRRngState {
>> +/*< private >*/
>> +DeviceState ds;
>> +RngBackend *backend;
>> +bool use_kvm;
>> +} sPAPRRngState;
>> +
>> +typedef struct HRandomData {
>> +QemuSemaphore sem;
>> +union {
>> +uint64_t v64;
>> +uint8_t v8[8];
>> +} val;
>> +int received;
>> +} HRandomData;
>> +
>> +/* Callback function for the RngBackend */
>> +static void random_recv(void *dest, const void *src, size_t size)
>> +{
>> +HRandomData *hrdp = dest;
>> +
>> +if (src && size > 0) {
>> +assert(size + hrdp->received <= sizeof(hrdp->val.v8));
>> +memcpy(&hrdp->val.v8[hrdp->received], src, size);
>> +hrdp->received += size;
>> +}
>> +
>> +qemu_sem_post(&hrdp->sem);
> 
> I'm assuming qemu_sem_post() includes the necessary memory barrier to
> make sure the requesting thread actually sees the data.

Not sure whether I fully got your point here... both callback function
and main thread are calling an extern C-function, so the compiler should
not assume that the memory stays the same in the main thread...?

Anyway, I've tested the hypercall by implementing it in SLOF and calling
it a couple of times there to see that all bits in the result behave
randomly, so for me this is working fine.

>> +}
>> +
>> +/* Handler for the H_RANDOM hypercall */
>> +static target_ulong h_random(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> + target_ulong opcode, target_ulong *args)
>> +{
>> +sPAPRRngState *rngstate;
>> +HRandomData hrdata;
>> +
>> +rngstate = SPAPR_RNG(object_resolve_path_type("", TYPE_SPAPR_RNG, 
>> NULL));
>> +
>> +if (!rngstate || !rngstate->backend) {
>> +return H_HARDWARE;
>> +}
>> +
>> +qemu_sem_init(&hrdata.sem, 0);
>> +hrdata.val.v64 = 0;
>> +hrdata.received = 0;
>> +
>> +qemu_mutex_unlock_iothread();
>> +while (hrdata.received < 8) {
>> +rng_backend_request_entropy(rngstate->backend, 8 - hrdata.received,
>> +random_recv, &hrdata);
>> +qemu_sem_wait(&hrdata.sem);
>> +}
>> +qemu_mutex_lock_iothread();
>> +
>> +qemu_sem_destroy(&hrdata.sem);
>> +args[0] = hrdata.val.v64;
>> +
>> +return H_SUCCESS;
>> +}
>> +
>> +static void spapr_rng_instance_init(Object *obj)
>> +{
>> +sPAPRRngState *rngstate = SPAPR_RNG(obj);
>

[PATCH v3] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-11 Thread Thomas Huth

The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.

This patch now adds a new pseude-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:

 qemu-system-ppc64 -device spapr-rng,use-kvm=true

For handling the hypercall in QEMU instead, a RngBackend is required
since the hypercall should provide "good" random data instead of
pseudo-random (like from a "simple" library function like rand()
or g_random_int()). Since there are multiple RngBackends available,
the user must select an appropriate backend via the "backend"
property of the device, e.g.:

 qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=rng0 \
   -device spapr-rng,backend=rng0 ...

See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.

Signed-off-by: Thomas Huth 
---
 v3:
 - Completely reworked the patch set accordingly to discussion
   on the mailing list, so that the code is now encapsulated
   as a QEMU device in a separate file.

 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/spapr.c |   8 +++
 hw/ppc/spapr_rng.c | 178 +
 include/hw/ppc/spapr.h |   4 ++
 target-ppc/kvm.c   |   9 +++
 target-ppc/kvm_ppc.h   |   5 ++
 6 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_rng.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..c1ffc77 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..34e7d24 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -768,6 +768,14 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 exit(1);
 }
 
+if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+ret = spapr_rng_populate_dt(fdt);
+if (ret < 0) {
+fprintf(stderr, "couldn't setup rng device in fdt\n");
+exit(1);
+}
+}
+
 QLIST_FOREACH(phb, &spapr->phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
 }
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
new file mode 100644
index 000..d4923bc
--- /dev/null
+++ b/hw/ppc/spapr_rng.c
@@ -0,0 +1,178 @@
+/*
+ * QEMU sPAPR random number generator "device" for H_RANDOM hypercall
+ *
+ * Copyright 2015 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/rng.h"
+#include "hw/ppc/spapr.h"
+#include "kvm_ppc.h"
+
+#define SPAPR_RNG(obj) \
+OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
+
+typedef struct sPAPRRngState {
+/*< private >*/
+DeviceState ds;
+RngBackend *backend;
+bool use_kvm;
+} sPAPRRngState;
+
+typedef struct HRandomData {
+QemuSemaphore sem;
+union {
+uint64_t v64;
+uint8_t v8[8];
+} val;
+int received;
+} HRandomData;
+
+/* Callback function for the RngBackend */
+static void random_recv(void *dest, const void *src, size_t size)
+{
+HRandomData *hrdp = dest;
+
+if (src && size > 0) {
+assert(size + hrdp->received <= sizeof(hrdp->val.v8));
+memcpy(&hrdp->val.v8[hrdp->received], src, size);
+

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-08 Thread Thomas Huth

On 08/09/15 09:11, Christian Borntraeger wrote:
> Am 08.09.2015 um 08:05 schrieb Thomas Huth:
>> On 07/09/15 16:31, Igor Mammedov wrote:
>>> On Fri, 4 Sep 2015 12:04:41 +0200
>>> Alexander Graf  wrote:
>>>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>>>
>>>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>>>> you'll see that it aborts way earlier already.
>>>>>>
>>>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>>>
>>>>> When you are at it, the s390 value should also be increased I guess.
>>>>
>>>> That constant defines the array size for the memslot array in struct kvm
>>>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>>>> memory that is physically contiguous. Doing big allocations can turn
>>>> into problems during runtime.
>>>>
>>>> So maybe there is another way? Can we extend the memslot array size
>>>> dynamically somehow? Allocate it separately? How much memory does the
>>>> memslot array use up with 512 entries?
>>>
>>> KVM switched memslots allocation to kvm_kvzalloc(), so it would fallback to 
>>> vmalloc
>>>  commit 744961341d472db6272ed9b42319a90f5a2aa7c4
>>>  kvm: avoid page allocation failure in kvm_set_memory_region()
>>
>> Good hint, thanks for pointing that out! ... so increasing the array
>> size should not cause too much trouble :-)
> 
> Changing the allocation of the memslots to a growing structure seems like a
> good idea nevertheless. Any chance to do this as well when you are at it?

I'd like to finish some other stuff first, so if you want to have a try,
feel free to do so ... if not, I'll have a closer look at this later.

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-07 Thread Thomas Huth

On 07/09/15 16:31, Igor Mammedov wrote:
> On Fri, 4 Sep 2015 12:04:41 +0200
> Alexander Graf  wrote:
> 
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> KVM switched memslots allocation to kvm_kvzalloc(), so it would fallback to 
> vmalloc
>  commit 744961341d472db6272ed9b42319a90f5a2aa7c4
>  kvm: avoid page allocation failure in kvm_set_memory_region()

Good hint, thanks for pointing that out! ... so increasing the array
size should not cause too much trouble :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth

On 04/09/15 12:04, Alexander Graf wrote:
> 
> 
> On 04.09.15 11:59, Christian Borntraeger wrote:
>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>
>>>  Hi all,
>>>
>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>> you'll see that it aborts way earlier already.
>>>
>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>
>> When you are at it, the s390 value should also be increased I guess.
> 
> That constant defines the array size for the memslot array in struct kvm
> which in turn again gets allocated by kzalloc, so it's pinned kernel
> memory that is physically contiguous. Doing big allocations can turn
> into problems during runtime.

FWIW, I've just checked sizeof(struct kvm) with the current ppc64 kernel
build from master branch, and it is 34144 bytes.
So on a system that is using PAGE_SIZE = 64kB, there should be plenty of
space left before we're getting into trouble.

And even assuming the worst case, that we're on a system which still
uses PAGE_SIZE = 4kB, the last page of the 34144 bytes is only filled
with 1376 bytes, leaving 2720 bytes free right now.
sizeof(struct kvm_memory_slot) is 48 bytes right now on powerpc, and you
need two additional bytes per entry for the id_to_index array in
struct kvm_memslots, i.e. we need 50 additional bytes per entry on ppc.
That means we could increase KVM_USER_MEM_SLOTS by 2720 / 50 = 54
entries without getting into further trouble.

I think we should leave some more additional bytes left in that last
4k page of the struct kvm region, ... so what about increasing
KVM_USER_MEM_SLOTS to 32 + 48 = 80 now (instead of 32 + 54 = 86) to
ease the memslot situation at least a little bit 'till we figured out
a really final solution like growable memslots?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth

On 04/09/15 12:07, Christian Borntraeger wrote:
> Am 04.09.2015 um 12:04 schrieb Alexander Graf:
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> Maybe some rcu protected scheme that doubles the amount of memslots for
> each overrun? Yes, that would be good and even reduce the footprint for
> systems with only a small number of memslots.

Seems like Alex Williamson already posted a patchset for growable
memslots a couple of years ago:

 http://www.spinics.net/lists/kvm/msg50491.html

But I didn't quite spot the result in that thread why it never has been
included upstream. Alex (W.), do you remember the outcome?

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth


 Hi all,

now that we get memory hotplugging for the spapr machine on qemu-ppc,
too, it seems like we easily can hit the amount of KVM-internal memory
slots now ("#define KVM_USER_MEM_SLOTS 32" in
arch/powerpc/include/asm/kvm_host.h). For example, start
qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
you'll see that it aborts way earlier already.

The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
already (+3 internal slots = 512) ... maybe we should now increase the
amount of slots on powerpc, too? Since we don't use internal slots on
POWER, would 512 be a good value? Or would less be sufficient, too?

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth

The size of the Problem State Priority Boost Register is only
32 bits, but the kvm_vcpu_arch->pspb variable is declared as
"ulong", ie. 64-bit. However, the assembler code accesses this
variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro
is defined with SIZE_U32, too, so that the current code is
broken on big endian hosts: kvmppc_get_one_reg_hv() will only
return zero for this register since it is using the wrong half
of the pspb variable. Let's fix this problem by adjusting the
size of the pspb field in the kvm_vcpu_arch structure.

Signed-off-by: Thomas Huth 
---
 v2: Updated patch description

 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth

On 02/09/15 10:26, Alexander Graf wrote:
> 
>> Am 02.09.2015 um 09:26 schrieb Thomas Huth :
>>
>>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>>>> wrote:
>>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>>> The size of the Problem State Priority Boost Register is only
>>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>>> accordingly to avoid future trouble.
>>>>>
>>>>> It's not future trouble, it's broken today for LE and this should
>>>>> fix
>>>>> it BUT 
>>>>
>>>> No, it's broken today for BE hosts, which will always see 0 for the
>>>> PSPB register value.  LE hosts are fine.
>>
>> Right ... I just meant that nobody really experienced trouble with this
>> today yet, but the bug is already present now already of course.
> 
> Sounds like a great candidate for kvm-unit-tests then, no? ;)

I'm certainly looking forward to seeing powerpc support in there :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth

On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>> wrote:
>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>> The size of the Problem State Priority Boost Register is only
>>>> 32 bits, so let's change the type of the corresponding variable
>>>> accordingly to avoid future trouble.
>>>
>>> It's not future trouble, it's broken today for LE and this should
>>> fix
>>> it BUT 
>>
>> No, it's broken today for BE hosts, which will always see 0 for the
>> PSPB register value.  LE hosts are fine.

Right ... I just meant that nobody really experienced trouble with this
today yet, but the bug is already present now already of course.

>>> The asm accesses it using lwz/stw and C accesses it as a ulong. On
>>> LE
>>> that will mean that userspace will see the value << 32
>>
>> No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
>> 32-bit register, we'll just pass 0 back to userspace when it reads
>> it.
> 
> Ah ok, I missed that bit about KVM_REG_PPC_PSPB
> 
>>> Now "fixing" it might break migration if that field is already
>>> stored/loaded in its "broken" form. We may have to keep the
>>> "broken"
>>> behaviour and document that qemu sees a value shifted by 32.
>>
>> It will be being set to 0 on BE hosts across migration today
>> (fortunately 0 is a benign value for PSPB).  If we fix this on both
>> the source and destination host, then the value will get migrated
>> across correctly.
> 
> Ok, I missed the part where KVM_REG_PPC_PSPB passed it down as a 32
> -bit. That means Thomas patch should work indeed.

... and if I get the QEMU source code right, the register is currently
not migrated at all - or at least I was not able to find the spot in the
source code that migrates this register.

>> I think Thomas's patch is fine, it just needs a stronger patch
>> description saying that it fixes an actual bug.

Ok, I'll resend with a better patch description.

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth

On 02/09/15 00:24, Paul Mackerras wrote:
> On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
>> The size of the Problem State Priority Boost Register is only
>> 32 bits, so let's change the type of the corresponding variable
>> accordingly to avoid future trouble.
> 
> Since we're already using lwz/stw in the assembly code in
> book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
> How did you find it?  Did you observe a failure of some kind, or did
> you just find it by code inspection?

Code inspection. I was looking for similar problems like the issue with
the XER register that we've hit recently
(https://patchwork.ozlabs.org/patch/476872/) while I was trying to debug
a similar problem.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Thomas Huth

The size of the Problem State Priority Boost Register is only
32 bits, so let's change the type of the corresponding variable
accordingly to avoid future trouble.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-29 Thread Thomas Huth

- Original Message -
> The EPOW interrupt handler uses rtas_get_sensor(), which in turn
> uses rtas_busy_delay() to wait for RTAS becoming ready in case it
> is necessary. But rtas_busy_delay() is annotated with might_sleep()
> and thus may not be used by interrupts handlers like the EPOW handler!
> This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
> enabled:
> 
>  BUG: sleeping function called from invalid context at
>  arch/powerpc/kernel/rtas.c:496
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
>  Call Trace:
>  [c0007ffe7b90] [c0807670] dump_stack+0xa0/0xdc (unreliable)
>  [c0007ffe7bc0] [c00e1f14] ___might_sleep+0x134/0x180
>  [c0007ffe7c20] [c002aec0] rtas_busy_delay+0x30/0xd0
>  [c0007ffe7c50] [c002bde4] rtas_get_sensor+0x74/0xe0
>  [c0007ffe7ce0] [c0083264] ras_epow_interrupt+0x44/0x450
>  [c0007ffe7d90] [c0120260] handle_irq_event_percpu+0xa0/0x300
>  [c0007ffe7e70] [c0120524] handle_irq_event+0x64/0xc0
>  [c0007ffe7eb0] [c0124dbc] handle_fasteoi_irq+0xec/0x260
>  [c0007ffe7ef0] [c011f4f0] generic_handle_irq+0x50/0x80
>  [c0007ffe7f20] [c0010f3c] __do_irq+0x8c/0x200
>  [c0007ffe7f90] [c00236cc] call_do_irq+0x14/0x24
>  [c0007e6f39e0] [c0011144] do_IRQ+0x94/0x110
>  [c0007e6f3a30] [c0002594] hardware_interrupt_common+0x114/0x180
> 
> Fix this issue by introducing a new rtas_get_sensor_fast() function
> that does not use rtas_busy_delay() - and thus can only be used for
> sensors that do not cause a BUSY condition (which should be the case
> for the sensor that is queried by the EPOW IRQ handler).
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/include/asm/rtas.h  |  1 +
>  arch/powerpc/kernel/rtas.c   | 17 +
>  arch/powerpc/platforms/pseries/ras.c |  3 ++-
>  3 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h
> b/arch/powerpc/include/asm/rtas.h
> index 7a4ede1..b77ef36 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -343,6 +343,7 @@ extern void rtas_power_off(void);
>  extern void rtas_halt(void);
>  extern void rtas_os_term(char *str);
>  extern int rtas_get_sensor(int sensor, int index, int *state);
> +extern int rtas_get_sensor_fast(int sensor, int index, int *state);
>  extern int rtas_get_power_level(int powerdomain, int *level);
>  extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
>  extern bool rtas_indicator_present(int token, int *maxindex);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 7a488c1..caffb10 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
>  }
>  EXPORT_SYMBOL(rtas_get_sensor);
>  
> +int rtas_get_sensor_fast(int sensor, int index, int *state)
> +{
> + int token = rtas_token("get-sensor-state");
> + int rc;
> +
> + if (token == RTAS_UNKNOWN_SERVICE)
> + return -ENOENT;
> +
> + rc = rtas_call(token, 2, 2, state, sensor, index);
> + WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> + rc <= RTAS_EXTENDED_DELAY_MAX));
> +
> + if (rc < 0)
> + return rtas_error_rc(rc);
> + return rc;
> +}
> +
>  bool rtas_indicator_present(int token, int *maxindex)
>  {
>   int proplen, count, i;
> diff --git a/arch/powerpc/platforms/pseries/ras.c
> b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..3b6647e 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void
> *dev_id)
>   int state;
>   int critical;
>  
> - status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, &state);
> + status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
> +   &state);
>  
>   if (state > 3)
>   critical = 1;   /* Time Critical */
> --
> 1.8.3.1

*ping*

Michael, do you think this patch is OK for fixing this problem?
Or shall I rather send a patch to simply revert 587f83e8dd50d instead?

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-22 Thread Thomas Huth

On 22/07/15 13:25, Michael Ellerman wrote:
> On Fri, 2015-17-07 at 10:46:58 UTC, Thomas Huth wrote:
>> The EPOW interrupt handler uses rtas_get_sensor(), which in turn
>> uses rtas_busy_delay() to wait for RTAS becoming ready in case it
>> is necessary. But rtas_busy_delay() is annotated with might_sleep()
>> and thus may not be used by interrupts handlers like the EPOW handler!
>> This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
>> enabled:
> 
> When did we break this?

 Hi Michael,

the bug has been introduced by commit 587f83e8dd50d22bc0c62e32ec49fd31
("powerpc/pseries: Use rtas_get_sensor in RAS code") which switched the
EPOW handler to use rtas_get_sensor() instead of using rtas_call directly.

Also have a look at this thread here:
http://www.spinics.net/lists/kvm-ppc/msg10768.html

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-17 Thread Thomas Huth

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at 
arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c0007ffe7b90] [c0807670] dump_stack+0xa0/0xdc (unreliable)
 [c0007ffe7bc0] [c00e1f14] ___might_sleep+0x134/0x180
 [c0007ffe7c20] [c002aec0] rtas_busy_delay+0x30/0xd0
 [c0007ffe7c50] [c002bde4] rtas_get_sensor+0x74/0xe0
 [c0007ffe7ce0] [c0083264] ras_epow_interrupt+0x44/0x450
 [c0007ffe7d90] [c0120260] handle_irq_event_percpu+0xa0/0x300
 [c0007ffe7e70] [c0120524] handle_irq_event+0x64/0xc0
 [c0007ffe7eb0] [c0124dbc] handle_fasteoi_irq+0xec/0x260
 [c0007ffe7ef0] [c011f4f0] generic_handle_irq+0x50/0x80
 [c0007ffe7f20] [c0010f3c] __do_irq+0x8c/0x200
 [c0007ffe7f90] [c00236cc] call_do_irq+0x14/0x24
 [c0007e6f39e0] [c0011144] do_IRQ+0x94/0x110
 [c0007e6f3a30] [c0002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition (which should be the case
for the sensor that is queried by the EPOW IRQ handler).

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/rtas.h  |  1 +
 arch/powerpc/kernel/rtas.c   | 17 +
 arch/powerpc/platforms/pseries/ras.c |  3 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 7a4ede1..b77ef36 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -343,6 +343,7 @@ extern void rtas_power_off(void);
 extern void rtas_halt(void);
 extern void rtas_os_term(char *str);
 extern int rtas_get_sensor(int sensor, int index, int *state);
+extern int rtas_get_sensor_fast(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
 extern bool rtas_indicator_present(int token, int *maxindex);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 7a488c1..caffb10 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
 }
 EXPORT_SYMBOL(rtas_get_sensor);
 
+int rtas_get_sensor_fast(int sensor, int index, int *state)
+{
+   int token = rtas_token("get-sensor-state");
+   int rc;
+
+   if (token == RTAS_UNKNOWN_SERVICE)
+   return -ENOENT;
+
+   rc = rtas_call(token, 2, 2, state, sensor, index);
+   WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
+   rc <= RTAS_EXTENDED_DELAY_MAX));
+
+   if (rc < 0)
+   return rtas_error_rc(rc);
+   return rc;
+}
+
 bool rtas_indicator_present(int token, int *maxindex)
 {
int proplen, count, i;
diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..3b6647e 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
int state;
int critical;
 
-   status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, &state);
+   status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
+ &state);
 
if (state > 3)
critical = 1;   /* Time Critical */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-07-16 Thread Thomas Huth

On 05/27/2015 01:56 AM, Sam Bobroff wrote:
> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
> accessed as such.
> 
> This patch corrects places where it is accessed as a 32 bit field by a
> 64 bit kernel.  In some cases this is via a 32 bit load or store
> instruction which, depending on endianness, will cause either the
> lower or upper 32 bits to be missed.  In another case it is cast as a
> u32, causing the upper 32 bits to be cleared.
> 
> This patch corrects those places by extending the access methods to
> 64 bits.
> 
> Signed-off-by: Sam Bobroff 

Reviewed-by: Thomas Huth 

Actually this patch also fixes a bug that SLOF sometimes crashes when a
vCPU gets kicked out of kernel mode (see the following URL for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1178502 ), and I've just
tested that this bug does not occur with this patch anymore, so also:

Tested-by: Thomas Huth 

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-15 Thread Thomas Huth

On 07/15/2015 09:58 PM, Nathan Fontenot wrote:
> On 07/15/2015 09:35 AM, Thomas Huth wrote:
>> On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote:
>>>> Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
>>>> mdelay() instead of msleep() in rtas_busy_delay()? Something more
>>>> fancy?
>>>
>>> A proper fix would be more fancy, the get_sensor should happen in a
>>> kernel thread instead.
>>
>> I'm not very familiar with this stuff, but isn't the EPOW interrupt
>> something that is very time-critical? Moving parts of the handler into a
>> kernel thread then does not sound like a very good idea to me...
>>
>> Another question: Can it happen at all that this get-sensor call results
>> in a sleep condition? Looking at commit ID
>> 81b73dd92b97423b8f5324a59044da478c04f4c4 ("Fix might-sleep warning on
>> removing cpus"), which apparently fixed a similar issue for CPU
>> hot-plugging, indicates that at least some of the rtas calls are never
>> returning the busy code? In that case we could fix this by introducing a
>> similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d
>> which would be quite similar, I think)
>>
> 
> Looking at the PAPR, the get-sensor-state rtas call for the EPOW sensor
> is listed as a fast call and should not return a busy indication.

Great, good to know, thanks for looking that up! So IMHO we should
either introduce a rtas_get_sensor_fast() function or revert
587f83e8dd50d ... any preferences? Shall I come up with a patch?

> I'm curious as to why we're getting a busy return indication when
> making this call.

Looking at the code again, rtas_busy_delay() likely never slept ... it's
likely just the "might_sleep()" annotation in that function that causes
the BUG.

 Thomas




signature.asc
Description: OpenPGP digital signature

Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-15 Thread Thomas Huth

On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote:
>> Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
>> mdelay() instead of msleep() in rtas_busy_delay()? Something more
>> fancy?
> 
> A proper fix would be more fancy, the get_sensor should happen in a
> kernel thread instead.

I'm not very familiar with this stuff, but isn't the EPOW interrupt
something that is very time-critical? Moving parts of the handler into a
kernel thread then does not sound like a very good idea to me...

Another question: Can it happen at all that this get-sensor call results
in a sleep condition? Looking at commit ID
81b73dd92b97423b8f5324a59044da478c04f4c4 ("Fix might-sleep warning on
removing cpus"), which apparently fixed a similar issue for CPU
hot-plugging, indicates that at least some of the rtas calls are never
returning the busy code? In that case we could fix this by introducing a
similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d
which would be quite similar, I think)

 Thomas

signature.asc
Description: OpenPGP digital signature

BUG: sleeping function called from ras_epow_interrupt context

2015-07-14 Thread Thomas Huth


 Hi all!

A colleague recently ran into some kernel BUG messages that happen when
hot-plugging a virtio disk to a KVM guest on powerpc (with "virsh
attach-disk"), and IIRC CONFIG_DEBUG_ATOMIC_SLEEP enabled. I've tried to
re-create the problem with an up-to-date kernel (4.2.0-rc2) and the
problem still seems to be there:

The hotplug action triggers the ras_epow_interrupt() in
arch/powerpc/platforms/pseries/ras.c, which again calls
rtas_get_sensor(). That function then uses rtas_busy_delay() to wait in
case the RTAS call did not succeed immediately. But rtas_busy_delay()
uses msleep() for sleeping - which is forbidden during an atomic
interrupt context!

Following backtrace is printed out by the kernel:

[   33.920528] BUG: sleeping function called from invalid context at
/home/thuth/devel/linux-up/arch/powerpc/kernel/rtas.c:496
[   33.920590] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
[   33.920624] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #1
[   33.920657] Call Trace:
[   33.920677] [c0007ffe79b0] [c07e43f4]
.dump_stack+0x98/0xd4 (unreliable)
[   33.920729] [c0007ffe7a30] [c00dcc78]
.___might_sleep+0x128/0x170
[   33.920769] [c0007ffe7aa0] [c0029f38]
.rtas_busy_delay+0x28/0xe0
[   33.920809] [c0007ffe7b20] [c002adb4]
.rtas_get_sensor+0x74/0xe0
[   33.920850] [c0007ffe7bc0] [c007ff58]
.ras_epow_interrupt+0x48/0x450
[   33.920896] [c0007ffe7c80] [c0119d94]
.handle_irq_event_percpu+0xa4/0x310
[   33.920942] [c0007ffe7d70] [c011a05c]
.handle_irq_event+0x5c/0xa0
[   33.920982] [c0007ffe7e00] [c011e7a8]
.handle_fasteoi_irq+0xe8/0x270
[   33.921028] [c0007ffe7e90] [c01190bc]
.generic_handle_irq+0x4c/0x80
[   33.921074] [c0007ffe7f10] [c0010a48] .__do_irq+0x88/0x1f0
[   33.921115] [c0007ffe7f90] [c0022a0c] .call_do_irq+0x14/0x24
[   33.921155] [c0007e6f37e0] [c0010c3c] .do_IRQ+0x8c/0x100
[   33.921195] [c0007e6f3880] [c0002594]
hardware_interrupt_common+0x114/0x180
[   33.921243] --- interrupt: 501 at .plpar_hcall_norets+0x14/0x20
[   33.921243] LR = .check_and_cede_processor+0x24/0x40
[   33.921300] [c0007e6f3b70] []   (null)
(unreliable)
[   33.921347] [c0007e6f3be0] [c0628068]
.shared_cede_loop+0x58/0x160
[   33.921393] [c0007e6f3c70] [c06259ac]
.cpuidle_enter_state+0xbc/0x3b0
[   33.921439] [c0007e6f3d30] [c00fe32c] .call_cpuidle+0x4c/0xa0
[   33.921479] [c0007e6f3db0] [c00fe700]
.cpu_startup_entry+0x380/0x4a0
[   33.921526] [c0007e6f3ed0] [c0043110]
.start_secondary+0x320/0x350
[   33.921571] [c0007e6f3f90] [c0008b6c]
start_secondary_prolog+0x10/0x14

I think that bug might have been introduced by commit
587f83e8dd50d22bc0c62 ("Use rtas_get_sensor in RAS code") since the
rtas_busy_delay() was not called before that commit, as far as I can see.

Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
mdelay() instead of msleep() in rtas_busy_delay()? Something more fancy?

 Thanks,
  Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth

On Thu, 9 Jul 2015 16:07:47 +0300
"Michael S. Tsirkin"  wrote:

> On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 09/07/2015 11:48, Laurent Vivier wrote:
> > > 
> > > 
> > > On 09/07/2015 09:49, Thomas Huth wrote:
> > >> The option for supporting cross-endianness legacy guests in
> > >> the vhost and tun code should only be available on systems
> > >> that support cross-endian guests.
> > > 
> > > I'm sure I misunderstand something, but what happens if we use QEMU with
> > > TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
> > > little endian host ?
> > 
> > TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
> > 
> > Paolo
> 
> vhost does not require irqfd anymore.  I think ioeventfd actually works
> fine though I didn't try, it would be easy to support.

That's an interesting issue, thanks for pointing this out, Laurent! So
do we now rather want to leave everything as it currently is, in case
somebody wants to use vhost-net with a cross-endian TCG guest one day?

Or do we assume that either
a) TCG is so slow anyway that nobody wants to accelerate it with vhost
or
b) TCG vhost likely won't happen that soon so we hope that everybody
will already be using virtio 1.0 at that point in time (with a fixed
endianness) 
?
... then I think we should go on and include this patch.

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth

The option for supporting cross-endianness legacy guests in
the vhost and tun code should only be available on systems
that support cross-endian guests.

Signed-off-by: Thomas Huth 
---
 arch/arm/kvm/Kconfig | 1 +
 arch/arm64/kvm/Kconfig   | 1 +
 arch/powerpc/kvm/Kconfig | 1 +
 drivers/net/Kconfig  | 1 +
 drivers/vhost/Kconfig| 1 +
 virt/kvm/Kconfig | 3 +++
 6 files changed, 8 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..9d8f363 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..9af39fe 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..e028710 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support running unmodified book3s_64 guest kernels in
  virtual machines on POWER7 and PPC970 processors that have
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..0c4ce47 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -261,6 +261,7 @@ config TUN
 config TUN_VNET_CROSS_LE
bool "Support for cross-endian vnet headers on little-endian kernels"
default n
+   depends on KVM_CROSS_ENDIAN_GUESTS
---help---
  This option allows TUN/TAP and MACVTAP device drivers in a
  little-endian kernel to parse vnet headers that come from a
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..4d8ae6b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool "Cross-endian support for vhost"
+   depends on KVM_CROSS_ENDIAN_GUESTS
default n
---help---
  This option allows vhost to support guests with a different byte
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..cc7b28a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT && !S390
+
+config KVM_CROSS_ENDIAN_GUESTS
+   bool
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig

2015-05-22 Thread Thomas Huth

Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/Kconfig | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-   tristate "KVM support for POWER7 and PPC970 using hypervisor mode in 
host"
+   tristate "KVM for POWER7 and later using hypervisor mode in host"
depends on KVM_BOOK3S_64 && PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
---help---
  Support running unmodified book3s_64 guest kernels in
- virtual machines on POWER7 and PPC970 processors that have
+ virtual machines on POWER7 and newer processors that have
  hypervisor mode available to the host.
 
  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
  guest operating systems will run at full hardware speed
  using supervisor and user modes.  However, this also means
  that KVM is not usable under PowerVM (pHyp), is only usable
- on POWER7 (or later) processors and PPC970-family processors,
- and cannot emulate a different processor from the host processor.
+ on POWER7 or later processors, and cannot emulate a
+ different processor from the host processor.
 
  If unsure, say N.
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: PPC: Fix warnings from sparse

2015-05-22 Thread Thomas Huth

When compiling the KVM code for POWER with "make C=1", sparse
complains about functions missing proper prototypes and a 64-bit
constant missing the ULL prefix. Let's fix this by making the
functions static or by including the proper header with the
prototypes, and by appending a ULL prefix to the constant
PPC_MPPE_ADDRESS_MASK.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/ppc-opcode.h| 2 +-
 arch/powerpc/kvm/book3s.c| 3 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_emulate.c| 1 +
 arch/powerpc/kvm/book3s_hv.c | 8 
 arch/powerpc/kvm/book3s_paired_singles.c | 2 +-
 arch/powerpc/kvm/powerpc.c   | 2 +-
 8 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 5c93f69..05ac8b81 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -285,7 +285,7 @@
 
 /* POWER8 Micro Partition Prefetch (MPP) parameters */
 /* Address mask is common for LOGMPP instruction and MPPR SPR */
-#define PPC_MPPE_ADDRESS_MASK 0xc000
+#define PPC_MPPE_ADDRESS_MASK 0xc000ULL
 
 /* Bits 60 and 61 of MPP SPR should be set to one of the following */
 /* Aborting the fetch is indeed setting 00 in the table size bits */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 453a8a4..0e7d606 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, 
ulong flags)
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
 }
 
-int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
+unsigned int priority)
 {
int deliver = 1;
int vec = 0;
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2035d16..d5c9bfe 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include "book3s.h"
 
 /* #define DEBUG_MMU */
 /* #define DEBUG_SR */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b982d92..79ad35a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include "trace_pr.h"
+#include "book3s.h"
 
 #define PTE_SIZE 12
 
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 5a2bc4b..2afdb9c 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include "book3s.h"
 
 #define OP_19_XOP_RFID 18
 #define OP_19_XOP_RFI  50
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..5c34f7d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 
msr)
kvmppc_end_cede(vcpu);
 }
 
-void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
 {
vcpu->arch.pvr = pvr;
 }
 
-int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
+static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 {
unsigned long pcr = 0;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 
arch_compat)
return 0;
 }
 
-void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
+static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 {
int r;
 
@@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu->arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
+static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
int r;
struct kvm_vcpu *v, *ret = NULL;
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
b/arch/powerpc/kvm/book3s_paired_singles.c
index bd6ab16..a759d9a 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb)
return kvmppc_get_field(inst, msb + 32, lsb + 32);
 }
 
-bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
+static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
 {
if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
return false;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8cd1f80..b509e2e 100644
--- a/arch/powerpc

[PATCH] KVM: PPC: Book3S HV: Replace kvmppc_find_vcpu() with kvm_get_vcpu()

2015-05-07 Thread Thomas Huth

Both functions are doing the same thing - looking up the struct
kvm_vcpu pointer for a given vCPU ID. So there's no need for the
kvmppc_find_vcpu() function, simply use the common function instead.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/kvm/book3s_hv.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..78e62cf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -292,22 +292,6 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu->arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
-{
-   int r;
-   struct kvm_vcpu *v, *ret = NULL;
-
-   mutex_lock(&kvm->lock);
-   kvm_for_each_vcpu(r, v, kvm) {
-   if (v->vcpu_id == id) {
-   ret = v;
-   break;
-   }
-   }
-   mutex_unlock(&kvm->lock);
-   return ret;
-}
-
 static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa)
 {
vpa->__old_status |= LPPACA_OLD_SHARED_PROC;
@@ -358,7 +342,7 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
int subfunc;
struct kvmppc_vpa *vpap;
 
-   tvcpu = kvmppc_find_vcpu(kvm, vcpuid);
+   tvcpu = kvm_get_vcpu(kvm, vcpuid);
if (!tvcpu)
return H_PARAMETER;
 
@@ -678,7 +662,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
case H_PROD:
target = kvmppc_get_gpr(vcpu, 4);
-   tvcpu = kvmppc_find_vcpu(vcpu->kvm, target);
+   tvcpu = kvm_get_vcpu(vcpu->kvm, target);
if (!tvcpu) {
ret = H_PARAMETER;
break;
@@ -696,7 +680,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
target = kvmppc_get_gpr(vcpu, 4);
if (target == -1)
break;
-   tvcpu = kvmppc_find_vcpu(vcpu->kvm, target);
+   tvcpu = kvm_get_vcpu(vcpu->kvm, target);
if (!tvcpu) {
ret = H_PARAMETER;
break;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Thomas Huth

Am Tue, 21 Apr 2015 16:51:21 +1000
schrieb David Gibson :

> On Tue, Apr 21, 2015 at 08:37:02AM +0200, Thomas Huth wrote:
> > Am Tue, 21 Apr 2015 10:41:51 +1000
> > schrieb David Gibson :
> > 
> > > On POWER, storage caching is usually configured via the MMU - attributes
> > > such as cache-inhibited are stored in the TLB and the hashed page table.
> > > 
> > > This makes correctly performing cache inhibited IO accesses awkward when
> > > the MMU is turned off (real mode).  Some CPU models provide special
> > > registers to control the cache attributes of real mode load and stores but
> > > this is not at all consistent.  This is a problem in particular for SLOF,
> > > the firmware used on KVM guests, which runs entirely in real mode, but
> > > which needs to do IO to load the kernel.
> > > 
> > > To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
> > > and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
> > > a logical address (aka guest physical address).  SLOF uses these for IO.
> > > 
> > > However, because these are implemented within qemu, not the host kernel,
> > > these bypass any IO devices emulated within KVM itself.  The simplest way
> > > to see this problem is to attempt to boot a KVM guest from a virtio-blk
> > > device with iothread / dataplane enabled.  The iothread code relies on an
> > > in kernel implementation of the virtio queue notification, which is not
> > > triggered by the IO hcalls, and so the guest will stall in SLOF unable to
> > > load the guest OS.
> > > 
> > > This patch addresses this by providing in-kernel implementations of the
> > > 2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
> > > address not handled by the KVM IO bus will cause a VM exit, hitting the
> > > qemu implementation as before.
> > > 
> > > Note that a userspace change is also required, in order to enable these
> > > new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
> > > 
> > > Signed-off-by: David Gibson 
> > > ---
> > >  arch/powerpc/include/asm/kvm_book3s.h |  3 ++
> > >  arch/powerpc/kvm/book3s.c | 76 
> > > +++
> > >  arch/powerpc/kvm/book3s_hv.c  | 12 ++
> > >  arch/powerpc/kvm/book3s_pr_papr.c | 28 +
> > >  4 files changed, 119 insertions(+)
> > ...
> > > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> > > index cfbcdc6..453a8a4 100644
> > > --- a/arch/powerpc/kvm/book3s.c
> > > +++ b/arch/powerpc/kvm/book3s.c
> > > @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
> > >  #endif
> > >  }
> > >  
> > > +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
> > > +{
> > > + unsigned long size = kvmppc_get_gpr(vcpu, 4);
> > > + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
> > > + u64 buf;
> > > + int ret;
> > > +
> > > + if (!is_power_of_2(size) || (size > sizeof(buf)))
> > > + return H_TOO_HARD;
> > > +
> > > + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
> > > + if (ret != 0)
> > > + return H_TOO_HARD;
> > > +
> > > + switch (size) {
> > > + case 1:
> > > + kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf);
> > > + break;
> > > +
> > 
> > Most of the code in book3s.c seems not to use a empty line after a
> > "break;", so may I suggest to remove these empty lines here, too, to
> > keep the coding style a little bit more consistent?
> 
> I don't think it's worth respinning just for that.
> 
> > > + case 2:
> > > + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf));
> > > + break;
> > > +
> > > + case 4:
> > > + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf));
> > > + break;
> > > +
> > > + case 8:
> > > + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf));
> > > + break;
> > > +
> > > + default:
> > > + BUG();
> > 
> > If I got the code right, a malicious guest could easily trigger this
> > BUG() statement, couldn't it? ... so a BUG() is maybe not the right
> > thing to do here. Would it be appropriate to return an error value to
> > the guest instead?
> 
> Actually no - the test at the top of the function for
> is_power_of_2(size) etc. catches this safely before we get here.  The
> BUG() is just paranoia.

Ah, missed that, you're right, so the code should be fine!

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-20 Thread Thomas Huth

Am Tue, 21 Apr 2015 10:41:51 +1000
schrieb David Gibson :

> On POWER, storage caching is usually configured via the MMU - attributes
> such as cache-inhibited are stored in the TLB and the hashed page table.
> 
> This makes correctly performing cache inhibited IO accesses awkward when
> the MMU is turned off (real mode).  Some CPU models provide special
> registers to control the cache attributes of real mode load and stores but
> this is not at all consistent.  This is a problem in particular for SLOF,
> the firmware used on KVM guests, which runs entirely in real mode, but
> which needs to do IO to load the kernel.
> 
> To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
> and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
> a logical address (aka guest physical address).  SLOF uses these for IO.
> 
> However, because these are implemented within qemu, not the host kernel,
> these bypass any IO devices emulated within KVM itself.  The simplest way
> to see this problem is to attempt to boot a KVM guest from a virtio-blk
> device with iothread / dataplane enabled.  The iothread code relies on an
> in kernel implementation of the virtio queue notification, which is not
> triggered by the IO hcalls, and so the guest will stall in SLOF unable to
> load the guest OS.
> 
> This patch addresses this by providing in-kernel implementations of the
> 2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
> address not handled by the KVM IO bus will cause a VM exit, hitting the
> qemu implementation as before.
> 
> Note that a userspace change is also required, in order to enable these
> new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/kvm_book3s.h |  3 ++
>  arch/powerpc/kvm/book3s.c | 76 
> +++
>  arch/powerpc/kvm/book3s_hv.c  | 12 ++
>  arch/powerpc/kvm/book3s_pr_papr.c | 28 +
>  4 files changed, 119 insertions(+)
...
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index cfbcdc6..453a8a4 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
>  #endif
>  }
>  
> +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
> +{
> + unsigned long size = kvmppc_get_gpr(vcpu, 4);
> + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
> + u64 buf;
> + int ret;
> +
> + if (!is_power_of_2(size) || (size > sizeof(buf)))
> + return H_TOO_HARD;
> +
> + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
> + if (ret != 0)
> + return H_TOO_HARD;
> +
> + switch (size) {
> + case 1:
> + kvmppc_set_gpr(vcpu, 4, *(u8 *)&buf);
> + break;
> +

Most of the code in book3s.c seems not to use a empty line after a
"break;", so may I suggest to remove these empty lines here, too, to
keep the coding style a little bit more consistent?

> + case 2:
> + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)&buf));
> + break;
> +
> + case 4:
> + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)&buf));
> + break;
> +
> + case 8:
> + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)&buf));
> + break;
> +
> + default:
> + BUG();

If I got the code right, a malicious guest could easily trigger this
BUG() statement, couldn't it? ... so a BUG() is maybe not the right
thing to do here. Would it be appropriate to return an error value to
the guest instead?

> + }
> +
> + return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

48 matches

Mail list logo