Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
On Thu, Oct 03, 2013 at 08:21:20AM +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-10-02 at 17:37 +0300, Gleb Natapov wrote: On Wed, Oct 02, 2013 at 04:33:18PM +0200, Paolo Bonzini wrote: On 02/10/2013 16:08, Alexander Graf wrote: The hwrng is accessible by host userspace via /dev/mem. A guest should live on the same permission level as a user space application. If you run QEMU as UID 1000 without access to /dev/mem, why should the guest suddenly be able to directly access a memory location (MMIO) it couldn't access directly through a normal user space interface? It's basically a layering violation. With Michael's earlier patch in this series, the hwrng is accessible by host userspace via /dev/hwrng, no? Access to which can be controlled by its permissions. Permissions on /dev/kvm may be different. If we route the hypercall via userspace and configure QEMU to get entropy from /dev/hwrng, everything will fall nicely together (except performance). Yes, except abysmal performance and a lot more code for something completely and utterly pointless. Nice. Pointless? You yourself said that fallback to userspace will be required for migration, so the code has to be there regardless. As for abysmal performance, this is what you repeatedly refused to prove. All you said is that exit to userspace is expensive; we all know that, it is slow for all arches and all devices implemented in userspace, but we do not move all of them to the kernel. We do move some, the most performance critical, so all you need to do is show that for a typical guest workload having the device in the kernel speeds things up measurably. Why not do that instead of writing rude emails? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
On Thu, Oct 03, 2013 at 08:07:22AM +1000, Benjamin Herrenschmidt wrote: On Wed, 2013-10-02 at 13:02 +0300, Gleb Natapov wrote: Yes, I alluded to it in my email to Paul, and Paolo asked also. How is this interface disabled? Also, hwrng is MMIO on the host, so why does the guest need to use a hypercall instead of emulating the device (in kernel or somewhere else)? Another thing is that on the host hwrng is protected from direct userspace access by virtue of being a device, but guest code (even kernel mode) is userspace as far as the host's security model goes, so by implementing this hypercall in a way that directly accesses hwrng you expose hwrng to userspace unconditionally. Why is this a good idea? BTW. Is this always going to be like this? If something questionable is noticed, an explanation will be required. It is like that for all arches and all parts of the kernel. Every *single* architectural or design decision we make for our architecture has to be justified 30 times over, every piece of code bike-shedded to oblivion for months, etc...? This is simply not true; most powerpc patches go in without any comments. Do we always have to finally get to some kind of agreement on design, go through the 6-month bike-shedding phase, just to have somebody else come up and start re-questioning the whole original design (without any understanding of our specific constraints of course)? Do you really think that nobody here understands that exit to userspace is slow? You guys are the most horrendous community I have ever got to work with. It's simply impossible to get anything done in any reasonable time frame. At this stage, it would have taken us an order of magnitude less time to simply rewrite an entire hypervisor from scratch. Of course, it is always much easier to ignore other people's input and do everything your way.
Why listen to people who have dealt with migration issues for many years when you can commit the patch and forget about it until migration fails? But who cares, you got there in an order of magnitude less time, and this is what counts. This is sad. Agree. -- Gleb.
Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
On Thu, 2013-10-03 at 08:43 +0300, Gleb Natapov wrote: Why can it be a bad idea? A user can drain hwrng continuously, making other users of it much slower, or even worse, making them fall back to another, much less reliable, source of entropy. Not in a very significant way; we generate entropy at 1MHz after all, which is pretty good. Cheers, Ben.
[PATCH -V2] kvm: powerpc: book3s: Fix build break for BOOK3S_32
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This was introduced by 85a0d845d8 (KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache).

arch/powerpc/kvm/book3s_pr.c: In function 'kvmppc_core_vcpu_create':
arch/powerpc/kvm/book3s_pr.c:1182:30: error: 'struct kvmppc_vcpu_book3s' has no member named 'shadow_vcpu'
make[1]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1

Acked-by: Paul Mackerras pau...@samba.org
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_pr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8941885..6075dbd 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1179,7 +1179,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 #ifdef CONFIG_KVM_BOOK3S_32
 	vcpu->arch.shadow_vcpu =
-		kzalloc(sizeof(*vcpu_book3s->shadow_vcpu), GFP_KERNEL);
+		kzalloc(sizeof(*vcpu->arch.shadow_vcpu), GFP_KERNEL);
 	if (!vcpu->arch.shadow_vcpu)
 		goto free_vcpu3s;
 #endif
--
1.8.1.2
Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
On Thu, Oct 03, 2013 at 08:48:03AM +0300, Gleb Natapov wrote: On Thu, Oct 03, 2013 at 08:45:42AM +1000, Paul Mackerras wrote: On Wed, Oct 02, 2013 at 04:36:05PM +0200, Alexander Graf wrote: On 02.10.2013, at 16:33, Paolo Bonzini wrote: On 02/10/2013 16:08, Alexander Graf wrote: The hwrng is accessible by host userspace via /dev/mem. A guest should live on the same permission level as a user space application. If you run QEMU as UID 1000 without access to /dev/mem, why should the guest suddenly be able to directly access a memory location (MMIO) it couldn't access directly through a normal user space interface? It's basically a layering violation. With Michael's earlier patch in this series, the hwrng is accessible by host userspace via /dev/hwrng, no? Yes, but there's no token from user space that gets passed into the kernel to check whether access is ok or not. So while QEMU may not have permission to open /dev/hwrng, it could spawn a guest that opens it, drains all entropy out of it and thus stalls other processes which try to fetch entropy, no? Even if you drain all entropy out of it, wait 64 microseconds and it will be full again. :) Basically it produces 64 bits every microsecond and puts that in a 64 entry x 64-bit FIFO buffer, which is what is read by the MMIO. So there is no danger of stalling other processes for any significant amount of time. Even if a user creates 100s of guests, each of which reads hwrng in a loop? Well, you can't actually have more guests running than there are cores in a system. POWER7+ has one RNG per chip and 8 cores per chip, each of which can run 4 threads (which have to be in the same guest). Michael's code uses the RNG on the same chip. Worst case therefore is 32 threads accessing the same RNG, so a given thread might have to wait up to 32 microseconds for its data. Paul.
[RFC PATCH] KVM: PPC: Book3S: MMIO emulation support for little endian guests
MMIO emulation reads the last instruction executed by the guest and then emulates it. If the guest is running in Little Endian mode, the instruction needs to be byte-swapped before being emulated. This patch stores the last instruction in the endian order of the host, doing a byte-swap if needed. The common code which fetches last_inst uses a helper routine kvmppc_need_byteswap(), and the exit paths for the Book3S HV and PR guests use their own version in assembly. kvmppc_emulate_instruction() also uses kvmppc_need_byteswap() to define in which endian order the MMIO needs to be done. The patch is based on Alex Graf's kvm-ppc-queue branch and it has been tested on Big Endian and Little Endian HV guests and Big Endian PR guests. Signed-off-by: Cédric Le Goater c...@fr.ibm.com --- Here are some comments/questions: * The host is assumed to be running in Big Endian; when Little Endian hosts are supported in the future, we will use the CPU features to fix kvmppc_need_byteswap(). * The 'is_bigendian' parameter of the routines kvmppc_handle_load() and kvmppc_handle_store() seems redundant, but the *BRX opcodes make the improvements unclear. We could eventually rename the parameter to 'byteswap' and the attribute vcpu->arch.mmio_is_bigendian to vcpu->arch.mmio_need_byteswap. Anyhow, the current naming sucks and I would be happy to have some directions to fix it.
 arch/powerpc/include/asm/kvm_book3s.h   | 15 ++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c     |  4 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 14 +-
 arch/powerpc/kvm/book3s_segment.S       | 14 +-
 arch/powerpc/kvm/emulate.c              | 71 +--
 5 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 0ec00f4..36c5573 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -270,14 +270,22 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 	return vcpu->arch.pc;
 }

+static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.shared->msr & MSR_LE;
+}
+
 static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
 {
 	ulong pc = kvmppc_get_pc(vcpu);

 	/* Load the instruction manually if it failed to do so in the
 	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED) {
 		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
+		if (kvmppc_need_byteswap(vcpu))
+			vcpu->arch.last_inst = swab32(vcpu->arch.last_inst);
+	}

 	return vcpu->arch.last_inst;
 }
@@ -293,8 +301,11 @@ static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
 	/* Load the instruction manually if it failed to do so in the
 	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED) {
 		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
+		if (kvmppc_need_byteswap(vcpu))
+			vcpu->arch.last_inst = swab32(vcpu->arch.last_inst);
+	}

 	return vcpu->arch.last_inst;
 }
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 3a89b85..28130c7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -547,6 +547,10 @@ static int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
 		if (ret != EMULATE_DONE || last_inst == KVM_INST_FETCH_FAILED)
 			return RESUME_GUEST;
+
+		if (kvmppc_need_byteswap(vcpu))
+			last_inst = swab32(last_inst);
+
 		vcpu->arch.last_inst = last_inst;
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dd80953..1d3ee40 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1393,14 +1393,26 @@ fast_interrupt_c_return:
 	lwz	r8, 0(r10)
 	mtmsrd	r3

+	ld	r0, VCPU_MSR(r9)
+
+	/* r10 = vcpu->arch.msr & MSR_LE */
+	rldicl	r10, r0, 0, 63
+	cmpdi	r10, 0
+	bne	2f
+
 	/* Store the result */
 	stw	r8, VCPU_LAST_INST(r9)

 	/* Unset guest mode. */
-	li	r0, KVM_GUEST_MODE_NONE
+1:	li	r0, KVM_GUEST_MODE_NONE
 	stb	r0, HSTATE_IN_GUEST(r13)
 	b	guest_exit_cont

+	/* Swap and store the result */
+2:	addi	r11, r9, VCPU_LAST_INST
+	stwbrx	r8, 0, r11
+	b	1b
+
 /*
  * Similarly for an HISI, reflect it to the guest as an ISI unless
  * it is an HPTE not found fault for a page that we have paged out.
diff --git
Re: [PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems
On Thu, Oct 03, 2013 at 08:06:30PM +1000, Paul Mackerras wrote: On Thu, Oct 03, 2013 at 08:48:03AM +0300, Gleb Natapov wrote: On Thu, Oct 03, 2013 at 08:45:42AM +1000, Paul Mackerras wrote: On Wed, Oct 02, 2013 at 04:36:05PM +0200, Alexander Graf wrote: On 02.10.2013, at 16:33, Paolo Bonzini wrote: On 02/10/2013 16:08, Alexander Graf wrote: The hwrng is accessible by host userspace via /dev/mem. A guest should live on the same permission level as a user space application. If you run QEMU as UID 1000 without access to /dev/mem, why should the guest suddenly be able to directly access a memory location (MMIO) it couldn't access directly through a normal user space interface? It's basically a layering violation. With Michael's earlier patch in this series, the hwrng is accessible by host userspace via /dev/hwrng, no? Yes, but there's no token from user space that gets passed into the kernel to check whether access is ok or not. So while QEMU may not have permission to open /dev/hwrng, it could spawn a guest that opens it, drains all entropy out of it and thus stalls other processes which try to fetch entropy, no? Even if you drain all entropy out of it, wait 64 microseconds and it will be full again. :) Basically it produces 64 bits every microsecond and puts that in a 64 entry x 64-bit FIFO buffer, which is what is read by the MMIO. So there is no danger of stalling other processes for any significant amount of time. Even if a user creates 100s of guests, each of which reads hwrng in a loop? Well, you can't actually have more guests running than there are cores in a system. POWER7+ has one RNG per chip and 8 cores per chip, each of which can run 4 threads (which have to be in the same guest). Michael's code uses the RNG on the same chip. Worst case therefore is 32 threads accessing the same RNG, so a given thread might have to wait up to 32 microseconds for its data. OK, thanks. Even if it becomes an issue for some reason, it is always possible to rate limit it.
-- Gleb.
[RFC] QEMU/KVM PowerPC: virtio and guest endianness
Hi, There has been some work on the topic lately, but no agreement has been reached yet. I want to consolidate the facts in a single mail thread and re-start the discussion. Please find below a recap of what we have as of today: From a virtio POV, guest endianness is reflected by the endianness of the interrupt vectors (the ILE bit in the LPCR register). The guest kernel relies on the H_SET_MODE_RESOURCE_LE hcall to set this bit, early in the boot process. Rusty sent a patchset on qemu-devel@ to provide the necessary bits to perform byteswap in QEMU: http://patchwork.ozlabs.org/patch/266451/ http://patchwork.ozlabs.org/patch/266452/ http://patchwork.ozlabs.org/patch/266450/ (plus other enablement patches for virtio drivers, not essential for the discussion). In non-KVM mode, QEMU implements H_SET_MODE_RESOURCE_LE and updates its internal value for LPCR when the guest requests it. Rusty's patchset works out of the box in this mode: I could successfully set up and use a 9p share over virtio transport (broader virtio testing still to be done though). When using KVM, the story is different: QEMU is no longer part of this endianness-change flow, provided KVM has the following patch from Anton: http://patchwork.ozlabs.org/patch/277079/ There are *at least* two approaches to bring endianness knowledge back to QEMU: polling (1) and propagation (2). (1) QEMU retrieves LPCR from the kernel using the following API: http://patchwork.ozlabs.org/patch/273029/ (2) KVM resumes execution to the host, thus propagating H_SET_MODE_RESOURCE_LE to QEMU. Laurent came up with a patch on linuxppc-dev@ to do this: http://patchwork.ozlabs.org/patch/278590/ I would say (1) is a standard and sane way of addressing the issue: since the LPCR register value is held by KVM, it makes sense to introduce an API to get/set it. Then, it is up to QEMU to use this API.
We could dumbly do the polling in all the places where byteswapping matters: it is clearly suboptimal, especially since the LPCR_ILE bit doesn't change very often. Rusty suggested we can retrieve it at virtio device reset time and cache it, since an endianness change after the devices have started to be used is nonsensical. I have searched for an appropriate place to add the polling and I must admit I did not find any... I am no QEMU expert, but I suspect we would need some kind of arch-specific hook to be called from the virtio code to do this... :-\ I hope I am wrong, please correct me if so. On the other hand, (2) looks a bit hacky: KVM usually returns to the host when it cannot fully handle the hcall, so propagating may look like a useless path to follow from a KVM POV. From a QEMU POV, things are different: propagation will trigger the fallback code in QEMU, already working in non-KVM mode. Nothing more to be done. I have a better feeling about (2) because: - 2-line patch in KVM - no extra code change in QEMU - already *partially* tested. Also, I understood Rusty is working on the next virtio specification, which should address the endian issue: probably not worth adding too many temporary lines to the QEMU code... Of course, I probably lack some essential knowledge that would be more favorable to (1)... so please comment and argue! :) Thanks. -- Greg Kurz
RE: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()
-Original Message- From: Wood Scott-B07421 Sent: Thursday, October 03, 2013 12:04 AM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 1/2] kvm/powerpc: rename kvm_hypercall() to epapr_hypercall() On Wed, 2013-10-02 at 19:54 +0200, Alexander Graf wrote: On 02.10.2013, at 19:49, Scott Wood wrote: On Wed, 2013-10-02 at 19:46 +0200, Alexander Graf wrote: On 02.10.2013, at 19:42, Scott Wood wrote: On Wed, 2013-10-02 at 19:17 +0200, Alexander Graf wrote: On 02.10.2013, at 19:04, Scott Wood wrote: On Wed, 2013-10-02 at 18:53 +0200, Alexander Graf wrote: On 02.10.2013, at 18:40, Scott Wood wrote: On Wed, 2013-10-02 at 16:19 +0200, Alexander Graf wrote: Won't this break when CONFIG_EPAPR_PARAVIRT=n? We wouldn't have epapr_hcalls.S compiled into the code base then and the bl above would reference an unknown function. KVM_GUEST selects EPAPR_PARAVIRT. But you can not select KVM_GUEST and still call these inline functions, no? No. Like kvm_arch_para_features(). Where does that get called without KVM_GUEST? How would that work currently, with the call to kvm_hypercall() in arch/powerpc/kernel/kvm.c (which calls epapr_hypercall, BTW)? It wouldn't ever get called because kvm_hypercall() ends up always returning EV_UNIMPLEMENTED when #ifndef CONFIG_KVM_GUEST. OK, so the objection is to removing that stub? Where would we actually want to call this without knowing that KVM_GUEST or EPAPR_PARAVIRT are enabled? In probing code. I usually prefer if (kvm_feature_available(X)) { ... } over #ifdef CONFIG_KVM_GUEST if (kvm_feature_available(X)) { ... } #endif at least when I can avoid it. With the current code the compiler would be smart enough to just optimize out the complete branch. Sure. My point is, where would you be calling that where the entire file isn't predicated on (or selecting) CONFIG_KVM_GUEST or similar? 
We don't do these stubs for every single function in the kernel -- only ones where the above is a reasonable use case. Yeah, I'm fine with dropping it, but we need to make that a conscious decision and verify that no caller relies on it. kvm_para_has_feature() is called from arch/powerpc/kernel/kvm.c, arch/x86/kernel/kvm.c, and arch/x86/kernel/kvmclock.c, all of which are enabled by CONFIG_KVM_GUEST. I did find one example of kvm_para_available() being used in an unexpected place -- sound/pci/intel8x0.c. It defines its own non-CONFIG_KVM_GUEST stub, even though x86 defines kvm_para_available() using inline CPUID stuff which should work without CONFIG_KVM_GUEST. I'm not sure why it even needs to do that, though -- shouldn't the subsequent PCI subsystem vendor/device check be sufficient? No hypercalls are involved. That said, the possibility that some random driver might want to make use of paravirt features is a decent argument for keeping the stub. I am not sure what we are agreeing on. Do we want to remove the stub in arch/powerpc/include/asm/kvm_para.h, as there is no caller without KVM_GUEST, and in future ensure this is called only from code selected by KVM_GUEST? Or let the stub stay to avoid any random driver calling this? Thanks -Bharat