RE: [PATCH v5 4/4] KVM: PPC: epapr: Update other hypercall invoking
> -Original Message- > From: Wood Scott-B07421 > Sent: Wednesday, February 22, 2012 5:58 AM > To: Liu Yu-B13201 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@ozlabs.org; Wood Scott-B07421 > Subject: Re: [PATCH v5 4/4] KVM: PPC: epapr: Update other hypercall > invoking > > On 02/20/2012 10:46 PM, Liu Yu wrote: > > Discard the old way that invoke hypercall, instead, use epapr > > paravirt. > > > > Signed-off-by: Liu Yu > > --- > > v5: new patch > > > > arch/powerpc/include/asm/epapr_hcalls.h | 22 +- > > arch/powerpc/include/asm/fsl_hcalls.h | 36 +++ > --- > > 2 files changed, 29 insertions(+), 29 deletions(-) > > Make sure all the Topaz/ePAPR drivers that use this select EPAPR_PARAVIRT. > > Have you tested with Topaz? > Not yet test. Thanks, Yu
RE: [PATCH v5 1/4] KVM: PPC: epapr: Factor out the epapr init
> -Original Message- > From: Wood Scott-B07421 > Sent: Wednesday, February 22, 2012 5:56 AM > To: Liu Yu-B13201 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@ozlabs.org; Wood Scott-B07421 > Subject: Re: [PATCH v5 1/4] KVM: PPC: epapr: Factor out the epapr init > > On 02/20/2012 10:46 PM, Liu Yu wrote: > > from the kvm guest paravirt init code. > > > > Signed-off-by: Liu Yu > > --- > > v5: > > 1. fix the if test > > 2. use patch_instruction() > > 3. code cleanup > > 4. rename the files > > 5. make epapr paravirt user-selectable > > > > arch/powerpc/include/asm/epapr_hcalls.h |2 + > > arch/powerpc/kernel/Makefile|1 + > > arch/powerpc/kernel/epapr_hcalls.S | 25 ++ > > arch/powerpc/kernel/epapr_paravirt.c| 54 > +++ > > arch/powerpc/kernel/kvm.c | 28 ++-- > > arch/powerpc/kernel/kvm_emul.S | 10 -- > > arch/powerpc/platforms/Kconfig |7 > > 7 files changed, 92 insertions(+), 35 deletions(-) create mode > > 100644 arch/powerpc/kernel/epapr_hcalls.S > > create mode 100644 arch/powerpc/kernel/epapr_paravirt.c > > > > diff --git a/arch/powerpc/include/asm/epapr_hcalls.h > > b/arch/powerpc/include/asm/epapr_hcalls.h > > index f3b0c2c..0ff3f24 100644 > > --- a/arch/powerpc/include/asm/epapr_hcalls.h > > +++ b/arch/powerpc/include/asm/epapr_hcalls.h > > @@ -148,6 +148,8 @@ > > #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5" > > #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4" > > > > +extern bool epapr_para_enabled; > > +extern u32 epapr_hypercall_start[]; > > I asked for s/epapr_para/epapr_paravirt/, at least in anything that is > exposed beyond a single file. > > > /* > > * We use "uintptr_t" to define a register because it's guaranteed to > > be a diff --git a/arch/powerpc/kernel/Makefile > > b/arch/powerpc/kernel/Makefile index ee728e4..ba8fa43 100644 > > --- a/arch/powerpc/kernel/Makefile > > +++ b/arch/powerpc/kernel/Makefile > > @@ -136,6 +136,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) > > obj-y += ppc_save_regs.o > > endif > > > > +obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o > > obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o > > > > # Disable GCOV in odd or sensitive code diff --git > > a/arch/powerpc/kernel/epapr_hcalls.S > > b/arch/powerpc/kernel/epapr_hcalls.S > > new file mode 100644 > > index 000..697b390 > > --- /dev/null > > +++ b/arch/powerpc/kernel/epapr_hcalls.S > > @@ -0,0 +1,25 @@ > > +/* > > + * Copyright (C) 2012 Freescale Semiconductor, Inc. > > + * > > + * This program is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU General Public License > > + * as published by the Free Software Foundation; either version > > + * 2 of the License, or (at your option) any later version. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* Hypercall entry point. Will be patched with device tree > > +instructions. */ .global epapr_hypercall_start > > +epapr_hypercall_start: > > + li r3, -1 > > + nop > > + nop > > + nop > > + blr > > diff --git a/arch/powerpc/kernel/epapr_paravirt.c > > b/arch/powerpc/kernel/epapr_paravirt.c > > new file mode 100644 > > index 000..e601da7 > > --- /dev/null > > +++ b/arch/powerpc/kernel/epapr_paravirt.c > > @@ -0,0 +1,54 @@ > > +/* > > + * ePAPR para-virtualization support. > > + * > > + * This program is free software; you can redistribute it and/or > > +modify > > + * it under the terms of the GNU General Public License, version 2, > > +as > > + * published by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
RE: [PATCH v4 1/3] KVM: PPC: epapr: Factor out the epapr init
> -Original Message- > From: Wood Scott-B07421 > Sent: Friday, February 17, 2012 1:13 AM > To: Liu Yu-B13201 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@ozlabs.org; Wood Scott-B07421 > Subject: Re: [PATCH v4 1/3] KVM: PPC: epapr: Factor out the epapr init > > On 02/16/2012 03:26 AM, Liu Yu wrote: > > from the kvm guest paravirt init code. > > > > Signed-off-by: Liu Yu > > --- > > v4: > > 1. code cleanup > > 2. move kvm_hypercall_start() to epapr_hypercall_start() > > > > arch/powerpc/Kconfig|4 ++ > > arch/powerpc/include/asm/epapr_hcalls.h |2 + > > arch/powerpc/kernel/Makefile|1 + > > arch/powerpc/kernel/epapr.S | 25 > > arch/powerpc/kernel/epapr_para.c| 49 > +++ > > arch/powerpc/kernel/kvm.c | 28 ++ > > arch/powerpc/kernel/kvm_emul.S | 10 -- > > arch/powerpc/kvm/Kconfig|1 + > > 8 files changed, 85 insertions(+), 35 deletions(-) create mode > > 100644 arch/powerpc/kernel/epapr.S create mode 100644 > > arch/powerpc/kernel/epapr_para.c > > The comment about spelling out "paravirt" wasnn't meant to be restricted > to the kconfig symbol. There are lots of words that begin with "para", > and ePAPR isn't just about virtualization. What do you mean? Do you suggest that we should name it epapr_paravirt.c? Thanks, Yu
RE: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, February 16, 2012 6:20 PM > To: Liu Yu-B13201 > Cc: ; ; d...@ozlabs.org>; Wood Scott-B07421; Liu Yu-B13201 > Subject: Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for > host > > > > On 16.02.2012, at 10:26, Liu Yu wrote: > > > And add a new flag definition in kvm_ppc_pvinfo to indicate whether > > host support EV_IDLE hcall. > > > > Signed-off-by: Liu Yu > > --- > > v4: > > no change > > > > arch/powerpc/include/asm/kvm_para.h | 14 -- > > arch/powerpc/kvm/powerpc.c |8 > > include/linux/kvm.h |2 ++ > > 3 files changed, 22 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/kvm_para.h > > b/arch/powerpc/include/asm/kvm_para.h > > index 7b754e7..81a34c9 100644 > > --- a/arch/powerpc/include/asm/kvm_para.h > > +++ b/arch/powerpc/include/asm/kvm_para.h > > @@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared { }; > > > > #define KVM_SC_MAGIC_R00x4b564d21 /* "KVM!" */ > > -#define HC_VENDOR_KVM(42 << 16) > > + > > +#include > > + > > +/* ePAPR Hypercall Vendor ID */ > > +#define HC_VENDOR_EPAPR(EV_EPAPR_VENDOR_ID << 16) > > +#define HC_VENDOR_KVM(EV_KVM_VENDOR_ID << 16) > > + > > +/* ePAPR Hypercall Token */ > > +#define HC_EV_IDLEEV_IDLE > > + > > +/* ePAPR Hypercall Return Codes */ > > #define HC_EV_SUCCESS0 > > -#define HC_EV_UNIMPLEMENTED12 > > +#define HC_EV_UNIMPLEMENTEDEV_UNIMPLEMENTED > > > > #define KVM_FEATURE_MAGIC_PAGE1 > > > > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > > index 0e21d15..03ebd5d 100644 > > --- a/arch/powerpc/kvm/powerpc.c > > +++ b/arch/powerpc/kvm/powerpc.c > > @@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) > > > >/* Second return value is in r4 */ > >break; > > +case HC_VENDOR_EPAPR | HC_EV_IDLE: > > +r = HC_EV_SUCCESS; > > +kvm_vcpu_block(vcpu); > > +break; > >default: > >r = HC_EV_UNIMPLEMENTED; > >break; > > @@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct > kvm_ppc_pvinfo *pvinfo) > >pvinfo->hcall[2] = inst_sc; > >pvinfo->hcall[3] = inst_nop; > > > > +#ifdef CONFIG_BOOKE > > +pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE; #endif > > + > >return 0; > > }> > > Why limit it to booke? The less ifdefs our code has, the better :) The code here tells userspace that kvm support ev_idle. But I'm not sure if the ev_idle code works for other platforms. So I think we should keep the ifdef until other platform test the code :) Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init
> -Original Message- > From: Wood Scott-B07421 > Sent: Saturday, February 11, 2012 2:40 AM > To: Liu Yu-B13201 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@ozlabs.org; Wood Scott-B07421 > Subject: Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init > > On 02/10/2012 04:02 AM, Liu Yu wrote: > > from the kvm guest paravirt init code. > > > > Signed-off-by: Liu Yu > > --- > > v3: > > apply the epapr init for all ppc platform > > > > arch/powerpc/Kconfig|4 +++ > > arch/powerpc/include/asm/epapr_hcalls.h |8 + > > arch/powerpc/kernel/Makefile|1 + > > arch/powerpc/kernel/epapr_para.c| 46 > +++ > > arch/powerpc/kernel/kvm.c | 13 +++-- > > arch/powerpc/kvm/Kconfig|1 + > > 6 files changed, 64 insertions(+), 9 deletions(-) create mode 100644 > > arch/powerpc/kernel/epapr_para.c > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index > > 47682b6..00bd508 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -196,6 +196,10 @@ config EPAPR_BOOT > > Used to allow a board to specify it wants an ePAPR compliant > wrapper. > > default n > > > > +config EPAPR_PARA > > + bool > > + default n > > EPAPR_PARAVIRT > > > config DEFAULT_UIMAGE > > bool > > help > > diff --git a/arch/powerpc/include/asm/epapr_hcalls.h > > b/arch/powerpc/include/asm/epapr_hcalls.h > > index f3b0c2c..c4b86e4 100644 > > --- a/arch/powerpc/include/asm/epapr_hcalls.h > > +++ b/arch/powerpc/include/asm/epapr_hcalls.h > > @@ -148,6 +148,14 @@ > > #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5" > > #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4" > > > > +extern u32 *epapr_hcall_insts; > > +extern int epapr_hcall_insts_len; > > + > > +static inline void epapr_get_hcall_insts(u32 **instp, int *lenp) { > > + *instp = epapr_hcall_insts; > > + *lenp = epapr_hcall_insts_len; > > +} > > Why do we need a function for this? Why is the public interface anything > other than "invoke a hypercall"? > > > +static int __init epapr_para_init(void) { > > + struct device_node *hyper_node; > > + u32 *insts; > > + int len; > > + > > + hyper_node = of_find_node_by_path("/hypervisor"); > > + if (!hyper_node) > > + return -ENODEV; > > + > > + insts = (u32*)of_get_property(hyper_node, "hcall-instructions", > > +&len); > > Do not cast away that const. > > > @@ -535,18 +536,12 @@ EXPORT_SYMBOL_GPL(kvm_hypercall); static int > > kvm_para_setup(void) { > > extern u32 kvm_hypercall_start; > > - struct device_node *hyper_node; > > u32 *insts; > > int len, i; > > > > - hyper_node = of_find_node_by_path("/hypervisor"); > > - if (!hyper_node) > > - return -1; > > - > > - insts = (u32*)of_get_property(hyper_node, "hcall-instructions", > &len); > > - if (len % 4) > > - return -1; > > - if (len > (4 * 4)) > > + insts = epapr_hcall_insts; > > + len = epapr_hcall_insts_len; > > + if (insts == NULL) > > return -1; > > > > for (i = 0; i < (len / 4); i++) > > Why are you still doing the patching inside kvm.c? > Do you mean we should move kvm_hypercall_start() into epapr bit? Thanks, Yu
RE: [PATCH 2/2] KVM: PPC: epapr: Install ev_idle hcall for paravirt guest linux
> -Original Message- > From: Wood Scott-B07421 > Sent: Tuesday, January 03, 2012 2:23 AM > To: Liu Yu-B13201 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org > Subject: Re: [PATCH 2/2] KVM: PPC: epapr: Install ev_idle hcall for > paravirt guest linux > > On 12/31/2011 12:16 AM, Liu Yu wrote: > > If the guest hypervisor node contains "has-idle" property. > > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/kernel/kvm.c | 29 + > > 1 files changed, 29 insertions(+), 0 deletions(-) > > > > diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c > > index b06bdae..258f129 100644 > > --- a/arch/powerpc/kernel/kvm.c > > +++ b/arch/powerpc/kernel/kvm.c > > @@ -28,6 +28,7 @@ > > #include > > #include > > #include > > +#include > > > > #define KVM_MAGIC_PAGE (-4096L) > > #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct > > kvm_vcpu_arch_shared, x) @@ -571,6 +572,30 @@ static __init void > kvm_free_tmp(void) > > } > > } > > > > +static void kvm_hcall_ev_idle(void) > > +{ > > + ulong in[8]; > > + ulong out[8]; > > + > > + current_thread_info()->local_flags |= _TLF_NAPPING; > > + local_irq_enable(); > > + kvm_hypercall(in, out, HC_VENDOR_EPAPR | HC_EV_IDLE); } > > I think this has to be done in assembly -- otherwise it's not safe for > the interrupt handler to return directly to LR. > > > +static int kvm_para_ev_has_idle(void) { > > + struct device_node *hyper_node; > > + > > + hyper_node = of_find_node_by_path("/hypervisor"); > > + if (!hyper_node) > > + return 0; > > + > > + if (of_get_property(hyper_node, "has-idle", NULL)) > > + return 1; > > + > > + return 0; > > +} > > Since this is standardized in ePAPR, can we move it out of kvm.c an into > generic code to work with any hypervisor that sets has-idle? Likewise > with kvm_para_setup(). > Do you mean we scan and parse hypervisor node in early_init_devtree()? Thanks, Yu N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf
RE: [PATCH] KVM: PPC: E500: Support hugetlbfs
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Tuesday, September 20, 2011 7:36 AM > To: kvm-...@vger.kernel.org > Cc: kvm@vger.kernel.org > Subject: [PATCH] KVM: PPC: E500: Support hugetlbfs > > With hugetlbfs support emerging on e500, we should also support KVM > backing its guest memory by it. > > This patch adds support for hugetlbfs into the e500 shadow mmu code. > > Signed-off-by: Alexander Graf > --- > arch/powerpc/kvm/e500_tlb.c | 22 ++ > 1 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c > index ec17148..64f75eb 100644 > --- a/arch/powerpc/kvm/e500_tlb.c > +++ b/arch/powerpc/kvm/e500_tlb.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -673,13 +674,34 @@ static inline void > kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, > pfn &= ~(tsize_pages - 1); > break; > } > + } else if (vma && hva >= vma->vm_start && > + (vma->vm_flags & VM_HUGETLB)) { Why check (vma && hva >= vma->vm_start) twice? Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 17, 2010 7:34 PM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 > > > On 17.09.2010, at 13:28, Liu Yu-B13201 wrote: > > > > > > >> -Original Message- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, September 17, 2010 6:20 PM > >> To: Liu Yu-B13201 > >> Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > >> Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to > host TLB0 > >> > >> > >> On 17.09.2010, at 10:47, Liu Yu-B13201 wrote: > >> > >>> > >>> > >>>> -Original Message- > >>>> From: kvm-ppc-ow...@vger.kernel.org > >>>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of > Alexander Graf > >>>> Sent: Thursday, September 16, 2010 7:44 PM > >>>> To: Liu Yu-B13201 > >>>> Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > >>>> Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to > >> host TLB0 > >>>> > >>>>>> > >>>>>>> /* > >>>>>>> * writing shadow tlb entry to host TLB > >>>>>>> */ > >>>>>>> -static inline void __write_host_tlbe(struct tlbe *stlbe) > >>>>>>> +static inline void __host_tlbe_write(struct tlbe *stlbe) > >>>>>>> { > >>>>>>> mtspr(SPRN_MAS1, stlbe->mas1); > >>>>>>> mtspr(SPRN_MAS2, stlbe->mas2); > >>>>>>> @@ -109,25 +110,22 @@ static inline void > >>>>>>> > >>>>>> __write_host_tlbe(struct tlbe *stlbe) > >>>>>> > >>>>>>> __asm__ __volatile__ ("tlbwe\n" : : ); > >>>>>>> } > >>>>>>> > >>>>>>> -static inline void write_host_tlbe(struct kvmppc_vcpu_e500 > >>>>>>> > >>>>>> *vcpu_e500, > >>>>>> > >>>>>>> - int tlbsel, int esel, struct tlbe *stlbe) > >>>>>>> +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 > >>>>>>> > >>>>>> *vcpu_e500, > >>>>>> > >>>>>>> + u32 gvaddr, struct > >>>> tlbe *stlbe) > >>>>>>> { > >>>>>>> - local_irq_disable(); > >>>>>>> - if (tlbsel == 0) { > >>>>>>> - __write_host_tlbe(stlbe); > >>>>>>> - } else { > >>>>>>> - unsigned register mas0; > >>>>>>> + unsigned register mas0; > >>>>>>> > >>>>>>> - mas0 = mfspr(SPRN_MAS0); > >>>>>>> + local_irq_disable(); > >>>>>>> > >>>>>> Do you have to disable interrupts for a tlb write? If you get > >>>>>> preempted before the write, the entry you overwrite could be > >>>>>> different. But you don't protect against that either way. And > >>>>>> if you get preempted afterwards, you could lose the entry. > >>>>>> But since you enable interrupts beyond that, that could > >>>>>> happen either way too. > >>>>>> > >>>>>> So what's the reason for the disable here? > >>>>>> > >>>>>> > >>>>> > >>>>> Hello Alex, > >>>>> > >>>>> Doesn't local_irq_disable() also disable preempt? > >>>>> The reason to disable interrupts is because it's possible > >>>> to have tlb > >>>>> misses in interrupt service route. > >>>>> > >>>> > >>>> Yes, local_irq_disable disables preempt. That's exactly > what I was > >>>> referring to :). > >>>> I don't understand the statement about the service route. > >> A tlb miss > >>>> still arrives with local_irq_disable. > >>>> > >>>> > >>> > >>> Use local_irq_disable here is to make sure we update masN > in atomic. > >>> If get preempted when we update part of masN, > >>> then we may write wrong TLB entry in hardware. > >> > >> I see, makes sense. Doing the mfmsr of MAS0 in an irq > >> disabled section doesn't make sense to me though. > > > > Hardware choose the position and put TLB entry by itself. > > I need to get the position from MAS0. > > So that I know exactly which old entry is overrided. > > Huh? But you do the mfmsr before the tlb write? Or does MAS0 > contain the "next" tlb pointer? There's no mfmsr. The MAS0 contain the position we are going to write the entry in. > > > > >> Also, wouldn't it be better to do the irq disabling inside of > >> __host_tlbe_write using irqsave, not the explicit enable/disable? > >> > > > > Oh? What benefit we can get from this? > > Cleaner structure. If writing a tlb entry needs to happen > atomically, the function you call should be atomic. In your > case that's reasonably simple to do. Also, you could return > MAS0 in that function. That would make things a lot more > obvious. While you're at it, a comment explaining what the > function does doesn't hurt either :) > Sound reasonable.:) but why using irqsave? Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 17, 2010 6:20 PM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 > > > On 17.09.2010, at 10:47, Liu Yu-B13201 wrote: > > > > > > >> -Original Message- > >> From: kvm-ppc-ow...@vger.kernel.org > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > >> Sent: Thursday, September 16, 2010 7:44 PM > >> To: Liu Yu-B13201 > >> Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > >> Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to > host TLB0 > >> > >>>> > >>>>> /* > >>>>> * writing shadow tlb entry to host TLB > >>>>> */ > >>>>> -static inline void __write_host_tlbe(struct tlbe *stlbe) > >>>>> +static inline void __host_tlbe_write(struct tlbe *stlbe) > >>>>> { > >>>>> mtspr(SPRN_MAS1, stlbe->mas1); > >>>>> mtspr(SPRN_MAS2, stlbe->mas2); > >>>>> @@ -109,25 +110,22 @@ static inline void > >>>>> > >>>> __write_host_tlbe(struct tlbe *stlbe) > >>>> > >>>>> __asm__ __volatile__ ("tlbwe\n" : : ); > >>>>> } > >>>>> > >>>>> -static inline void write_host_tlbe(struct kvmppc_vcpu_e500 > >>>>> > >>>> *vcpu_e500, > >>>> > >>>>> - int tlbsel, int esel, struct tlbe *stlbe) > >>>>> +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 > >>>>> > >>>> *vcpu_e500, > >>>> > >>>>> + u32 gvaddr, struct > >> tlbe *stlbe) > >>>>> { > >>>>> - local_irq_disable(); > >>>>> - if (tlbsel == 0) { > >>>>> - __write_host_tlbe(stlbe); > >>>>> - } else { > >>>>> - unsigned register mas0; > >>>>> + unsigned register mas0; > >>>>> > >>>>> - mas0 = mfspr(SPRN_MAS0); > >>>>> + local_irq_disable(); > >>>>> > >>>> Do you have to disable interrupts for a tlb write? If you get > >>>> preempted before the write, the entry you overwrite could be > >>>> different. But you don't protect against that either way. And > >>>> if you get preempted afterwards, you could lose the entry. > >>>> But since you enable interrupts beyond that, that could > >>>> happen either way too. > >>>> > >>>> So what's the reason for the disable here? > >>>> > >>>> > >>> > >>> Hello Alex, > >>> > >>> Doesn't local_irq_disable() also disable preempt? > >>> The reason to disable interrupts is because it's possible > >> to have tlb > >>> misses in interrupt service route. > >>> > >> > >> Yes, local_irq_disable disables preempt. That's exactly what I was > >> referring to :). > >> I don't understand the statement about the service route. > A tlb miss > >> still arrives with local_irq_disable. > >> > >> > > > > Use local_irq_disable here is to make sure we update masN in atomic. > > If get preempted when we update part of masN, > > then we may write wrong TLB entry in hardware. > > I see, makes sense. Doing the mfmsr of MAS0 in an irq > disabled section doesn't make sense to me though. Hardware choose the position and put TLB entry by itself. I need to get the position from MAS0. So that I know exactly which old entry is overrided. > Also, wouldn't it be better to do the irq disabling inside of > __host_tlbe_write using irqsave, not the explicit enable/disable? > Oh? What benefit we can get from this? Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Thursday, September 16, 2010 7:44 PM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 > > >> > >>> /* > >>> * writing shadow tlb entry to host TLB > >>> */ > >>> -static inline void __write_host_tlbe(struct tlbe *stlbe) > >>> +static inline void __host_tlbe_write(struct tlbe *stlbe) > >>> { > >>> mtspr(SPRN_MAS1, stlbe->mas1); > >>> mtspr(SPRN_MAS2, stlbe->mas2); > >>> @@ -109,25 +110,22 @@ static inline void > >>> > >> __write_host_tlbe(struct tlbe *stlbe) > >> > >>> __asm__ __volatile__ ("tlbwe\n" : : ); > >>> } > >>> > >>> -static inline void write_host_tlbe(struct kvmppc_vcpu_e500 > >>> > >> *vcpu_e500, > >> > >>> - int tlbsel, int esel, struct tlbe *stlbe) > >>> +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 > >>> > >> *vcpu_e500, > >> > >>> + u32 gvaddr, struct > tlbe *stlbe) > >>> { > >>> - local_irq_disable(); > >>> - if (tlbsel == 0) { > >>> - __write_host_tlbe(stlbe); > >>> - } else { > >>> - unsigned register mas0; > >>> + unsigned register mas0; > >>> > >>> - mas0 = mfspr(SPRN_MAS0); > >>> + local_irq_disable(); > >>> > >> Do you have to disable interrupts for a tlb write? If you get > >> preempted before the write, the entry you overwrite could be > >> different. But you don't protect against that either way. And > >> if you get preempted afterwards, you could lose the entry. > >> But since you enable interrupts beyond that, that could > >> happen either way too. > >> > >> So what's the reason for the disable here? > >> > >> > > > > Hello Alex, > > > > Doesn't local_irq_disable() also disable preempt? > > The reason to disable interrupts is because it's possible > to have tlb > > misses in interrupt service route. > > > > Yes, local_irq_disable disables preempt. That's exactly what I was > referring to :). > I don't understand the statement about the service route. A tlb miss > still arrives with local_irq_disable. > > Use local_irq_disable here is to make sure we update masN in atomic. If get preempted when we update part of masN, then we may write wrong TLB entry in hardware. And local_irq_disable blocks external and decrement interrupts. Althought it's unlikely to have tlb miss in external and decrement interrrupt handler of current kernel. It still has the possibility.. Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 10, 2010 7:24 AM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 > > > On 08.09.2010, at 11:40, Liu Yu wrote: > > > Current guest TLB1 is mapped to host TLB1. > > As host kernel only provides 4K uncontinuous pages, > > we have to break guest large mapping into 4K shadow mappings. > > These 4K shadow mappings are then mapped into host TLB1 on fly. > > As host TLB1 only has 13 free entries, there's serious tlb miss. > > > > Since e500v2 has a big number of TLB0 entries, > > it should be help to map those 4K shadow mappings to host TLB0. > > To achieve this, we need to unlink guest tlb and host tlb, > > So that guest TLB1 mappings can route to any host TLB0 > entries freely. > > > > Pages/mappings are considerred in the same kind as host tlb entry. > > This patch remove the link between pages and guest tlb > entry to do the unlink. > > And keep host_tlb0_ref in each vcpu to trace pages. > > Then it's easy to map guest TLB1 to host TLB0. > > > > In guest ramdisk boot test(guest mainly uses TLB1), > > with this patch, the tlb miss number get down 90%. > > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/include/asm/kvm_e500.h |7 +- > > arch/powerpc/kvm/e500.c |4 + > > arch/powerpc/kvm/e500_tlb.c | 280 > --- > > arch/powerpc/kvm/e500_tlb.h |1 + > > 4 files changed, 104 insertions(+), 188 deletions(-) > > > > > > > -static unsigned int tlb1_entry_num; > > +static inline unsigned int get_tlb0_entry_offset(u32 > eaddr, u32 esel) > > +{ > > + return ((eaddr & 0x7F000) >> (12 - host_tlb0_assoc_bit) | > > + (esel & (host_tlb0_assoc - 1))); > > +} > > > > void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu) > > { > > @@ -62,11 +68,6 @@ static inline unsigned int tlb0_get_next_victim( > > return victim; > > } > > > > -static inline unsigned int tlb1_max_shadow_size(void) > > -{ > > - return tlb1_entry_num - tlbcam_index; > > -} > > - > > static inline int tlbe_is_writable(struct tlbe *tlbe) > > { > > return tlbe->mas3 & (MAS3_SW|MAS3_UW); > > @@ -100,7 +101,7 @@ static inline u32 > e500_shadow_mas2_attrib(u32 mas2, int usermode) > > /* > > * writing shadow tlb entry to host TLB > > */ > > -static inline void __write_host_tlbe(struct tlbe *stlbe) > > +static inline void __host_tlbe_write(struct tlbe *stlbe) > > { > > mtspr(SPRN_MAS1, stlbe->mas1); > > mtspr(SPRN_MAS2, stlbe->mas2); > > @@ -109,25 +110,22 @@ static inline void > __write_host_tlbe(struct tlbe *stlbe) > > __asm__ __volatile__ ("tlbwe\n" : : ); > > } > > > > -static inline void write_host_tlbe(struct kvmppc_vcpu_e500 > *vcpu_e500, > > - int tlbsel, int esel, struct tlbe *stlbe) > > +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 > *vcpu_e500, > > + u32 gvaddr, struct tlbe *stlbe) > > { > > - local_irq_disable(); > > - if (tlbsel == 0) { > > - __write_host_tlbe(stlbe); > > - } else { > > - unsigned register mas0; > > + unsigned register mas0; > > > > - mas0 = mfspr(SPRN_MAS0); > > + local_irq_disable(); > > Do you have to disable interrupts for a tlb write? If you get > preempted before the write, the entry you overwrite could be > different. But you don't protect against that either way. And > if you get preempted afterwards, you could lose the entry. > But since you enable interrupts beyond that, that could > happen either way too. > > So what's the reason for the disable here? > Hello Alex, Doesn't local_irq_disable() also disable preempt? The reason to disable interrupts is because it's possible to have tlb misses in interrupt service route. Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/2] kvm/e500v2: MMU optimization
> -Original Message- > From: Hollis Blanchard [mailto:hollis_blanch...@mentor.com] > Sent: Friday, September 10, 2010 12:23 AM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org; ag...@suse.de > Subject: Re: [PATCH 0/2] kvm/e500v2: MMU optimization > > >> > >> > > Hi Hollis, > > > > Guest uses AS=1 and host uses AS=0, > > so even guest uses the same TID with host, they're in > different space. > > > > Then why guest needs to care about host TID? > > > > > You're absolutely right, but this makes a couple key assumptions: > 1. The guest doesn't try to use AS=1. This is already false in Linux, > because the udbg code uses an AS=1 mapping for the UART, but > this can be > configured out (with a small loss in functionality). In > non-Linux guests > the AS=0 restriction could be onerous. We could map (guest AS, guest TID) to (shadow TID), So that we still don't need to bother host. > 2. A Book E MMU. If we participate in the host "MMU context" > allocation, > the guest -> host address space code could be generalized to include > e.g. an e600 guest on an e500 host, or vice versa. > Hmm.. Not sure it's a real requirement. Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/2] kvm/e500v2: MMU optimization
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard > Sent: Thursday, September 09, 2010 12:07 AM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org; ag...@suse.de > Subject: Re: [PATCH 0/2] kvm/e500v2: MMU optimization > > On 09/08/2010 02:40 AM, Liu Yu wrote: > > The patchset aims at mapping guest TLB1 to host TLB0. > > And it includes: > > [PATCH 1/2] kvm/e500v2: Remove shadow tlb > > [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0 > > > > The reason we need patch 1 is because patch 1 make things > simple and flexible. > > Only applying patch 1 aslo make kvm work. > > I've always thought the best long-term "optimization" on > these cores is > to share in the host PID allocation (i.e. __init_new_context()). This > way, the TID in guest mappings would not overlap the TID in host > mappings, and guest mappings could be demand-faulted rather > than swapped > wholesale. To do that, you would need to track the host PID > in KVM data > structures, I guess in the tlbe_ref structure. > Hi Hollis, Guest uses AS=1 and host uses AS=0, so even guest uses the same TID with host, they're in different space. Then why guest needs to care about host TID? Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] kvm/e500v2: Remove shadow tlb
> -Original Message- > From: Hollis Blanchard [mailto:hollis_blanch...@mentor.com] > Sent: Thursday, September 09, 2010 12:07 AM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org; ag...@suse.de > Subject: Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb > > On 09/08/2010 02:40 AM, Liu Yu wrote: > > It is unnecessary to keep shadow tlb. > > first, shadow tlb keep fixed value in shadow, which make > things unflexible. > > second, remove shadow tlb can save a lot memory. > > > > This patch remove shadow tlb and caculate the shadow tlb entry value > > before we write it to hardware. > > > > Also we use new struct tlbe_ref to trace the relation > > between guest tlb entry and page. > > Did you look at the performance impact? > > Back in the day, we did essentially the same thing on 440. However, > rather than discard the whole TLB when context switching away > from the > host (to be demand-faulted when the guest is resumed), we found a > noticeable performance improvement by preserving a shadow TLB across > context switches. We only use it in the vcpu_put/vcpu_load path. > > Of course, our TLB was much smaller (64 entries), so the use > model may > not be the same at all (e.g. it takes longer to restore a > full guest TLB > working set, but maybe it's not really possible to use all 1024 TLB0 > entries in one timeslice anyways). > Yes, it's hard to resume TLB0. We only resume TLB1 in previous code. But TLB1 is even more smaller (13 free entries) than 440, So that it still has little possibility to get hit. thus the resumption is useless. And we plan to use shadow PID to minimize the TLB invalidation. Then we don't need to invalidate TLB when guest schedule out. So that we don't need to resume TLB entries. But thit method require dynamic TID in shadow TLB, So we wouldn't like to keep shadow TLB anyway. Thanks, Yu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 14/27] KVM: PPC: Magic Page BookE support
> -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Thursday, July 01, 2010 6:43 PM > To: kvm-...@vger.kernel.org > Cc: KVM list; linuxppc-dev > Subject: [PATCH 14/27] KVM: PPC: Magic Page BookE support > > As we now have Book3s support for the magic page, we also > need BookE to > join in on the party. > > This patch implements generic magic page logic for BookE and specific > TLB logic for e500. I didn't have any 440 around, so I didn't dare to > blindly try and write up broken code. > > Signed-off-by: Alexander Graf > --- > arch/powerpc/kvm/booke.c| 29 + > arch/powerpc/kvm/e500_tlb.c | 19 +-- > 2 files changed, 46 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index 0f8ff9d..9609207 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -380,6 +406,9 @@ int kvmppc_handle_exit(struct kvm_run > *run, struct kvm_vcpu *vcpu, > gpa_t gpaddr; > gfn_t gfn; > > + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) > + break; > + > /* Check the guest TLB. */ > gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); > if (gtlb_index < 0) { How about moving this part into tlb search fail path? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c
> -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Friday, June 04, 2010 10:08 PM > To: Kevin Hao > Cc: Marcelo Tosatti; Kumar Gala; Avi Kivity; > kvm@vger.kernel.org list; linuxppc-...@ozlabs.org; Liu Yu-B13201 > Subject: Re: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c > > > On 03.06.2010, at 07:52, Kevin Hao wrote: > > > We use the wrong number arguments when invoking > trace_kvm_stlb_inval, > > and cause the following build error. > > arch/powerpc/kvm/e500_tlb.c: In function > 'kvmppc_e500_stlbe_invalidate': > > arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments > to function 'trace_kvm_stlb_inval' > > Liu, I'd like to get an ack from you here. > Seems commit e43f2f741a49483034bf968841275cfa553a4cb3 has solved this. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
> -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Zhang, Yanmin > Sent: Monday, April 19, 2010 1:33 PM > To: Avi Kivity > Cc: Ingo Molnar; Peter Zijlstra; Avi Kivity; Sheng Yang; > linux-ker...@vger.kernel.org; kvm@vger.kernel.org; Marcelo > Tosatti; oerg Roedel; Jes Sorensen; Gleb Natapov; Zachary > Amsden; zhiteng.hu...@intel.com; tim.c.c...@intel.com; > Arnaldo Carvalho de Melo > Subject: [PATCH V5 1/3] perf & kvm: Enhance perf to collect > KVM guest os statistics from host side > > Below patch introduces perf_guest_info_callbacks and related > register/unregister > functions. Add more PERF_RECORD_MISC_XXX bits meaning guest > kernel and guest user > space. > > Signed-off-by: Zhang Yanmin > > --- > diff -Nraup --exclude-from=exclude.diff > linux-2.6_tip0417/include/linux/perf_event.h > linux-2.6_tip0417_perfkvm/include/linux/perf_event.h > --- linux-2.6_tip0417/include/linux/perf_event.h > 2010-04-19 09:51:59.544791000 +0800 > +++ linux-2.6_tip0417_perfkvm/include/linux/perf_event.h > 2010-04-19 09:53:59.691378953 +0800 > @@ -932,6 +940,12 @@ static inline void perf_event_mmap(struc > __perf_event_mmap(vma); > } > > +extern struct perf_guest_info_callbacks *perf_guest_cbs; > +extern int perf_register_guest_info_callbacks( > + struct perf_guest_info_callbacks *); > +extern int perf_unregister_guest_info_callbacks( > + struct perf_guest_info_callbacks *); > + > extern void perf_event_comm(struct task_struct *tsk); > extern void perf_event_fork(struct task_struct *tsk); > > @@ -1001,6 +1015,11 @@ perf_sw_event(u32 event_id, u64 nr, int > static inline void > perf_bp_event(struct perf_event *event, void *data) > { } > > +static inline int perf_register_guest_info_callbacks > +(struct perf_guest_info_callbacks *) {return 0; } > +static inline int perf_unregister_guest_info_callbacks > +(struct perf_guest_info_callbacks *) {return 0; } > + > static inline void perf_event_mmap(struct vm_area_struct > *vma) { } > static inline void perf_event_comm(struct task_struct *tsk) > { } > static inline void perf_event_fork(struct task_struct *tsk) > { } Hi, I met this error when built kernel. Anything wrong? CC init/main.o In file included from include/linux/ftrace_event.h:8, from include/trace/syscall.h:6, from include/linux/syscalls.h:75, from init/main.c:16: include/linux/perf_event.h: In function 'perf_register_guest_info_callbacks': include/linux/perf_event.h:1019: error: parameter name omitted include/linux/perf_event.h: In function 'perf_unregister_guest_info_callbacks': include/linux/perf_event.h:1021: error: parameter name omitted make[1]: *** [init/main.o] Error 1 make: *** [init] Error 2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] KVM: PPC: E500 compile fix
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, March 26, 2010 5:10 PM > To: Bruce Majia > Cc: Avi Kivity; kvm@vger.kernel.org; kvm-ppc; Liu Yu-B13201 > Subject: Re: [PATCH] KVM: PPC: E500 compile fix > > > On 26.03.2010, at 08:15, Bruce Majia wrote: > > > On Sun, Jan 17, 2010 at 03:06:11PM +0200, Avi Kivity wrote: > >> On 01/10/2010 07:01 PM, Alexander Graf wrote: > >>> While trying to compile an E500 vmlinux, I stumbled > across a compilation bug > >>> that was obviously there before I touched any of the > code. A trace point > >>> doesn't get the correct arguments. > >>> > >>> Since that shouldn't be any critical to the functionality > of the code, my quick > >>> workaround is to #if 0 it out. I would very much > appreciate someone fixing it > >>> properly though. > >>> > >>> Liu, it would be nice if you could be the one doing that. > >>> > >> > >> Applied, thanks. > > > > Hi Avi, > > > > Seems the problem still exists in the mainline kernel. Are > you guys miss > > something in the past few months? Does it means even if I > passed compile > > phase with the '#if 0' workaround, it's functions are not > completed yet. > > So its impossible to try kvm on ppc e500 target? > > If #if 0 only affects you if you want to trace that specific > function. It's probably safe to say you don't care, so just > #if 0 it out and be good :-). > The trace info here is not that important. And will be removed in future patches... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 4/4] kvmppc/booke: exit_nr fixup for guest debug single step
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Wednesday, February 03, 2010 6:14 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org > Subject: Re: [PATCH 4/4] kvmppc/booke: exit_nr fixup for > guest debug single step > > Liu Yu-B13201 wrote: > > > > > > > >> -Original Message- > >> From: kvm-ppc-ow...@vger.kernel.org > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > >> Sent: Wednesday, February 03, 2010 5:03 PM > >> To: Liu Yu-B13201 > >> Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > >> kvm@vger.kernel.org; Liu Yu-B13201 > >> Subject: Re: [PATCH 4/4] kvmppc/booke: exit_nr fixup for > >> guest debug single step > >> > >> > >> Am 03.02.2010 um 08:53 schrieb Liu Yu : > >> > >> > >>> As BOOKE doesn't have hardware support for virtualization, > >>> hardware never know who's guest and host. > >>> > >>> When enable hardware single step in guest, > >>> we cannot disabled it at the point we switch back to host. > >>> > >> Why not? We directly arrive in our code. So we can just > >> disable it, no? > >> > >> Or does that break when you'd try to debug the guest > >> interrupt handlers? > >> > > > > That's the hardware limitition. > > Assume received itlb miss interrupt, but it doesn't clear > MSR_DE in MSR, > > so on the exit path single step still work and then debug > interrupt is > > triggled. > > > > MSRDE is set to 0 by critical class interrupts > unless Category E.ED is supported, by Debug > interrupts, and by Machine Check interrupts, > and is left unchanged by all other interrupts. > > Great. > > So when single stepping is enabled, you jump into the guest, > get an itlb > miss, get out, still have DE set, get in KVM's own DE handler and can > process things from there. > > Could you check if the debug instruction was on PR=0? If so, you can > just rfi and be good, right? > Hr? The moment we found this happen we've already saved the guest and loaded host on exit path Rfi will make exit path again which means save guest again. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/4] kvmppc: guest debug definitions
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Wednesday, February 03, 2010 5:51 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org > Subject: Re: [PATCH 1/4] kvmppc: guest debug definitions > > Liu Yu-B13201 wrote: > > > > > > > >> -Original Message- > >> From: kvm-ppc-ow...@vger.kernel.org > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > >> Sent: Wednesday, February 03, 2010 4:57 PM > >> To: Liu Yu-B13201 > >> Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > >> kvm@vger.kernel.org; Liu Yu-B13201 > >> Subject: Re: [PATCH 1/4] kvmppc: guest debug definitions > >> > >> > >> Am 03.02.2010 um 08:53 schrieb Liu Yu : > >> > >> > >>> Signed-off-by: Liu Yu > >>> --- > >>> arch/powerpc/include/asm/kvm.h | 20 > >>> arch/powerpc/include/asm/kvm_host.h | 16 > >>> 2 files changed, 36 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/arch/powerpc/include/asm/kvm.h > b/arch/powerpc/include/ > >>> asm/kvm.h > >>> index 81f3b0b..b7f7861 100644 > >>> --- a/arch/powerpc/include/asm/kvm.h > >>> +++ b/arch/powerpc/include/asm/kvm.h > >>> @@ -22,6 +22,9 @@ > >>> > >>> #include > >>> > >>> +/* Select powerpc specific features in */ > >>> +#define __KVM_HAVE_GUEST_DEBUG > >>> + > >>> struct kvm_regs { > >>>__u64 pc; > >>>__u64 cr; > >>> @@ -71,10 +74,27 @@ struct kvm_fpu { > >>> }; > >>> > >>> struct kvm_debug_exit_arch { > >>> +__u32 exception; > >>> +__u32 pc; > >>> +__u32 status; > >>> }; > >>> > >>> +#define KVM_INST_GUESTGDB 0x4422 > >>> > >> What instruction is this again? :) Is it something reserved for > >> purposes like this? > >> > >> > > > > Just an invalid instruction which can generate program interrupt... > > I'm open to it's value btw. > > > > Well this definitely doesn't generate a program interrupt. Or at least > it shouldn't :-). > I just remembered where I've seen an opcode like this before. > This is a > part of a dump of arch/powerpc/boot/ps3-hvcall.o > > : >0: 7c 08 02 a6 mflrr0 >4: 90 01 00 04 stw r0,4(r1) >8: 94 21 ff f0 stwur1,-16(r1) >c: 90 61 00 08 stw r3,8(r1) > 10: 39 60 00 45 li r11,69 > 14: 44 00 00 22 sc 1 > > So as you can see, this is the hypercall instruction for lv1. > IIRC beat > uses the same. I don't think we want to reuse that opcode for > ourselves. > Maybe one day someone figures it's a good idea to implement a > beat-style > ABI in KVM. > > But IIRC sc can take a lot of values, so we can just take sc 0x1234 or > so :-). > > >>> + > >>> +#define KVM_GUESTDBG_USE_SW_BP 0x0001 > >>> +#define KVM_GUESTDBG_USE_HW_BP 0x0002 > >>> + > >>> +#define KVMPPC_DEBUG_NOTYPE 0x0 > >>> +#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1) > >>> +#define KVMPPC_DEBUG_WATCH_WRITE(1UL << 2) > >>> +#define KVMPPC_DEBUG_WATCH_READ (1UL << 3) > >>> + > >>> /* for KVM_SET_GUEST_DEBUG */ > >>> struct kvm_guest_debug_arch { > >>> +struct { > >>> +__u32 addr; > >>> +__u32 type; > >>> +} bp[6]; > >>> > >> I can't look up the sources right now. Is this a struct that > >> 1:1 maps > >> to an ioctl struct? If so, we should add padding for a > >> possible future > >> extension of debug registers. > >> > > > > Yes it's used by ioctl. > > What's the usually pad size? > > > > I don't think there's a default. I just tend to pad it to something > reasonable. I guess in this case we can even just extend bp to 128 > entries, add a reasonable amount of churn to the debug info > and be good: > > struct kvm_guest_debug_arch { > struct { > __u64 addr; > __u32 type; > __u32 pad1; > __u64 pad2; > } bp[128]; > } > Software breakpoint is maintained by qemu. Here it's only used by hardware breakpoint/watchpoint Is 128 much too large? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 4/4] kvmppc/booke: exit_nr fixup for guest debug single step
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Wednesday, February 03, 2010 5:03 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org; Liu Yu-B13201 > Subject: Re: [PATCH 4/4] kvmppc/booke: exit_nr fixup for > guest debug single step > > > Am 03.02.2010 um 08:53 schrieb Liu Yu : > > > As BOOKE doesn't have hardware support for virtualization, > > hardware never know who's guest and host. > > > > When enable hardware single step in guest, > > we cannot disabled it at the point we switch back to host. > > Why not? We directly arrive in our code. So we can just > disable it, no? > > Or does that break when you'd try to debug the guest > interrupt handlers? That's the hardware limitition. Assume received itlb miss interrupt, but it doesn't clear MSR_DE in MSR, so on the exit path single step still work and then debug interrupt is triggled. > > > Thus, we'll see that an single step interrupt happens at > > the beginning of guest exit path. > > > > Then we need to recognize this kind of single step interrupt > > and fix the exit_nr to the original value. > > So that everything looks like normal. > > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/kvm/booke.c| 82 > ++ > > + > > arch/powerpc/kvm/booke_interrupts.S |9 ++-- > > 2 files changed, 87 insertions(+), 4 deletions(-) > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > index ec2722d..9056708 100644 > > --- a/arch/powerpc/kvm/booke.c > > +++ b/arch/powerpc/kvm/booke.c > > @@ -24,6 +24,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -34,6 +35,8 @@ > > #include "booke.h" > > > > unsigned long kvmppc_booke_handlers; > > +unsigned long kvmppc_booke_handler_addr[16]; > > +#define handler_vector_num > (sizeof(kvmppc_booke_handler_addr)/sizeof > > (kvmppc_booke_handler_addr[0])) > > > > #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM > > #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), > KVM_STAT_VCPU > > @@ -214,6 +217,80 @@ void kvmppc_core_deliver_interrupts(struct > > kvm_vcpu *vcpu) > >} > > } > > > > +int kvmppc_read_guest(struct kvm_vcpu *vcpu, unsigned long geaddr, > > + void *data, int len) > > Ah, nice. I have something similar in book3s.c. IIRC it's called > kvmppc_ld. > > I think we should make the semantics identical and declare it as > common kvmppc_core function. > Cool. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/4] kvmppc: guest debug definitions
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Wednesday, February 03, 2010 4:57 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org; Liu Yu-B13201 > Subject: Re: [PATCH 1/4] kvmppc: guest debug definitions > > > Am 03.02.2010 um 08:53 schrieb Liu Yu : > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/include/asm/kvm.h | 20 > > arch/powerpc/include/asm/kvm_host.h | 16 > > 2 files changed, 36 insertions(+), 0 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/ > > asm/kvm.h > > index 81f3b0b..b7f7861 100644 > > --- a/arch/powerpc/include/asm/kvm.h > > +++ b/arch/powerpc/include/asm/kvm.h > > @@ -22,6 +22,9 @@ > > > > #include > > > > +/* Select powerpc specific features in */ > > +#define __KVM_HAVE_GUEST_DEBUG > > + > > struct kvm_regs { > >__u64 pc; > >__u64 cr; > > @@ -71,10 +74,27 @@ struct kvm_fpu { > > }; > > > > struct kvm_debug_exit_arch { > > +__u32 exception; > > +__u32 pc; > > +__u32 status; > > }; > > > > +#define KVM_INST_GUESTGDB 0x4422 > > What instruction is this again? :) Is it something reserved for > purposes like this? > Just an invalid instruction which can generate program interrupt... I'm open to it's value btw. > > > + > > +#define KVM_GUESTDBG_USE_SW_BP 0x0001 > > +#define KVM_GUESTDBG_USE_HW_BP 0x0002 > > + > > +#define KVMPPC_DEBUG_NOTYPE 0x0 > > +#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1) > > +#define KVMPPC_DEBUG_WATCH_WRITE(1UL << 2) > > +#define KVMPPC_DEBUG_WATCH_READ (1UL << 3) > > + > > /* for KVM_SET_GUEST_DEBUG */ > > struct kvm_guest_debug_arch { > > +struct { > > +__u32 addr; > > +__u32 type; > > +} bp[6]; > > I can't look up the sources right now. Is this a struct that > 1:1 maps > to an ioctl struct? If so, we should add padding for a > possible future > extension of debug registers. Yes it's used by ioctl. What's the usually pad size? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] kvmppc/booke: Set ESR and DEAR when inject interrupt to guest
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, January 22, 2010 7:33 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org > Subject: Re: [PATCH] kvmppc/booke: Set ESR and DEAR when > inject interrupt to guest > > > On 22.01.2010, at 12:27, Liu Yu-B13201 wrote: > > > > > > >> -Original Message- > >> From: kvm-ppc-ow...@vger.kernel.org > >> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > >> Sent: Friday, January 22, 2010 7:13 PM > >> To: Liu Yu-B13201 > >> Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > >> kvm@vger.kernel.org > >> Subject: Re: [PATCH] kvmppc/booke: Set ESR and DEAR when > >> inject interrupt to guest > >> > >> > >> On 22.01.2010, at 11:54, Liu Yu wrote: > >> > >>> Old method prematurely sets ESR and DEAR. > >>> Move this part after we decide to inject interrupt, > >>> and make it more like hardware behave. > >>> > >>> Signed-off-by: Liu Yu > >>> --- > >>> arch/powerpc/kvm/booke.c | 24 ++-- > >>> arch/powerpc/kvm/emulate.c |2 -- > >>> 2 files changed, 14 insertions(+), 12 deletions(-) > >>> > >>> @@ -286,15 +295,12 @@ int kvmppc_handle_exit(struct kvm_run > >> *run, struct kvm_vcpu *vcpu, > >>> break; > >>> > >>> case BOOKE_INTERRUPT_DATA_STORAGE: > >>> - vcpu->arch.dear = vcpu->arch.fault_dear; > >>> - vcpu->arch.esr = vcpu->arch.fault_esr; > >>> kvmppc_booke_queue_irqprio(vcpu, > >> BOOKE_IRQPRIO_DATA_STORAGE); > >> > >> kvmppc_booke_queue_data_storage(vcpu, vcpu->arch.fault_esr, > >> vcpu->arch.fault_dear); > >> > >>> kvmppc_account_exit(vcpu, DSI_EXITS); > >>> r = RESUME_GUEST; > >>> break; > >>> > >>> case BOOKE_INTERRUPT_INST_STORAGE: > >>> - vcpu->arch.esr = vcpu->arch.fault_esr; > >>> kvmppc_booke_queue_irqprio(vcpu, > >> BOOKE_IRQPRIO_INST_STORAGE); > >> > >> kvmppc_booke_queue_inst_storage(vcpu, vcpu->arch.fault_esr); > >> > > > > Not sure if this is redundant, as we already have fault_esr. > > Or should we ignore what hareware is and create a new esr to guest? > > On Book3S I take the SRR1 we get from the host as > "inspiration" of what to pass to the guest as SRR1. I think > we should definitely be able to inject a fault that we didn't > get in that exact form from the exit path. > > I'm also not sure if something could clobber fault_esr if > another interrupt takes precedence. Say a #MC. No as far as I know. And if yes, the clobber could as well happen before we copy it. Hollis, what do you think we should do here? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] kvmppc/booke: Set ESR and DEAR when inject interrupt to guest
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Friday, January 22, 2010 7:13 PM > To: Liu Yu-B13201 > Cc: hol...@penguinppc.org; kvm-...@vger.kernel.org; > kvm@vger.kernel.org > Subject: Re: [PATCH] kvmppc/booke: Set ESR and DEAR when > inject interrupt to guest > > > On 22.01.2010, at 11:54, Liu Yu wrote: > > > Old method prematurely sets ESR and DEAR. > > Move this part after we decide to inject interrupt, > > and make it more like hardware behave. > > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/kvm/booke.c | 24 ++-- > > arch/powerpc/kvm/emulate.c |2 -- > > 2 files changed, 14 insertions(+), 12 deletions(-) > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > index e283e44..a8ee148 100644 > > --- a/arch/powerpc/kvm/booke.c > > +++ b/arch/powerpc/kvm/booke.c > > @@ -84,7 +84,8 @@ static void > kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, > > > > void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags) > > { > > - /* BookE does flags in ESR, so ignore those we get here */ > > + if (flags == SRR1_PROGTRAP) > > + vcpu->arch.fault_esr = ESR_PTR; > > This looks odd. I was more thinking of "flags" being ESR in > this context. So you'd save off flags to a field in the vcpu > here and restore it later when the fault actually gets injected. See the end. > > > kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM); > > } > > > > @@ -115,14 +116,19 @@ static int > kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, > > { > > int allowed = 0; > > ulong msr_mask; > > + bool update_esr = false, update_dear = false; > > > > switch (priority) { > > - case BOOKE_IRQPRIO_PROGRAM: > > case BOOKE_IRQPRIO_DTLB_MISS: > > - case BOOKE_IRQPRIO_ITLB_MISS: > > - case BOOKE_IRQPRIO_SYSCALL: > > Is this correct? Was the previous order wrong according to the spec? what order? just make switch items share the code. > > > case BOOKE_IRQPRIO_DATA_STORAGE: > > + update_dear = true; > > + /* fall through */ > > case BOOKE_IRQPRIO_INST_STORAGE: > > + case BOOKE_IRQPRIO_PROGRAM: > > + update_esr = true; > > + /* fall through */ > > + case BOOKE_IRQPRIO_ITLB_MISS: > > + case BOOKE_IRQPRIO_SYSCALL: > > case BOOKE_IRQPRIO_FP_UNAVAIL: > > case BOOKE_IRQPRIO_SPE_UNAVAIL: > > case BOOKE_IRQPRIO_SPE_FP_DATA: > > @@ -157,6 +163,10 @@ static int > kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, > > vcpu->arch.srr0 = vcpu->arch.pc; > > vcpu->arch.srr1 = vcpu->arch.msr; > > vcpu->arch.pc = vcpu->arch.ivpr | > vcpu->arch.ivor[priority]; > > + if (update_esr == true) > > + vcpu->arch.esr = vcpu->arch.fault_esr; > > I'd rather like to see new fields used for that. > > @@ -286,15 +295,12 @@ int kvmppc_handle_exit(struct kvm_run > *run, struct kvm_vcpu *vcpu, > > break; > > > > case BOOKE_INTERRUPT_DATA_STORAGE: > > - vcpu->arch.dear = vcpu->arch.fault_dear; > > - vcpu->arch.esr = vcpu->arch.fault_esr; > > kvmppc_booke_queue_irqprio(vcpu, > BOOKE_IRQPRIO_DATA_STORAGE); > > kvmppc_booke_queue_data_storage(vcpu, vcpu->arch.fault_esr, > vcpu->arch.fault_dear); > > > kvmppc_account_exit(vcpu, DSI_EXITS); > > r = RESUME_GUEST; > > break; > > > > case BOOKE_INTERRUPT_INST_STORAGE: > > - vcpu->arch.esr = vcpu->arch.fault_esr; > > kvmppc_booke_queue_irqprio(vcpu, > BOOKE_IRQPRIO_INST_STORAGE); > > kvmppc_booke_queue_inst_storage(vcpu, vcpu->arch.fault_esr); > Not sure if this is redundant, as we already have fault_esr. Or should we ignore what hareware is and create a new esr to guest? > > > - vcpu->arch.dear = vcpu->arch.fault_dear; > > - vcpu->arch.esr = vcpu->arch.fault_esr; > > kvmppc_mmu_dtlb_miss(vcpu); > > kvmppc_account_exit(vcpu, DTLB_REAL_MISS_EXITS); > > r = RESUME_GUEST; > > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c > > index b905623..4f6b9df 100644 > > --- a/arch/powerpc/kvm/emulate.c > > +++ b/arch/powerpc/kvm/emulate.c > > @@ -151,8 +151,6 @@ int kvmppc_emulate_instruction(struct > kvm_run *run, struct kvm_vcpu *vcpu) > > case OP_TRAP: > > #ifdef CONFIG_PPC64 > > case OP_TRAP_64: > > -#else > > - vcpu->arch.esr |= ESR_PTR; > > #endif > > kvmppc_core_queue_program(vcpu, SRR1_PROGTRAP); > > kvmppc_core_queue_program(vcpu, vcpu->arch.esr | ESR_PTR); > This breaks book3s code. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 3/3] kvmppc/e500: fix tlbcfg emulation
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, January 22, 2010 7:05 PM > To: Liu Yu-B13201 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org > Subject: Re: [PATCH v2 3/3] kvmppc/e500: fix tlbcfg emulation > > > On 22.01.2010, at 11:50, Liu Yu wrote: > > > commit 55fb1027c1cf9797dbdeab48180da530e81b1c39 doesn't > update tlbcfg correctly. > > Fix it and move this part to init code. > > > > Signed-off-by: Liu Yu > > --- > > arch/powerpc/include/asm/kvm_e500.h |2 ++ > > arch/powerpc/kvm/e500_emulate.c | 20 ++-- > > arch/powerpc/kvm/e500_tlb.c |6 ++ > > 3 files changed, 10 insertions(+), 18 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/kvm_e500.h > b/arch/powerpc/include/asm/kvm_e500.h > > index 569dfd3..7fea26f 100644 > > --- a/arch/powerpc/include/asm/kvm_e500.h > > +++ b/arch/powerpc/include/asm/kvm_e500.h > > @@ -56,6 +56,8 @@ struct kvmppc_vcpu_e500 { > > u32 l1csr1; > > u32 hid0; > > u32 hid1; > > + u32 tlb0cfg; > > + u32 tlb1cfg; > > > > struct kvm_vcpu vcpu; > > }; > > diff --git a/arch/powerpc/kvm/e500_emulate.c > b/arch/powerpc/kvm/e500_emulate.c > > index 95f8ec8..8e3edfb 100644 > > --- a/arch/powerpc/kvm/e500_emulate.c > > +++ b/arch/powerpc/kvm/e500_emulate.c > > @@ -164,25 +164,9 @@ int kvmppc_core_emulate_mfspr(struct > kvm_vcpu *vcpu, int sprn, int rt) > > kvmppc_set_gpr(vcpu, rt, vcpu_e500->mas7); break; > > > > case SPRN_TLB0CFG: > > - { > > - ulong tmp = SPRN_TLB0CFG; > > - > > - tmp &= ~0xfffUL; > > - tmp |= vcpu_e500->guest_tlb_size[0]; > > - kvmppc_set_gpr(vcpu, rt, tmp); > > - break; > > - } > > - > > + kvmppc_set_gpr(vcpu, rt, vcpu_e500->tlb0cfg); break; > > So before the guest couldn't change the guest TLB size and > now it can? Is that on purpose? Mind to explain why the old > code was there? > What? The register is readonly. I was thinking we could change guest TLB size online. But I don't think guest kernel would like that. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard > Sent: Saturday, January 09, 2010 3:30 AM > To: Alexander Graf > Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu > Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > index 338baf9..e283e44 100644 > > --- a/arch/powerpc/kvm/booke.c > > +++ b/arch/powerpc/kvm/booke.c > > @@ -82,8 +82,9 @@ static void > kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, > > set_bit(priority, &vcpu->arch.pending_exceptions); > > } > > > > -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu) > > +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags) > > { > > + /* BookE does flags in ESR, so ignore those we get here */ > > kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM); > > } > > Actually, I think Book E prematurely sets ESR, since it's done before > the program interrupt is actually delivered. Architecturally, I'm not > sure if it's a problem, but philosophically I've always wanted it to > work the way you've just implemented for Book S. > ESR is updated not only by program but by data_tlb, data_storage, etc. Should we rearrange them all? Also DEAR has the same situation as ESR. Should it be updated when we decide to inject interrupt to guest? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Tuesday, January 12, 2010 8:08 AM > To: Liu Yu-B13201 > Cc: Hollis Blanchard; kvm@vger.kernel.org; kvm-ppc; Benjamin > Herrenschmidt; Liu Yu > Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly > > >> > >> Anyways, since we can't test changes at the moment (Yu, > can you?), I'd > >> settle for a comment to the effect that Book E code > *should* do this, > >> but doesn't (rather than the comment above that says it's ok). > >> > > > > Sure. > > You mean set the ESR at the moment we inject trap to guest? > > If it is not urgent, I'll do it later. > > It's definitely not urgent. In fact, If you could just test > patches I send to you I can write it and have you test it :-). > That would be nice.:-) But last time I try the top tree, it seemed not work. Maybe I need to find out the reason and fix it first... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] KVM: PPC: E500 compile fix
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Monday, January 11, 2010 1:02 AM > To: kvm@vger.kernel.org > Cc: kvm-ppc; Liu Yu-B13201 > Subject: [PATCH] KVM: PPC: E500 compile fix > > While trying to compile an E500 vmlinux, I stumbled across a > compilation bug > that was obviously there before I touched any of the code. A > trace point > doesn't get the correct arguments. > > Since that shouldn't be any critical to the functionality of > the code, my quick > workaround is to #if 0 it out. I would very much appreciate > someone fixing it > properly though. > > Liu, it would be nice if you could be the one doing that. > Hrr... I sent it before, but the whole series is blocked. http://marc.info/?l=kvm-ppc&m=124929800623835&w=2 Anyway the series needs to be resent, since there's a lot changes. Thanks for doing this for me! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard > Sent: Saturday, January 09, 2010 3:30 AM > To: Alexander Graf > Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu > Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly > > On Thu, Jan 7, 2010 at 5:58 PM, Alexander Graf wrote: > > Book3S needs some flags in SRR1 to get to know details > about an interrupt. > > > > One such example is the trap instruction. It tells the > guest kernel that > > a program interrupt is due to a trap using a bit in SRR1. > > > > This patch implements above behavior, making WARN_ON behave > like WARN_ON. > > ... "for Book S". It already works properly for Book E, > thankyouverymuch. ;) > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > index 338baf9..e283e44 100644 > > --- a/arch/powerpc/kvm/booke.c > > +++ b/arch/powerpc/kvm/booke.c > > @@ -82,8 +82,9 @@ static void > kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, > > set_bit(priority, &vcpu->arch.pending_exceptions); > > } > > > > -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu) > > +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags) > > { > > + /* BookE does flags in ESR, so ignore those we get here */ > > kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM); > > } > > Actually, I think Book E prematurely sets ESR, since it's done before > the program interrupt is actually delivered. Architecturally, I'm not > sure if it's a problem, but philosophically I've always wanted it to > work the way you've just implemented for Book S. > > Anyways, since we can't test changes at the moment (Yu, can you?), I'd > settle for a comment to the effect that Book E code *should* do this, > but doesn't (rather than the comment above that says it's ok). > Sure. You mean set the ESR at the moment we inject trap to guest? If it is not urgent, I'll do it later. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] KVM/PPC: Fix PPC KVM e500_tlb.c build error
> -Original Message- > From: Avi Kivity [mailto:a...@redhat.com] > Sent: Thursday, July 02, 2009 7:20 PM > To: Liu Yu-B13201 > Cc: Yang Shi; holl...@us.ibm.com; kvm-...@vger.kernel.org; > kvm@vger.kernel.org; linuxppc-...@ozlabs.org > Subject: Re: [PATCH 1/2] KVM/PPC: Fix PPC KVM e500_tlb.c build error > > On 07/02/2009 06:09 AM, Liu Yu-B13201 wrote: > > This fix is already accepted in kvm.git > > > > What about the other? Is it needed? > No, it's already fixed by commit b57227e600f4ecc394d6ba3c2aaa558867b5addc. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] KVM/PPC: Fix PPC KVM e500_tlb.c build error
This fix is already accepted in kvm.git > -Original Message- > From: Yang Shi [mailto:yang@windriver.com] > Sent: Thursday, July 02, 2009 10:55 AM > To: Liu Yu-B13201; holl...@us.ibm.com; a...@redhat.com > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; > linuxppc-...@ozlabs.org > Subject: [PATCH 1/2] KVM/PPC: Fix PPC KVM e500_tlb.c build error > > Since include/asm/mmu-fsl-booke.h was replaced by > include/asm/mmu-book3e.h, > fix e500_tlb.h to reflect the change and fix e500_tlb.c to > align with the > new page size macro definition in include/asm/mmu-book3e.h. > > Signed-off-by: Yang Shi > --- > arch/powerpc/kvm/e500_tlb.c |8 > arch/powerpc/kvm/e500_tlb.h |2 +- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c > index 0e773fc..616762b 100644 > --- a/arch/powerpc/kvm/e500_tlb.c > +++ b/arch/powerpc/kvm/e500_tlb.c > @@ -309,7 +309,7 @@ static inline void > kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, > vcpu_e500->shadow_pages[tlbsel][esel] = new_page; > > /* Force TS=1 IPROT=0 TSIZE=4KB for all guest mappings. */ > - stlbe->mas1 = MAS1_TSIZE(BOOKE_PAGESZ_4K) > + stlbe->mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) > | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; > stlbe->mas2 = (gvaddr & MAS2_EPN) > | e500_shadow_mas2_attrib(gtlbe->mas2, > @@ -545,7 +545,7 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu) > case 0: > /* TLB0 */ > gtlbe->mas1 &= ~MAS1_TSIZE(~0); > - gtlbe->mas1 |= MAS1_TSIZE(BOOKE_PAGESZ_4K); > + gtlbe->mas1 |= MAS1_TSIZE(BOOK3E_PAGESZ_4K); > > stlbsel = 0; > sesel = > kvmppc_e500_stlbe_map(vcpu_e500, 0, esel); > @@ -679,14 +679,14 @@ void kvmppc_e500_tlb_setup(struct > kvmppc_vcpu_e500 *vcpu_e500) > > /* Insert large initial mapping for guest. */ > tlbe = &vcpu_e500->guest_tlb[1][0]; > - tlbe->mas1 = MAS1_VALID | MAS1_TSIZE(BOOKE_PAGESZ_256M); > + tlbe->mas1 = MAS1_VALID | MAS1_TSIZE(BOOK3E_PAGESZ_256M); > tlbe->mas2 = 0; > tlbe->mas3 = E500_TLB_SUPER_PERM_MASK; > tlbe->mas7 = 0; > > /* 4K map for serial output. Used by kernel wrapper. */ > tlbe = &vcpu_e500->guest_tlb[1][1]; > - tlbe->mas1 = MAS1_VALID | MAS1_TSIZE(BOOKE_PAGESZ_4K); > + tlbe->mas1 = MAS1_VALID | MAS1_TSIZE(BOOK3E_PAGESZ_4K); > tlbe->mas2 = (0xe0004500 & 0xF000) | MAS2_I | MAS2_G; > tlbe->mas3 = (0xe0004500 & 0xF000) | > E500_TLB_SUPER_PERM_MASK; > tlbe->mas7 = 0; > diff --git a/arch/powerpc/kvm/e500_tlb.h b/arch/powerpc/kvm/e500_tlb.h > index 45b064b..abb1bf8 100644 > --- a/arch/powerpc/kvm/e500_tlb.h > +++ b/arch/powerpc/kvm/e500_tlb.h > @@ -16,7 +16,7 @@ > #define __KVM_E500_TLB_H__ > > #include > -#include > +#include > #include > #include > > -- > 1.6.0.4 > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: PowerPC page faults
> -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Anthony Liguori > Sent: Tuesday, May 12, 2009 6:18 AM > To: Hollis Blanchard > Cc: Gregory Haskins; Avi Kivity; Chris Wright; Gregory > Haskins; linux-ker...@vger.kernel.org; kvm@vger.kernel.org > Subject: Re: PowerPC page faults > > Hollis Blanchard wrote: > > On Mon, 2009-05-11 at 12:54 -0500, Anthony Liguori wrote: > > > >> For future ppcemb's, do you know if there is an equivalent > of a PF exit > >> type? Does the hardware squirrel away the faulting > address somewhere > >> and set PC to the start of the instruction? If so, no > guest memory load > >> should be required. > >> > > > > Ahhh... you're saying that the address itself (or offset > within a page) > > is the hypercall token, totally separate from IO emulation, > and so we > > could ignore the access size. > > No, I'm not being nearly that clever. > > I was suggesting that hardware virtualization support in future PPC > systems might contain a mechanism to intercept a guest-mode > TLB miss. > If it did, it would be useful if that guest-mode TLB miss "exit" > contained extra information somewhere that included the PC of the > faulting instruction, the address response for the fault, and enough > information to handle the fault without instruction decoding. I don't think this can come true if it is software managed TLB. For guest will never know the content of its TLB without the help of software. > > I assume all MMIO comes from the same set of instructions in PPC? > Something like ld/st instructions? Presumably all you need > to know from > instruction decoding is the destination register and whether it was a > read or write? Yes, but more there is a search in guest TLB to see if it is a guest TLB miss currently. A big overhead, especially for a large TLB without being organized by sth. like rbtree. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: qemu-kvm.git now live
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org > [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Avi Kivity > Sent: Thursday, April 23, 2009 11:40 PM > To: KVM list > Cc: Anthony Liguori; Hollis Blanchard; Zhang, Xiantao; > kvm-ppc; kvm-i...@vger.kernel.org; Carsten Otte > Subject: qemu-kvm.git now live > > After a lengthy testing phase, qemu-kvm.git has replaced > kvm-userspace.git as the source repository for kvm userspace > development. > > Differences from kvm-userspace.git are as follows: > - everything under qemu/ has been moved to the top-level directory > - everything not under qemu/ has been moved under a new top-level > directory, kvm/ > - all qemu subversion commits have been rewritten to be > compatible with > what will become the master qemu git repository > - all branches and tags have been converted > - qemu-kvm.git builds standalone (does not require kvm.git) > > The repository is compatible with upstream qemu.git; merges and > cherry-picking will work fine in either direction > > Still missing: > - I have not tested powerpc or ia64. Patches welcome! > - testsuite (kvm/user/ directory) does not build > > git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git > Is there a http interface?
RE: KVM Port
Yes. At least, you need an application to launch VM. It's helpful to use qemu to emulate the I/O. But if your hardware support the technic like iommu, you can move the part into KVM for better performance. > -Original Message- > From: kvm port [mailto:kvmp...@gmail.com] > Sent: Thursday, March 26, 2009 2:03 PM > To: Liu Yu-B13201 > Cc: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: Re: KVM Port > > do we have to port qemu as well? > > On Thu, Mar 26, 2009 at 7:22 AM, Liu Yu-B13201 > wrote: > > > > IMHO, one thing you should keep in mind is how to isolate > the guest space based on your hardware MMU. > > And then deal with the exceptions carefully, > > some may be directly send to guest and some should be > handled by hypervisor. > > > > In powerpc BOOKE implementation, we have to hijack all exceptions, > > because BOOKE doesn't support technic like VT. > > > > > >> -Original Message- > >> From: kvm-ow...@vger.kernel.org > >> [mailto:kvm-ow...@vger.kernel.org] On Behalf Of kvm port > >> Sent: Wednesday, March 25, 2009 11:08 PM > >> To: kvm@vger.kernel.org; kvm-...@vger.kernel.org > >> Subject: KVM Port > >> > >> Hi KVM Gurus, > >> > >> We have a EVB with a fpga based RISC processor with VT support. > >> As a proof of concept i have to port KVM onto it. we have run > >> linux as of now. > >> can anyof u help with how should i begin > >> > > > > -- > > To unsubscribe from this list: send the line "unsubscribe > kvm-ppc" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM Port
IMHO, one thing you should keep in mind is how to isolate the guest space based on your hardware MMU. And then deal with the exceptions carefully, some may be directly send to guest and some should be handled by hypervisor. In powerpc BOOKE implementation, we have to hijack all exceptions, because BOOKE doesn't support technic like VT. > -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of kvm port > Sent: Wednesday, March 25, 2009 11:08 PM > To: kvm@vger.kernel.org; kvm-...@vger.kernel.org > Subject: KVM Port > > Hi KVM Gurus, > > We have a EVB with a fpga based RISC processor with VT support. > As a proof of concept i have to port KVM onto it. we have run > linux as of now. > can anyof u help with how should i begin > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Virtio_pci in kernel ignore endian of PCI I/O space?
> -Original Message- > From: Hollis Blanchard [mailto:[EMAIL PROTECTED] > Sent: Friday, September 05, 2008 11:31 PM > To: Liu Yu-B13201 > Cc: [EMAIL PROTECTED]; kvm > Subject: RE: Virtio_pci in kernel ignore endian of PCI I/O space? > > On Thu, 2008-09-04 at 13:27 +0800, Liu Yu-B13201 wrote: > > > > > > > Since E500 is big endian, this bring the misunderstanding > > > between qemu > > > > and guest. > > > > > > What are you using for PCI emulation in qemu? I don't > think it should > > > matter in this case, since the kernel is doing 1-byte reads > > > in vp_get(), > > > but endianness gets very convoluted wherever qemu is involved. :( > > > > > > > Did you see Anthony's reply? > > It's weird that 440 can still work fine. > > OK, I've found my problem. My host kernel is current, but my *guest* > kernel is old (it was still byte-swapping). > > Also, the reason the S390 guys didn't see a problem is that > they aren't > yet using qemu, and their virtio userspace implementation > doesn't swap. Thanks a lot! > > Yu, would you send a patch? > Certainly. As my code is on branch kvm-70rc1, I could only test it on branch kvm-70rc1 now. I don't know whether it is matter? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Virtio_pci in kernel ignore endian of PCI I/O space?
> -Original Message- > From: Hollis Blanchard [mailto:[EMAIL PROTECTED] > Sent: Tuesday, September 02, 2008 10:16 PM > To: Liu Yu-B13201 > Cc: [EMAIL PROTECTED]; kvm > Subject: Re: Virtio_pci in kernel ignore endian of PCI I/O space? > > On Tue, 2008-09-02 at 15:29 +0800, Liu Yu-B13201 wrote: > > Hollis, > > > > I noticed fuction virtio_blk_update_config() > (qemu/hw/virtio-blk.c) in > > qemu > > updated the virtio disk's capacity in little endian. > > But virtblk_probe() (drivers/block/virtio_blk.c) in kernel read the > > capacity from I/O space without convertion. > > Sigh, I don't remember why this works any more. > > There was discussion on this on kvm-ppc-devel in April > (unfortunately it > was right at the month line, so the threading is broken, but see e.g. > http://marc.info/?l=kvm-ppc&m=120716463602156), and ultimately > cpu_to_le64() was added to qemu. However, a couple weeks later, the > swapping that the kernel was doing was removed (see > http://thread.gmane.org/gmane.linux.kernel.virtualization/5776 > /focus=5801). > > However, the code is actually working today on 440 (and > reporting a sane > number of blocks). Also, this patch > (http://article.gmane.org/gmane.linux.kernel.virtualization/61 > 26/match=virtio%5fblk+fix+endianess+annotations) suggests > that struct virtio_blk_config is no longer LE, so I'm > thinking your problem must be in qemu. AFAICS the kernel is > basically just doing memcpy in a few places. Thanks for your informations! > > > Since E500 is big endian, this bring the misunderstanding > between qemu > > and guest. > > What are you using for PCI emulation in qemu? I don't think it should > matter in this case, since the kernel is doing 1-byte reads > in vp_get(), > but endianness gets very convoluted wherever qemu is involved. :( > Did you see Anthony's reply? It's weird that 440 can still work fine. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: 63 sectors
> -Original Message- > From: H. Peter Anvin [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 04, 2008 11:33 AM > To: Liu Yu-B13201 > Cc: Avi Kivity; KVM list > Subject: Re: 63 sectors > > Liu Yu-B13201 wrote: > > How about adding 1 sector offset when access disk image. > > So that it looks like the 63rd sector, but it's indeed the 64th. > > The problem with that is you make it suck on OSes which actually > partition sanely. > You mean guest OSes? Guest OSes never know this offset, adding one sector offset is in qemu and transparent to them.
RE: 63 sectors
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Avi Kivity > Sent: Wednesday, September 03, 2008 4:12 PM > To: KVM list > Cc: H. Peter Anvin > Subject: 63 sectors > > Qemu sets the sectors-per-track setting of virtual disks to 63. This > seems to be in accordance with the specs; drivers/ide/ide-disk.c says: > > > /* > > * The ATA spec tells large drives to return > > * C/H/S = 16383/16/63 independent of their size. > > * Some drives can be jumpered to use 15 heads > instead of 16. > > * Some drives can be jumpered to use 4092 cyls > instead of 16383. > > */ > > if ((id->cyls == 16383 > > || (id->cyls == 4092 && id->cur_cyls == 16383)) && > > id->sectors == 63 && > > (id->heads == 15 || id->heads == 16) && > > (id->lba_capacity >= 16383*63*id->heads)) > > return 1; > > That setting has some unfortunate side effects. Partitioning > tools will > locate the first partition at the second cylinder, which is > at the 63rd > sector. This means that if the guest uses a 4K block > filesystem on the > first partition (an incredibly common occurance), then every single > access will not be 4K aligned with respect to the virtual > block device. > This will cause fragmentation and read/modify/write cycles with: > > - qcow2 (which uses aligned 4K blocks) > - any host filesystem which uses 4K blocks > - any host disk which uses 4K blocks (not yet common) > > I can think of a few workarounds, all bad: > - add a partitioning tool (or option to qemu-img) to format the disk, > placing the first partition on the fourth cylinder, aligning > it. tell > the users not to wipe the disks out but instead install to one of the > existing parititions > - add a tool to optimize an existing disk by extending it and > moving the > partitions around so they are aligned. may break boot loaders. > - make qcow4 use 512 byte sectors. will increase overhead > and doesn't > solve problems on the host filesystem and disk. > - have qcow51 detect misaligned accesses and adjust itself somehow. > doesn't help raw and other formats. likely very difficult. > > Does anybody know if scsi will have the same problems? Can anyone > suggest other workarounds? > How about adding 1 sector offset when access disk image. So that it looks like the 63rd sector, but it's indeed the 64th. N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf