Amit, Alex, please see my comments below. Avi, please have a look at the patches and let me know which parts you think can be done better.
On Fri, 2008-11-14 at 06:07 -0700, Amit Shah wrote: > * On Thursday 13 Nov 2008 19:08:14 Alexander Graf wrote: > > On 13.11.2008, at 05:35, Amit Shah wrote: > > > * On Wednesday 12 Nov 2008 22:49:16 Alexander Graf wrote: > > >> On 12.11.2008, at 17:52, Amit Shah wrote: > > >>> Hi Alex, > > >>> > > >>> * On Wednesday 12 Nov 2008 21:09:43 Alexander Graf wrote: > > >>>> Hi, > > >>>> > > >>>> I was thinking a bit about cross vendor migration recently and > > >>>> since > > >>>> we're doing open source development, I figured it might be a good > > >>>> idea > > >>>> to talk to everyone about this. > > >>>> > > >>>> So why are we having a problem? > > >>>> > > >>>> In normal operation we don't. If we're running a 32-bit kernel, we > > >>>> can > > >>>> use SYSENTER to jump from kernel<->userspace. If we're on a 64-bit > > >>>> kernel with 64-bit userspace, every CPU supports SYSCALL. At least > > >>>> Linux is being smart on this and does use exactly these two > > >>>> capabilities in these two cases. > > >>>> But if we're running in compat mode (64-bit kernel with 32-bit > > >>>> userspace), things differ. Intel supports only SYSENTER here, while > > >>>> AMD only supports SYSCALL. Both can still use int80. > > >>>> > > >>>> Operating systems detect usage of SYSCALL or SYSENTER pretty > > >>>> early on > > >>>> (Linux does this on vdso). So when we boot up on an Intel machine, > > >>>> Linux assumes that using SYSENTER in compat mode is fine. Migrating > > >>>> that machine to an AMD machine breaks this assumption though, since > > >>>> SYSENTER can't be used in compat mode. > > >>>> On Linux, this detection is based on the CPU vendor string. If > > >>>> Linux > > >>>> finds a "GenuineIntel", SYSENTER is used in compat mode, if it's > > >>>> "AuthenticAMD", SYSCALL is used and if none of these two is found, > > >>>> int80 is used.
> > >>>> > > >>>> I tried modifying the vendor string, removed the "overwrite the > > >>>> vendor > > >>>> string with the native string" hack and things look like they work > > >>>> just fine with Linux. > > >>>> > > >>>> Unfortunately right now I don't have a 64-bit Windows installation > > >>>> around to check if that approach works there too, but if it does > > >>>> and > > >>>> no known OS breaks due to the invalid vendor string, we can just > > >>>> create our own virtual CPU string, no? > > >>> > > >>> qemu has an option for that, -cpu qemu64 IIRC. As long as we expose > > >>> practically correct cpuids and MSRs, this should be fine. I've not > > >>> tested > > >>> qemu64 with winxp x64 though. Also, last I knew, winxp x64 > > >>> installation > > >>> didn't succeed with --no-kvm. qemu by default exposes an AMD CPU > > >>> type. > > >> > > >> I wasn't talking about CPUID features, but the vendor string. Qemu64 > > >> provides the AuthenticAMD string, so we don't run into any issues I'm > > >> presuming. > > > > > > Right -- the thing is, with the default AuthenticAMD string, winxp x64 > > > installation fails. That has to be because of some missing cpuids. > > > That's one > > > of the drawbacks of exposing a well-known CPU type. I was suggesting > > > we > > > should try out the -cpu qemu64 CPU type since it exposes a non- > > > standard CPU > > > to see if guests and most userspace programs work fine without any > > > further > > > tweaking -- see the 'cons' below for why this might be a problem. > > > > I still don't really understand what you're trying to say - qemu64 is > > the default in KVM right now. You mean winxp64 installation doesn't > No, the default for KVM is the host CPU type. Amit, Alex is correct: the default CPU for KVM is qemu64, not the host. I have sent the patches to add an option, -cpu host. Some of the patches have gone in, but not all of them are in yet. Also, my patches do not make the host option the default.
I have attached the remaining two patches. Alex, can you try these patches with the "-cpu host" option and see if you can get the host vendor string in the guest on an AMD box? I have already tested it on the latest Intel system. > > > work as is and we should fix it? This has nothing to do with the > > migration problems, right? > > Solutions shouldn't involve adding known regressions. If our default cpu type > changes to one that renders some of the OSes we support right now > nonfunctional, such changes won't be accepted. Of course, we can improve the > qemu64 cpu type to ensure the popular OS types work properly at the least. > > > >>> There are pros and cons to exposing a custom vendor ID: > > >>> > > >>> pros: > > >>> - We don't need to have all the cpuid features exposed which are > > >>> expected of a > > >>> physically available CPU in the market, for example, badly-coded > > >>> applications > > >>> might crash if we don't have SSSE3 on a Core2Duo. But badly-coded or > > >>> not, not > > >>> exposing what's actually available on every C2D out there is bad. > > >>> > > >>> cons: > > >>> - To expose the "correct" set of feature bits for a known processor, > > >>> we also > > >>> need to check the family/model/stepping to support the exact same > > >>> feature > > >>> bits that were present in the CPU. > > >>> - We might not get some optimizations that OSes might have based on > > >>> CPU type, > > >>> even if the host CPU qualifies for such optimizations > > >>> - Standard programs like benchmarking tools, etc., might fail if > > >>> they depend > > >>> on the vendor string for their functionality > > >>> > > >>> For 32-bit guests, I think exposing a pentium4 or Athlon CPU type > > >>> should be > > >>> fine. For 64-bit guests, the newer the better.
> > >> > > >> Well, we could create different CPU definitions: > > >> > > >> - migration safe (do what is safe for migration) > > > > > > There are multiple ways of approaching this: peg to a least-known > > > good CPU > > > type, all of whose instructions will work on processors from both > > > the major > > > vendors. However, you never know how the server pools change and > > > you'd want > > > to upgrade the CPU type once you know the CPUs that are installed in > > > servers. > > > This has to be dynamic and the management application has to take > > > care of > > > exposing a CPU that's of a "safe" type for the particular server > > > pool. We > > > have to provide ways to mask off CPUID bits as requested by the > > > management > > > application. (Each server sends its cpuid to the management > > > application, > > > which calculates the safest bits and then conveys this to each > > > server before > > > starting a VM.) > > > > IMHO we shouldn't really start to be smart here. There's only so much > > benefit in using the least common dominator between all CPUs in the > > datacenter vs. using the least common dominator between all possible > > CPUs. You'll basically end up enabling some newer SSE instructions. > > I'm just saying the management application will do it. So it'll be local to > the server pool the management app caters to. Not a common denominator for > all deployments. > > > So I don't think we need to go through the hassle of making this > > dynamic. If you want to migrate your machines - use the migrate > > preset. That won't give you the 150% speed boost on video encoding, > > but should not really be any slower on normal workloads. It does make > > things a lot more transparent to us and the admin of a network though, > > because you know what you'll end up with "-cpu migration". > > We hardly know what uses KVM will be put to. Server virtualisation, desktop > virtualisation, combination, what not. 
If we provide the flexibility to > the admin to tune as necessary, it's not a bad option at all. All the > userspace needs is one tool that can calculate the max. features supported by > the current CPU and send it over to the management app when asked for. The > management app does the rest. KVM is not involved at all. > > > >> - CPU specific (like a Core2Duo, necessary to run Mac OS X) > > > > > > This doesn't need any more work -- we already have the ability to > > > select CPU > > > types. If the management application has knowledge of the kind of OS > > > being > > > installed in a VM (which these days is true), exposing a Core2Duo > > > for a > > > Mac-based OS isn't difficult. > > > > There is no sysenter emulation for IA-32e on AMD yet, right? That's > > the only issue I see here and your emulation patch should address that. > > As I've mentioned before, I've not yet been able to test my patch because I've > not found the sysenter/sysexit calls being used at all. It's included at the > end of this mail for review; hopefully someone finds a use-case and we can > take it forward. > > > >> - host (fastest possible, but no migration) > > > > > > This should be the default. > > > > I'm not sure. Either host or migration should be the default. This > I'm suggesting that 'host' should be default. Where do we disagree? Well, I believe cross-vendor migration is not the common case, so the host should be the default, IMHO. But I will not press for it. > > > actually depends on the workload you have on KVM. For servers you'll > > probably want to have migration be the default. For desktop usage it's > > host. I can't think of a way we can be smart about that on the KVM > > level. > > For individual runs from the command line, I'd prefer the host to be the > default. For a wider deployment, the management app will set the defaults as > necessary (admin-chosen).
> > > >> I don't think we could find one definition that fits all, so the user > > >> would have to define what the usage pattern will be. > > >> > > >>>> I'd love to hear comments and suggestions on this and hope we'll > > >>>> end > > >>>> up in a fruitful discussion on how to improve the current > > >>>> situation. > > >>> > > >>> I have a patch ready for emulating sysenter/sysexit on AMD systems > > >>> (needs > > >>> testing). Patching the guest was an option that was discouraged; I > > >>> had a hack > > >>> ready but it was quickly shelved (again, untested). > > >> > > >> That sounds useful for misbehaving guests or cases I haven't thought > > >> of yet. Are you sure you're intercepting the SYSENTER MSRs on AMD, so > > >> you don't end up only getting 32 bits? > > > > > > Can you elaborate? > > > > When you write to MSR_IA32_SYSENTER_EIP on AMD, that MSR will be > > directly passed through to the hardware (search for that MSR in > > svm.c). This is because SVM automatically writes the SYSENTER MSRs to > > the SYSENTER fields in the VMCB. > > My patch just handles the case when a sysenter is attempted on a system which > doesn't have that instruction. So I just emulate it. Accessing the MSR and > setting values is done at boot-time by the OS, and any migrations at that > instant is a corner case and not too critical. > > Now, the patch. > > From e1b760d8e596811081c282484621b49c674f1c22 Mon Sep 17 00:00:00 2001 > From: Amit Shah <[EMAIL PROTECTED]> > Date: Wed, 12 Nov 2008 11:31:05 +0530 > Subject: [PATCH] KVM: SVM: Emulate SYSENTER/SYSEXIT on AMD processors > > This patch enables emulation of the sysenter/sysexit instructions in > AMD long mode. This will enable a guest started on an Intel machine to > be migrated to an AMD machine. 
> > Signed-off-by: Amit Shah <[EMAIL PROTECTED]> > --- > arch/x86/kvm/svm.c | 13 ++++ > arch/x86/kvm/x86_emulate.c | 137 > +++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 149 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index f0ad4d4..4e6e1dc 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1155,6 +1155,19 @@ static int vmmcall_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > static int invalid_op_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > { > + /* > + * If we're running in long mode on x86_64, check if we can > + * emulate sysenter / sysexit > + */ > + if (!is_long_mode(&svm->vcpu)) > + goto out; > + > + if (emulate_instruction(&svm->vcpu, NULL, 0, 0, 0) == EMULATE_DONE) { > + /* We could emulate it. */ > + return 1; > + } > + > + out: > kvm_queue_exception(&svm->vcpu, UD_VECTOR); > return 1; > } > diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c > index 8f60ace..c8afd20 100644 > --- a/arch/x86/kvm/x86_emulate.c > +++ b/arch/x86/kvm/x86_emulate.c > @@ -205,7 +205,9 @@ static u16 twobyte_table[256] = { > ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > /* 0x30 - 0x3F */ > - ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > + ImplicitOps, 0, ImplicitOps, 0, > + ImplicitOps, ImplicitOps, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > /* 0x40 - 0x47 */ > DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, > DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, > @@ -305,8 +307,11 @@ static u16 group2_table[] = { > }; > > /* EFLAGS bit definitions. 
*/ > +#define EFLG_VM (1<<17) > +#define EFLG_RF (1<<16) > #define EFLG_OF (1<<11) > #define EFLG_DF (1<<10) > +#define EFLG_IF (1<<9) > #define EFLG_SF (1<<7) > #define EFLG_ZF (1<<6) > #define EFLG_AF (1<<4) > @@ -1959,6 +1964,136 @@ twobyte_insn: > rc = X86EMUL_CONTINUE; > c->dst.type = OP_NONE; > break; > + case 0x34: { /* sysenter */ > + /* Vol 2b */ > + unsigned long cr0 = ctxt->vcpu->arch.cr0; > + struct kvm_segment cs, ss; > + u64 data; > + > + if (cr0 & X86_CR0_PE) { > + kvm_inject_gp(ctxt->vcpu, 0); > + goto cannot_emulate; > + } > + > + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &data); > + if (!(data & 0xFFFC)) { > + kvm_inject_gp(ctxt->vcpu, 0); > + goto cannot_emulate; > + } > + > + ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF); > + > + kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS); > + cs.selector = (__u16) data; > + cs.base = 0; > + cs.limit = 0xfffff; > + cs.g = 1; > + cs.s = 1; > + cs.type = 0x0b; > + cs.db = 1; > + cs.dpl = 0; > + cs.selector &= ~SELECTOR_RPL_MASK; > + cs.present = 1; > + /* The CPL should be set to 0 */ > + > + if (ctxt->mode == X86EMUL_MODE_PROT64) { > + cs.l = 1; > + cs.limit = 0xffffffff; > + } > + > + ss.selector = cs.selector + 8; > + ss.base = 0; > + ss.limit = 0xfffff; > + ss.g = 1; > + ss.s = 1; > + ss.type = 0x03; > + ss.db = 1; > + ss.dpl = 0; > + ss.selector &= ~SELECTOR_RPL_MASK; > + ss.present = 1; > + if (ctxt->mode == X86EMUL_MODE_PROT64) { > + ss.limit = 0xffffffff; > + } > + > + kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS); > + kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS); > + > + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, > &data); > + c->eip = data; > + > + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, > &data); > + c->regs[VCPU_REGS_RSP] = data; > + > + goto writeback; > + break; > + } > + case 0x35: { /* sysexit */ > + /* Vol 2b */ > + u64 data; > + unsigned long cr0 = ctxt->vcpu->arch.cr0; > + struct kvm_segment cs, ss; > + > + if (cr0 & 
X86_CR0_PE) { > + kvm_inject_gp(ctxt->vcpu, 0); > + goto cannot_emulate; > + } > + > + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &data); > + if (!(data & 0xFFFC) || > + ((ctxt->mode == X86EMUL_MODE_PROT64) && !data)) { > + kvm_inject_gp(ctxt->vcpu, 0); > + goto cannot_emulate; > + } > + > + /* Check if CPL is 0. If not, inject_gp */ > + > + kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS); > + cs.selector = (u16)(data + > + (ctxt->mode == X86EMUL_MODE_PROT64 ? 32 : > 16)); > + cs.base = 0; > + cs.limit = 0xfffff; > + cs.g = 1; > + cs.s = 1; > + cs.type = 0x0b; > + cs.db = 1; > + cs.dpl = 3; > + cs.selector |= SELECTOR_RPL_MASK; > + cs.present = 1; > + cs.l = 0; /* For return to compatibility mode */ > + /* The CPL should be set to 3 */ > + > + if (ctxt->mode == X86EMUL_MODE_PROT64) { > + cs.l = 1; > + /* The manual doesn't talk about CS limit */ > + } > + > + ss.selector = cs.selector + > + (ctxt->mode == X86EMUL_MODE_PROT64 ? 16 : 8); > + ss.base = 0; > + ss.limit = 0xfffff; > + ss.g = 1; > + ss.s = 1; > + ss.type = 0x03; > + ss.db = 1; > + ss.dpl = 3; > + ss.selector |= SELECTOR_RPL_MASK; > + ss.present = 1; > + if (ctxt->mode == X86EMUL_MODE_PROT64) { > + ss.base = 0; > + ss.limit = 0xffffffff; > + } > + > + kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS); > + kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS); > + > + c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX]; > + c->regs[VCPU_REGS_RSP] = c->regs[VCPU_REGS_RCX]; > + > + /* TODO: Check if rip and rsp are canonical. inject_gp() if > not */ > + > + goto writeback; > + break; > + } > case 0x40 ... 
0x4f: /* cmov */ > c->dst.val = c->dst.orig_val = c->src.val; > if (!test_cc(c->b, ctxt->eflags)) > -- > 1.5.4.3 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Thanks & Regards, Nitin Open Source Technology Center, Intel Corporation ----------------------------------------------------------------- The mind is like a parachute; it works much better when it's open
commit 70e4e65bc591eb9cf25c1cbc0d16b2cbdb089a6f Author: Nitin A Kamble <[EMAIL PROTECTED]> Date: Wed Nov 5 16:17:46 2008 -0800 Change the ioctl KVM_GET_SUPPORTED_CPUID, such that it will return the number of entries in the list when the requested number of entries (nent) is 0. Also add another KVM_CHECK_EXTENSION, KVM_CAP_CPUID_SIZER, to determine if the running kernel supports the above changed ABI. Signed-off-by: Nitin A Kamble <[EMAIL PROTECTED]> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 09e6c56..e50db11 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -86,7 +86,7 @@ #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT) #define KVM_MIN_FREE_MMU_PAGES 5 #define KVM_REFILL_PAGES 25 -#define KVM_MAX_CPUID_ENTRIES 40 +#define KVM_MAX_CPUID_ENTRIES 100 #define KVM_NR_FIXED_MTRR_REGION 88 #define KVM_NR_VAR_MTRR 8 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bf7461b..52e6207 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -969,6 +969,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_NOP_IO_DELAY: case KVM_CAP_MP_STATE: case KVM_CAP_SYNC_MMU: + case KVM_CAP_CPUID_SIZER: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -1303,10 +1304,14 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, { struct kvm_cpuid_entry2 *cpuid_entries; int limit, nent = 0, r = -E2BIG; + int sizer = 0; u32 func; - if (cpuid->nent < 1) - goto out; + if (cpuid->nent == 0) { + sizer = 1; + cpuid->nent = KVM_MAX_CPUID_ENTRIES; + } + r = -ENOMEM; cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent); if (!cpuid_entries) @@ -1327,9 +1332,11 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, do_cpuid_ent(&cpuid_entries[nent], func, 0, &nent, cpuid->nent); r = -EFAULT; - if (copy_to_user(entries, cpuid_entries, + if (!sizer) { + if (copy_to_user(entries, cpuid_entries, nent * sizeof(struct kvm_cpuid_entry2))) - goto out_free; + goto out_free;
+ } cpuid->nent = nent; r = 0; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 44fd7fa..d4cb8b1 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -392,6 +392,9 @@ struct kvm_trace_rec { #endif #define KVM_CAP_IOMMU 18 #define KVM_CAP_NMI 19 +#define KVM_CAP_CPUID_SIZER 20 /* return of 1 means the KVM_GET_SUPPORTED_CPUID */ + /* ioctl will return the size of list when input */ + /* list size (nent) is 0 */ /* * ioctls for VM fds
diff --git a/libkvm/libkvm-x86.c b/libkvm/libkvm-x86.c index a8cca15..7aafa20 100644 --- a/libkvm/libkvm-x86.c +++ b/libkvm/libkvm-x86.c @@ -379,6 +379,34 @@ int kvm_set_msrs(kvm_context_t kvm, int vcpu, struct kvm_msr_entry *msrs, return r; } +/* + * Returns available host cpuid entries. User must free. + */ +struct kvm_cpuid2 *kvm_get_host_cpuid_entries(kvm_context_t kvm) +{ + struct kvm_cpuid2 sizer, *cpuids; + int r, e; + + sizer.nent = 0; + r = ioctl(kvm->fd, KVM_GET_SUPPORTED_CPUID, &sizer); + if (r == -1 && errno != E2BIG) + return NULL; + cpuids = malloc(sizeof *cpuids + sizer.nent * sizeof *cpuids->entries); + if (!cpuids) { + errno = ENOMEM; + return NULL; + } + cpuids->nent = sizer.nent; + r = ioctl(kvm->fd, KVM_GET_SUPPORTED_CPUID, cpuids); + if (r == -1) { + e = errno; + free(cpuids); + errno = e; + return NULL; + } + return cpuids; +} + static void print_seg(FILE *file, const char *name, struct kvm_segment *seg) { fprintf(stderr, @@ -458,9 +486,9 @@ __u64 kvm_get_cr8(kvm_context_t kvm, int vcpu) } int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent, - struct kvm_cpuid_entry *entries) + struct kvm_cpuid_entry2 *entries) { - struct kvm_cpuid *cpuid; + struct kvm_cpuid2 *cpuid; int r; cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries)); @@ -469,7 +497,7 @@ int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent, cpuid->nent = nent; memcpy(cpuid->entries, entries, nent * sizeof(*entries)); - r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID, cpuid); + r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID2, cpuid); free(cpuid); return r; diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h index 423ce31..f84d524 100644 --- a/libkvm/libkvm.h +++ b/libkvm/libkvm.h @@ -27,6 +27,9 @@ typedef struct kvm_context *kvm_context_t; struct kvm_msr_list *kvm_get_msr_list(kvm_context_t); int kvm_get_msrs(kvm_context_t, int vcpu, struct kvm_msr_entry *msrs, int n); int kvm_set_msrs(kvm_context_t, int vcpu, struct kvm_msr_entry *msrs, int n); +struct kvm_cpuid2 
*kvm_get_host_cpuid_entries(kvm_context_t); +void get_host_cpuid_entry(uint32_t function, uint32_t index, + struct kvm_cpuid_entry2 * e); #endif /*! @@ -374,7 +377,7 @@ int kvm_guest_debug(kvm_context_t, int vcpu, struct kvm_debug_guest *dbg); * \return 0 on success, or -errno on error */ int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent, - struct kvm_cpuid_entry *entries); + struct kvm_cpuid_entry2 *entries); /*! * \brief Setting the number of shadow pages to be allocated to the vm diff --git a/qemu/qemu-kvm-x86.c b/qemu/qemu-kvm-x86.c index bf62e18..b776ebf 100644 --- a/qemu/qemu-kvm-x86.c +++ b/qemu/qemu-kvm-x86.c @@ -21,6 +21,7 @@ #define MSR_IA32_TSC 0x10 static struct kvm_msr_list *kvm_msr_list; +static struct kvm_cpuid2 *kvm_host_cpuid_entries; extern unsigned int kvm_shadow_memory; extern kvm_context_t kvm_context; static int kvm_has_msr_star; @@ -52,11 +53,17 @@ int kvm_arch_qemu_create_context(void) kvm_msr_list = kvm_get_msr_list(kvm_context); if (!kvm_msr_list) - return -1; + return -1; + for (i = 0; i < kvm_msr_list->nmsrs; ++i) - if (kvm_msr_list->indices[i] == MSR_STAR) - kvm_has_msr_star = 1; - return 0; + if (kvm_msr_list->indices[i] == MSR_STAR) + kvm_has_msr_star = 1; + + kvm_host_cpuid_entries = kvm_get_host_cpuid_entries(kvm_context); + if (!kvm_host_cpuid_entries) + return -1; + + return 0; } static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index, @@ -476,13 +483,61 @@ static void host_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx, *edx = vec[3]; } +void get_host_cpuid_entry(uint32_t function, uint32_t index, + struct kvm_cpuid_entry2 * e) +{ + int i; + struct kvm_cpuid_entry2 *entries; + + memset(e, 0, (sizeof *e)); + e->function = function; + e->index = index; + + if (!kvm_host_cpuid_entries) + return; + + entries = kvm_host_cpuid_entries->entries; + + for (i=0; i<kvm_host_cpuid_entries->nent; i++) { + struct kvm_cpuid_entry2 *ent = &entries[i]; + if (ent->function != function) + continue; + if ((ent->flags & 
KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && + (ent->index != index)) + continue; + if ((ent->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) && + !(ent->flags & KVM_CPUID_FLAG_STATE_READ_NEXT)) + continue; + + memcpy(e, ent, sizeof (*e)); + + if (ent->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) { + int j; + ent->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT; + for (j=i+1; ; j=(j+1)%(kvm_host_cpuid_entries->nent)) { + struct kvm_cpuid_entry2 *entj = &entries[j]; + if (entj->function == ent->function) { + entj->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; + break; + } + } + } + break; + } +} -static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function, - CPUState *env) +static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function, + uint32_t index, CPUState *env) { + if (env->cpuid_host_cpu) { + get_host_cpuid_entry(function, index, e); + return; + } + e->function = function; + e->index = index; env->regs[R_EAX] = function; + env->regs[R_ECX] = index; qemu_kvm_cpuid_on_env(env); - e->function = function; e->eax = env->regs[R_EAX]; e->ebx = env->regs[R_EBX]; e->ecx = env->regs[R_ECX]; @@ -521,6 +576,11 @@ static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function, if (function == 1) e->ecx |= (1u << 31); + if ((function == 4) || (function == 0xb)) + e->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX; + else + e->flags = 0; + // 3dnow isn't properly emulated yet if (function == 0x80000001) e->edx &= ~0xc0000000; @@ -559,17 +619,30 @@ static int get_para_features(kvm_context_t kvm_context) int kvm_arch_qemu_init_env(CPUState *cenv) { - struct kvm_cpuid_entry cpuid_ent[100]; + struct kvm_cpuid_entry2 *cpuid_ent, entry, *e; + int cpuid_nent = 0, malloc_size = 0; + CPUState copy; + uint32_t i, limit; #ifdef KVM_CPUID_SIGNATURE - struct kvm_cpuid_entry *pv_ent; + struct kvm_cpuid_entry2 *pv_ent; uint32_t signature[3]; + + malloc_size += 2; #endif - int cpuid_nent = 0; - CPUState copy; - uint32_t i, limit; copy = *cenv; + if (copy.cpuid_host_cpu) { + if (!kvm_host_cpuid_entries) + 
return -EINVAL; + malloc_size += kvm_host_cpuid_entries->nent; + } else + malloc_size += 100; + + cpuid_ent = malloc(malloc_size * sizeof (struct kvm_cpuid_entry2)); + if (!cpuid_ent) + return -ENOMEM; + #ifdef KVM_CPUID_SIGNATURE /* Paravirtualization CPUIDs */ memcpy(signature, "KVMKVMKVM", 12); @@ -587,21 +660,48 @@ int kvm_arch_qemu_init_env(CPUState *cenv) pv_ent->eax = get_para_features(kvm_context); #endif - copy.regs[R_EAX] = 0; - qemu_kvm_cpuid_on_env(©); - limit = copy.regs[R_EAX]; + limit = copy.cpuid_level; + for (i=0; ((i<2) && (i<limit)) ; i++) { + e = &cpuid_ent[cpuid_nent++]; + do_cpuid_ent(e, i, 0, ©); + } + + if (limit >= 2) { /* get the multiple stateful leaf values */ + do_cpuid_ent(&entry, 2, 0, ©); + cpuid_ent[cpuid_nent++] = entry; + for (i = 1; i<(entry.eax & 0xff); i++) { + e = &cpuid_ent[cpuid_nent++]; + do_cpuid_ent(e, 2, 0, ©); + } + } - for (i = 0; i <= limit; ++i) - do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, ©); + for (i = 3; i <= limit; i++) { + e = &cpuid_ent[cpuid_nent++]; + do_cpuid_ent(e, i, 0, ©); + } - copy.regs[R_EAX] = 0x80000000; - qemu_kvm_cpuid_on_env(©); - limit = copy.regs[R_EAX]; + if (limit >= 4) { /* get the per index values */ + int i = 1; + do { + e = &cpuid_ent[cpuid_nent++]; + do_cpuid_ent(e, 4, i++, ©); + } while(e->eax & 0x1f); /* until the last index */ + } + + if (limit >= 0xb) { /* get the per index values */ + int i = 1; + do { + e = &cpuid_ent[cpuid_nent++]; + do_cpuid_ent(e, 0xb, i++, ©); + } while(e->ecx & 0xff00); /* until the last index */ + } + limit = copy.cpuid_xlevel; for (i = 0x80000000; i <= limit; ++i) - do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, ©); + do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, 0, ©); kvm_setup_cpuid(kvm_context, cenv->cpu_index, cpuid_nent, cpuid_ent); + free(cpuid_ent); return 0; } diff --git a/qemu/target-i386/cpu.h b/qemu/target-i386/cpu.h index 11bc2c1..42d646a 100644 --- a/qemu/target-i386/cpu.h +++ b/qemu/target-i386/cpu.h @@ -612,6 +612,7 @@ typedef struct CPUX86State { 
uint32_t cpuid_ext2_features; uint32_t cpuid_ext3_features; uint32_t cpuid_apic_id; + uint32_t cpuid_host_cpu; #ifdef USE_KQEMU int kqemu_enabled; diff --git a/qemu/target-i386/helper.c b/qemu/target-i386/helper.c index 68efd4d..c23e16e 100644 --- a/qemu/target-i386/helper.c +++ b/qemu/target-i386/helper.c @@ -152,6 +152,9 @@ typedef struct x86_def_t { static x86_def_t x86_defs[] = { #ifdef TARGET_X86_64 { + .name = "host", + }, + { .name = "qemu64", .level = 2, .vendor1 = CPUID_VENDOR_AMD_1, @@ -405,10 +408,59 @@ void x86_cpu_list (FILE *f, int (*cpu_fprintf)(FILE *f, const char *fmt, ...)) (*cpu_fprintf)(f, "x86 %16s\n", x86_defs[i].name); } +int fill_x86_defs_for_host(CPUX86State *env, x86_def_t * def) +{ + struct kvm_cpuid_entry2 e; + + get_host_cpuid_entry(0, 0, &e); + env->cpuid_level = e.eax; + env->cpuid_vendor1 = e.ebx; + env->cpuid_vendor2 = e.ecx; + env->cpuid_vendor3 = e.edx; + + get_host_cpuid_entry(1, 0, &e); + env->cpuid_version = e.eax; + env->cpuid_features = e.edx; + env->cpuid_ext_features = e.ecx; + + get_host_cpuid_entry(0x80000000, 0, &e); + env->cpuid_xlevel = e.eax; + + get_host_cpuid_entry(0x80000001, 0, &e); + env->cpuid_ext3_features = e.ecx; + env->cpuid_ext2_features = e.edx; + + get_host_cpuid_entry(0x80000002, 0, &e); + env->cpuid_model[0] = e.eax; + env->cpuid_model[1] = e.ebx; + env->cpuid_model[2] = e.ecx; + env->cpuid_model[3] = e.edx; + + get_host_cpuid_entry(0x80000003, 0, &e); + env->cpuid_model[4] = e.eax; + env->cpuid_model[5] = e.ebx; + env->cpuid_model[6] = e.ecx; + env->cpuid_model[7] = e.edx; + + get_host_cpuid_entry(0x80000004, 0, &e); + env->cpuid_model[8] = e.eax; + env->cpuid_model[9] = e.ebx; + env->cpuid_model[10] = e.ecx; + env->cpuid_model[11] = e.edx; + + return 0; +} + static int cpu_x86_register (CPUX86State *env, const char *cpu_model) { x86_def_t def1, *def = &def1; + if (strcmp(cpu_model, "host") == 0) { + env->cpuid_host_cpu = 1; + fill_x86_defs_for_host(env, def); + return 0; + } /* else follow through */ 
+ env->cpuid_host_cpu = 0; if (cpu_x86_find_by_name(def, cpu_model) < 0) return -1; if (def->vendor1) {