Re: [PATCH][KVM-AUTOTEST] Check exit status of custom install script and fail if script failed.
On Sun, 2009-05-24 at 17:48 +0300, Avi Kivity wrote:
> Mike Burns wrote:
>> Signed-off-by: Mike Burns <mbu...@redhat.com>
>> ---
>>  client/tests/kvm_runtest_2/kvm_install.py | 7 ++-
>>  1 files changed, 6 insertions(+), 1 deletions(-)
>>
>> diff --git a/client/tests/kvm_runtest_2/kvm_install.py b/client/tests/kvm_runtest_2/kvm_install.py
>> index ebd8b7d..392ef0c 100755
>> --- a/client/tests/kvm_runtest_2/kvm_install.py
>> +++ b/client/tests/kvm_runtest_2/kvm_install.py
>> @@ -90,7 +90,12 @@ def run_kvm_install(test, params, env):
>>          kvm_log.info("Adding KVM_INSTALL_%s to Environment" % (k))
>>          os.putenv("KVM_INSTALL_%s" % (k), str(params[k]))
>>      kvm_log.info("Running " + script + " to install kvm")
>> -    os.system("cd %s; %s" % (test.bindir, script))
>> +    install_result = os.system("cd %s; %s" % (test.bindir, script))
>> +    if os.WEXITSTATUS(install_result) != 0:
>> +        message = "Custom Script encountered an error"
>> +        kvm_log.error(message)
>> +        raise error.TestError, message
>
> How about a helper that does os.system() (or rather, commands.getstatusoutput()) and throws an exception on failure? I imagine it could be used in many places.

utils.system() does that. If the exit code is != 0, it throws an error.CmdError exception.

-- 
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies

-- 
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
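For reference, a run-and-raise helper of the kind Avi suggests can be sketched with the modern subprocess module. The autotest names utils.system() and error.CmdError that Lucas mentions are the real API; the CmdError class and run_cmd function below are a self-contained stand-in written for illustration, not autotest's actual implementation:

```python
import subprocess

class CmdError(Exception):
    """Raised when a command exits with a nonzero status."""
    def __init__(self, command, status, output):
        super().__init__("command %r failed with status %d" % (command, status))
        self.command = command
        self.status = status
        self.output = output

def run_cmd(command):
    """Run `command` through the shell; return its stdout, or raise CmdError
    on a nonzero exit status (the behavior described for utils.system())."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        raise CmdError(command, proc.returncode, proc.stderr)
    return proc.stdout

out = run_cmd("echo hello")
```

With such a helper the kvm_install.py hunk above collapses to a single call, since the exit-status check and the exception live in one place.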
Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration
Find the total dmesg for the RHEL 5.3 guest here: http://pastebin.com/f7e22fd1a

Regards
Pradeep

----- Forwarded by Pradeep K Surisetty/India/IBM on 05/28/2009 12:13 PM -----

From: Pradeep K Surisetty/India/IBM
Date: 05/27/2009 10:07 AM
To: kvm@vger.kernel.org
Cc: Pavan Naregundi/India/i...@ibmin, Sachin P Sant/India/i...@ibmin
Subject: [KVM] soft lockup with RHEL 5.3 guest remote migration

Tried to migrate a RHEL 5.3 guest to a remote machine. It fails to migrate with the below soft lockup message on the guest. I haven't faced this issue with qemu-kvm-10.1. Remote migration fails with qemu-kvm-0.10.4.

=============================================
BUG: soft lockup - CPU#0 stuck for 10s! [init:1]

Pid: 1, comm: init
EIP: 0060:[<c044d1e9>] CPU: 0
EIP is at handle_IRQ_event+0x39/0x8c
EFLAGS: 0246 Not tainted (2.6.18-125.el5 #1)
EAX: 000c EBX: c06e7480 ECX: c79a8da0 EDX: c0734fb4
ESI: c79a8da0 EDI: 000c EBP: DS: 007b ES: 007b
CR0: 8005003b CR2: 08198f00 CR3: 079c7000 CR4: 06d0
 [<c044d2c0>] __do_IRQ+0x84/0xd6
 [<c044d23c>] __do_IRQ+0x0/0xd6
 [<c04074ce>] do_IRQ+0x99/0xc3
 [<c0405946>] common_interrupt+0x1a/0x20
 [<c0428b6f>] __do_softirq+0x57/0x114
 [<c04073eb>] do_softirq+0x52/0x9c
 [<c04059d7>] apic_timer_interrupt+0x1f/0x24
 [<c053a6ae>] add_softcursor+0x13/0xa2
 [<c053ab36>] set_cursor+0x3a/0x5c
 [<c053ab9e>] con_flush_chars+0x27/0x2f
 [<c0533a31>] write_chan+0x1c5/0x298
 [<c041e3d7>] default_wake_function+0x0/0xc
 [<c05315ea>] tty_write+0x147/0x1d8
 [<c053386c>] write_chan+0x0/0x298
 [<c0531ffd>] redirected_tty_write+0x1c/0x6c
 [<c0531fe1>] redirected_tty_write+0x0/0x6c
 [<c0472cff>] vfs_write+0xa1/0x143
 [<c04732f1>] sys_write+0x3c/0x63
 [<c0404f17>] syscall_call+0x7/0xb
=============================================

Source machine:
Machine: x3850
Kernel: 2.6.30-rc6-git4
qemu-kvm-0.10.4

Destination machine:
Machine: LS21
Kernel: 2.6.30-rc6-git3
qemu-kvm-0.10.4

Steps to Reproduce:
1. Install a RHEL 5.3 guest on the x3850 with the above mentioned kernel and qemu version
2. NFS mount the dir containing the guest image on the destination (LS21)
3. Boot the rhel guest
4.
Wait for guest migration on the LS21 with the following command:
   qemu-system-x86_64 -boot c rhel5.3.raw -incoming tcp:0:
5. Start the migration on the source (x3850):
   a. On the guest, press Alt+Ctrl+2 to switch to the qemu prompt
   b. Run migrate -d tcp:'Destination IP':
6. The above command hangs the guest for around 3 or 4 minutes and produces the call trace in dmesg showing the soft lockup on the guest.

A few other details:
- Tried to migrate from an ls21 to another ls21 and faced the same failure.
- I haven't faced this issue when I was using qemu-10.1.
- Migration of a RHEL 5.3 guest from an ls21 to another ls21 succeeded with qemu-10.1.

Regards
Pradeep
Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state
Marcelo Tosatti wrote:

On Tue, May 26, 2009 at 10:02:59AM +0200, Christian Ehrhardt wrote:

Marcelo Tosatti wrote:

On Mon, May 25, 2009 at 01:40:49PM +0200, ehrha...@linux.vnet.ibm.com wrote:

From: Christian Ehrhardt <ehrha...@linux.vnet.ibm.com>

To ensure vcpus come out of guest context in certain cases, this patch adds a s390 specific way to kick them out of guest context. Currently it kicks them out to rerun the vcpu_run path in the s390 code, but the mechanism itself is expandable and with a new flag we could also add e.g. kicks to userspace etc.

Signed-off-by: Christian Ehrhardt <ehrha...@linux.vnet.ibm.com>

For now I added the optimization that skips kicking vcpus out of guest state that had the request bit already set to the s390 specific loop (sent as v2 in a few minutes). We might one day consider standardizing some generic kickout levels, e.g. kick to inner loop, arch vcpu run, generic vcpu run, userspace, ... whatever levels fit *all* our use cases. And then let those kicks be implemented in a kvm_arch_* backend, as it might be very different how they behave on different architectures.

That would be ideal, yes. Two things make_all_requests handles:

1) It disables preemption with get_cpu(), so it can reliably check for the cpu id. Somehow you don't need that for s390 when kicking multiple vcpus?

I don't even need the cpuid as make_all_requests does, I just insert a special bit in the vcpu arch part and the vcpu will come out to me (the host). Fortunately the kick is rare and fast, so I can just insert it unconditionally (it's even ok to insert it if the vcpu is not in guest state). That prevents us from needing the vcpu lock or detailed checks, which would end up where we started (no guarantee that vcpus come out of guest context while trying to acquire all vcpu locks).

Let me see if I get this right: you kick the vcpus out of guest mode by using a special bit in the vcpu arch part. OK.
What I don't understand is this: "would end up where we started (no guarantee that vcpus come out of guest context while trying to acquire all vcpu locks)".

Initially the mechanism looped over vcpus, just acquired the vcpu lock and then updated the vcpu.arch info directly. Avi mentioned that we have no guarantee if/when the vcpu will come out of guest context to free a lock currently held, and suggested the mechanism x86 uses via setting vcpu->requests and kicking the vcpu. That's the reason behind "end up where we (the discussion) started": if we would need the vcpu lock again, we would be at the beginning of the discussion.

So you _need_ a mechanism to kick all vcpus out of guest mode?

I have a mechanism to kick a vcpu, and I use it. Due to the fact that smp_call_* don't work as a kick for us, the kick is an arch specific function. I hope that clarified this part :-)

2) It uses smp_call_function_many(wait=1), which guarantees that by the time make_all_requests returns no vcpus will be using stale data (the remote vcpus will have executed ack_flush).

Yes, this is really a part my s390 implementation doesn't fulfill yet. Currently on return vcpus might still use the old memslot information. As mentioned before, letting all interrupts come too far out of the hot loop would be a performance issue, therefore I think I will need some request/confirm mechanism. I'm not sure yet, but maybe it could be as easy as this pseudo code example:

# in make_all_requests
# remember we hold slots_lock for write here, and the reentry path that
# updates the vcpu specific data acquires slots_lock for read
loop vcpus
    set_bit in vcpu->requests
    kick vcpu        # arch function
endloop
loop vcpus
    wait until the request bit has disappeared
    # as the reentry path uses test_and_clear, it will disappear
endloop

That would be an implicit synchronization and should work; as I wrote before, setting memslots while the guest is running is rare if ever existent for s390. On x86, smp_call_many could then work without the wait flag being set.
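The request/confirm handshake in the pseudo code above can be modeled with ordinary threads: a "host" thread sets a per-vcpu request bit and then waits for each "vcpu" thread to test_and_clear it on its reentry path. This is a toy illustration of the synchronization pattern only (the bit name KVM_REQ_KICK and the polling are stand-ins; the real kernel code uses atomic bitops and an arch-specific kick, not sleeps):

```python
import threading
import time

KVM_REQ_KICK = 0  # illustrative request bit number

class Vcpu:
    """Toy vcpu: a worker thread stands in for the guest-reentry path."""
    def __init__(self):
        self.requests = 0
        self.lock = threading.Lock()

    def run_until_request_handled(self):
        # reentry path: test_and_clear the request bit, then "reenter guest"
        while True:
            with self.lock:
                if self.requests & (1 << KVM_REQ_KICK):
                    self.requests &= ~(1 << KVM_REQ_KICK)  # test_and_clear
                    return
            time.sleep(0.001)

def make_all_requests(vcpus):
    # first loop: set the bit and (in the real code) kick each vcpu
    for v in vcpus:
        with v.lock:
            v.requests |= 1 << KVM_REQ_KICK
    # second loop: wait until every vcpu has cleared its bit on reentry,
    # so no vcpu can still be using stale memslot data when we return
    for v in vcpus:
        while True:
            with v.lock:
                if not (v.requests & (1 << KVM_REQ_KICK)):
                    break
            time.sleep(0.001)

vcpus = [Vcpu() for _ in range(4)]
workers = [threading.Thread(target=v.run_until_request_handled) for v in vcpus]
for w in workers:
    w.start()
make_all_requests(vcpus)
for w in workers:
    w.join()
```

When make_all_requests returns, every request bit has been observed and cleared, which is the "implicit synchronization" property being discussed.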
I see, yes.

But I assume that this synchronization approach is slower, as it serializes all vcpus on reentry (they wait for the slots_lock to get dropped); therefore I wanted to ask how often setting memslots at runtime will occur on x86. Would this approach be acceptable?

For x86 we need slots_lock for two things:

1) to protect the memslot structures from changing (very rare), ie: kvm_set_memory.

2) to protect updates to the dirty bitmap (operations on behalf of the guest), which take slots_lock for read, versus updates to that dirty bitmap (an ioctl that retrieves what pages have been dirtied in the memslots, and clears the dirtyness info).

All you need for S390 is 1), AFAICS.

correct

For 1), we can drop the slots_lock usage, but instead create an explicit synchronization point where all vcpus are forced to (say kvm_vcpu_block) paused state. qemu-kvm has such a notion. Same language?

Yes, I think I got your point :-) But I
[ kvm-Bugs-2796640 ] KVM Regression from 2.6.28-11 to 2.6.30-rc5
Bugs item #2796640, was opened at 2009-05-26 03:00 Message generated for change (Comment added) made by kelu6 You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2796640group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Bob Manners (bobmanners) Assigned to: Nobody/Anonymous (nobody) Summary: KVM Regression from 2.6.28-11 to 2.6.30-rc5 Initial Comment: Changing from the Ubuntu 09.04 stock kernel 2.6.28-11 to the current 'alpha' kernel in the Ubuntu development repository for 'Karmic' 2.6.30-rc5, some (but not all) of my KVM guests stop booting. I have the following guest OSs installed as KVM virtual machines: Ubuntu 09.04, FreeBSD 7.1, Minix 3, OpenSolaris 08.11, Windows XP and Windows 7RC Under 2.6.28-11 these all boot and run OK. With kernel 2.6.30-rc5, Ubuntu, FreeBSD and Minix boot and run OK, but OpenSolaris 08.11, Windows XP and Windows 7RC fail to boot. They appear to freeze very early in the boot process, after the boot loader is done and the OS kernel itself starts loading. There are no strange messages in the host system logs that I am aware of. When the guest OS freezes, it pins one of my CPUs at 100% utilization (I have KVM set to only give one CPU to the guest). This looks like a regression under 2.6.30-rc5. It is possible that it is caused by a Ubuntu patch, and I have filed a bug in Launchpad, but I suspect that this is an upstream problem, and so I am reporting it here also. Please let me know what I can do to assist in debugging this. 
-- Comment By: Kenni (kelu6) Date: 2009-05-28 10:26 Message: Your CPU still has the NX flag, which apparently also was my issue: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/31206/focus=31774 -- Comment By: Bob Manners (bobmanners) Date: 2009-05-28 02:22 Message: In response to kelu6's question, actually I am running 32 bit host / 32 bit guest. My CPU is Intel Core Duo (not Core2 Duo) which is a 32 bit CPU: b...@gecko2:~$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 14 model name : Genuine Intel(R) CPU T2400 @ 1.83GHz stepping: 8 cpu MHz : 1000.000 cache size : 2048 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm bogomips: 3658.94 clflush size: 64 power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 14 model name : Genuine Intel(R) CPU T2400 @ 1.83GHz stepping: 8 cpu MHz : 1000.000 cache size : 2048 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts pni monitor vmx est tm2 xtpr pdcm bogomips: 3658.93 clflush size: 64 power management:
Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state
Christian Ehrhardt wrote:

> So you _need_ a mechanism to kick all vcpus out of guest mode?
>
> I have a mechanism to kick a vcpu, and I use it. Due to the fact that smp_call_* don't work as a kick for us, the kick is an arch specific function. I hope that clarified this part :-)

You could still use make_all_vcpus_request(), just change smp_call_function_many() to your own kicker.

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
kvm + windows XP, failure to connect to samba server on host
Hello, I use kvm on debian with Windows XP as guest OS. To have XP in the same network I use a tap network device. I start kvm with

sudo kvm -m 512 -smp 2 -hda /mnt/ext3_data/Qemu/winxp.img -net nic,macaddr=52:54:00:00:00:01 -net tap,ifname=tap0,script=no -localtime

from a script. I have a fat32 partition mounted on the host which I want to share with XP. I use a samba server, as the Qemu built-in samba server did not work. In Windows I see the samba server, but when trying to connect to the fat32 partition a login is required. First, no password works, and secondly I don't want a login at all; I prefer to be able to connect straight away. This is the smb.conf:

[global]
workgroup = NANOTRONIC
server string = Linux SAMBA server3
log file = /var/log/samba/log.%m
max log size = 50

[vfat_data]
comment = fat32 drive
path = /mnt/vfat_data
guest ok = yes

Does anyone have an idea what I am missing?
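A likely cause, offered as an assumption since the poster's Samba version and full config aren't shown: with no guest-mapping settings in [global], Samba 3 runs in user-level security by default, so Windows prompts for credentials and rejects them even though the share has `guest ok = yes`. A minimal sketch of [global] additions that usually enable password-less guest access:

```ini
[global]
	workgroup = NANOTRONIC
	server string = Linux SAMBA server3
	log file = /var/log/samba/log.%m
	max log size = 50
	; map unknown users to the guest account instead of rejecting them
	map to guest = Bad User
	; the Unix account guests map to (must exist on the host)
	guest account = nobody
```

After changing smb.conf, reload Samba (e.g. /etc/init.d/samba reload) and reconnect from the XP guest.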
Re: Userspace MSR handling
Gerd Hoffmann wrote:

> - what about connecting the guest driver to xen netback one day? we don't want to go through userspace for that.

You can't without emulating tons of xen stuff in-kernel. Current situation:

* Guest does xen hypercalls. We can handle that just fine.
* Host userspace (backends) calls libxen*, where the xen hypercall calls are hidden. We can redirect the library calls via LD_PRELOAD (standalone xenner) or function pointers (qemuified xenner) and do something else instead.

Trying to use the in-kernel xen netback driver adds this problem:

* Host kernel does xen hypercalls. Ouch. We have to emulate them in-kernel (otherwise using in-kernel netback would be a quite pointless exercise). Or do the standard function pointer trick: event channel notifications change to eventfd_signal, grant table ops change to copy_to_user().

> One way or another, the MSR somehow has to map in a chunk of data supplied by userspace. Are you suggesting an alternative to the PIO hack?

Well, the chunk of data is on disk anyway: $libdir/xenner/hvm{32,64}.bin

So a possible plan of attack could be: ln -s $libdir/xenner /lib/firmware, let kvm.ko grab it if needed using request_firmware("xenner/hvm${bits}.bin"), and a few lines of kernel code handling the wrmsr. The logic is just this:

void xenner_wrmsr(uint64_t val, int longmode)
{
        uint32_t page  = val & ~PAGE_MASK;
        uint64_t paddr = val & PAGE_MASK;
        uint8_t *blob  = longmode ? hvm64 : hvm32;

        cpu_physical_memory_write(paddr, blob + page * PAGE_SIZE, PAGE_SIZE);
}

Well, you'll have to sprinkle in blob loading and caching and some error checking. But even with that it is probably hard to beat in actual code size. This ties all guests to one hypercall page implementation installed in one root-only place. An additional plus is that we get away without a new ioctl then.

Minimizing the amount of ioctls is an important non-goal.
If you replace request_firmware with an ioctl that defines the location and size of the hypercall page in host userspace, this would work well.

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
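The split xenner_wrmsr performs on the guest-written MSR value (blob page index in the low bits, page-aligned guest-physical destination in the high bits) can be sketched in Python. This models only the decoding, not the copy into guest memory, and the example value is made up for illustration:

```python
PAGE_SIZE = 4096
PAGE_MASK = ~(PAGE_SIZE - 1)  # ...fffff000, as in the kernel definition

def decode_xenner_msr(val):
    """Split the MSR value into (page index within the blob,
    guest-physical destination address), mirroring xenner_wrmsr."""
    page = val & ~PAGE_MASK   # low 12 bits select the page of the blob
    paddr = val & PAGE_MASK   # page-aligned guest-physical address
    return page, paddr

# guest asks for blob page 3 to be copied to guest-physical 0x102000
page, paddr = decode_xenner_msr(0x102003)
```

One wrmsr per page is enough because a single value carries both pieces of information, which is why the handler itself stays so small.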
[PATCH] kvm-kmod: Add missing line continuation
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Makefile | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index db44772..f51b491 100644
--- a/Makefile
+++ b/Makefile
@@ -30,7 +30,7 @@ all:: prerequisite
 	$(if $(KERNELSOURCEDIR),-Iinclude2 -I$(KERNELSOURCEDIR)/include) \
 	-Iarch/${ARCH_DIR}/include -I`pwd`/include-compat \
 	-include include/linux/autoconf.h \
-	-include `pwd`/$(ARCH_DIR)/external-module-compat.h $(module_defines)
+	-include `pwd`/$(ARCH_DIR)/external-module-compat.h $(module_defines) \
 	$$@

 include $(MAKEFILE_PRE)
Re: 2D Graphics Performance
No, it works on Windows too, it is just (quite) a bit of a pain to find and set up the correct driver.

Any pointers where to look for them?

SR
Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
On Wed, 2009-05-27 at 16:45 -0400, Gregory Haskins wrote: Mark McLoughlin wrote: The virtio ABI is fixed, so we couldn't e.g. have the guest use a cookie to identify a queue - it's just going to continue using a per-device queue number. Actually, I was originally thinking this would be exposed as a virtio FEATURE bit anyway, so there were no backwards-compat constraints. That said, we can possibly make it work in a backwards compat way, too. IIRC, today virtio does a PIO cycle to a specific register with the queue-id when it wants to signal guest-host, right? What is the width of the write? It's a 16-bit write. /* A 16-bit r/w queue notifier */ #define VIRTIO_PCI_QUEUE_NOTIFY 16 So, if the cookie was also the trigger, we'd need an eventfd per device. I'm having trouble parsing this one. The cookie namespace is controlled by the userspace component that owns the corresponding IO address, so there's no reason you can't make queue-id = 0 use cookie = 0, or whatever. That said, I still think a separation of the cookie and trigger as suggested above is a good idea, so its probably moot to discuss this point further. Ah, my mistake - I thought the cookie was returned to userspace when the eventfd was signalled, but no ... userspace only gets an event counter value and the cookie is used during de-assignment to distinguish between iosignalfds. Okay, so suppose you do assign multiple times at a given address - you're presumably going to use a different eventfd for each assignment? If so, can't we match using both the address and eventfd at de-assignment and drop the cookie from the interface altogether? i.e. to replace the virtio queue notify with this, we'd: 1) create an eventfd per queue 2) assign each of those eventfds to the QUEUE_NOTIFY address 3) have only one of the eventfds be triggered, depending on what value is written by the guest 4) de-assign using the address/eventfd pair to distinguish between assignments Cheers, Mark. 
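Mark's four steps (one eventfd per queue, all assigned at the same PIO address, only the eventfd whose match value is written gets triggered, and de-assignment by the address/eventfd pair) can be modeled in plain Python. The registry class, the list-as-eventfd trick, and all names here are illustrative stand-ins, not the actual KVM iosignalfd interface:

```python
class IosignalfdRegistry:
    """Toy model: each assignment binds (pio_addr, match_value) -> eventfd."""
    def __init__(self):
        self.assignments = {}  # (addr, value) -> eventfd

    def assign(self, addr, value, eventfd):
        # step 2: multiple eventfds may share one address, keyed by value
        self.assignments[(addr, value)] = eventfd

    def deassign(self, addr, eventfd):
        # step 4: match on the address/eventfd pair; no separate cookie
        self.assignments = {key: fd for key, fd in self.assignments.items()
                            if not (key[0] == addr and fd is eventfd)}

    def guest_write(self, addr, value):
        # step 3: trigger only the eventfd whose match value was written
        fd = self.assignments.get((addr, value))
        if fd is not None:
            fd.append(1)  # stand-in for eventfd_signal()
        return fd

QUEUE_NOTIFY_ADDR = 16  # VIRTIO_PCI_QUEUE_NOTIFY
reg = IosignalfdRegistry()
fds = [[], []]  # step 1: one "eventfd" (here: a counter list) per queue
for qid, fd in enumerate(fds):
    reg.assign(QUEUE_NOTIFY_ADDR, qid, fd)

reg.guest_write(QUEUE_NOTIFY_ADDR, 1)    # guest notifies queue 1 only
reg.deassign(QUEUE_NOTIFY_ADDR, fds[0])  # de-assign by (addr, eventfd)
```

The point of the model is that (address, eventfd) already identifies an assignment uniquely when each assignment uses its own eventfd, which is Mark's argument for dropping the cookie from the interface.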
Re: Remove qemu_alloc_physram()
After some tests: with kvm-85 all seems to be OK for me, multiple qemu instances run well with hugepages. With kvm-86 + patch: after some time, the vm reboots with strange disk corruption (windows reports that some files are missing) and the cpu load is very high.

Note:

KVM 85 (running one qemu with -m 512):
HugePages_Total: 13676
HugePages_Free:  13409
HugePages_Rsvd:      0
HugePages_Surp:      0

KVM 86 + patch:
HugePages_Total: 13676
HugePages_Free:  13412
HugePages_Rsvd:      0
HugePages_Surp:      0

I do not know if this difference is the cause of the bug.

Regards
Nicolas

2009/5/27 Avi Kivity <a...@redhat.com>:
> Avi Kivity wrote:
>> nicolas prochazka wrote:
>>> without -mem-prealloc
>>> HugePages_Total: 2560
>>> HugePages_Free:  2296
>>> HugePages_Rsvd:     0
>>> so after a minimum test, I can say that your patch seems to correct this problem.
>
> It isn't correct, it doesn't generate the right alignment. Better patch attached.

-- 
error compiling committee.c: too many arguments to function
[PATCH 1/2] use explicit 64bit storage for sysenter values
Since AMD does not support sysenter in 64bit mode, the VMCB fields storing the MSRs are truncated to 32bit upon VMRUN/#VMEXIT. So store the values in a separate 64bit storage to avoid truncation. Signed-off-by: Christoph Egger christoph.eg...@amd.com --- arch/x86/kvm/kvm_svm.h |4 arch/x86/kvm/svm.c | 12 ++-- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/kvm_svm.h b/arch/x86/kvm/kvm_svm.h index ed66e4c..4129dc1 100644 --- a/arch/x86/kvm/kvm_svm.h +++ b/arch/x86/kvm/kvm_svm.h @@ -27,6 +27,10 @@ struct vcpu_svm { unsigned long vmcb_pa; struct svm_cpu_data *svm_data; uint64_t asid_generation; + uint64_t sysenter_cs; + uint64_t sysenter_esp; + uint64_t sysenter_eip; + struct kvm_segment user_cs; /* used in sysenter/sysexit emulation */ u64 next_rip; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index dd667dd..f0f2885 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1978,13 +1978,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) break; #endif case MSR_IA32_SYSENTER_CS: - *data = svm-vmcb-save.sysenter_cs; + *data = svm-sysenter_cs; break; case MSR_IA32_SYSENTER_EIP: - *data = svm-vmcb-save.sysenter_eip; + *data = svm-sysenter_eip; break; case MSR_IA32_SYSENTER_ESP: - *data = svm-vmcb-save.sysenter_esp; + *data = svm-sysenter_esp; break; /* Nobody will change the following 5 values in the VMCB so we can safely return them on rdmsr. 
They will always be 0 @@ -2068,13 +2068,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) break; #endif case MSR_IA32_SYSENTER_CS: - svm-vmcb-save.sysenter_cs = data; + svm-sysenter_cs = data; break; case MSR_IA32_SYSENTER_EIP: - svm-vmcb-save.sysenter_eip = data; + svm-sysenter_eip = data; break; case MSR_IA32_SYSENTER_ESP: - svm-vmcb-save.sysenter_esp = data; + svm-sysenter_esp = data; break; case MSR_IA32_DEBUGCTLMSR: if (!svm_has(SVM_FEATURE_LBRV)) { -- 1.6.1.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
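The problem this patch works around can be modeled simply: a value kept in a field that hardware truncates to 32 bits on VMRUN/#VMEXIT loses its upper half, while a separate 64-bit shadow field, read and written by the MSR handlers instead of the VMCB field, preserves it. This is a toy model of the behavior described in the commit message, not SVM code:

```python
MASK32 = 0xFFFFFFFF

class VmcbModel:
    """Toy model: AMD truncates the in-VMCB sysenter MSR fields to 32 bits."""
    def __init__(self):
        self.vmcb_sysenter_eip = 0    # field the "hardware" truncates
        self.shadow_sysenter_eip = 0  # separate 64-bit storage (the patch)

    def wrmsr(self, value):
        # before the patch only the VMCB field existed; after it, the MSR
        # handlers store into (and read back from) the shadow field
        self.vmcb_sysenter_eip = value
        self.shadow_sysenter_eip = value

    def vmrun(self):
        # model of VMRUN/#VMEXIT: hardware keeps only the low 32 bits
        self.vmcb_sysenter_eip &= MASK32

vcpu = VmcbModel()
vcpu.wrmsr(0xFFFF800012345678)  # a canonical 64-bit guest value
vcpu.vmrun()
# the VMCB field lost its upper half; the shadow kept the full value
```

Reading the MSR back through the shadow field therefore returns what the guest wrote, even on hardware that does not support sysenter in 64-bit mode.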
[PATCH RFC v3 3/4] Break dependency between vcpu index in vcpus array and vcpu_id.
Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov g...@redhat.com --- include/linux/kvm.h |2 + include/linux/kvm_host.h |2 + virt/kvm/kvm_main.c | 50 ++--- 3 files changed, 33 insertions(+), 21 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..d10ab5d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -430,6 +430,7 @@ struct kvm_trace_rec { #ifdef __KVM_HAVE_PIT #define KVM_CAP_PIT2 33 #endif +#define KVM_CAP_SET_BOOT_CPU_ID 34 #ifdef KVM_CAP_IRQ_ROUTING @@ -537,6 +538,7 @@ struct kvm_irqfd { #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) +#define KVM_SET_BOOT_CPU_ID_IO(KVMIO, 0x78) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index e9e0cd8..e368a14 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -130,8 +130,10 @@ struct kvm { int nmemslots; struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; + u32 bsp_vcpu_id; struct kvm_vcpu *bsp_vcpu; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; + atomic_t online_vcpus; struct list_head vm_list; struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5a55fe0..d65c637 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -693,11 +693,6 @@ out: } #endif -static inline int valid_vcpu(int n) -{ - return likely(n = 0 n KVM_MAX_VCPUS); -} - inline int kvm_is_mmio_pfn(pfn_t pfn) { if (pfn_valid(pfn)) { @@ -1713,15 +1708,12 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu) /* * Creates some virtual cpus. Good luck creating more than one. 
*/ -static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n) +static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) { int r; struct kvm_vcpu *vcpu; - if (!valid_vcpu(n)) - return -EINVAL; - - vcpu = kvm_arch_vcpu_create(kvm, n); + vcpu = kvm_arch_vcpu_create(kvm, id); if (IS_ERR(vcpu)) return PTR_ERR(vcpu); @@ -1732,25 +1724,36 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n) return r; mutex_lock(kvm-lock); - if (kvm-vcpus[n]) { - r = -EEXIST; + if (atomic_read(kvm-online_vcpus) == KVM_MAX_VCPUS) { + r = -EINVAL; goto vcpu_destroy; } - kvm-vcpus[n] = vcpu; - if (n == 0) - kvm-bsp_vcpu = vcpu; - mutex_unlock(kvm-lock); + + for (r = 0; r atomic_read(kvm-online_vcpus); r++) + if (kvm-vcpus[r]-vcpu_id == id) { + r = -EEXIST; + goto vcpu_destroy; + } + + BUG_ON(kvm-vcpus[atomic_read(kvm-online_vcpus)]); /* Now it's all set up, let userspace reach it */ kvm_get_kvm(kvm); r = create_vcpu_fd(vcpu); - if (r 0) - goto unlink; + if (r 0) { + kvm_put_kvm(kvm); + goto vcpu_destroy; + } + + kvm-vcpus[atomic_read(kvm-online_vcpus)] = vcpu; + smp_wmb(); + atomic_inc(kvm-online_vcpus); + + if (kvm-bsp_vcpu_id == id) + kvm-bsp_vcpu = vcpu; + mutex_unlock(kvm-lock); return r; -unlink: - mutex_lock(kvm-lock); - kvm-vcpus[n] = NULL; vcpu_destroy: mutex_unlock(kvm-lock); kvm_arch_vcpu_destroy(vcpu); @@ -2223,6 +2226,10 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags); break; } + case KVM_SET_BOOT_CPU_ID: + kvm-bsp_vcpu_id = arg; + r = 0; + break; default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } @@ -2289,6 +2296,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg) case KVM_CAP_USER_MEMORY: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: + case KVM_CAP_SET_BOOT_CPU_ID: return 1; #ifdef CONFIG_HAVE_KVM_IRQCHIP case KVM_CAP_IRQ_ROUTING: -- 1.6.2.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More 
majordomo info at http://vger.kernel.org/majordomo-info.html
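The new create path in the patch changes three things: a duplicate vcpu_id is rejected with -EEXIST, the vcpu is appended at the next online index instead of at index vcpu_id, and the vcpu becomes the BSP if its id matches the configured bsp_vcpu_id. A toy Python model of that logic (names mirror the patch, but this is illustrative code, not the kernel implementation, and it omits locking and fd creation):

```python
class KvmModel:
    KVM_MAX_VCPUS = 16

    def __init__(self, bsp_vcpu_id=0):  # 0 is the KVM_SET_BOOT_CPU_ID default
        self.vcpus = []        # array index no longer tied to vcpu_id
        self.bsp_vcpu_id = bsp_vcpu_id
        self.bsp_vcpu = None

    def create_vcpu(self, vcpu_id):
        if len(self.vcpus) == self.KVM_MAX_VCPUS:
            raise ValueError("EINVAL: too many vcpus")
        if any(v == vcpu_id for v in self.vcpus):
            raise ValueError("EEXIST: duplicate vcpu_id")
        self.vcpus.append(vcpu_id)     # next free slot, not slot vcpu_id
        if vcpu_id == self.bsp_vcpu_id:
            self.bsp_vcpu = vcpu_id
        return len(self.vcpus) - 1     # index in vcpus[]

kvm = KvmModel(bsp_vcpu_id=4)
idx = kvm.create_vcpu(4)  # e.g. apic id 4 lands at array index 0
```

This is the decoupling the series aims for: a sparse or hierarchical apic id can be used as vcpu_id while the vcpus[] array stays densely packed.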
[PATCH RFC v3 0/4] decouple vcpu index from apic id
Currently vcpu_id is used as an index into the vcpus array and as the apic id on x86. This is incorrect, since apic ids do not have to be contiguous (they can also encode cpu hierarchy information) and may have values bigger than the vcpu array in the case of x2apic. This series removes the use of vcpu_id as the vcpus array index. Each architecture may use it as it sees fit; x86 uses it as the apic id.

v2: In this version vcpus[] is managed by generic code.
v3: A new ioctl is added to specify the vcpu_id of the bsp vcpu. To maintain backwards compatibility, by default vcpu_id == 0 is the bsp.

Gleb Natapov (4):
  Introduce kvm_vcpu_is_bsp() function.
  Use pointer to vcpu instead of vcpu_id in timer code.
  Break dependency between vcpu index in vcpus array and vcpu_id.
  Use macro to iterate over vcpus.

 arch/ia64/kvm/kvm-ia64.c   | 35 ++--
 arch/ia64/kvm/vcpu.c       |  2 +-
 arch/powerpc/kvm/powerpc.c | 16 +++
 arch/s390/kvm/kvm-s390.c   | 33 +++
 arch/x86/kvm/i8254.c       | 13 +++-
 arch/x86/kvm/i8259.c       |  6 ++--
 arch/x86/kvm/kvm_timer.h   |  2 +-
 arch/x86/kvm/lapic.c       |  9 +++---
 arch/x86/kvm/mmu.c         |  6 ++--
 arch/x86/kvm/svm.c         |  4 +-
 arch/x86/kvm/timer.c       |  2 +-
 arch/x86/kvm/vmx.c         |  6 ++--
 arch/x86/kvm/x86.c         | 29 ++--
 include/linux/kvm.h        |  2 +
 include/linux/kvm_host.h   | 18 
 virt/kvm/ioapic.c          |  4 ++-
 virt/kvm/irq_comm.c        |  6 +---
 virt/kvm/kvm_main.c        | 63 +++
 18 files changed, 138 insertions(+), 118 deletions(-)
[PATCH RFC v3 2/4] Use pointer to vcpu instead of vcpu_id in timer code.
Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/kvm/i8254.c |2 +- arch/x86/kvm/kvm_timer.h |2 +- arch/x86/kvm/lapic.c |2 +- arch/x86/kvm/timer.c |2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index de4785a..049cea2 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -291,7 +291,7 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-timer.function = kvm_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu_id = 0; + pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h index 26bd6ba..55c7524 100644 --- a/arch/x86/kvm/kvm_timer.h +++ b/arch/x86/kvm/kvm_timer.h @@ -6,7 +6,7 @@ struct kvm_timer { bool reinject; struct kvm_timer_ops *t_ops; struct kvm *kvm; - int vcpu_id; + struct kvm_vcpu *vcpu; }; struct kvm_timer_ops { diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 260d4fa..9d0608c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -946,7 +946,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) apic-lapic_timer.timer.function = kvm_timer_fn; apic-lapic_timer.t_ops = lapic_timer_ops; apic-lapic_timer.kvm = vcpu-kvm; - apic-lapic_timer.vcpu_id = vcpu-vcpu_id; + apic-lapic_timer.vcpu = vcpu; apic-base_address = APIC_DEFAULT_PHYS_BASE; vcpu-arch.apic_base = APIC_DEFAULT_PHYS_BASE; diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c index 86dbac0..85cc743 100644 --- a/arch/x86/kvm/timer.c +++ b/arch/x86/kvm/timer.c @@ -33,7 +33,7 @@ enum hrtimer_restart kvm_timer_fn(struct hrtimer *data) struct kvm_vcpu *vcpu; struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); - vcpu = ktimer-kvm-vcpus[ktimer-vcpu_id]; + vcpu = ktimer-vcpu; if (!vcpu) return HRTIMER_NORESTART; -- 1.6.2.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[PATCH RFC v3 1/4] Introduce kvm_vcpu_is_bsp() function.
Use it instead of open code vcpu_id zero is BSP assumption. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/ia64/kvm/kvm-ia64.c |2 +- arch/ia64/kvm/vcpu.c |2 +- arch/x86/kvm/i8254.c |4 ++-- arch/x86/kvm/i8259.c |6 +++--- arch/x86/kvm/lapic.c |7 --- arch/x86/kvm/svm.c |4 ++-- arch/x86/kvm/vmx.c |6 +++--- arch/x86/kvm/x86.c |4 ++-- include/linux/kvm_host.h |5 + virt/kvm/ioapic.c|4 +++- virt/kvm/kvm_main.c |2 ++ 11 files changed, 28 insertions(+), 18 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 3199221..3924591 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1216,7 +1216,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) if (IS_ERR(vmm_vcpu)) return PTR_ERR(vmm_vcpu); - if (vcpu-vcpu_id == 0) { + if (kvm_vcpu_is_bsp(vcpu)) { vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE; /*Set entry address for first run.*/ diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c index a2c6c15..7e7391d 100644 --- a/arch/ia64/kvm/vcpu.c +++ b/arch/ia64/kvm/vcpu.c @@ -830,7 +830,7 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val) kvm = (struct kvm *)KVM_VM_BASE; - if (vcpu-vcpu_id == 0) { + if (kvm_vcpu_is_bsp(vcpu)) { for (i = 0; i kvm-arch.online_vcpus; i++) { v = (struct kvm_vcpu *)((char *)vcpu + sizeof(struct kvm_vcpu_data) * i); diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 584e3d3..de4785a 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -228,7 +228,7 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu) { struct kvm_pit *pit = vcpu-kvm-arch.vpit; - if (pit vcpu-vcpu_id == 0 pit-pit_state.irq_ack) + if (pit kvm_vcpu_is_bsp(vcpu) pit-pit_state.irq_ack) return atomic_read(pit-pit_state.pit_timer.pending); return 0; } @@ -249,7 +249,7 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) struct kvm_pit *pit = vcpu-kvm-arch.vpit; struct hrtimer *timer; - if (vcpu-vcpu_id != 0 || !pit) + if (!kvm_vcpu_is_bsp(vcpu) || !pit) return; timer = pit-pit_state.pit_timer.timer; diff --git 
a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 1ccb50c..6befa98 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -57,7 +57,7 @@ static void pic_unlock(struct kvm_pic *s) } if (wakeup) { - vcpu = s-kvm-vcpus[0]; + vcpu = s-kvm-bsp_vcpu; if (vcpu) kvm_vcpu_kick(vcpu); } @@ -252,7 +252,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s) { int irq, irqbase, n; struct kvm *kvm = s-pics_state-irq_request_opaque; - struct kvm_vcpu *vcpu0 = kvm-vcpus[0]; + struct kvm_vcpu *vcpu0 = kvm-bsp_vcpu; if (s == s-pics_state-pics[0]) irqbase = 0; @@ -505,7 +505,7 @@ static void picdev_read(struct kvm_io_device *this, static void pic_irq_request(void *opaque, int level) { struct kvm *kvm = opaque; - struct kvm_vcpu *vcpu = kvm-vcpus[0]; + struct kvm_vcpu *vcpu = kvm-bsp_vcpu; struct kvm_pic *s = pic_irqchip(kvm); int irq = pic_get_irq(s-pics[0]); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index ae99d83..260d4fa 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -787,7 +787,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) vcpu-arch.apic_base = value; return; } - if (apic-vcpu-vcpu_id) + + if (!kvm_vcpu_is_bsp(apic-vcpu)) value = ~MSR_IA32_APICBASE_BSP; vcpu-arch.apic_base = value; @@ -844,7 +845,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu) } update_divide_count(apic); atomic_set(apic-lapic_timer.pending, 0); - if (vcpu-vcpu_id == 0) + if (kvm_vcpu_is_bsp(vcpu)) vcpu-arch.apic_base |= MSR_IA32_APICBASE_BSP; apic_update_ppr(apic); @@ -985,7 +986,7 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (vcpu-vcpu_id == 0) { + if (kvm_vcpu_is_bsp(vcpu)) { if (!apic_hw_enabled(vcpu-arch.apic)) r = 1; if ((lvt0 APIC_LVT_MASKED) == 0 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index dd667dd..8eede3f 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -605,7 +605,7 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu) init_vmcb(svm); - if 
(vcpu->vcpu_id != 0) { + if (!kvm_vcpu_is_bsp(vcpu)) { kvm_rip_write(vcpu, 0); svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
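For readers following the diff: the helper itself is defined in the include/linux/kvm_host.h hunk, which this excerpt truncates. A minimal standalone model of the idea — the struct fields are assumptions inferred from the kvm->bsp_vcpu hunks above, not the kernel's real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in structures; the real ones live in include/linux/kvm_host.h. */
struct kvm_vcpu;

struct kvm {
	struct kvm_vcpu *bsp_vcpu;	/* pointer added by this series */
};

struct kvm_vcpu {
	struct kvm *kvm;
	int vcpu_id;
};

/*
 * One central definition of "is this vcpu the bootstrap processor?"
 * replaces the scattered vcpu->vcpu_id == 0 checks in the diff above.
 */
static inline int kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
{
	return vcpu->kvm->bsp_vcpu == vcpu;
}
```

With this in place, making a vcpu other than id 0 the BSP later only requires changing where bsp_vcpu points, not every call site.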
[PATCH RFC v3 4/4] Use macro to iterate over vcpus.
Signed-off-by: Gleb Natapov g...@redhat.com --- arch/ia64/kvm/kvm-ia64.c | 33 ++--- arch/powerpc/kvm/powerpc.c | 16 ++-- arch/s390/kvm/kvm-s390.c | 33 - arch/x86/kvm/i8254.c |7 ++- arch/x86/kvm/mmu.c |6 +++--- arch/x86/kvm/x86.c | 25 - include/linux/kvm_host.h | 11 +++ virt/kvm/irq_comm.c|6 ++ virt/kvm/kvm_main.c| 19 +++ 9 files changed, 77 insertions(+), 79 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 3924591..c1c5cb6 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -337,13 +337,12 @@ static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id, { union ia64_lid lid; int i; + struct kvm_vcpu *vcpu; - for (i = 0; i kvm-arch.online_vcpus; i++) { - if (kvm-vcpus[i]) { - lid.val = VCPU_LID(kvm-vcpus[i]); - if (lid.id == id lid.eid == eid) - return kvm-vcpus[i]; - } + kvm_for_each_vcpu(i, vcpu, kvm) { + lid.val = VCPU_LID(vcpu); + if (lid.id == id lid.eid == eid) + return vcpu; } return NULL; @@ -409,21 +408,21 @@ static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) struct kvm *kvm = vcpu-kvm; struct call_data call_data; int i; + struct kvm_vcpu *vcpui; call_data.ptc_g_data = p-u.ptc_g_data; - for (i = 0; i kvm-arch.online_vcpus; i++) { - if (!kvm-vcpus[i] || kvm-vcpus[i]-arch.mp_state == - KVM_MP_STATE_UNINITIALIZED || - vcpu == kvm-vcpus[i]) + kvm_for_each_vcpu(i, vcpui, kvm) { + if (vcpui-arch.mp_state == KVM_MP_STATE_UNINITIALIZED || + vcpu == vcpui) continue; - if (waitqueue_active(kvm-vcpus[i]-wq)) - wake_up_interruptible(kvm-vcpus[i]-wq); + if (waitqueue_active(vcpui-wq)) + wake_up_interruptible(vcpui-wq); - if (kvm-vcpus[i]-cpu != -1) { - call_data.vcpu = kvm-vcpus[i]; - smp_call_function_single(kvm-vcpus[i]-cpu, + if (vcpui-cpu != -1) { + call_data.vcpu = vcpui; + smp_call_function_single(vcpui-cpu, vcpu_global_purge, call_data, 1); } else printk(KERN_WARNINGkvm: Uninit vcpu received ipi!\n); @@ -852,8 +851,6 @@ struct kvm *kvm_arch_create_vm(void) 
kvm_init_vm(kvm); - kvm-arch.online_vcpus = 0; - return kvm; } @@ -1356,8 +1353,6 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, goto fail; } - kvm-arch.online_vcpus++; - return vcpu; fail: return ERR_PTR(r); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2cf915e..7ad30e0 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -122,13 +122,17 @@ struct kvm *kvm_arch_create_vm(void) static void kvmppc_free_vcpus(struct kvm *kvm) { unsigned int i; + struct kvm_vcpu *vcpu; - for (i = 0; i KVM_MAX_VCPUS; ++i) { - if (kvm-vcpus[i]) { - kvm_arch_vcpu_free(kvm-vcpus[i]); - kvm-vcpus[i] = NULL; - } - } + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_arch_vcpu_free(vcpu); + + mutex_lock(kvm-lock); + for (i = 0; i atomic_read(kvm-online_vcpus); i++) + kvm-vcpus[i] = NULL; + + atomic_set(kvm-online_vcpus, 0); + mutex_unlock(kvm-lock); } void kvm_arch_sync_events(struct kvm *kvm) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 981ab04..217d3d3 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -209,13 +209,17 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) static void kvm_free_vcpus(struct kvm *kvm) { unsigned int i; + struct kvm_vcpu *vcpu; - for (i = 0; i KVM_MAX_VCPUS; ++i) { - if (kvm-vcpus[i]) { - kvm_arch_vcpu_destroy(kvm-vcpus[i]); - kvm-vcpus[i] = NULL; - } - } + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_arch_vcpu_destroy(vcpu); + + mutex_lock(kvm-lock); + for (i = 0; i atomic_read(kvm-online_vcpus); i++) + kvm-vcpus[i] = NULL; + + atomic_set(kvm-online_vcpus, 0); + mutex_unlock(kvm-lock); } void kvm_arch_sync_events(struct kvm *kvm) @@ -311,8 +315,6 @@ struct kvm_vcpu
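The macro the call sites above switch to is in the include/linux/kvm_host.h hunk, which is cut off in this excerpt. A self-contained sketch of the pattern — the array size, field names, and the non-atomic online_vcpus counter are simplifications, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>

#define KVM_MAX_VCPUS 4

struct kvm_vcpu {
	int vcpu_id;
};

struct kvm {
	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
	int online_vcpus;	/* atomic_t in the kernel */
};

static struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
{
	return (i < kvm->online_vcpus) ? kvm->vcpus[i] : NULL;
}

/*
 * Walk only the online vcpus and stop at the first empty slot, so
 * callers no longer open-code a loop to KVM_MAX_VCPUS with a NULL
 * check on every slot.
 */
#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	for ((idx) = 0, (vcpup) = kvm_get_vcpu(kvm, (idx)); \
	     (vcpup) != NULL; \
	     (vcpup) = kvm_get_vcpu(kvm, ++(idx)))

static int count_online(struct kvm *kvm)
{
	int i, n = 0;
	struct kvm_vcpu *vcpu;

	kvm_for_each_vcpu(i, vcpu, kvm)
		n++;
	return n;
}
```

This is exactly the shape of the conversions in the diff: the ia64, powerpc, and s390 loops all collapse from index-plus-NULL-check loops into a single iterator.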
[PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
sysenter/sysexit are not supported on AMD's 32bit compat mode, whereas syscall is not supported on Intel's 32bit compat mode. To allow cross vendor migration we emulate the missing instructions by setting up the processor state according to the other call. The sysenter code was originally sketched by Amit Shah, it was completed, debugged, syscall added and made-to-work by Christoph Egger and polished up by Andre Przywara. Please note that sysret does not need to be emulated, because it will be exectued in 64bit mode and returning to 32bit compat mode works on Intel. Signed-off-by: Amit Shah amit.s...@redhat.com Signed-off-by: Christoph Egger christoph.eg...@amd.com Signed-off-by: Andre Przywara andre.przyw...@amd.com --- arch/x86/kvm/x86.c | 37 - arch/x86/kvm/x86_emulate.c | 349 +++- 2 files changed, 380 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6d44dd5..dae7726 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2593,11 +2593,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu, /* Reject the instructions other than VMCALL/VMMCALL when * try to emulate invalid opcode */ c = vcpu-arch.emulate_ctxt.decode; - if ((emulation_type EMULTYPE_TRAP_UD) - (!(c-twobyte c-b == 0x01 - (c-modrm_reg == 0 || c-modrm_reg == 3) - c-modrm_mod == 3 c-modrm_rm == 1))) - return EMULATE_FAIL; + + if (emulation_type EMULTYPE_TRAP_UD) { + if (!c-twobyte) + return EMULATE_FAIL; + switch (c-b) { + case 0x01: /* VMMCALL */ + if (c-modrm_mod != 3) + return EMULATE_FAIL; + if (c-modrm_rm != 1) + return EMULATE_FAIL; + break; + case 0x34: /* sysenter */ + case 0x35: /* sysexit */ + if (c-modrm_mod != 0) + return EMULATE_FAIL; + if (c-modrm_rm != 0) + return EMULATE_FAIL; + break; + case 0x05: /* syscall */ + r = 0; + if (c-modrm_mod != 0) + return EMULATE_FAIL; + if (c-modrm_rm != 0) + return EMULATE_FAIL; + break; + default: + return EMULATE_FAIL; + } + + if (!(c-modrm_reg == 0 || c-modrm_reg == 3)) + return EMULATE_FAIL; + } 
++vcpu-stat.insn_emulation; if (r) { diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c index 22c765d..41b78fa 100644 --- a/arch/x86/kvm/x86_emulate.c +++ b/arch/x86/kvm/x86_emulate.c @@ -32,6 +32,8 @@ #include linux/module.h #include asm/kvm_x86_emulate.h +#include mmu.h + /* * Opcode effective-address decode tables. * Note that we only emulate instructions that have at least one memory @@ -217,7 +219,9 @@ static u32 twobyte_table[256] = { ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x30 - 0x3F */ - ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + ImplicitOps, 0, ImplicitOps, 0, + ImplicitOps, ImplicitOps, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, /* 0x40 - 0x47 */ DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, @@ -320,8 +324,11 @@ static u32 group2_table[] = { }; /* EFLAGS bit definitions. */ +#define EFLG_VM (117) +#define EFLG_RF (116) #define EFLG_OF (111) #define EFLG_DF (110) +#define EFLG_IF (19) #define EFLG_SF (17) #define EFLG_ZF (16) #define EFLG_AF (14) @@ -1985,10 +1992,114 @@ twobyte_insn: goto cannot_emulate; } break; + case 0x05: { /* syscall */ + unsigned long cr0 = ctxt-vcpu-arch.cr0; + struct kvm_segment cs, ss; + + memset(cs, 0, sizeof(struct kvm_segment)); + memset(ss, 0, sizeof(struct kvm_segment)); + + /* inject #UD if +* 1. we are in real mode +* 2. protected mode is not enabled +* 3. LOCK prefix is used +*/ + if ((ctxt-mode == X86EMUL_MODE_REAL) + || (!(cr0 X86_CR0_PE)) +
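Background for the CS/SS setup this emulation has to perform: on SYSCALL the processor loads the new CS selector from MSR_STAR bits 47:32 (with the RPL bits forced to 0) and derives SS as that selector plus 8. A simplified standalone model of just this selector arithmetic — not the kernel's emulation code:

```c
#include <assert.h>
#include <stdint.h>

struct syscall_sel {
	uint16_t cs;
	uint16_t ss;
};

/* Derive the CS/SS selectors that SYSCALL loads from MSR_STAR. */
static struct syscall_sel syscall_selectors(uint64_t msr_star)
{
	struct syscall_sel s;

	s.cs = (uint16_t)((msr_star >> 32) & 0xfffcu);	/* RPL bits cleared */
	s.ss = (uint16_t)(s.cs + 8);
	return s;
}
```

The emulator must reproduce this exactly, since a guest migrated from an AMD host will have set up STAR expecting the hardware behavior.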
Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration
Tried local migration of a RHEL 5.3 guest on x3850. After local migration, both source and destination are in active state and the guest leaves a soft lockup oops.

Thanks
Pradeep

----- Forwarded by Pradeep K Surisetty/India/IBM on 05/28/2009 04:01 PM -----
From: Pradeep K Surisetty/India/IBM
To: kvm@vger.kernel.org
Date: 05/28/2009 12:13 PM
Cc: Pavan Naregundi/India/i...@ibmin, Sachin P Sant/India/i...@ibmin
Subject: Fw: [KVM] soft lockup with RHEL 5.3 guest remote migration

Find the total dmesg for RHEL5.3 guest: http://pastebin.com/f7e22fd1a

Regards
Pradeep

----- Forwarded by Pradeep K Surisetty/India/IBM on 05/28/2009 12:13 PM -----
From: Pradeep K Surisetty/India/IBM
To: kvm@vger.kernel.org
Date: 05/27/2009 10:07 AM
Cc: Pavan Naregundi/India/i...@ibmin, Sachin P Sant/India/i...@ibmin
Subject: [KVM] soft lockup with RHEL 5.3 guest remote migration

Tried to migrate a RHEL 5.3 guest to a remote machine. It fails to migrate, with the soft lockup message below on the guest. I haven't faced this issue with qemu-kvm-10.1; remote migration fails with qemu-kvm-0.10.4.

=============================
BUG: soft lockup - CPU#0 stuck for 10s!
[init:1] Pid: 1, comm: init EIP: 0060:[c044d1e9] CPU: 0 EIP is at handle_IRQ_event+0x39/0x8c EFLAGS: 0246Not tainted (2.6.18-125.el5 #1) EAX: 000c EBX: c06e7480 ECX: c79a8da0 EDX: c0734fb4 ESI: c79a8da0 EDI: 000c EBP: DS: 007b ES: 007b CR0: 8005003b CR2: 08198f00 CR3: 079c7000 CR4: 06d0 [c044d2c0] __do_IRQ+0x84/0xd6 [c044d23c] __do_IRQ+0x0/0xd6 [c04074ce] do_IRQ+0x99/0xc3 [c0405946] common_interrupt+0x1a/0x20 [c0428b6f] __do_softirq+0x57/0x114 [c04073eb] do_softirq+0x52/0x9c [c04059d7] apic_timer_interrupt+0x1f/0x24 [c053a6ae] add_softcursor+0x13/0xa2 [c053ab36] set_cursor+0x3a/0x5c [c053ab9e] con_flush_chars+0x27/0x2f [c0533a31] write_chan+0x1c5/0x298 [c041e3d7] default_wake_function+0x0/0xc [c05315ea] tty_write+0x147/0x1d8 [c053386c] write_chan+0x0/0x298 [c0531ffd] redirected_tty_write+0x1c/0x6c [c0531fe1] redirected_tty_write+0x0/0x6c [c0472cff] vfs_write+0xa1/0x143 [c04732f1] sys_write+0x3c/0x63 [c0404f17] syscall_call+0x7/0xb === Source machine: Machine: x3850 Kernel: 2.6.30-rc6-git4 qemu-kvm-0.10.4 Destination machine: Machine: LS21 Kernel: 2.6.30-rc6-git3 qemu-kvm-0.10.4 Steps to Reproduce: 1. Install RHEL 5.3 guest on x3850 with above mentioned kernel on qemu version 2. NFS mount the dir containing the guest image on Destination(LS21) 3. Boot the rhel guest 4. Wait for guest migration on LS21 by following command qemu-system-x86_64 -boot c rhel5.3.raw -incoming tcp:0: 5. Start the migration on source(x3850) a. On guest press Alt+Ctl+2 to switch to qemu prompt b. Run migrate -d tcp:'Destination IP': 6. Above command hang the guest for around 3 or 4 min and give the call
Re: [KVM PATCH v2 2/3] kvm: cleanup io_device code
Chris Wright wrote: * Gregory Haskins (ghask...@novell.com) wrote: We modernize the io_device code so that we use container_of() instead of dev-private, and move the vtable to a separate ops structure (theoretically allows better caching for multiple instances of the same ops structure) Looks like a nice cleanup. Couple minor nits. +static struct kvm_io_device_ops pit_dev_ops = { +.read = pit_ioport_read, +.write= pit_ioport_write, +.in_range = pit_in_range, +}; + +static struct kvm_io_device_ops speaker_dev_ops = { +.read = speaker_ioport_read, +.write= speaker_ioport_write, +.in_range = speaker_in_range, +}; kvm_io_device_ops instances could be made const. Ack --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2227,7 +2227,7 @@ static struct kvm_io_device *vcpu_find_pervcpu_dev(struct kvm_vcpu *vcpu, if (vcpu-arch.apic) { dev = vcpu-arch.apic-dev; -if (dev-in_range(dev, addr, len, is_write)) +if (dev-ops-in_range(dev, addr, len, is_write)) return dev; --- a/virt/kvm/iodev.h +++ b/virt/kvm/iodev.h @@ -18,7 +18,9 @@ #include linux/kvm_types.h -struct kvm_io_device { +struct kvm_io_device; + +struct kvm_io_device_ops { void (*read)(struct kvm_io_device *this, gpa_t addr, int len, @@ -30,16 +32,25 @@ struct kvm_io_device { int (*in_range)(struct kvm_io_device *this, gpa_t addr, int len, int is_write); void (*destructor)(struct kvm_io_device *this); +}; + -void *private; +struct kvm_io_device { +struct kvm_io_device_ops *ops; }; Did you plan to extend kvm_io_device struct? +static inline void kvm_iodevice_init(struct kvm_io_device *dev, + struct kvm_io_device_ops *ops) +{ +dev-ops = ops; +} And similarly, did you have a plan to do more with kvm_iodevice_init()? Otherwise looking a bit like overkill to me. Yeah. As of right now my plan is to wait for Marcelo's lock cleanup to go in and integrate with that, and then convert the MMIO/PIO code to use RCU to acquire a reference to the io_device (so we run as fine-graned and lockless as possible). 
When that happens, you will see an atomic_t in the struct/init as well. Even if that doesn't make the cut after review, I am thinking that we may be making the structure more complex in the future (for instance, to use a rbtree/hlist instead of the array, or to do tricks with caching the MRU device, etc.) and this will simplify that effort by already having all users call the abstracted init. That said, we could just defer these hunks until needed. I just figured while Im in here but its nbd either way. --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2456,7 +2456,7 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, for (i = 0; i bus-dev_count; i++) { struct kvm_io_device *pos = bus-devs[i]; -if (pos-in_range(pos, addr, len, is_write)) +if (kvm_iodevice_inrange(pos, addr, len, is_write)) return pos; } You converted this to the helper but not vcpu_find_pervcpu_dev() (not convinced it actually helps readability, but consistency is good). Oops..oversight. Will fix. BTW, while there, s/kvm_iodevice_inrange/kvm_iodevice_in_range/ would be nice. Yeah, good idea. Will fix. Thanks Chris, -Greg signature.asc Description: OpenPGP digital signature
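The pattern being reviewed here — embed the generic device in its owner, recover the owner with container_of(), and share one ops table across instances — can be shown in a few lines. This is an illustrative model of iodev.h, not the kernel source, and the PIT-like device is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kvm_io_device;

struct kvm_io_device_ops {
	int (*in_range)(struct kvm_io_device *dev, long addr);
};

struct kvm_io_device {
	const struct kvm_io_device_ops *ops;	/* const, per the review */
};

static void kvm_iodevice_init(struct kvm_io_device *dev,
			      const struct kvm_io_device_ops *ops)
{
	dev->ops = ops;
}

/* A hypothetical PIT-like device embedding the generic io_device. */
struct pit {
	long base, len;
	struct kvm_io_device dev;
};

static int pit_in_range(struct kvm_io_device *dev, long addr)
{
	/* No dev->private: recover the owner from the embedded member. */
	struct pit *p = container_of(dev, struct pit, dev);

	return addr >= p->base && addr < p->base + p->len;
}

static const struct kvm_io_device_ops pit_ops = {
	.in_range = pit_in_range,
};
```

One const ops table is shared by every instance, which is the caching benefit mentioned in the changelog.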
Re: [KVM-AUTOTEST PATCH] Use new function VM.get_name() to get the VM's name, instead of VM.name
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote: kvm_vm.py: add function VM.get_name(). kvm_preprocessing.py: use VM.get_name() instead of directly accessing the .name attribute. Are there any advantages of creating this method over directly accessing the attribute? Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm_runtest_2/kvm_preprocessing.py |6 +++--- client/tests/kvm_runtest_2/kvm_vm.py|4 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/client/tests/kvm_runtest_2/kvm_preprocessing.py b/client/tests/kvm_runtest_2/kvm_preprocessing.py index c9eb35d..bcabf5a 100644 --- a/client/tests/kvm_runtest_2/kvm_preprocessing.py +++ b/client/tests/kvm_runtest_2/kvm_preprocessing.py @@ -178,7 +178,7 @@ def preprocess(test, params, env): if vm.is_dead(): continue if not vm.verify_process_identity(): -kvm_log.debug(VM '%s' seems to have been replaced by another process % vm.name) +kvm_log.debug(VM '%s' seems to have been replaced by another process % vm.get_name()) vm.pid = None # Destroy and remove VMs that are no longer needed in the environment @@ -187,8 +187,8 @@ def preprocess(test, params, env): vm = env[key] if not kvm_utils.is_vm(vm): continue -if not vm.name in requested_vms: -kvm_log.debug(VM '%s' found in environment but not required for test; removing it... % vm.name) +if not vm.get_name() in requested_vms: +kvm_log.debug(VM '%s' found in environment but not required for test; removing it... % vm.get_name()) vm.destroy() del env[key] diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py index fab839f..df99859 100644 --- a/client/tests/kvm_runtest_2/kvm_vm.py +++ b/client/tests/kvm_runtest_2/kvm_vm.py @@ -454,6 +454,10 @@ class VM: Return True iff the VM's PID does not exist. return not kvm_utils.pid_exists(self.pid) +def get_name(self): +Return the VM's name. +return self.name + def get_params(self): Return the VM's params dict. 
-- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
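On the question raised above: a plain attribute and an explicit get_name() are both workable; a third option that keeps the `vm.name` call-site syntax while still funneling access through a method is a read-only property. Illustrative sketch only, not part of the patch:

```python
class VM(object):
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        """Return the VM's name (read-only)."""
        return self._name
```

Callers keep writing `vm.name`, but accidental reassignment raises AttributeError, and validation or logging can be added later without touching any call site.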
Re: [PATCH 1/3] kvm-s390: infrastructure to kick vcpus out of guest state
Avi Kivity wrote: Christian Ehrhardt wrote: So you _need_ a mechanism to kick all vcpus out of guest mode? I have a mechanism to kick a vcpu, and I use it. Due to the fact that smp_call_* don't work as kick for us the kick is an arch specific function. I hop ethat clarified this part :-) You could still use make_all_vcpus_request(), just change smp_call_function_many() to your own kicker. Yes and I like this idea for further unification, but I don't want it mixed too much into the patches in discussion atm. Because on one hand I have some problems giving my arch specific kick a behaviour like return when the guest WAS kicked and on the other hand I would e.g. also need to streamline the check in make_all_vcpus_request which cpu is running etc because vcpu-cpu stays -1 all the time on s390 (never used). Therefore I would unify things step by step and this way allow single task to went off my task pile here :-) -- Grüsse / regards, Christian Ehrhardt IBM Linux Technology Center, Open Virtualization -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] kvm_runtest_2.py: use pickle instead of shelve when loading/saving env
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote: pickle allows more control over the load/save process. Specifically, it enables us to dump the contents of an object to disk without having to unpickle it. shelve, which uses pickle, seems to pickle and unpickle every time sync() is called. This is bad for classes that need to be unpickled only once per test (such a class will be introduced in a future patch). Looks good to me. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm_runtest_2/kvm_runtest_2.py | 29 -- 1 files changed, 22 insertions(+), 7 deletions(-) diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py b/client/tests/kvm_runtest_2/kvm_runtest_2.py index a69951b..5f7f6ad 100644 --- a/client/tests/kvm_runtest_2/kvm_runtest_2.py +++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py @@ -3,9 +3,9 @@ import sys import os import time -import shelve import random import resource +import cPickle from autotest_lib.client.bin import test from autotest_lib.client.common_lib import error @@ -18,6 +18,22 @@ class test_routine: self.routine = None +def dump_env(obj, filename): +file = open(filename, w) +cPickle.dump(obj, file) +file.close() + + +def load_env(filename, default=None): +try: +file = open(filename, r) +except: +return default +obj = cPickle.load(file) +file.close() +return obj + + class kvm_runtest_2(test.test): version = 1 @@ -65,7 +81,7 @@ class kvm_runtest_2(test.test): # Open the environment file env_filename = os.path.join(self.bindir, params.get(env, env)) -env = shelve.open(env_filename, writeback=True) +env = load_env(env_filename, {}) kvm_log.debug(Contents of environment: %s % str(env)) try: @@ -87,24 +103,23 @@ class kvm_runtest_2(test.test): # Preprocess kvm_preprocessing.preprocess(self, params, env) -env.sync() +dump_env(env, env_filename) # Run the test function routine_obj.routine(self, params, env) -env.sync() +dump_env(env, env_filename) except Exception, e: kvm_log.error(Test failed: %s % e) 
kvm_log.debug(Postprocessing on error...) kvm_preprocessing.postprocess_on_error(self, params, env) -env.sync() +dump_env(env, env_filename) raise finally: # Postprocess kvm_preprocessing.postprocess(self, params, env) kvm_log.debug(Contents of environment: %s % str(env)) -env.sync() -env.close() +dump_env(env, env_filename) def postprocess(self): pass -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] kvm_vm.py: add new VM parameter 'x11_display' that controls $DISPLAY
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote: If x11_display is specified, the DISPLAY environment variable is set to this value for the QEMU process. This may be useful for SDL rendering. Looks good to me! Also add some comments. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm_runtest_2/kvm_vm.py | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py index 9571a3b..7a4ce4a 100644 --- a/client/tests/kvm_runtest_2/kvm_vm.py +++ b/client/tests/kvm_runtest_2/kvm_vm.py @@ -173,6 +173,8 @@ class VM: (iso_dir is pre-pended to the ISO filename) extra_params -- a string to append to the qemu command ssh_port -- should be 22 for SSH, 23 for Telnet +x11_display -- if specified, the DISPLAY environment variable will be be set +to this value for the qemu process (useful for SDL rendering) images -- a list of image object names, separated by spaces nics -- a list of NIC object names, separated by spaces @@ -198,8 +200,16 @@ class VM: if iso_dir == None: iso_dir = self.iso_dir -qemu_cmd = qemu_path +# Start constructing the qemu command +qemu_cmd = +# Set the X11 display parameter if requested +if params.get(x11_display): +qemu_cmd += DISPLAY=%s % params.get(x11_display) +# Add the qemu binary +qemu_cmd += qemu_path +# Add the VM's name qemu_cmd += -name '%s' % name +# Add the monitor socket parameter qemu_cmd += -monitor unix:%s,server,nowait % self.monitor_file_name for image_name in kvm_utils.get_sub_dict_names(params, images): -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
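The command construction in the patch boils down to prefixing the shell command with a DISPLAY assignment. A condensed sketch of that logic — the function name and reduced parameter list are mine:

```python
def build_qemu_cmd(qemu_path, name, x11_display=None):
    """Build the start of the qemu command line, optionally forcing
    the X11 display for SDL output via an environment-variable prefix."""
    qemu_cmd = ""
    if x11_display:
        # Works because kvm_vm.py runs the command through a shell,
        # where VAR=value cmd sets VAR only for that command.
        qemu_cmd += "DISPLAY=%s " % x11_display
    qemu_cmd += qemu_path
    qemu_cmd += " -name '%s'" % name
    return qemu_cmd
```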
Re: [KVM-AUTOTEST PATCH] kvm_runtest_2.py: use environment filename specified by the 'env' parameter
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote: Do not use hardcoded environment filename 'env'. Instead use the value specified by the 'env' parameter. If unspecified, use 'env' as the filename. Looks good to me! This is important for parallel execution; it may be necessary to use a separate environment file for each process. Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm_runtest_2/kvm_runtest_2.py |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py b/client/tests/kvm_runtest_2/kvm_runtest_2.py index fda7282..a69951b 100644 --- a/client/tests/kvm_runtest_2/kvm_runtest_2.py +++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py @@ -64,7 +64,7 @@ class kvm_runtest_2(test.test): self.write_test_keyval({key: params[key]}) # Open the environment file -env_filename = os.path.join(self.bindir, env) +env_filename = os.path.join(self.bindir, params.get(env, env)) env = shelve.open(env_filename, writeback=True) kvm_log.debug(Contents of environment: %s % str(env)) -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: just a dump
On Wed, 2009-05-27 at 09:43 +0200, Hans de Bruin wrote: [09:09:47 INFO ] Test finished after 1 iterations. Memory test passed. [09:09:48 DEBUG] Running 'grep MemTotal /proc/meminfo' [09:09:48 DEBUG] Running 'rpm -qa' [09:09:49 INFO ]GOODdma_memtest dma_memtest timestamp=1243408189localtime=May 27 09:09:49 completed successfully [09:09:49 DEBUG] Persistent state variable __group_level now set to 1 [09:09:49 INFO ]END GOODdma_memtest dma_memtest timestamp=1243408189localtime=May 27 09:09:49 [09:09:49 DEBUG] Dropping caches [09:09:49 DEBUG] Running 'sync' [09:09:51 DEBUG] Running 'sync' [09:09:51 DEBUG] Running 'echo 3 /proc/sys/vm/drop_caches' [09:09:52 DEBUG] Persistent state variable __group_level now set to 0 [09:09:52 INFO ] END GOOD timestamp=1243408192 localtime=May 27 09:09:52 Well that looks good. The web page talks about forcing the system to swap. That never happend swap usage is still 0 bytes. I installed autotest on the same lv (3 disk stripe) as the vmdisks. Interesting, about that I made some tests and just realized that in some cases both mine and the original shell implementation are failing on forcing the system to go to swap (the tests I made when the test was firstly implemented did manage to make the machines swap, but in conditions where the system had a significantly higher initial memory usage). I will work on a better heuristic to force swap. Thanks for pointing this out, -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM-AUTOTEST PATCH] VM.create(): always destroy() the VM before attempting to start it
On Sun, 2009-05-24 at 18:46 +0300, Michael Goldish wrote: Also, don't do it in kvm_preprocessing.py since it's now done in kvm_vm.py. Looks good to me! Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm_runtest_2/kvm_preprocessing.py |1 - client/tests/kvm_runtest_2/kvm_vm.py|2 ++ 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm_runtest_2/kvm_preprocessing.py b/client/tests/kvm_runtest_2/kvm_preprocessing.py index bcabf5a..9ccaf78 100644 --- a/client/tests/kvm_runtest_2/kvm_preprocessing.py +++ b/client/tests/kvm_runtest_2/kvm_preprocessing.py @@ -84,7 +84,6 @@ def preprocess_vm(test, params, env, name): start_vm = True if start_vm: -vm.destroy() if not vm.create(name, params, qemu_path, image_dir, iso_dir, for_migration): message = Could not start VM kvm_log.error(message) diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py index df99859..a1462c6 100644 --- a/client/tests/kvm_runtest_2/kvm_vm.py +++ b/client/tests/kvm_runtest_2/kvm_vm.py @@ -238,6 +238,8 @@ class VM: stored in the class attributes is used, and if it is supplied, it is stored for later use. +self.destroy() + if name != None: self.name = name if params != None: -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange guest slowness after some time
Avi Kivity wrote: Tomasz Chmielewski wrote: Maybe virtio is racy and a loaded host exposes the race. I see it happening with virtio on 2.6.29.x guests as well. So, what would you do if you saw it on your systems as well? ;) Add some debug routines into virtio_* modules? I'm no virtio expert. Maybe I'd insert tracepoints to record interrupts and kicks. Accidentally, I made some interesting discovery. This ~2 MB video shows a kvm-86 guest being rebooted and GRUB started: http://syneticon.net/kvm/kvm-slowness.ogg GRUB has its timeout set to 50 seconds, and is supposed to show it on the screen by decreasing the number of seconds shown, every second. Here, GRUB decreases the second counter very fast by 2 seconds, then waits 2 seconds, then again decreases the number of sends by 2 seconds very fast, and so on. Perhaps my wording does not describe it very well though, so just try to download the video and open it i.e. in mplayer. Comments? -- Tomasz Chmielewski http://wpkg.org -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] kvm-s390: update vcpu->cpu
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com

kvm on s390 formerly ignored vcpu->cpu. This patch adds set/unset of vcpu->cpu in kvm_arch_vcpu_load/put to allow further architecture unification, e.g. so that generic code does not find -1 on currently scheduled vcpus.

Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com
---
[diffstat]
 kvm-s390.c | 2 ++
 1 file changed, 2 insertions(+)
[diff]
Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -243,6 +243,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	vcpu->cpu = cpu;
 	save_fp_regs(&vcpu->arch.host_fpregs);
 	save_access_regs(vcpu->arch.host_acrs);
 	vcpu->arch.guest_fpregs.fpc &= FPC_VALID_MASK;
@@ -252,6 +253,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	vcpu->cpu = -1;
 	save_fp_regs(&vcpu->arch.guest_fpregs);
 	save_access_regs(vcpu->arch.guest_acrs);
 	restore_fp_regs(&vcpu->arch.host_fpregs);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] kvm-s390: streamline memslot handling - v4
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com *updates in v4* - kickout only scheduled vcpus (its superfluous and wait might hang forever on not running vcpus) *updates in v3* - handling the mmu reload vcpu request can now be handled inside the sigp handling avoiding an addtional exit - kvm_arch_set_memory_region now waits for kicked vcpu's to consume the request bit it set to ensure that after the kvm_arch_set_memory_region call all vcpus use the updated memory information *updates in v2* - added optimization to skip (addtional) kickout of vcpu's that had the request already set. This patch relocates the variables kvm-s390 uses to track guest mem addr/size. As discussed dropping the variables at struct kvm_arch level allows to use the common vcpu-request based mechanism to reload guest memory if e.g. changes via set_memory_region. The kick mechanism introduced in this series is used to ensure running vcpus leave guest state to catch the update. Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com --- [diffstat] include/asm/kvm_host.h |4 --- kvm/gaccess.h | 23 ++- kvm/intercept.c|6 ++--- kvm/kvm-s390.c | 57 - kvm/kvm-s390.h | 33 +++- kvm/sigp.c |4 +-- 6 files changed, 74 insertions(+), 53 deletions(-) [diff] Index: kvm/arch/s390/kvm/gaccess.h === --- kvm.orig/arch/s390/kvm/gaccess.h +++ kvm/arch/s390/kvm/gaccess.h @@ -1,7 +1,7 @@ /* * gaccess.h - access guest memory * - * Copyright IBM Corp. 2008 + * Copyright IBM Corp. 
2008,2009 * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License (version 2 only) @@ -16,13 +16,14 @@ #include linux/compiler.h #include linux/kvm_host.h #include asm/uaccess.h +#include kvm-s390.h static inline void __user *__guestaddr_to_user(struct kvm_vcpu *vcpu, unsigned long guestaddr) { unsigned long prefix = vcpu-arch.sie_block-prefix; - unsigned long origin = vcpu-kvm-arch.guest_origin; - unsigned long memsize = vcpu-kvm-arch.guest_memsize; + unsigned long origin = vcpu-arch.sie_block-gmsor; + unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu); if (guestaddr 2 * PAGE_SIZE) guestaddr += prefix; @@ -158,8 +159,8 @@ static inline int copy_to_guest(struct k const void *from, unsigned long n) { unsigned long prefix = vcpu-arch.sie_block-prefix; - unsigned long origin = vcpu-kvm-arch.guest_origin; - unsigned long memsize = vcpu-kvm-arch.guest_memsize; + unsigned long origin = vcpu-arch.sie_block-gmsor; + unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu); if ((guestdest 2 * PAGE_SIZE) (guestdest + n 2 * PAGE_SIZE)) goto slowpath; @@ -209,8 +210,8 @@ static inline int copy_from_guest(struct unsigned long guestsrc, unsigned long n) { unsigned long prefix = vcpu-arch.sie_block-prefix; - unsigned long origin = vcpu-kvm-arch.guest_origin; - unsigned long memsize = vcpu-kvm-arch.guest_memsize; + unsigned long origin = vcpu-arch.sie_block-gmsor; + unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu); if ((guestsrc 2 * PAGE_SIZE) (guestsrc + n 2 * PAGE_SIZE)) goto slowpath; @@ -244,8 +245,8 @@ static inline int copy_to_guest_absolute unsigned long guestdest, const void *from, unsigned long n) { - unsigned long origin = vcpu-kvm-arch.guest_origin; - unsigned long memsize = vcpu-kvm-arch.guest_memsize; + unsigned long origin = vcpu-arch.sie_block-gmsor; + unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu); if (guestdest + n memsize) return -EFAULT; @@ -262,8 +263,8 @@ static inline 
int copy_from_guest_absolu unsigned long guestsrc, unsigned long n) { - unsigned long origin = vcpu-kvm-arch.guest_origin; - unsigned long memsize = vcpu-kvm-arch.guest_memsize; + unsigned long origin = vcpu-arch.sie_block-gmsor; + unsigned long memsize = kvm_s390_vcpu_get_memsize(vcpu); if (guestsrc + n memsize) return -EFAULT; Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -1,7 +1,7 @@ /* * intercept.c - in-kernel handling for sie intercepts * - * Copyright IBM Corp. 2008 + * Copyright IBM Corp. 2008,2009 * * This program is free software; you can
[PATCH 1/5] kvm-s390: infrastructure to kick vcpus out of guest state - v3
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com *updates in v3* - ensure allocations (might_sleep) are out of atomic context *updates in v2* instead of a kick to level behaviour the patch now implements a kick to the lowest level. The check there bails out to upper levels if not all outstanding vcpu-requests could be handled internally (it could still support an explicit kick to level if ever needed). To ensure vcpu's come out of guest context in certain cases this patch adds a s390 specific way to kick them out of guest context. Currently it kicks them out to rerun the vcpu_run path in the s390 code, but the mechanism itself is expandable and with a new flag we could also add e.g. kicks to userspace etc. Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com --- [diffstat] include/asm/kvm_host.h |5 ++-- kvm/intercept.c| 14 +-- kvm/kvm-s390.c |6 kvm/kvm-s390.h | 17 ++ kvm/sigp.c | 59 ++--- 5 files changed, 79 insertions(+), 22 deletions(-) [diff] Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -128,7 +128,7 @@ static int handle_noop(struct kvm_vcpu * static int handle_stop(struct kvm_vcpu *vcpu) { - int rc; + int rc = 0; vcpu-stat.exit_stop_request++; atomic_clear_mask(CPUSTAT_RUNNING, vcpu-arch.sie_block-cpuflags); @@ -141,12 +141,20 @@ static int handle_stop(struct kvm_vcpu * rc = -ENOTSUPP; } + if (vcpu-arch.local_int.action_bits ACTION_VCPUREQUEST_ON_STOP) { + vcpu-arch.local_int.action_bits = ~ACTION_VCPUREQUEST_ON_STOP; + if (kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_SIGP)) { + rc = SIE_INTERCEPT_CHECKREQUESTS; + vcpu-run-exit_reason = KVM_EXIT_INTR; + } + } + if (vcpu-arch.local_int.action_bits ACTION_STOP_ON_STOP) { vcpu-arch.local_int.action_bits = ~ACTION_STOP_ON_STOP; VCPU_EVENT(vcpu, 3, %s, cpu stopped); rc = -ENOTSUPP; - } else - rc = 0; + } + spin_unlock_bh(vcpu-arch.local_int.lock); return rc; } Index: kvm/arch/s390/kvm/kvm-s390.c === --- 
kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -487,6 +487,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v vcpu_load(vcpu); +rerun_vcpu: + kvm_s390_handle_vcpu_requests(vcpu, VCPUREQUESTLVL_VCPURUN); + /* verify, that memory has been registered */ if (!vcpu-kvm-arch.guest_memsize) { vcpu_put(vcpu); @@ -519,6 +522,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v rc = kvm_handle_sie_intercept(vcpu); } while (!signal_pending(current) !rc); + if (rc == SIE_INTERCEPT_CHECKREQUESTS) + goto rerun_vcpu; + if (signal_pending(current) !rc) rc = -EINTR; Index: kvm/arch/s390/kvm/kvm-s390.h === --- kvm.orig/arch/s390/kvm/kvm-s390.h +++ kvm/arch/s390/kvm/kvm-s390.h @@ -20,6 +20,8 @@ typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu); +/* negativ values are error codes, positive values for internal conditions */ +#define SIE_INTERCEPT_CHECKREQUESTS(10) int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu); #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\ @@ -50,6 +52,21 @@ int kvm_s390_inject_vm(struct kvm *kvm, int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu, struct kvm_s390_interrupt *s390int); int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code); +int kvm_s390_inject_sigp_stop(struct kvm_vcpu *vcpu, int action); + +/* interception levels from which handle vcpu requests can be called */ +#define VCPUREQUESTLVL_SIGP1 +#define VCPUREQUESTLVL_VCPURUN 2 +static inline unsigned long kvm_s390_handle_vcpu_requests(struct kvm_vcpu *vcpu, + int level) +{ + BUG_ON(!level); + + if (!vcpu-requests) + return 0; + + return vcpu-requests; +} /* implemented in priv.c */ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu); Index: kvm/arch/s390/include/asm/kvm_host.h === --- kvm.orig/arch/s390/include/asm/kvm_host.h +++ kvm/arch/s390/include/asm/kvm_host.h @@ -180,8 +180,9 @@ struct kvm_s390_interrupt_info { }; /* for local_interrupt.action_flags */ -#define ACTION_STORE_ON_STOP 1 -#define ACTION_STOP_ON_STOP 2 +#define ACTION_STORE_ON_STOP (10) +#define 
ACTION_STOP_ON_STOP(11) +#define ACTION_VCPUREQUEST_ON_STOP (12)
[PATCH 4/5] kvm: remove redundant declarations
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com While changing the s390 code in kvm_arch_vcpu_load/put I came across these header declarations. They are complete duplicates, not even useful forward declarations, as nothing using them is in between (according to git blame the s390 and ia64 contributions introducing the first arch independency overlapped and added it twice). This patch removes the two dispensable lines. Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com --- [diffstat] kvm_host.h |2 -- 1 file changed, 2 deletions(-) [diff] Index: kvm/include/linux/kvm_host.h === --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -241,8 +241,6 @@ long kvm_arch_dev_ioctl(struct file *fil unsigned int ioctl, unsigned long arg); long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); -void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); int kvm_dev_ioctl_check_extension(long ext);
[PATCH 2/5] kvm-s390: fix interruption caused by signal - v2
From: Christian Ehrhardt ehrha...@linux.vnet.ibm.com *updates in v2* merged a small piece of code from patch 1/1 that belongs here thematically. If signal_pending() is true we exit without updating kvm_run; userspace currently just does nothing and jumps to kvm_run again. Since we did not set an exit_reason we might end up with a random one (whatever was the last exit). Therefore it was possible to e.g. jump to the psw position the last real interruption set. Setting the INTR exit reason ensures that no old psw data is swapped in on reentry. Signed-off-by: Christian Ehrhardt ehrha...@linux.vnet.ibm.com --- [diffstat] kvm-s390.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) [diff] Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -509,6 +509,7 @@ rerun_vcpu: vcpu->arch.sie_block->gpsw.addr = kvm_run->s390_sieic.addr; break; case KVM_EXIT_UNKNOWN: + case KVM_EXIT_INTR: case KVM_EXIT_S390_RESET: break; default: @@ -525,8 +526,10 @@ rerun_vcpu: if (rc == SIE_INTERCEPT_CHECKREQUESTS) goto rerun_vcpu; - if (signal_pending(current) && !rc) + if (signal_pending(current) && !rc) { + kvm_run->exit_reason = KVM_EXIT_INTR; rc = -EINTR; + } if (rc == -ENOTSUPP) { /* intercept cannot be handled in-kernel, prepare kvm-run */
[PATCH 0/5] kvm-s390: revised version of kvm-s390 guest memory handling - v4
From: Christian Ehrhardt ehrha...@de.ibm.com *updates in v3* - ensure kick allocations (might_sleep) are out of atomic context - replaced kick-to-level behaviour by kick to low level and bail out - updates on running vcpus can now be handled without the need to rerun the vcpu - kvm_arch_set_memory_region waits until the update is consumed by the vcpu - kick out only scheduled vcpus (wait might hang forever on non-scheduled vcpus) - moved some code between patches 1 & 2 thematically - update vcpu->cpu in the kvm-s390 arch handler for load/put - remove a redundant declaration in kvm_host.h related to load/put Note: further unification of make_all_cpu_request and the kick mechanism is planned, but it might be good to split it from this step towards commonality. *updates in v2* added optimization to patch 3/3 to skip (additional) kickout of vcpus that had the request already set. This patch series results from our discussions about handling memslots and vcpu mmu reloads. It streamlines kvm-s390 a bit by using slots_lock, vcpu->requests (KVM_REQ_MMU_RELOAD) and a kick mechanism to ensure vcpus come out of guest context to catch the update. I tested the reworked code for a while with multiple smp guests and some extra code that periodically injects kicks and/or mmu reload requests, but I'd be happy about any additional review feedback. 
Patches included: Subject: [PATCH 1/5] kvm-s390: infrastructure to kick vcpus out of guest state - v3 Subject: [PATCH 2/5] kvm-s390: fix interruption caused by signal - v2 Subject: [PATCH 3/5] kvm-s390: update vcpu->cpu Subject: [PATCH 4/5] kvm: remove redundant declarations Subject: [PATCH 5/5] kvm-s390: streamline memslot handling - v4 Overall-Diffstat: arch/s390/include/asm/kvm_host.h |9 ++--- arch/s390/kvm/gaccess.h | 23 ++-- arch/s390/kvm/intercept.c| 20 +++ arch/s390/kvm/kvm-s390.c | 70 --- arch/s390/kvm/kvm-s390.h | 50 +++ arch/s390/kvm/sigp.c | 63 --- include/linux/kvm_host.h |2 - 7 files changed, 159 insertions(+), 78 deletions(-)
Re: Remove qemu_alloc_physram()
* Avi Kivity a...@redhat.com [2009-05-27 11:23]: Avi Kivity wrote: nicolas prochazka wrote: without -mem-prealloc HugePages_Total: 2560 HugePages_Free: 2296 HugePages_Rsvd: 0 so after a minimal test, I can say that your patch seems to correct this problem. It isn't correct, it doesn't generate the right alignment. Better patch attached. This patch restores -mempath working for me on the latest qemu-kvm.git. btw, why'd we go from -mem-path (kvm-84/stable) to -mempath (kvm-85 and newer)? Changing these breaks existing scripts. Not a big deal, but wondering what was the motivation for the change. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
[patch] VMX Unrestricted mode support
Avi, A new VMX feature Unrestricted Guest feature is added in the VMX specification. You can look at the latest Intel processor manual for details of the feature here: http://www.intel.com/products/processor/manuals It allows kvm guests to run real mode and unpaged mode code natively in the VMX mode when EPT is turned on. With the unrestricted guest there is no need to emulate the guest real mode code in the vm86 container or in the emulator. Also the guest big real mode code works like native. The attached patch enhances KVM to use the unrestricted guest feature if available on the processor. It also adds a new kernel/module parameter to disable the unrestricted guest feature at the boot time. Signed-Off-By: Nitin A Kamble nitin.a.kam...@intel.com Thanks Regards, Nitin diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d2b082d..7832599 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -40,9 +40,13 @@ #define KVM_GUEST_CR0_MASK\ (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \ | X86_CR0_NW | X86_CR0_CD) +#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST \ + (X86_CR0_WP | X86_CR0_NE | X86_CR0_TS | X86_CR0_MP) +#define KVM_VM_CR0_ALWAYS_ON_RESTRICTED_GUEST \ + (KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE) #define KVM_VM_CR0_ALWAYS_ON \ - (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \ -| X86_CR0_MP) + (enable_unrestricted_guest ? 
KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST \ + : KVM_VM_CR0_ALWAYS_ON_RESTRICTED_GUEST) #define KVM_GUEST_CR4_MASK \ (X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE) #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 498f944..c73da02 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -55,6 +55,7 @@ #define SECONDARY_EXEC_ENABLE_EPT 0x0002 #define SECONDARY_EXEC_ENABLE_VPID 0x0020 #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 +#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 #define PIN_BASED_EXT_INTR_MASK 0x0001 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 25f1239..703d2c4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -50,6 +50,10 @@ module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO); static int __read_mostly enable_ept = 1; module_param_named(ept, enable_ept, bool, S_IRUGO); +static int __read_mostly enable_unrestricted_guest = 1; +module_param_named(unrestricted_guest, + enable_unrestricted_guest, bool, S_IRUGO); + static int __read_mostly emulate_invalid_guest_state = 0; module_param(emulate_invalid_guest_state, bool, S_IRUGO); @@ -268,6 +272,12 @@ static inline int cpu_has_vmx_ept(void) SECONDARY_EXEC_ENABLE_EPT; } +static inline int cpu_has_vmx_unrestricted_guest(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_UNRESTRICTED_GUEST; +} + static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm) { return flexpriority_enabled @@ -731,7 +741,7 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu) static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) { - if (vcpu-arch.rmode.active) + if (vcpu-arch.rmode.active !enable_unrestricted_guest) rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM; vmcs_writel(GUEST_RFLAGS, rflags); } @@ -1195,7 +1205,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) opt2 = 
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | SECONDARY_EXEC_WBINVD_EXITING | SECONDARY_EXEC_ENABLE_VPID | - SECONDARY_EXEC_ENABLE_EPT; + SECONDARY_EXEC_ENABLE_EPT | + SECONDARY_EXEC_UNRESTRICTED_GUEST; if (adjust_vmx_controls(min2, opt2, MSR_IA32_VMX_PROCBASED_CTLS2, _cpu_based_2nd_exec_control) 0) @@ -1325,8 +1336,13 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_vpid()) enable_vpid = 0; - if (!cpu_has_vmx_ept()) + if (!cpu_has_vmx_ept()) { enable_ept = 0; + enable_unrestricted_guest = 0; + } + + if (!cpu_has_vmx_unrestricted_guest()) + enable_unrestricted_guest = 0; if (!cpu_has_vmx_flexpriority()) flexpriority_enabled = 0; @@ -1363,9 +1379,17 @@ static void enter_pmode(struct kvm_vcpu *vcpu) unsigned long flags;
Re: [patch] VMX Unrestricted mode support
VMX Unrestricted mode support -- looks like a very interesting (and useful!) feature. Which CPUs support it? Core i7 900-series (Nehalem)? -- -Alexey Eromenko Technologov
[PATCH -tip v8 6/7] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_calls to ftrace on the fly. Each operator function of a call takes a ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- include/linux/ftrace_event.h | 13 ++ include/trace/ftrace.h | 22 + kernel/trace/trace_events.c | 54 +- kernel/trace/trace_export.c | 27 ++--- 4 files changed, 69 insertions(+), 47 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index bbf40f6..e25f3a4 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void (*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void (*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_head fields; int filter_active; void *filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) < 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index b4ec83a..de3ee7c 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -229,7 +229,8 
@@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)\ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -269,10 +270,9 @@ ftrace_format_##call(struct trace_seq *s) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -298,7 +298,7 @@ ftrace_define_fields_##call(void) \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *dummy) * { * int ret; * @@ -309,7 +309,7 @@ ftrace_define_fields_##call(void) \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *dummy) * { * unregister_trace_call(ftrace_event_call); * } @@ -342,7 +342,7 @@ ftrace_define_fields_##call(void) \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static int
[PATCH -tip v8 3/7] kprobes: checks probe address is at an instruction boundary on x86
Ensure safeness of inserting kprobes by checking whether the specified address is at the first byte of a instruction on x86. This is done by decoding probed function from its head to the probe point. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 69 + 1 files changed, 69 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 7b5169d..41d524f 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -48,12 +48,14 @@ #include linux/preempt.h #include linux/module.h #include linux/kdebug.h +#include linux/kallsyms.h #include asm/cacheflush.h #include asm/desc.h #include asm/pgtable.h #include asm/uaccess.h #include asm/alternative.h +#include asm/insn.h void jprobe_return_end(void); @@ -244,6 +246,71 @@ retry: } } +/* Recover the probed instruction at addr for further analysis. */ +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr) +{ + struct kprobe *kp; + kp = get_kprobe((void *)addr); + if (!kp) + return -EINVAL; + + /* +* Basically, kp-ainsn.insn has an original instruction. +* However, RIP-relative instruction can not do single-stepping +* at different place, fix_riprel() tweaks the displacement of +* that instruction. In that case, we can't recover the instruction +* from the kp-ainsn.insn. +* +* On the other hand, kp-opcode has a copy of the first byte of +* the probed instruction, which is overwritten by int3. And +* the instruction at kp-addr is not modified by kprobes except +* for the first byte, we can recover the original instruction +* from it and kp-opcode. 
+*/ + memcpy(buf, kp-addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + buf[0] = kp-opcode; + return 0; +} + +/* Dummy buffers for kallsyms_lookup */ +static char __dummy_buf[KSYM_NAME_LEN]; + +/* Check if paddr is at an instruction boundary */ +static int __kprobes can_probe(unsigned long paddr) +{ + int ret; + unsigned long addr, offset = 0; + struct insn insn; + kprobe_opcode_t buf[MAX_INSN_SIZE]; + + if (!kallsyms_lookup(paddr, NULL, offset, NULL, __dummy_buf)) + return 0; + + /* Decode instructions */ + addr = paddr - offset; + while (addr paddr) { + kernel_insn_init(insn, (void *)addr); + insn_get_opcode(insn); + + /* Check if the instruction has been modified. */ + if (OPCODE1(insn) == BREAKPOINT_INSTRUCTION) { + ret = recover_probed_instruction(buf, addr); + if (ret) + /* +* Another debugging subsystem might insert +* this breakpoint. In that case, we can't +* recover it. +*/ + return 0; + kernel_insn_init(insn, buf); + } + insn_get_length(insn); + addr += insn.length; + } + + return (addr == paddr); +} + /* * Returns non-zero if opcode modifies the interrupt flag. */ @@ -359,6 +426,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p) int __kprobes arch_prepare_kprobe(struct kprobe *p) { + if (!can_probe((unsigned long)p-addr)) + return -EILSEQ; /* insn: must be on special executable page on x86. */ p-ainsn.insn = get_insn_slot(); if (!p-ainsn.insn) -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip v8 5/7] x86: add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from pt_regs. - query_register_offset(const char *name) Query the offset of name register. - query_register_name(unsigned offset) Query the name of register by its offset. - get_register(struct pt_regs *regs, unsigned offset) Get the value of a register by its offset. - within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - get_kernel_stack_nth(struct pt_regs *reg, unsigned nth) Get Nth entry of the kernel stack. (N = 0) - get_argument_nth(struct pt_regs *reg, unsigned nth) Get Nth argument at function call. (N = 0) Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com --- arch/x86/include/asm/ptrace.h | 67 + arch/x86/kernel/ptrace.c | 60 + 2 files changed, 127 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..577d625 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include asm/segment.h +#include asm/page_types.h #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,72 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs-sp; } +/* Query offset/name of register from its name/offset */ +extern int query_register_offset(const char *name); +extern const char *query_register_name(unsigned offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/* Get register value from its offset */ +static inline unsigned long get_register(struct pt_regs *regs, unsigned offset) +{ + if (unlikely(offset MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/* Check the address in the stack */ +static inline int within_kernel_stack(struct pt_regs *regs, unsigned long addr) +{ + 
return ((addr ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) ~(THREAD_SIZE - 1))); +} + +/* Get Nth entry of the stack */ +static inline unsigned long get_kernel_stack_nth(struct pt_regs *regs, +unsigned n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/* Get Nth argument at function call */ +static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n) +{ +#ifdef CONFIG_X86_32 +#define NR_REGPARMS 3 + if (n NR_REGPARMS) { + switch (n) { + case 0: return regs-ax; + case 1: return regs-dx; + case 2: return regs-cx; + } + return 0; +#else /* CONFIG_X86_64 */ +#define NR_REGPARMS 6 + if (n NR_REGPARMS) { + switch (n) { + case 0: return regs-di; + case 1: return regs-si; + case 2: return regs-dx; + case 3: return regs-cx; + case 4: return regs-r8; + case 5: return regs-r9; + } + return 0; +#endif + } else { + /* +* The typical case: arg n is on the stack. +* (Note: stack[0] = return address, so skip it) +*/ + return get_kernel_stack_nth(regs, 1 + n - NR_REGPARMS); + } +} + /* * These are defined as per linux/ptrace.h, which see. 
*/ diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index 09ecbde..00eb9d7 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -48,6 +48,66 @@ enum x86_regset { REGSET_IOPERM32, }; +struct pt_regs_offset { + const char *name; + int offset; +}; + +#define REG_OFFSET(r) offsetof(struct pt_regs, r) +#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)} +#define REG_OFFSET_END {.name = NULL, .offset = 0} + +static const struct pt_regs_offset regoffset_table[] = { +#ifdef CONFIG_X86_64 + REG_OFFSET_NAME(r15), + REG_OFFSET_NAME(r14), + REG_OFFSET_NAME(r13), + REG_OFFSET_NAME(r12), + REG_OFFSET_NAME(r11), + REG_OFFSET_NAME(r10), + REG_OFFSET_NAME(r9), + REG_OFFSET_NAME(r8), +#endif + REG_OFFSET_NAME(bx), + REG_OFFSET_NAME(cx), + REG_OFFSET_NAME(dx), + REG_OFFSET_NAME(si), + REG_OFFSET_NAME(di), + REG_OFFSET_NAME(bp), + REG_OFFSET_NAME(ax), +#ifdef CONFIG_X86_32 + REG_OFFSET_NAME(ds), + REG_OFFSET_NAME(es), + REG_OFFSET_NAME(fs), + REG_OFFSET_NAME(gs), +#endif + REG_OFFSET_NAME(orig_ax), +
[PATCH -tip v8 2/7] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of x86 instruction decoder and performs it after building vmlinux. The test compares the results of objdump and x86 instruction decoder code and check there are no differences. Changes from v7: - Add data, addr, rep, lock prefixes to skip instructions list. - Add license comments. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Sam Ravnborg s...@ravnborg.org --- arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 + arch/x86/include/asm/inat.h |2 + arch/x86/include/asm/insn.h |2 + arch/x86/lib/inat.c |2 + arch/x86/lib/insn.c |2 + arch/x86/scripts/Makefile | 19 +++ arch/x86/scripts/distill.awk| 42 + arch/x86/scripts/test_get_len.c | 99 +++ arch/x86/scripts/user_include.h | 49 +++ 10 files changed, 229 insertions(+), 0 deletions(-) create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 9a88937..430aab4 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -179,6 +179,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool x86 instruction decoder selftest + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N. 
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1b68659..7046556 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/scripts posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 01e079a..9090665 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -20,7 +20,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* Instruction attributes */ typedef u32 insn_attr_t; diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 5b50fa3..5736404 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -20,7 +20,9 @@ * Copyright (C) IBM Corporation, 2009 */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* insn_attr_t is defined in inat.h */ #include asm/inat.h diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c index d6a34be..564ecbd 100644 --- a/arch/x86/lib/inat.c +++ b/arch/x86/lib/inat.c @@ -18,7 +18,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* */ +#ifdef __KERNEL__ #include <linux/module.h> +#endif #include <asm/insn.h> /* Attribute tables are generated from opcode map */ diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c index 254c848..3b9451a 100644 --- a/arch/x86/lib/insn.c +++ b/arch/x86/lib/insn.c @@ -18,8 +18,10 @@ * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ +#ifdef __KERNEL__ #include <linux/string.h> #include <linux/module.h> +#endif #include <asm/inat.h> #include <asm/insn.h> diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile new file mode 100644 index 000..f08859e --- /dev/null +++ b/arch/x86/scripts/Makefile @@ -0,0 +1,19 @@ +PHONY += posttest +quiet_cmd_posttest = TEST $@ + cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c +
quiet_cmd_test_get_len = CC $@ + cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC)
[PATCH -tip v8 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 41d524f..ebac470 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -108,50 +108,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -344,68 +300,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn; -
[PATCH -tip v8 1/7] x86: instruction decoder API
Add an x86 instruction decoder to the arch-specific libraries. This decoder can decode x86 instructions used in the kernel into prefix, opcode, modrm, sib, displacement and immediates. It can also report the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk). Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.2: Appendix A, and consist of the two types of opcode tables below. 1-byte/2-byte/3-byte opcode tables, which have 256 elements each, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcode tables, which have 8 elements each, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps do NOT include most of the SSE and FP opcodes, because those opcodes are not used in the kernel. Changes from v6.1: - fix patch title. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. 
Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h| 125 ++ arch/x86/include/asm/insn.h| 134 ++ arch/x86/lib/Makefile | 13 + arch/x86/lib/inat.c| 80 arch/x86/lib/insn.c| 471 + arch/x86/lib/x86-opcode-map.txt| 711 arch/x86/scripts/gen-insn-attr-x86.awk | 314 ++ 7 files changed, 1848 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..01e079a --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,125 @@ +#ifndef _ASM_INAT_INAT_H +#define _ASM_INAT_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + */ +#include <linux/types.h> + +/* Instruction attributes */ +typedef u32 insn_attr_t; + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should add checking macros and use that macro in + * your code. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS 5 /* 0x2E */ +#define INAT_PFX_DS 6 /* 0x3E */ +#define INAT_PFX_ES 7 /* 0x26 */ +#define INAT_PFX_FS 8 /* 0x64 */ +#define INAT_PFX_GS 9 /* 0x65 */ +#define INAT_PFX_SS 10 /* 0x36 */ +#define INAT_PFX_ADDRSZ 11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy prefix */ +#define INAT_PFX_OFFS 0 +#define INAT_PFX_BITS 4 +#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1) +#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS) +/* Escape opcodes */ +#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS) +#define INAT_ESC_BITS 2 +#define INAT_ESC_MAX ((1
[PATCH -tip v8 0/7] tracing: kprobe-based event tracer and x86 instruction decoder
Hi, Here are the patches of the kprobe-based event tracer for x86, version 8, which allows you to probe various kernel events through the ftrace interface. In this version, I added per-probe filtering support, which allows you to set filters on each probe and shows the format of each probe. I think this is a more generic integration with ftrace, especially the event tracer. This patchset also includes an x86(-64) instruction decoder which supports non-SSE/FP opcodes and includes an x86 opcode map. The decoder is used for finding the instruction boundaries when inserting new kprobes. I think it will be possible to share this opcode map with KVM's decoder. The decoder is tested when building the kernel; the test compares the results of objdump and the decoder right after building vmlinux. You can enable that test with CONFIG_X86_DECODER_SELFTEST=y. This series can be applied on the latest linux-2.6-tip tree. This supports only x86(-32/-64) (but porting it to other archs just needs kprobes/kretprobes and register and stack access APIs). This patchset includes the following changes: - Add x86 instruction decoder [1/7] (FIXED) - Add x86 instruction decoder selftest [2/7] (FIXED) - Check insertion point safety in kprobe [3/7] - Cleanup fix_riprel() with insn decoder [4/7] - Add arch-dep register and stack fetching functions [5/7] - Add dynamic event_call support to ftrace [6/7] (NEW) - Add kprobe-based event tracer [7/7] (UPDATED) Enhancement ideas will be added after merging: - .init function tracing support. - Support primitive types(long, ulong, int, uint, etc) for args. Kprobe-based Event Tracer = Overview This tracer is similar to the events tracer which is based on the Tracepoint infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe and kretprobe). It probes anywhere kprobes can probe (this means all function bodies except for __kprobes functions). Unlike the function tracer, this tracer can probe instructions inside of kernel functions. 
It allows you to check which instruction has been executed. Unlike the Tracepoint based events tracer, this tracer can add new probe points on the fly. Similar to the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /debug/tracing/kprobe_events. Synopsis of kprobe_events - p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : set a probe r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe EVENT : Event name SYMBOL[+offs|-offs]: Symbol+offset where the probe is inserted MEMADDR: Address where the probe is inserted FETCHARGS : Arguments %REG : Fetch register REG sN: Fetch Nth entry of stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN: Fetch function argument. (N >= 0)(*) rv: Fetch return value.(**) ra: Fetch return address.(**) +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***) (*) aN may not be correct on asmlinkage functions or in the middle of a function body. (**) only for return probe. (***) this is useful for fetching a field of data structures. Per-Probe Event Filtering - The per-probe event filtering feature allows you to set a different filter on each probe and lets you choose which arguments will be shown in the trace buffer. If an event name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds an event under tracing/events/kprobes/EVENT; in that directory you can see 'id', 'enabled', 'format' and 'filter'. enabled: You can enable/disable the probe by writing 1 or 0 on it. format: It shows the format of this probe event. It also shows aliases of arguments which you specified to kprobe_events. filter: You can write filtering rules for this event, using both alias names and field names to describe filters. Usage examples -- To add a probe as a new event, write a new definition to kprobe_events as below. 
echo p:myprobe do_sys_open a0 a1 a2 a3 > /debug/tracing/kprobe_events This sets a kprobe on the top of do_sys_open() function with recording 1st to 4th arguments as myprobe event. echo r:myretprobe do_sys_open rv ra > /debug/tracing/kprobe_events This sets a kretprobe on the return point of do_sys_open() function with recording return value and return address as myretprobe event. You can see the format of these events via tracing/events/kprobes/EVENT/format. cat /debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4;
[PATCH -tip v8 7/7] tracing: add kprobe-based event tracer
Add a kprobes-based event tracer on ftrace. This tracer is similar to the events tracer which is based on the Tracepoint infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe and kretprobe). It probes anywhere kprobes can probe (this means all function bodies except for __kprobes functions). Similar to the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /debug/tracing/kprobe_events. And you can set filters on each probe event via /debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe. %REG : Fetch register REG sN: Fetch Nth entry of stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN: Fetch function argument. (N >= 0) rv: Fetch return value. ra: Fetch return address. +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobes.txt for details. Changes from v7: - Fix document example. - Remove solved TODO. - Support per-probe event filtering. 
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobes.txt | 138 kernel/trace/Kconfig | 12 kernel/trace/Makefile|1 kernel/trace/trace.h | 22 + kernel/trace/trace_event_types.h | 20 + kernel/trace/trace_kprobe.c | 1174 ++ 6 files changed, 1367 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobes.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt new file mode 100644 index 000..f6b4587 --- /dev/null +++ b/Documentation/trace/kprobes.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer which is based on Tracepoint +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe +and kretprobe). It probes anywhere where kprobes can probe(this means, all +functions body except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer, instead of that, just set probe points via +/debug/tracing/kprobe_events. And you can set filters on each probe events +via /debug/tracing/events/kprobes/EVENT/filter. 
+ + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe + + EVENT : Event name + SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted + MEMADDR : Address where the probe is inserted + + FETCHARGS : Arguments + %REG : Fetch register REG + sN : Fetch Nth entry of stack (N >= 0) + @ADDR: Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N >= 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not be correct on asmlinkage functions or in the middle of a + function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + The per-probe event filtering feature allows you to set a different filter on each +probe and lets you choose which arguments will be shown in the trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT; in that directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 on it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified to kprobe_events. + +filter: + You can write filtering rules for this event, using both alias + names and field names to describe filters. + + +Usage examples +-- +To add a probe as a new event, write a new definition to kprobe_events +as below. + + echo
Re: [patch] VMX Unrestricted guest mode support
The unrestricted guest feature is introduced in the Westmere processor, and all subsequent processors will support it. Westmere is the successor to Nehalem and will be the first processor built on the 32nm process. Thanks, Nitin On Thu, 2009-05-28 at 16:39 -0700, Alexey Eremenko wrote: VMX unrestricted guest mode support looks like a very interesting (and useful!) feature. Which CPUs support it? Core i7 900-series (Nehalem)? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3
+if (!(id = kzalloc(ATA_ID_WORDS, GFP_KERNEL))) +rv = -ENOMEM; Doesn't ATA_ID_WORDS seem like a strange name for a number of bytes? Yes I caught that bug in the rework as well. What's this *for* BTW? Sorry -- I assumed you were on either list. Please see patch to follow. -john -- john.coo...@redhat.com
Re: [Qemu-devel] [PATCH 2/2] Add serial number support for virtio_blk, V3
Christoph Hellwig wrote: On Wed, May 27, 2009 at 09:49:19AM +0200, Christoph Hellwig wrote: /* * IDE-compatible identify ioctl. * * Currently only returns the serial number and leaves all other fields * zero. */ Btw, thinking about it, the rest of the information in the ioctl should probably be filled up with faked data, similar to how we do it for the ide emulation inside qemu. Agreed. I've done so to the extent it makes sense in the case of the more generic fields. A fair amount of the identify information being generated by hw/ide.c appears obsolete. -john -- john.coo...@redhat.com
[PATCH 0/2] Add serial number support for virtio_blk, V4
[Rework of earlier patch to provide additional information in the response to an ATA identify request -- virtio_blk treats the data as opaque, content created by qemu's virtio-blk. Comments from Christoph also incorporated.] This patch allows passing of a virtio_blk drive serial number from qemu into a guest's virtio_blk driver, and provides a means to access the serial number from a guest's userspace. Equivalent functionality currently exists for IDE and SCSI, however it is not yet implemented for virtio. Scenarios exist where guest code relies on a unique drive serial number to correctly identify the machine environment in which it exists. The following two patches implement the above: qemu-vblk-serial-4.patch, which provides the missing qemu bits to interpret a '-drive .. serial=XYZ ..' flag, and virtio_blk-serial-4.patch, which extracts this information and makes it available to guest userspace via an HDIO_GET_IDENTITY ioctl, eg: 'hdparm -i /dev/vda'. The above patches are relative to qemu-kvm.git and 2.6.29.3 respectively. -john -- john.coo...@redhat.com
[PATCH 1/2] Add serial number support for virtio_blk, V4
qemu-vblk-serial-4.patch diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index 8dd3c7a..0b7ebe9 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -25,6 +25,7 @@ typedef struct VirtIOBlock BlockDriverState *bs; VirtQueue *vq; void *rq; +char serial_str[BLOCK_SERIAL_STRLEN + 1]; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -285,6 +286,50 @@ static void virtio_blk_reset(VirtIODevice *vdev) qemu_aio_flush(); } +/* store identify data in little endian format + */ +static inline void put_le16(uint16_t *p, unsigned int v) +{ +*p = cpu_to_le16(v); +} + +/* copy to *dst from *src, nul pad dst tail as needed to len bytes + */ +static inline void padstr(char *dst, const char *src, int len) +{ +while (len--) +*dst++ = *src ? *src++ : '\0'; +} + +/* setup simulated identify data as appropriate for virtio block device + * + * ref: AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS) + */ +static inline void virtio_identify_template(struct virtio_blk_config *bc) +{ +uint16_t *p = &bc->identify[0]; +uint64_t lba_sectors = bc->capacity; + +memset(p, 0, sizeof(bc->identify)); +put_le16(p + 0, 0x0); /* ATA device */ +padstr((char *)(p + 23), QEMU_VERSION, 8); /* firmware revision */ +padstr((char *)(p + 27), "QEMU VIRT_BLK", 40); /* model# */ +put_le16(p + 47, 0x80ff); /* max xfer 255 sectors */ +put_le16(p + 49, 0x0b00); /* support IORDY/LBA/DMA */ +put_le16(p + 59, 0x1ff); /* cur xfer 255 sectors */ +put_le16(p + 80, 0x1f0); /* support ATA8/7/6/5/4 */ +put_le16(p + 81, 0x16); +put_le16(p + 82, 0x400); +put_le16(p + 83, 0x400); +put_le16(p + 100, lba_sectors); +put_le16(p + 101, lba_sectors >> 16); +put_le16(p + 102, lba_sectors >> 32); +put_le16(p + 103, lba_sectors >> 48); +} + +/* coalesce internal state, copy to pci i/o region 0 + */ + static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) { VirtIOBlock *s = to_virtio_blk(vdev); @@ -299,11 +344,15 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) stw_raw(&blkcfg.cylinders, cylinders); blkcfg.heads = heads; blkcfg.sectors = secs; +virtio_identify_template(&blkcfg); +memcpy(&blkcfg.identify[VIRTIO_BLK_ID_SN], s->serial_str, +VIRTIO_BLK_ID_SN_BYTES); memcpy(config, &blkcfg, sizeof(blkcfg)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) { +VirtIOBlock *s = to_virtio_blk(vdev); uint32_t features = 0; features |= (1 << VIRTIO_BLK_F_SEG_MAX); @@ -311,6 +360,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev) #ifdef __linux__ features |= (1 << VIRTIO_BLK_F_SCSI); #endif +if (strcmp(s->serial_str, "0")) +features |= 1 << VIRTIO_BLK_F_IDENTIFY; return features; } @@ -354,6 +405,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev) int cylinders, heads, secs; static int virtio_blk_id; BlockDriverState *bs; +char *ps; s = (VirtIOBlock *)virtio_common_init("virtio-blk", VIRTIO_ID_BLOCK, sizeof(struct virtio_blk_config), @@ -365,6 +417,10 @@ VirtIODevice *virtio_blk_init(DeviceState *dev) s->vdev.reset = virtio_blk_reset; s->bs = bs; s->rq = NULL; +if (strlen(ps = (char *)drive_get_serial(bs))) +strncpy(s->serial_str, ps, sizeof(s->serial_str)); +else +snprintf(s->serial_str, sizeof(s->serial_str), "0"); bs->private = dev; bdrv_guess_geometry(s->bs, &cylinders, &heads, &secs); bdrv_set_geometry_hint(s->bs, cylinders, heads, secs); diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h index dff3e0c..1be4342 100644 --- a/hw/virtio-blk.h +++ b/hw/virtio-blk.h @@ -30,6 +30,11 @@ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ #define VIRTIO_BLK_F_SCSI 7 /* Supports scsi command passthru */ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ + +#define VIRTIO_BLK_ID_LEN 256 /* length of identify u16 array */ +#define VIRTIO_BLK_ID_SN 10 /* start of char * serial# */ +#define VIRTIO_BLK_ID_SN_BYTES 20 /* length in bytes of serial# */ struct virtio_blk_config { @@ -39,6 +44,8 @@ struct virtio_blk_config uint16_t cylinders; uint8_t heads; uint8_t sectors; +uint32_t _blk_size; /* structure pad, currently unused */ +uint16_t identify[VIRTIO_BLK_ID_LEN]; } __attribute__((packed)); /* These two define direction. */ diff --git a/sysemu.h b/sysemu.h index 47d001e..d3df19f 100644 --- a/sysemu.h +++ b/sysemu.h @@ -152,6 +152,8 @@ typedef enum { BLOCK_ERR_STOP_ANY } BlockInterfaceErrorAction; +#define BLOCK_SERIAL_STRLEN 20 + typedef struct DriveInfo { BlockDriverState *bdrv;
[PATCH 2/2] Add serial number support for virtio_blk, V4
virtio_blk-serial-4.patch drivers/block/virtio_blk.c | 41 ++--- include/linux/virtio_blk.h | 7 +++ 2 files changed, 45 insertions(+), 3 deletions(-) = --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -146,12 +146,46 @@ static void do_virtblk_request(struct re vblk->vq->vq_ops->kick(vblk->vq); } +/* return ATA identify data + */ +static int virtblk_identify(struct gendisk *disk, void *argp) +{ + struct virtio_blk *vblk = disk->private_data; + u16 *id; + int err = -ENOMEM; + + id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL); + if (!id) + goto out; + + err = virtio_config_buf(vblk->vdev, VIRTIO_BLK_F_IDENTIFY, + offsetof(struct virtio_blk_config, identify), id, + VIRTIO_BLK_ID_BYTES); + + if (err) + goto out_kfree; + + if (copy_to_user(argp, id, VIRTIO_BLK_ID_BYTES)) + err = -EFAULT; + +out_kfree: + kfree(id); +out: + return err; +} + static int virtblk_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long data) { - return scsi_cmd_ioctl(bdev->bd_disk->queue, - bdev->bd_disk, mode, cmd, - (void __user *)data); + struct gendisk *disk = bdev->bd_disk; + void __user *argp = (void __user *)data; + + switch (cmd) { + case HDIO_GET_IDENTITY: + return virtblk_identify(disk, argp); + default: + return scsi_cmd_ioctl(disk->queue, disk, mode, cmd, argp); + } } /* We provide getgeo only to please some old bootloader/partitioning tools */ @@ -356,6 +390,7 @@ static struct virtio_device_id id_table[ static unsigned int features[] = { VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, + VIRTIO_BLK_F_IDENTIFY }; static struct virtio_driver virtio_blk = { = --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -15,7 +15,13 @@ #define VIRTIO_BLK_F_GEOMETRY 4 /* Legacy geometry available */ #define VIRTIO_BLK_F_RO 5 /* Disk is read-only */ #define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/ +#define VIRTIO_BLK_F_IDENTIFY 8 /* ATA IDENTIFY supported */ 
+#define VIRTIO_BLK_ID_LEN 256 +#define VIRTIO_BLK_ID_BYTES (VIRTIO_BLK_ID_LEN * sizeof(u16)) + +/* mapped into pci i/o region 0 + */ struct virtio_blk_config { /* The capacity (in 512-byte sectors). */ @@ -32,6 +38,7 @@ struct virtio_blk_config } geometry; /* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */ __u32 blk_size; + __u16 identify[VIRTIO_BLK_ID_LEN]; } __attribute__((packed)); /* These two define direction. */ -- john.coo...@redhat.com