Re: [RFC 0/4] KVM in-kernel PM Timer implementation
On 12/14/2010 06:04 PM, Anthony Liguori wrote: On 12/14/2010 09:38 AM, Avi Kivity wrote: Fortunately, we have a very good bytecode interpreter that's accelerated in the kernel called KVM ;-) We have exactly the same bytecode interpreter under a different name, it's called userspace. If you can afford to make the transition back to the guest for emulation, you might as well transition to userspace. If you re-entered the guest and setup a stack that had the RIP of the source of the exit, then there's no additional need to exit the guest. The handler can just do an iret. Or am I missing something? I didn't even consider an iret-to-guest, to be honest. Let's consider the options: - iret-to-guest (a la tpr patching) - need to have an executable page in the guest virtual address space and some stack space (on 64-bit, can rely on iretq switching the stack). That is probably impossible to do in a generic way without guest cooperation. If we rely on guest cooperation, we might as well have the guest patch the IN instruction itself (no exits at all). - architectural SMM - no need to find a virtual mapping, or even a physical page, since we're in our own physical address space. However, the RSM instruction will trap, and on Intel, at least the first few instructions need to be emulated since SMM starts in big real mode. Also needs a tlb flush. - kvm-specific SMM (probably what you referred to as paravirt SMM, but if the guest OS is not involved, it's not really paravirt) - can switch to our own cr3 so no problem with finding a virtual mapping; however still needs a tlb flush, and on pre-NPT/EPT machines, switching cr3 back will involve an exit. We already have a virtual address space that works for most guests thanks to the TPR optimization. It only works for Windows XP and Windows XP with the /3GB extension. Is this a fundamental limitation or just a statement of today's heuristics? Does any guest not keep the BIOS in virtual memory in a static location? If you're looking for a fundamental limitation, then yes, a guest need not map the BIOS at all. Practically, I believe all common guest do map the BIOS, but IIRC modern guests use non-executable mappings. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
Am 15.12.2010 09:05, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 22:46, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com chip_bus_lock(desc); retval = __setup_irq(irq, desc, action); chip_bus_sync_unlock(desc); - if (retval) + if (retval) { + if (desc-action !desc-action-next) + desc-irq_data.drv_status = ~IRQS_SHARED; This is redundant. IRQS_SHARED gets set in a code path where all checks are done already. Nope, it's also set before entry of __setup_irq in case we call an IRQF_ADAPTIVE handler. We need to set it that early as we may race with IRQ events for the already registered handler happening between the sharing notification and the actual registration of the second handler. Hmm, ok. Though the MAKE_SHAREABLE flag should be sufficient to do the notification. For notification, yes. But we need SHARED once we reenable the line after the notification. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) + +Example: + +- { execute: inject_nmi, arguments: { cpu_index: 0 } } +- { return: {} } + +EQMP + { .name = migrate, .args_type = detach:-d,blk:-b,inc:-i,uri:s, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 15.12.2010 09:05, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 22:46, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com chip_bus_lock(desc); retval = __setup_irq(irq, desc, action); chip_bus_sync_unlock(desc); -if (retval) +if (retval) { +if (desc-action !desc-action-next) +desc-irq_data.drv_status = ~IRQS_SHARED; This is redundant. IRQS_SHARED gets set in a code path where all checks are done already. Nope, it's also set before entry of __setup_irq in case we call an IRQF_ADAPTIVE handler. We need to set it that early as we may race with IRQ events for the already registered handler happening between the sharing notification and the actual registration of the second handler. Hmm, ok. Though the MAKE_SHAREABLE flag should be sufficient to do the notification. For notification, yes. But we need SHARED once we reenable the line after the notification. Darn. Will think more about it. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] pci-assign: Convert need_emulate_cmd into a bitmask
On Mon, Dec 13, 2010 at 04:56:29PM -0700, Alex Williamson wrote: On Tue, 2010-12-14 at 00:25 +0100, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com Define a mask of PCI command register bits that need to be emulated, i.e. read back from their shadow state. We will need this for selectively emulating the INTx mask bit. Note: No initialization of emulate_cmd_mask to zero needed, the device state is already zero-initialized. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/device-assignment.c | 18 ++ hw/device-assignment.h |2 +- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index ef045f4..26d3bd7 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -525,14 +525,17 @@ again: DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, (d-devfn 3) 0x1F, (d-devfn 0x7), address, val, len); -if (pci_dev-need_emulate_cmd +if (pci_dev-emulate_cmd_mask ranges_overlap(address, len, PCI_COMMAND, 2)) { if (address == PCI_COMMAND) { -val = 0x; -val |= pci_default_read_config(d, PCI_COMMAND, 2); +val = ~pci_dev-emulate_cmd_mask; +val |= pci_default_read_config(d, PCI_COMMAND, 2) +pci_dev-emulate_cmd_mask; } else { /* high-byte access */ -val = pci_default_read_config(d, PCI_COMMAND+1, 1); +val = ~(pci_dev-emulate_cmd_mask 8); +val |= pci_default_read_config(d, PCI_COMMAND+1, 1) +(pci_dev-emulate_cmd_mask 8); } } We should definitely be using merge_bits here, this is the sort of thing I had in mind for it: val = merge_bits(val, pci_default_read_config(d, address, len), PCI_COMMAND, pci_dev-emulate_cmd_mask); Not a comment on this patch at all, rather on assigned-devices support code generally: I think code will become cleaner if we add a mask array mapping the config space, along the lines of wmask and friends in pci.c I think this will make it possible to mostly get rid of merge_bits and tricky range check hacks. @@ -800,10 +803,9 @@ again: /* dealing with virtual function device */ snprintf(name, sizeof(name), %sphysfn/, dir); -if (!stat(name, statbuf)) - pci_dev-need_emulate_cmd = 1; -else - pci_dev-need_emulate_cmd = 0; +if (!stat(name, statbuf)) { +pci_dev-emulate_cmd_mask = 0x; +} dev-region_number = r; return 0; diff --git a/hw/device-assignment.h b/hw/device-assignment.h index c94a730..9ead022 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -109,7 +109,7 @@ typedef struct AssignedDevice { void *msix_table_page; target_phys_addr_t msix_table_addr; int mmio_index; -int need_emulate_cmd; +uint32_t emulate_cmd_mask; char *configfd_name; QLIST_ENTRY(AssignedDevice) next; } AssignedDevice; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AW: Freezing Windows 2008 x64bit guest
On Wed, 2010-12-15 at 00:57 +0100, Manfred Heubach wrote: Vadim Rozenfeld vrozenfe at redhat.com writes: On Mon, 2010-12-13 at 22:12 +0200, Dor Laor wrote: On 12/13/2010 09:42 PM, Manfred Heubach wrote: I was running the host with Ubuntu 10.04 but upgraded to 10.10 - mainly because of performance problems which were solved by the upgrade. After the upgrade the system became extremly unstable. It was crashing as soon as disk io and network io load was growing. 100% reproduceable with windows server backup to an iscsi volume. i had virtio drivers for storage and network installed (redhat/fedora 1.1.11). Which fedora/rhel release is that? The host is Ubuntu 10.10 x64 The drivers are from http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/ 1.1.11-0 released on 17-Aug-2010 - are there any newer drivers? Yes, they are the most recent drivers at fedoraproject. Our rhel6 drivers are almost the same, except for a few non-critical, WHQL related bug fixes. However, there is one fix related to Large Send Offload (LSO) problem which was fixed in rhel6, but I don't see the relevant changes in fedora kvmnet driver sources http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/src/ What's the windows virtio driver version? The virtio storage version shown in Windows is 6.0.0.10 Have you tried using virt-manager/virhs instead of raw cmdline? I'm starting it with libvirt/virsh cmd-line copied from the log (and some log entries): LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 8192 -smp 4,sockets=4,cores=1,threads=1 -name sbs2008 -uuid 933c2ef2-e5b0-0b39-db60-016b5d226534 -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/sbs2008.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=localtime -boot c -drive file=/var/lib/libvirt/images/olscanner/virtio-win-1.1.11-0.iso,if=none,media=cdrom, id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/lib/libvirt/images/sbs2008/sbs2008.img,if=none,id=drive-virtio-disk0, boot=on,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/dev/volg1/sbsdata,if=none,id=drive-virtio-disk1,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/volg1/wsus,if=none,id=drive-virtio-disk2,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2 -device e1000,vlan=0,id=net0,mac=52:54:00:8a:bc:c9,bus=pci.0,addr=0x6 -net tap,fd=107,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k de -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 07:12:02.715: debug : qemudInitCpuAffinity:2423 : Setting CPU affinity 07:12:02.717: debug : qemuSecurityDACSetProcessLabel:547 : Dropping privileges of VM to 105:114 char device redirected to /dev/pts/0 pci_add_option_rom: failed to find romfile pxe-e1000.bin About e1000, some windows comes with buggy driver and an update e1000 from Intel fixes some issues. I'm running latest drivers from Intel: 8.3.15.0 At each BSOD I had the following line in the log of the guest: virtio_ioport_write: unexpected address 0x13 value 0x1 Seems to be the result of calling PapaNdis_OnBugCheck function I changed the network interface back to e1000. What I experience now (and I had that a the very beginning before i switched to virtio network) are freezes. The guest doesn't respond anymore (doesn't answer to pings and doesn't interact via mouse/keyboard anymore). Host CPU usage of the kvm process is 100% on as many cores as there are virtual cpus (in this case 4). I had a crash today but no logentry on the host - but that could be because I had to restart syslog (ran out of diskspace after turning on debug logging ob libvirtd - didn't think that it would generate 6 GB of logs per day :-) Sounds like an interrupt storm to me. Can you try to ping your VM? No responds to ping. Anyway the best way to start debugging a stalled system is just to crash it with BSOD. For doing it you will need: - enable NMICrashDump (please see http://support.microsoft.com/kb/927069 for more information - enable Kernel Memory Dump (actually Complete is much better, but it can be too big) http://support.microsoft.com/kb/969028 - you only will need to type nmi 0 in the qemu monitor to crash the system, when the system hangs next time. I prepared this. When the system crashed today I didn't have the complete memory dump ready - so I only have a minidump. The intersting point is that the system today crashed
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? Regardless, the differing command name is worth mentioning in the commit message. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/4] KVM in-kernel PM Timer implementation
- Anthony Liguori anth...@codemonkey.ws wrote: On 12/14/2010 06:09 AM, Ulrich Obergfell wrote: [...] Parts 1 thru 4 of this RFC contain experimental source code which I recently used to investigate the performance benefit. In a Linux guest, I was running a program that calls gettimeofday() 'n' times in a loop (the PM Timer register is read during each call). With in-kernel PM Timer, I observed a significant reduction of program execution time. I've played with this in the past. Can you post real numbers, preferably, with a real work load? Anthony, I only experimented with a gettimeofday() loop. With this test scenario I observed that in-kernel PM Timer reduced the program execution time to roughly half of the execution time that it takes with userspace PM Timer. Please find some example results below (these results were obtained while the host was not busy). The relative difference of in-kernel PM Timer versus userspace PM Timer is high, whereas the absolute difference per call appears to be low. So, the benefit much depends on how frequently gettimeofday() is called in a real work load. I don't have any numbers from a real work load. When I began working on this, I was motivated by the fact that the Linux kernel itself provides an optimization for the gettimeofday() call ('vxtime'). So, from this I presumed that there would be real work loads which would benefit from the optimization of the gettimeofday() call (otherwise, why would we have 'vxtime' ?). Of course, 'vxtime' is not related to PM based time keeping. However, the experimental code shows an approach to optimize gettimeofday() in KVM virtual machines. Regards, Uli - host: # grep model name /proc/cpuinfo | sort | uniq -c 8 model name : Intel(R) Core(TM) i7 CPU Q 740 @ 1.73GHz # uname -r 2.6.37-rc4 - guest: # grep model name /proc/cpuinfo | sort | uniq -c 4 model name : QEMU Virtual CPU version 0.13.50 - test program ('gtod.c'): #include sys/time.h #include stdlib.h struct timeval tv; main(int argc, char *argv[]) { int i = atoi(argv[1]); while (i-- 0) gettimeofday(tv, NULL); } - example results with in-kernel PM Timer: # for i in 1 2 3 do time ./gtod 2500 done real0m44.302s user0m1.090s sys 0m43.163s real0m44.509s user0m1.100s sys 0m43.393s real0m45.290s user0m1.160s sys 0m44.123s # for i in 1000 5000 1 do time ./gtod $i done real0m17.981s user0m0.810s sys 0m17.157s real1m27.253s user0m1.930s sys 1m25.307s real2m51.801s user0m3.359s sys 2m48.384s - example results with userspace PM Timer: # for i in 1 2 3 do time ./gtod 2500 done real1m24.185s user0m2.000s sys 1m22.168s real1m23.508s user0m1.750s sys 1m21.738s real1m24.437s user0m1.900s sys 1m22.517s # for i in 1000 5000 1 do time ./gtod $i done real0m33.479s user0m0.680s sys 0m32.785s real2m50.831s user0m3.389s sys 2m47.405s real5m42.304s user0m7.319s sys 5m34.919s -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 21:54, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* Make sure it's not being used on another CPU: */ synchronize_irq(irq); + if (single_handler) + desc-irq_data.drv_status = ~IRQS_SHARED; + What's the reason to clear this flag outside of the desc-lock held region. We need to synchronize the irq first before clearing the flag. The problematic scenario behind this: An IRQ started in shared mode, this the line was unmasked after the hardirq. Now we clear IRQS_SHARED before calling into the threaded handler. And that handler may now think that the line is still masked as IRQS_SHARED is set. That should read not set I guess. Hmm, needs more thoughts :( I need this status for other purposes as well, where I definitely need serialization. Well, two options: wrap all bit manipulations with desc-lock acquisition/release or turn drv_status into an atomic. I don't know what your plans with drv_status are, so... Some bits for irq migration and other stuff, which allows us to avoid fiddling with irqdesc in the drivers. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/28] nVMX: Add VMX and SVM to list of supported cpuid features
On Thu, Dec 09, 2010, Joerg Roedel wrote about Re: [PATCH 02/28] nVMX: Add VMX and SVM to list of supported cpuid features: This patch should be the last one in your series because VMX should be fully supported before it is reported to userspace. Joerg Thanks, good idea - especially for bisection (where we don't want a guest to see half the nested VMX feature). I also removed the silly reference to SVM in the title of this patch ;-) Nadav. -- Nadav Har'El| Wednesday, Dec 15 2010, 8 Tevet 5771 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |Bore, n.: A person who talks when you http://nadav.harel.org.il |wish him to listen. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
Am 15.12.2010 14:04, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 21:54, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* Make sure it's not being used on another CPU: */ synchronize_irq(irq); + if (single_handler) + desc-irq_data.drv_status = ~IRQS_SHARED; + What's the reason to clear this flag outside of the desc-lock held region. We need to synchronize the irq first before clearing the flag. The problematic scenario behind this: An IRQ started in shared mode, this the line was unmasked after the hardirq. Now we clear IRQS_SHARED before calling into the threaded handler. And that handler may now think that the line is still masked as IRQS_SHARED is set. That should read not set I guess. Can't remember who wrote this, but that guy might have been too tired for clear sentences: Yes, of course, we could run into troubles, if IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard and threaded handler. Hmm, needs more thoughts :( Be warned, might be painful. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 15.12.2010 14:04, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 21:54, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* Make sure it's not being used on another CPU: */ synchronize_irq(irq); +if (single_handler) +desc-irq_data.drv_status = ~IRQS_SHARED; + What's the reason to clear this flag outside of the desc-lock held region. We need to synchronize the irq first before clearing the flag. The problematic scenario behind this: An IRQ started in shared mode, this the line was unmasked after the hardirq. Now we clear IRQS_SHARED before calling into the threaded handler. And that handler may now think that the line is still masked as IRQS_SHARED is set. That should read not set I guess. Can't remember who wrote this, but that guy might have been too tired for clear sentences: Yes, of course, we could run into troubles, if IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard and threaded handler. Right. As a side note, the current implementation requires that you lookup irq_data.drv_status for every invocation of the handler or have a reference to irq_data.drv_status somewhere locally, which I don't like either. I have an evil and nasy idea how to avoid that, need to look how ugly it gets. Worst case we need to go back to that notification thing which I wanted really avoid in the first place. Though I like the register_mutex idea which came out of this a lot as it allows us to reduce desc-lock held and interrupt disabled regions quite nicely. /me goes back to stare at the code Hmm, needs more thoughts :( Be warned, might be painful. Bah, my brain became pain resistant when I started hacking that code. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 15.12.2010 14:04, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 21:54, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* Make sure it's not being used on another CPU: */ synchronize_irq(irq); +if (single_handler) +desc-irq_data.drv_status = ~IRQS_SHARED; + What's the reason to clear this flag outside of the desc-lock held region. We need to synchronize the irq first before clearing the flag. The problematic scenario behind this: An IRQ started in shared mode, this the line was unmasked after the hardirq. Now we clear IRQS_SHARED before calling into the threaded handler. And that handler may now think that the line is still masked as IRQS_SHARED is set. That should read not set I guess. Can't remember who wrote this, but that guy might have been too tired for clear sentences: Yes, of course, we could run into troubles, if IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard and threaded handler. Hmm, needs more thoughts :( Be warned, might be painful. Talking about headache. Your solution above does not prevent that scenario. CPU 0 CPU 1 synchronize_irq(); hard irq comes in sees shared and unmasks clear IRQS_SHARED thread handler runs and sees !SHARED Same scenario, just moved by a few lines :) Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
Am 15.12.2010 16:41, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 15.12.2010 14:04, Thomas Gleixner wrote: On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 14.12.2010 21:54, Thomas Gleixner wrote: On Mon, 13 Dec 2010, Jan Kiszka wrote: @@ -943,6 +950,9 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* Make sure it's not being used on another CPU: */ synchronize_irq(irq); +if (single_handler) +desc-irq_data.drv_status = ~IRQS_SHARED; + What's the reason to clear this flag outside of the desc-lock held region. We need to synchronize the irq first before clearing the flag. The problematic scenario behind this: An IRQ started in shared mode, this the line was unmasked after the hardirq. Now we clear IRQS_SHARED before calling into the threaded handler. And that handler may now think that the line is still masked as IRQS_SHARED is set. That should read not set I guess. Can't remember who wrote this, but that guy might have been too tired for clear sentences: Yes, of course, we could run into troubles, if IRQS_SHARED was _not_ set while the IRQ line is unmasked between hard and threaded handler. Hmm, needs more thoughts :( Be warned, might be painful. Talking about headache. Your solution above does not prevent that scenario. CPU 0 CPU 1 synchronize_irq(); hard irq comes in sees shared and unmasks Nope, IRQ_ONESHOT is already cleared at that point. clear IRQS_SHARED thread handler runs and sees !SHARED Same scenario, just moved by a few lines :) The same, just the other way around - and mostly harmless, I hope. :) Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics
We currently enable KVM by default, and when it's not available, we print a message and fall back to TCG. Option -enable-kvm is ignored. Option -no-kvm suppresses KVM. Upstream works differently: KVM is off by default, -enable-kvm switches it on. -enable-kvm terminates the process unsuccessfully if KVM is not available. upstream qemu | default |-enable-kvm +---+--- KVM available | disabled | enabled KVM unavailable | disabled |fail qemu-kvm| default |-enable-kvm +---+--- KVM available | enabled* | enabled KVM unavailable | disabled | disabled* * differs from upstream Users of qemu and qemu-kvm need to be aware of these differences to enable / disable use of KVM reliably. This is bothersome. Consider -enable-kvm when KVM is unavailable: If the user expects qemu-kvm behavior (fall back), but qemu fails, he'll likely be surprised and unhappy. If the user expects upstream behavior (fail), but qemu-kvm falls back to TCG, the guest runs slow as molasses, and the user will likely be confused and unhappy (unless he spots and understands the disable KVM message). Switch to upstream semantics: KVM off by default, -enable-kvm switches it on, and when it can't, it's fatal. Having to enable KVM explicitly is annoying, but the proper place to address that is upstream. Signed-off-by: Markus Armbruster arm...@redhat.com --- vl.c | 10 +- 1 files changed, 1 insertions(+), 9 deletions(-) diff --git a/vl.c b/vl.c index e3c8919..87e88c2 100644 --- a/vl.c +++ b/vl.c @@ -247,7 +247,7 @@ static void *boot_set_opaque; static NotifierList exit_notifiers = NOTIFIER_LIST_INITIALIZER(exit_notifiers); -int kvm_allowed = 1; +int kvm_allowed = 0; uint32_t xen_domid; enum xen_mode xen_mode = XEN_EMULATE; @@ -2436,10 +2436,8 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_smbios: do_smbios_option(optarg); break; -#ifdef OBSOLETE_KVM_IMPL case QEMU_OPTION_enable_kvm: kvm_allowed = 1; -#endif break; case QEMU_OPTION_no_kvm: kvm_allowed = 0; @@ -2789,18 +2787,12 @@ int main(int argc, char **argv, char **envp) if (kvm_allowed) { int ret = kvm_init(smp_cpus); if (ret 0) { -#if defined(OBSOLETE_KVM_IMPL) || defined(CONFIG_NO_CPU_EMULATION) if (!kvm_available()) { printf(KVM not supported for this target\n); } else { fprintf(stderr, failed to initialize KVM: %s\n, strerror(-ret)); } exit(1); -#endif -#ifdef CONFIG_KVM -fprintf(stderr, Could not initialize KVM, will disable KVM support\n); -kvm_allowed = 0; -#endif } } -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] genirq: Inform handler about line sharing state
On Wed, 15 Dec 2010, Jan Kiszka wrote: Am 15.12.2010 16:41, Thomas Gleixner wrote: Talking about headache. Your solution above does not prevent that scenario. CPU 0 CPU 1 synchronize_irq(); hard irq comes in sees shared and unmasks Nope, IRQ_ONESHOT is already cleared at that point. Errm ? It's set. Could you please stop to increase my confusion ? :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 00/14] API test framework
This patchset adds an API test framework. Rather than driving kvm from qemu, we now have a way of calling the kvm API directly and observing the results. We can switch to guest mode and back at will and see any micro effects such as the result of executing particular instructions. A first test is added, for commit edde99ce0 (KVM: Write protect memory after slot swap). This would be pretty hard to test for using the current qemu-driven tests, but detects 5 million failures in about two minutes on my machine. As an experiment, the framework is coded in C++. Avi Kivity (14): Makefile: add support for C++ Improve autodepend includes v2: new Add exception class for kernel errors (errno) v2: new Add try_main() for running a program under an exception handler v2: new Introduce a C++ wrapper for the kvm APIs v2: throw std::exception exceptions, not ints, for libc failures better msr list management Add support for calling a function in guest mode v2: use tr1::function instead of boost::function Add sample test using the api test harness api: add support for KVM_SET_USER_MEMORY_REGION flags field v2: new api: support KVM_GET_DIRTY_LOG ioctl v2: new api: add memory map management v2: new Build tests with debug information v2: new api: Add support for creating an identity map with a hole v2: new Introduce libapi.a to avoid long Makefile recipes v2: new Add dirty log test v2: new Makefile |8 +- api/api-sample.cc | 30 api/dirty-log.cc | 78 api/exception.cc | 33 api/exception.hh | 19 + api/identity.cc | 95 api/identity.hh | 43 +++ api/kvmxx.cc | 194 + api/kvmxx.hh | 85 + api/memmap.cc | 76 +++ api/memmap.hh | 43 +++ config-x86-common.mak | 15 - 12 files changed, 715 insertions(+), 4 deletions(-) create mode 100644 api/api-sample.cc create mode 100644 api/dirty-log.cc create mode 100644 api/exception.cc create mode 100644 api/exception.hh create mode 100644 api/identity.cc create mode 100644 api/identity.hh create mode 100644 api/kvmxx.cc create mode 100644 api/kvmxx.hh create mode 100644 api/memmap.cc create mode 100644 api/memmap.hh -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 01/14] Makefile: add support for C++
Signed-off-by: Avi Kivity a...@redhat.com --- Makefile |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index d25e6f2..9b0256d 100644 --- a/Makefile +++ b/Makefile @@ -30,11 +30,13 @@ CFLAGS += -O1 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall CFLAGS += $(call cc-option, -fno-stack-protector, ) CFLAGS += $(call cc-option, -fno-stack-protector-all, ) +CFLAGS += -I. -CXXFLAGS = $(autodepend-flags) +CXXFLAGS += $(CFLAGS) autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d +LDFLAGS += $(CFLAGS) LDFLAGS += -pthread -lrt kvmtrace_objs= kvmtrace.o -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 03/14] Add exception class for kernel errors (errno)
Signed-off-by: Avi Kivity a...@redhat.com --- api/exception.cc | 20 api/exception.hh | 16 2 files changed, 36 insertions(+), 0 deletions(-) create mode 100644 api/exception.cc create mode 100644 api/exception.hh diff --git a/api/exception.cc b/api/exception.cc new file mode 100644 index 000..500569a --- /dev/null +++ b/api/exception.cc @@ -0,0 +1,20 @@ +#include exception.hh +#include cstdio +#include cstring + +errno_exception::errno_exception(int errno) +: _errno(errno) +{ +} + +int errno_exception::errno() const +{ +return _errno; +} + +const char *errno_exception::what() +{ +std::snprintf(_buf, sizeof _buf, error: %s (%d), + std::strerror(_errno), _errno); +return _buf; +} diff --git a/api/exception.hh b/api/exception.hh new file mode 100644 index 000..4672760 --- /dev/null +++ b/api/exception.hh @@ -0,0 +1,16 @@ +#ifndef EXCEPTION_HH +#define EXCEPTION_HH + +#include exception + +class errno_exception : public std::exception { +public: +explicit errno_exception(int err_no); +int errno() const; +virtual const char *what(); +private: +int _errno; +char _buf[1000]; +}; + +#endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 06/14] Add support for calling a function in guest mode
This patch provides a way to establish an identity guest which has a 1:1 gva-hva translation. This allows the host to switch to guest mode, call a function in the same address space, and return. Because long mode virtual addresses are 47 bits long, and some hosts have smaller physical addresses, we target 32-bit mode only. On x86_64 the code needs to be run with 'setarch i386 -3' to limit the address space to 3GB, so the address space occupied by the local APIC is left unused. Signed-off-by: Avi Kivity a...@redhat.com --- api/identity.cc | 76 + api/identity.hh | 28 ++ config-x86-common.mak |1 + 3 files changed, 105 insertions(+), 0 deletions(-) create mode 100644 api/identity.cc create mode 100644 api/identity.hh diff --git a/api/identity.cc b/api/identity.cc new file mode 100644 index 000..de52f68 --- /dev/null +++ b/api/identity.cc @@ -0,0 +1,76 @@ + +#include identity.hh +#include stdio.h + +namespace identity { + +typedef unsigned long ulong; + +void setup_vm(kvm::vm vm) +{ +vm.set_memory_region(0, NULL, 0, 3UL 30); +vm.set_tss_addr(3UL 30); +} + +void vcpu::setup_sregs() +{ +kvm_sregs sregs = { }; +kvm_segment dseg = { }; +dseg.base = 0; dseg.limit = -1U; dseg.type = 3; dseg.present = 1; +dseg.dpl = 3; dseg.db = 1; dseg.s = 1; dseg.l = 0; dseg.g = 1; +kvm_segment cseg = dseg; +cseg.type = 11; + +sregs.cs = cseg; asm (mov %%cs, %0 : =rm(sregs.cs.selector)); +sregs.ds = dseg; asm (mov %%ds, %0 : =rm(sregs.ds.selector)); +sregs.es = dseg; asm (mov %%es, %0 : =rm(sregs.es.selector)); +sregs.fs = dseg; asm (mov %%fs, %0 : =rm(sregs.fs.selector)); +sregs.gs = dseg; asm (mov %%gs, %0 : =rm(sregs.gs.selector)); +sregs.ss = dseg; asm (mov %%ss, %0 : =rm(sregs.ss.selector)); + +uint32_t gsbase; +asm (mov %%gs:0, %0 : =r(gsbase)); +sregs.gs.base = gsbase; + +sregs.tr.base = reinterpret_castulong(*_stack.begin()); +sregs.tr.type = 11; +sregs.tr.s = 0; +sregs.tr.present = 1; + +sregs.cr0 = 0x11; /* PE, ET, !PG */ +sregs.cr4 = 0; +sregs.efer = 0; +sregs.apic_base = 0xfee0; +_vcpu.set_sregs(sregs); +} + +void vcpu::thunk(vcpu* zis) +{ +zis-_guest_func(); +asm volatile(outb %%al, %%dx : : a(0), d(0)); +} + +void vcpu::setup_regs() +{ +kvm_regs regs = {}; +regs.rflags = 0x3202; +regs.rsp = reinterpret_castulong(*_stack.end()); +regs.rsp = ~15UL; +ulong* sp = reinterpret_castulong *(regs.rsp); +*--sp = reinterpret_castulong((char*)this); +*--sp = 0; +regs.rsp = reinterpret_castulong(sp); +regs.rip = reinterpret_castulong(vcpu::thunk); +printf(rip %llx\n, regs.rip); +_vcpu.set_regs(regs); +} + +vcpu::vcpu(kvm::vcpu vcpu, std::tr1::functionvoid () guest_func, + unsigned long stack_size) +: _vcpu(vcpu), _guest_func(guest_func), _stack(stack_size) +{ +setup_sregs(); +setup_regs(); +} + +} diff --git a/api/identity.hh b/api/identity.hh new file mode 100644 index 000..7401826 --- /dev/null +++ b/api/identity.hh @@ -0,0 +1,28 @@ +#ifndef API_IDENTITY_HH +#define API_IDENTITY_HH + +#include kvmxx.hh +#include tr1/functional +#include vector + +namespace identity { + +void setup_vm(kvm::vm vm); + +class vcpu { +public: +vcpu(kvm::vcpu vcpu, std::tr1::functionvoid () guest_func, +unsigned long stack_size = 256 * 1024); +private: +static void thunk(vcpu* vcpu); +void setup_regs(); +void setup_sregs(); +private: +kvm::vcpu _vcpu; +std::tr1::functionvoid () _guest_func; +std::vectorchar _stack; +}; + +} + +#endif diff --git a/config-x86-common.mak b/config-x86-common.mak index d22df17..dde4f67 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -81,3 +81,4 @@ arch_clean: $(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \ $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o +api/%.o: CFLAGS += -m32 -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 11/14] Build tests with debug information
Signed-off-by: Avi Kivity a...@redhat.com --- Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index 85ebd37..b6e8759 100644 --- a/Makefile +++ b/Makefile @@ -6,7 +6,7 @@ DESTDIR := $(PREFIX)/share/qemu/tests .PHONY: arch_clean clean #make sure env CFLAGS variable is not used -CFLAGS = +CFLAGS = -g libgcc := $(shell $(CC) --print-libgcc-file-name) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 07/14] Add sample test using the api test harness
Call a function setting a global variable. Signed-off-by: Avi Kivity a...@redhat.com --- api/api-sample.cc | 29 + config-x86-common.mak |7 +++ 2 files changed, 36 insertions(+), 0 deletions(-) create mode 100644 api/api-sample.cc diff --git a/api/api-sample.cc b/api/api-sample.cc new file mode 100644 index 000..8d57c09 --- /dev/null +++ b/api/api-sample.cc @@ -0,0 +1,29 @@ + +#include api/kvmxx.hh +#include api/identity.hh +#include api/exception.hh +#include stdio.h + +static int global = 0; + +static void set_global() +{ +global = 1; +} + +int test_main(int ac, char** av) +{ +kvm::system system; +kvm::vm vm(system); +identity::setup_vm(vm); +kvm::vcpu vcpu(vm, 0); +identity::vcpu thread(vcpu, set_global); +vcpu.run(); +printf(global %d\n, global); +return global == 1 ? 0 : 1; +} + +int main(int ac, char** av) +{ +return try_main(test_main, ac, av); +} diff --git a/config-x86-common.mak b/config-x86-common.mak index dde4f67..3e8e641 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -32,6 +32,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ $(TEST_DIR)/hypercall.flat $(TEST_DIR)/sieve.flat \ $(TEST_DIR)/kvmclock_test.flat +tests-common += api/api-sample + tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg test_cases: $(tests-common) $(tests) @@ -82,3 +84,8 @@ arch_clean: $(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o api/%.o: CFLAGS += -m32 + +api/api-sample: LDLIBS += -lstdc++ +api/api-sample: LDFLAGS += -m32 + +api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 14/14] Add dirty log test
This checks the failure that was fixed by kernel commit edde99ce0529 (KVM: Write protect memory after slot swap). Two threads are used; a guest thread continuously updates a shared variable, which is also sampled by a host thread that also checks if dirty logging marked it as dirty. It detects about 5 million failures with the fix reverted, and 0 failures with the fix applied. Signed-off-by: Avi Kivity a...@redhat.com --- api/dirty-log.cc | 78 + config-x86-common.mak |7 +++- 2 files changed, 83 insertions(+), 2 deletions(-) create mode 100644 api/dirty-log.cc diff --git a/api/dirty-log.cc b/api/dirty-log.cc new file mode 100644 index 000..1e4ef9e --- /dev/null +++ b/api/dirty-log.cc @@ -0,0 +1,78 @@ +#include kvmxx.hh +#include memmap.hh +#include identity.hh +#include boost/thread/thread.hpp +#include stdlib.h +#include stdio.h + +namespace { + +void delay_loop(unsigned n) +{ +for (unsigned i = 0; i n; ++i) { +asm volatile(pause); +} + } + +void write_mem(volatile bool running, volatile int* shared_var) +{ +while (running) { +++*shared_var; +delay_loop(1000); +} +} + +void check_dirty_log(mem_slot slot, + volatile bool running, + volatile int* shared_var, + int nr_fail) +{ +uint64_t shared_var_gpa = reinterpret_castuint64_t(shared_var); +slot.set_dirty_logging(true); +slot.update_dirty_log(); +for (int i = 0; i 1000; ++i) { +int sample1 = *shared_var; +delay_loop(600); +int sample2 = *shared_var; +slot.update_dirty_log(); +if (!slot.is_dirty(shared_var_gpa) sample1 != sample2) { +++nr_fail; +} +} +running = false; +slot.set_dirty_logging(false); +} + +} + +using boost::ref; +using std::tr1::bind; + +int main(int ac, char **av) +{ +kvm::system sys; +kvm::vm vm(sys); +mem_map memmap(vm); +void* logged_slot_virt; +posix_memalign(logged_slot_virt, 4096, 4096); +int* shared_var = static_castint*(logged_slot_virt); +identity::hole hole(logged_slot_virt, 4096); +identity::vm ident_vm(vm, memmap, hole); +kvm::vcpu vcpu(vm, 0); +bool running = true; +int nr_fail = 0; +mem_slot logged_slot(memmap, + reinterpret_castuint64_t(logged_slot_virt), + 4096, logged_slot_virt); +boost::thread host_poll_thread(check_dirty_log, ref(logged_slot), + ref(running), + ref(shared_var), ref(nr_fail)); +identity::vcpu guest_write_thread(vcpu, + bind(write_mem, + ref(running), + ref(shared_var))); +vcpu.run(); +host_poll_thread.join(); +printf(Dirty bitmap failures: %d\n, nr_fail); +return nr_fail == 0 ? 0 : 1; +} diff --git a/config-x86-common.mak b/config-x86-common.mak index ce36cde..b5c49f4 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -33,6 +33,7 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ $(TEST_DIR)/kvmclock_test.flat tests-common += api/api-sample +tests-common += api/dirty-log tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg @@ -85,10 +86,12 @@ arch_clean: api/%.o: CFLAGS += -m32 -api/%: LDLIBS += -lstdc++ +api/%: LDLIBS += -lstdc++ -lboost_thread-mt -lpthread api/%: LDFLAGS += -m32 api/libapi.a: api/kvmxx.o api/identity.o api/exception.o api/memmap.o $(AR) rcs $@ $^ -api/api-sample: api/api-sample.o api/libapi.a \ No newline at end of file +api/api-sample: api/api-sample.o api/libapi.a + +api/dirty-log: api/dirty-log.o api/libapi.a -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 13/14] Introduce libapi.a to avoid long Makefile recipes
Signed-off-by: Avi Kivity a...@redhat.com --- config-x86-common.mak | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/config-x86-common.mak b/config-x86-common.mak index 436f4bd..ce36cde 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -85,8 +85,10 @@ arch_clean: api/%.o: CFLAGS += -m32 -api/api-sample: LDLIBS += -lstdc++ -api/api-sample: LDFLAGS += -m32 +api/%: LDLIBS += -lstdc++ +api/%: LDFLAGS += -m32 -api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o -api/api-sample: api/memmap.o \ No newline at end of file +api/libapi.a: api/kvmxx.o api/identity.o api/exception.o api/memmap.o + $(AR) rcs $@ $^ + +api/api-sample: api/api-sample.o api/libapi.a \ No newline at end of file -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 10/14] api: add memory map management
Add a class to manage the memory map and a class to represent a memory slot. Signed-off-by: Avi Kivity a...@redhat.com --- api/memmap.cc | 76 + api/memmap.hh | 43 2 files changed, 119 insertions(+), 0 deletions(-) create mode 100644 api/memmap.cc create mode 100644 api/memmap.hh diff --git a/api/memmap.cc b/api/memmap.cc new file mode 100644 index 000..c625852 --- /dev/null +++ b/api/memmap.cc @@ -0,0 +1,76 @@ + +#include memmap.hh + +mem_slot::mem_slot(mem_map map, uint64_t gpa, uint64_t size, void* hva) +: _map(map) +, _slot(map._free_slots.top()) +, _gpa(gpa) +, _size(size) +, _hva(hva) +, _dirty_log_enabled(false) +, _log() +{ +map._free_slots.pop(); +update(); +} + +mem_slot::~mem_slot() +{ +_size = 0; +try { +update(); +_map._free_slots.push(_slot); +} catch (...) { +// can't do much if we can't undo slot registration - leak the slot +} +} + +void mem_slot::set_dirty_logging(bool enabled) +{ +if (_dirty_log_enabled != enabled) { +_dirty_log_enabled = enabled; +if (enabled) { +int logsize = ((_size 12) + bits_per_word - 1) / bits_per_word; +_log.resize(logsize); +} else { +_log.resize(0); +} +update(); +} +} + +void mem_slot::update() +{ +uint32_t flags = 0; +if (_dirty_log_enabled) { +flags |= KVM_MEM_LOG_DIRTY_PAGES; +} +_map._vm.set_memory_region(_slot, _hva, _gpa, _size, flags); +} + +bool mem_slot::dirty_logging() const +{ +return _dirty_log_enabled; +} + +void mem_slot::update_dirty_log() +{ +_map._vm.get_dirty_log(_slot, _log[0]); +} + +bool mem_slot::is_dirty(uint64_t gpa) const +{ +uint64_t pagenr = (gpa - _gpa) 12; +ulong wordnr = pagenr / bits_per_word; +ulong bit = 1ULL (pagenr % bits_per_word); +return _log[wordnr] bit; +} + +mem_map::mem_map(kvm::vm vm) +: _vm(vm) +{ +int nr_slots = vm.sys().get_extension_int(KVM_CAP_NR_MEMSLOTS); +for (int i = 0; i nr_slots; ++i) { +_free_slots.push(i); +} +} diff --git a/api/memmap.hh b/api/memmap.hh new file mode 100644 index 000..59ec619 --- /dev/null +++ b/api/memmap.hh @@ -0,0 +1,43 @@ +#ifndef MEMMAP_HH +#define MEMMAP_HH + +#include kvmxx.hh +#include stdint.h +#include vector +#include stack + +class mem_map; +class mem_slot; + +class mem_slot { +public: +mem_slot(mem_map map, uint64_t gpa, uint64_t size, void *hva); +~mem_slot(); +void set_dirty_logging(bool enabled); +bool dirty_logging() const; +void update_dirty_log(); +bool is_dirty(uint64_t gpa) const; +private: +void update(); +private: +typedef unsigned long ulong; +static const int bits_per_word = sizeof(ulong) * 8; +mem_map _map; +int _slot; +uint64_t _gpa; +uint64_t _size; +void *_hva; +bool _dirty_log_enabled; +std::vectorulong _log; +}; + +class mem_map { +public: +mem_map(kvm::vm vm); +private: +kvm::vm _vm; +std::stackint _free_slots; +friend class mem_slot; +}; + +#endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 09/14] api: support KVM_GET_DIRTY_LOG ioctl
Signed-off-by: Avi Kivity a...@redhat.com --- api/kvmxx.cc |8 api/kvmxx.hh |1 + 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/api/kvmxx.cc b/api/kvmxx.cc index 42e8781..7ebebb5 100644 --- a/api/kvmxx.cc +++ b/api/kvmxx.cc @@ -163,6 +163,14 @@ void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len, _fd.ioctlp(KVM_SET_USER_MEMORY_REGION, umr); } +void vm::get_dirty_log(int slot, void *log) +{ +struct kvm_dirty_log kdl; +kdl.slot = slot; +kdl.dirty_bitmap = log; +_fd.ioctlp(KVM_GET_DIRTY_LOG, kdl); +} + void vm::set_tss_addr(uint32_t addr) { _fd.ioctl(KVM_SET_TSS_ADDR, addr); diff --git a/api/kvmxx.hh b/api/kvmxx.hh index 958d36f..1dcb41d 100644 --- a/api/kvmxx.hh +++ b/api/kvmxx.hh @@ -59,6 +59,7 @@ public: explicit vm(system system); void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len, uint32_t flags = 0); +void get_dirty_log(int slot, void *log); void set_tss_addr(uint32_t addr); system sys() { return _system; } private: -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 12/14] api: Add support for creating an identity map with a hole
The hole may be used for mmio or dirty logging. Signed-off-by: Avi Kivity a...@redhat.com --- api/api-sample.cc |3 ++- api/identity.cc | 23 +-- api/identity.hh | 17 - config-x86-common.mak |1 + 4 files changed, 40 insertions(+), 4 deletions(-) diff --git a/api/api-sample.cc b/api/api-sample.cc index 8d57c09..524ad7b 100644 --- a/api/api-sample.cc +++ b/api/api-sample.cc @@ -15,7 +15,8 @@ int test_main(int ac, char** av) { kvm::system system; kvm::vm vm(system); -identity::setup_vm(vm); +mem_map memmap(vm); +identity::vm ident_vm(vm, memmap); kvm::vcpu vcpu(vm, 0); identity::vcpu thread(vcpu, set_global); vcpu.run(); diff --git a/api/identity.cc b/api/identity.cc index de52f68..e04231b 100644 --- a/api/identity.cc +++ b/api/identity.cc @@ -6,9 +6,28 @@ namespace identity { typedef unsigned long ulong; -void setup_vm(kvm::vm vm) +hole::hole() +: address(), size() { -vm.set_memory_region(0, NULL, 0, 3UL 30); +} + +hole::hole(void* address, size_t size) +: address(address), size(size) +{ +} + +vm::vm(kvm::vm vm, mem_map mmap, hole h) +{ +uint64_t hole_gpa = reinterpret_castuint64_t(h.address); +char* hole_hva = static_castchar*(h.address); +if (h.address) { +_slots.push_back(mem_slot_ptr(new mem_slot(mmap, 0, hole_gpa, NULL))); +} +uint64_t hole_end = hole_gpa + h.size; +uint64_t end = 3U 30; +_slots.push_back(mem_slot_ptr(new mem_slot(mmap, hole_end, + end - hole_end, + hole_hva + h.size))); vm.set_tss_addr(3UL 30); } diff --git a/api/identity.hh b/api/identity.hh index 7401826..4491043 100644 --- a/api/identity.hh +++ b/api/identity.hh @@ -2,12 +2,27 @@ #define API_IDENTITY_HH #include kvmxx.hh +#include memmap.hh #include tr1/functional +#include tr1/memory #include vector namespace identity { -void setup_vm(kvm::vm vm); +struct hole { +hole(); +hole(void* address, size_t size); +void* address; +size_t size; +}; + +class vm { +public: +vm(kvm::vm vm, mem_map mmap, hole address_space_hole = hole()); +private: +typedef std::tr1::shared_ptrmem_slot mem_slot_ptr; +std::vectormem_slot_ptr _slots; +}; class vcpu { public: diff --git a/config-x86-common.mak b/config-x86-common.mak index 3e8e641..436f4bd 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -89,3 +89,4 @@ api/api-sample: LDLIBS += -lstdc++ api/api-sample: LDFLAGS += -m32 api/api-sample: api/api-sample.o api/kvmxx.o api/identity.o api/exception.o +api/api-sample: api/memmap.o \ No newline at end of file -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 08/14] api: add support for KVM_SET_USER_MEMORY_REGION flags field
Signed-off-by: Avi Kivity a...@redhat.com --- api/kvmxx.cc |5 +++-- api/kvmxx.hh |3 ++- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/api/kvmxx.cc b/api/kvmxx.cc index ad27907..42e8781 100644 --- a/api/kvmxx.cc +++ b/api/kvmxx.cc @@ -150,12 +150,13 @@ vm::vm(system system) { } -void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len) +void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len, + uint32_t flags) { struct kvm_userspace_memory_region umr; umr.slot = slot; -umr.flags = 0; +umr.flags = flags; umr.guest_phys_addr = gpa; umr.memory_size = len; umr.userspace_addr = reinterpret_castuint64_t(addr); diff --git a/api/kvmxx.hh b/api/kvmxx.hh index 51dbe7a..958d36f 100644 --- a/api/kvmxx.hh +++ b/api/kvmxx.hh @@ -57,7 +57,8 @@ private: class vm { public: explicit vm(system system); -void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len); +void set_memory_region(int slot, void *addr, uint64_t gpa, size_t len, + uint32_t flags = 0); void set_tss_addr(uint32_t addr); system sys() { return _system; } private: -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests v2 05/14] Introduce a C++ wrapper for the kvm APIs
Introduce exception-safe objects for calling system, vm, and vcpu ioctls. Signed-off-by: Avi Kivity a...@redhat.com --- api/kvmxx.cc | 185 ++ api/kvmxx.hh | 83 ++ 2 files changed, 268 insertions(+), 0 deletions(-) create mode 100644 api/kvmxx.cc create mode 100644 api/kvmxx.hh diff --git a/api/kvmxx.cc b/api/kvmxx.cc new file mode 100644 index 000..ad27907 --- /dev/null +++ b/api/kvmxx.cc @@ -0,0 +1,185 @@ +#include kvmxx.hh +#include exception.hh +#include fcntl.h +#include sys/ioctl.h +#include sys/mman.h +#include stdlib.h +#include memory +#include algorithm + +namespace kvm { + +static long check_error(long r) +{ +if (r == -1) { + throw errno_exception(errno); +} +return r; +} + +fd::fd(int fd) +: _fd(fd) +{ +} + +fd::fd(const fd other) +: _fd(::dup(other._fd)) +{ +check_error(_fd); +} + +fd::fd(std::string device_node, int flags) +: _fd(::open(device_node.c_str(), flags)) +{ +check_error(_fd); +} + +long fd::ioctl(unsigned nr, long arg) +{ +return check_error(::ioctl(_fd, nr, arg)); +} + +vcpu::vcpu(vm vm, int id) +: _vm(vm), _fd(vm._fd.ioctl(KVM_CREATE_VCPU, id)), _shared(NULL) +, _mmap_size(_vm._system._fd.ioctl(KVM_GET_VCPU_MMAP_SIZE, 0)) + +{ +kvm_run *shared = static_castkvm_run*(::mmap(NULL, _mmap_size, + PROT_READ | PROT_WRITE, + MAP_SHARED, + _fd.get(), 0)); +if (shared == MAP_FAILED) { + throw errno_exception(errno); +} +_shared = shared; +} + +vcpu::~vcpu() +{ +munmap(_shared, _mmap_size); +} + +void vcpu::run() +{ +_fd.ioctl(KVM_RUN, 0); +} + +kvm_regs vcpu::regs() +{ +kvm_regs regs; +_fd.ioctlp(KVM_GET_REGS, regs); +return regs; +} + +void vcpu::set_regs(const kvm_regs regs) +{ +_fd.ioctlp(KVM_SET_REGS, const_castkvm_regs*(regs)); +} + +kvm_sregs vcpu::sregs() +{ +kvm_sregs sregs; +_fd.ioctlp(KVM_GET_SREGS, sregs); +return sregs; +} + +void vcpu::set_sregs(const kvm_sregs sregs) +{ +_fd.ioctlp(KVM_SET_SREGS, const_castkvm_sregs*(sregs)); +} + +class vcpu::kvm_msrs_ptr { +public: +explicit kvm_msrs_ptr(size_t nmsrs); +~kvm_msrs_ptr() { ::free(_kvm_msrs); } +kvm_msrs* operator-() { return _kvm_msrs; } +kvm_msrs* get() { return _kvm_msrs; } +private: +kvm_msrs* _kvm_msrs; +}; + +vcpu::kvm_msrs_ptr::kvm_msrs_ptr(size_t nmsrs) +: _kvm_msrs(0) +{ +size_t size = sizeof(kvm_msrs) + sizeof(kvm_msr_entry) * nmsrs; +_kvm_msrs = static_castkvm_msrs*(::malloc(size)); +if (!_kvm_msrs) { + throw std::bad_alloc(); +} +} + +std::vectorkvm_msr_entry vcpu::msrs(std::vectoruint32_t indices) +{ +kvm_msrs_ptr msrs(indices.size()); +msrs-nmsrs = indices.size(); +for (unsigned i = 0; i msrs-nmsrs; ++i) { + msrs-entries[i].index = indices[i]; +} +_fd.ioctlp(KVM_GET_MSRS, msrs.get()); +return std::vectorkvm_msr_entry(msrs-entries, + msrs-entries + msrs-nmsrs); +} + +void vcpu::set_msrs(const std::vectorkvm_msr_entry msrs) +{ +kvm_msrs_ptr _msrs(msrs.size()); +_msrs-nmsrs = msrs.size(); +std::copy(msrs.begin(), msrs.end(), _msrs-entries); +_fd.ioctlp(KVM_SET_MSRS, _msrs.get()); +} + +void vcpu::set_debug(uint64_t dr[8], bool enabled, bool singlestep) +{ +kvm_guest_debug gd; + +gd.control = 0; +if (enabled) { + gd.control |= KVM_GUESTDBG_ENABLE; +} +if (singlestep) { + gd.control |= KVM_GUESTDBG_SINGLESTEP; +} +for (int i = 0; i 8; ++i) { + gd.arch.debugreg[i] = dr[i]; +} +_fd.ioctlp(KVM_SET_GUEST_DEBUG, gd); +} + +vm::vm(system system) +: _system(system), _fd(system._fd.ioctl(KVM_CREATE_VM, 0)) +{ +} + +void vm::set_memory_region(int slot, void *addr, uint64_t gpa, size_t len) +{ +struct kvm_userspace_memory_region umr; + +umr.slot = slot; +umr.flags = 0; +umr.guest_phys_addr = gpa; +umr.memory_size = len; +umr.userspace_addr = reinterpret_castuint64_t(addr); +_fd.ioctlp(KVM_SET_USER_MEMORY_REGION, umr); +} + +void vm::set_tss_addr(uint32_t addr) +{ +_fd.ioctl(KVM_SET_TSS_ADDR, addr); +} + +system::system(std::string device_node) +: _fd(device_node, O_RDWR) +{ +} + +bool system::check_extension(int extension) +{ +return _fd.ioctl(KVM_CHECK_EXTENSION, extension); +} + +int system::get_extension_int(int extension) +{ +return _fd.ioctl(KVM_CHECK_EXTENSION, extension); +} + +}; diff --git a/api/kvmxx.hh b/api/kvmxx.hh new file mode 100644 index 000..51dbe7a --- /dev/null +++ b/api/kvmxx.hh @@ -0,0 +1,83 @@ +#ifndef KVMXX_H +#define KVMXX_H + +#include string +#include signal.h +#include unistd.h +#include vector +#include errno.h +#include linux/kvm.h +#include stdint.h + +namespace kvm { + +class system; +class vm; +class vcpu; +class fd;
[PATCH kvm-unit-tests v2 04/14] Add try_main() for running a program under an exception handler
Signed-off-by: Avi Kivity a...@redhat.com --- api/exception.cc | 13 + api/exception.hh |3 +++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/api/exception.cc b/api/exception.cc index 500569a..910bdff 100644 --- a/api/exception.cc +++ b/api/exception.cc @@ -18,3 +18,16 @@ const char *errno_exception::what() std::strerror(_errno), _errno); return _buf; } + +int try_main(int (*main)(int argc, char** argv), int argc, char** argv, +int ret_on_exception) +{ +try { +return main(argc, argv); +} catch (std::exception e) { +std::fprintf(stderr, exception: %s\n, e.what()); +} catch (...) { +std::fprintf(stderr, unknown exception\n); +} +return ret_on_exception; +} diff --git a/api/exception.hh b/api/exception.hh index 4672760..f78d9a1 100644 --- a/api/exception.hh +++ b/api/exception.hh @@ -13,4 +13,7 @@ private: char _buf[1000]; }; +int try_main(int (*main)(int argc, char** argv), int argc, char** argv, +int ret_on_exception = 127); + #endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH-v2 0/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()
This patchset consists of three patches: [1/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev() [2/3] KVM: Centralize slots_lock aquisition during KVM_CREATE_IRQCHIP [3/3] KVM: Centralize irq_lock aquisition during KVM_CREATE_IRQCHIP Though patch 1 is a fix, this is only for an error handling path. Patches 2,3 are just for coding style consistency. So pick up those whitch match your policy. Changelog: moved slots_lock and irq_lock aquisition to their caller following Marcelo's advice. Thanks, Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking slots_lock in the error handling path. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/ia64/kvm/kvm-ia64.c |2 ++ arch/x86/kvm/x86.c |4 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 48a48bd..70d224d 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -951,7 +951,9 @@ long kvm_arch_vm_ioctl(struct file *filp, goto out; r = kvm_setup_default_irq_routing(kvm); if (r) { + mutex_lock(kvm-slots_lock); kvm_ioapic_destroy(kvm); + mutex_unlock(kvm-slots_lock); goto out; } break; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8d76150..3113aaf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3308,8 +3308,10 @@ long kvm_arch_vm_ioctl(struct file *filp, if (vpic) { r = kvm_ioapic_init(kvm); if (r) { + mutex_lock(kvm-slots_lock); kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, vpic-dev); + mutex_unlock(kvm-slots_lock); kfree(vpic); goto create_irqchip_unlock; } @@ -3320,10 +3322,12 @@ long kvm_arch_vm_ioctl(struct file *filp, smp_wmb(); r = kvm_setup_default_irq_routing(kvm); if (r) { + mutex_lock(kvm-slots_lock); mutex_lock(kvm-irq_lock); kvm_ioapic_destroy(kvm); kvm_destroy_pic(kvm); mutex_unlock(kvm-irq_lock); + mutex_unlock(kvm-slots_lock); } create_irqchip_unlock: mutex_unlock(kvm-lock); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: Centralize slots_lock aquisition during KVM_CREATE_IRQCHIP
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Move slots_lock aquisition from kvm_ioapic_init() and kvm_create_pic() to their caller. As a result, x86's KVM_CREATE_IRQCHIP is now covered by a unified slots_lock section, including kvm_setup_default_irq_routing(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/ia64/kvm/kvm-ia64.c |2 ++ arch/x86/kvm/i8259.c |2 -- arch/x86/kvm/x86.c |6 ++ virt/kvm/ioapic.c|2 -- 4 files changed, 4 insertions(+), 8 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 70d224d..060c594 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -946,7 +946,9 @@ long kvm_arch_vm_ioctl(struct file *filp, } case KVM_CREATE_IRQCHIP: r = -EFAULT; + mutex_lock(kvm-slots_lock); r = kvm_ioapic_init(kvm); + mutex_unlock(kvm-slots_lock); if (r) goto out; r = kvm_setup_default_irq_routing(kvm); diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index f628234..37d24bc 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -580,9 +580,7 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm) * Initialize PIO device */ kvm_iodevice_init(s-dev, picdev_ops); - mutex_lock(kvm-slots_lock); ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, s-dev); - mutex_unlock(kvm-slots_lock); if (ret 0) { kfree(s); return NULL; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3113aaf..736ab93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3300,6 +3300,7 @@ long kvm_arch_vm_ioctl(struct file *filp, struct kvm_pic *vpic; mutex_lock(kvm-lock); + mutex_lock(kvm-slots_lock); r = -EEXIST; if (kvm-arch.vpic) goto create_irqchip_unlock; @@ -3308,10 +3309,8 @@ long kvm_arch_vm_ioctl(struct file *filp, if (vpic) { r = kvm_ioapic_init(kvm); if (r) { - mutex_lock(kvm-slots_lock); kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, vpic-dev); - mutex_unlock(kvm-slots_lock); kfree(vpic); goto create_irqchip_unlock; } @@ -3322,14 +3321,13 @@ long kvm_arch_vm_ioctl(struct file *filp, smp_wmb(); r = kvm_setup_default_irq_routing(kvm); if (r) { - mutex_lock(kvm-slots_lock); mutex_lock(kvm-irq_lock); kvm_ioapic_destroy(kvm); kvm_destroy_pic(kvm); mutex_unlock(kvm-irq_lock); - mutex_unlock(kvm-slots_lock); } create_irqchip_unlock: + mutex_unlock(kvm-slots_lock); mutex_unlock(kvm-lock); break; } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 0b9df83..532bffc 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -393,9 +393,7 @@ int kvm_ioapic_init(struct kvm *kvm) kvm_ioapic_reset(ioapic); kvm_iodevice_init(ioapic-dev, ioapic_mmio_ops); ioapic-kvm = kvm; - mutex_lock(kvm-slots_lock); ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, ioapic-dev); - mutex_unlock(kvm-slots_lock); if (ret 0) { kvm-arch.vioapic = NULL; kfree(ioapic); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: Centralize irq_lock aquisition during KVM_CREATE_IRQCHIP
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Move irq_lock aquisition from kvm_setup_default_irq_routing(), inside of kvm_set_irq_routing(), to its caller. This makes the lock management clearer, with a bit longer irq_lock section. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/ia64/kvm/kvm-ia64.c |2 ++ arch/x86/kvm/x86.c |4 ++-- virt/kvm/irq_comm.c |2 -- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 060c594..34a1699 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -951,7 +951,9 @@ long kvm_arch_vm_ioctl(struct file *filp, mutex_unlock(kvm-slots_lock); if (r) goto out; + mutex_lock(kvm-irq_lock); r = kvm_setup_default_irq_routing(kvm); + mutex_unlock(kvm-irq_lock); if (r) { mutex_lock(kvm-slots_lock); kvm_ioapic_destroy(kvm); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 736ab93..5c70869 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3319,13 +3319,13 @@ long kvm_arch_vm_ioctl(struct file *filp, smp_wmb(); kvm-arch.vpic = vpic; smp_wmb(); + mutex_lock(kvm-irq_lock); r = kvm_setup_default_irq_routing(kvm); if (r) { - mutex_lock(kvm-irq_lock); kvm_ioapic_destroy(kvm); kvm_destroy_pic(kvm); - mutex_unlock(kvm-irq_lock); } + mutex_unlock(kvm-irq_lock); create_irqchip_unlock: mutex_unlock(kvm-slots_lock); mutex_unlock(kvm-lock); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 9f614b4..265fd2f 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -407,10 +407,8 @@ int kvm_set_irq_routing(struct kvm *kvm, ++ue; } - mutex_lock(kvm-irq_lock); old = kvm-irq_routing; kvm_irq_routing_update(kvm, new); - mutex_unlock(kvm-irq_lock); synchronize_rcu(); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? + +Example: + +- { execute: inject_nmi, arguments: { cpu_index: 0 } } +- { return: {} } + +EQMP + { .name = migrate, .args_type = detach:-d,blk:-b,inc:-i,uri:s, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 11:49:23 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? I don't think so, we don't need (and maybe don't even want) naming parity between QMP and HMP. Remember that one of our mistakes was to couple the two. Also, Avi asked for more descriptive names in QMP and I agree with him, I would even be in favor of calling it inject-non-maskable-interrupt. Regardless, the differing command name is worth mentioning in the commit message. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, _might_ turn out to be an undesirable side effect to client writers. I guess I prefer a to-all-cpus argument. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics
On 12/15/2010 09:50 AM, Markus Armbruster wrote: We currently enable KVM by default, and when it's not available, we print a message and fall back to TCG. Option -enable-kvm is ignored. Option -no-kvm suppresses KVM. Upstream works differently: KVM is off by default, -enable-kvm switches it on. -enable-kvm terminates the process unsuccessfully if KVM is not available. upstream qemu | default |-enable-kvm +---+--- KVM available | disabled | enabled KVM unavailable | disabled |fail qemu-kvm| default |-enable-kvm +---+--- KVM available | enabled* | enabled KVM unavailable | disabled | disabled* * differs from upstream Users of qemu and qemu-kvm need to be aware of these differences to enable / disable use of KVM reliably. This is bothersome. Consider -enable-kvm when KVM is unavailable: If the user expects qemu-kvm behavior (fall back), but qemu fails, he'll likely be surprised and unhappy. If the user expects upstream behavior (fail), but qemu-kvm falls back to TCG, the guest runs slow as molasses, and the user will likely be confused and unhappy (unless he spots and understands the disable KVM message). Switch to upstream semantics: KVM off by default, -enable-kvm switches it on, and when it can't, it's fatal. Having to enable KVM explicitly is annoying, but the proper place to address that is upstream. Signed-off-by: Markus Armbrusterarm...@redhat.com Backwards compatibility is going to kill us if we try to make this change. Current qemu-kvm behavior: default: -accel kvm,tcg -no-kvm: -accel tcg -enable-kvm: -accel kvm,tcg Current upstream behavior default: -accel tcg -enable-kvm: -accel kvm I think we should tie `-accel' to the machine type. For qemu-kvm, a different default machine type should be used than upstream qemu (it really should be a configure switch). For `pc', the default `-accel' behavior should remain 'tcg'. For `kvmpc', the default `-accel' behavior should be 'kvm,tcg'. -no-kvm should be deprecated. -enable-kvm should also be deprecated in favor of the `-accel' option. In the short term, it would be a good idea to modify qemu-kvm to switch the -enable-kvm semantics to match upstream (fail if KVM isn't available). Adding an alias for 'kvmpc' upstream and qemu-kvm and making qemu-kvm default to 'kvmpc' would be helpful for management tools too. Regards, Anthony Liguori --- vl.c | 10 +- 1 files changed, 1 insertions(+), 9 deletions(-) diff --git a/vl.c b/vl.c index e3c8919..87e88c2 100644 --- a/vl.c +++ b/vl.c @@ -247,7 +247,7 @@ static void *boot_set_opaque; static NotifierList exit_notifiers = NOTIFIER_LIST_INITIALIZER(exit_notifiers); -int kvm_allowed = 1; +int kvm_allowed = 0; uint32_t xen_domid; enum xen_mode xen_mode = XEN_EMULATE; @@ -2436,10 +2436,8 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_smbios: do_smbios_option(optarg); break; -#ifdef OBSOLETE_KVM_IMPL case QEMU_OPTION_enable_kvm: kvm_allowed = 1; -#endif break; case QEMU_OPTION_no_kvm: kvm_allowed = 0; @@ -2789,18 +2787,12 @@ int main(int argc, char **argv, char **envp) if (kvm_allowed) { int ret = kvm_init(smp_cpus); if (ret 0) { -#if defined(OBSOLETE_KVM_IMPL) || defined(CONFIG_NO_CPU_EMULATION) if (!kvm_available()) { printf(KVM not supported for this target\n); } else { fprintf(stderr, failed to initialize KVM: %s\n, strerror(-ret)); } exit(1); -#endif -#ifdef CONFIG_KVM -fprintf(stderr, Could not initialize KVM, will disable KVM support\n); -kvm_allowed = 0; -#endif } } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? _might_ turn out to be an undesirable side effect to client writers. They seem to be coping fine with optional arguments elsewhere. I guess I prefer a to-all-cpus argument. How would that look like? cpu-index: all? I find optional json-int a simpler schema than either a json-int or the json-string all. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 18:39:07 +0100 Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 11:49:23 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com A note on commit messages: The commit message should describe the current version of the patch. Don't repeat the subject line in the body. Patch history is very useful for review, but usually uninteresting once the patch is committed. Thus, it's best to put it after the --- separator. Subsystem tags in the subject line are helpful. But qemu doesn't provide any information there :) Regarding the patch: The conversion looks good. The new QMP command is called inject_nmi, while the existing human monitor command is called nmi. Luiz asked for this name change. I don't mind. But should we rename the human monitor command for consistency? I don't think so, we don't need (and maybe don't even want) naming parity between QMP and HMP. Remember that one of our mistakes was to couple the two. Human nmi and QMP inject_nmi are identical commands, aren't they? At this point in time yes, but they might not be in the near future. Assuming they might be different is the safest thing to do. That's true for all existing commands. Giving the same things the same name isn't coupling :) Expecting them to be the same in the future is. The mistake that matters here was adopting existing human commands for QMP uncritically. That's the protocol visible mistake, yes. Also, Avi asked for more descriptive names in QMP and I agree with him, I would even be in favor of calling it inject-non-maskable-interrupt. I like inject_nmi better than nmi. inject-non-maskable-interrupt is too long even for QMP. It's not supposed to be typed that much, but I'm not that strong about that. nitpick: I think we should be consistent in the use of _ or -, eg. we should pick inject-nmi or inject_nmi? Regardless, the differing command name is worth mentioning in the commit message. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Switch to upstream -enable-kvm semantics
Anthony Liguori anth...@codemonkey.ws writes: On 12/15/2010 09:50 AM, Markus Armbruster wrote: We currently enable KVM by default, and when it's not available, we print a message and fall back to TCG. Option -enable-kvm is ignored. Option -no-kvm suppresses KVM. Upstream works differently: KVM is off by default, -enable-kvm switches it on. -enable-kvm terminates the process unsuccessfully if KVM is not available. upstream qemu | default |-enable-kvm +---+--- KVM available | disabled | enabled KVM unavailable | disabled |fail qemu-kvm| default |-enable-kvm +---+--- KVM available | enabled* | enabled KVM unavailable | disabled | disabled* * differs from upstream Users of qemu and qemu-kvm need to be aware of these differences to enable / disable use of KVM reliably. This is bothersome. Consider -enable-kvm when KVM is unavailable: If the user expects qemu-kvm behavior (fall back), but qemu fails, he'll likely be surprised and unhappy. If the user expects upstream behavior (fail), but qemu-kvm falls back to TCG, the guest runs slow as molasses, and the user will likely be confused and unhappy (unless he spots and understands the disable KVM message). Switch to upstream semantics: KVM off by default, -enable-kvm switches it on, and when it can't, it's fatal. Having to enable KVM explicitly is annoying, but the proper place to address that is upstream. Signed-off-by: Markus Armbrusterarm...@redhat.com Backwards compatibility is going to kill us if we try to make this change. Current qemu-kvm behavior: default: -accel kvm,tcg -no-kvm: -accel tcg -enable-kvm: -accel kvm,tcg Current upstream behavior default: -accel tcg -enable-kvm: -accel kvm I think we should tie `-accel' to the machine type. For qemu-kvm, a different default machine type should be used than upstream qemu (it really should be a configure switch). For `pc', the default `-accel' behavior should remain 'tcg'. For kvmpc', the default `-accel' behavior should be 'kvm,tcg'. -no-kvm should be deprecated. -enable-kvm should also be deprecated in favor of the `-accel' option. I'm fine with -accel and deprecating the old options. But until we have that: In the short term, it would be a good idea to modify qemu-kvm to switch the -enable-kvm semantics to match upstream (fail if KVM isn't available). That's what my patch does. Additionally, it changes the default to match upstream: KVM disabled. What do you want changed in my patch? Adding an alias for 'kvmpc' upstream and qemu-kvm and making qemu-kvm default to 'kvmpc' would be helpful for management tools too. That would address Having to enable KVM explicitly is annoying. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Wed, 15 Dec 2010 18:45:09 +0100 Markus Armbruster arm...@redhat.com wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Wed, 15 Dec 2010 19:18:32 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 07:09 PM, Luiz Capitulino wrote: On Wed, 15 Dec 2010 17:49:27 +0800 Lai Jiangshanla...@cn.fujitsu.com wrote: Convert do_inject_nmi() to QObject, QError, we need to use it(via libvirt). changed from v1 Add document. Add error handling when the cpu index is invalid. changed from v2 use QERR_INVALID_PARAMETER_VALUE as Markus suggest. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 23024ba..f86d9fe 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -724,7 +724,8 @@ ETEXI .args_type = cpu_index:i, .params = cpu, .help = inject an NMI on the given CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index ec31eac..3e33a96 100644 --- a/monitor.c +++ b/monitor.c @@ -2119,7 +2119,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index = qdict_get_int(qdict, cpu_index); @@ -2127,8 +2127,12 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) { cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +qerror_report(QERR_INVALID_PARAMETER_VALUE, cpu_index, + a CPU number); +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index e5f157f..fcb6bf2 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,33 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject_nmi, +.args_type = cpu_index:i, +.params = cpu, +.help = inject an NMI on the given CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, +#endif +SQMP +inject_nmi +-- + +Inject an NMI on the given CPU (x86 only). + +Arguments: + +- cpu_index: the index of the CPU to be injected NMI (json-int) Please, use cpu-index, that's what we're using for the human-monitor-command. Avi, Anthony, can you please ACK this new command? I'd like to see cpu-index made optional; if not present, nmi all cpus (that's what the nmi button on many machines does, or at least I think that's what it does). Looks like a GUI feature to me, Really? Can't see how you can build NMI to all CPUs from NMI this CPU. Or am I misunderstanding you? I guess so. Avi referred to 'nmi button on many machines', I assumed he meant a virtual machine GUI, am I wrong? _might_ turn out to be an undesirable side effect to client writers. They seem to be coping fine with optional arguments elsewhere. Which we might want to review. I guess I prefer a to-all-cpus argument. How would that look like? cpu-index: all? Like this: { execute: inject-nmi, arguments: { to-all-cpus: true } } But this looks like an optimization to me, because it's also easy to do: for cpu in query-cpus; do inject-nmi cpu Unless we want to do this in an atomic way, due to side effects I'm not aware about. I find optional json-int a simpler schema than either a json-int or the json-string all. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
On Wed, 2010-12-15 at 17:23 +, Paul Brook wrote: This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. http://wiki.qemu.org/Features/RamAPI -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support
On 12/15/2010 11:23 AM, Paul Brook wrote: This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc. Why do we care? How are you defining actual VM RAM? Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that can be mapped into the guest physical address space, so all uses of qemu_ram_alloc should be using this API. actual VM RAM == the DIMM devices. This address has exactly a 1-1 mapping between memory content and an address. It doesn't change during program execution. It may be mapped in the CPU in weird ways, it may be visibly different to devices, but that's a different interface. Why do we care about differentiating actual VM RAM from things that behave like RAM but are not actually RAM (like device ROM)? Because the semantics are different. ROM is non-volatile and RAM is volatile. If we don't make that distinction in our interfaces, we loose the ability to model the behavioral differences. For things like paravirtual devices, we can take short cuts (to optimize performance) by saying the device is directly connecting to RAM (and doesn't go through the normal translation hierarchy). Regards, Anthony Liguori Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ConVirt 2.0.1 Open Source released.
We are pleased to announce availability of ConVirt 2.0.1 open source. We would like to thank ConVirt user community for their continuing participation and support. This release incorporates feedback gathered from the community over last few months. To learn more about the release, please visit http://bit.ly/hZbHvW Thanks ConVirt Team http://www.convirture.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ConVirt 2.0.1 Open Source released.
On Wed, Dec 15, 2010 at 8:28 PM, jd jdsw2...@yahoo.com wrote: We are pleased to announce availability of ConVirt 2.0.1 open source. We would like to thank ConVirt user community for their continuing participation and support. This release incorporates feedback gathered from the community over last few months. jd: A description of ConVirt would be nice. Here's what I've figured out from the links: It is a management tool for Xen, KVM, and others. Written in Python under the GPLv2 but developed as open core software (there's an open source edition and an enterprise edition). It talks to KVM using the QEMU (human) monitor. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse varialble.
2010/12/3 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/12/2 Michael S. Tsirkin m...@redhat.com: On Wed, Dec 01, 2010 at 05:03:43PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Sun, Nov 28, 2010 at 08:27:58PM +0900, Yoshiaki Tamura wrote: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:44PM +0900, Yoshiaki Tamura wrote: Modify inuse type to uint16_t, let save/load to handle, and revert last_avail_idx with inuse if there are outstanding emulation. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp This changes migration format, so it will break compatibility with existing drivers. More generally, I think migrating internal state that is not guest visible is always a mistake as it ties migration format to an internal implementation (yes, I know we do this sometimes, but we should at least try not to add such cases). I think the right thing to do in this case is to flush outstanding work when vm is stopped. Then, we are guaranteed that inuse is 0. I sent patches that do this for virtio net and block. Could you give me the link of your patches? I'd like to test whether they work with Kemari upon failover. If they do, I'm happy to drop this patch. Yoshi Look for this: stable migration image on a stopped vm sent on: Wed, 24 Nov 2010 17:52:49 +0200 Thanks for the info. However, The patch series above didn't solve the issue. In case of Kemari, inuse is mostly 0 because it queues the output, and while last_avail_idx gets incremented immediately, not sending inuse makes the state inconsistent between Primary and Secondary. Hmm. Can we simply avoid incrementing last_avail_idx? I think we can calculate or prepare an internal last_avail_idx, and update the external when inuse is decremented. I'll try whether it work w/ w/o Kemari. Hi Michael, Could you please take a look at the following patch? commit 36ee7910059e6b236fe9467a609f5b4aed866912 Author: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Date: Thu Dec 16 14:50:54 2010 +0900 virtio: update last_avail_idx when inuse is decreased. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp diff --git a/hw/virtio.c b/hw/virtio.c index c8a0fc6..6688c02 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -237,6 +237,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) wmb(); trace_virtqueue_flush(vq, count); vring_used_idx_increment(vq, count); +vq-last_avail_idx += count; vq-inuse -= count; } @@ -385,7 +386,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) unsigned int i, head, max; target_phys_addr_t desc_pa = vq-vring.desc; -if (!virtqueue_num_heads(vq, vq-last_avail_idx)) +if (!virtqueue_num_heads(vq, vq-last_avail_idx + vq-inuse)) return 0; /* When we start there are none of either input nor output. */ @@ -393,7 +394,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) max = vq-vring.num; -i = head = virtqueue_get_head(vq, vq-last_avail_idx++); +i = head = virtqueue_get_head(vq, vq-last_avail_idx + vq-inuse); if (vring_desc_flags(desc_pa, i) VRING_DESC_F_INDIRECT) { if (vring_desc_len(desc_pa, i) % sizeof(VRingDesc)) { I'm wondering why last_avail_idx is OK to send but not inuse. last_avail_idx is at some level a mistake, it exposes part of our internal implementation, but it does *also* express a guest observable state. Here's the problem that it solves: just looking at the rings in virtio there is no way to detect that a specific request has already been completed. And the protocol forbids completing the same request twice. Our implementation always starts processing the requests in order, and since we flush outstanding requests before save, it works to just tell the remote 'process only requests after this place'. But there's no such requirement in the virtio protocol, so to be really generic we could add a bitmask of valid avail ring entries that did not complete yet. This would be the exact representation of the guest observable state. In practice we have rings of up to 512 entries. That's 64 byte per ring, not a lot at all. However, if we ever do change the protocol to send the bitmask, we would need some code to resubmit requests out of order, so it's not trivial. Another minor mistake with last_avail_idx is that it has some redundancy: the high bits in the index ( vq size) are not necessary as they can be got from avail idx. There's a consistency check in load but we really should try to use formats that are always consistent. The following patch does the same thing as original, yet keeps the format of the virtio. It shouldn't break live migration either because inuse should be 0. Yoshi Question is, can you flush to make inuse 0 in kemari too? And if not, how do you handle the fact that some requests are
Re: [PATCH 11/21] ioport: insert event_tap_ioport() to ioport_write().
2010/11/28 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:50PM +0900, Yoshiaki Tamura wrote: Record ioport event to replay it upon failover. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Interesting. This will have to be extended to support ioeventfd. Since each eventfd is really just a binary trigger it should be enough to read out the fd state. Haven't thought about eventfd yet. Will try doing it in the next spin. Hi Michael, I looked into eventfd and realized it's only used with vhost now. However, I believe vhost bypass the net layer in qemu, and there is no way for Kemari to detect the outputs. To me, it doesn't make sense to extend this patch to support eventfd... Thanks, Yoshi Yoshi --- ioport.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/ioport.c b/ioport.c index aa4188a..74aebf5 100644 --- a/ioport.c +++ b/ioport.c @@ -27,6 +27,7 @@ #include ioport.h #include trace.h +#include event-tap.h /***/ /* IO Port */ @@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data) default_ioport_writel }; IOPortWriteFunc *func = ioport_write_table[index][address]; + event_tap_ioport(index, address, data); if (!func) func = default_func[index]; func(ioport_opaque[address], address, data); -- 1.7.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: re-writing on powerpc
On 12/14/2010 07:53 PM, Hollis Blanchard wrote: On 12/14/2010 12:48 AM, Avi Kivity wrote: On 12/13/2010 07:17 PM, Hollis Blanchard wrote: Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it. Can you list some examples of dangerous scenarios? Perhaps I should rephrase... any real-world dangerous scenarios? :) That's much less fun. I was hoping you could share some traps you've hit with Linux or Windows on x86. We've hit a lot of issues with the very limited patching we do for Windows XP (Linux does its own patching): - Windows hibernation saves the patched code, but not the payload, so we have to set up hooks to re-enable the payload when Windows resumes from hibernation - We need the vcpu id in the payload code, and no easy way to get at it. After several wierd hacks we settled on peeking at the Windows processor control block, a guest specific per-cpu data structure. - Some patched instructions are called before the stack is set up, so the return doesn't work very well - others I'm suppressing - guest checksums own kernel pages For runtime intrusion detection? Such guests can simply not ask the hypervisor to enable the rewriting feature. Which is sad. - clever compiler reuses code for constant pool Not sure what you mean here. Anyways I think clever compilers are irrelevant, since a compiler will not ordinarily emit a supervisor-mode instruction. The hypervisor has no need to patch normal user-mode instructions. I meant a really clever compiler. And by using code for the constant pool I using IP-relative addressing to fetch a constant using a small offset. If the constant happens to be a patched instruction, it won't be so constant. - guest patches itself (a la linux alternatives), surprised when it sees a different instruction PowerPC Linux does patch itself, which is a write-only operation. Other self-patchers might be different; say you use xor to toggle between two variants, reducing the amount of data you need to keep for patching. - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write This is getting really hypothetical, but why would a JIT need to read the generated code? Any wierd hypothetical idea will be in mission-critical production use somewhere, see Andreas reply. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: re-writing on powerpc
On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc- ow...@vger.kernel.org] On Behalf Of Avi Kivity Sent: Tuesday, December 14, 2010 9:18 PM To: Yoder Stuart-B08248 Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, December 14, 2010 2:49 AM To: Hollis Blanchard Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc On 12/13/2010 07:17 PM, Hollis Blanchard wrote: Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it. Can you list some examples of dangerous scenarios? - guest checksums own kernel pages - clever compiler reuses code for constant pool - guest patches itself (a la linux alternatives), surprised when it sees a different instruction - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write One possible solution to hiding rewriting from guest if it must be hidden is to mark patched pages as execute only. If a guest reads a patched page, the hypervisor can fix up the read. Yes. Something that is common to all the problems above is using code as data. However, execute only would only affect the page's mapping, not the page itself, yes? So if the page has another mapping, this doesn't work. But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM. kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: re-writing on powerpc
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Wednesday, December 15, 2010 4:49 PM To: Sethi Varun-B16395 Cc: Yoder Stuart-B08248; Hollis Blanchard; Alexander Graf; kvm- p...@vger.kernel.org Subject: Re: re-writing on powerpc On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc- ow...@vger.kernel.org] On Behalf Of Avi Kivity Sent: Tuesday, December 14, 2010 9:18 PM To: Yoder Stuart-B08248 Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, December 14, 2010 2:49 AM To: Hollis Blanchard Cc: Yoder Stuart-B08248; Alexander Graf; kvm- p...@vger.kernel.org Subject: Re: re-writing on powerpc On 12/13/2010 07:17 PM, Hollis Blanchard wrote: Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it. Can you list some examples of dangerous scenarios? - guest checksums own kernel pages - clever compiler reuses code for constant pool - guest patches itself (a la linux alternatives), surprised when it sees a different instruction - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write One possible solution to hiding rewriting from guest if it must behidden is to mark patched pages as execute only. If a guest reads a patched page, the hypervisor can fix up the read. Yes. Something that is common to all the problems above is using code as data. However, execute only would only affect the page's mapping, not the page itself, yes? So if the page has another mapping, this doesn't work. But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM. kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables). I am not sure if I understand, but guest would have to be instantiate the mapping in the tlb (for BookE) before page can be accessed. That's when we can set the access permissions. -Varun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: re-writing on powerpc
On 12/15/2010 01:32 PM, Sethi Varun-B16395 wrote: But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM. kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables). I am not sure if I understand, but guest would have to be instantiate the mapping in the tlb (for BookE) before page can be accessed. That's when we can set the access permissions. You're right, for a shadow tlb kvm has all guest mappings at all time. For page table models, it doesn't. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html