Re: [PATCH 2/2] kvm tools: Use host's resolv.conf within the guest
On Thu, 2011-09-15 at 08:44 +0300, Pekka Enberg wrote:
> On 9/15/11 8:36 AM, Sasha Levin wrote:
> > On Thu, 2011-09-15 at 08:29 +0300, Pekka Enberg wrote:
> > > On Wed, Sep 14, 2011 at 7:28 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > > > Since kernel IP autoconfiguration doesn't set up /etc/resolv.conf,
> > > > we'll use the one located within the host, since this was anyway
> > > > what we simulated within the DHCP offer packets.
> > > >
> > > > Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
> > >
> > > Wouldn't a symlink to /host/etc/resolv.conf be more appropriate?
> > > Remember, we're supposed to only need to set up the shared rootfs once.
> >
> > It would mean the guest can screw up the host's networking.
>
> How? You're not supposed to run the tool.

Hm? If you symlink it to the host's resolv.conf, a guest can edit the
host's file, no? It might not even be on purpose... For example, simply
running dhcpcd in the guest.

--
Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm tools: Use host's resolv.conf within the guest
On Thu, Sep 15, 2011 at 9:00 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> Hm? If you symlink it to the host's resolv.conf, a guest can edit the
> host's file, no? It might not even be on purpose... For example, simply
> running dhcpcd in the guest.

How is that going to happen if you're not running kvmtool as root?
Re: [PATCH 2/2] kvm tools: Use host's resolv.conf within the guest
On Thu, 2011-09-15 at 09:04 +0300, Pekka Enberg wrote:
> On Thu, Sep 15, 2011 at 9:00 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > Hm? If you symlink it to the host's resolv.conf, a guest can edit the
> > host's file, no? It might not even be on purpose... For example, simply
> > running dhcpcd in the guest.
>
> How is that going to happen if you're not running kvmtool as root?

In that case, dhcpcd in the guest will simply break because it can't
modify resolv.conf, no?

--
Sasha.
Re: More work on Livebackup for qemu/qemu-kvm
Jagane,

We are testing and reviewing the livebackup workspace from git://github.com/jagane/qemu-livebackup.git, and we have several questions:

1) The workspace does not seem to have been updated for a while. Are there any new updates for this project?
2) The support appears to be tightly bound to the qcow2 image format. Are there any plans to support other formats, such as raw or QED streaming?
3) Can we add a checksum to verify that the backup image is correct after the image transfer? For example, a checksum could be computed before the snapshot is transferred and then compared with the checksum of the backup image after the backup is done.

Jagane Sundar:
> Hello All,
>
> I have made more progress on the proposed Livebackup feature for qemu
> and qemu-kvm. Based on Jes' feedback, I have switched over to using
> command line parameters instead of specific named files. So, a typical
> command line looks like this:
>
> # ./x86_64-softmmu/qemu-system-x86_64 -drive \
>   file=/dev/kvm_vol_group/kvm_root_part,boot=on,if=virtio,livebackup=on \
>   -drive file=/dev/kvm_vol_group/kvm_disk1,if=virtio,livebackup=on \
>   -m 512 -net nic,model=virtio,macaddr=52:54:00:00:00:01 \
>   -net tap,ifname=tap0,script=no,downscript=no \
>   -vnc 0.0.0.0:1000 -usb -usbdevice tablet \
>   -livebackup_dir /root/kvm/livebackup -livebackup_port 7900
>
> Note the new option livebackup=on in the drive parameters, and the two
> new parameters -livebackup_dir and -livebackup_port.
>
> Here's my strategy for rigorous testing of this new code: I have created
> two virtual disks in LVM logical volumes, and added code in qemu
> livebackup to create an LVM snapshot as soon as livebackup_client
> connects to qemu and creates a livebackup snapshot. Then I use 'cmp' to
> binary-compare the livebackup backed-up version of the virtual disk
> image with the LVM snapshot that was created. The backup images are a
> bit-for-bit match!
>
> As always, all information is available at:
> http://wiki.qemu.org/Features/Livebackup
>
> I have also sent in my application to make a presentation at the qemu
> forum 2011. In the meantime, I invite feedback on livebackup.
> Specifically, I am interested in scrutiny of my testing methodology.
> Also, I plan to add encryption (probably SSL) to the livebackup TCP
> connection, and some form of authentication.
>
> Any thoughts, feedback?
>
> Thanks,
> Jagane
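Question 3 asks for a checksum that confirms the transferred backup matches the snapshot. The sketch below is minimal and assumes that CRC-32 is an acceptable integrity check; it uses the reflected 0xEDB88320 polynomial that zlib and Ethernet use. `backup_crc32()` is a hypothetical helper and is not part of the qemu-livebackup tree. CRC-32 catches accidental corruption in transit. It does not catch deliberate tampering; that would need a cryptographic hash or an authenticated transport such as the SSL connection Jagane mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bitwise CRC-32 (reflected polynomial 0xEDB88320,
 * as used by zlib). Pass 0 as `crc` for the first chunk and the previous
 * return value for each following chunk, so a large disk image can be
 * checksummed incrementally while it is streamed.
 */
uint32_t backup_crc32(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	assert(data != NULL || len == 0);
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

Compute the value over the snapshot before the transfer and over the received image afterwards. A mismatch means the backup must be redone.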
Re: [PATCH 2/2] kvm tools: Use host's resolv.conf within the guest
On 9/15/11 9:04 AM, Sasha Levin wrote:
> On Thu, 2011-09-15 at 09:04 +0300, Pekka Enberg wrote:
> > On Thu, Sep 15, 2011 at 9:00 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > > Hm? If you symlink it to the host's resolv.conf, a guest can edit the
> > > host's file, no? It might not even be on purpose... For example, simply
> > > running dhcpcd in the guest.
> >
> > How is that going to happen if you're not running kvmtool as root?
>
> In that case, dhcpcd in the guest will simply break because it can't
> modify resolv.conf, no?

Yes. Why is that a problem? You're not supposed to launch a DHCP client
when using a shared rootfs, because kvmtool takes care of that for you.

			Pekka
RE: [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest
Marcelo Tosatti wrote:
> > diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> > index 34595d5..3925d80 100644
> > --- a/arch/x86/include/asm/apicdef.h
> > +++ b/arch/x86/include/asm/apicdef.h
> > @@ -100,7 +100,9 @@
> >  #define APIC_TIMER_BASE_CLKIN 0x0
> >  #define APIC_TIMER_BASE_TMBASE 0x1
> >  #define APIC_TIMER_BASE_DIV 0x2
> > +#define APIC_LVT_TIMER_ONESHOT (0 << 17)
> >  #define APIC_LVT_TIMER_PERIODIC (1 << 17)
> > +#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
> >  #define APIC_LVT_MASKED (1 << 16)
> >  #define APIC_LVT_LEVEL_TRIGGER (1 << 15)
> >  #define APIC_LVT_REMOTE_IRR (1 << 14)
>
> Please have a separate, introductory patch for definitions that are
> not KVM specific.

OK, will present a separate patch. BTW, should the separate patch still
be sent to kvm@vger.kernel.org?

> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -671,6 +671,8 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
> >
> >  extern bool tdp_enabled;
> >
> > +extern u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
> > +
>
> No need for extern.

Is there a particular concern, or is it just coding style? A little curious :)

> > +	} else if (apic_lvtt_tscdeadline(apic)) {
> > +		/* lapic timer in tsc deadline mode */
> > +		u64 guest_tsc, guest_tsc_delta, ns = 0;
> > +		struct kvm_vcpu *vcpu = apic->vcpu;
> > +		unsigned long this_tsc_khz = vcpu_tsc_khz(vcpu);
> > +		unsigned long flags;
> > +
> > +		if (unlikely(!apic->lapic_timer.tscdeadline || !this_tsc_khz))
> > +			return;
> > +
> > +		local_irq_save(flags);
> > +
> > +		now = apic->lapic_timer.timer.base->get_time();
> > +		kvm_get_msr(vcpu, MSR_IA32_TSC, &guest_tsc);
>
> Use kvm_x86_ops->read_l1_tsc(vcpu) instead of direct MSR read
> (to avoid reading L2 guest TSC in case of nested virt).

Fine. I was working on an older KVM tree (Jul 22) and didn't notice
Nadav's patch, checked in on Aug 2, which added the read_l1_tsc hook.
Thanks for telling me.

> > +		guest_tsc_delta = apic->lapic_timer.tscdeadline - guest_tsc;
>
> if (guest_tsc >= tscdeadline), the timer should start immediately.
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 6cb353c..a73c059 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -610,6 +610,16 @@ static void update_cpuid(struct kvm_vcpu *vcpu)
> >  		if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE))
> >  			best->ecx |= bit(X86_FEATURE_OSXSAVE);
> >  	}
> > +
> > +	/*
> > +	 * When cpu has tsc deadline timer capacibility, use bit 17/18
> > +	 * as timer mode mask. Otherwise only use bit 17.
> > +	 */
> > +	if (cpu_has_tsc_deadline_timer && best->function == 0x1) {
> > +		best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER);
> > +		vcpu->arch.apic->lapic_timer.timer_mode_mask = (3 << 17);
> > +	} else
> > +		vcpu->arch.apic->lapic_timer.timer_mode_mask = (1 << 17);
> >  }
>
> The deadline timer is entirely emulated, whether the host CPU supports
> it or not is irrelevant. Why was this changed from previous submissions?

Hmm, I will explain in the next email.

Thanks,
Jinsong
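Marcelo's objection is that the TSC-deadline timer is fully emulated, so exposing it to the guest need not depend on `cpu_has_tsc_deadline_timer`. The minimal sketch below shows that position. It uses a hypothetical cut-down vcpu struct rather than KVM's real types. The CPUID bit position (leaf 1, ECX bit 24) comes from Intel's SDM, and the mask value mirrors the patch. Jinsong has not yet explained the change, so this reflects the reviewer's view, not the final patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24 advertises TSC-deadline mode (Intel SDM). */
#define CPUID_1_ECX_TSC_DEADLINE	(1u << 24)

/* Hypothetical stand-in for the relevant parts of KVM's vcpu state. */
struct vcpu_sketch {
	uint32_t cpuid1_ecx;		/* guest-visible CPUID.01H:ECX */
	uint32_t timer_mode_mask;	/* LVT timer mode bits the LAPIC honours */
};

/*
 * Advertise the deadline timer unconditionally: the LAPIC timer is
 * emulated in software, so host support does not matter. Both mode
 * bits (18:17) are then honoured, selecting one-shot, periodic or
 * TSC-deadline mode.
 */
void update_cpuid_sketch(struct vcpu_sketch *v)
{
	assert(v != NULL);
	v->cpuid1_ecx |= CPUID_1_ECX_TSC_DEADLINE;
	v->timer_mode_mask = 3u << 17;
}
```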
Re: kgdb hooks and kvm-tool
Thanks!
\dae

On Thu, Sep 15, 2011 at 08:39:03AM +0300, Sasha Levin wrote:
> On Thu, 2011-09-15 at 08:32 +0300, Pekka Enberg wrote:
> > On Thu, Sep 15, 2011 at 2:17 AM, David Evensky <even...@dancer.ca.sandia.gov> wrote:
> > > Hi. Is it possible to use kvm-tool with a kernel compiled with kgdb?
> > > I've tried adding 'kgdbwait kgdboc=ttyS0' to -p, but that doesn't
> > > seem to work.
> >
> > I've never tried kgdb myself but I'm rather surprised it doesn't just
> > work. Sasha, Cyrill, Asias, have you guys ever tried kvmtool with kgdb?
>
> You can use 'kgdboc=kbd' to use it over the keyboard.
>
> I also have a patch which uses forktty() to spawn serial consoles and
> redirect guest ttys into them, but it's somewhat ugly. Give me a day or
> two to make it nicer and I'll send it over.
>
> --
> Sasha.
Re: [PATCH 2/2] kvm tools: Use host's resolv.conf within the guest
On Thu, 2011-09-15 at 09:22 +0300, Pekka Enberg wrote:
> On 9/15/11 9:04 AM, Sasha Levin wrote:
> > On Thu, 2011-09-15 at 09:04 +0300, Pekka Enberg wrote:
> > > On Thu, Sep 15, 2011 at 9:00 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > > > Hm? If you symlink it to the host's resolv.conf, a guest can edit the
> > > > host's file, no? It might not even be on purpose... For example,
> > > > simply running dhcpcd in the guest.
> > >
> > > How is that going to happen if you're not running kvmtool as root?
> >
> > In that case, dhcpcd in the guest will simply break because it can't
> > modify resolv.conf, no?
>
> Yes. Why is that a problem? You're not supposed to launch a DHCP client
> when using a shared rootfs, because kvmtool takes care of that for you.

Why? Testing a brand new DHCP client, for example :)

We can't block the user from editing guest configuration files...

--
Sasha.
Re: Memory API code review
Avi Kivity <a...@redhat.com> wrote:
> I would like to carry out an online code review of the memory API so
> that more people are familiar with the internals, and perhaps even to
> catch some bugs or deficiencies.
>
> I'd like to use the next kvm conference call slot for this (Tuesday
> 1400 UTC), since many people already have it reserved in their
> schedules. It would be great if people from the wider qemu community
> could be present, rather than the usual "x86 is everything" crowd (+Jan)
> that usually participates in the kvm weekly call.
>
> Juan, Chris, can we dedicate next week's call to this?

I think so.

Later, Juan.

> We'll also need a way to disseminate a few slides and an editor session
> for showing the code. We have an elluminate account that can be used
> for this, but usually this has a 50% failure rate on Linux. Anthony,
> perhaps we can set up a view-only vnc reflector on qemu.org?
RE: [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest
Marcelo Tosatti wrote:
> > +	} else if (apic_lvtt_tscdeadline(apic)) {
> > +		/* lapic timer in tsc deadline mode */
> > +		u64 guest_tsc, guest_tsc_delta, ns = 0;
> > +		struct kvm_vcpu *vcpu = apic->vcpu;
> > +		unsigned long this_tsc_khz = vcpu_tsc_khz(vcpu);
> > +		unsigned long flags;
> > +
> > +		if (unlikely(!apic->lapic_timer.tscdeadline || !this_tsc_khz))
> > +			return;
> > +
> > +		local_irq_save(flags);
> > +
> > +		now = apic->lapic_timer.timer.base->get_time();
> > +		kvm_get_msr(vcpu, MSR_IA32_TSC, &guest_tsc);
>
> Use kvm_x86_ops->read_l1_tsc(vcpu) instead of direct MSR read
> (to avoid reading L2 guest TSC in case of nested virt).
>
> > +		guest_tsc_delta = apic->lapic_timer.tscdeadline - guest_tsc;
>
> if (guest_tsc >= tscdeadline), the timer should start immediately.

Yes, in that case the timer does start immediately, with ns = 0.

Thanks,
Jinsong
[PATCH] kvm tools: Allow remapping guest TTY into host PTS
This patch adds the '--tty' option to 'kvm run', which allows the user
to remap a guest TTY into a PTS on the host.

Usage:
	'kvm run --tty [id] [other options]'

The tty will be mapped to a pts, and the assignment will be printed on
the screen:
	'  Info: Assigned terminal 1 to pty /dev/pts/X'

At this point, it is possible to communicate with the guest using that
pty.

This is useful for debugging the guest kernel using KGDB:

1. Run the guest:
	'kvm run -k [vmlinuz] -p "kgdboc=ttyS1 kgdbwait" --tty 1'
   And see which PTY got assigned to ttyS1.

2. Run GDB on the host:
	'gdb [vmlinuz]'

3. Connect to the guest (from within GDB):
	'target remote /dev/pts/X'

4. Start debugging! (enter 'continue' to continue boot).

Cc: David Evensky <even...@dancer.ca.sandia.gov>
Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/Makefile           |  1
 tools/kvm/builtin-run.c      | 12
 tools/kvm/hw/serial.c        | 46
 tools/kvm/include/kvm/term.h | 11
 tools/kvm/term.c             | 60
 tools/kvm/virtio/console.c   |  6
 6 files changed, 96 insertions(+), 40 deletions(-)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index efa032d..fef624d 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -115,6 +115,7 @@ OBJS	+= bios/bios-rom.o
 
 LIBS	+= -lrt
 LIBS	+= -lpthread
+LIBS	+= -lutil
 
 # Additional ARCH settings for x86
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 5dafb15..b5c63ca 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -172,6 +172,15 @@ static int virtio_9p_rootdir_parser(const struct option *opt, const char *arg, i
 	return 0;
 }
 
+static int tty_parser(const struct option *opt, const char *arg, int unset)
+{
+	int tty = atoi(arg);
+
+	term_set_tty(tty);
+
+	return 0;
+}
+
 static int shmem_parser(const struct option *opt, const char *arg, int unset)
 {
 	const u64 default_size = SHMEM_DEFAULT_SIZE;
@@ -316,6 +325,9 @@ static const struct option options[] = {
 	OPT_STRING('\0', "console", &console, "serial or virtio",
 			"Console to use"),
 	OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"),
+	OPT_CALLBACK('\0', "tty", NULL, "tty id",
+		     "Remap guest TTY into a pty on the host",
+		     tty_parser),
 
 	OPT_GROUP("Kernel options:"),
 	OPT_STRING('k', "kernel", &kernel_filename, "kernel",
diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index b3b233f..11fa5d4 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -14,6 +14,7 @@
 struct serial8250_device {
 	pthread_mutex_t		mutex;
 
+	u8			id;
 	u16			iobase;
 	u8			irq;
 
@@ -42,6 +43,7 @@ static struct serial8250_device devices[] = {
 	[0]	= {
 		.mutex			= PTHREAD_MUTEX_INITIALIZER,
 
+		.id			= 0,
 		.iobase			= 0x3f8,
 		.irq			= 4,
 
@@ -51,6 +53,7 @@ static struct serial8250_device devices[] = {
 	[1]	= {
 		.mutex			= PTHREAD_MUTEX_INITIALIZER,
 
+		.id			= 1,
 		.iobase			= 0x2f8,
 		.irq			= 3,
 
@@ -60,6 +63,7 @@ static struct serial8250_device devices[] = {
 	[2]	= {
 		.mutex			= PTHREAD_MUTEX_INITIALIZER,
 
+		.id			= 2,
 		.iobase			= 0x3e8,
 		.irq			= 4,
 
@@ -69,6 +73,7 @@ static struct serial8250_device devices[] = {
 	[3]	= {
 		.mutex			= PTHREAD_MUTEX_INITIALIZER,
 
+		.id			= 3,
 		.iobase			= 0x2e8,
 		.irq			= 3,
 
@@ -111,10 +116,10 @@ static void serial8250__receive(struct kvm *kvm, struct serial8250_device *dev)
 		return;
 	}
 
-	if (!term_readable(CONSOLE_8250))
+	if (!term_readable(CONSOLE_8250, dev->id))
 		return;
 
-	c = term_getc(CONSOLE_8250);
+	c = term_getc(CONSOLE_8250, dev->id);
 
 	if (c < 0)
 		return;
@@ -123,30 +128,31 @@ static void serial8250__receive(struct kvm *kvm, struct serial8250_device *dev)
 	dev->lsr		|= UART_LSR_DR;
 }
 
-/*
- * Interrupts are injected for ttyS0 only.
- */
 void serial8250__inject_interrupt(struct kvm *kvm)
 {
-	struct serial8250_device *dev = &devices[0];
+	int i;
 
-	mutex_lock(&dev->mutex);
+	for (i = 0; i < 4; i++) {
+		struct serial8250_device *dev = &devices[i];
 
-
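The patch links kvmtool against -lutil, which presumably means term.c uses openpty(); the term.c hunk is not shown above. The sketch below shows the same idea with the portable POSIX calls posix_openpt(), grantpt(), unlockpt() and ptsname(), which need no extra library. `open_guest_tty_pty()` is a hypothetical name, and the message format simply mirrors the one in the patch description.

```c
#define _XOPEN_SOURCE 600	/* posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a pseudo-terminal for one guest TTY and
 * copy the slave path (e.g. /dev/pts/N) into slave_path. The emulated
 * UART would read/write the returned master fd, while gdb on the host
 * attaches to the slave with "target remote /dev/pts/N".
 * Returns the master fd, or -1 on failure (including a too-small buffer).
 */
int open_guest_tty_pty(int tty_id, char *slave_path, size_t len)
{
	const char *name;
	int master;

	assert(slave_path != NULL && len > 0);
	master = posix_openpt(O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;
	if (grantpt(master) < 0 || unlockpt(master) < 0) {
		close(master);
		return -1;
	}
	name = ptsname(master);
	if (name == NULL || strlen(name) >= len) {
		close(master);
		return -1;
	}
	strcpy(slave_path, name);	/* length checked above */
	printf("  Info: Assigned terminal %d to pty %s\n", tty_id, slave_path);
	return master;
}
```

The printed slave path is the one step 3 hands to gdb. Keeping the master fd open for the lifetime of the VM keeps the slave available to the host.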
-cpu core2duo still has no SSE4 support?
hi,

i run kvm with -cpu core2duo option, but /proc/cpuinfo only shows SSE and SSE2.
my host is Core i7, so i suppose that i should have SSE4 with this option, but it seems not?

is there any way to get SSE4?

(i am on kvm-0.12.3 on Ubuntu 10.04)

thanks,
Jun
Re: -cpu core2duo still has no SSE4 support?
On Thu, 2011-09-15 at 17:04 +0800, Jun Koi wrote:
> hi,
>
> i run kvm with -cpu core2duo option, but /proc/cpuinfo only shows SSE and SSE2.
> my host is Core i7, so i suppose that i should have SSE4 with this option, but it seems not?
>
> is there any way to get SSE4?

How about just running it with '-cpu host'?

--
Sasha.
Re: -cpu core2duo still has no SSE4 support?
On Thu, Sep 15, 2011 at 5:05 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> On Thu, 2011-09-15 at 17:04 +0800, Jun Koi wrote:
> > hi,
> >
> > i run kvm with -cpu core2duo option, but /proc/cpuinfo only shows SSE and SSE2.
> > my host is Core i7, so i suppose that i should have SSE4 with this option, but it seems not?
> >
> > is there any way to get SSE4?
>
> How about just running it with '-cpu host'?

hah, that works, thanks!

but then there are 2 problems:

- -cpu host should be exposed in the doc. 'kvm -cpu ?' reports no such option, so i missed it.

- -cpu core2duo should enable SSE4, but it doesn't. a bug?

thanks,
Jun
Re: -cpu core2duo still has no SSE4 support?
On 09/15/2011 12:14 PM, Jun Koi wrote:
> On Thu, Sep 15, 2011 at 5:05 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > On Thu, 2011-09-15 at 17:04 +0800, Jun Koi wrote:
> > > hi,
> > >
> > > i run kvm with -cpu core2duo option, but /proc/cpuinfo only shows SSE and SSE2.
> > > my host is Core i7, so i suppose that i should have SSE4 with this option, but it seems not?
> > >
> > > is there any way to get SSE4?
> >
> > How about just running it with '-cpu host'?
>
> hah, that works, thanks!
>
> but then there are 2 problems:
>
> - -cpu host should be exposed in the doc. 'kvm -cpu ?' reports no such option, so i missed it.
>
> - -cpu core2duo should enable SSE4, but it doesn't. a bug?

If this model should contain it, yes. According to
sysconfigs/target/target-x86_64.conf, it shouldn't be in core2duo, but it
does appear in newer models.

> thanks,
> Jun
Re: -cpu core2duo still has no SSE4 support?
On Thu, 2011-09-15 at 17:14 +0800, Jun Koi wrote:
> On Thu, Sep 15, 2011 at 5:05 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > On Thu, 2011-09-15 at 17:04 +0800, Jun Koi wrote:
> > > hi,
> > >
> > > i run kvm with -cpu core2duo option, but /proc/cpuinfo only shows SSE and SSE2.
> > > my host is Core i7, so i suppose that i should have SSE4 with this option, but it seems not?
> > >
> > > is there any way to get SSE4?
> >
> > How about just running it with '-cpu host'?
>
> hah, that works, thanks!
>
> but then there are 2 problems:
>
> - -cpu host should be exposed in the doc. 'kvm -cpu ?' reports no such option, so i missed it.
>
> - -cpu core2duo should enable SSE4, but it doesn't. a bug?

SSE4 doesn't come built in with all core2duos. See
http://download.intel.com/design/mobile/datashts/31674505.pdf for example.

--
Sasha.
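To confirm from inside the guest that `-cpu host` actually exposes SSE4, look for the `sse4_1` and `sse4_2` flags in /proc/cpuinfo, for example with `grep -w sse4_1 /proc/cpuinfo`. Below is a minimal sketch of the same whole-word test in C; `cpuinfo_has_flag()` is a hypothetical helper. A plain strstr() is not enough, because searching for "sse4" would also match "sse4_1".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole,
 * whitespace-delimited token in a /proc/cpuinfo "flags" line such as
 * "flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2".
 */
bool cpuinfo_has_flag(const char *flags_line, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags_line;

	assert(flags_line != NULL && flag != NULL && n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = p == flags_line || p[-1] == ' ' ||
			      p[-1] == '\t' || p[-1] == ':';
		bool ends = p[n] == '\0' || p[n] == ' ' ||
			    p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return true;
		p++;	/* partial match inside a longer token; keep looking */
	}
	return false;
}
```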
Re: [PATCH] kvm tools: Allow remapping guest TTY into host PTS
On Thu, Sep 15, 2011 at 11:53 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> This patch adds the '--tty' option to 'kvm run', which allows the user
> to remap a guest TTY into a PTS on the host.
>
> Usage:
>	'kvm run --tty [id] [other options]'
>
> The tty will be mapped to a pts, and the assignment will be printed on
> the screen:
>	'  Info: Assigned terminal 1 to pty /dev/pts/X'
>
> At this point, it is possible to communicate with the guest using that
> pty.
>
> This is useful for debugging the guest kernel using KGDB:
>
> 1. Run the guest:
>	'kvm run -k [vmlinuz] -p "kgdboc=ttyS1 kgdbwait" --tty 1'
>    And see which PTY got assigned to ttyS1.
>
> 2. Run GDB on the host:
>	'gdb [vmlinuz]'
>
> 3. Connect to the guest (from within GDB):
>	'target remote /dev/pts/X'
>
> 4. Start debugging! (enter 'continue' to continue boot).
>
> Cc: David Evensky <even...@dancer.ca.sandia.gov>
> Signed-off-by: Sasha Levin <levinsasha...@gmail.com>

Neat! Would a tools/kvm/Documentation/debugging.txt be helpful for
people who want to do kernel debugging with kvmtool?
Re: [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest
On Thu, Sep 15, 2011 at 04:17:20PM +0800, Liu, Jinsong wrote:
> Marcelo Tosatti wrote:
> > > +	} else if (apic_lvtt_tscdeadline(apic)) {
> > > +		/* lapic timer in tsc deadline mode */
> > > +		u64 guest_tsc, guest_tsc_delta, ns = 0;
> > > +		struct kvm_vcpu *vcpu = apic->vcpu;
> > > +		unsigned long this_tsc_khz = vcpu_tsc_khz(vcpu);
> > > +		unsigned long flags;
> > > +
> > > +		if (unlikely(!apic->lapic_timer.tscdeadline || !this_tsc_khz))
> > > +			return;
> > > +
> > > +		local_irq_save(flags);
> > > +
> > > +		now = apic->lapic_timer.timer.base->get_time();
> > > +		kvm_get_msr(vcpu, MSR_IA32_TSC, &guest_tsc);
> >
> > Use kvm_x86_ops->read_l1_tsc(vcpu) instead of direct MSR read
> > (to avoid reading L2 guest TSC in case of nested virt).
> >
> > > +		guest_tsc_delta = apic->lapic_timer.tscdeadline - guest_tsc;
> >
> > if (guest_tsc >= tscdeadline), the timer should start immediately.
>
> Yes, in that case the timer does start immediately, with ns = 0.

No, guest_tsc_delta is unsigned, so the < 0 comparison fails.
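Marcelo's point can be shown in a few lines. Because `guest_tsc_delta` is a u64, `tscdeadline - guest_tsc` wraps around to a huge positive value once the deadline has passed, so no later sign check can catch it. The minimal sketch below compares the two TSC values before subtracting. `deadline_to_ns()` is a hypothetical helper, not the function in the patch, and it assumes the TSC frequency is given in kHz, as `vcpu_tsc_khz()` returns it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a guest TSC deadline into a relative
 * timeout in nanoseconds. A deadline that is already due yields 0,
 * i.e. "fire immediately", instead of a wrapped-around u64 delta.
 */
uint64_t deadline_to_ns(uint64_t tscdeadline, uint64_t guest_tsc,
			uint32_t tsc_khz)
{
	uint64_t delta;

	assert(tsc_khz != 0);
	if (guest_tsc >= tscdeadline)
		return 0;

	delta = tscdeadline - guest_tsc;	/* cannot wrap here */
	/* ns = cycles * 10^6 / kHz, split to limit intermediate overflow */
	return delta / tsc_khz * 1000000ULL +
	       (delta % tsc_khz) * 1000000ULL / tsc_khz;
}
```

For example, with a 1 GHz TSC (`tsc_khz` = 1000000), a deadline 2,000,000 cycles ahead converts to 2,000,000 ns. A deadline in the past converts to 0 rather than to roughly 2^64 cycles.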
Re: [PATCH] kvm tools: Allow remapping guest TTY into host PTS
On Thu, 2011-09-15 at 12:32 +0300, Pekka Enberg wrote:
> On Thu, Sep 15, 2011 at 11:53 AM, Sasha Levin <levinsasha...@gmail.com> wrote:
> > This patch adds the '--tty' option to 'kvm run', which allows the user
> > to remap a guest TTY into a PTS on the host.
> >
> > Usage:
> >	'kvm run --tty [id] [other options]'
> >
> > The tty will be mapped to a pts, and the assignment will be printed on
> > the screen:
> >	'  Info: Assigned terminal 1 to pty /dev/pts/X'
> >
> > At this point, it is possible to communicate with the guest using that
> > pty.
> >
> > This is useful for debugging the guest kernel using KGDB:
> >
> > 1. Run the guest:
> >	'kvm run -k [vmlinuz] -p "kgdboc=ttyS1 kgdbwait" --tty 1'
> >    And see which PTY got assigned to ttyS1.
> >
> > 2. Run GDB on the host:
> >	'gdb [vmlinuz]'
> >
> > 3. Connect to the guest (from within GDB):
> >	'target remote /dev/pts/X'
> >
> > 4. Start debugging! (enter 'continue' to continue boot).
> >
> > Cc: David Evensky <even...@dancer.ca.sandia.gov>
> > Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
>
> Neat! Would a tools/kvm/Documentation/debugging.txt be helpful for
> people who want to do kernel debugging with kvmtool?

I'll write a basic doc with the details provided above.

David, does this patch allow you to properly debug guest kernels? If so,
could you mail back any issues or hacks you needed to set it up, so I can
add them to the doc and move it into 'Documentation/'?

--
Sasha.
RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> -----Original Message-----
> From: Roedel, Joerg [mailto:joerg.roe...@amd.com]
> Sent: Monday, September 12, 2011 6:06 PM
> To: Sethi Varun-B16395
> Cc: Joerg Roedel; Greg KH; io...@lists.linux-foundation.org; Alex
> Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown;
> kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
>
> On Mon, Sep 12, 2011 at 08:08:41AM -0400, Sethi Varun-B16395 wrote:
> > > The IOMMUs are usually devices on the bus itself, so they are
> > > initialized after the bus is set up and the devices on it are
> > > populated. So the function can not be called on bus initialization
> > > because the IOMMU is not ready at this point.
> >
> > Well, at what point would the add_device_group (referring to the patch
> > set posted by Alex) callback be invoked?
>
> The details are up to Alex Williamson. One option is to register a
> notifier for the bus in the iommu_bus_init() function and react to its
> notifications.
> I think in the end we will have a number of additional call-backs in
> the iommu_ops which are called by the notifier (or from the driver-core
> directly) to handle actions like added or removed devices. All the
> infrastructure for that which is implemented in the iommu-drivers
> today will then be in the iommu-core code.

I am not sure I understand this. As per your earlier statement, the IOMMU
is a device on the bus, and its initialization would happen when the bus
is set up and its devices are populated. So when would the device
notifier call an IOMMU callback?

-Varun
Re: [PATCH 1/2] KVM: emulate lapic tsc deadline timer for guest
On Thu, Sep 15, 2011 at 02:22:58PM +0800, Liu, Jinsong wrote:
> Marcelo Tosatti wrote:
> > > diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> > > index 34595d5..3925d80 100644
> > > --- a/arch/x86/include/asm/apicdef.h
> > > +++ b/arch/x86/include/asm/apicdef.h
> > > @@ -100,7 +100,9 @@
> > >  #define APIC_TIMER_BASE_CLKIN 0x0
> > >  #define APIC_TIMER_BASE_TMBASE 0x1
> > >  #define APIC_TIMER_BASE_DIV 0x2
> > > +#define APIC_LVT_TIMER_ONESHOT (0 << 17)
> > >  #define APIC_LVT_TIMER_PERIODIC (1 << 17)
> > > +#define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
> > >  #define APIC_LVT_MASKED (1 << 16)
> > >  #define APIC_LVT_LEVEL_TRIGGER (1 << 15)
> > >  #define APIC_LVT_REMOTE_IRR (1 << 14)
> >
> > Please have a separate, introductory patch for definitions that are
> > not KVM specific.
>
> OK, will present a separate patch. BTW, should the separate patch still
> be sent to kvm@vger.kernel.org?

Yes.

> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -671,6 +671,8 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
> > >
> > >  extern bool tdp_enabled;
> > >
> > > +extern u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
> > > +
> >
> > No need for extern.
>
> Is there a particular concern, or is it just coding style? A little curious :)

It is not necessary.
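The new LVT definitions only make sense together with the timer-mode mask that this patch series sets in update_cpuid(): (3 << 17) when deadline mode is exposed to the guest, (1 << 17) otherwise. The minimal sketch below shows how a mode would be decoded from an LVT Timer value under either mask. `lvtt_timer_mode()` and `LVT_TIMER_MODE_FIELD` are hypothetical names; the three mode values are copied from the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* LVT Timer mode values, as in the apicdef.h hunk above (bits 18:17). */
#define APIC_LVT_TIMER_ONESHOT		(0 << 17)
#define APIC_LVT_TIMER_PERIODIC		(1 << 17)
#define APIC_LVT_TIMER_TSCDEADLINE	(2 << 17)

/* Hypothetical name for the full two-bit mode field. */
#define LVT_TIMER_MODE_FIELD		(3u << 17)

/*
 * Hypothetical helper: decode the timer mode from an LVT Timer value.
 * mode_mask plays the role of lapic_timer.timer_mode_mask. With the
 * one-bit mask (1 << 17), bit 18 is ignored, so in this sketch a guest
 * write of TSCDEADLINE decodes as one-shot.
 */
uint32_t lvtt_timer_mode(uint32_t lvtt, uint32_t mode_mask)
{
	assert((mode_mask & ~LVT_TIMER_MODE_FIELD) == 0);
	return lvtt & mode_mask;
}
```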
[PATCH] [dhcp] Use random transaction ID to associate messages
RFC2131.txt:

   xid  4  Transaction ID, a random number chosen by the client,
           used by the client and server to associate messages and
           responses between a client and a server.

   The 'xid' field is used by the client to match incoming DHCP messages
   with pending requests. A DHCP client MUST choose 'xid's in such a way
   as to minimize the chance of using an 'xid' identical to one used by
   another client. For example, a client may choose a different, random
   initial 'xid' each time the client is rebooted, and subsequently use
   sequential 'xid's until the next reboot. Selecting a new 'xid' for
   each retransmission is an implementation decision. A client may
   choose to reuse the same 'xid' or select a new 'xid' for each
   retransmitted message.

This patch generates a random ID when DHCP is started and records it in
the netdev struct.

Signed-off-by: Amos Kong <ak...@redhat.com>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: Marty Connor <m...@etherboot.org>
---
 src/include/gpxe/netdevice.h |  3
 src/net/udp/dhcp.c           | 23
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/src/include/gpxe/netdevice.h b/src/include/gpxe/netdevice.h
index 97bf168..7272cf8 100644
--- a/src/include/gpxe/netdevice.h
+++ b/src/include/gpxe/netdevice.h
@@ -294,6 +294,9 @@ struct net_device {
 	/** Link-layer broadcast address */
 	const uint8_t *ll_broadcast;
 
+	/* DHCP Transaction ID */
+	uint32_t xid;
+
 	/** Current device state
 	 *
 	 * This is the bitwise-OR of zero or more NETDEV_XXX constants.
diff --git a/src/net/udp/dhcp.c b/src/net/udp/dhcp.c
index 4bfcb80..51b7150 100644
--- a/src/net/udp/dhcp.c
+++ b/src/net/udp/dhcp.c
@@ -136,23 +136,6 @@ static inline const char * dhcp_msgtype_name ( unsigned int msgtype ) {
 	}
 }
 
-/**
- * Calculate DHCP transaction ID for a network device
- *
- * @v netdev		Network device
- * @ret xid		DHCP XID
- *
- * Extract the least significant bits of the hardware address for use
- * as the transaction ID.
- */
-static uint32_t dhcp_xid ( struct net_device *netdev ) {
-	uint32_t xid;
-
-	memcpy ( &xid, ( netdev->ll_addr + netdev->ll_protocol->ll_addr_len
-			 - sizeof ( xid ) ), sizeof ( xid ) );
-	return xid;
-}
-
 /******************************************************************************
  *
  * DHCP session
@@ -1070,7 +1053,7 @@ int dhcp_create_packet ( struct dhcp_packet *dhcppkt,
 
 	/* Initialise DHCP packet content */
 	memset ( dhcphdr, 0, max_len );
-	dhcphdr->xid = dhcp_xid ( netdev );
+	dhcphdr->xid = netdev->xid;
 	dhcphdr->magic = htonl ( DHCP_MAGIC_COOKIE );
 	dhcphdr->htype = ntohs ( netdev->ll_protocol->ll_proto );
 	dhcphdr->op = dhcp_op[msgtype];
@@ -1313,7 +1296,8 @@ static int dhcp_deliver_iob ( struct xfer_interface *xfer,
 			&server_id, sizeof ( server_id ) );
 
 	/* Check for matching transaction ID */
-	if ( dhcphdr->xid != dhcp_xid ( dhcp->netdev ) ) {
+	if ( dhcphdr->xid != dhcp->netdev->xid ) {
+
 		DBGC ( dhcp, "DHCP %p %s from %s:%d has bad transaction ID\n",
 		       dhcp, dhcp_msgtype_name ( msgtype ),
 		       inet_ntoa ( peer->sin_addr ),
@@ -1442,6 +1426,7 @@ int start_dhcp ( struct job_interface *job, struct net_device *netdev ) {
 	dhcp = zalloc ( sizeof ( *dhcp ) );
 	if ( ! dhcp )
 		return -ENOMEM;
+	netdev->xid = random();
 	ref_init ( &dhcp->refcnt, dhcp_free );
 	job_init ( &dhcp->job, &dhcp_job_operations, &dhcp->refcnt );
 	xfer_init ( &dhcp->xfer, &dhcp_xfer_operations, &dhcp->refcnt );
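The patch replaces the MAC-derived xid with a random value that is chosen once per DHCP session and checked on every reply. The minimal sketch below shows that pattern with hosted C's random(). `struct netdev_sketch` and both functions are hypothetical stand-ins, not gPXE's actual code.

```c
#define _XOPEN_SOURCE 600	/* random()/srandom() under strict C modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for gPXE's struct net_device. */
struct netdev_sketch {
	uint32_t xid;	/* DHCP transaction ID of the current session */
};

/* Pick a fresh transaction ID when a DHCP session starts (cf. start_dhcp()). */
void dhcp_session_start(struct netdev_sketch *netdev)
{
	assert(netdev != NULL);
	netdev->xid = (uint32_t)random();
}

/* Accept a reply only if it echoes our xid (cf. dhcp_deliver_iob()). */
bool dhcp_reply_matches(const struct netdev_sketch *netdev, uint32_t reply_xid)
{
	assert(netdev != NULL);
	return reply_xid == netdev->xid;
}
```

Two caveats apply to this sketch. First, POSIX random() returns only 31 bits. Second, random() produces the same sequence on every run unless srandom() is called with per-boot entropy, and an unseeded generator would bring back the collisions RFC 2131 warns about. The patch does not show how gPXE's random() is seeded.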
Re: [PATCH v3 05/11] KVM: MMU: do not mark accessed bit on pte write path
On Wed, Sep 14, 2011 at 12:55:09PM +0300, Avi Kivity wrote:
> On 09/13/2011 09:29 PM, Xiao Guangrong wrote:
> > On 09/13/2011 06:53 PM, Avi Kivity wrote:
> > > On 08/30/2011 05:35 AM, Xiao Guangrong wrote:
> > > > In current code, the accessed bit is always set when page fault
> > > > occurred, do not need to set it on pte write path
> > >
> > > What about speculative sptes that are then only accessed via emulation?
> >
> > The gfn is read and written only via emulation? I think this case is
> > very very rare?
>
> Probably... The access information will be transferred via the host
> pte, via get_user_pages, to MM layer, in that case.
>
> Marcelo? Can you think of another case where spte.accessed is needed?

No. An spte updated via emulation will either be accessed directly or,
if via emulation, the access to the gfn it points to is transferred via
the host pte.
Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
On Thu, Sep 15, 2011 at 08:45:35AM -0400, Sethi Varun-B16395 wrote: From: Roedel, Joerg [mailto:joerg.roe...@amd.com] The details are up to Alex Williamson. One option is to register a notifier for the bus in the iommu_bus_init() function and react to its notifications. I think in the end we will have a number of additional call-backs in the iommu_ops which are called by the notifier (or from the driver-core directly) to handle actions like added or removed devices. All the infrastructure for that which is implemented in the iommu-drivers today will then be in the iommu-core code. I am not sure if I understand this, but as per your earlier statement, the iommu is a device on the bus and its initialization would happen when the bus is set up and devices are populated. So when would the device notifier call an iommu callback? This is done in the iommu_bus_init() function. It will iterate over all devices that are already on the bus and do the iommu-specific initialization on them. For devices added or removed later, the notifier will do the job. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host
On Tue, Sep 13, 2011 at 04:49:55PM -0400, Eric B Munson wrote: On Fri, 09 Sep 2011, Marcelo Tosatti wrote: On Thu, Sep 01, 2011 at 02:27:49PM -0600, emun...@mgebm.net wrote: On Thu, 01 Sep 2011 14:24:12 -0500, Anthony Liguori wrote: On 08/30/2011 07:26 AM, Marcelo Tosatti wrote: On Mon, Aug 29, 2011 at 05:27:11PM -0600, Eric B Munson wrote: Currently, when qemu stops a guest kernel, that guest will issue a soft lockup message when it resumes. This set provides the ability for qemu to communicate to the guest that it has been stopped. When the guest hits the watchdog on resume, it will check if it was suspended before issuing the warning.

Eric B Munson (4):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add generic stubs for kvm stop check functions
  Add check for suspended vm in softlockup detector

 arch/x86/include/asm/pvclock-abi.h |  1 +
 arch/x86/include/asm/pvclock.h     |  2 ++
 arch/x86/kernel/kvmclock.c         | 14 ++
 include/asm-generic/pvclock.h      | 14 ++
 kernel/watchdog.c                  | 12
 5 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/pvclock.h
-- 
1.7.4.1

How is the host supposed to set this flag? As mentioned previously, if you save/restore the offset added to kvmclock on stop/cont (and the TSC MSR, forgot to mention that), no paravirt infrastructure is required. Which means the issue is also fixed for older guests. Marcelo, I think that stopping the TSC is the wrong approach because it will break time between the two systems, so something that expects the monotonic clock to move consistently will be wrong. In case the VM stops for whatever reason, the host system is not supposed to adjust time-related hardware state to compensate in an attempt to present apparently continuous time. If you save a VM and then restore it later, it is the guest's responsibility to adjust its time representation.
QEMU exposing continuous TSC and kvmclock state between stop and cont should not be a reason to introduce new paravirt infrastructure. IMO, messing with the TSC at run time to avoid a watchdog message is the wrong solution; better to teach the watchdog to ignore this special case. OK then, it is not a harmful addition. Can you post the QEMU patches that set the ignore-watchdog bit?
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hagen
Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode
The netlink patch is still in the works. I will post the patches after I clean it up a bit and also accommodate or find answers to most questions discussed for the non-passthru case. I thought I would post the netlink interface here to see if anyone has any early comments. I have a rtnl_link_ops->set_rx_filter defined.

[IFLA_RX_FILTER] = {
    [IFLA_ADDRESS_FILTER] = {
        [IFLA_ADDRESS_FILTER_FLAGS]
        [IFLA_ADDRESS_LIST] = {
            [IFLA_ADDRESS_LIST_ENTRY]
        }
    }
    [IFLA_VLAN_FILTER] = {
        [IFLA_VLAN_LIST] = {
            [IFLA_VLAN]
        }
    }
}

Some open questions:

- The VLAN filter above shows a VLAN list. It could also be a bitmap, or the interface could provide both a bitmap and a VLAN list for more flexibility, like below:

[IFLA_RX_FILTER] = {
    [IFLA_ADDRESS_FILTER] = {
        [IFLA_ADDRESS_FILTER_FLAGS]
        [IFLA_ADDRESS_LIST] = {
            [IFLA_ADDRESS_LIST_ENTRY]
        }
    }
    [IFLA_VLAN_FILTER] = {
        [IFLA_VLAN_BITMAP]
        [IFLA_VLAN_LIST] = {
            [IFLA_VLAN]
        }
    }
}

- Do you see any advantage in keeping the unicast and multicast address lists separate? Something like the below:

[IFLA_RX_FILTER] = {
    [IFLA_ADDRESS_FILTER_FLAGS]
    [IFLA_UC_ADDRESS_FILTER] = {
        [IFLA_ADDRESS_LIST] = {
            [IFLA_ADDRESS_LIST_ENTRY]
        }
    }
    [IFLA_MC_ADDRESS_FILTER] = {
        [IFLA_ADDRESS_LIST] = {
            [IFLA_ADDRESS_LIST_ENTRY]
        }
    }
    [IFLA_VLAN_FILTER] = {
        [IFLA_VLAN_LIST] = {
            [IFLA_VLAN]
        }
    }
}

- Is there any need to keep the address and VLAN filters separate, and have two rtnl_link_ops, set_rx_address_filter and set_rx_vlan_filter? I don't see one.

[IFLA_RX_ADDRESS_FILTER] = {
    [IFLA_ADDRESS_FILTER_FLAGS]
    [IFLA_ADDRESS_LIST] = {
        [IFLA_ADDRESS_LIST_ENTRY]
    }
}
[IFLA_RX_VLAN_FILTER] = {
    [IFLA_VLAN_LIST] = {
        [IFLA_VLAN]
    }
}

Thanks,
Roopa

On 9/12/11 10:02 AM, Roopa Prabhu ropra...@cisco.com wrote: On 9/11/11 12:03 PM, Michael S. Tsirkin m...@redhat.com wrote: On Sun, Sep 11, 2011 at 06:18:01AM -0700, Roopa Prabhu wrote: On 9/11/11 2:44 AM, Michael S.
Tsirkin m...@redhat.com wrote: Yes, but what I mean is, if the size of the single filter table is limited, we need to decide how many addresses each guest is allowed. If we let one guest ask for as many as it wants, it can lock others out. Yes, true. In these cases, i.e. when the number of unicast addresses being registered is more than it can handle, the VF driver will put the VF in promiscuous mode (or at least it's supposed to; I think all drivers do that). Thanks, Roopa Right, so that works at least, but likely performs worse than a hardware filter. So we had better allocate it in some fair way, as a minimum. Maybe a way for the admin to control that allocation is useful. Yes, I think we will have to do something like that. There is a maximum that the hardware can support; we might need to consider that too, but there is no interface to get it today. I think the virtualization case gets a little trickier. Virtio-net allows up to 64 unicast addresses, but the lowerdev may allow only up to, say, 10 unicast addresses (I think Intel supports 10 unicast addresses on the VF). I am not sure if there is a good way to notify the guest of blocked addresses. Maybe putting the lowerdev in promiscuous mode could be a policy decision too in this case. One other thing: I had indicated that I would look up details on opening my patch for non-passthru to enable hw filtering (without adding filtering support in macvlan right away, i.e. phase 1). It turns out that in the current code in macvlan_handle_frame, for the non-passthru case, it does not forward unicast packets destined to MACs other than the ones in the macvlan hash. So a filter or hash lookup there for additional unicast addresses definitely needs to be added for non-passthru. Thanks, Roopa
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
- Original Message - On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hagen But a DHCP client should be identified by its MAC, not the xid. Y.
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
On Thu, 2011-09-15 at 10:13 -0400, Yaniv Kaul wrote: - Original Message - On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hagen But a DHCP client should be identified by its MAC, not the xid. Y. A DHCP server may not be aware of the MAC address. -- Sasha.
[PATCH kvm-unit-tests] apic: test simultaneous NMIs
If multiple NMIs occur simultaneously, the first is handled while the others are collapsed and queued. But the current implementation may collapse all NMIs into the first if the timing is bad.

Signed-off-by: Avi Kivity a...@redhat.com
---
 x86/apic.c |   75
 1 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index 1366185..c51e6a5 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -198,6 +198,80 @@ static void test_sti_nmi(void)
     report("nmi-after-sti", nmi_hlt_counter == 0);
 }
+static volatile bool nmi_done, nmi_flushed;
+static volatile int nmi_received;
+static volatile int cpu0_nmi_ctr1, cpu1_nmi_ctr1;
+static volatile int cpu0_nmi_ctr2, cpu1_nmi_ctr2;
+
+static void multiple_nmi_handler(isr_regs_t *regs)
+{
+    ++nmi_received;
+}
+
+static void kick_me_nmi(void *blah)
+{
+    while (!nmi_done) {
+        ++cpu1_nmi_ctr1;
+        while (cpu1_nmi_ctr1 != cpu0_nmi_ctr1 && !nmi_done) {
+            pause();
+        }
+        if (nmi_done) {
+            return;
+        }
+        apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_NMI | APIC_INT_ASSERT, 0);
+        /* make sure the NMI has arrived by sending an IPI after it */
+        apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_FIXED | APIC_INT_ASSERT
+                       | 0x44, 0);
+        ++cpu1_nmi_ctr2;
+        while (cpu1_nmi_ctr2 != cpu0_nmi_ctr2 && !nmi_done) {
+            pause();
+        }
+    }
+}
+
+static void flush_nmi(isr_regs_t *regs)
+{
+    nmi_flushed = true;
+    apic_write(APIC_EOI, 0);
+}
+
+static void test_multiple_nmi(void)
+{
+    int i;
+    bool ok = true;
+
+    if (cpu_count() < 2) {
+        return;
+    }
+
+    sti();
+    handle_irq(2, multiple_nmi_handler);
+    handle_irq(0x44, flush_nmi);
+    on_cpu_async(1, kick_me_nmi, 0);
+    for (i = 0; i < 100; ++i) {
+        nmi_flushed = false;
+        nmi_received = 0;
+        ++cpu0_nmi_ctr1;
+        while (cpu1_nmi_ctr1 != cpu0_nmi_ctr1) {
+            pause();
+        }
+        apic_icr_write(APIC_DEST_PHYSICAL | APIC_DM_NMI | APIC_INT_ASSERT, 0);
+        while (!nmi_flushed) {
+            pause();
+        }
+        if (nmi_received != 2) {
+            ok = false;
+            break;
+        }
+        ++cpu0_nmi_ctr2;
+        while (cpu1_nmi_ctr2 != cpu0_nmi_ctr2) {
+            pause();
+        }
+    }
+    nmi_done = true;
+    report("multiple nmi", ok);
+}
+
 int main()
 {
     setup_vm();
@@ -215,6 +289,7 @@ int main()
     test_ioapic_intr();
     test_ioapic_simultaneous();
     test_sti_nmi();
+    test_multiple_nmi();
     printf("\nsummary: %d tests, %d failures\n", g_tests, g_fail);
-- 
1.7.6.3
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
- Original Message - - Original Message - On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hi Hagen, rfc2131 clearly describes that we need a random xid. I don't think the xid is part of the DHCP client configuration; it is only used to associate messages and responses between client and server. I will post a patch to the ipxe mailing list later if it's OK. Thanks, Amos But a DHCP client should be identified by its MAC, not the xid. Y.
[RFC] KVM: Fix simultaneous NMIs
If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first.

Fix by using a counter for pending NMIs instead of a bool; collapsing happens when the NMI window reopens.

Signed-off-by: Avi Kivity a...@redhat.com
---

Not sure whether this interacts correctly with NMI-masked-by-STI or with save/restore.

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx.c              |  3 ++-
 arch/x86/kvm/x86.c              | 33 +++--
 arch/x86/kvm/x86.h              |  7 +++
 5 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ab4241..3a95885 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -413,7 +413,7 @@ struct kvm_vcpu_arch {
     u32 tsc_catchup_mult;
     s8 tsc_catchup_shift;
-    bool nmi_pending;
+    atomic_t nmi_pending;
     bool nmi_injected;
     struct mtrr_state_type mtrr_state;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e7ed4b1..d4c792f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3609,6 +3609,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
     if ((svm->vcpu.arch.hflags & HF_IRET_MASK)
         && kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip) {
         svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
+        kvm_collapse_pending_nmis(&svm->vcpu);
         kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
     }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0d6bd9..745dadb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4761,6 +4761,7 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
     cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
     vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
     ++vcpu->stat.nmi_window_exits;
+    kvm_collapse_pending_nmis(vcpu);
     kvm_make_request(KVM_REQ_EVENT, vcpu);
     return 1;
@@ -5790,7 +5791,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
         if (vmx_interrupt_allowed(vcpu)) {
             vmx->soft_vnmi_blocked = 0;
         } else if (vmx->vnmi_blocked_time > 1000000000LL &&
-                   vcpu->arch.nmi_pending) {
+                   atomic_read(&vcpu->arch.nmi_pending)) {
             /*
              * This CPU don't support us in finding the end of an
              * NMI-blocked window if the guest runs with IRQs
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b37f18..d4f45e0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -359,8 +359,8 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
+    atomic_inc(&vcpu->arch.nmi_pending);
     kvm_make_request(KVM_REQ_EVENT, vcpu);
-    vcpu->arch.nmi_pending = 1;
 }
 EXPORT_SYMBOL_GPL(kvm_inject_nmi);
@@ -2844,7 +2844,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
         KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
     events->nmi.injected = vcpu->arch.nmi_injected;
-    events->nmi.pending = vcpu->arch.nmi_pending;
+    events->nmi.pending = atomic_read(&vcpu->arch.nmi_pending) != 0;
     events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
     events->nmi.pad = 0;
@@ -2878,7 +2878,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
     vcpu->arch.nmi_injected = events->nmi.injected;
     if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
-        vcpu->arch.nmi_pending = events->nmi.pending;
+        atomic_set(&vcpu->arch.nmi_pending, events->nmi.pending);
     kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
@@ -4763,7 +4763,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
     kvm_set_rflags(vcpu, ctxt->eflags);
     if (irq == NMI_VECTOR)
-        vcpu->arch.nmi_pending = false;
+        atomic_set(&vcpu->arch.nmi_pending, 0);
     else
         vcpu->arch.interrupt.pending = false;
@@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
     }
     /* try to inject new event if pending */
-    if (vcpu->arch.nmi_pending) {
+    if (atomic_read(&vcpu->arch.nmi_pending)) {
         if (kvm_x86_ops->nmi_allowed(vcpu)) {
-            vcpu->arch.nmi_pending = false;
+            atomic_dec(&vcpu->arch.nmi_pending);
             vcpu->arch.nmi_injected = true;
             kvm_x86_ops->set_nmi(vcpu);
         }
@@ -5604,10 +5604,14 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
     }
 }
+static bool
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
On Thu, 2011-09-15 at 10:43 -0400, Amos Kong wrote: - Original Message - - Original Message - On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hi Hagen, rfc2131 clearly describes that we need a random xid. I don't think the xid is part of the DHCP client configuration; it is only used to associate messages and responses between client and server. I will post a patch to the ipxe mailing list later if it's OK. rfc2131 only requires that "A DHCP client MUST choose 'xid's in such a way as to minimize the chance of using an 'xid' identical to one used by another client." The 'random xid' suggestion is listed merely as an example. The way I see it, using an xid based on the MAC instead of a random number is safer, since the odds of the same MAC appearing on the same network are pretty slim: it would cause problems on other layers of the network. What's the reason behind this patch? What's wrong with the current selection of xid? -- Sasha.
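To put a rough number on RFC 2131's "minimize the chance" wording, here is a birthday-bound estimate. It assumes each of n clients on one segment draws its xid independently and uniformly from 2^32 values, which is an idealisation (a generator with fewer output bits shrinks the space):

$$P_{\text{collide}}(n) = 1 - \prod_{k=0}^{n-1}\left(1 - \frac{k}{2^{32}}\right) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{33}}\right)$$

For n = 1000 this gives roughly $n(n-1)/2^{33} \approx 1.16 \times 10^{-4}$. Under the MAC-derived scheme, by contrast, two clients whose MACs share the last four bytes (for example, two guests left on the same default address) collide with probability 1 on every boot, while clients with distinct low MAC bytes never collide.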
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
Whole Archive: http://marc.info/?l=kvm&m=131609166918121&w=2 - Original Message - On Thu, 2011-09-15 at 10:43 -0400, Amos Kong wrote: - Original Message - - Original Message - On Thu, 15 Sep 2011 21:00:38 +0800, Amos Kong ak...@redhat.com wrote: + netdev->xid = random(); This will not work for reboots. The decision to use the hardware address was not accidental. Not sure if some DHCP server will count on the ID (RFC 2131: "Retain DHCP client configuration across server reboots, and, whenever possible, a DHCP client should be assigned the same configuration parameters despite restarts of the DHCP mechanism"). If not, I am fine with the patch. Hi Hagen, rfc2131 clearly describes that we need a random xid. I don't think the xid is part of the DHCP client configuration; it is only used to associate messages and responses between client and server. I will post a patch to the ipxe mailing list later if it's OK. rfc2131 only requires that "A DHCP client MUST choose 'xid's in such a way as to minimize the chance of using an 'xid' identical to one used by another client." The 'random xid' suggestion is listed merely as an example. The way I see it, using an xid based on the MAC instead of a random number is safer, since the odds of the same MAC appearing on the same network are pretty slim: it would cause problems on other layers of the network. Users may boot up QEMU guests without specifying a MAC address, so the default one is easily repeated. Yaniv, what real problem did you hit? Or is it only that this is not in accordance with the RFC? Try restarting the host network: I can capture random DHCP xids; they are not fixed. Amos What's the reason behind this patch? What's wrong with the current selection of xid?
Re: [RFC] KVM: Fix simultaneous NMIs
On 2011-09-15 16:45, Avi Kivity wrote: If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first.

Can you describe the race in a few more details here ("sometimes" sounds like "I don't know when" :) )?

Fix by using a counter for pending NMIs instead of a bool; collapsing happens when the NMI window reopens.

Signed-off-by: Avi Kivity a...@redhat.com
---

Not sure whether this interacts correctly with NMI-masked-by-STI or with save/restore.

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx.c              |  3 ++-
 arch/x86/kvm/x86.c              | 33 +++--
 arch/x86/kvm/x86.h              |  7 +++
 5 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ab4241..3a95885 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -413,7 +413,7 @@ struct kvm_vcpu_arch {
     u32 tsc_catchup_mult;
     s8 tsc_catchup_shift;
-    bool nmi_pending;
+    atomic_t nmi_pending;
     bool nmi_injected;
     struct mtrr_state_type mtrr_state;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e7ed4b1..d4c792f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3609,6 +3609,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
     if ((svm->vcpu.arch.hflags & HF_IRET_MASK)
         && kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip) {
         svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
+        kvm_collapse_pending_nmis(&svm->vcpu);
         kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
     }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0d6bd9..745dadb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4761,6 +4761,7 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
     cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
     vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
     ++vcpu->stat.nmi_window_exits;
+    kvm_collapse_pending_nmis(vcpu);
     kvm_make_request(KVM_REQ_EVENT, vcpu);
     return 1;
@@ -5790,7 +5791,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
         if (vmx_interrupt_allowed(vcpu)) {
             vmx->soft_vnmi_blocked = 0;
         } else if (vmx->vnmi_blocked_time > 1000000000LL &&
-                   vcpu->arch.nmi_pending) {
+                   atomic_read(&vcpu->arch.nmi_pending)) {
             /*
              * This CPU don't support us in finding the end of an
              * NMI-blocked window if the guest runs with IRQs
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b37f18..d4f45e0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -359,8 +359,8 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
+    atomic_inc(&vcpu->arch.nmi_pending);
     kvm_make_request(KVM_REQ_EVENT, vcpu);
-    vcpu->arch.nmi_pending = 1;

Does the reordering matter? Do we need barriers?

 }
 EXPORT_SYMBOL_GPL(kvm_inject_nmi);
@@ -2844,7 +2844,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
         KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
     events->nmi.injected = vcpu->arch.nmi_injected;
-    events->nmi.pending = vcpu->arch.nmi_pending;
+    events->nmi.pending = atomic_read(&vcpu->arch.nmi_pending) != 0;
     events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
     events->nmi.pad = 0;
@@ -2878,7 +2878,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
     vcpu->arch.nmi_injected = events->nmi.injected;
     if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
-        vcpu->arch.nmi_pending = events->nmi.pending;
+        atomic_set(&vcpu->arch.nmi_pending, events->nmi.pending);
     kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
     if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
@@ -4763,7 +4763,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
     kvm_set_rflags(vcpu, ctxt->eflags);
     if (irq == NMI_VECTOR)
-        vcpu->arch.nmi_pending = false;
+        atomic_set(&vcpu->arch.nmi_pending, 0);
     else
         vcpu->arch.interrupt.pending = false;
@@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
     }
     /* try to inject new event if pending */
-    if (vcpu->arch.nmi_pending) {
+    if (atomic_read(&vcpu->arch.nmi_pending)) {
         if (kvm_x86_ops->nmi_allowed(vcpu)) {
-            vcpu->arch.nmi_pending = false;
+            atomic_dec(&vcpu->arch.nmi_pending);

Here we lost NMIs in the past by
Re: [PATCH] kvm tools: Allow remapping guest TTY into host PTS
Sasha,

So far so good! I applied your patch to an older version of kvm-tool that I had hacked on, and it works for a simple test. So I think that I can do some kernel hacking with kvm tool! Very cool.

I tested with the older version of kvm-tool because I am seeing a bug with an old kernel (2.6.28.10) and the latest version of kvm-tool. It is an old kernel, and now that I can debug more easily, hopefully I won't require it. In case it is useful, the error I'm seeing is below. While I do have 9p compiled into my kernel, I'm not actually using it. I haven't tried without 9p compiled in.

$ sudo ~/.../unpatched/linux-kvm/tools/kvm/kvm run -c 1 -m 2048 -k ./bzImage-2.6.28.10 \
    --console serial -p 'console=ttyS0 ip=192.168.122.2 ' -i ./initramfs-guest.img \
    -n tap --host-ip 192.168.122.1 --guest-ip 192.168.122.2 --shmem pci:0xc800:16m:create
...
[1.245232] Installing 9P2000 support
[1.246826] 9p: virtio: Maximum channels exceeded
[1.248674] [ cut here ]
[1.250291] kernel BUG at net/9p/trans_virtio.c:240!
[1.252018] invalid opcode: [#1] SMP
[1.252491] last sysfs file:
[1.252491] Dumping ftrace buffer:
[1.252491](ftrace buffer empty)
[1.252491] CPU 0
[1.252491] Modules linked in:
[1.252491] Pid: 1, comm: swapper Not tainted 2.6.28.10big_64 #6
[1.252491] RIP: 0010:[8057ee2f] [8057ee2f] p9_virtio_probe+0xcf/0x120
[1.252491] RSP: 0018:88007ec2bc90 EFLAGS: 00010286
[1.252491] RAX: 0038 RBX: 0001 RCX:
[1.252491] RDX: 807d6978 RSI: 0086 RDI: 0246
[1.252491] RBP: 88007d6ae800 R08: R09:
[1.252491] R10: R11: R12: 88007d6ae808
[1.252491] R13: R14: 00013200 R15:
[1.252491] FS: () GS:809ee000() knlGS:
[1.252491] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
[1.252491] CR2: 7f77866892c0 CR3: 00201000 CR4: 06e0
[1.252491] DR0: DR1: DR2:
[1.252491] DR3: DR6: 0ff0 DR7: 0400
[1.252491] Process swapper (pid: 1, threadinfo 88007ec2a000, task 88007ec29390)
[1.252491] Stack:
[1.252491] 88007d6ae808 804fe696 808106e0 88007d6ae800
[1.252491] 88007d6ae808 804fe8db 808106e0
[1.252491] 88007d6ae808 8047ee3a 808106e0 88007d6ae808
[1.252491] Call Trace:
[1.252491] [804fe696] ? add_status+0x26/0x50
[1.252491] [804fe8db] ? virtio_dev_probe+0xab/0xf0
[1.252491] [8047ee3a] ? driver_probe_device+0x9a/0x1b0
[1.252491] [8047eff3] ? __driver_attach+0xa3/0xb0
[1.252491] [8047ef50] ? __driver_attach+0x0/0xb0
[1.252491] [8047e3a8] ? bus_for_each_dev+0x58/0x80
[1.252491] [8047e632] ? bus_add_driver+0xb2/0x230
[1.252491] [8047f1d7] ? driver_register+0x67/0x130
[1.252491] [8059a5cf] ? _spin_lock+0xf/0x20
[1.252491] [80578ce2] ? v9fs_register_trans+0x42/0x50
[1.252491] [809897ea] ? p9_virtio_init+0x0/0x24
[1.252491] [80209042] ? _stext+0x42/0x1c0
[1.252491] [803e40db] ? ida_get_new_above+0x14b/0x220
[1.252491] [802ebd32] ? kmem_cache_alloc+0x102/0x110
[1.252491] [803e431b] ? idr_pre_get+0x4b/0x90
[1.252491] [8059a5cf] ? _spin_lock+0xf/0x20
[1.252491] [80342d72] ? proc_register+0x142/0x240
[1.252491] [8095ad23] ? kernel_init+0x115/0x15d
[1.252491] [8095ad1c] ? kernel_init+0x10e/0x15d
[1.252491] [8022eaf9] ? child_rip+0xa/0x11
[1.252491] [8095ac0e] ? kernel_init+0x0/0x15d
[1.252491] [8022eaef] ? child_rip+0x0/0x11
[1.252491] Code: 68 fa e6 ff c6 83 e1 85 b1 80 00 c6 83 e0 85 b1 80 01 31 c0 48 83 c4 10 5b 5d 41 5c c3 48 c7 c7 e0 42 6e 80 31 c0 e8 0b 8b 01 00 0f 0b eb fe 48 8b 95 78 02 00 00 48 89 c7 48 89 44 24 08 ff 52
[1.252491] RIP [8057ee2f] p9_virtio_probe+0xcf/0x120
[1.252491] RSP 88007ec2bc90
[1.360254] ---[ end trace 695d68cac3254cff ]---
[1.361863] Kernel panic - not syncing: Attempted to kill init!
[1.364043] Rebooting in 1 seconds..
*** Compatability Warning ***
virtio-9p device was not detected
While you have requested a virtio-9p device, the guest kernel didn't seem to detect it.
Please make sure that the kernel was compiled with CONFIG_NET_9P_VIRTIO.
# KVM session ended normally.

For this kernel, CONFIG_NET_9P_VIRTIO is defined, but the kernel is old, so there may be issues.

\dae

On Thu, Sep 15, 2011 at 03:28:46PM +0300,
Re: [RFC] KVM: Fix simultaneous NMIs
On 09/15/2011 07:01 PM, Jan Kiszka wrote: On 2011-09-15 16:45, Avi Kivity wrote: If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first. Can you describe the race in a few more details here ("sometimes" sounds like "I don't know when" :) )? In this case it was "I'm in a hurry".

 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
+    atomic_inc(&vcpu->arch.nmi_pending);
     kvm_make_request(KVM_REQ_EVENT, vcpu);
-    vcpu->arch.nmi_pending = 1;

Does the reordering matter? I think so. Suppose the vcpu enters just after kvm_make_request(); it sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it wasn't set yet. Then comes a kick, and the guest is reentered with nmi_pending set but KVM_REQ_EVENT clear; it sails through the check and enters the guest. The NMI is delayed until the next KVM_REQ_EVENT. Do we need barriers? Yes.

@@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
     }
     /* try to inject new event if pending */
-    if (vcpu->arch.nmi_pending) {
+    if (atomic_read(&vcpu->arch.nmi_pending)) {
         if (kvm_x86_ops->nmi_allowed(vcpu)) {
-            vcpu->arch.nmi_pending = false;
+            atomic_dec(&vcpu->arch.nmi_pending);

Here we lost NMIs in the past by overwriting nmi_pending while another one was already queued, right? One place, yes. The other is kvm_inject_nmi(): if the first NMI didn't get picked up by the vcpu by the time the second NMI arrives, we lose the second NMI.

     if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
         inject_pending_event(vcpu);
         /* enable NMI/IRQ window open exits if needed */
-        if (nmi_pending)
+        if (atomic_read(&vcpu->arch.nmi_pending)
+            && nmi_in_progress(vcpu))

Is nmi_pending && !nmi_in_progress possible at all? Yes, due to NMI-blocked-by-STI. A really touchy area. Is it rather a BUG condition? No. If not, what will happen next? The NMI window will open and we'll inject the NMI.
But I think we have a bug here: we should only kvm_collapse_nmis() if an NMI handler was indeed running, yet we do it unconditionally.

+static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
+{
+    /* Collapse all NMIs queued while an NMI handler was running to one */
+    if (atomic_read(&vcpu->arch.nmi_pending))
+        atomic_set(&vcpu->arch.nmi_pending, 1);

Is it OK that NMIs injected after the collapse will increment this to 1 again? Or is that impossible? It's possible and okay. We're now completing execution of IRET. Doing atomic_set() after atomic_inc() means the NMI happened before IRET completed, and vice versa. Since these events are asynchronous, we're free to choose one or the other (a self-IPI-NMI just before the IRET must be swallowed, and a self-IPI-NMI just after the IRET would only be executed after the next time around the handler).

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH] kvm tools: Allow remapping guest TTY into host PTS
On Thu, 2011-09-15 at 09:52 -0700, David Evensky wrote: Sasha, So far so good! I applied your patch to an older version of kvm-tool that I had hacked on and it works for a simple test. So I think that I can do some kernel hacking with kvm tool! Very cool. Awesome! I tested with the older version of kvm-tool because I am seeing a bug with an old kernel (2.6.28.10) and the latest version of kvm-tool. It is an old kernel, and now that I can debug more easily, hopefully I won't require it. In case it is worthwhile, the error I'm seeing is below. While I do have 9p compiled into my kernel, I'm not actually using it. I haven't tried without 9p compiled in. [snip] For this kernel, CONFIG_NET_9P_VIRTIO is defined, but the kernel is old, so there may be issues. \dae I've noticed that 9p/virtio-9p is a bit unstable in older versions; for example, you can't use a 9p rootfs with kernels older than 2.6.38 (which isn't that old really). -- Sasha.
Re: [RFC] KVM: Fix simultaneous NMIs
On 2011-09-15 19:02, Avi Kivity wrote: On 09/15/2011 07:01 PM, Jan Kiszka wrote: On 2011-09-15 16:45, Avi Kivity wrote: If simultaneous NMIs happen, we're supposed to queue the second and next (collapsing them), but currently we sometimes collapse the second into the first. Can you describe the race in a few more details here ("sometimes" sounds like "I don't know when" :) )? In this case it was "I'm in a hurry". void kvm_inject_nmi(struct kvm_vcpu *vcpu) { + atomic_inc(&vcpu->arch.nmi_pending); kvm_make_request(KVM_REQ_EVENT, vcpu); - vcpu->arch.nmi_pending = 1; Does the reordering matter? I think so. Suppose the vcpu enters just after kvm_make_request(); it sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it wasn't set yet. Then comes a kick, the guest is reentered with nmi_pending set but KVM_REQ_EVENT clear and sails through the check and enters the guest. The NMI is delayed until the next KVM_REQ_EVENT. That makes sense - and the old code looks even stranger now. Do we need barriers? Yes. @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu) } /* try to inject new event if pending */ - if (vcpu->arch.nmi_pending) { + if (atomic_read(&vcpu->arch.nmi_pending)) { if (kvm_x86_ops->nmi_allowed(vcpu)) { - vcpu->arch.nmi_pending = false; + atomic_dec(&vcpu->arch.nmi_pending); Here we lost NMIs in the past by overwriting nmi_pending while another one was already queued, right? One place, yes. The other is kvm_inject_nmi() - if the first nmi didn't get picked up by the vcpu by the time the second nmi arrives, we lose the second nmi. Thinking this through again, it's actually not yet clear to me what we are modeling here: If two NMI events arrive almost perfectly in parallel, does the real hardware guarantee that they will always cause two NMI events in the CPU? Then this is required. Otherwise I just lost understanding again why we were losing NMIs here and in kvm_inject_nmi (maybe elsewhere then?).
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { inject_pending_event(vcpu); /* enable NMI/IRQ window open exits if needed */ - if (nmi_pending) + if (atomic_read(&vcpu->arch.nmi_pending) + && nmi_in_progress(vcpu)) Is nmi_pending && !nmi_in_progress possible at all? Yes, due to NMI-blocked-by-STI. A really touchy area. And we don't need the window exit notification then? I don't understand what nmi_in_progress is supposed to do here. Is it rather a BUG condition? No. If not, what will happen next? The NMI window will open and we'll inject the NMI. How will we know this? We do not request the exit, that's my worry. But I think we have a bug here - we should only call kvm_collapse_nmis() if an NMI handler was indeed running, yet we do it unconditionally. +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu) +{ + /* Collapse all NMIs queued while an NMI handler was running to one */ + if (atomic_read(&vcpu->arch.nmi_pending)) + atomic_set(&vcpu->arch.nmi_pending, 1); Is it OK that NMIs injected after the collapse will increment this to 1 again? Or is that impossible? It's possible and okay. We're now completing execution of IRET. Doing atomic_set() after atomic_inc() means the NMI happened before IRET completed, and vice versa. Since these events are asynchronous, we're free to choose one or the other (a self-IPI-NMI just before the IRET must be swallowed, and a self-IPI-NMI just after the IRET would only be executed after the next time around the handler). Need to think through this separately. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [RFC] KVM: Fix simultaneous NMIs
On 09/15/2011 08:25 PM, Jan Kiszka wrote: I think so. Suppose the vcpu enters just after kvm_make_request(); it sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it wasn't set yet. Then comes a kick, the guest is reentered with nmi_pending set but KVM_REQ_EVENT clear and sails through the check and enters the guest. The NMI is delayed until the next KVM_REQ_EVENT. That makes sense - and the old code looks even stranger now. I think it dates to the time all NMIs were synchronous. /* try to inject new event if pending */ -if (vcpu->arch.nmi_pending) { +if (atomic_read(&vcpu->arch.nmi_pending)) { if (kvm_x86_ops->nmi_allowed(vcpu)) { -vcpu->arch.nmi_pending = false; +atomic_dec(&vcpu->arch.nmi_pending); Here we lost NMIs in the past by overwriting nmi_pending while another one was already queued, right? One place, yes. The other is kvm_inject_nmi() - if the first nmi didn't get picked up by the vcpu by the time the second nmi arrives, we lose the second nmi. Thinking this through again, it's actually not yet clear to me what we are modeling here: If two NMI events arrive almost perfectly in parallel, does the real hardware guarantee that they will always cause two NMI events in the CPU? Then this is required. It's not 100% clear from the SDM, but this is what I understood from it. And it's needed - the NMI handlers are now being reworked to handle just one NMI source (hopefully the cheapest) in the handler, and if we detect a back-to-back NMI, handle all possible NMI sources. This optimization is needed in turn so we can use Jeremy's paravirt spinlock framework, which requires a sleep primitive and a wake-up-even-if-the-sleeper-has-interrupts-disabled primitive. I thought of using HLT and NMIs respectively, but that means we need a cheap handler (i.e. don't go reading PMU MSRs). Otherwise I just lost understanding again why we were losing NMIs here and in kvm_inject_nmi (maybe elsewhere then?). Because of that.
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { inject_pending_event(vcpu); /* enable NMI/IRQ window open exits if needed */ -if (nmi_pending) +if (atomic_read(&vcpu->arch.nmi_pending) + && nmi_in_progress(vcpu)) Is nmi_pending && !nmi_in_progress possible at all? Yes, due to NMI-blocked-by-STI. A really touchy area. And we don't need the window exit notification then? I don't understand what nmi_in_progress is supposed to do here. We need the window notification in both cases. If we're recovering from STI, then we don't need to collapse NMIs. If we're completing an NMI handler, then we do need to collapse NMIs (since the queue length is two, and we just completed one). If not, what will happen next? The NMI window will open and we'll inject the NMI. How will we know this? We do not request the exit, that's my worry. I think we do? Oh, but this patch breaks it.
Re: [PATCH] [dhcp] Use random transaction ID to associate messages
On Thu, Sep 15, 2011 at 05:53:02PM +0300, Sasha Levin wrote: [...] The 'random xid' suggestion is listed merely as an example. The way I see it, using an xid based on the MAC instead of a random number is safer, since the odds of the same MAC appearing twice on the same network are pretty slim: it would cause problems on other layers in the network. I would agree with you if the current code didn't use just the last 4 bytes of the MAC address. Clients could have completely different MAC addresses (as expected) and have no problems communicating on the network, yet share the same final 4 bytes of the MAC address and end up generating the same xid. A hash function that used all bytes of the MAC address as input would probably work too, but using a random number seems to be good enough (and simpler, IMO). What's the reason behind this patch? What's wrong with the current selection of xid? I'm not sure what issue made Amos investigate the xid generation code, but the current selection of xid is wrong as it uses just the last 4 bytes of the MAC address. -- Eduardo
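Eduardo's collision argument can be made concrete. The sketch below is illustrative only: `xid_from_mac_tail()` is a hypothetical reconstruction of "use the last 4 bytes of the MAC", not the project's actual code, and 32-bit FNV-1a is just one simple choice for the "hash over all bytes" alternative he mentions (the patch itself uses a random xid instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: an xid built only from MAC bytes 2..5 (the last 4). */
static uint32_t xid_from_mac_tail(const uint8_t mac[6])
{
	return ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
	       ((uint32_t)mac[4] << 8) | (uint32_t)mac[5];
}

/* Alternative: 32-bit FNV-1a over all six MAC bytes. */
static uint32_t xid_from_mac_fnv1a(const uint8_t mac[6])
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (int i = 0; i < 6; i++) {
		h ^= mac[i];
		h *= 16777619u;		/* FNV prime; wraps mod 2^32 */
	}
	return h;
}
```

Two MACs that differ only in their first byte, such as 52:54:00:12:34:56 and 02:54:00:12:34:56, get the same tail-based xid (0x00123456) but different FNV-1a values. Each FNV-1a step is a bijection on the 32-bit state, so inputs that differ only in the first byte can never collide.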
Re: [PATCH] pci: clean all funcs when hot-removing multifunc device
On Tue, Sep 13, 2011 at 10:55 PM, Amos Kong ak...@redhat.com wrote: 'slot->funcs' is initialized in acpiphp_glue.c:register_slot() before a device is hotplugged, and only one entry (func 0) is added to it; no new entry is added to the list when devices are hotplugged into the slot. When we release the whole device, there is only one entry in the list, so functions 1-7 cannot be released. I tried to add entries for all hotplugged devices in enable_device(), but it doesn't work, because 'slot->funcs' is used in many places where we only need to process func 0. This patch just tries to clean all funcs in disable_device(). ... Hotplugging a multifunction device works fine in WinXP. I'm going to ignore this patch for now. Please consider these questions, then repost it if you still want it: I assume you mean that Linux and WinXP are both running on top of the same SeaBIOS, and hot-remove of a multifunction device works in WinXP and fails in Linux. That sounds like Linux is broken, and we should fix it. We might want to make a SeaBIOS change for other reasons, but it'd still be good to fix Linux in case there are other similar BIOSes. Why do we need pci_scan_single_device()? The device should have been scanned already when it was added, and I would think that should have set pdev->multifunction. Your patch needs spaces around the operators in the for loop. In the changelog, it would be nice to have the URL of a bugzilla where the dmesg and DSDT are attached.
Bjorn Signed-off-by: Amos Kong ak...@redhat.com --- drivers/pci/hotplug/acpiphp_glue.c | 27 ++- 1 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index a70fa89..3b86d1a 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -880,6 +880,8 @@ static int disable_device(struct acpiphp_slot *slot) { struct acpiphp_func *func; struct pci_dev *pdev; + struct pci_bus *bus = slot->bridge->pci_bus; + int i, num = 1; /* is this slot already disabled? */ if (!(slot->flags & SLOT_ENABLED)) @@ -893,16 +895,23 @@ static int disable_device(struct acpiphp_slot *slot) func->bridge = NULL; } - pdev = pci_get_slot(slot->bridge->pci_bus, - PCI_DEVFN(slot->device, func->function)); - if (pdev) { - pci_stop_bus_device(pdev); - if (pdev->subordinate) { - disable_bridges(pdev->subordinate); - pci_disable_device(pdev); + pdev = pci_scan_single_device(bus, + PCI_DEVFN(slot->device, 0)); + if (!pdev) + goto err_exit; + if (pdev->multifunction == 1) + num = 8; + for (i=0; i<num; i++) { + pdev = pci_get_slot(bus, PCI_DEVFN(slot->device, i)); + if (pdev) { + pci_stop_bus_device(pdev); + if (pdev->subordinate) { + disable_bridges(pdev->subordinate); + pci_disable_device(pdev); + } + pci_remove_bus_device(pdev); + pci_dev_put(pdev); } - pci_remove_bus_device(pdev); - pci_dev_put(pdev); } } -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
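The loop in the patch walks every function of one slot. As a standalone sketch of that enumeration, the three macros below follow the same bit layout as the Linux PCI_DEVFN/PCI_SLOT/PCI_FUNC macros (5-bit device number, 3-bit function number). `slot_devfns()` is a hypothetical helper, not part of acpiphp.

```c
#include <assert.h>

/* Same layout as the Linux PCI macros: devfn = device << 3 | function. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/*
 * Fill out[] with the devfns a hot-remove would have to visit:
 * function 0 only for a single-function device, functions 0-7 for a
 * multifunction one. Returns how many entries were written.
 */
static int slot_devfns(unsigned int slot, int multifunction,
		       unsigned int out[8])
{
	int num = multifunction ? 8 : 1;

	assert(slot < 32);	/* only 5 bits of device number exist */
	for (int i = 0; i < num; i++)
		out[i] = PCI_DEVFN(slot, i);
	return num;
}
```

For slot 3, a multifunction device yields devfns 0x18 through 0x1f, and PCI_SLOT()/PCI_FUNC() recover (3, 7) from the last one; a single-function device yields only 0x18.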
Re: [PATCH] pci: clean all funcs when hot-removing multifunc device
(2011/09/16 4:03), Bjorn Helgaas wrote: On Tue, Sep 13, 2011 at 10:55 PM, Amos Kong ak...@redhat.com wrote: 'slot->funcs' is initialized in acpiphp_glue.c:register_slot() before a device is hotplugged, and only one entry (func 0) is added to it; no new entry is added to the list when devices are hotplugged into the slot. When we release the whole device, there is only one entry in the list, so functions 1-7 cannot be released. I tried to add entries for all hotplugged devices in enable_device(), but it doesn't work, because 'slot->funcs' is used in many places where we only need to process func 0. This patch just tries to clean all funcs in disable_device(). ... Hotplugging a multifunction device works fine in WinXP. I'm going to ignore this patch for now. Please consider these questions, then repost it if you still want it: I assume you mean that Linux and WinXP are both running on top of the same SeaBIOS, and hot-remove of a multifunction device works in WinXP and fails in Linux. That sounds like Linux is broken, and we should fix it. We might want to make a SeaBIOS change for other reasons, but it'd still be good to fix Linux in case there are other similar BIOSes. No objection to fixing Linux. Why do we need pci_scan_single_device()? The device should have been scanned already when it was added, and I would think that should have set pdev->multifunction. It should be pci_get_slot() instead. Note that it needs a corresponding pci_dev_put(). Regards, Kenji Kaneshige Your patch needs spaces around the operators in the for loop. In the changelog, it would be nice to have the URL of a bugzilla where the dmesg and DSDT are attached.
Bjorn Signed-off-by: Amos Kong ak...@redhat.com --- drivers/pci/hotplug/acpiphp_glue.c | 27 ++- 1 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index a70fa89..3b86d1a 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -880,6 +880,8 @@ static int disable_device(struct acpiphp_slot *slot) { struct acpiphp_func *func; struct pci_dev *pdev; + struct pci_bus *bus = slot->bridge->pci_bus; + int i, num = 1; /* is this slot already disabled? */ if (!(slot->flags & SLOT_ENABLED)) @@ -893,16 +895,23 @@ static int disable_device(struct acpiphp_slot *slot) func->bridge = NULL; } - pdev = pci_get_slot(slot->bridge->pci_bus, - PCI_DEVFN(slot->device, func->function)); - if (pdev) { - pci_stop_bus_device(pdev); - if (pdev->subordinate) { - disable_bridges(pdev->subordinate); - pci_disable_device(pdev); + pdev = pci_scan_single_device(bus, + PCI_DEVFN(slot->device, 0)); + if (!pdev) + goto err_exit; + if (pdev->multifunction == 1) + num = 8; +for (i=0; i<num; i++) { + pdev = pci_get_slot(bus, PCI_DEVFN(slot->device, i)); + if (pdev) { + pci_stop_bus_device(pdev); + if (pdev->subordinate) { + disable_bridges(pdev->subordinate); + pci_disable_device(pdev); + } + pci_remove_bus_device(pdev); + pci_dev_put(pdev); } - pci_remove_bus_device(pdev); - pci_dev_put(pdev); } } -- 1.7.6.1
Re: [PATCH] kvm tools: Allow remapping guest TTY into host PTS
On 09/15/2011 04:53 PM, Sasha Levin wrote: This patch adds the '--tty' option to 'kvm run', which allows the user to remap a guest TTY into a PTS on the host. Usage: 'kvm run --tty [id] [other options]' The tty will be mapped to a pts, which will be printed on the screen: ' Info: Assigned terminal 1 to pty /dev/pts/X' At this point, it is possible to communicate with the guest using that pty. This is useful for debugging the guest kernel using KGDB: 1. Run the guest: 'kvm run -k [vmlinuz] -p "kgdboc=ttyS1 kgdbwait" --tty 1' and see which PTY got assigned to ttyS1. 2. Run GDB on the host: 'gdb [vmlinuz]' 3. Connect to the guest (from within GDB): 'target remote /dev/pts/X' 4. Start debugging! (enter 'continue' to continue boot). Cc: David Evensky even...@dancer.ca.sandia.gov Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/Makefile |1 + tools/kvm/builtin-run.c | 12 tools/kvm/hw/serial.c| 46 ++-- tools/kvm/include/kvm/term.h | 11 --- tools/kvm/term.c | 60 + tools/kvm/virtio/console.c |6 ++-- 6 files changed, 96 insertions(+), 40 deletions(-) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index efa032d..fef624d 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -115,6 +115,7 @@ OBJS += bios/bios-rom.o LIBS += -lrt LIBS += -lpthread +LIBS += -lutil # Additional ARCH settings for x86 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \ diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 5dafb15..b5c63ca 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -172,6 +172,15 @@ static int virtio_9p_rootdir_parser(const struct option *opt, const char *arg, i return 0; } +static int tty_parser(const struct option *opt, const char *arg, int unset) +{ + int tty = atoi(arg); + + term_set_tty(tty); + + return 0; +} + static int shmem_parser(const struct option *opt, const char *arg, int unset) { const u64 default_size = SHMEM_DEFAULT_SIZE; @@ -316,6 +325,9 @@ static const struct option options[] = {
OPT_STRING('\0', "console", &console, "serial or virtio", "Console to use"), OPT_STRING('\0', "dev", &dev, "device_file", "KVM device file"), + OPT_CALLBACK('\0', "tty", NULL, "tty id", + "Remap guest TTY into a pty on the host", + tty_parser), OPT_GROUP("Kernel options:"), OPT_STRING('k', "kernel", &kernel_filename, "kernel", diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c index b3b233f..11fa5d4 100644 --- a/tools/kvm/hw/serial.c +++ b/tools/kvm/hw/serial.c @@ -14,6 +14,7 @@ struct serial8250_device { pthread_mutex_t mutex; + u8 id; u16 iobase; u8 irq; @@ -42,6 +43,7 @@ static struct serial8250_device devices[] = { [0] = { .mutex = PTHREAD_MUTEX_INITIALIZER, + .id = 0, .iobase = 0x3f8, .irq= 4, @@ -51,6 +53,7 @@ static struct serial8250_device devices[] = { [1] = { .mutex = PTHREAD_MUTEX_INITIALIZER, + .id = 1, .iobase = 0x2f8, .irq= 3, @@ -60,6 +63,7 @@ static struct serial8250_device devices[] = { [2] = { .mutex = PTHREAD_MUTEX_INITIALIZER, + .id = 2, .iobase = 0x3e8, .irq= 4, @@ -69,6 +73,7 @@ static struct serial8250_device devices[] = { [3] = { .mutex = PTHREAD_MUTEX_INITIALIZER, + .id = 3, .iobase = 0x2e8, .irq= 3, @@ -111,10 +116,10 @@ static void serial8250__receive(struct kvm *kvm, struct serial8250_device *dev) return; } - if (!term_readable(CONSOLE_8250)) + if (!term_readable(CONSOLE_8250, dev->id)) return; - c = term_getc(CONSOLE_8250); + c = term_getc(CONSOLE_8250, dev->id); if (c < 0) return; @@ -123,30 +128,31 @@ static void serial8250__receive(struct kvm *kvm, struct serial8250_device *dev) dev->lsr |= UART_LSR_DR; } -/* - * Interrupts are injected for ttyS0 only. - */ void serial8250__inject_interrupt(struct kvm *kvm) { - struct serial8250_device *dev = &devices[0]; + int i; - mutex_lock(&dev->mutex); + for (i = 0; i < 4; i++)
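tty_parser() in the patch passes atoi(arg) straight to term_set_tty(), so `--tty foo` silently selects terminal 0 and `--tty 7` names a port that hw/serial.c does not define. A stricter parse could look like the sketch below. `parse_tty_id()` and `TTY_MAX_ID` are hypothetical names; the bound is assumed from the four devices[] entries in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bound: the patch defines four 8250 ports, ids 0..3. */
#define TTY_MAX_ID 3

/*
 * Stricter alternative to atoi(): reject empty input, trailing junk,
 * overflow, and ids with no matching serial device.
 * Returns 0 and stores the id in *out on success, -1 on error.
 */
static int parse_tty_id(const char *arg, int *out)
{
	char *end;
	long val;

	if (arg == NULL || *arg == '\0')
		return -1;
	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val < 0 || val > TTY_MAX_ID)
		return -1;
	*out = (int)val;
	return 0;
}
```

tty_parser() could call this and refuse the option on failure instead of handing an unchecked value to term_set_tty(); how the option framework reports a callback error is not shown here.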
Re: [PATCH 4/5] KVM: PPC: e500: eliminate a trap when entering idle
On 09/05/2011 05:30 PM, Alexander Graf wrote: On 27.08.2011, at 01:31, Scott Wood wrote: +#ifdef CONFIG_E500 +/* + * Skip the overhead of HID0 accesses that KVM ignores -- + * just write MSR[WE]. + * + * We don't need _TLF_NAPPING, because under KVM we know + * it will take effect right away. + */ +if (ppc_md.power_save == e500_idle) +ppc_md.power_save = kvm_msrwe_idle; Why the if() here? To avoid replacing some other power_save() implementation. kvm_msrwe_idle() is a paravirt-optimized version of e500_idle(). However, now that e500_idle has an ifdef for e500mc, we'll need that ifdef here as well. e500mc doesn't use MSR[WE] (and if it did, we couldn't trap on it). For e500mc we'll want to make an hcall for idle (ePAPR EV_IDLE). -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
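Scott's answer to "Why the if() here?" describes a common pattern: replace a hook only when it still holds the default you know how to optimize, and leave any board-specific override alone. A user-space sketch of that pattern follows. Only the names e500_idle and kvm_msrwe_idle come from the patch; `struct machdep`, `other_board_idle()`, `maybe_use_kvm_idle()` and the function bodies are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical stand-ins; the bodies only record which routine ran. */
static int last_idle;
static void e500_idle(void)        { last_idle = 1; }
static void kvm_msrwe_idle(void)   { last_idle = 2; }
static void other_board_idle(void) { last_idle = 3; }

/* Toy version of the ppc_md power_save hook. */
struct machdep {
	void (*power_save)(void);
};

/* Swap in the paravirt idle routine only over the stock e500 one. */
static void maybe_use_kvm_idle(struct machdep *md)
{
	if (md->power_save == e500_idle)
		md->power_save = kvm_msrwe_idle;
}
```

A platform still using e500_idle ends up calling kvm_msrwe_idle, while a platform that installed its own routine keeps it. Per the thread, e500mc would need separate handling (an ePAPR EV_IDLE hcall), which this sketch does not model.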