Re: [PATCH] Add realmode test for CALL FAR IMM instruction
On 08/25/2010 09:16 AM, Wei Yongjun wrote: > Signed-off-by: Wei Yongjun > --- > x86/realmode.c |6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/x86/realmode.c b/x86/realmode.c > index a833829..2e12680 100644 > --- a/x86/realmode.c > +++ b/x86/realmode.c > @@ -437,6 +437,9 @@ void test_call(void) > "ret\n\t" > "2:\t"); > MK_INSN(call_far1, "lcallw *(%ebx)\n\t"); > + MK_INSN(call_far2, ".byte 0x9a\n\t" > + ".word retf\n\t" > + ".word 0x00\n\t"); Why .byte encoding? won't "lcallw $0, $retf" (or the other way round) work? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add realmode test for CALL FAR IMM instruction
Signed-off-by: Wei Yongjun --- x86/realmode.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/x86/realmode.c b/x86/realmode.c index a833829..2e12680 100644 --- a/x86/realmode.c +++ b/x86/realmode.c @@ -437,6 +437,9 @@ void test_call(void) "ret\n\t" "2:\t"); MK_INSN(call_far1, "lcallw *(%ebx)\n\t"); + MK_INSN(call_far2, ".byte 0x9a\n\t" + ".word retf\n\t" + ".word 0x00\n\t"); MK_INSN(ret_imm,"sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b"); exec_in_big_real_mode(&insn_call1); @@ -453,6 +456,9 @@ void test_call(void) exec_in_big_real_mode(&insn_call_far1); report("call far 1", 0, 1); + exec_in_big_real_mode(&insn_call_far2); + report("call far 2", 0, 1); + exec_in_big_real_mode(&insn_ret_imm); report("ret imm 1", 0, 1); } -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
Signed-off-by: Wei Yongjun --- arch/x86/kvm/emulate.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 26f69c2..e5f7f5c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2487,7 +2487,7 @@ static struct opcode opcode_table[256] = { X8(D(SrcAcc | DstReg)), /* 0x98 - 0x9F */ D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd), - D(SrcImmFAddr | No64), N, + I(SrcImmFAddr | No64, em_call_far), N, D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N, /* 0xA0 - 0xA7 */ D(ByteOp | DstAcc | SrcMem | Mov | MemAbs), D(DstAcc | SrcMem | Mov | MemAbs), -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM timekeeping and TSC virtualization
On 08/24/2010 12:13 PM, Marcelo Tosatti wrote: On Thu, Aug 19, 2010 at 10:07:14PM -1000, Zachary Amsden wrote: This patch set implements full TSC virtualization, with both trapping and passthrough modes, and intelligent mode switching. As a result, TSC will never go backwards, we are stable against guest re-calibration attempts, VM reset, and migration. For guests which require it, the TSC khz can even be preserved on migration to a new host. The TSC will never be trapped on UP systems unless the host TSC actually runs faster than the guest; other conditions, including bad hardware and changing speeds are accomodated by using catchup mode to keep the guest passthrough TSC in line with the host clock. What is still needed on top of this is a way to force TSC trapping, or disable it entirely, for benchmarking purposes. I refrained from adding that last bit because it wasn't clear whether the best thing to do is a global 'force TSC trapping' / 'force TSC passthrough' / 'intelligent choice', or if this control should be on a per-VM level, via an ioctl(), module parameter, or sysfs. John and Thomas I have cc'd on this because it may be relevant to their interests and I always appreciate feedback, especially on a change set as large and complex as this. Enjoy. This time, there are no howler monkeys. I've included all the feedback I got from previous rounds of this and more. Applied 1-19, thanks. Great, thanks. I'll wait for more feedback on the rest and rebase. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: MMU: fix missing percpu counter destroy
On Mon, Aug 23, 2010 at 04:13:15PM +0800, Wei Yongjun wrote: > commit ad05c88266b4cce1c820928ce8a0fb7690912ba1 > (KVM: create aggregate kvm_total_used_mmu_pages value) > introduce percpu counter kvm_total_used_mmu_pages but never > destroy it, this may cause oops when rmmod & modprobe. > > Signed-off-by: Wei Yongjun > --- > arch/x86/kvm/mmu.c |5 - > 1 files changed, 4 insertions(+), 1 deletions(-) Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM timekeeping and TSC virtualization
On Thu, Aug 19, 2010 at 10:07:14PM -1000, Zachary Amsden wrote: > This patch set implements full TSC virtualization, with both > trapping and passthrough modes, and intelligent mode switching. > As a result, TSC will never go backwards, we are stable against > guest re-calibration attempts, VM reset, and migration. For guests > which require it, the TSC khz can even be preserved on migration > to a new host. > > The TSC will never be trapped on UP systems unless the host TSC > actually runs faster than the guest; other conditions, including > bad hardware and changing speeds are accomodated by using catchup > mode to keep the guest passthrough TSC in line with the host clock. > > What is still needed on top of this is a way to force TSC > trapping, or disable it entirely, for benchmarking purposes. > I refrained from adding that last bit because it wasn't clear > whether the best thing to do is a global 'force TSC trapping' / > 'force TSC passthrough' / 'intelligent choice', or if this control > should be on a per-VM level, via an ioctl(), module parameter, > or sysfs. > > John and Thomas I have cc'd on this because it may be relevant to > their interests and I always appreciate feedback, especially on > a change set as large and complex as this. > > Enjoy. This time, there are no howler monkeys. I've included > all the feedback I got from previous rounds of this and more. Applied 1-19, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM Bridged Networking Issue in Ubuntu Lucid
Rus Hughes wrote: Hi guys, I'm trying to sort out networking for a VM I've created on my Ubuntu Lucid box but the VM cannot access the Internet. I can connect to the VM (crisps) from the host (holly) and to the host from the VM. But the VM cannot connect to the Internet and the Internet cannot connect to the VM. I followed the guide here: https://help.ubuntu.com/community/KVM I created it using ubuntu-vm-builder using : ubuntu-vm-builder kvm lucid \ --domain crisps \ --dest crisps \ --arch amd64 \ --hostname crisps \ --mem 2048 \ --rootsize 4096 \ --swapsize 1024 \ --user magicalusernameofdoom \ --pass somesecretawesomepassword \ --ip 178.63.60.159 \ --mask 255.255.255.192 \ --bcast 178.63.60.191 \ --gw 178.63.60.129 \ --dns 213.133.99.99 \ --mirror http://de.archive.ubuntu.com/ubuntu \ --components main,universe,restricted,multiverse \ --addpkg openssh-server \ --libvirt qemu:///system \ --bridge br0; "virsh start crisps" starts the vm up : root 29297 0.5 3.3 2280396 271220 ? Sl 14:12 0:13 /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait -monitor chardev:monitor -boot c -drive file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0 -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc 127.0.0.1:0 -vga cirrus Networking on the host is like so: br0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet addr:178.63.60.138 Bcast:178.63.60.191 Mask:255.255.255.192 inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0 TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:35266886777 (35.2 GB) TX bytes:60292047383 (60.2 GB) eth0 Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0 TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:36337218359 (36.3 GB) TX bytes:60378047518 (60.3 GB) Interrupt:29 Base address:0x4000 eth0:0Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet addr:178.63.60.178 Bcast:178.63.60.191 Mask:255.255.255.192 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:29 Base address:0x4000 loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0 TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:325111540 (325.1 MB) TX bytes:325111540 (325.1 MB) vnet0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:72 errors:0 dropped:0 overruns:0 frame:0 TX packets:33 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:500 RX bytes:3412 (3.4 KB) TX bytes:2698 (2.6 KB) the hosts /etc/network/interfaces is like so : auto lo iface lo inet loopback # device: eth0 auto eth0 iface eth0 inet manual auto br0 iface br0 inet static address 178.63.60.138 broadcast 178.63.60.191 netmask 255.255.255.192 gateway 178.63.60.129 network 178.63.60.128 bridge_ports eth0 bridge_stp off bridge_fd 0 bridge_maxwait 0 # subli.me.uk auto eth0:0 iface eth0:0 inet static address 178.63.60.178 netmask 255.255.255.192 If I ping the VM I see the packets on the host br0 but not vnet0, this is the output of brctl show on the host : bridge name bridge id STP enabled interfaces br0 8000.265dc9d2752e no eth0 vnet0 this is the routing table on the host Kernel IP routing table Destination Gateway Genmask Flags Metric RefUse Iface 178.63.60.128 0.0.0.0 255.255.255.192 U 0 00 br0 0.0.0.0 178.63.60.129 0.0.0.0 UG10000 br0 this is the routing table on the VM Kernel IP routing table Destination Gate
Re: KVM timekeeping and TSC virtualization
On 08/24/2010 03:32 AM, David S. Ahern wrote: On 08/23/10 23:47, Zachary Amsden wrote: I've heard the rumor that TSC is orders of magnitude faster under VMware than under KVM from three people now, and I thought you were part of that camp. Needless to say, they are either laughably incorrect, or possess some great secret knowledge of how to make things under virtualization go faster than bare metal. I also have a magical talking unicorn, which, btw, is invisible. Extraordinary claims require extraordinary proof (the proof of my unicorn is too complex to fit in the margin of this e-mail, however, I assure you he is real). I have put in a lot of time over the past 3 years to understand how the 'magic' of virtualization works; please don't lump me into camps until I raise my hand as being part of one. My point is that kvmclock is Red Hat's answer for the future -- RHEL6, RHEL5.Y (whenever it proves reliable). What about the present? What about products based on other distributions newer than RHEL5 but pre-kvmclock? It should be obvious from this patchset... PIT or TSC. KVM did not have an in-kernel PIT implementation circa 2008, so this data is quite old. It's much faster now and will continue to get faster as exit cost goes down and the emulation gets further optimized. It was in-kernel pit in early 2008 (kernel git entry): commit 7837699fa6d7adf81f26ab73a5f6897ea1ab9d6a Author: Sheng Yang Date: Mon Jan 28 05:10:22 2008 +0800 KVM: In kernel PIT model Plus, now we have an error-free TSC. There are a lot of moving windows of what to use as a clock source, not just per major number (RHEL4, RHEL5) but minor number (e.g., TSC stability on RHEL4 -- e.g., https://bugzilla.redhat.com/show_bug.cgi?id=491154) and further maintenance releases (kvmclock requiring RHEL5.5+). That is not very friendly to a product making a transition to virtualization - and with the same code base running bare metal or in a VM. If you have old software running on broken hardware you do not get hardware performance and error-free time virtualization. With any vendor. Period. Sucks to be old *and* broken. But old with fancy new wheels, er hardware -- like commodity x86 servers running Nehalem-based processors -- is a different story. With this patchset, KVM now has a much stronger guarantee: If you have old guest software running on broken hardware, using SMP virtual machines, you do not get hardware performance and error-free time virtualization.However, if you have new guest software, non-broken hardware, or can simply run UP guests instead of SMP, you can have hardware performance, and it is now error free. Alternatively, you can sacrifice some accuracy and have hardware performance, even for SMP guests, if you can tolerate some minor cross-CPU TSC variation. No other vendor I know of can make that guarantee. Zach If the processor has a stable TSC why trap it? I realize you are trying to cover a gauntlet of hardware and guests, so maybe a nerd knob is needed to disable. Exactly. If you have a stable TSC, we don't trap it. If you don't have a stable TSC, we do. That's the point of these patches. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM timekeeping 12/35] Robust TSC compensation
On Fri, Aug 20, 2010 at 1:07 AM, Zachary Amsden wrote: [...] > + * or make a best guest using elapsed value. Perhaps s/guest/guess/ while the line is changing anyway. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SeaBIOS and GRUB booting from Virtio devices [was: GRUB and support for Virtio]
On 08/24/2010 01:51 PM, Daniel Bareiro wrote: Martin Kraus seems that recently experienced a similar problem and this was linked to SeaBIOS [1] and compiling the version from the git would solve this problem. I had no problems with GRUB trying to boot from two different IDE drives. I would like to know if there are plans to incorporate this to stable KVM in the short term. No, but the next release is very soon and will have this. It's a new feature. There's never been Regards, Anthony Liguori Regards, Daniel [1] http://www.spinics.net/lists/kvm/msg39928.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SeaBIOS and GRUB booting from Virtio devices [was: GRUB and support for Virtio]
On Tuesday, 24 August 2010 11:23:59 -0300, Daniel Bareiro wrote: > > Strange, i did 2 clean installs of centos 5.5 on an ubuntu 9.10 host > > and it went flawless with virtio devices ( both disks and network ). > Doing some tests with CentOS 5.5 on a KVM virtual machine, after doing > the installation, I added a second disk. But when trying to boot from > it, I get the following error: > > - > root (hd1,0) > > Error 21: Selected disk does not exist > - > > > The two disks are Virtio devices that are recognized when booting from > the first disk: > > - > [r...@localhost ~]# fdisk -l /dev/vda > > Disco /dev/vda: 10.7 GB, 10737418240 bytes > 255 heads, 63 sectors/track, 1305 cylinders > Unidades = cilindros de 16065 * 512 = 8225280 bytes > > Disposit. InicioComienzo Fin Bloques Id Sistema > /dev/vda1 * 11174 9430123+ fd Linux raid autodetect > /dev/vda211751305 1052257+ fd Linux raid autodetect > [r...@localhost ~]# > [r...@localhost ~]# > [r...@localhost ~]# fdisk -l /dev/vdb > > Disco /dev/vdb: 10.7 GB, 10737418240 bytes > 255 heads, 63 sectors/track, 1305 cylinders > Unidades = cilindros de 16065 * 512 = 8225280 bytes > > Disposit. InicioComienzo Fin Bloques Id Sistema > /dev/vdb1 * 11174 9430123+ fd Linux raid autodetect > /dev/vdb211751305 1052257+ fd Linux raid autodetect > - > > The idea of these tests is to set up software RAID1 on a running > system, since, it seems that Anaconda does not support installation on > degraded RAID. > > But I'm not sure if this is a problem of Virtio or that GRUB is not > recognizing the second disk. > > I made sure to modify /boot/grub/device.map with the entry for the new > disk: > > - > [r...@localhost grub]# cat /boot/grub/device.map > # this device map was generated by anaconda > (hd0) /dev/vda > (hd1) /dev/vdb > - > > And the reconfiguration of GRUB on both disks did not give problems: > > - > [r...@localhost grub]# grub --device-map=/boot/grub/device.map > > > GNU GRUB version 0.97 (640K lower / 3072K upper memory) > > [ Minimal BASH-like line editing is supported. For the first word, TAB >lists possible command completions. Anywhere else TAB lists the > possible >completions of a device/filename.] > grub> root (hd0,0) > root (hd0,0) > Filesystem type is ext2fs, partition type 0xfd > grub> setup (hd0) > setup (hd0) > Checking if "/boot/grub/stage1" exists... yes > Checking if "/boot/grub/stage2" exists... yes > Checking if "/boot/grub/e2fs_stage1_5" exists... yes > Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 15 sectors are > embedded. > succeeded > Running "install /boot/grub/stage1 (hd0) (hd0)1+15 p > (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded > Done. > grub> root (hd1,0) > root (hd1,0) > Filesystem type is ext2fs, partition type 0xfd > grub> setup (hd1) > setup (hd1) > Checking if "/boot/grub/stage1" exists... yes > Checking if "/boot/grub/stage2" exists... yes > Checking if "/boot/grub/e2fs_stage1_5" exists... yes > Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 15 sectors are > embedded. > succeeded > Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p > (hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded > Done. > grub> quit > quit > - > > > Any idea what may be causing the problem? Martin Kraus seems that recently experienced a similar problem and this was linked to SeaBIOS [1] and compiling the version from the git would solve this problem. I had no problems with GRUB trying to boot from two different IDE drives. I would like to know if there are plans to incorporate this to stable KVM in the short term. Regards, Daniel [1] http://www.spinics.net/lists/kvm/msg39928.html -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered by Debian GNU/Linux Lenny - Linux user #188.598 signature.asc Description: Digital signature
Re: KVM Bridged Networking Issue in Ubuntu Lucid
On Tuesday, August 24, 2010 09:08:30 am Rus Hughes wrote: > Hi guys, > > I'm trying to sort out networking for a VM I've created on my Ubuntu > Lucid box but the VM cannot access the Internet. > I can connect to the VM (crisps) from the host (holly) and to the host > from the VM. > But the VM cannot connect to the Internet and the Internet cannot > connect to the VM. > > I followed the guide here: https://help.ubuntu.com/community/KVM > > I created it using ubuntu-vm-builder using : > > ubuntu-vm-builder kvm lucid \ > --domain crisps \ > --dest crisps \ > --arch amd64 \ > --hostname crisps \ > --mem 2048 \ > --rootsize 4096 \ > --swapsize 1024 \ > --user magicalusernameofdoom \ > --pass somesecretawesomepassword \ > --ip 178.63.60.159 \ > --mask 255.255.255.192 \ > --bcast 178.63.60.191 \ > --gw 178.63.60.129 \ > --dns 213.133.99.99 \ > --mirror http://de.archive.ubuntu.com/ubuntu \ > --components main,universe,restricted,multiverse \ > --addpkg openssh-server \ > --libvirt qemu:///system \ > --bridge br0; > > "virsh start crisps" starts the vm up : > > root 29297 0.5 3.3 2280396 271220 ? Sl 14:12 0:13 > /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps > -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev > socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait > -monitor chardev:monitor -boot c -drive > file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on > -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0 > -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc > 127.0.0.1:0 -vga cirrus > > Networking on the host is like so: > > br0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e > inet addr:178.63.60.138 Bcast:178.63.60.191 > Mask:255.255.255.192 inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0 > TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:35266886777 (35.2 GB) TX bytes:60292047383 (60.2 GB) > > eth0 Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a > inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0 > TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:36337218359 (36.3 GB) TX bytes:60378047518 (60.3 GB) > Interrupt:29 Base address:0x4000 > > eth0:0Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a > inet addr:178.63.60.178 Bcast:178.63.60.191 > Mask:255.255.255.192 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > Interrupt:29 Base address:0x4000 > Maybe look here. Should this maybe be br0:0 (or better yet, just use ip addr add to add the second address to the bridge instead of having aliases). > loLink encap:Local Loopback > inet addr:127.0.0.1 Mask:255.0.0.0 > inet6 addr: ::1/128 Scope:Host > UP LOOPBACK RUNNING MTU:16436 Metric:1 > RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:325111540 (325.1 MB) TX bytes:325111540 (325.1 MB) > > vnet0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e > inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:72 errors:0 dropped:0 overruns:0 frame:0 > TX packets:33 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:500 > RX bytes:3412 (3.4 KB) TX bytes:2698 (2.6 KB) > > the hosts /etc/network/interfaces is like so : > > auto lo > iface lo inet loopback > > # device: eth0 > auto eth0 > iface eth0 inet manual > > auto br0 > iface br0 inet static > address 178.63.60.138 > broadcast 178.63.60.191 > netmask 255.255.255.192 > gateway 178.63.60.129 > network 178.63.60.128 > bridge_ports eth0 > bridge_stp off > bridge_fd 0 > bridge_maxwait 0 > > # subli.me.uk > auto eth0:0 > iface eth0:0 inet static > address 178.63.60.178 > netmask 255.255.255.192 > > > If I ping the VM I see the packets on the host br0 but not vnet0, this > is the output of brctl show on the host : > > bridge name bridge id STP enabled interfaces > br0 8000.265dc9d2752e
Re: [PATCH] KVM: MMU: fix missing percpu counter destroy
On Mon 23 Aug at 16:13:15 +0800 yj...@cn.fujitsu.com said: > commit ad05c88266b4cce1c820928ce8a0fb7690912ba1 > (KVM: create aggregate kvm_total_used_mmu_pages value) > introduce percpu counter kvm_total_used_mmu_pages but never > destroy it, this may cause oops when rmmod & modprobe. > > Signed-off-by: Wei Yongjun Acked-by: Tim Pepper -- Tim Pepper IBM Linux Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()
it worked. thankx a lot. On Tue, Aug 24, 2010 at 2:22 PM, Marcelo Tosatti wrote: >> [ 0.00] NR_IRQS:4352 nr_irqs:512 >> [ 0.00] Console: colour VGA+ 80x25 >> [ 0.00] console [tty0] enabled >> [ 0.00] ODEBUG: 0 of 0 active objects replaced >> [ 0.00] Fast TSC calibration failed >> [ 0.00] TSC: Unable to calibrate against PIT >> [ 0.00] TSC: No reference (HPET/PMTIMER) available >> [ 0.00] Marking TSC unstable due to could not calculate TSC khz >> [ 0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320) > > Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning. > >> [ 0.148000] pid_max: default: 32768 minimum: 301 >> [ 0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 >> bytes) >> [ 0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) >> [ 0.152000] Mount-cache hash table entries: 256 >> [ 0.152000] Performance Events: unsupported p6 CPU model 2 no PMU >> driver, software events only. >> [ 0.16] Freeing SMP alternatives: 12k freed >> [ 0.16] Setting APIC routing to flat >> [ 0.164000] [ cut here ] >> [ 0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238 >> setup_local_APIC+0x169/0x24b() >> [ 0.164000] Hardware name: Bochs >> [ 0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89 >> [ 0.168000] Call Trace: >> [ 0.168000] [] warn_slowpath_common+0x75/0xb0 >> [ 0.168000] [] warn_slowpath_null+0x15/0x20 >> [ 0.172000] [] setup_local_APIC+0x169/0x24b >> [ 0.172000] [] native_smp_prepare_cpus+0x322/0x3ff >> [ 0.172000] [] kernel_init+0x57/0x1bd >> [ 0.172000] [] kernel_thread_helper+0x4/0x10 >> [ 0.176000] [] ? kernel_init+0x0/0x1bd >> [ 0.176000] [] ? kernel_thread_helper+0x0/0x10 >> [ 0.176000] ---[ end trace 4eaa2a86a8e2da22 ]--- >> [ 0.18] alloc irq_desc for 33 on node 0 >> [ 0.18] alloc kstat_irqs on node 0 >> [ 0.18] alloc irq_desc for 35 on node 0 >> [ 0.18] alloc kstat_irqs on node 0 >> [ 0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >> [ 0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03 >> [ 0.34] NMI watchdog failed to create perf event on cpu0: >> ffed > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()
it worked. thankx a lot. On Tue, Aug 24, 2010 at 2:22 PM, Marcelo Tosatti wrote: >> [ 0.00] NR_IRQS:4352 nr_irqs:512 >> [ 0.00] Console: colour VGA+ 80x25 >> [ 0.00] console [tty0] enabled >> [ 0.00] ODEBUG: 0 of 0 active objects replaced >> [ 0.00] Fast TSC calibration failed >> [ 0.00] TSC: Unable to calibrate against PIT >> [ 0.00] TSC: No reference (HPET/PMTIMER) available >> [ 0.00] Marking TSC unstable due to could not calculate TSC khz >> [ 0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320) > > Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning. > >> [ 0.148000] pid_max: default: 32768 minimum: 301 >> [ 0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 >> bytes) >> [ 0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) >> [ 0.152000] Mount-cache hash table entries: 256 >> [ 0.152000] Performance Events: unsupported p6 CPU model 2 no PMU >> driver, software events only. >> [ 0.16] Freeing SMP alternatives: 12k freed >> [ 0.16] Setting APIC routing to flat >> [ 0.164000] [ cut here ] >> [ 0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238 >> setup_local_APIC+0x169/0x24b() >> [ 0.164000] Hardware name: Bochs >> [ 0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89 >> [ 0.168000] Call Trace: >> [ 0.168000] [] warn_slowpath_common+0x75/0xb0 >> [ 0.168000] [] warn_slowpath_null+0x15/0x20 >> [ 0.172000] [] setup_local_APIC+0x169/0x24b >> [ 0.172000] [] native_smp_prepare_cpus+0x322/0x3ff >> [ 0.172000] [] kernel_init+0x57/0x1bd >> [ 0.172000] [] kernel_thread_helper+0x4/0x10 >> [ 0.176000] [] ? kernel_init+0x0/0x1bd >> [ 0.176000] [] ? kernel_thread_helper+0x0/0x10 >> [ 0.176000] ---[ end trace 4eaa2a86a8e2da22 ]--- >> [ 0.18] alloc irq_desc for 33 on node 0 >> [ 0.18] alloc kstat_irqs on node 0 >> [ 0.18] alloc irq_desc for 35 on node 0 >> [ 0.18] alloc kstat_irqs on node 0 >> [ 0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >> [ 0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03 >> [ 0.34] NMI watchdog failed to create perf event on cpu0: >> ffed > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
How to debug KVM with gdb?
Hello, I would like to debug the OS kernel loaded and simulated by kvm. I know that QEMU have some problem with KVM debug. So how can I debug this kernel? Is there any suggestion? Best regards, -- Hao Shen -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On Tue, Aug 24, 2010 at 05:28:12PM +0300, Avi Kivity wrote: > On 08/24/2010 05:06 PM, Gleb Natapov wrote: > >On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote: > >> On 08/24/2010 04:52 PM, Gleb Natapov wrote: > >>>We can, of course. But for me it looks as arbitrary as -1/0/1 since not > >>>all enum values have meanings to the caller. > >>Yeah. -1/0/1's problem is that between reading the callee code and > >>caller code, I manage to forget what the values mean. > >> > >Luckily we have only one caller of x86_emulate_insn(), so documenting > >return values right where function is called should help. :) > > Please use #define instead of /* */. > Sigh. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
UIO: per-handle event counts
Hi, I've been using UIO for a userspace driver and noticed a feature when multiple uio file handles have been opened for the same device (e.g. when running separate processes). Since each handle stores its own count of the interrupts and uses this to determine whether uio_read returns, it's possible for uio_read to return before a new interrupt has occurred. Was this the intended behaviour? In my case, the irq is associated with completion of some hw processing so it seems a little strange to me. I expected uio_read to return only after an interrupt occurs that hasn't been already been handled by userspace code. Thanks Phil -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()
> [ 0.00] NR_IRQS:4352 nr_irqs:512 > [ 0.00] Console: colour VGA+ 80x25 > [ 0.00] console [tty0] enabled > [ 0.00] ODEBUG: 0 of 0 active objects replaced > [ 0.00] Fast TSC calibration failed > [ 0.00] TSC: Unable to calibrate against PIT > [ 0.00] TSC: No reference (HPET/PMTIMER) available > [ 0.00] Marking TSC unstable due to could not calculate TSC khz > [ 0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320) Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning. > [ 0.148000] pid_max: default: 32768 minimum: 301 > [ 0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) > [ 0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) > [ 0.152000] Mount-cache hash table entries: 256 > [ 0.152000] Performance Events: unsupported p6 CPU model 2 no PMU > driver, software events only. > [ 0.16] Freeing SMP alternatives: 12k freed > [ 0.16] Setting APIC routing to flat > [ 0.164000] [ cut here ] > [ 0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238 > setup_local_APIC+0x169/0x24b() > [ 0.164000] Hardware name: Bochs > [ 0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89 > [ 0.168000] Call Trace: > [ 0.168000] [] warn_slowpath_common+0x75/0xb0 > [ 0.168000] [] warn_slowpath_null+0x15/0x20 > [ 0.172000] [] setup_local_APIC+0x169/0x24b > [ 0.172000] [] native_smp_prepare_cpus+0x322/0x3ff > [ 0.172000] [] kernel_init+0x57/0x1bd > [ 0.172000] [] kernel_thread_helper+0x4/0x10 > [ 0.176000] [] ? kernel_init+0x0/0x1bd > [ 0.176000] [] ? kernel_thread_helper+0x0/0x10 > [ 0.176000] ---[ end trace 4eaa2a86a8e2da22 ]--- > [ 0.18] alloc irq_desc for 33 on node 0 > [ 0.18] alloc kstat_irqs on node 0 > [ 0.18] alloc irq_desc for 35 on node 0 > [ 0.18] alloc kstat_irqs on node 0 > [ 0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > [ 0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03 > [ 0.34] NMI watchdog failed to create perf event on cpu0: > ffed -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Add realmode test for jcxz instruction
On Tue, Aug 24, 2010 at 10:57:12AM +0800, Wei Yongjun wrote: > Signed-off-by: Wei Yongjun > --- > v2 -> v3: rebased > --- > x86/realmode.c | 42 ++ > 1 files changed, 42 insertions(+), 0 deletions(-) Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On 08/24/2010 05:06 PM, Gleb Natapov wrote: On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote: On 08/24/2010 04:52 PM, Gleb Natapov wrote: We can, of course. But for me it looks as arbitrary as -1/0/1 since not all enum values have meanings to the caller. Yeah. -1/0/1's problem is that between reading the callee code and caller code, I manage to forget what the values mean. Luckily we have only one caller of x86_emulate_insn(), so documenting return values right where function is called should help. :) Please use #define instead of /* */. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: GRUB and support for Virtio
On Wednesday, 18 August 2010 21:45:56 +0300, Nikolai K. Bochev wrote: > Strange, i did 2 clean installs of centos 5.5 on an ubuntu 9.10 host > and it went flawless with virtio devices ( both disks and network ). Doing some tests with CentOS 5.5 on a KVM virtual machine, after doing the installation, I added a second disk. But when trying to boot from it, I get the following error: - root (hd1,0) Error 21: Selected disk does not exist - The two disks are Virtio devices that are recognized when booting from the first disk: - [r...@localhost ~]# fdisk -l /dev/vda Disco /dev/vda: 10.7 GB, 10737418240 bytes 255 heads, 63 sectors/track, 1305 cylinders Unidades = cilindros de 16065 * 512 = 8225280 bytes Disposit. InicioComienzo Fin Bloques Id Sistema /dev/vda1 * 11174 9430123+ fd Linux raid autodetect /dev/vda211751305 1052257+ fd Linux raid autodetect [r...@localhost ~]# [r...@localhost ~]# [r...@localhost ~]# fdisk -l /dev/vdb Disco /dev/vdb: 10.7 GB, 10737418240 bytes 255 heads, 63 sectors/track, 1305 cylinders Unidades = cilindros de 16065 * 512 = 8225280 bytes Disposit. InicioComienzo Fin Bloques Id Sistema /dev/vdb1 * 11174 9430123+ fd Linux raid autodetect /dev/vdb211751305 1052257+ fd Linux raid autodetect - The idea of these tests is to set up software RAID1 on a running system, since, it seems that Anaconda does not support installation on degraded RAID. But I'm not sure if this is a problem of Virtio or that GRUB is not recognizing the second disk. I made sure to modify /boot/grub/device.map with the entry for the new disk: - [r...@localhost grub]# cat /boot/grub/device.map # this device map was generated by anaconda (hd0) /dev/vda (hd1) /dev/vdb - And the reconfiguration of GRUB on both disks did not give problems: - [r...@localhost grub]# grub --device-map=/boot/grub/device.map GNU GRUB version 0.97 (640K lower / 3072K upper memory) [ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename.] grub> root (hd0,0) root (hd0,0) Filesystem type is ext2fs, partition type 0xfd grub> setup (hd0) setup (hd0) Checking if "/boot/grub/stage1" exists... yes Checking if "/boot/grub/stage2" exists... yes Checking if "/boot/grub/e2fs_stage1_5" exists... yes Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 15 sectors are embedded. succeeded Running "install /boot/grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded Done. grub> root (hd1,0) root (hd1,0) Filesystem type is ext2fs, partition type 0xfd grub> setup (hd1) setup (hd1) Checking if "/boot/grub/stage1" exists... yes Checking if "/boot/grub/stage2" exists... yes Checking if "/boot/grub/e2fs_stage1_5" exists... yes Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 15 sectors are embedded. succeeded Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded Done. grub> quit quit - Any idea what may be causing the problem? Regards, Daniel -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered by Debian GNU/Linux Lenny - Linux user #188.598 signature.asc Description: Digital signature
KVM Bridged Networking Issue in Ubuntu Lucid
Hi guys, I'm trying to sort out networking for a VM I've created on my Ubuntu Lucid box but the VM cannot access the Internet. I can connect to the VM (crisps) from the host (holly) and to the host from the VM. But the VM cannot connect to the Internet and the Internet cannot connect to the VM. I followed the guide here: https://help.ubuntu.com/community/KVM I created it using ubuntu-vm-builder using : ubuntu-vm-builder kvm lucid \ --domain crisps \ --dest crisps \ --arch amd64 \ --hostname crisps \ --mem 2048 \ --rootsize 4096 \ --swapsize 1024 \ --user magicalusernameofdoom \ --pass somesecretawesomepassword \ --ip 178.63.60.159 \ --mask 255.255.255.192 \ --bcast 178.63.60.191 \ --gw 178.63.60.129 \ --dns 213.133.99.99 \ --mirror http://de.archive.ubuntu.com/ubuntu \ --components main,universe,restricted,multiverse \ --addpkg openssh-server \ --libvirt qemu:///system \ --bridge br0; "virsh start crisps" starts the vm up : root 29297 0.5 3.3 2280396 271220 ? Sl 14:12 0:13 /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait -monitor chardev:monitor -boot c -drive file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0 -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc 127.0.0.1:0 -vga cirrus Networking on the host is like so: br0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet addr:178.63.60.138 Bcast:178.63.60.191 Mask:255.255.255.192 inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0 TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:35266886777 (35.2 GB) TX bytes:60292047383 (60.2 GB) eth0 Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0 TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:36337218359 (36.3 GB) TX bytes:60378047518 (60.3 GB) Interrupt:29 Base address:0x4000 eth0:0Link encap:Ethernet HWaddr 40:61:86:e9:d6:9a inet addr:178.63.60.178 Bcast:178.63.60.191 Mask:255.255.255.192 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interrupt:29 Base address:0x4000 loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0 TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:325111540 (325.1 MB) TX bytes:325111540 (325.1 MB) vnet0 Link encap:Ethernet HWaddr 26:5d:c9:d2:75:2e inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:72 errors:0 dropped:0 overruns:0 frame:0 TX packets:33 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:500 RX bytes:3412 (3.4 KB) TX bytes:2698 (2.6 KB) the hosts /etc/network/interfaces is like so : auto lo iface lo inet loopback # device: eth0 auto eth0 iface eth0 inet manual auto br0 iface br0 inet static address 178.63.60.138 broadcast 178.63.60.191 netmask 255.255.255.192 gateway 178.63.60.129 network 178.63.60.128 bridge_ports eth0 bridge_stp off bridge_fd 0 bridge_maxwait 0 # subli.me.uk auto eth0:0 iface eth0:0 inet static address 178.63.60.178 netmask 255.255.255.192 If I ping the VM I see the packets on the host br0 but not vnet0, this is the output of brctl show on the host : bridge name bridge id STP enabled interfaces br0 8000.265dc9d2752e no eth0 vnet0 this is the routing table on the host Kernel IP routing table Destination Gateway Genmask Flags Metric RefUse Iface 178.63.60.128 0.0.0.0 255.255.255.192 U 0 00 br0 0.0.0.0 178.63.60.129 0.0.0.0 UG10000 br0 this is the routing table on the VM Kernel IP routing table Destination Gateway Genmask
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote: > On 08/24/2010 04:52 PM, Gleb Natapov wrote: > > > >We can, of course. But for me it looks as arbitrary as -1/0/1 since not > >all enum values have meanings to the caller. > > Yeah. -1/0/1's problem is that between reading the callee code and > caller code, I manage to forget what the values mean. > Luckily we have only one caller of x86_emulate_insn(), so documenting return values right where function is called should help. :) -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On 08/24/2010 04:52 PM, Gleb Natapov wrote: We can, of course. But for me it looks as arbitrary as -1/0/1 since not all enum values have meanings to the caller. Yeah. -1/0/1's problem is that between reading the callee code and caller code, I manage to forget what the values mean. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 08/10] Check whether in long mode before testing vmexit caused by cr8 access
On 08/24/2010 04:47 PM, Jason Wang wrote: CR8 is only availabe when in long mode. Signed-off-by: Jason Wang --- x86/vmexit.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/x86/vmexit.c b/x86/vmexit.c index 819c24b..34b0af4 100644 --- a/x86/vmexit.c +++ b/x86/vmexit.c @@ -48,6 +48,7 @@ static void vmcall(void) #define MSR_EFER 0xc080 #define EFER_NX_MASK(1ull<< 11) +#define EFER_LMA_MASK (1ull<< 10) static void mov_from_cr8(void) { @@ -68,6 +69,15 @@ static int is_smp(void) return cpu_count()> 1; } +static int is_long_mode(void) +{ +#ifdef __i386__ +return 0; +#else +return rdmsr(MSR_EFER) | EFER_LMA_MASK; +#endif +} + static void nop(void *junk) { } @@ -100,8 +110,8 @@ static struct test { } tests[] = { { cpuid_test, "cpuid", .parallel = 1, }, { vmcall, "vmcall", .parallel = 1, }, - { mov_from_cr8, "mov_from_cr8", .parallel = 1, }, - { mov_to_cr8, "mov_to_cr8" , .parallel = 1, }, + { mov_from_cr8, "mov_from_cr8", is_long_mode, .parallel = 1, }, + { mov_to_cr8, "mov_to_cr8" , is_long_mode, .parallel = 1, }, { inl_pmtimer, "inl_from_pmtimer", .parallel = 1, }, { ipi, "ipi", is_smp, .parallel = 0, }, { ipi_halt, "ipi+halt", is_smp, .parallel = 0, }, Enough to #ifdef __x86_64__. If it's defined we're in long mode. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 07/10] Correct the tss size
On 08/24/2010 04:47 PM, Jason Wang wrote: TSS size should be 104 byte. Signed-off-by: Jason Wang --- x86/cstart64.S |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/x86/cstart64.S b/x86/cstart64.S index 5d358ad..b871153 100644 --- a/x86/cstart64.S +++ b/x86/cstart64.S @@ -69,7 +69,7 @@ tss: .long 0 .quad ring0stacktop - i * 4096 ring 0 stack .quad 0, 0, 0 rings 1, 2, 3 stack - .quad 0, 0, 0, 0, 0, 0, 0, 0 1 qword reserved, 7 qwords IST + .quad 0, 0, 0, 0, 0, 0, 0 .long 0, 0, 0 3 dwords reserved + I/O map base address - so this looks correct? i = i + 1 .endr -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On Tue, Aug 24, 2010 at 04:41:10PM +0300, Avi Kivity wrote: > On 08/24/2010 04:37 PM, Gleb Natapov wrote: > >On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote: > >> On 08/24/2010 02:30 PM, Gleb Natapov wrote: > >>>x86_emulate_insn() will return 1 if instruction can be restarted > >>>without re-entering a guest. > >>> > >>So now we have an undocumented -1/0/1 return code? > >> > >>Better to have an enum for this. > >> > >We already have two. First is X86EMUL_ (not enum but close) for > >more or less internal emulator use. Second is EMULATE_* for users of > >emulate_instruction() now you want one more enum for communication > >between emulate_instruction() and x86_emulate_insn(). Lost in enums. > >emulate_instruction() and x86_emulate_insn() are tightly coupled right > >now should we define formal interface between them? May be comment will > >be enough? > > Can we reuse one or the other? Perhaps with extensions? > We can, of course. But for me it looks as arbitrary as -1/0/1 since not all enum values have meanings to the caller. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] S390: Add virtio hotplug add support
The one big missing feature in s390-virtio was hotplugging. This is no more. This patch implements hotplug add support, so you can on the fly add new devices in the guest. Keep in mind that this needs a patch for qemu to actually leverage the functionality. Signed-off-by: Alexander Graf --- v1 -> v2: - move dev_add to kvm_virtio.h --- arch/s390/include/asm/kvm_virtio.h |1 + drivers/s390/kvm/kvm_virtio.c | 47 2 files changed, 48 insertions(+), 0 deletions(-) diff --git a/arch/s390/include/asm/kvm_virtio.h b/arch/s390/include/asm/kvm_virtio.h index 3f5d100..72f6141 100644 --- a/arch/s390/include/asm/kvm_virtio.h +++ b/arch/s390/include/asm/kvm_virtio.h @@ -59,5 +59,6 @@ struct kvm_vqconfig { #define VIRTIO_PARAM_MASK 0xff #define VIRTIO_PARAM_VRING_INTERRUPT 0x0 #define VIRTIO_PARAM_CONFIG_CHANGED0x1 +#define VIRTIO_PARAM_DEV_ADD 0x2 #endif diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index 68cef4d..5a46b8c 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -32,6 +32,7 @@ * The pointer to our (page) of device descriptions. */ static void *kvm_devices; +struct work_struct hotplug_work; struct kvm_device { struct virtio_device vdev; @@ -328,6 +329,47 @@ static void scan_devices(void) } /* + * match for a kvm device with a specific desc pointer + */ +static int match_desc(struct device *dev, void *data) +{ + if ((ulong)to_kvmdev(dev_to_virtio(dev))->desc == (ulong)data) + return 1; + + return 0; +} + +/* + * hotplug_device tries to find changes in the device page. + */ +static void hotplug_devices(struct work_struct *dummy) +{ + unsigned int i; + struct kvm_device_desc *d; + struct device *dev; + + for (i = 0; i < PAGE_SIZE; i += desc_size(d)) { + d = kvm_devices + i; + + /* end of list */ + if (d->type == 0) + break; + + /* device already exists */ + dev = device_find_child(kvm_root, d, match_desc); + if (dev) { + /* XXX check for hotplug remove */ + put_device(dev); + continue; + } + + /* new device */ + printk(KERN_INFO "Adding new virtio device %p\n", d); + add_kvm_device(d, i); + } +} + +/* * we emulate the request_irq behaviour on top of s390 extints */ static void kvm_extint_handler(u16 code) @@ -357,6 +399,9 @@ static void kvm_extint_handler(u16 code) break; } + case VIRTIO_PARAM_DEV_ADD: + schedule_work(&hotplug_work); + break; case VIRTIO_PARAM_VRING_INTERRUPT: default: vring_interrupt(0, vq); @@ -390,6 +435,8 @@ static int __init kvm_devices_init(void) kvm_devices = (void *) real_memory_size; + INIT_WORK(&hotplug_work, hotplug_devices); + ctl_set_bit(0, 9); register_external_interrupt(0x2603, kvm_extint_handler); -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] S390: take a full byte as ext_param indicator
Currenty the ext_param field only distinguishes between "config change" and "vring interrupt". We can do a lot more with it though, so let's enable a full byte of possible values and constants to #defines while at it. Signed-off-by: Alexander Graf --- v1 -> v2: - move defines to virtio_s390.h --- arch/s390/include/asm/kvm_virtio.h |6 ++ drivers/s390/kvm/kvm_virtio.c | 19 +-- 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/arch/s390/include/asm/kvm_virtio.h b/arch/s390/include/asm/kvm_virtio.h index acdfdff..3f5d100 100644 --- a/arch/s390/include/asm/kvm_virtio.h +++ b/arch/s390/include/asm/kvm_virtio.h @@ -54,4 +54,10 @@ struct kvm_vqconfig { * This is pagesize for historical reasons. */ #define KVM_S390_VIRTIO_RING_ALIGN 4096 + +/* These values are supposed to be in ext_params on an interrupt */ +#define VIRTIO_PARAM_MASK 0xff +#define VIRTIO_PARAM_VRING_INTERRUPT 0x0 +#define VIRTIO_PARAM_CONFIG_CHANGED0x1 + #endif diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index 4e298bc..68cef4d 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -334,7 +334,7 @@ static void kvm_extint_handler(u16 code) { struct virtqueue *vq; u16 subcode; - int config_changed; + u32 param; subcode = S390_lowcore.cpu_addr; if ((subcode & 0xff00) != VIRTIO_SUBCODE_64) @@ -343,18 +343,25 @@ static void kvm_extint_handler(u16 code) /* The LSB might be overloaded, we have to mask it */ vq = (struct virtqueue *)(S390_lowcore.ext_params2 & ~1UL); - /* We use the LSB of extparam, to decide, if this interrupt is a config -* change or a "standard" interrupt */ - config_changed = S390_lowcore.ext_params & 1; + /* We use ext_params to decide what this interrupt means */ + param = S390_lowcore.ext_params & VIRTIO_PARAM_MASK; - if (config_changed) { + switch (param) { + case VIRTIO_PARAM_CONFIG_CHANGED: + { struct virtio_driver *drv; drv = container_of(vq->vdev->dev.driver, struct virtio_driver, driver); if (drv->config_changed) drv->config_changed(vq->vdev); - } else + + break; + } + case VIRTIO_PARAM_VRING_INTERRUPT: + default: vring_interrupt(0, vq); + break; + } } /* -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] S390: Export kvm_virtio.h
As suggested by Christian, we should expose headers to user space with information that might be valuable there. The s390 virtio interface is one of those cases. It defines an ABI between hypervisor and guest, so it should be exposed to user space. Reported-by: Christian Borntraeger Signed-off-by: Alexander Graf --- arch/s390/include/asm/Kbuild |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 63a2341..ee7937d 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -8,6 +8,7 @@ header-y += ucontext.h header-y += vtoc.h header-y += zcrypt.h header-y += chsc.h +header-y += kvm_virtio.h unifdef-y += cmb.h unifdef-y += debug.h -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 10/10] Add the 32bit smp initialization code
Add the smp initlaization codes for 32bit. This would make the 32bit smp tests available. Signed-off-by: Jason Wang --- x86/cstart.S | 179 -- 1 files changed, 173 insertions(+), 6 deletions(-) diff --git a/x86/cstart.S b/x86/cstart.S index 1bdf789..6ce200d 100644 --- a/x86/cstart.S +++ b/x86/cstart.S @@ -1,9 +1,65 @@ +#include "apic-defs.h" + +.globl boot_idt +boot_idt = 0 + +ipi_vector = 0x20 + +max_cpus = 4 .bss + . = . + 4096 * max_cpus + .align 16 +stacktop: + + . = . + 4096 + .align 16 +ring0stacktop: + +.data + +.align 4096 +pt: +i = 0 +.rept 1024 +.long 0x1e7 | (i << 22) +i = i + 1 +.endr + +gdt32: + .quad 0 + .quad 0x00cf9b00 // flat 32-bit code segment + .quad 0x00cf9300 // flat 32-bit data segment + +tss_descr: +.rept max_cpus +.quad 0x8900 // 32-bit avail tss +.endr +gdt32_end: + +i = 0 +tss: +.rept max_cpus +.long 0 +.long ring0stacktop - i * 4096 +.long 0 +.quad 0, 0, 0 +.quad 0, 0, 0, 0, 0, 0, 0 +.long 0, 0, 0 +i = i + 1 +.endr +tss_end: + +idt_descr: + .word 16 * 256 - 1 + .long boot_idt + .section .init +.code32 + mb_magic = 0x1BADB002 mb_flags = 0x0 @@ -11,15 +67,126 @@ mb_flags = 0x0 .long mb_magic, mb_flags, 0 - (mb_magic + mb_flags) mb_cmdline = 16 +MSR_GS_BASE = 0xc101 + +.macro setup_percpu_area + lea -4096(%esp), %eax + mov $0, %edx + mov $MSR_GS_BASE, %ecx + wrmsr +.endm + .globl start start: - mov mb_cmdline(%ebx), %eax - mov %eax, __args - call __setup_args - pushl $__argv - pushl __argc - call main +mov mb_cmdline(%ebx), %eax +mov %eax, __args +call __setup_args +mov $stacktop, %esp +setup_percpu_area +call prepare_32 +jmpl $8, $start32 + +prepare_32: +lgdtl gdt32_descr + + mov %cr4, %eax + bts $4, %eax // pse + mov %eax, %cr4 + + mov $pt, %eax + mov %eax, %cr3 + + mov %cr0, %eax + bts $0, %eax + bts $31, %eax + mov %eax, %cr0 + ret + +smp_stacktop: .long 0xa + +ap_start32: + mov $0x10, %ax + mov %ax, %ds + mov %ax, %es + mov %ax, %fs + mov %ax, %gs + mov %ax, %ss + mov $-4096, %esp + lock/xaddl %esp, smp_stacktop + setup_percpu_area + call prepare_32 + call load_tss + call enable_apic + call enable_x2apic + sti + nop + lock incw cpu_online_count + +1: hlt + jmp 1b + +start32: + call load_tss + call mask_pic_interrupts + call enable_apic + call smp_init + call enable_x2apic +push $__argv +push __argc +call main push %eax call exit +load_tss: + lidt idt_descr + mov $16, %eax + mov %ax, %ss + mov $(APIC_DEFAULT_PHYS_BASE + APIC_ID), %eax + mov (%eax), %eax + shr $24, %eax + mov %eax, %ebx + shl $3, %ebx + mov $((tss_end - tss) / max_cpus), %edx + imul %edx + add $tss, %eax + mov %ax, tss_descr+2(%ebx) + shr $16, %eax + mov %al, tss_descr+4(%ebx) + shr $8, %eax + mov %al, tss_descr+7(%ebx) + lea tss_descr-gdt32(%ebx), %eax + ltr %ax + ret + +smp_init: + cld + lea sipi_entry, %esi + xor %edi, %edi + mov $(sipi_end - sipi_entry), %ecx + rep/movsb + mov $APIC_DEFAULT_PHYS_BASE, %eax + movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT), APIC_ICR(%eax) + movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT), APIC_ICR(%eax) + movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP), APIC_ICR(%eax) + call fwcfg_get_nb_cpus +1: pause + cmpw %ax, cpu_online_count + jne 1b +smp_init_done: + ret + +cpu_online_count: .word 1 + +.code16 +sipi_entry: + mov %cr0, %eax + or $1, %eax + mov %eax, %cr0 + lgdtl gdt32_descr - sipi_entry + ljmpl $8, $ap_start32 + +gdt32_descr: + .word gdt32_end - gdt32 - 1 + .long gdt32 +sipi_end: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 09/10] Do not test IA32_EFER in 32bit mode.
Signed-off-by: Jason Wang --- x86/msr.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/x86/msr.c b/x86/msr.c index 9c85369..c3e0014 100644 --- a/x86/msr.c +++ b/x86/msr.c @@ -49,9 +49,11 @@ struct msr_info msr_info[] = { .index = 0xc102, .name = "MSR_KERNEL_GS_BASE", .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}} }, +#ifdef __x86_64__ { .index = 0xc080, .name = "MSR_EFER", .val_pairs = {{ .valid = 1, .value = 0xD00, .expected = 0xD00}} }, +#endif { .index = 0xc082, .name = "MSR_LSTAR", .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}} }, -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 08/10] Check whether in long mode before testing vmexit caused by cr8 access
CR8 is only availabe when in long mode. Signed-off-by: Jason Wang --- x86/vmexit.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/x86/vmexit.c b/x86/vmexit.c index 819c24b..34b0af4 100644 --- a/x86/vmexit.c +++ b/x86/vmexit.c @@ -48,6 +48,7 @@ static void vmcall(void) #define MSR_EFER 0xc080 #define EFER_NX_MASK(1ull << 11) +#define EFER_LMA_MASK (1ull << 10) static void mov_from_cr8(void) { @@ -68,6 +69,15 @@ static int is_smp(void) return cpu_count() > 1; } +static int is_long_mode(void) +{ +#ifdef __i386__ +return 0; +#else +return rdmsr(MSR_EFER) | EFER_LMA_MASK; +#endif +} + static void nop(void *junk) { } @@ -100,8 +110,8 @@ static struct test { } tests[] = { { cpuid_test, "cpuid", .parallel = 1, }, { vmcall, "vmcall", .parallel = 1, }, - { mov_from_cr8, "mov_from_cr8", .parallel = 1, }, - { mov_to_cr8, "mov_to_cr8" , .parallel = 1, }, + { mov_from_cr8, "mov_from_cr8", is_long_mode, .parallel = 1, }, + { mov_to_cr8, "mov_to_cr8" , is_long_mode, .parallel = 1, }, { inl_pmtimer, "inl_from_pmtimer", .parallel = 1, }, { ipi, "ipi", is_smp, .parallel = 0, }, { ipi_halt, "ipi+halt", is_smp, .parallel = 0, }, -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 06/10] Remove the duplicated rdmsr/wrmsr
They have been implemented in lib/x86/processor.h. Also rename the cpuid to cpuid_test because cpuid have been defined. Signed-off-by: Jason Wang --- x86/msr.c| 16 +--- x86/vmexit.c | 20 +++- 2 files changed, 4 insertions(+), 32 deletions(-) diff --git a/x86/msr.c b/x86/msr.c index 1e7987d..9c85369 100644 --- a/x86/msr.c +++ b/x86/msr.c @@ -1,6 +1,7 @@ /* msr tests */ #include "libcflat.h" +#include "processor.h" struct msr_info { int index; @@ -87,21 +88,6 @@ static void report(const char *name, int passed) printf("%s: %s\n", name, passed ? "PASS" : "FAIL"); } -static void wrmsr(unsigned index, unsigned long long value) -{ - unsigned a = value, d = value >> 32; - - asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index)); -} - -static unsigned long long rdmsr(unsigned index) -{ - unsigned a, d; - - asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index)); - return ((unsigned long long)d << 32) | a; -} - static void test_msr_rw(int msr_index, unsigned long long input, unsigned long long expected) { unsigned long long r = 0; diff --git a/x86/vmexit.c b/x86/vmexit.c index 707d5c6..819c24b 100644 --- a/x86/vmexit.c +++ b/x86/vmexit.c @@ -1,6 +1,7 @@ #include "libcflat.h" #include "smp.h" +#include "processor.h" static inline unsigned long long rdtsc() { @@ -32,7 +33,7 @@ static unsigned int inl(unsigned short port) # define R "e" #endif -static void cpuid(void) +static void cpuid_test(void) { asm volatile ("push %%"R "bx; cpuid; pop %%"R "bx" : : : "eax", "ecx", "edx"); @@ -48,21 +49,6 @@ static void vmcall(void) #define MSR_EFER 0xc080 #define EFER_NX_MASK(1ull << 11) -unsigned long long rdmsr(unsigned index) -{ - unsigned a, d; - - asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index)); - return ((unsigned long long)d << 32) | a; -} - -void wrmsr(unsigned index, unsigned long long val) -{ - unsigned a = val, d = val >> 32; - - asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index)); -} - static void mov_from_cr8(void) { unsigned long cr8; @@ -112,7 +98,7 @@ static struct test { int (*valid)(void); int parallel; } tests[] = { - { cpuid, "cpuid", .parallel = 1, }, + { cpuid_test, "cpuid", .parallel = 1, }, { vmcall, "vmcall", .parallel = 1, }, { mov_from_cr8, "mov_from_cr8", .parallel = 1, }, { mov_to_cr8, "mov_to_cr8" , .parallel = 1, }, -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 07/10] Correct the tss size
TSS size should be 104 byte. Signed-off-by: Jason Wang --- x86/cstart64.S |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/x86/cstart64.S b/x86/cstart64.S index 5d358ad..b871153 100644 --- a/x86/cstart64.S +++ b/x86/cstart64.S @@ -69,7 +69,7 @@ tss: .long 0 .quad ring0stacktop - i * 4096 .quad 0, 0, 0 - .quad 0, 0, 0, 0, 0, 0, 0, 0 + .quad 0, 0, 0, 0, 0, 0, 0 .long 0, 0, 0 i = i + 1 .endr -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 05/10] Drop print.S
We've already had lib/printf.c. Signed-off-by: Jason Wang --- config-x86-common.mak |9 - x86/print.S | 31 --- 2 files changed, 4 insertions(+), 36 deletions(-) delete mode 100644 x86/print.S diff --git a/config-x86-common.mak b/config-x86-common.mak index 2d2965f..a623983 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -33,7 +33,7 @@ test_cases: $(tests-common) $(tests) $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I lib -I lib/x86 -$(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o +$(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/hypercall.flat: $(cstart.o) $(TEST_DIR)/hypercall.o @@ -45,14 +45,13 @@ $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \ - $(TEST_DIR)/vm.o $(TEST_DIR)/print.o + $(TEST_DIR)/vm.o $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o -$(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o \ - $(TEST_DIR)/print.o +$(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o $(TEST_DIR)/realmode.flat: $(TEST_DIR)/realmode.o $(CC) -m32 -nostdlib -o $@ -Wl,-T,$(TEST_DIR)/realmode.lds $^ @@ -66,7 +65,7 @@ $(TEST_DIR)/idt_test.flat: $(cstart.o) $(TEST_DIR)/idt.o $(TEST_DIR)/idt_test.o $(TEST_DIR)/xsave.flat: $(cstart.o) $(TEST_DIR)/idt.o $(TEST_DIR)/xsave.o $(TEST_DIR)/rmap_chain.flat: $(cstart.o) $(TEST_DIR)/rmap_chain.o \ -$(TEST_DIR)/print.o $(TEST_DIR)/vm.o +$(TEST_DIR)/vm.o $(TEST_DIR)/svm.flat: $(cstart.o) $(TEST_DIR)/vm.o diff --git a/x86/print.S b/x86/print.S deleted file mode 100644 index b445381..000 --- a/x86/print.S +++ /dev/null @@ -1,31 +0,0 @@ - -#include "print.h" - -#define PSEUDO_SERIAL_PORT 0xf1 - - -.text - PRINT "boo" - hlt -1: jmp 1b - -.globl print -print: - push %rax - push %rsi - push %rdx - - mov %rdi, %rsi - mov $(PSEUDO_SERIAL_PORT), %edx - -putchar: - cmpb $0, (%rsi) - jz done - outsb - jmp putchar -done: - - pop %rdx - pop %rsi - pop %rax - ret -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 04/10] Correct the path in README
Signed-off-by: Jason Wang --- README | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/README b/README index 6a83831..9c0c518 100644 --- a/README +++ b/README @@ -7,17 +7,17 @@ Tests uses a qemu's virtual test device, named testdev, for services like printi See file testdev.txt for more details. To create the tests' images just type 'make' in this directory. -Tests' images created in ./test//*.flat +Tests' images created in .//*.flat An example of a test invocation: -qemu-system-x86_64 -device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out -kernel ./test/x86/msr.flat +qemu-system-x86_64 -device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out -kernel ./x86/msr.flat This invocation runs the msr test case. The test output is in file msr.out. Directory structure: .: Makefile and config files for the tests -./test/lib: general services for the tests -./test/lib/: architecture dependent services for the tests -./test/: the sources of the tests and the created objects/images +./lib: general services for the tests +./lib/: architecture dependent services for the tests +./: the sources of the tests and the created objects/images -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 03/10] Makefile cleanup
Remove the obsoleted target and directories. Signed-off-by: Jason Wang --- Makefile |9 + 1 files changed, 1 insertions(+), 8 deletions(-) diff --git a/Makefile b/Makefile index 5347ce8..d25e6f2 100644 --- a/Makefile +++ b/Makefile @@ -30,10 +30,6 @@ CFLAGS += -O1 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall CFLAGS += $(call cc-option, -fno-stack-protector, "") CFLAGS += $(call cc-option, -fno-stack-protector-all, "") -CFLAGS += -I../include -CFLAGS += -I ../libkvm - -LDFLAGS += $(CFLAGS) -L ../libkvm CXXFLAGS = $(autodepend-flags) @@ -43,9 +39,6 @@ LDFLAGS += -pthread -lrt kvmtrace_objs= kvmtrace.o -kvmctl: $(kvmctl_objs) - $(CC) $(LDFLAGS) $^ -o $@ - kvmtrace: $(kvmtrace_objs) $(CC) $(LDFLAGS) $^ -o $@ @@ -62,4 +55,4 @@ install: install $(tests_and_config) $(DESTDIR) clean: arch_clean - $(RM) kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs) + $(RM) kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 02/10] Remove trailing whitespaces
Signed-off-by: Jason Wang --- config-x86-common.mak | 14 +++--- x86/access.c |2 +- x86/cstart64.S|4 ++-- x86/print.S |6 +++--- x86/sieve.c |2 +- x86/vm.c |6 +++--- 6 files changed, 17 insertions(+), 17 deletions(-) diff --git a/config-x86-common.mak b/config-x86-common.mak index 19bffd4..2d2965f 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -32,18 +32,18 @@ tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg test_cases: $(tests-common) $(tests) $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I lib -I lib/x86 - + $(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o - + $(TEST_DIR)/hypercall.flat: $(cstart.o) $(TEST_DIR)/hypercall.o - + $(TEST_DIR)/sieve.flat: $(cstart.o) $(TEST_DIR)/sieve.o \ $(TEST_DIR)/vm.o - + $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o - + $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o - + $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \ $(TEST_DIR)/vm.o $(TEST_DIR)/print.o @@ -52,7 +52,7 @@ $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o $(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o \ - $(TEST_DIR)/print.o + $(TEST_DIR)/print.o $(TEST_DIR)/realmode.flat: $(TEST_DIR)/realmode.o $(CC) -m32 -nostdlib -o $@ -Wl,-T,$(TEST_DIR)/realmode.lds $^ diff --git a/x86/access.c b/x86/access.c index a0d88e1..067565b 100644 --- a/x86/access.c +++ b/x86/access.c @@ -157,7 +157,7 @@ static void ac_test_show(ac_test_t *at); void lidt(idt_entry_t *idt, int nentries) { descriptor_table_t dt; - + dt.limit = nentries * sizeof(*idt) - 1; dt.linear_addr = (unsigned long)idt; asm volatile ("lidt %0" : : "m"(dt)); diff --git a/x86/cstart64.S b/x86/cstart64.S index cc4c80f..5d358ad 100644 --- a/x86/cstart64.S +++ b/x86/cstart64.S @@ -19,7 +19,7 @@ stacktop: ring0stacktop: .data - + .align 4096 ptl2: i = 0 @@ -38,7 +38,7 @@ ptl3: .align 4096 ptl4: .quad ptl3 + 7 - + .align 4096 gdt64_desc: diff --git a/x86/print.S b/x86/print.S index c1b1c0d..b445381 100644 --- a/x86/print.S +++ b/x86/print.S @@ -3,12 +3,12 @@ #define PSEUDO_SERIAL_PORT 0xf1 - + .text PRINT "boo" hlt 1: jmp 1b - + .globl print print: push %rax @@ -24,7 +24,7 @@ putchar: outsb jmp putchar done: - + pop %rdx pop %rsi pop %rax diff --git a/x86/sieve.c b/x86/sieve.c index ef4a5a0..6cbcd6d 100644 --- a/x86/sieve.c +++ b/x86/sieve.c @@ -46,6 +46,6 @@ int main() test_sieve("virtual", v, VSIZE); vfree(v); } - + return 0; } diff --git a/x86/vm.c b/x86/vm.c index 62b3ba8..b34449f 100644 --- a/x86/vm.c +++ b/x86/vm.c @@ -30,7 +30,7 @@ void *alloc_page() if (!free) return 0; - + p = free; free = *(void **)free; @@ -184,7 +184,7 @@ void *vmalloc(unsigned long size) unsigned pages; size += sizeof(unsigned long); - + size = (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1); vfree_top -= size; mem = p = vfree_top; @@ -201,7 +201,7 @@ void *vmalloc(unsigned long size) void vfree(void *mem) { unsigned long size = ((unsigned long *)mem)[-1]; - + while (size) { free_page(phys_to_virt(get_pte(phys_to_virt(read_cr3()), mem) & PTE_ADDR)); mem += PAGE_SIZE; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 01/10] Do not track config.mak and kvmtrace
config.mak was generated by configure and kvmtrace were compiled throuh makefile. Signed-off-by: Jason Wang --- config.mak |8 kvmtrace | Bin 2 files changed, 0 insertions(+), 8 deletions(-) delete mode 100644 config.mak delete mode 100755 kvmtrace diff --git a/config.mak b/config.mak deleted file mode 100644 index 333ec7c..000 --- a/config.mak +++ /dev/null @@ -1,8 +0,0 @@ -PREFIX=/usr/local -KERNELDIR=/home/avi/kvm/linux-2.6 -ARCH=x86_64 -PROCESSOR=x86_64 -CC=gcc -LD=ld -OBJCOPY=objcopy -AR=ar diff --git a/kvmtrace b/kvmtrace deleted file mode 100755 index d4610290bd47922b276f6cd3604972b07bb2e07d.. GIT binary patch literal 0 HcmV?d1 literal 36834 zcmd6Q3w%_?_5a-6WZ8ryTp&nzC<_S?9tj~lF)thph?k{ib-...@tdxc ze60}$wf&)Oh1%AlElTSPK_M!N*lMlTM_XH|Rl5-%C_YfD`G3zmcJFSo{r&s...@e}4bz z|0+9k&YW}R%$YN1=H5GVv(;I=(xhojT_$!OBdAQbbCmdHtfq^D61Rw1nVn5z>C6I5 zJpS2v38f;Z(mp9#rB+F2MjA(XlSG*8A|u...@0%9sv8d6i2g)h6ziZp4u09DGEjU$MU zbk$|kke00prhmpe...@#;v...@a4wmgIFvvaxrI)qRY3rniFIAxDffT-?N{G-04@)2D}t}J})mAfb1`0AYIZT%lw z|L_Nm%4XxAvOkNbToShm;j{2>@u$5LbB}&G>cP9eB(XzY4zLYXcu5TW2}l`={`wg7 z12O2Q#Gp?Cy$%1OuL(fdqLF|v...@tu@x+kd7K1)h...@~&=(}Uk?}&jv6hqFg81%(4 z=-Xq^Z;l...@+9fsu#g3zysz<(73?}{P+bPW2ZV&LzOfnN|qer62(+8A=OW6*yWgZ}0i z_|h11?ukLaI|lyZ7;OjFT4~c6R62jXh zUd6>!NnaO5&k*;z{1oTrGPk>>-P_^z`Rlv2^**1+N7SAE zbzPqNCSu895^!JYZS4pFjVSkfAkNd()!}uwc^m8ft=PVtqltAwgtv>ecXvR_dgg0w>8NjG?dzLbyI70I-`U#4 zeEu%K7r6HNPUdS|=...@~ljptit=gkl@|7Osvg&*6(>(I~UdIZEIuAjcs0^ zM~LW#jQTc|q...@jex-4DZm);wd>x8(v#-(H0ef_LJgl&|C@ywzg#*eq5c0d2!z}ei4gye=5h...@ztjyh1fb)uX}uJ0?lwg zh9A+Tf%-C}R8E(f...@nhdr?j}l2nat2q%x4jisyv@tpwke...@ngi9p{flz73w-$t zJK<^GRF`!|9nx?%sunau170pci6l...@2{ohunob9v*k-;En6990T6C?pS2N!!V)C zZonsou#CA3_~8b8nE_98t-5Lq_+*7dT4%tg81U;1_z?ztrvb0}l8...@fnzm+yesp z+`GYmA8nxDVZhTGNnN`P_%RBJbdLdlmI1%lfFEnX?>FGn4EO^E{5S)?&ww9qz&~fe z)7nK{FB$M^4Nhfm81QE+Vx%Vw_;U>SpaDP0fInrx+YIl;8;j=Ww`zU>aim zHV&^Ln1)ur&f#)`...@m4xdjj4w<55-%|VY2(}S?g2PJ)rlHgS5{GjMrXkbc$Kja- z(@^Q(&*3Qq(-7(3!{PA+)6nSO!QoK^(~#)j#^FSQX(;q...@a&L4T1hTz;iz=3jE;` zSKw_|&pW5eDvOTwb)I8qu47Mb@a*x`bYyL#j6(`?ljzj...@dg|N z9mcE56?nIT2)8Zon`p!Qy6QmY^g8D19di(%E6YZ}lV{AX?KK0g#eeoq9WiMebirVG zN^=Fqd>=#usjqUX-Z9q`%i=$Gz5Q&w4_S?>TYLQJ=H}+?5tFVVHdf{8Sw8kG8w3R2 z^^ZD0DALV(bnk5...@orx`^z?e2xknf!bkt...@d45}v(m(hHzAR|s=CubaN0^%^|4tyMJfxWKn+njV{ z-6U9V2HfD}$S(>qELwMBZ+s%Efrc;iUal7hJ}nM>6kLbu9f6}jo_lTIR!49b*n1WY z_s?)`U!UrNynh1W3iS7mKHLt0uHNOBULl-~Gu^&<1_& z#g2;{7dtL_nztkHjU(`xtM^C^hF);PHvU-YTTKc*X zX>*Nupfk-i<2...@+*x}yuxqny!1dv+j)`0O+kK9Pc`vA}ZiX;6$gn;7SUM^o~Q#`Ae zs&n...@kcejmqqvkte=pu~xlg6qpr4e0k=;tax0a5lo...@rd%m?f`5wb0p+t#a0&wd z7!HBA25##_KJXN;{...@rx}o$yft%ba&eB5k!Eu7}3W#(RRS{d5IfrFW~|gce$...@#! z$vs2{Cqu>uPwvpxb}Q#xOcG&IWNW`zN}# zKc2AzG>7j%(}xit0c{kv...@mui<6Q$5{59XYD5Dc$|zw!+iTgyyOU3EZ6}C*m_-X zDpc!PWNO~wPd)NuGL;{Q14Pq5K#Z!c4}l6EKpD7l6*gGN8eRyynG-Wxtru9 zh2%WS<@}1*zPj<9^S56ymGursi~bI_!;gg;@%~o>1bag...@yg039i8*w>vJs`8J} zoa=Y80bOol!FhtRj*Kge{utp8qE<+HHhB6BDeWJ3@+RS zABCEa^-$NZ8z~Lzy6Px(0PkWR1Os0?4)o2cV~_2{a^UbQq-^i<<73sJEQEAn;C9p+ z3M0K|zX+0^%%$Powj)g1Nk& zr#NNL5|f`Mfy1uJyeZ!yq6V)50r&lr`L4iw2T3wU%OS`_gx!1=WClw}+*`pG-~{D3 zK(#~>_~$`dL-5*z*TV|1AxT*KbVbkdM<9N{f00mf0S32gmn#r|FR;DwyYY*D$~qRm z6FIsLTY0#?8ld2?q5^1;c;rj!%I6{c(6+2P*1I(Lw=XcD%oHxxK206f8^4E(>rOou ze=Bm_>D$of!3|U^1^SA>H^DyufZ~1NM#R7BgBwsQMq6+aDhfRF7}ORuqgAI}zl75^ zKIvLMfHm31(XJ-...@v^abu{ch2vohcdpozy^;?WW=n zdKz%TK0gDMu+Q_K4GbKlaqQ}yccc(@6aNY^CS*1d#Xr#tLiuD8*BieQSV-KBH)=+D~%DOr4c<}O2#{(DU9vCp_0y)TLbF$&CWAT%a+eTK~?$6L+ zl+l2L=RqAf=iDQApuS)ZszQAK3UwUbAv)ma$P1ICqfksR%@|0otFFkA65SEH?+G3NSkO8(9`#X(E^7RM z2It+MqV<8NsSDGwpg6pTx}bOYPvJ1a0+U0!G(iPd;C(sPj|IQ{gjmns1Jc8!-H|82 zE0~hN6nHui??;)4EL0YJ1}X^Kq;Xkq!zo%*Y~&OTpu|!Yh&q9&B?Q*3hb{M8?Y+xq z0u-|MqgvQ*!jWf5WEVstDE0y=BvXrz#eW00q^9%P%klUQ1m2$I$&kUeY<4yN)z$hJ zzughgjkd0+...@gc^)NS4kXS%Y=BX6T-!101^#rfZmeC-6TjyQjJT!_6&!h;l=zSv z...@vy!bx-xdch>2`{lTf>`e&!UA+y|1SSvR3%#5GI7DB40(PH-p9V5Drn3BSP{P#z+M zx$Z~yT&J<%...@5t1>=...@amg}ux8%|)w)JId1C~lUrGfW*4p1axBW{0k#3RSG=Uwr< zc=4(3_pW`p`_bU7G?Vt8{mBTMvI>Y{l`nx{Tb8+cZ`(xTu6l~54A+zICnC_iht>&A zzy5e&pxFFVBEfVor}?E>hroQr(W33QXDF^iH$Zg?Qm!eA_b_<56j#qu=...@1zk(3bH=aiA&Vgll`&CC;|g4qfxE*G78TFHUDLNk zfiGN7erqlYydHezBN*%25B(G8{)L{06bHU84t!A%_}DR!`lhSrY0b6xWY_zy-iv5e z)Rv(a_cmpu6(eUW?)7J67WZC}k%M`Ak-Du5-EYFbPYtu-X5-9&f...@o*~>!0h-)% z(u(_3OtUmvKA|ZecaY*fFjz{RPm2JWrZ8>{7k}(t...@k@rZ(A{>+$%!$wqGmY~E(u zR9~Xu9RPo%Kh-yfKTc=)^&Qjww)Xn<9...@-`vza;Uz#Ko9fG6zoFgVRp02zp4!AV z&SL6mdzBAQ{Fgwc_#c#WKsg6b{Qd4OPouZ1$;VbzuXb0KJMx|gf}#p...@ncr|t}i zsaim^dnk=ml=7%-...@fw;iZ7krj1psBdd+vf<%>v!}~OuODogQ`@>{...@ymaquzu0 zyh}dnsya1vq+?0j67?#f6z...@h)fWwRyHpyo2$9rwOv=GCz-P7l?-F~-tM+0>L_lNrY>)1C;FFOl5m5BEE1~11`BnQRhn!*Pood5 z;TH_q8{9kmz0&4<~u7YO3N$Ur6t8BGo+gmGK?h$eUOU;k zeXSiWXe?ekwE&E8psWUECB{aNx05)q{...@j??*2pn;>+^8X;^f15URJKm3hkG8)q zJ2pEXF=BJD)_Aktk( zFU34d)9%YiUqbp7(jZbRRumLw+pvbQAx))89cdbFmphSWBE11AU8y2r+V{$twn^6} zBn`9DJvia%;p-ykm`OARDWo!g$2K3S3;5p=A}>YHOG(R1u{lyQ9Vs~lDfavx%XRTL z#cem=j5_EE9zC;r2lSUhF3~$u(i|za{GPb$%r{9jcztA-X_&Ukkupb$p5J
[PATCH kvm-unit-tests 00/10] Minor cleanup and fix for 32bit test
The following series do some minor cleanup and fix for the 32bit test. --- Jason Wang (10): Do not track config.mak and kvmtrace Remove trailing whitespaces Makefile cleanup Correct the path in README Drop print.S Remove the duplicated rdmsr/wrmsr Correct the tss size Check whether in long mode before testing vmexit caused by cr8 access Do not test IA32_EFER in 32bit mode. Add the 32bit smp initialization code Makefile |9 -- README| 10 +-- config-x86-common.mak | 21 +++--- config.mak|8 -- kvmtrace | Bin x86/access.c |2 - x86/cstart.S | 179 +++-- x86/cstart64.S|6 +- x86/msr.c | 18 + x86/print.S | 31 x86/sieve.c |2 - x86/vm.c |6 +- x86/vmexit.c | 34 - 13 files changed, 215 insertions(+), 111 deletions(-) delete mode 100644 config.mak delete mode 100755 kvmtrace delete mode 100644 x86/print.S -- Jason Wang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On 08/24/2010 04:37 PM, Gleb Natapov wrote: On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote: On 08/24/2010 02:30 PM, Gleb Natapov wrote: x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. So now we have an undocumented -1/0/1 return code? Better to have an enum for this. We already have two. First is X86EMUL_ (not enum but close) for more or less internal emulator use. Second is EMULATE_* for users of emulate_instruction() now you want one more enum for communication between emulate_instruction() and x86_emulate_insn(). Lost in enums. emulate_instruction() and x86_emulate_insn() are tightly coupled right now should we define formal interface between them? May be comment will be enough? Can we reuse one or the other? Perhaps with extensions? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote: > On 08/24/2010 02:30 PM, Gleb Natapov wrote: > >x86_emulate_insn() will return 1 if instruction can be restarted > >without re-entering a guest. > > > > So now we have an undocumented -1/0/1 return code? > > Better to have an enum for this. > We already have two. First is X86EMUL_ (not enum but close) for more or less internal emulator use. Second is EMULATE_* for users of emulate_instruction() now you want one more enum for communication between emulate_instruction() and x86_emulate_insn(). Lost in enums. emulate_instruction() and x86_emulate_insn() are tightly coupled right now should we define formal interface between them? May be comment will be enough? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM timekeeping and TSC virtualization
On 08/23/10 23:47, Zachary Amsden wrote: > I've heard the rumor that TSC is orders of magnitude faster under VMware > than under KVM from three people now, and I thought you were part of > that camp. > > Needless to say, they are either laughably incorrect, or possess some > great secret knowledge of how to make things under virtualization go > faster than bare metal. > > I also have a magical talking unicorn, which, btw, is invisible. > Extraordinary claims require extraordinary proof (the proof of my > unicorn is too complex to fit in the margin of this e-mail, however, I > assure you he is real). I have put in a lot of time over the past 3 years to understand how the 'magic' of virtualization works; please don't lump me into camps until I raise my hand as being part of one. >> My point is that kvmclock is Red Hat's answer for the future -- RHEL6, >> RHEL5.Y (whenever it proves reliable). What about the present? What >> about products based on other distributions newer than RHEL5 but >> pre-kvmclock? >> > > It should be obvious from this patchset... PIT or TSC. > > KVM did not have an in-kernel PIT implementation circa 2008, so this > data is quite old. It's much faster now and will continue to get faster > as exit cost goes down and the emulation gets further optimized. It was in-kernel pit in early 2008 (kernel git entry): commit 7837699fa6d7adf81f26ab73a5f6897ea1ab9d6a Author: Sheng Yang Date: Mon Jan 28 05:10:22 2008 +0800 KVM: In kernel PIT model > > Plus, now we have an error-free TSC. > >> There are a lot of moving windows of what to use as a clock source, not >> just per major number (RHEL4, RHEL5) but minor number (e.g., TSC >> stability on RHEL4 -- e.g., >> https://bugzilla.redhat.com/show_bug.cgi?id=491154) and further >> maintenance releases (kvmclock requiring RHEL5.5+). That is not very >> friendly to a product making a transition to virtualization - and with >> the same code base running bare metal or in a VM. >> > > If you have old software running on broken hardware you do not get > hardware performance and error-free time virtualization. With any > vendor. Period. Sucks to be old *and* broken. But old with fancy new wheels, er hardware -- like commodity x86 servers running Nehalem-based processors -- is a different story. > > With this patchset, KVM now has a much stronger guarantee: If you have > old guest software running on broken hardware, using SMP virtual > machines, you do not get hardware performance and error-free time > virtualization.However, if you have new guest software, non-broken > hardware, or can simply run UP guests instead of SMP, you can have > hardware performance, and it is now error free. Alternatively, you can > sacrifice some accuracy and have hardware performance, even for SMP > guests, if you can tolerate some minor cross-CPU TSC variation. No > other vendor I know of can make that guarantee. > > Zach If the processor has a stable TSC why trap it? I realize you are trying to cover a gauntlet of hardware and guests, so maybe a nerd knob is needed to disable. David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call cancelled [was: Re: KVM call agenda for August 24]
* Anthony Liguori (anth...@codemonkey.ws) wrote: > On 08/23/2010 05:30 PM, Chris Wright wrote: > >Please send in any agenda items you are interested in covering. > > There are quite a few important discussions on the list but I think > they should stay on the list right now. > > So it sounds like we don't have an agenda for today. No agenda, so this week's call is cancelled. thanks, -chris -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] Rename PCI_FUNC_MAX to PCI_FUNCTIONS_PER_DEVICES in pci.[ch]
PCI_FUNC_MAX is introduced by 6eab3de16d36c48a983366b09d0a0029a5260bc3 and 6fa84913eccec4266a27c81ae88465f6790742b9 which should be safe to rename to PCI_FUNCTIONS_PER_DEVICES. Signed-off-by: Ken CC --- hw/pci.c |4 ++-- hw/pci.h |1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index fc4becd..3901455 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -711,7 +711,7 @@ static int pci_init_multifunction(PCIBus *bus, PCIDevice *dev) return 0; } /* function 0 indicates single function, so function > 0 must be NULL */ -for (func = 1; func < PCI_FUNC_MAX; ++func) { +for (func = 1; func < PCI_FUNCTIONS_PER_DEVICE; ++func) { if (bus->devices[PCI_DEVFN(slot, func)]) { error_report("PCI: %x.0 indicates single function, " "but %x.%x is already populated.", @@ -750,7 +750,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, assert(devfn / PCI_FUNCTIONS_PER_DEVICE < PCIBUS_MAX_DEVICES); if (devfn < 0) { for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices); -devfn += PCI_FUNC_MAX) { +devfn += PCI_FUNCTIONS_PER_DEVICE) { if (!bus->devices[devfn]) goto found; } diff --git a/hw/pci.h b/hw/pci.h index eb97b76..f6fb6d8 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -16,7 +16,6 @@ struct kvm_irq_routing_entry; #define PCI_DEVFN(slot, func) slot) & 0x1f) << 3) | ((func) & 0x07)) #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f) #define PCI_FUNC(devfn) ((devfn) & 0x07) -#define PCI_FUNC_MAX8 #define PCI_FUNCTIONS_PER_DEVICE 8 #define PCIBUS_MAX_DEVICES 32 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] Check pci slot number against PCIBUS_MAX_DEVICES in parse_pci_devfn
If pci addr provided from command line is bigger than 32, PCIBUS_MAX_DEVICES, return error -EINVAL. 32 << 3 | 7 == 256 (PCIBUS_MAX_FUNCTIONS) PCIBUS_MAX_FUNCTIONS = PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE Signed-off-by: Ken CC --- hw/qdev-properties.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 9219cd7..565fd08 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -1,5 +1,5 @@ #include "net.h" -#include "qdev.h" +#include "pci.h" #include "qerror.h" void *qdev_get_prop_ptr(DeviceState *dev, Property *prop) @@ -514,6 +514,8 @@ static int parse_pci_devfn(DeviceState *dev, Property *prop, const char *str) return -EINVAL; } } +if (slot >= PCIBUS_MAX_DEVICES) +return -EINVAL; if (str[n] != '\0') return -EINVAL; if (fn > 7) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] pci init: fail qemu if devfn exceeding the max function number supported on bus
Check if devfn < PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE Signed-off-by: Ken CC --- hw/pci.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 9234fe3..fc4becd 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -747,6 +747,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, PCIConfigWriteFunc *config_write, bool is_bridge) { +assert(devfn / PCI_FUNCTIONS_PER_DEVICE < PCIBUS_MAX_DEVICES); if (devfn < 0) { for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices); devfn += PCI_FUNC_MAX) { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] PCI: define PCIBUS_MAX_DEVICES and PCI_FUNCTIONS_PER_DEVICE in pci.h
And update the max function number used in struct PCIBus{} to PCIBUS_MAX_FUNCTIONS = PCI_FUNCTIONS_PER_DEVICE * PCIBUS_MAX_DEVICES TODO: according to Avi Kivity, PCIBus.devices[] should be renamed to functions[] Signed-off-by: Ken CC --- hw/pci.c |2 +- hw/pci.h |2 ++ 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 70dbace..9234fe3 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -47,7 +47,7 @@ struct PCIBus { pci_hotplug_fn hotplug; DeviceState *hotplug_qdev; void *irq_opaque; -PCIDevice *devices[256]; +PCIDevice *devices[PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE]; PCIDevice *parent_dev; target_phys_addr_t mem_base; diff --git a/hw/pci.h b/hw/pci.h index ccb99d0..eb97b76 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -17,6 +17,8 @@ struct kvm_irq_routing_entry; #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f) #define PCI_FUNC(devfn) ((devfn) & 0x07) #define PCI_FUNC_MAX8 +#define PCI_FUNCTIONS_PER_DEVICE 8 +#define PCIBUS_MAX_DEVICES 32 /* Class, Vendor and Device IDs from Linux's pci_ids.h */ #include "pci_ids.h" -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
Am 24.08.2010 14:22, schrieb Avi Kivity: > First of all we need a virtio/s390 specification, like we have a > virtio/pci spec. Here is something that I started a year ago but never finished. Christian guest/host interface for s390/virtio devices KVM_DEVICE_DESCRIPTOR PAGE 0816 2431 +++++ 0 | type | num_vq | featlen| conflen| +++++ 1 | status | | ++ +-->+ 2 | config| | + + | ... | | +-+ | | CONFIG ARRAY | 0816 2431 +>+++++ 0 | | + + 1 | | + + 2 | | +virtqueue 0+-->+ 3 | | | + + | 4 | | | + + | 5 | | | +++++ | |///+ | |///+ | +++++ | | | | + + | | | | + + | | | | +virtqueue num_vq - 1 +-->+ | | | + + | | | | + + | | | | +++++ | num_vq*6 | feature bits (featlen * 2 Bytes) | | + +++ | | | | | +++ + | | config space (conflen Bytes) | | + ++ | | || ++++| | | +-+ | | VIRTQUEUE | 0816 2431 |>+++++ 0 | interrupt token ... | + + 1 | ... set by guest | +++++ 2 | virtio ring address | + + 3 | ... set by host | +++++ 4 | number of elems |/+ +++/+ 5 |///+ +++++ COMPLETE KVM_DEVICE_DESCRIPTOR PAGE 0816 2431 +++++ 0 | type | num_vq | featlen| conflen| +++++ 1 | status | interrupt token | ++ + 2 | virtqueue 0(set by guest) | +++++ 3 || virtio ring address | ++ + 4 | virtqueue 0 (set by host) | +++++ 5 || number of descr || +++++ 6 |///| +++++ 7 || interrupt token| +---
Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function
On 08/24/2010 04:20 PM, Gleb Natapov wrote: +{ + struct decode_cache *c =&ctxt->decode; + + /* All REP prefixes have the same first termination condition */ + if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) + return true; This is checked during the beginning of the instruction, not after completion. Why is it here? it will just be duplicated. SDM describes REP instruction algorithm this way: WHILE CountReg ≠ 0 DO Service pending interrupts (if any); Execute associated string instruction; CountReg ← (CountReg – 1); IF CountReg = 0 THEN exit WHILE loop; FI; IF (Repeat prefix is REPZ or REPE) and (ZF = 0) or (Repeat prefix is REPNZ or REPNE) and (ZF = 1) THEN exit WHILE loop; FI; OD; So CountReg is checked at the beginning and after each iteration. The second check is meaningless (and ZF checks should be qualified with the actual instruction). Practically it will save us one return to a guest and exit back to emulator at the end of rep instruction (not a big deal). Not even that - if we reenter to the beginning of the rep instruction the cpu will skip over it without exiting (unless in big real mode with eigs=1). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function
On Tue, Aug 24, 2010 at 04:11:20PM +0300, Avi Kivity wrote: > On 08/24/2010 02:30 PM, Gleb Natapov wrote: > >Signed-off-by: Gleb Natapov > >--- > > arch/x86/kvm/emulate.c | 42 +- > > 1 files changed, 29 insertions(+), 13 deletions(-) > > > >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > >index f9f8353..d34d706 100644 > >--- a/arch/x86/kvm/emulate.c > >+++ b/arch/x86/kvm/emulate.c > >@@ -2921,6 +2921,32 @@ done: > > return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; > > } > > > >+static bool string_inst_completed(struct x86_emulate_ctxt *ctxt) > > s/inst/insn/. > > >+{ > >+struct decode_cache *c =&ctxt->decode; > >+ > >+/* All REP prefixes have the same first termination condition */ > >+if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) > >+return true; > > This is checked during the beginning of the instruction, not after > completion. Why is it here? it will just be duplicated. > SDM describes REP instruction algorithm this way: WHILE CountReg ≠ 0 DO Service pending interrupts (if any); Execute associated string instruction; CountReg ← (CountReg – 1); IF CountReg = 0 THEN exit WHILE loop; FI; IF (Repeat prefix is REPZ or REPE) and (ZF = 0) or (Repeat prefix is REPNZ or REPNE) and (ZF = 1) THEN exit WHILE loop; FI; OD; So CountReg is checked at the beginning and after each iteration. Practically it will save us one return to a guest and exit back to emulator at the end of rep instruction (not a big deal). > >+ > >+/* The second termination condition only applies for REPE > >+ * and REPNE. Test if the repeat string operation prefix is > >+ * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the > >+ * corresponding termination condition according to: > >+ * - if REPE/REPZ and ZF = 0 then done > >+ * - if REPNE/REPNZ and ZF = 1 then done > >+ */ > >+if (((c->b == 0xa6) || (c->b == 0xa7) || > >+ (c->b == 0xae) || (c->b == 0xaf)) > >+&& (((c->rep_prefix == REPE_PREFIX)&& > >+ ((ctxt->eflags& EFLG_ZF) == 0)) > >+|| ((c->rep_prefix == REPNE_PREFIX)&& > >+((ctxt->eflags& EFLG_ZF) == EFLG_ZF > >+return true; > >+ > >+return false; > >+} > >+ > > -- > error compiling committee.c: too many arguments to function -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
On 08/24/2010 02:30 PM, Gleb Natapov wrote: x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. So now we have an undocumented -1/0/1 return code? Better to have an enum for this. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -kvm] kvm: fix regression from rework KVM mmu_shrink() code
On Tue, Aug 24, 2010 at 10:31:07AM +0800, Xiaotian Feng wrote: > Latest kvm mmu_shrink code rework makes kernel changes > kvm->arch.n_used_mmu_pages/ > kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is > called > by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or > kvm_mmu_available_pages(vcpu->kvm) is unchanged after > kvm_mmu_prepare_zap_page(), > This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever. > Moving kvm_mmu_commit_zap_page would make the while loop performs as normal. > > Reported-by: Avi Kivity > Signed-off-by: Xiaotian Feng > Tested-by: Avi Kivity > Cc: Marcelo Tosatti > Cc: Dave Hansen > Cc: Tim Pepper > --- > arch/x86/kvm/mmu.c |7 +++ > 1 files changed, 3 insertions(+), 4 deletions(-) Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function
On 08/24/2010 02:30 PM, Gleb Natapov wrote: Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 42 +- 1 files changed, 29 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f9f8353..d34d706 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2921,6 +2921,32 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static bool string_inst_completed(struct x86_emulate_ctxt *ctxt) s/inst/insn/. +{ + struct decode_cache *c =&ctxt->decode; + + /* All REP prefixes have the same first termination condition */ + if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) + return true; This is checked during the beginning of the instruction, not after completion. Why is it here? it will just be duplicated. + + /* The second termination condition only applies for REPE +* and REPNE. Test if the repeat string operation prefix is +* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the +* corresponding termination condition according to: +* - if REPE/REPZ and ZF = 0 then done +* - if REPNE/REPNZ and ZF = 1 then done +*/ + if (((c->b == 0xa6) || (c->b == 0xa7) || +(c->b == 0xae) || (c->b == 0xaf)) + && (((c->rep_prefix == REPE_PREFIX)&& +((ctxt->eflags& EFLG_ZF) == 0)) + || ((c->rep_prefix == REPNE_PREFIX)&& + ((ctxt->eflags& EFLG_ZF) == EFLG_ZF + return true; + + return false; +} + -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for August 24
On 08/23/2010 05:30 PM, Chris Wright wrote: Please send in any agenda items you are interested in covering. There are quite a few important discussions on the list but I think they should stay on the list right now. So it sounds like we don't have an agenda for today. Regards, Anthony Liguori thanks, -chris -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Having trouble with ballooning
Am 23.08.2010 20:16, schrieb Marcelo Tosatti: On Thu, Aug 05, 2010 at 01:41:16PM +0200, Moritz Duge wrote: Hi, I had some trouble while using the ballooning feature of KVM (using Ubuntu 10.04 with standard software versions). The first scenario: 1. Having a guest started by this command: qemu -enable-kvm -m 768 -balloon virtio -cdrom linux_2.6.34.iso The guest is running Linux 2.6.34 including the ballooning driver. 2. Entering "balloon 256" and a few seconds later "info balloon" in the Qemu monitor. Qemu will report, the guest uses 256mb of memory now. The guest is reporting the same (using "free" for example). 3. Entering "change ide1-cd0 linux_2.6.18.iso" to change the guests CD-ROM to another image, containing a Linux kernel without ballooning driver. 4. Rebooting the guest. 5. After booting the 2.6.18-OS, it will report it has 768mb memory (using "free). But Qemu monitor will still tell 256, when entering "info memory". I know why this happens. But is this a good behaviour? Shouldn't Qemu tell something like "maybe 256, but there is no more balloon driver in the guest and maybe it uses the full 768 now"??? What version of qemu-kvm are you using? Reporting should work properly with a recent qemu-kvm version. I tested it on 2 systems: - Ubuntu 10.04 and it's native version of Qemu+KVM (it's a 0.12.3) - Debian 5.0.5 on which I compiled a 2.6.34 kernel, it's KVM module and Qemu 0.12.4 from sources. On both systems, the command "info balloon" outputs the value I set, when the ballooning has been still active. And it continues to report this value, also if the guest has no more ballooning driver. It also continues to report this value, when the guest starts to use more, then the ballooned down memory. (If you have a look at scenario one: If the guests starts to use more then 256mb, up to 768, "info balloon" still reports 256) For testing I actually booted a Debian 5.0.5 guest, ballooned down the memory to 256, inserted a Kubuntu 6.06 cdrom-iso and rebooted the guest from that cdrom-iso. Kubuntu 6.06 comes with a 2.6.18 kernel, so it has no virtio support at all. After that, I created a tmpfs-mount in the Kubuntu guest and filled it using "dd if=/dev/hdc of=/tmpfs-mount/foobar". The second scenario: After the first scenario, the guest can also really start using the additional 512mb of memory (768 - 256)!!! I think this shouldn't happen or at least there should be an option to allow or deny this. Or at least least least this should be printed in big letters in the man-pages or somewhere else where everyone will read it! Because before I experienced this, I assumed I can be sure the guest can't get back the memory which was freed using ballooning. So if I use the memory freed by ballooning for some other qemu-instances and the first one starts to use those memory again, all Qemu instances will crash (this is what actually happens in most cases). What I'm asking for, is a way to force the guest to stay in the memory I assigned by ballooning. And if the guest tries to use more memory (maybe because it just unloaded the ballooning driver) the guest should crash, but the host shouldn't get in any trouble!!! This can be really annoying. I think a very common use-case for virtualization is, to run untrusted software or unsecure webservices in a vm, so the bad software can't do anything to the host or other VMs on the host. But when using ballooning, the bad software can! It's no "remote code execution", but the guest can consume a lot of memory and cause the host or at least the other VMs on the host to crash. Ballooning requires guest cooperation. If you want to enforce memory limits, take a look at cgroups (Documentation/cgroups/memory.txt in the kernel source tree). Wohooo! That's what I was looking for! It's just a little sad, that I have to guess the overhead made by the Qemu process itself, to add it to the memory limit I want to set. A hard ballooning implemented by Qemu would look a little "cleaner" I think. But for the beginning this is a nice way to do it, also because it gives the possibility to use the swap if the guest ignores the ballooning, so the guest doesn't has to crash. The third scenario: 1. Booting a machine with a guest not having a ballooning driver. (e.g. qemu -enable-kvm -m 768 -balloon virtio -cdrom linux_2.6.18.iso) 2. Adjusting the memory by "balloon 512" in the qemu monitor. 3. Qemu won't report that it couldn't adjust the memory. Instead it will wait until the guest loads a ballooning driver. Is this a good behaviour? Shouldn't there be at least a switch in the qemu monitor for the command "balloon". So if I use the switch when changing the memory (e.g. "balloon -h 256"), qemu won't try to change the memory later and it will tell me "error: no ballooning driver found". You see that that guest has not ballooned down with the output from "info balloon". Thanks for reading and thanks a lot, if there will be a solution for this, s
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
On 08/24/2010 03:32 PM, Alexander Graf wrote: Perhaps we should freeze virtio/s390 development until someone feels sufficiently motivated. Sure, go ahead. I don't think that'll help anyone but if it makes you feel good... I don't maintain virtio or the virtio-s390 interface, so I can't freeze it, but I think we should do so. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.
On 08/24/2010 03:28 PM, Gleb Natapov wrote: On Mon, Aug 23, 2010 at 07:17:20PM +0300, Avi Kivity wrote: +static int apf_put_user(struct kvm_vcpu *vcpu, u32 val) +{ + if (unlikely(vcpu->arch.apf_memslot_ver != +vcpu->kvm->memslot_version)) { + u64 gpa = vcpu->arch.apf_msr_val& ~0x3f; + unsigned long addr; + int offset = offset_in_page(gpa); + + addr = gfn_to_hva(vcpu->kvm, gpa>> PAGE_SHIFT); + vcpu->arch.apf_data = (u32 __user *)(addr + offset); + if (kvm_is_error_hva(addr)) { + vcpu->arch.apf_data = NULL; + return -EFAULT; + } + } + + return put_user(val, vcpu->arch.apf_data); +} This nice cache needs to be outside apf to reduce complexity for reviewers and since it is useful for others. Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest(). Something like this? (only compile tested) Yes, exactly. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
Avi Kivity wrote: > On 08/24/2010 03:25 PM, Alexander Graf wrote: >> Avi Kivity wrote: >>> On 08/24/2010 03:14 PM, Christian Borntraeger wrote: I have no strong opinion on that, but I think its more a matter of where to put an interface description. A header file seems just the right place. I will let you (or Rusty) decide. >>> First of all we need a virtio/s390 specification, like we have a >>> virtio/pci spec. >> Sure, go ahead and write one :). All the bits are open. >> No seriously, I've wanted to write one for quite a while but this is not >> the right patch set for this. > > Perhaps we should freeze virtio/s390 development until someone feels > sufficiently motivated. Sure, go ahead. I don't think that'll help anyone but if it makes you feel good... Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
On 08/24/2010 03:25 PM, Alexander Graf wrote: Avi Kivity wrote: On 08/24/2010 03:14 PM, Christian Borntraeger wrote: I have no strong opinion on that, but I think its more a matter of where to put an interface description. A header file seems just the right place. I will let you (or Rusty) decide. First of all we need a virtio/s390 specification, like we have a virtio/pci spec. Sure, go ahead and write one :). All the bits are open. No seriously, I've wanted to write one for quite a while but this is not the right patch set for this. Perhaps we should freeze virtio/s390 development until someone feels sufficiently motivated. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.
On Mon, Aug 23, 2010 at 07:17:20PM +0300, Avi Kivity wrote: > > > >+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val) > >+{ > >+if (unlikely(vcpu->arch.apf_memslot_ver != > >+ vcpu->kvm->memslot_version)) { > >+u64 gpa = vcpu->arch.apf_msr_val& ~0x3f; > >+unsigned long addr; > >+int offset = offset_in_page(gpa); > >+ > >+addr = gfn_to_hva(vcpu->kvm, gpa>> PAGE_SHIFT); > >+vcpu->arch.apf_data = (u32 __user *)(addr + offset); > >+if (kvm_is_error_hva(addr)) { > >+vcpu->arch.apf_data = NULL; > >+return -EFAULT; > >+} > >+} > >+ > >+return put_user(val, vcpu->arch.apf_data); > >+} > > This nice cache needs to be outside apf to reduce complexity for > reviewers and since it is useful for others. > > Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest(). > Something like this? (only compile tested) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c13cc48..9aa3dd2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -168,10 +168,18 @@ struct kvm_irq_routing_table {}; struct kvm_memslots { int nmemslots; + u32 generation; struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; }; +struct gfn_to_hva_cache { + u32 generation; + gpa_t gpa; + unsigned long hva; + struct kvm_memory_slot *memslot; +}; + struct kvm { spinlock_t mmu_lock; raw_spinlock_t requests_lock; @@ -315,12 +323,16 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, int offset, int len); int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, unsigned long len); +int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, + void *data, unsigned long len); int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len); int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, +gfn_t gfn); void kvm_vcpu_block(struct kvm_vcpu *vcpu); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b78b794..512cf9b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -685,6 +685,7 @@ skip_lpage: memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); if (mem->slot >= slots->nmemslots) slots->nmemslots = mem->slot + 1; + slots->generation++; slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID; old_memslots = kvm->memslots; @@ -721,6 +722,7 @@ skip_lpage: memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots)); if (mem->slot >= slots->nmemslots) slots->nmemslots = mem->slot + 1; + slots->generation++; /* actual memory is freed via old in kvm_free_physmem_slot below */ if (!npages) { @@ -1175,6 +1177,36 @@ int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, return 0; } +int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, + void *data, unsigned long len) +{ + int r; + gfn_t gfn = ghc->gpa >> PAGE_SHIFT; + struct kvm_memslots *slots = kvm_memslots(kvm); + + if (slots->generation != ghc->generation) { + int offset = offset_in_page(ghc->gpa); + + ghc->hva = gfn_to_hva(kvm, gfn); + if (!kvm_is_error_hva(ghc->hva)) + ghc->hva += offset; + ghc->generation = slots->generation; + + ghc->memslot = gfn_to_memslot(kvm, gfn); + } + + if (kvm_is_error_hva(ghc->hva)) + return -EFAULT; + + r = copy_to_user((void __user *)ghc->hva, data, len); + if (r) + return -EFAULT; + mark_page_dirty_in_slot(kvm, ghc->memslot, gfn); + + return 0; +} +EXPORT_SYMBOL_GPL(kvm_write_guest_cached); + int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len) { return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len); @@ -1200,11 +1232,9 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len) } EXPORT_SYMBOL_GPL(kvm_clear_guest); -void mark_page_dirty(struct kvm *kvm, gfn_t gfn) +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, +gfn_t gfn) { - struct kvm_memory_slot *memslot; - -
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
Avi Kivity wrote: > On 08/24/2010 03:14 PM, Christian Borntraeger wrote: >> >> I have no strong opinion on that, but I think its more a matter of where >> to put an interface description. A header file seems just the right >> place. >> I will let you (or Rusty) decide. > > First of all we need a virtio/s390 specification, like we have a > virtio/pci spec. Sure, go ahead and write one :). All the bits are open. No seriously, I've wanted to write one for quite a while but this is not the right patch set for this. > Second, I agree it should be in an exported header file, even if qemu > doesn't make use of it. It's an interface and should be exported. Hrm. *shrug* if you think it makes sense. I'm reasonably indifferent either way, but I don't see value-add in it. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
On 08/24/2010 03:14 PM, Christian Borntraeger wrote: I have no strong opinion on that, but I think its more a matter of where to put an interface description. A header file seems just the right place. I will let you (or Rusty) decide. First of all we need a virtio/s390 specification, like we have a virtio/pci spec. Second, I agree it should be in an exported header file, even if qemu doesn't make use of it. It's an interface and should be exported. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On 08/24/2010 03:16 PM, Chen Cao wrote: On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote: On 08/24/2010 03:07 PM, Isaku Yamahata wrote: On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: On 08/24/2010 02:35 PM, Isaku Yamahata wrote: Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. These aren't slots, are they? They are functions. The function checks if given $slot.$fn (or $slot) is valid. So it's slots. max 32. +assert(devfn< PCIBUS_MAX_DEVICES); Looks like we're comparing a function number to PCIBUS_MAX_DEVICES. PCIBUS_MAX_DEVICES is the size of PCIBus.devices[], I have added it in the first patch at the defination of struct PCIBus, line 50 hw/pci.c. so i think the better name of the macro should be PCIBUS_MAX_FN, right? Or make it 32 and scale it by PCI_FUNCTIONS_PER_DEVICE. PCIBus.devices[] should be renamed to functions[] (later). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote: > On 08/24/2010 03:07 PM, Isaku Yamahata wrote: > >On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: > >> On 08/24/2010 02:35 PM, Isaku Yamahata wrote: > >>>Add Cc: m...@redhat.com. > >>> > >>>MAX_PCI_SLOTS should be in pci.h instead of qdev.h? > >>>And the name should be start with PCI_ prefix for consistency? > >>> > >>>Except that, the patches look okay. > >>> > >>These aren't slots, are they? They are functions. > >The function checks if given $slot.$fn (or $slot) is valid. > >So it's slots. max 32. > > +assert(devfn< PCIBUS_MAX_DEVICES); > > > Looks like we're comparing a function number to PCIBUS_MAX_DEVICES. > PCIBUS_MAX_DEVICES is the size of PCIBus.devices[], I have added it in the first patch at the defination of struct PCIBus, line 50 hw/pci.c. so i think the better name of the macro should be PCIBUS_MAX_FN, right? Ken > -- > error compiling committee.c: too many arguments to function > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On 08/24/2010 03:24 PM, Isaku Yamahata wrote: On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote: On 08/24/2010 03:07 PM, Isaku Yamahata wrote: On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: On 08/24/2010 02:35 PM, Isaku Yamahata wrote: Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. These aren't slots, are they? They are functions. The function checks if given $slot.$fn (or $slot) is valid. So it's slots. max 32. +assert(devfn< PCIBUS_MAX_DEVICES); Looks like we're comparing a function number to PCIBUS_MAX_DEVICES. Ah, you're talking about 2/3. I talked about 3/3. You're right. The name is misleading. PCIBUS_MAX_FUNCTIONS? Or suggestions? Or assert(devfn / 8 < PCIBUS_MAX_DEVICES) (and change that to be 32). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote: > On 08/24/2010 03:07 PM, Isaku Yamahata wrote: >> On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: >>> On 08/24/2010 02:35 PM, Isaku Yamahata wrote: Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. >>> These aren't slots, are they? They are functions. >> The function checks if given $slot.$fn (or $slot) is valid. >> So it's slots. max 32. > > +assert(devfn< PCIBUS_MAX_DEVICES); > > > Looks like we're comparing a function number to PCIBUS_MAX_DEVICES. Ah, you're talking about 2/3. I talked about 3/3. You're right. The name is misleading. PCIBUS_MAX_FUNCTIONS? Or suggestions? -- yamahata -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
Am 24.08.2010 14:06, schrieb Alexander Graf: >>> #define VIRTIO_SUBCODE_64 0x0D00 >>> +#define VIRTIO_PARAM_MASK 0xff >>> +#define VIRTIO_PARAM_VRING_INTERRUPT 0x0 >>> +#define VIRTIO_PARAM_CONFIG_CHANGED0x1 >>> >> >> Maybe this should be exported in a header, something like >> arch/s390/include/asm/kvm_virtio.h? In that case this file >> must be added to Kbuild for make headers_install. >> > > While that thought sounds good at first, it's really no use for anyone, > right? I mean - in qemu we need to define the defines manually anyways > because we need to potentially be able to build the s390x target on > non-s390x, possibly on non-Linux. I have no strong opinion on that, but I think its more a matter of where to put an interface description. A header file seems just the right place. I will let you (or Rusty) decide. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] S390: take a full byte as ext_param indicator
Christian Borntraeger wrote: > Am 23.08.2010 23:31, schrieb Alexander Graf: > >> Currenty the ext_param field only distinguishes between "config change" and >> "vring interrupt". We can do a lot more with it though, so let's enable a >> full byte of possible values and constants to #defines while at it. >> > > Makes a lot of sense. > [...] > > >> #define VIRTIO_SUBCODE_64 0x0D00 >> +#define VIRTIO_PARAM_MASK 0xff >> +#define VIRTIO_PARAM_VRING_INTERRUPT0x0 >> +#define VIRTIO_PARAM_CONFIG_CHANGED 0x1 >> > > Maybe this should be exported in a header, something like > arch/s390/include/asm/kvm_virtio.h? In that case this file > must be added to Kbuild for make headers_install. > While that thought sounds good at first, it's really no use for anyone, right? I mean - in qemu we need to define the defines manually anyways because we need to potentially be able to build the s390x target on non-s390x, possibly on non-Linux. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On 08/24/2010 03:07 PM, Isaku Yamahata wrote: On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: On 08/24/2010 02:35 PM, Isaku Yamahata wrote: Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. These aren't slots, are they? They are functions. The function checks if given $slot.$fn (or $slot) is valid. So it's slots. max 32. +assert(devfn< PCIBUS_MAX_DEVICES); Looks like we're comparing a function number to PCIBUS_MAX_DEVICES. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote: > On 08/24/2010 02:35 PM, Isaku Yamahata wrote: >> Add Cc: m...@redhat.com. >> >> MAX_PCI_SLOTS should be in pci.h instead of qdev.h? >> And the name should be start with PCI_ prefix for consistency? >> >> Except that, the patches look okay. >> > > These aren't slots, are they? They are functions. The function checks if given $slot.$fn (or $slot) is valid. So it's slots. max 32. > There's a lot of slot/function confusion in qemu. Agreed. -- yamahata -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
On 08/24/2010 02:35 PM, Isaku Yamahata wrote: Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. These aren't slots, are they? They are functions. There's a lot of slot/function confusion in qemu. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)
On 08/24/2010 02:22 PM, Brian Gerst wrote: +dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code) +{ + if (!user_mode_vm(regs)&& fixup_exception(regs)) + return; + do_divide_error_user(regs, error_code); +} + #ifdef CONFIG_X86_64 /* Runs on IST stack */ dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code) Kernel mode exceptions should already be handled by do_trap(). This is unnecessary. Correct. Please ignore this patch. I've verified that the other two work without it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.
x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_emulate.h |1 - arch/x86/kvm/emulate.c | 47 --- arch/x86/kvm/x86.c | 16 ++-- 3 files changed, 30 insertions(+), 34 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f22e5da..5e120a7 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -220,7 +220,6 @@ struct x86_emulate_ctxt { /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; - bool restart; /* restart string instruction after writeback */ bool perm_ok; /* do not check permissions if true */ int exception; /* exception that happens during emulation or -1 */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d34d706..d98fc3b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -438,7 +438,6 @@ static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, ctxt->exception = vec; ctxt->error_code = error; ctxt->error_code_valid = valid; - ctxt->restart = false; } static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err) @@ -2618,9 +2617,6 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt) struct opcode opcode, *g_mod012, *g_mod3; struct operand memop = { .type = OP_NONE }; - /* we cannot decode insn before we complete previous rep insn */ - WARN_ON(ctxt->restart); - c->eip = ctxt->eip; c->fetch.start = c->fetch.end = c->eip; ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS); @@ -2977,10 +2973,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt) } if (c->rep_prefix && (c->d & String)) { - ctxt->restart = true; - /* All REP prefixes have the same first termination condition */ + /* +* counter == 0 termination condition is checked before +* instruction execution. +*/ if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) { - ctxt->restart = false; ctxt->eip = c->eip; goto done; } @@ -3422,26 +3419,26 @@ writeback: struct read_cache *r = &ctxt->decode.io_read; register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1); - - if (string_inst_completed(ctxt)) - ctxt->restart = false; - /* -* Re-enter guest when pio read ahead buffer is empty or, -* if it is not used, after each 1024 iteration. -*/ - else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) || -(r->end != 0 && r->end == r->pos)) { - ctxt->restart = false; - c->eip = ctxt->eip; + if (!string_inst_completed(ctxt)) { + /* +* Re-enter guest when pio read ahead buffer is empty +* or, if it is not used, after each 1024 iteration. +*/ + if ((r->end != 0 || c->regs[VCPU_REGS_RCX] & 0x3ff) && + (r->end == 0 || r->end != r->pos)) { + /* +* Reset read cache. Usually happens before +* decode, but since instruction is restarted +* we have to do it here. +*/ + ctxt->decode.mem_read.end = 0; + return 1; /* restart instruction */ + } + goto done; /* skip rip writeback */ } } - /* -* reset read cache here in case string instruction is restared -* without decoding -*/ - ctxt->decode.mem_read.end = 0; - if (!ctxt->restart) - ctxt->eip = c->eip; + + ctxt->eip = c->eip; done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4e179c5..131f2c8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4066,18 +4066,17 @@ int emulate_instruction(struct kvm_vcpu *vcpu, restart: r = x86_emulate_insn(&vcpu->arch.emulate_ctxt); - if (r) { /* emulation failed */ + if (r < 0) { /* emulation failed */ if (reexecute_instruction(vcpu, cr2)) return EMULATE_DONE; return handle_emulation_failure(vcpu); } - r = EMULATE_DONE; - - if (vcpu->arch.emulate_ctxt.exception >= 0) + if (vcpu->arch.emulate_ctxt.exception >= 0) {
[PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function
Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c | 42 +- 1 files changed, 29 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f9f8353..d34d706 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2921,6 +2921,32 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static bool string_inst_completed(struct x86_emulate_ctxt *ctxt) +{ + struct decode_cache *c = &ctxt->decode; + + /* All REP prefixes have the same first termination condition */ + if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) + return true; + + /* The second termination condition only applies for REPE +* and REPNE. Test if the repeat string operation prefix is +* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the +* corresponding termination condition according to: +* - if REPE/REPZ and ZF = 0 then done +* - if REPNE/REPNZ and ZF = 1 then done +*/ + if (((c->b == 0xa6) || (c->b == 0xa7) || +(c->b == 0xae) || (c->b == 0xaf)) + && (((c->rep_prefix == REPE_PREFIX) && +((ctxt->eflags & EFLG_ZF) == 0)) + || ((c->rep_prefix == REPNE_PREFIX) && + ((ctxt->eflags & EFLG_ZF) == EFLG_ZF + return true; + + return false; +} + int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) { @@ -3395,19 +3421,9 @@ writeback: if (c->rep_prefix && (c->d & String)) { struct read_cache *r = &ctxt->decode.io_read; register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1); - /* The second termination condition only applies for REPE -* and REPNE. Test if the repeat string operation prefix is -* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the -* corresponding termination condition according to: -* - if REPE/REPZ and ZF = 0 then done -* - if REPNE/REPNZ and ZF = 1 then done -*/ - if (((c->b == 0xa6) || (c->b == 0xa7) || -(c->b == 0xae) || (c->b == 0xaf)) - && (((c->rep_prefix == REPE_PREFIX) && -((ctxt->eflags & EFLG_ZF) == 0)) - || ((c->rep_prefix == REPNE_PREFIX) && - ((ctxt->eflags & EFLG_ZF) == EFLG_ZF + + + if (string_inst_completed(ctxt)) ctxt->restart = false; /* * Re-enter guest when pio read ahead buffer is empty or, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: x86 emulator: Rename variable that shadows another local variable.
Signed-off-by: Gleb Natapov --- arch/x86/kvm/emulate.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 017ae0c..f9f8353 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3393,7 +3393,7 @@ writeback: &c->dst); if (c->rep_prefix && (c->d & String)) { - struct read_cache *rc = &ctxt->decode.io_read; + struct read_cache *r = &ctxt->decode.io_read; register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1); /* The second termination condition only applies for REPE * and REPNE. Test if the repeat string operation prefix is @@ -3413,8 +3413,8 @@ writeback: * Re-enter guest when pio read ahead buffer is empty or, * if it is not used, after each 1024 iteration. */ - else if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) || -(rc->end != 0 && rc->end == rc->pos)) { + else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) || +(r->end != 0 && r->end == r->pos)) { ctxt->restart = false; c->eip = ctxt->eip; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn
Add Cc: m...@redhat.com. MAX_PCI_SLOTS should be in pci.h instead of qdev.h? And the name should be start with PCI_ prefix for consistency? Except that, the patches look okay. thanks, On Tue, Aug 24, 2010 at 02:49:27PM +0800, Ken CC wrote: > Define MAX_PCI_SLOTS as 0x1f, if pci addr provided from command line > is bigger than 0x1f, return error -EINVAL. > > 0x1f << 3 | 7 == 255 (PCIBUS_MAX_DEVICES - 1) > > Signed-off-by: Ken CC > --- > hw/qdev-properties.c |2 ++ > hw/qdev.h|3 +++ > 2 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c > index 9219cd7..96d814c 100644 > --- a/hw/qdev-properties.c > +++ b/hw/qdev-properties.c > @@ -514,6 +514,8 @@ static int parse_pci_devfn(DeviceState *dev, Property > *prop, const char *str) > return -EINVAL; > } > } > +if (slot > MAX_PCI_SLOTS) > +return -EINVAL; > if (str[n] != '\0') > return -EINVAL; > if (fn > 7) > diff --git a/hw/qdev.h b/hw/qdev.h > index 678f8b7..fcfe52e 100644 > --- a/hw/qdev.h > +++ b/hw/qdev.h > @@ -7,6 +7,9 @@ > #include "qemu-char.h" > #include "qemu-option.h" > > + > +#define MAX_PCI_SLOTS 0x1f > + > typedef struct Property Property; > > typedef struct PropertyInfo PropertyInfo; > > -- yamahata -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)
On Tue, Aug 24, 2010 at 7:10 AM, Avi Kivity wrote: > KVM wants to emulate the DIV and IDIV instructions by executing them natively; > this can cause a #DE to be raised. > > Allow the exception handling mechanism to process #DE exceptions so KVM can > catch and process them. > > Signed-off-by: Avi Kivity > --- > arch/x86/kernel/traps.c | 10 +- > 1 files changed, 9 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c > index 725ef4d..dd313cf 100644 > --- a/arch/x86/kernel/traps.c > +++ b/arch/x86/kernel/traps.c > @@ -205,7 +205,8 @@ dotraplinkage void do_##name(struct pt_regs *regs, long > error_code) \ > do_trap(trapnr, signr, str, regs, error_code, &info); \ > } > > -DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip) > +static DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error_user, > + FPE_INTDIV, regs->ip) > DO_ERROR(4, SIGSEGV, "overflow", overflow) > DO_ERROR(5, SIGSEGV, "bounds", bounds) > DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip) > @@ -217,6 +218,13 @@ DO_ERROR(12, SIGBUS, "stack segment", stack_segment) > #endif > DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0) > > +dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code) > +{ > + if (!user_mode_vm(regs) && fixup_exception(regs)) > + return; > + do_divide_error_user(regs, error_code); > +} > + > #ifdef CONFIG_X86_64 > /* Runs on IST stack */ > dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code) Kernel mode exceptions should already be handled by do_trap(). This is unnecessary. -- Brian Gerst -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)
KVM wants to emulate the DIV and IDIV instructions by executing them natively; this can cause a #DE to be raised. Allow the exception handling mechanism to process #DE exceptions so KVM can catch and process them. Signed-off-by: Avi Kivity --- arch/x86/kernel/traps.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 725ef4d..dd313cf 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -205,7 +205,8 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \ do_trap(trapnr, signr, str, regs, error_code, &info); \ } -DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip) +static DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error_user, +FPE_INTDIV, regs->ip) DO_ERROR(4, SIGSEGV, "overflow", overflow) DO_ERROR(5, SIGSEGV, "bounds", bounds) DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip) @@ -217,6 +218,13 @@ DO_ERROR(12, SIGBUS, "stack segment", stack_segment) #endif DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0) +dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code) +{ + if (!user_mode_vm(regs) && fixup_exception(regs)) + return; + do_divide_error_user(regs, error_code); +} + #ifdef CONFIG_X86_64 /* Runs on IST stack */ dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: x86 emulator: trap and propagate #DE from DIV and IDIV
Signed-off-by: Avi Kivity --- arch/x86/kvm/emulate.c | 17 ++--- 1 files changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f82e43a..a7e26d0 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -505,6 +505,12 @@ static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err) emulate_exception(ctxt, TS_VECTOR, err, true); } +static int emulate_de(struct x86_emulate_ctxt *ctxt) +{ + emulate_exception(ctxt, DE_VECTOR, 0, false); + return X86EMUL_PROPAGATE_FAULT; +} + static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, unsigned long eip, u8 *dest) @@ -1459,6 +1465,7 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = &ctxt->decode; unsigned long *rax = &c->regs[VCPU_REGS_RAX]; unsigned long *rdx = &c->regs[VCPU_REGS_RDX]; + u8 de = 0; switch (c->modrm_reg) { case 0 ... 1: /* test */ @@ -1477,14 +1484,18 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt, emulate_1op_rax_rdx("imul", c->src, *rax, *rdx, ctxt->eflags); break; case 6: /* div */ - emulate_1op_rax_rdx("div", c->src, *rax, *rdx, ctxt->eflags); + emulate_1op_rax_rdx_ex("div", c->src, *rax, *rdx, + ctxt->eflags, de); break; case 7: /* idiv */ - emulate_1op_rax_rdx("idiv", c->src, *rax, *rdx, ctxt->eflags); + emulate_1op_rax_rdx_ex("idiv", c->src, *rax, *rdx, + ctxt->eflags, de); break; default: return X86EMUL_UNHANDLEABLE; } + if (de) + return emulate_de(ctxt); return X86EMUL_CONTINUE; } @@ -3363,7 +3374,7 @@ special_insn: break; case 0xf6 ... 0xf7: /* Grp3 */ if (emulate_grp3(ctxt, ops) != X86EMUL_CONTINUE) - goto cannot_emulate; + goto done; break; case 0xf8: /* clc */ ctxt->eflags &= ~EFLG_CF; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Trap and propagate divide errors when emulating DIV
KVM recently started emulating DIV and IDIV. However, those instructions trap when given the right operands. Since figuring out when to trap or not is difficult, we just execute the instruction and see if the processor trapped or not. tip: please queue the first patch on fast-forward-only branch kvm.git can merge, or we can carry the patch in kvm.git with your ack. Avi Kivity (3): x86: allow kernel exception fixup for divide errors (#DE) KVM: x86 emulator: add macros for executing instructions that may trap KVM: x86 emulator: trap and propagate #DE from DIV and IDIV arch/x86/kernel/traps.c | 10 +++- arch/x86/kvm/emulate.c | 60 -- 2 files changed, 66 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: x86 emulator: add macros for executing instructions that may trap
Like DIV and IDIV. Signed-off-by: Avi Kivity --- arch/x86/kvm/emulate.c | 43 +++ 1 files changed, 43 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 808934c..f82e43a 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -331,6 +331,27 @@ struct group_dual { "a" (_rax), "d" (_rdx)); \ } while (0) +#define __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, _eflags, _suffix, _ex) \ + do {\ + unsigned long _tmp; \ + \ + __asm__ __volatile__ ( \ + _PRE_EFLAGS("0", "5", "1") \ + "1: \n\t" \ + _op _suffix " %6; " \ + "2: \n\t" \ + _POST_EFLAGS("0", "5", "1") \ + ".pushsection .fixup,\"ax\" \n\t" \ + "3: movb $1, %4 \n\t" \ + "jmp 2b \n\t" \ + ".popsection \n\t" \ + _ASM_EXTABLE(1b, 3b)\ + : "=m" (_eflags), "=&r" (_tmp), \ + "+a" (_rax), "+d" (_rdx), "+qm"(_ex) \ + : "i" (EFLAGS_MASK), "m" ((_src).val), \ + "a" (_rax), "d" (_rdx)); \ + } while (0) + /* instruction has only one source operand, destination is implicit (e.g. mul, div, imul, idiv) */ #define emulate_1op_rax_rdx(_op, _src, _rax, _rdx, _eflags) \ do { \ @@ -342,6 +363,28 @@ struct group_dual { } \ } while (0) +#define emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, _eflags, _ex)\ + do {\ + switch((_src).bytes) { \ + case 1: \ + __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \ +_eflags, "b", _ex);\ + break; \ + case 2: \ + __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \ +_eflags, "w", _ex);\ + break; \ + case 4: \ + __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \ +_eflags, "l", _ex);\ + break; \ + case 8: ON64( \ + __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \ +_eflags, "q", _ex)); \ + break; \ + } \ + } while (0) + /* Fetch next part of the instruction being emulated. */ #define insn_fetch(_type, _size, _eip) \ ({ unsigned long _x; \ -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 2/3] Allow emulation tests to trap exceptions
Some instructions trap on execution, we need a way to see if they raise an exception as expected. Signed-off-by: Avi Kivity --- config-x86-common.mak |3 ++- x86/emulator.c|2 ++ 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/config-x86-common.mak b/config-x86-common.mak index 19bffd4..0ed13e4 100644 --- a/config-x86-common.mak +++ b/config-x86-common.mak @@ -45,7 +45,8 @@ $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \ - $(TEST_DIR)/vm.o $(TEST_DIR)/print.o + $(TEST_DIR)/vm.o $(TEST_DIR)/print.o \ + $(TEST_DIR)/idt.o $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o diff --git a/x86/emulator.c b/x86/emulator.c index b0d15c0..5d1659f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -1,6 +1,7 @@ #include "ioram.h" #include "vm.h" #include "libcflat.h" +#include "idt.h" #define memset __builtin_memset #define TESTDEV_IO_PORT 0xe0 @@ -582,6 +583,7 @@ int main() unsigned long t1, t2; setup_vm(); + setup_idt(); mem = vmap(IORAM_BASE_PHYS, IORAM_LEN); // test mov reg, r/m and mov r/m, reg -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 1/3] Support #DE (divide error) exception handlers
Signed-off-by: Avi Kivity --- x86/idt.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/x86/idt.c b/x86/idt.c index 590839f..4480833 100644 --- a/x86/idt.c +++ b/x86/idt.c @@ -100,6 +100,11 @@ asm (".pushsection .text \n\t" "pushq $13 \n\t" "jmp handle_exception \n\t" + "de_fault: \n\t" + "pushq $0 \n\t" + "pushq $0 \n\t" + "jmp handle_exception \n\t" + "handle_exception: \n\t" "push %r15; push %r14; push %r13; push %r12 \n\t" "push %r11; push %r10; push %r9; push %r8 \n\t" @@ -118,9 +123,10 @@ asm (".pushsection .text \n\t" void setup_idt(void) { -extern char ud_fault, gp_fault; +extern char ud_fault, gp_fault, de_fault; lidt(idt, 256); +set_idt_entry(&idt[0], &de_fault, 0); set_idt_entry(&idt[6], &ud_fault, 0); set_idt_entry(&idt[13], &gp_fault, 0); } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 0/3] DIV exception tests
DIV may raise an exception if dividing by zero or causing an overflow. Test it. Avi Kivity (3): Support #DE (divide error) exception handlers Allow emulation tests to trap exceptions Add DIV tests config-x86-common.mak |3 ++- x86/emulator.c| 20 x86/idt.c |8 +++- 3 files changed, 29 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 3/3] Add DIV tests
Test divide-by-zero and normal cases. Signed-off-by: Avi Kivity --- x86/emulator.c | 18 ++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/x86/emulator.c b/x86/emulator.c index 5d1659f..845e7a0 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -577,6 +577,23 @@ static void test_imul(ulong *mem) report("imul rax, mem, imm", a == 0x1D950BDE1D950BC8L); } +static void test_div(long *mem) +{ +long a, d; +u8 ex = 1; + +*mem = 0; a = 1; d = 2; +asm (ASM_TRY("1f") "divq %3; movb $0, %2; 1:" +: "+a"(a), "+d"(d), "+q"(ex) : "m"(*mem)); +report("divq (fault)", a == 1 && d == 2 && ex); + +*mem = 987654321098765UL; a = 123456789012345UL; d = 123456789012345UL; +asm (ASM_TRY("1f") "divq %3; movb $0, %2; 1:" +: "+a"(a), "+d"(d), "+q"(ex) : "m"(*mem)); +report("divq (1)", + a == 0x1ffb1b963b33ul && d == 0x273ba4384ede2ul && !ex); +} + int main() { void *mem; @@ -614,6 +631,7 @@ int main() test_btc(mem); test_bsfbsr(mem); test_imul(mem); + test_div(mem); printf("\nSUMMARY: %d tests, %d failures\n", tests, fails); return fails ? 1 : 0; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: confusion about kvm_mmu_pte_write
On 08/24/2010 11:16 AM, cs-jerry** wrote: hello all, i am reading kvm code and confused about the mmu_pte_write process, when update a gpte, function kvm_mmu_pte_write is called , and walk the shadow page for it.why condition is : if (sp->gfn != gfn || sp->role.direct || sp->role.invalid) continue; and not update those sp->gfn == gfn&& sp->role.direct,what does it mean when role.direct is clear? thanks.N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¤¾h§¶›¡Ü¨}©ž²Æ zÚ&j:+v‰¨¾«‘êçzZ+€Ê+zf£¢·hšˆ§~††Ûiÿûàz¹®w¥¢¸?™¨èÚ&¢)ߢfl=== Please fix you mail client not to send corrupted emails. Also, read Documentation/kvm/mmu.txt. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4 v2] kvm: rework KVM mmu_shrink() code
On Tue, Aug 24, 2010 at 10:07 AM, Marcelo Tosatti wrote: > On Mon, Aug 23, 2010 at 07:11:11PM +0800, Xiaotian Feng wrote: >> On Mon, Aug 23, 2010 at 6:27 PM, Avi Kivity wrote: >> > On 08/23/2010 01:22 PM, Avi Kivity wrote: >> >> >> >> >> >> I see a lot of soft lockups with this patchset: >> > >> > This is running the emulator.flat test case, with shadow paging. This test >> > triggers a lot (millions) of mmu mode switches. >> > >> >> Does following patch fix your issue? >> >> Latest kvm mmu_shrink code rework makes kernel changes >> kvm->arch.n_used_mmu_pages/ >> kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, >> which is called >> by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or >> kvm_mmu_available_pages(vcpu->kvm) is unchanged after >> kvm_mmu_commit_zap_page(), >> This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages looping >> forever. >> Moving kvm_mmu_commit_zap_page would make the while loop performs as normal. >> >> --- >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index f52a965..7e09a21 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -1726,8 +1726,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, >> unsigned int goal_nr_mmu_pages) >> struct kvm_mmu_page, link); >> kvm_mmu_prepare_zap_page(kvm, page, >> >> &invalid_list); >> + kvm_mmu_commit_zap_page(kvm, &invalid_list); >> } >> - kvm_mmu_commit_zap_page(kvm, &invalid_list); >> goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages; >> } >> >> @@ -2976,9 +2976,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu) >> sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev, >> struct kvm_mmu_page, link); >> kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); >> + kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); >> ++vcpu->kvm->stat.mmu_recycled; >> } >> - kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); >> } >> >> int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code) > > Please resend with a signed-off-by, and proper subject for the patch. It's available at: https://patchwork.kernel.org/patch/125431/ Thanks Xiaotian > > Thanks > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 10/12] Handle async PF in non preemptable context
On 08/24/2010 12:36 PM, Gleb Natapov wrote: On Tue, Aug 24, 2010 at 12:30:25PM +0300, Avi Kivity wrote: On 07/19/2010 06:31 PM, Gleb Natapov wrote: If async page fault is received by idle task or when preemp_count is not zero guest cannot reschedule, so do sti; hlt and wait for page to be ready. vcpu can still process interrupts while it waits for the page to be ready. Acked-by: Rik van Riel Signed-off-by: Gleb Natapov --- arch/x86/kernel/kvm.c | 36 1 files changed, 32 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index a6db92e..914b0fc 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -37,6 +37,7 @@ #include #include #include +#include #define MMU_QUEUE_SIZE 1024 @@ -68,6 +69,8 @@ struct kvm_task_sleep_node { wait_queue_head_t wq; u32 token; int cpu; + bool halted; + struct mm_struct *mm; }; static struct kvm_task_sleep_head { @@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 token) struct kvm_task_sleep_head *b =&async_pf_sleepers[key]; struct kvm_task_sleep_node n, *e; DEFINE_WAIT(wait); + int cpu, idle; + + cpu = get_cpu(); + idle = idle_cpu(cpu); + put_cpu(); spin_lock(&b->lock); e = _find_apf_task(b, token); @@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 token) n.token = token; n.cpu = smp_processor_id(); + n.mm = current->active_mm; + n.halted = idle || preempt_count()> 1; + atomic_inc(&n.mm->mm_count); init_waitqueue_head(&n.wq); hlist_add_head(&n.link,&b->list); spin_unlock(&b->lock); for (;;) { - prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); + if (!n.halted) + prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); if (hlist_unhashed(&n.link)) break; - schedule(); + + if (!n.halted) { + schedule(); + } else { + /* +* We cannot reschedule. So halt. +*/ If we get the wakeup here, we'll halt and never wake up again. We will not. IRQs are disabled here. native_safe_halt() enables them. Ok. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 09/12] Retry fault before vmentry
On 08/24/2010 12:33 PM, Gleb Natapov wrote: @@ -505,6 +506,37 @@ out_unlock: return 0; } +static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3, + gva_t addr, u32 error_code) +{ + int r = 0; + gpa_t curr_cr3 = vcpu->arch.cr3; + + if (curr_cr3 != cr3) { + /* +* We do page fault on behalf of a process that is sleeping +* because of async PF. PV guest takes reference to mm that cr3 +* belongs too, so it has to be valid here. +*/ + kvm_set_cr3(vcpu, cr3); + if (kvm_mmu_reload(vcpu)) + goto switch_cr3; + } With nested virtualization, we need to switch cr0, cr4, and efer as well... On SVM or VMX or both? Both. Let's defer this patch since it's an optimization, this is really complicated. + + r = FNAME(page_fault)(vcpu, addr, error_code, true); + + if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu)) + kvm_mmu_sync_roots(vcpu); Why is this needed? http://www.mail-archive.com/kvm@vger.kernel.org/msg37827.html KVM_REQ_MMU_SYNC request generated here must be processed before switching to a different cr3 (otherwise vcpu_enter_guest will process it with the wrong cr3 in place). Ah, it should be part of the cr3 switch block above. + +switch_cr3: + if (curr_cr3 != vcpu->arch.cr3) { + kvm_set_cr3(vcpu, curr_cr3); + kvm_mmu_reload(vcpu); + } + + return r; +} This has the nasty effect of flushing the TLB on AMD. What is more expansive reenter the guest and handle one more fault, or flash TLB here? No idea. Probably the reentry. On Intel the tlb is flushed anyway. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 12/12] Send async PF when guest is not in userspace too.
On 07/19/2010 06:31 PM, Gleb Natapov wrote: If guest indicates that it can handle async pf in kernel mode too send it, but only if interrupt are enabled. Reviewed-by: Rik van Riel Signed-off-by: Gleb Natapov --- arch/x86/kvm/mmu.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 12d1a7b..ed87b1c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2361,7 +2361,13 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu) if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu)) return false; - return !!kvm_x86_ops->get_cpl(vcpu); + if (vcpu->arch.apf_send_user_only) + return !!kvm_x86_ops->get_cpl(vcpu); cpl is not a bool. Compare it with 0. + + if (!kvm_x86_ops->interrupt_allowed(vcpu)) + return false; + + return true; } Should have commented before, but get_cpl() is not accurate when doing nested virtualization. When L1 intercepts page faults, being in L2 is equivalent to CPL 3. But we need to get the apf information to L1 somehow. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 10/12] Handle async PF in non preemptable context
On Tue, Aug 24, 2010 at 12:30:25PM +0300, Avi Kivity wrote: > On 07/19/2010 06:31 PM, Gleb Natapov wrote: > >If async page fault is received by idle task or when preemp_count is > >not zero guest cannot reschedule, so do sti; hlt and wait for page to be > >ready. vcpu can still process interrupts while it waits for the page to > >be ready. > > > >Acked-by: Rik van Riel > >Signed-off-by: Gleb Natapov > >--- > > arch/x86/kernel/kvm.c | 36 > > 1 files changed, 32 insertions(+), 4 deletions(-) > > > >diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > >index a6db92e..914b0fc 100644 > >--- a/arch/x86/kernel/kvm.c > >+++ b/arch/x86/kernel/kvm.c > >@@ -37,6 +37,7 @@ > > #include > > #include > > #include > >+#include > > > > #define MMU_QUEUE_SIZE 1024 > > > >@@ -68,6 +69,8 @@ struct kvm_task_sleep_node { > > wait_queue_head_t wq; > > u32 token; > > int cpu; > >+bool halted; > >+struct mm_struct *mm; > > }; > > > > static struct kvm_task_sleep_head { > >@@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 > >token) > > struct kvm_task_sleep_head *b =&async_pf_sleepers[key]; > > struct kvm_task_sleep_node n, *e; > > DEFINE_WAIT(wait); > >+int cpu, idle; > >+ > >+cpu = get_cpu(); > >+idle = idle_cpu(cpu); > >+put_cpu(); > > > > spin_lock(&b->lock); > > e = _find_apf_task(b, token); > >@@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 > >token) > > > > n.token = token; > > n.cpu = smp_processor_id(); > >+n.mm = current->active_mm; > >+n.halted = idle || preempt_count()> 1; > >+atomic_inc(&n.mm->mm_count); > > init_waitqueue_head(&n.wq); > > hlist_add_head(&n.link,&b->list); > > spin_unlock(&b->lock); > > > > for (;;) { > >-prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); > >+if (!n.halted) > >+prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); > > if (hlist_unhashed(&n.link)) > > break; > >-schedule(); > >+ > >+if (!n.halted) { > >+schedule(); > >+} else { > >+/* > >+ * We cannot reschedule. So halt. > >+ */ > > If we get the wakeup here, we'll halt and never wake up again. > We will not. IRQs are disabled here. native_safe_halt() enables them. > >+native_safe_halt(); > >+local_irq_disable(); > > So we need a local_irq_disable() before the hlish_unhashed() check. We are still in exception handler, so IRQ should be off. > > >+} > > } > >-finish_wait(&n.wq,&wait); > >+if (!n.halted) > >+finish_wait(&n.wq,&wait); > > > > return; > > } > >@@ -127,7 +149,12 @@ static void apf_task_wait(struct task_struct *tsk, u32 > >token) > > static void apf_task_wake_one(struct kvm_task_sleep_node *n) > > { > > hlist_del_init(&n->link); > >-if (waitqueue_active(&n->wq)) > >+if (!n->mm) > >+return; > >+mmdrop(n->mm); > >+if (n->halted) > >+smp_send_reschedule(n->cpu); > >+else if (waitqueue_active(&n->wq)) > > wake_up(&n->wq); > > } > > > >@@ -157,6 +184,7 @@ again: > > } > > n->token = token; > > n->cpu = smp_processor_id(); > >+n->mm = NULL; > > init_waitqueue_head(&n->wq); > > hlist_add_head(&n->link,&b->list); > > } else > > > -- > error compiling committee.c: too many arguments to function -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 09/12] Retry fault before vmentry
On Tue, Aug 24, 2010 at 12:25:33PM +0300, Avi Kivity wrote: > On 07/19/2010 06:30 PM, Gleb Natapov wrote: > >When page is swapped in it is mapped into guest memory only after guest > >tries to access it again and generate another fault. To save this fault > >we can map it immediately since we know that guest is going to access > >the page. > > > > > > > >-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, > >-u32 error_code) > >+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, > >+ bool sync) > > 'sync' means something else in the shadow mmu. Please rename to > something longer, maybe 'apf_completion'. > > Alternatively, split to two functions, a base function that doesn't > do apf and a wrapper that handles apf. > Will rename to something else. > >@@ -505,6 +506,37 @@ out_unlock: > > return 0; > > } > > > >+static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3, > >+ gva_t addr, u32 error_code) > >+{ > >+int r = 0; > >+gpa_t curr_cr3 = vcpu->arch.cr3; > >+ > >+if (curr_cr3 != cr3) { > >+/* > >+ * We do page fault on behalf of a process that is sleeping > >+ * because of async PF. PV guest takes reference to mm that cr3 > >+ * belongs too, so it has to be valid here. > >+ */ > >+kvm_set_cr3(vcpu, cr3); > >+if (kvm_mmu_reload(vcpu)) > >+goto switch_cr3; > >+} > > With nested virtualization, we need to switch cr0, cr4, and efer as well... > On SVM or VMX or both? > >+ > >+r = FNAME(page_fault)(vcpu, addr, error_code, true); > >+ > >+if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu)) > >+kvm_mmu_sync_roots(vcpu); > > Why is this needed? > http://www.mail-archive.com/kvm@vger.kernel.org/msg37827.html KVM_REQ_MMU_SYNC request generated here must be processed before switching to a different cr3 (otherwise vcpu_enter_guest will process it with the wrong cr3 in place). > >+ > >+switch_cr3: > >+if (curr_cr3 != vcpu->arch.cr3) { > >+kvm_set_cr3(vcpu, curr_cr3); > >+kvm_mmu_reload(vcpu); > >+} > >+ > >+return r; > >+} > > This has the nasty effect of flushing the TLB on AMD. > What is more expansive reenter the guest and handle one more fault, or flash TLB here? > >+ > > static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) > > { > > struct kvm_shadow_walk_iterator iterator; > >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >index 2603cc4..5482db0 100644 > >--- a/arch/x86/kvm/x86.c > >+++ b/arch/x86/kvm/x86.c > >@@ -5743,6 +5743,15 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned > >long rflags) > > } > > EXPORT_SYMBOL_GPL(kvm_set_rflags); > > > >+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, > >+ struct kvm_async_pf *work) > >+{ > >+if (!vcpu->arch.mmu.page_fault_other_cr3 || is_error_page(work->page)) > >+return; > >+vcpu->arch.mmu.page_fault_other_cr3(vcpu, work->arch.cr3, work->gva, > >+work->arch.error_code); > >+} > >+ > > static int apf_put_user(struct kvm_vcpu *vcpu, u32 val) > > { > > if (unlikely(vcpu->arch.apf_memslot_ver != > >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > >index f56e8ac..de1d5b6 100644 > >--- a/virt/kvm/kvm_main.c > >+++ b/virt/kvm/kvm_main.c > >@@ -1348,6 +1348,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu > >*vcpu) > > spin_lock(&vcpu->async_pf_lock); > > list_del(&work->link); > > spin_unlock(&vcpu->async_pf_lock); > >+kvm_arch_async_page_ready(vcpu, work); > > put_page(work->page); > > async_pf_work_free(work); > > list_del(&work->queue); > >@@ -1366,6 +1367,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu > >*vcpu) > > list_del(&work->queue); > > vcpu->async_pf_queued--; > > > >+kvm_arch_async_page_ready(vcpu, work); > > kvm_arch_inject_async_page_present(vcpu, work); > > > > put_page(work->page); > > > -- > error compiling committee.c: too many arguments to function -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 11/12] Let host know whether the guest can handle async PF in non-userspace context.
On 07/19/2010 06:31 PM, Gleb Natapov wrote: If guest can detect that it runs in non-preemptable context it can handle async PFs at any time, so let host know that it can send async PF even if guest cpu is not in userspace. Acked-by: Rik van Riel Signed-off-by: Gleb Natapov --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/include/asm/kvm_para.h |1 + arch/x86/kernel/kvm.c |3 +++ arch/x86/kvm/x86.c |5 +++-- 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 45e6c12..c675d5d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -367,6 +367,7 @@ struct kvm_vcpu_arch { cpumask_var_t wbinvd_dirty_mask; u32 __user *apf_data; + bool apf_send_user_only; u32 apf_memslot_ver; u64 apf_msr_val; u32 async_pf_id; Lots of apf stuff in here. Make it apg.data etc.? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 10/12] Handle async PF in non preemptable context
On 07/19/2010 06:31 PM, Gleb Natapov wrote: If async page fault is received by idle task or when preemp_count is not zero guest cannot reschedule, so do sti; hlt and wait for page to be ready. vcpu can still process interrupts while it waits for the page to be ready. Acked-by: Rik van Riel Signed-off-by: Gleb Natapov --- arch/x86/kernel/kvm.c | 36 1 files changed, 32 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index a6db92e..914b0fc 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -37,6 +37,7 @@ #include #include #include +#include #define MMU_QUEUE_SIZE 1024 @@ -68,6 +69,8 @@ struct kvm_task_sleep_node { wait_queue_head_t wq; u32 token; int cpu; + bool halted; + struct mm_struct *mm; }; static struct kvm_task_sleep_head { @@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 token) struct kvm_task_sleep_head *b =&async_pf_sleepers[key]; struct kvm_task_sleep_node n, *e; DEFINE_WAIT(wait); + int cpu, idle; + + cpu = get_cpu(); + idle = idle_cpu(cpu); + put_cpu(); spin_lock(&b->lock); e = _find_apf_task(b, token); @@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 token) n.token = token; n.cpu = smp_processor_id(); + n.mm = current->active_mm; + n.halted = idle || preempt_count()> 1; + atomic_inc(&n.mm->mm_count); init_waitqueue_head(&n.wq); hlist_add_head(&n.link,&b->list); spin_unlock(&b->lock); for (;;) { - prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); + if (!n.halted) + prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE); if (hlist_unhashed(&n.link)) break; - schedule(); + + if (!n.halted) { + schedule(); + } else { + /* +* We cannot reschedule. So halt. +*/ If we get the wakeup here, we'll halt and never wake up again. + native_safe_halt(); + local_irq_disable(); So we need a local_irq_disable() before the hlish_unhashed() check. + } } - finish_wait(&n.wq,&wait); + if (!n.halted) + finish_wait(&n.wq,&wait); return; } @@ -127,7 +149,12 @@ static void apf_task_wait(struct task_struct *tsk, u32 token) static void apf_task_wake_one(struct kvm_task_sleep_node *n) { hlist_del_init(&n->link); - if (waitqueue_active(&n->wq)) + if (!n->mm) + return; + mmdrop(n->mm); + if (n->halted) + smp_send_reschedule(n->cpu); + else if (waitqueue_active(&n->wq)) wake_up(&n->wq); } @@ -157,6 +184,7 @@ again: } n->token = token; n->cpu = smp_processor_id(); + n->mm = NULL; init_waitqueue_head(&n->wq); hlist_add_head(&n->link,&b->list); } else -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 09/12] Retry fault before vmentry
On 07/19/2010 06:30 PM, Gleb Natapov wrote: When page is swapped in it is mapped into guest memory only after guest tries to access it again and generate another fault. To save this fault we can map it immediately since we know that guest is going to access the page. -static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, - u32 error_code) +static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, + bool sync) 'sync' means something else in the shadow mmu. Please rename to something longer, maybe 'apf_completion'. Alternatively, split to two functions, a base function that doesn't do apf and a wrapper that handles apf. @@ -505,6 +506,37 @@ out_unlock: return 0; } +static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3, + gva_t addr, u32 error_code) +{ + int r = 0; + gpa_t curr_cr3 = vcpu->arch.cr3; + + if (curr_cr3 != cr3) { + /* +* We do page fault on behalf of a process that is sleeping +* because of async PF. PV guest takes reference to mm that cr3 +* belongs too, so it has to be valid here. +*/ + kvm_set_cr3(vcpu, cr3); + if (kvm_mmu_reload(vcpu)) + goto switch_cr3; + } With nested virtualization, we need to switch cr0, cr4, and efer as well... + + r = FNAME(page_fault)(vcpu, addr, error_code, true); + + if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu)) + kvm_mmu_sync_roots(vcpu); Why is this needed? + +switch_cr3: + if (curr_cr3 != vcpu->arch.cr3) { + kvm_set_cr3(vcpu, curr_cr3); + kvm_mmu_reload(vcpu); + } + + return r; +} This has the nasty effect of flushing the TLB on AMD. + static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) { struct kvm_shadow_walk_iterator iterator; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2603cc4..5482db0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5743,6 +5743,15 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) } EXPORT_SYMBOL_GPL(kvm_set_rflags); +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, + struct kvm_async_pf *work) +{ + if (!vcpu->arch.mmu.page_fault_other_cr3 || is_error_page(work->page)) + return; + vcpu->arch.mmu.page_fault_other_cr3(vcpu, work->arch.cr3, work->gva, + work->arch.error_code); +} + static int apf_put_user(struct kvm_vcpu *vcpu, u32 val) { if (unlikely(vcpu->arch.apf_memslot_ver != diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f56e8ac..de1d5b6 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1348,6 +1348,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) spin_lock(&vcpu->async_pf_lock); list_del(&work->link); spin_unlock(&vcpu->async_pf_lock); + kvm_arch_async_page_ready(vcpu, work); put_page(work->page); async_pf_work_free(work); list_del(&work->queue); @@ -1366,6 +1367,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) list_del(&work->queue); vcpu->async_pf_queued--; + kvm_arch_async_page_ready(vcpu, work); kvm_arch_inject_async_page_present(vcpu, work); put_page(work->page); -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.
On 08/24/2010 10:52 AM, Gleb Natapov wrote: This nice cache needs to be outside apf to reduce complexity for reviewers and since it is useful for others. Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest(). Will look into it. In the meantime, you can just drop the caching. + struct kvm_arch_async_pf *arch) +{ + struct kvm_async_pf *work; + + if (vcpu->async_pf_queued>= ASYNC_PF_PER_VCPU) + return 0; 100 == too high. At 16 vcpus, this allows 1600 kernel threads to wait for I/O. Number of kernel threads are limited by other means. Slow work subsystem has its own knobs to tune that. Here we limit how much slow work items can be queued per vcpu. OK. Would have been best if we could ask for a page to be paged in asynchronously. You mean to have core kernel facility for that? I agree it would be nice, but much harder. Yes, that's what I meant. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 04/12] Provide special async page fault handler when async PF capability is detected
On 08/24/2010 10:31 AM, Gleb Natapov wrote: + +static void apf_task_wait(struct task_struct *tsk, u32 token) +{ + u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS); + struct kvm_task_sleep_head *b =&async_pf_sleepers[key]; + struct kvm_task_sleep_node n, *e; + DEFINE_WAIT(wait); + + spin_lock(&b->lock); + e = _find_apf_task(b, token); + if (e) { + /* dummy entry exist -> wake up was delivered ahead of PF */ + hlist_del(&e->link); + kfree(e); + spin_unlock(&b->lock); + return; + } + + n.token = token; + n.cpu = smp_processor_id(); What's the meaning of cpu? Won't the waiter migrate to other cpus? Waiter cannot migrate to other cpu since it is sleeping. It may be scheduled to run on any cpu when it will be waked. What if you have a spurious wakeup? Also, nothing prevents the scheduler from migrating the thread even if it is sleeping. It may not do so now, but it might do it in the future. Oh, it probably does now on cpu hotunplug. Why do you need n.cpu? + spin_unlock(&b->lock); + cpu_relax(); + goto again; + } The other cpu might be waiting for us to yield. We can fix it later with the the pv spinlock infrastructure. This busy wait happens only if (very small) allocation fails, so if a guest ever hits this code path I expect it to be on his way to die anyway. Hm. I don't have a good feel on how rare atomic allocation failures are on common workloads. Note a kmem_cache for apfs will make failures even more rare. Or, we can avoid the allocation. If at most one apf can be pending (is this true?), we can use a per-cpu variable for this dummy entry. We can have may outstanding apfs. But, while we're processing an apf, we can't take any more. So we can have a buffer of one pre-allocated entry per cpu, and do something like: apf: disable apf for this cpu handle apf using buffered entry enable interrupts allocate new entry buffer it enable apf for that cpu this trades off a bigger apf disabled window for not busy looping. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html