Re: [PATCH] Add realmode test for CALL FAR IMM instruction

2010-08-24 Thread Avi Kivity
 On 08/25/2010 09:16 AM, Wei Yongjun wrote:
> Signed-off-by: Wei Yongjun 
> ---
>  x86/realmode.c |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/x86/realmode.c b/x86/realmode.c
> index a833829..2e12680 100644
> --- a/x86/realmode.c
> +++ b/x86/realmode.c
> @@ -437,6 +437,9 @@ void test_call(void)
>   "ret\n\t"
>   "2:\t");
>   MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
> + MK_INSN(call_far2,  ".byte 0x9a\n\t"
> + ".word retf\n\t"
> + ".word 0x00\n\t");

Why .byte encoding? won't "lcallw $0, $retf" (or the other way round) work?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add realmode test for CALL FAR IMM instruction

2010-08-24 Thread Wei Yongjun
Signed-off-by: Wei Yongjun 
---
 x86/realmode.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/x86/realmode.c b/x86/realmode.c
index a833829..2e12680 100644
--- a/x86/realmode.c
+++ b/x86/realmode.c
@@ -437,6 +437,9 @@ void test_call(void)
"ret\n\t"
"2:\t");
MK_INSN(call_far1,  "lcallw *(%ebx)\n\t");
+   MK_INSN(call_far2,  ".byte 0x9a\n\t"
+   ".word retf\n\t"
+   ".word 0x00\n\t");
MK_INSN(ret_imm,"sub $10, %sp; jmp 2f; 1: retw $10; 2: callw 1b");
 
exec_in_big_real_mode(&insn_call1);
@@ -453,6 +456,9 @@ void test_call(void)
exec_in_big_real_mode(&insn_call_far1);
report("call far 1", 0, 1);
 
+   exec_in_big_real_mode(&insn_call_far2);
+   report("call far 2", 0, 1);
+
exec_in_big_real_mode(&insn_ret_imm);
report("ret imm 1", 0, 1);
 }
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)

2010-08-24 Thread Wei Yongjun
Signed-off-by: Wei Yongjun 
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 26f69c2..e5f7f5c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2487,7 +2487,7 @@ static struct opcode opcode_table[256] = {
X8(D(SrcAcc | DstReg)),
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
-   D(SrcImmFAddr | No64), N,
+   I(SrcImmFAddr | No64, em_call_far), N,
D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
/* 0xA0 - 0xA7 */
D(ByteOp | DstAcc | SrcMem | Mov | MemAbs), D(DstAcc | SrcMem | Mov | 
MemAbs),
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM timekeeping and TSC virtualization

2010-08-24 Thread Zachary Amsden

On 08/24/2010 12:13 PM, Marcelo Tosatti wrote:

On Thu, Aug 19, 2010 at 10:07:14PM -1000, Zachary Amsden wrote:
   

This patch set implements full TSC virtualization, with both
trapping and passthrough modes, and intelligent mode switching.
As a result, TSC will never go backwards, we are stable against
guest re-calibration attempts, VM reset, and migration.  For guests
which require it, the TSC khz can even be preserved on migration
to a new host.

The TSC will never be trapped on UP systems unless the host TSC
actually runs faster than the guest; other conditions, including
bad hardware and changing speeds are accomodated by using catchup
mode to keep the guest passthrough TSC in line with the host clock.

What is still needed on top of this is a way to force TSC
trapping, or disable it entirely, for benchmarking purposes.
I refrained from adding that last bit because it wasn't clear
whether the best thing to do is a global 'force TSC trapping' /
'force TSC passthrough' / 'intelligent choice', or if this control
should be on a per-VM level, via an ioctl(), module parameter,
or sysfs.

John and Thomas I have cc'd on this because it may be relevant to
their interests and I always appreciate feedback, especially on
a change set as large and complex as this.

Enjoy.  This time, there are no howler monkeys.  I've included
all the feedback I got from previous rounds of this and more.
 

Applied 1-19, thanks.

   


Great, thanks.  I'll wait for more feedback on the rest and rebase.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: fix missing percpu counter destroy

2010-08-24 Thread Marcelo Tosatti
On Mon, Aug 23, 2010 at 04:13:15PM +0800, Wei Yongjun wrote:
> commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
> (KVM: create aggregate kvm_total_used_mmu_pages value)
> introduce percpu counter kvm_total_used_mmu_pages but never
> destroy it, this may cause oops when rmmod & modprobe.
> 
> Signed-off-by: Wei Yongjun 
> ---
>  arch/x86/kvm/mmu.c |5 -
>  1 files changed, 4 insertions(+), 1 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM timekeeping and TSC virtualization

2010-08-24 Thread Marcelo Tosatti
On Thu, Aug 19, 2010 at 10:07:14PM -1000, Zachary Amsden wrote:
> This patch set implements full TSC virtualization, with both
> trapping and passthrough modes, and intelligent mode switching.
> As a result, TSC will never go backwards, we are stable against
> guest re-calibration attempts, VM reset, and migration.  For guests
> which require it, the TSC khz can even be preserved on migration
> to a new host.
> 
> The TSC will never be trapped on UP systems unless the host TSC
> actually runs faster than the guest; other conditions, including
> bad hardware and changing speeds are accomodated by using catchup
> mode to keep the guest passthrough TSC in line with the host clock.
> 
> What is still needed on top of this is a way to force TSC
> trapping, or disable it entirely, for benchmarking purposes.
> I refrained from adding that last bit because it wasn't clear
> whether the best thing to do is a global 'force TSC trapping' /
> 'force TSC passthrough' / 'intelligent choice', or if this control
> should be on a per-VM level, via an ioctl(), module parameter,
> or sysfs.
> 
> John and Thomas I have cc'd on this because it may be relevant to
> their interests and I always appreciate feedback, especially on
> a change set as large and complex as this.
> 
> Enjoy.  This time, there are no howler monkeys.  I've included
> all the feedback I got from previous rounds of this and more.

Applied 1-19, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM Bridged Networking Issue in Ubuntu Lucid

2010-08-24 Thread haishan

Rus Hughes wrote:

Hi guys,

I'm trying to sort out networking for a VM I've created on my Ubuntu
Lucid box but the VM cannot access the Internet.
I can connect to the VM (crisps) from the host (holly) and to the host
from the VM.
But the VM cannot connect to the Internet and the Internet cannot
connect to the VM.

I followed the guide here: https://help.ubuntu.com/community/KVM

I created it using ubuntu-vm-builder using :

ubuntu-vm-builder kvm lucid \
  --domain crisps \
  --dest crisps \
  --arch amd64 \
  --hostname crisps \
  --mem 2048 \
  --rootsize 4096 \
  --swapsize 1024 \
  --user magicalusernameofdoom \
  --pass somesecretawesomepassword \
  --ip 178.63.60.159 \
  --mask 255.255.255.192 \
  --bcast 178.63.60.191 \
  --gw 178.63.60.129 \
  --dns 213.133.99.99 \
  --mirror http://de.archive.ubuntu.com/ubuntu \
  --components main,universe,restricted,multiverse \
  --addpkg openssh-server \
  --libvirt qemu:///system \
--bridge br0;

"virsh start crisps" starts the vm up :

root 29297  0.5  3.3 2280396 271220 ?  Sl   14:12   0:13
/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps
-uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait
-monitor chardev:monitor -boot c -drive
file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on
-net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0
-net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc
127.0.0.1:0 -vga cirrus

Networking on the host is like so:

br0   Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
  inet addr:178.63.60.138  Bcast:178.63.60.191  Mask:255.255.255.192
  inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0
  TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:35266886777 (35.2 GB)  TX bytes:60292047383 (60.2 GB)

eth0  Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
  inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0
  TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:36337218359 (36.3 GB)  TX bytes:60378047518 (60.3 GB)
  Interrupt:29 Base address:0x4000

eth0:0Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
  inet addr:178.63.60.178  Bcast:178.63.60.191  Mask:255.255.255.192
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  Interrupt:29 Base address:0x4000

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:325111540 (325.1 MB)  TX bytes:325111540 (325.1 MB)

vnet0 Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
  inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:72 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:500
  RX bytes:3412 (3.4 KB)  TX bytes:2698 (2.6 KB)

the hosts /etc/network/interfaces is like so :

auto lo
iface lo inet loopback

# device: eth0
auto  eth0
iface eth0 inet manual

auto br0
iface br0 inet static
  address   178.63.60.138
  broadcast 178.63.60.191
  netmask   255.255.255.192
  gateway   178.63.60.129
  network 178.63.60.128
  bridge_ports eth0
  bridge_stp off
  bridge_fd 0
  bridge_maxwait 0

# subli.me.uk
auto eth0:0
iface eth0:0 inet static
  address 178.63.60.178
  netmask 255.255.255.192


If I ping the VM I see the packets on the host br0 but not vnet0, this
is the output of brctl show on the host :

bridge name bridge id   STP enabled interfaces
br0 8000.265dc9d2752e   no  eth0

 vnet0
this is the routing table on the host

Kernel IP routing table
Destination Gateway Genmask Flags Metric RefUse Iface
178.63.60.128   0.0.0.0 255.255.255.192 U 0  00 br0
0.0.0.0 178.63.60.129   0.0.0.0 UG10000 br0

this is the routing table on the VM

Kernel IP routing table
Destination Gate

Re: KVM timekeeping and TSC virtualization

2010-08-24 Thread Zachary Amsden

On 08/24/2010 03:32 AM, David S. Ahern wrote:


On 08/23/10 23:47, Zachary Amsden wrote:
   

I've heard the rumor that TSC is orders of magnitude faster under VMware
than under KVM from three people now, and I thought you were part of
that camp.

Needless to say, they are either laughably incorrect, or possess some
great secret knowledge of how to make things under virtualization go
faster than bare metal.

I also have a magical talking unicorn, which, btw, is invisible.
Extraordinary claims require extraordinary proof (the proof of my
unicorn is too complex to fit in the margin of this e-mail, however, I
assure you he is real).
 

I have put in a lot of time over the past 3 years to understand how the
'magic' of virtualization works; please don't lump me into camps until I
raise my hand as being part of one.


   

My point is that kvmclock is Red Hat's answer for the future -- RHEL6,
RHEL5.Y (whenever it proves reliable). What about the present?  What
about products based on other distributions newer than RHEL5 but
pre-kvmclock?

   

It should be obvious from this patchset... PIT or TSC.

KVM did not have an in-kernel PIT implementation circa 2008, so this
data is quite old.  It's much faster now and will continue to get faster
as exit cost goes down and the emulation gets further optimized.
 

It was in-kernel pit in early 2008 (kernel git entry):

commit 7837699fa6d7adf81f26ab73a5f6897ea1ab9d6a
Author: Sheng Yang
Date:   Mon Jan 28 05:10:22 2008 +0800

 KVM: In kernel PIT model


   

Plus, now we have an error-free TSC.

 

There are a lot of moving windows of what to use as a clock source, not
just per major number (RHEL4, RHEL5) but minor number (e.g., TSC
stability on RHEL4 -- e.g.,
https://bugzilla.redhat.com/show_bug.cgi?id=491154) and further
maintenance releases (kvmclock requiring RHEL5.5+). That is not very
friendly to a product making a transition to virtualization - and with
the same code base running bare metal or in a VM.

   

If you have old software running on broken hardware you do not get
hardware performance and error-free time virtualization.  With any
vendor.  Period.
 

Sucks to be old *and* broken. But old with fancy new wheels, er hardware
-- like commodity x86 servers running Nehalem-based processors -- is a
different story.

   

With this patchset, KVM now has a much stronger guarantee: If you have
old guest software running on broken hardware, using SMP virtual
machines, you do not get hardware performance and error-free time
virtualization.However, if you have new guest software, non-broken
hardware, or can simply run UP guests instead of SMP, you can have
hardware performance, and it is now error free.  Alternatively, you can
sacrifice some accuracy and have hardware performance, even for SMP
guests, if you can tolerate some minor cross-CPU TSC variation.  No
other vendor I know of can make that guarantee.

Zach
 

If the processor has a stable TSC why trap it? I realize you are trying
to cover a gauntlet of hardware and guests, so maybe a nerd knob is
needed to disable.
   


Exactly.  If you have a stable TSC, we don't trap it.  If you don't have 
a stable TSC, we do.  That's the point of these patches.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM timekeeping 12/35] Robust TSC compensation

2010-08-24 Thread Daniel Verkamp
On Fri, Aug 20, 2010 at 1:07 AM, Zachary Amsden  wrote:
[...]
> +        * or make a best guest using elapsed value.

Perhaps s/guest/guess/ while the line is changing anyway.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SeaBIOS and GRUB booting from Virtio devices [was: GRUB and support for Virtio]

2010-08-24 Thread Anthony Liguori

On 08/24/2010 01:51 PM, Daniel Bareiro wrote:


Martin Kraus seems that recently experienced a similar problem and this
was linked to SeaBIOS [1] and compiling the version from the git would
solve this problem. I had no problems with GRUB trying to boot from two
different IDE drives.

I would like to know if there are plans to incorporate this to stable
KVM in the short term.
   


No, but the next release is very soon and will have this.  It's a new 
feature.  There's never been


Regards,

Anthony Liguori


Regards,
Daniel

[1] http://www.spinics.net/lists/kvm/msg39928.html
   


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SeaBIOS and GRUB booting from Virtio devices [was: GRUB and support for Virtio]

2010-08-24 Thread Daniel Bareiro
On Tuesday, 24 August 2010 11:23:59 -0300,
Daniel Bareiro wrote:

> > Strange, i did 2 clean installs of centos 5.5 on an ubuntu 9.10 host
> > and it went flawless with virtio devices ( both disks and network ).

> Doing some tests with CentOS 5.5 on a KVM virtual machine, after doing
> the installation, I added a second disk. But when trying to boot from
> it, I get the following error:
> 
> -
> root (hd1,0)
> 
> Error 21: Selected disk does not exist
> -
> 
> 
> The two disks are Virtio devices that are recognized when booting from
> the first disk:
> 
> -
> [r...@localhost ~]# fdisk -l /dev/vda
> 
> Disco /dev/vda: 10.7 GB, 10737418240 bytes
> 255 heads, 63 sectors/track, 1305 cylinders
> Unidades = cilindros de 16065 * 512 = 8225280 bytes
> 
> Disposit. InicioComienzo  Fin  Bloques  Id  Sistema
> /dev/vda1   *   11174 9430123+  fd  Linux raid autodetect
> /dev/vda211751305 1052257+  fd  Linux raid autodetect
> [r...@localhost ~]#
> [r...@localhost ~]#
> [r...@localhost ~]# fdisk -l /dev/vdb
> 
> Disco /dev/vdb: 10.7 GB, 10737418240 bytes
> 255 heads, 63 sectors/track, 1305 cylinders
> Unidades = cilindros de 16065 * 512 = 8225280 bytes
> 
> Disposit. InicioComienzo  Fin  Bloques  Id  Sistema
> /dev/vdb1   *   11174 9430123+  fd  Linux raid autodetect
> /dev/vdb211751305 1052257+  fd  Linux raid autodetect
> -
> 
> The idea of these tests is to set up software RAID1 on a running
> system, since, it seems that Anaconda does not support installation on
> degraded RAID.
> 
> But I'm not sure if this is a problem of Virtio or that GRUB is not
> recognizing the second disk.
> 
> I made sure to modify /boot/grub/device.map with the entry for the new
> disk:
> 
> -
> [r...@localhost grub]# cat /boot/grub/device.map
> # this device map was generated by anaconda
> (hd0) /dev/vda
> (hd1) /dev/vdb
> -
> 
> And the reconfiguration of GRUB on both disks did not give problems:
> 
> -
> [r...@localhost grub]# grub --device-map=/boot/grub/device.map
> 
> 
> GNU GRUB  version 0.97  (640K lower / 3072K upper memory)
> 
>  [ Minimal BASH-like line editing is supported.  For the first word, TAB
>lists possible command completions.  Anywhere else TAB lists the
> possible
>completions of a device/filename.]
> grub> root (hd0,0)
> root (hd0,0)
>  Filesystem type is ext2fs, partition type 0xfd
> grub> setup (hd0)
> setup (hd0)
>  Checking if "/boot/grub/stage1" exists... yes
>  Checking if "/boot/grub/stage2" exists... yes
>  Checking if "/boot/grub/e2fs_stage1_5" exists... yes
>  Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  15 sectors are
> embedded.
> succeeded
>  Running "install /boot/grub/stage1 (hd0) (hd0)1+15 p
> (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
> Done.
> grub> root (hd1,0)
> root (hd1,0)
>  Filesystem type is ext2fs, partition type 0xfd
> grub> setup (hd1)
> setup (hd1)
>  Checking if "/boot/grub/stage1" exists... yes
>  Checking if "/boot/grub/stage2" exists... yes
>  Checking if "/boot/grub/e2fs_stage1_5" exists... yes
>  Running "embed /boot/grub/e2fs_stage1_5 (hd1)"...  15 sectors are
> embedded.
> succeeded
>  Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p
> (hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
> Done.
> grub> quit
> quit
> -
> 
> 
> Any idea what may be causing the problem?

Martin Kraus seems that recently experienced a similar problem and this
was linked to SeaBIOS [1] and compiling the version from the git would
solve this problem. I had no problems with GRUB trying to boot from two
different IDE drives. 

I would like to know if there are plans to incorporate this to stable
KVM in the short term.

Regards,
Daniel

[1] http://www.spinics.net/lists/kvm/msg39928.html
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598


signature.asc
Description: Digital signature


Re: KVM Bridged Networking Issue in Ubuntu Lucid

2010-08-24 Thread Brian Jackson
On Tuesday, August 24, 2010 09:08:30 am Rus Hughes wrote:
> Hi guys,
> 
> I'm trying to sort out networking for a VM I've created on my Ubuntu
> Lucid box but the VM cannot access the Internet.
> I can connect to the VM (crisps) from the host (holly) and to the host
> from the VM.
> But the VM cannot connect to the Internet and the Internet cannot
> connect to the VM.
> 
> I followed the guide here: https://help.ubuntu.com/community/KVM
> 
> I created it using ubuntu-vm-builder using :
> 
> ubuntu-vm-builder kvm lucid \
>   --domain crisps \
>   --dest crisps \
>   --arch amd64 \
>   --hostname crisps \
>   --mem 2048 \
>   --rootsize 4096 \
>   --swapsize 1024 \
>   --user magicalusernameofdoom \
>   --pass somesecretawesomepassword \
>   --ip 178.63.60.159 \
>   --mask 255.255.255.192 \
>   --bcast 178.63.60.191 \
>   --gw 178.63.60.129 \
>   --dns 213.133.99.99 \
>   --mirror http://de.archive.ubuntu.com/ubuntu \
>   --components main,universe,restricted,multiverse \
>   --addpkg openssh-server \
>   --libvirt qemu:///system \
> --bridge br0;
> 
> "virsh start crisps" starts the vm up :
> 
> root 29297  0.5  3.3 2280396 271220 ?  Sl   14:12   0:13
> /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps
> -uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev
> socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait
> -monitor chardev:monitor -boot c -drive
> file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on
> -net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0
> -net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc
> 127.0.0.1:0 -vga cirrus
> 
> Networking on the host is like so:
> 
> br0   Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
>   inet addr:178.63.60.138  Bcast:178.63.60.191 
> Mask:255.255.255.192 inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:0
>   RX bytes:35266886777 (35.2 GB)  TX bytes:60292047383 (60.2 GB)
> 
> eth0  Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
>   inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:36337218359 (36.3 GB)  TX bytes:60378047518 (60.3 GB)
>   Interrupt:29 Base address:0x4000
> 
> eth0:0Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
>   inet addr:178.63.60.178  Bcast:178.63.60.191 
> Mask:255.255.255.192 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   Interrupt:29 Base address:0x4000
> 


Maybe look here. Should this maybe be br0:0 (or better yet, just use ip addr 
add to add the second address to the bridge instead of having aliases).



> loLink encap:Local Loopback
>   inet addr:127.0.0.1  Mask:255.0.0.0
>   inet6 addr: ::1/128 Scope:Host
>   UP LOOPBACK RUNNING  MTU:16436  Metric:1
>   RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:0
>   RX bytes:325111540 (325.1 MB)  TX bytes:325111540 (325.1 MB)
> 
> vnet0 Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
>   inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:72 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:500
>   RX bytes:3412 (3.4 KB)  TX bytes:2698 (2.6 KB)
> 
> the hosts /etc/network/interfaces is like so :
> 
> auto lo
> iface lo inet loopback
> 
> # device: eth0
> auto  eth0
> iface eth0 inet manual
> 
> auto br0
> iface br0 inet static
>   address   178.63.60.138
>   broadcast 178.63.60.191
>   netmask   255.255.255.192
>   gateway   178.63.60.129
>   network 178.63.60.128
>   bridge_ports eth0
>   bridge_stp off
>   bridge_fd 0
>   bridge_maxwait 0
> 
> # subli.me.uk
> auto eth0:0
> iface eth0:0 inet static
>   address 178.63.60.178
>   netmask 255.255.255.192
> 
> 
> If I ping the VM I see the packets on the host br0 but not vnet0, this
> is the output of brctl show on the host :
> 
> bridge name bridge id   STP enabled interfaces
> br0 8000.265dc9d2752e

Re: [PATCH] KVM: MMU: fix missing percpu counter destroy

2010-08-24 Thread Tim Pepper
On Mon 23 Aug at 16:13:15 +0800 yj...@cn.fujitsu.com said:
> commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
> (KVM: create aggregate kvm_total_used_mmu_pages value)
> introduce percpu counter kvm_total_used_mmu_pages but never
> destroy it, this may cause oops when rmmod & modprobe.
> 
> Signed-off-by: Wei Yongjun 

Acked-by: Tim Pepper 

-- 
Tim Pepper  
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()

2010-08-24 Thread Tharindu Rukshan Bamunuarachchi
it worked. thankx a lot.





On Tue, Aug 24, 2010 at 2:22 PM, Marcelo Tosatti  wrote:
>> [    0.00] NR_IRQS:4352 nr_irqs:512
>> [    0.00] Console: colour VGA+ 80x25
>> [    0.00] console [tty0] enabled
>> [    0.00] ODEBUG: 0 of 0 active objects replaced
>> [    0.00] Fast TSC calibration failed
>> [    0.00] TSC: Unable to calibrate against PIT
>> [    0.00] TSC: No reference (HPET/PMTIMER) available
>> [    0.00] Marking TSC unstable due to could not calculate TSC khz
>> [    0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320)
>
> Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning.
>
>> [    0.148000] pid_max: default: 32768 minimum: 301
>> [    0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 
>> bytes)
>> [    0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
>> [    0.152000] Mount-cache hash table entries: 256
>> [    0.152000] Performance Events: unsupported p6 CPU model 2 no PMU
>> driver, software events only.
>> [    0.16] Freeing SMP alternatives: 12k freed
>> [    0.16] Setting APIC routing to flat
>> [    0.164000] [ cut here ]
>> [    0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238
>> setup_local_APIC+0x169/0x24b()
>> [    0.164000] Hardware name: Bochs
>> [    0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89
>> [    0.168000] Call Trace:
>> [    0.168000]  [] warn_slowpath_common+0x75/0xb0
>> [    0.168000]  [] warn_slowpath_null+0x15/0x20
>> [    0.172000]  [] setup_local_APIC+0x169/0x24b
>> [    0.172000]  [] native_smp_prepare_cpus+0x322/0x3ff
>> [    0.172000]  [] kernel_init+0x57/0x1bd
>> [    0.172000]  [] kernel_thread_helper+0x4/0x10
>> [    0.176000]  [] ? kernel_init+0x0/0x1bd
>> [    0.176000]  [] ? kernel_thread_helper+0x0/0x10
>> [    0.176000] ---[ end trace 4eaa2a86a8e2da22 ]---
>> [    0.18]   alloc irq_desc for 33 on node 0
>> [    0.18]   alloc kstat_irqs on node 0
>> [    0.18]   alloc irq_desc for 35 on node 0
>> [    0.18]   alloc kstat_irqs on node 0
>> [    0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>> [    0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
>> [    0.34] NMI watchdog failed to create perf event on cpu0:
>> ffed
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()

2010-08-24 Thread Tharindu Rukshan Bamunuarachchi
it worked. thankx a lot.





On Tue, Aug 24, 2010 at 2:22 PM, Marcelo Tosatti  wrote:
>> [    0.00] NR_IRQS:4352 nr_irqs:512
>> [    0.00] Console: colour VGA+ 80x25
>> [    0.00] console [tty0] enabled
>> [    0.00] ODEBUG: 0 of 0 active objects replaced
>> [    0.00] Fast TSC calibration failed
>> [    0.00] TSC: Unable to calibrate against PIT
>> [    0.00] TSC: No reference (HPET/PMTIMER) available
>> [    0.00] Marking TSC unstable due to could not calculate TSC khz
>> [    0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320)
>
> Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning.
>
>> [    0.148000] pid_max: default: 32768 minimum: 301
>> [    0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 
>> bytes)
>> [    0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
>> [    0.152000] Mount-cache hash table entries: 256
>> [    0.152000] Performance Events: unsupported p6 CPU model 2 no PMU
>> driver, software events only.
>> [    0.16] Freeing SMP alternatives: 12k freed
>> [    0.16] Setting APIC routing to flat
>> [    0.164000] [ cut here ]
>> [    0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238
>> setup_local_APIC+0x169/0x24b()
>> [    0.164000] Hardware name: Bochs
>> [    0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89
>> [    0.168000] Call Trace:
>> [    0.168000]  [] warn_slowpath_common+0x75/0xb0
>> [    0.168000]  [] warn_slowpath_null+0x15/0x20
>> [    0.172000]  [] setup_local_APIC+0x169/0x24b
>> [    0.172000]  [] native_smp_prepare_cpus+0x322/0x3ff
>> [    0.172000]  [] kernel_init+0x57/0x1bd
>> [    0.172000]  [] kernel_thread_helper+0x4/0x10
>> [    0.176000]  [] ? kernel_init+0x0/0x1bd
>> [    0.176000]  [] ? kernel_thread_helper+0x0/0x10
>> [    0.176000] ---[ end trace 4eaa2a86a8e2da22 ]---
>> [    0.18]   alloc irq_desc for 33 on node 0
>> [    0.18]   alloc kstat_irqs on node 0
>> [    0.18]   alloc irq_desc for 35 on node 0
>> [    0.18]   alloc kstat_irqs on node 0
>> [    0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>> [    0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
>> [    0.34] NMI watchdog failed to create perf event on cpu0:
>> ffed
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How to debug KVM with gdb?

2010-08-24 Thread SHEN Hao
Hello,

I would like to debug the OS kernel loaded and simulated by kvm. I
know that QEMU have some problem with KVM debug. So how can I debug
this kernel? Is there any suggestion?

Best regards,
-- 
Hao Shen
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 05:28:12PM +0300, Avi Kivity wrote:
>  On 08/24/2010 05:06 PM, Gleb Natapov wrote:
> >On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote:
> >>  On 08/24/2010 04:52 PM, Gleb Natapov wrote:
> >>>We can, of course. But for me it looks as arbitrary as -1/0/1 since not
> >>>all enum values have meanings to the caller.
> >>Yeah.  -1/0/1's problem is that between reading the callee code and
> >>caller code, I manage to forget what the values mean.
> >>
> >Luckily we have only one caller of x86_emulate_insn(), so documenting
> >return values right where function is called should help. :)
> 
> Please use #define instead of /* */.
> 
Sigh.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


UIO: per-handle event counts

2010-08-24 Thread Phil Edworthy
Hi,

I've been using UIO for a userspace driver and noticed a feature when
multiple uio file handles have been opened for the same device (e.g. when
running separate processes).

Since each handle stores its own count of the interrupts and uses this to
determine whether uio_read returns, it's possible for uio_read to return
before a new interrupt has occurred.

Was this the intended behaviour? In my case, the irq is associated with
completion of some hw processing so it seems a little strange to me. I
expected uio_read to return only after an interrupt occurs that hasn't been
already been handled by userspace code.

Thanks
Phil
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: at arch/x86/kernel/apic/apic.c:1238 setup_local_APIC+0x169/0x24b()

2010-08-24 Thread Marcelo Tosatti
> [    0.00] NR_IRQS:4352 nr_irqs:512
> [    0.00] Console: colour VGA+ 80x25
> [    0.00] console [tty0] enabled
> [    0.00] ODEBUG: 0 of 0 active objects replaced
> [    0.00] Fast TSC calibration failed
> [    0.00] TSC: Unable to calibrate against PIT
> [    0.00] TSC: No reference (HPET/PMTIMER) available
> [    0.00] Marking TSC unstable due to could not calculate TSC khz
> [    0.052000] Calibrating delay loop... 3932.16 BogoMIPS (lpj=7864320)

Enabling CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK should fix the warning.

> [    0.148000] pid_max: default: 32768 minimum: 301
> [    0.148000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
> [    0.152000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
> [    0.152000] Mount-cache hash table entries: 256
> [    0.152000] Performance Events: unsupported p6 CPU model 2 no PMU
> driver, software events only.
> [    0.16] Freeing SMP alternatives: 12k freed
> [    0.16] Setting APIC routing to flat
> [    0.164000] [ cut here ]
> [    0.164000] WARNING: at arch/x86/kernel/apic/apic.c:1238
> setup_local_APIC+0x169/0x24b()
> [    0.164000] Hardware name: Bochs
> [    0.168000] Pid: 1, comm: swapper Not tainted 2.6.36-rc1-tinux+ #89
> [    0.168000] Call Trace:
> [    0.168000]  [] warn_slowpath_common+0x75/0xb0
> [    0.168000]  [] warn_slowpath_null+0x15/0x20
> [    0.172000]  [] setup_local_APIC+0x169/0x24b
> [    0.172000]  [] native_smp_prepare_cpus+0x322/0x3ff
> [    0.172000]  [] kernel_init+0x57/0x1bd
> [    0.172000]  [] kernel_thread_helper+0x4/0x10
> [    0.176000]  [] ? kernel_init+0x0/0x1bd
> [    0.176000]  [] ? kernel_thread_helper+0x0/0x10
> [    0.176000] ---[ end trace 4eaa2a86a8e2da22 ]---
> [    0.18]   alloc irq_desc for 33 on node 0
> [    0.18]   alloc kstat_irqs on node 0
> [    0.18]   alloc irq_desc for 35 on node 0
> [    0.18]   alloc kstat_irqs on node 0
> [    0.18] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.22] CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
> [    0.34] NMI watchdog failed to create perf event on cpu0:
> ffed
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Add realmode test for jcxz instruction

2010-08-24 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 10:57:12AM +0800, Wei Yongjun wrote:
> Signed-off-by: Wei Yongjun 
> ---
> v2 -> v3: rebased
> ---
>  x86/realmode.c |   42 ++
>  1 files changed, 42 insertions(+), 0 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 05:06 PM, Gleb Natapov wrote:

On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote:

  On 08/24/2010 04:52 PM, Gleb Natapov wrote:

We can, of course. But for me it looks as arbitrary as -1/0/1 since not
all enum values have meanings to the caller.

Yeah.  -1/0/1's problem is that between reading the callee code and
caller code, I manage to forget what the values mean.


Luckily we have only one caller of x86_emulate_insn(), so documenting
return values right where function is called should help. :)


Please use #define instead of /* */.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GRUB and support for Virtio

2010-08-24 Thread Daniel Bareiro
On Wednesday, 18 August 2010 21:45:56 +0300,
Nikolai K. Bochev wrote:

> Strange, i did 2 clean installs of centos 5.5 on an ubuntu 9.10 host
> and it went flawless with virtio devices ( both disks and network ).

Doing some tests with CentOS 5.5 on a KVM virtual machine, after doing
the installation, I added a second disk. But when trying to boot from
it, I get the following error:

-
root (hd1,0)

Error 21: Selected disk does not exist
-


The two disks are Virtio devices that are recognized when booting from
the first disk:

-
[r...@localhost ~]# fdisk -l /dev/vda

Disco /dev/vda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Unidades = cilindros de 16065 * 512 = 8225280 bytes

Disposit. InicioComienzo  Fin  Bloques  Id  Sistema
/dev/vda1   *   11174 9430123+  fd  Linux raid autodetect
/dev/vda211751305 1052257+  fd  Linux raid autodetect
[r...@localhost ~]#
[r...@localhost ~]#
[r...@localhost ~]# fdisk -l /dev/vdb

Disco /dev/vdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Unidades = cilindros de 16065 * 512 = 8225280 bytes

Disposit. InicioComienzo  Fin  Bloques  Id  Sistema
/dev/vdb1   *   11174 9430123+  fd  Linux raid autodetect
/dev/vdb211751305 1052257+  fd  Linux raid autodetect
-

The idea of these tests is to set up software RAID1 on a running system,
since, it seems that Anaconda does not support installation on degraded
RAID.

But I'm not sure if this is a problem of Virtio or that GRUB is not
recognizing the second disk.

I made sure to modify /boot/grub/device.map with the entry for the new
disk:

-
[r...@localhost grub]# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0) /dev/vda
(hd1) /dev/vdb
-

And the reconfiguration of GRUB on both disks did not give problems:

-
[r...@localhost grub]# grub --device-map=/boot/grub/device.map


GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the
possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  15 sectors are
embedded.
succeeded
 Running "install /boot/grub/stage1 (hd0) (hd0)1+15 p
(hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1)"...  15 sectors are
embedded.
succeeded
 Running "install /boot/grub/stage1 (hd1) (hd1)1+15 p
(hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.
grub> quit
quit
-


Any idea what may be causing the problem?

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598


signature.asc
Description: Digital signature


KVM Bridged Networking Issue in Ubuntu Lucid

2010-08-24 Thread Rus Hughes
Hi guys,

I'm trying to sort out networking for a VM I've created on my Ubuntu
Lucid box but the VM cannot access the Internet.
I can connect to the VM (crisps) from the host (holly) and to the host
from the VM.
But the VM cannot connect to the Internet and the Internet cannot
connect to the VM.

I followed the guide here: https://help.ubuntu.com/community/KVM

I created it using ubuntu-vm-builder using :

ubuntu-vm-builder kvm lucid \
  --domain crisps \
  --dest crisps \
  --arch amd64 \
  --hostname crisps \
  --mem 2048 \
  --rootsize 4096 \
  --swapsize 1024 \
  --user magicalusernameofdoom \
  --pass somesecretawesomepassword \
  --ip 178.63.60.159 \
  --mask 255.255.255.192 \
  --bcast 178.63.60.191 \
  --gw 178.63.60.129 \
  --dns 213.133.99.99 \
  --mirror http://de.archive.ubuntu.com/ubuntu \
  --components main,universe,restricted,multiverse \
  --addpkg openssh-server \
  --libvirt qemu:///system \
--bridge br0;

"virsh start crisps" starts the vm up :

root 29297  0.5  3.3 2280396 271220 ?  Sl   14:12   0:13
/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp 1 -name crisps
-uuid e2cda480-29e7-3740-d6c6-816e2f78223e -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/crisps.monitor,server,nowait
-monitor chardev:monitor -boot c -drive
file=/home/idimmu/vms/crisps/tmplXQBtt.qcow2,if=ide,index=0,boot=on
-net nic,macaddr=52:54:00:a6:28:b9,vlan=0,model=virtio,name=virtio.0
-net tap,fd=41,vlan=0,name=tap.0 -serial none -parallel none -usb -vnc
127.0.0.1:0 -vga cirrus

Networking on the host is like so:

br0   Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
  inet addr:178.63.60.138  Bcast:178.63.60.191  Mask:255.255.255.192
  inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:49080900 errors:0 dropped:0 overruns:0 frame:0
  TX packets:54086580 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:35266886777 (35.2 GB)  TX bytes:60292047383 (60.2 GB)

eth0  Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
  inet6 addr: fe80::4261:86ff:fee9:d69a/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:49418744 errors:0 dropped:0 overruns:0 frame:0
  TX packets:54348661 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:36337218359 (36.3 GB)  TX bytes:60378047518 (60.3 GB)
  Interrupt:29 Base address:0x4000

eth0:0Link encap:Ethernet  HWaddr 40:61:86:e9:d6:9a
  inet addr:178.63.60.178  Bcast:178.63.60.191  Mask:255.255.255.192
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  Interrupt:29 Base address:0x4000

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:1063249 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1063249 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:325111540 (325.1 MB)  TX bytes:325111540 (325.1 MB)

vnet0 Link encap:Ethernet  HWaddr 26:5d:c9:d2:75:2e
  inet6 addr: fe80::245d:c9ff:fed2:752e/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:72 errors:0 dropped:0 overruns:0 frame:0
  TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:500
  RX bytes:3412 (3.4 KB)  TX bytes:2698 (2.6 KB)

the hosts /etc/network/interfaces is like so :

auto lo
iface lo inet loopback

# device: eth0
auto  eth0
iface eth0 inet manual

auto br0
iface br0 inet static
  address   178.63.60.138
  broadcast 178.63.60.191
  netmask   255.255.255.192
  gateway   178.63.60.129
  network 178.63.60.128
  bridge_ports eth0
  bridge_stp off
  bridge_fd 0
  bridge_maxwait 0

# subli.me.uk
auto eth0:0
iface eth0:0 inet static
  address 178.63.60.178
  netmask 255.255.255.192


If I ping the VM I see the packets on the host br0 but not vnet0, this
is the output of brctl show on the host :

bridge name bridge id   STP enabled interfaces
br0 8000.265dc9d2752e   no  eth0

 vnet0
this is the routing table on the host

Kernel IP routing table
Destination Gateway Genmask Flags Metric RefUse Iface
178.63.60.128   0.0.0.0 255.255.255.192 U 0  00 br0
0.0.0.0 178.63.60.129   0.0.0.0 UG10000 br0

this is the routing table on the VM

Kernel IP routing table
Destination Gateway Genmask 

Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 05:01:10PM +0300, Avi Kivity wrote:
>  On 08/24/2010 04:52 PM, Gleb Natapov wrote:
> >
> >We can, of course. But for me it looks as arbitrary as -1/0/1 since not
> >all enum values have meanings to the caller.
> 
> Yeah.  -1/0/1's problem is that between reading the callee code and
> caller code, I manage to forget what the values mean.
> 
Luckily we have only one caller of x86_emulate_insn(), so documenting
return values right where function is called should help. :)

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 04:52 PM, Gleb Natapov wrote:


We can, of course. But for me it looks as arbitrary as -1/0/1 since not
all enum values have meanings to the caller.


Yeah.  -1/0/1's problem is that between reading the callee code and 
caller code, I manage to forget what the values mean.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 08/10] Check whether in long mode before testing vmexit caused by cr8 access

2010-08-24 Thread Avi Kivity

 On 08/24/2010 04:47 PM, Jason Wang wrote:

CR8 is only availabe when in long mode.

Signed-off-by: Jason Wang
---
  x86/vmexit.c |   14 --
  1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/x86/vmexit.c b/x86/vmexit.c
index 819c24b..34b0af4 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -48,6 +48,7 @@ static void vmcall(void)

  #define MSR_EFER 0xc080
  #define EFER_NX_MASK(1ull<<  11)
+#define EFER_LMA_MASK   (1ull<<  10)

  static void mov_from_cr8(void)
  {
@@ -68,6 +69,15 @@ static int is_smp(void)
return cpu_count()>  1;
  }

+static int is_long_mode(void)
+{
+#ifdef __i386__
+return 0;
+#else
+return rdmsr(MSR_EFER) | EFER_LMA_MASK;
+#endif
+}
+
  static void nop(void *junk)
  {
  }
@@ -100,8 +110,8 @@ static struct test {
  } tests[] = {
{ cpuid_test, "cpuid", .parallel = 1,  },
{ vmcall, "vmcall", .parallel = 1, },
-   { mov_from_cr8, "mov_from_cr8", .parallel = 1, },
-   { mov_to_cr8, "mov_to_cr8" , .parallel = 1, },
+   { mov_from_cr8, "mov_from_cr8", is_long_mode, .parallel = 1, },
+   { mov_to_cr8, "mov_to_cr8" , is_long_mode, .parallel = 1, },
{ inl_pmtimer, "inl_from_pmtimer", .parallel = 1, },
{ ipi, "ipi", is_smp, .parallel = 0, },
{ ipi_halt, "ipi+halt", is_smp, .parallel = 0, },


Enough to #ifdef __x86_64__.  If it's defined we're in long mode.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-24 Thread Avi Kivity

 On 08/24/2010 04:47 PM, Jason Wang wrote:

TSS size should be 104 byte.

Signed-off-by: Jason Wang
---
  x86/cstart64.S |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/x86/cstart64.S b/x86/cstart64.S
index 5d358ad..b871153 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -69,7 +69,7 @@ tss:
.long 0
.quad ring0stacktop - i * 4096


ring 0 stack


.quad 0, 0, 0


rings 1, 2, 3 stack


-   .quad 0, 0, 0, 0, 0, 0, 0, 0


1 qword reserved, 7 qwords IST


+   .quad 0, 0, 0, 0, 0, 0, 0
.long 0, 0, 0


3 dwords reserved + I/O map base address - so this looks correct?


  i = i + 1
.endr




--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 04:41:10PM +0300, Avi Kivity wrote:
>  On 08/24/2010 04:37 PM, Gleb Natapov wrote:
> >On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote:
> >>  On 08/24/2010 02:30 PM, Gleb Natapov wrote:
> >>>x86_emulate_insn() will return 1 if instruction can be restarted
> >>>without re-entering a guest.
> >>>
> >>So now we have an undocumented -1/0/1 return code?
> >>
> >>Better to have an enum for this.
> >>
> >We already have two. First is X86EMUL_ (not enum but close) for
> >more or less internal emulator use. Second is EMULATE_* for users of
> >emulate_instruction() now you want one more enum for communication
> >between emulate_instruction() and x86_emulate_insn(). Lost in enums.
> >emulate_instruction() and x86_emulate_insn() are tightly coupled right
> >now should we define formal interface between them? May be comment will
> >be enough?
> 
> Can we reuse one or the other?  Perhaps with extensions?
> 
We can, of course. But for me it looks as arbitrary as -1/0/1 since not
all enum values have meanings to the caller.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] S390: Add virtio hotplug add support

2010-08-24 Thread Alexander Graf
The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can on the fly add new devices
in the guest.

Keep in mind that this needs a patch for qemu to actually leverage the
functionality.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - move dev_add to kvm_virtio.h
---
 arch/s390/include/asm/kvm_virtio.h |1 +
 drivers/s390/kvm/kvm_virtio.c  |   47 
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/arch/s390/include/asm/kvm_virtio.h 
b/arch/s390/include/asm/kvm_virtio.h
index 3f5d100..72f6141 100644
--- a/arch/s390/include/asm/kvm_virtio.h
+++ b/arch/s390/include/asm/kvm_virtio.h
@@ -59,5 +59,6 @@ struct kvm_vqconfig {
 #define VIRTIO_PARAM_MASK  0xff
 #define VIRTIO_PARAM_VRING_INTERRUPT   0x0
 #define VIRTIO_PARAM_CONFIG_CHANGED0x1
+#define VIRTIO_PARAM_DEV_ADD   0x2
 
 #endif
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 68cef4d..5a46b8c 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -32,6 +32,7 @@
  * The pointer to our (page) of device descriptions.
  */
 static void *kvm_devices;
+struct work_struct hotplug_work;
 
 struct kvm_device {
struct virtio_device vdev;
@@ -328,6 +329,47 @@ static void scan_devices(void)
 }
 
 /*
+ * match for a kvm device with a specific desc pointer
+ */
+static int match_desc(struct device *dev, void *data)
+{
+   if ((ulong)to_kvmdev(dev_to_virtio(dev))->desc == (ulong)data)
+   return 1;
+
+   return 0;
+}
+
+/*
+ * hotplug_device tries to find changes in the device page.
+ */
+static void hotplug_devices(struct work_struct *dummy)
+{
+   unsigned int i;
+   struct kvm_device_desc *d;
+   struct device *dev;
+
+   for (i = 0; i < PAGE_SIZE; i += desc_size(d)) {
+   d = kvm_devices + i;
+
+   /* end of list */
+   if (d->type == 0)
+   break;
+
+   /* device already exists */
+   dev = device_find_child(kvm_root, d, match_desc);
+   if (dev) {
+   /* XXX check for hotplug remove */
+   put_device(dev);
+   continue;
+   }
+
+   /* new device */
+   printk(KERN_INFO "Adding new virtio device %p\n", d);
+   add_kvm_device(d, i);
+   }
+}
+
+/*
  * we emulate the request_irq behaviour on top of s390 extints
  */
 static void kvm_extint_handler(u16 code)
@@ -357,6 +399,9 @@ static void kvm_extint_handler(u16 code)
 
break;
}
+   case VIRTIO_PARAM_DEV_ADD:
+   schedule_work(&hotplug_work);
+   break;
case VIRTIO_PARAM_VRING_INTERRUPT:
default:
vring_interrupt(0, vq);
@@ -390,6 +435,8 @@ static int __init kvm_devices_init(void)
 
kvm_devices = (void *) real_memory_size;
 
+   INIT_WORK(&hotplug_work, hotplug_devices);
+
ctl_set_bit(0, 9);
register_external_interrupt(0x2603, kvm_extint_handler);
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] S390: take a full byte as ext_param indicator

2010-08-24 Thread Alexander Graf
Currenty the ext_param field only distinguishes between "config change" and
"vring interrupt". We can do a lot more with it though, so let's enable a
full byte of possible values and constants to #defines while at it.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - move defines to virtio_s390.h
---
 arch/s390/include/asm/kvm_virtio.h |6 ++
 drivers/s390/kvm/kvm_virtio.c  |   19 +--
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/s390/include/asm/kvm_virtio.h 
b/arch/s390/include/asm/kvm_virtio.h
index acdfdff..3f5d100 100644
--- a/arch/s390/include/asm/kvm_virtio.h
+++ b/arch/s390/include/asm/kvm_virtio.h
@@ -54,4 +54,10 @@ struct kvm_vqconfig {
  * This is pagesize for historical reasons. */
 #define KVM_S390_VIRTIO_RING_ALIGN 4096
 
+
+/* These values are supposed to be in ext_params on an interrupt */
+#define VIRTIO_PARAM_MASK  0xff
+#define VIRTIO_PARAM_VRING_INTERRUPT   0x0
+#define VIRTIO_PARAM_CONFIG_CHANGED0x1
+
 #endif
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 4e298bc..68cef4d 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -334,7 +334,7 @@ static void kvm_extint_handler(u16 code)
 {
struct virtqueue *vq;
u16 subcode;
-   int config_changed;
+   u32 param;
 
subcode = S390_lowcore.cpu_addr;
if ((subcode & 0xff00) != VIRTIO_SUBCODE_64)
@@ -343,18 +343,25 @@ static void kvm_extint_handler(u16 code)
/* The LSB might be overloaded, we have to mask it */
vq = (struct virtqueue *)(S390_lowcore.ext_params2 & ~1UL);
 
-   /* We use the LSB of extparam, to decide, if this interrupt is a config
-* change or a "standard" interrupt */
-   config_changed = S390_lowcore.ext_params & 1;
+   /* We use ext_params to decide what this interrupt means */
+   param = S390_lowcore.ext_params & VIRTIO_PARAM_MASK;
 
-   if (config_changed) {
+   switch (param) {
+   case VIRTIO_PARAM_CONFIG_CHANGED:
+   {
struct virtio_driver *drv;
drv = container_of(vq->vdev->dev.driver,
   struct virtio_driver, driver);
if (drv->config_changed)
drv->config_changed(vq->vdev);
-   } else
+
+   break;
+   }
+   case VIRTIO_PARAM_VRING_INTERRUPT:
+   default:
vring_interrupt(0, vq);
+   break;
+   }
 }
 
 /*
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] S390: Export kvm_virtio.h

2010-08-24 Thread Alexander Graf
As suggested by Christian, we should expose headers to user space with
information that might be valuable there. The s390 virtio interface is
one of those cases. It defines an ABI between hypervisor and guest, so
it should be exposed to user space.

Reported-by: Christian Borntraeger 
Signed-off-by: Alexander Graf 
---
 arch/s390/include/asm/Kbuild |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 63a2341..ee7937d 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -8,6 +8,7 @@ header-y += ucontext.h
 header-y += vtoc.h
 header-y += zcrypt.h
 header-y += chsc.h
+header-y += kvm_virtio.h
 
 unifdef-y += cmb.h
 unifdef-y += debug.h
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 10/10] Add the 32bit smp initialization code

2010-08-24 Thread Jason Wang
Add the smp initlaization codes for 32bit. This would make the 32bit
smp tests available.

Signed-off-by: Jason Wang 
---
 x86/cstart.S |  179 --
 1 files changed, 173 insertions(+), 6 deletions(-)

diff --git a/x86/cstart.S b/x86/cstart.S
index 1bdf789..6ce200d 100644
--- a/x86/cstart.S
+++ b/x86/cstart.S
@@ -1,9 +1,65 @@
 
+#include "apic-defs.h"
+
+.globl boot_idt
+boot_idt = 0
+
+ipi_vector = 0x20
+
+max_cpus = 4
 
 .bss
 
+   . = . + 4096 * max_cpus
+   .align 16
+stacktop:
+
+   . = . + 4096
+   .align 16
+ring0stacktop:
+
+.data
+
+.align 4096
+pt:
+i = 0
+.rept 1024
+.long 0x1e7 | (i << 22)
+i = i + 1
+.endr
+
+gdt32:
+   .quad 0
+   .quad 0x00cf9b00 // flat 32-bit code segment
+   .quad 0x00cf9300 // flat 32-bit data segment
+
+tss_descr:
+.rept max_cpus
+.quad 0x8900 // 32-bit avail tss
+.endr
+gdt32_end:
+
+i = 0
+tss:
+.rept max_cpus
+.long 0
+.long ring0stacktop - i * 4096
+.long 0
+.quad 0, 0, 0
+.quad 0, 0, 0, 0, 0, 0, 0
+.long 0, 0, 0
+i = i + 1
+.endr
+tss_end:
+
+idt_descr:
+   .word 16 * 256 - 1
+   .long boot_idt
+
 .section .init
 
+.code32
+
 mb_magic = 0x1BADB002
 mb_flags = 0x0
 
@@ -11,15 +67,126 @@ mb_flags = 0x0
.long mb_magic, mb_flags, 0 - (mb_magic + mb_flags)
 mb_cmdline = 16
 
+MSR_GS_BASE = 0xc101
+
+.macro setup_percpu_area
+   lea -4096(%esp), %eax
+   mov $0, %edx
+   mov $MSR_GS_BASE, %ecx
+   wrmsr
+.endm
+
 .globl start
 start:
-   mov mb_cmdline(%ebx), %eax
-   mov %eax, __args
-   call __setup_args
-   pushl $__argv
-   pushl __argc
-   call main
+mov mb_cmdline(%ebx), %eax
+mov %eax, __args
+call __setup_args
+mov $stacktop, %esp
+setup_percpu_area
+call prepare_32
+jmpl $8, $start32
+
+prepare_32:
+lgdtl gdt32_descr
+
+   mov %cr4, %eax
+   bts $4, %eax  // pse
+   mov %eax, %cr4
+
+   mov $pt, %eax
+   mov %eax, %cr3
+
+   mov %cr0, %eax
+   bts $0, %eax
+   bts $31, %eax
+   mov %eax, %cr0
+   ret
+
+smp_stacktop:  .long 0xa
+
+ap_start32:
+   mov $0x10, %ax
+   mov %ax, %ds
+   mov %ax, %es
+   mov %ax, %fs
+   mov %ax, %gs
+   mov %ax, %ss
+   mov $-4096, %esp
+   lock/xaddl %esp, smp_stacktop
+   setup_percpu_area
+   call prepare_32
+   call load_tss
+   call enable_apic
+   call enable_x2apic
+   sti
+   nop
+   lock incw cpu_online_count
+
+1: hlt
+   jmp 1b
+
+start32:
+   call load_tss
+   call mask_pic_interrupts
+   call enable_apic
+   call smp_init
+   call enable_x2apic
+push $__argv
+push __argc
+call main
push %eax
call exit
 
+load_tss:
+   lidt idt_descr
+   mov $16, %eax
+   mov %ax, %ss
+   mov $(APIC_DEFAULT_PHYS_BASE + APIC_ID), %eax
+   mov (%eax), %eax
+   shr $24, %eax
+   mov %eax, %ebx
+   shl $3, %ebx
+   mov $((tss_end - tss) / max_cpus), %edx
+   imul %edx
+   add $tss, %eax
+   mov %ax, tss_descr+2(%ebx)
+   shr $16, %eax
+   mov %al, tss_descr+4(%ebx)
+   shr $8, %eax
+   mov %al, tss_descr+7(%ebx)
+   lea tss_descr-gdt32(%ebx), %eax
+   ltr %ax
+   ret
+
+smp_init:
+   cld
+   lea sipi_entry, %esi
+   xor %edi, %edi
+   mov $(sipi_end - sipi_entry), %ecx
+   rep/movsb
+   mov $APIC_DEFAULT_PHYS_BASE, %eax
+   movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | 
APIC_INT_ASSERT), APIC_ICR(%eax)
+   movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT), 
APIC_ICR(%eax)
+   movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP), 
APIC_ICR(%eax)
+   call fwcfg_get_nb_cpus
+1: pause
+   cmpw %ax, cpu_online_count
+   jne 1b
+smp_init_done:
+   ret
+
+cpu_online_count:  .word 1
+
+.code16
+sipi_entry:
+   mov %cr0, %eax
+   or $1, %eax
+   mov %eax, %cr0
+   lgdtl gdt32_descr - sipi_entry
+   ljmpl $8, $ap_start32
+
+gdt32_descr:
+   .word gdt32_end - gdt32 - 1
+   .long gdt32
 
+sipi_end:

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 09/10] Do not test IA32_EFER in 32bit mode.

2010-08-24 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 x86/msr.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/x86/msr.c b/x86/msr.c
index 9c85369..c3e0014 100644
--- a/x86/msr.c
+++ b/x86/msr.c
@@ -49,9 +49,11 @@ struct msr_info msr_info[] =
 { .index = 0xc102, .name = "MSR_KERNEL_GS_BASE",
   .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
 },
+#ifdef __x86_64__
 { .index = 0xc080, .name = "MSR_EFER",
   .val_pairs = {{ .valid = 1, .value = 0xD00, .expected = 0xD00}}
 },
+#endif
 { .index = 0xc082, .name = "MSR_LSTAR",
   .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
 },

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 08/10] Check whether in long mode before testing vmexit caused by cr8 access

2010-08-24 Thread Jason Wang
CR8 is only availabe when in long mode.

Signed-off-by: Jason Wang 
---
 x86/vmexit.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/x86/vmexit.c b/x86/vmexit.c
index 819c24b..34b0af4 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -48,6 +48,7 @@ static void vmcall(void)
 
 #define MSR_EFER 0xc080
 #define EFER_NX_MASK(1ull << 11)
+#define EFER_LMA_MASK   (1ull << 10)
 
 static void mov_from_cr8(void)
 {
@@ -68,6 +69,15 @@ static int is_smp(void)
return cpu_count() > 1;
 }
 
+static int is_long_mode(void)
+{
+#ifdef __i386__
+return 0;
+#else
+return rdmsr(MSR_EFER) | EFER_LMA_MASK;
+#endif
+}
+
 static void nop(void *junk)
 {
 }
@@ -100,8 +110,8 @@ static struct test {
 } tests[] = {
{ cpuid_test, "cpuid", .parallel = 1,  },
{ vmcall, "vmcall", .parallel = 1, },
-   { mov_from_cr8, "mov_from_cr8", .parallel = 1, },
-   { mov_to_cr8, "mov_to_cr8" , .parallel = 1, },
+   { mov_from_cr8, "mov_from_cr8", is_long_mode, .parallel = 1, },
+   { mov_to_cr8, "mov_to_cr8" , is_long_mode, .parallel = 1, },
{ inl_pmtimer, "inl_from_pmtimer", .parallel = 1, },
{ ipi, "ipi", is_smp, .parallel = 0, },
{ ipi_halt, "ipi+halt", is_smp, .parallel = 0, },

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 06/10] Remove the duplicated rdmsr/wrmsr

2010-08-24 Thread Jason Wang
They have been implemented in lib/x86/processor.h. Also rename the
cpuid to cpuid_test because cpuid have been defined.

Signed-off-by: Jason Wang 
---
 x86/msr.c|   16 +---
 x86/vmexit.c |   20 +++-
 2 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/x86/msr.c b/x86/msr.c
index 1e7987d..9c85369 100644
--- a/x86/msr.c
+++ b/x86/msr.c
@@ -1,6 +1,7 @@
 /* msr tests */
 
 #include "libcflat.h"
+#include "processor.h"
 
 struct msr_info {
 int index;
@@ -87,21 +88,6 @@ static void report(const char *name, int passed)
printf("%s: %s\n", name, passed ? "PASS" : "FAIL");
 }
 
-static void wrmsr(unsigned index, unsigned long long value)
-{
-   unsigned a = value, d = value >> 32;
-
-   asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index));
-}
-
-static unsigned long long rdmsr(unsigned index)
-{
-   unsigned a, d;
-
-   asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index));
-   return ((unsigned long long)d << 32) | a;
-}
-
 static void test_msr_rw(int msr_index, unsigned long long input, unsigned long 
long expected)
 {
 unsigned long long r = 0;
diff --git a/x86/vmexit.c b/x86/vmexit.c
index 707d5c6..819c24b 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -1,6 +1,7 @@
 
 #include "libcflat.h"
 #include "smp.h"
+#include "processor.h"
 
 static inline unsigned long long rdtsc()
 {
@@ -32,7 +33,7 @@ static unsigned int inl(unsigned short port)
 #  define R "e"
 #endif
 
-static void cpuid(void)
+static void cpuid_test(void)
 {
asm volatile ("push %%"R "bx; cpuid; pop %%"R "bx"
  : : : "eax", "ecx", "edx");
@@ -48,21 +49,6 @@ static void vmcall(void)
 #define MSR_EFER 0xc080
 #define EFER_NX_MASK(1ull << 11)
 
-unsigned long long rdmsr(unsigned index)
-{
-   unsigned a, d;
-
-   asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index));
-   return ((unsigned long long)d << 32) | a;
-}
-
-void wrmsr(unsigned index, unsigned long long val)
-{
-   unsigned a = val, d = val >> 32;
-
-   asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index));
-}
-
 static void mov_from_cr8(void)
 {
unsigned long cr8;
@@ -112,7 +98,7 @@ static struct test {
int (*valid)(void);
int parallel;
 } tests[] = {
-   { cpuid, "cpuid", .parallel = 1,  },
+   { cpuid_test, "cpuid", .parallel = 1,  },
{ vmcall, "vmcall", .parallel = 1, },
{ mov_from_cr8, "mov_from_cr8", .parallel = 1, },
{ mov_to_cr8, "mov_to_cr8" , .parallel = 1, },

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 07/10] Correct the tss size

2010-08-24 Thread Jason Wang
TSS size should be 104 byte.

Signed-off-by: Jason Wang 
---
 x86/cstart64.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/x86/cstart64.S b/x86/cstart64.S
index 5d358ad..b871153 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -69,7 +69,7 @@ tss:
.long 0
.quad ring0stacktop - i * 4096
.quad 0, 0, 0
-   .quad 0, 0, 0, 0, 0, 0, 0, 0
+   .quad 0, 0, 0, 0, 0, 0, 0
.long 0, 0, 0
 i = i + 1
.endr

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 05/10] Drop print.S

2010-08-24 Thread Jason Wang
We've already had lib/printf.c.

Signed-off-by: Jason Wang 
---
 config-x86-common.mak |9 -
 x86/print.S   |   31 ---
 2 files changed, 4 insertions(+), 36 deletions(-)
 delete mode 100644 x86/print.S

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 2d2965f..a623983 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -33,7 +33,7 @@ test_cases: $(tests-common) $(tests)
 
 $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I lib -I lib/x86
 
-$(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o
+$(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o
 
 $(TEST_DIR)/hypercall.flat: $(cstart.o) $(TEST_DIR)/hypercall.o
 
@@ -45,14 +45,13 @@ $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o
 $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o
 
 $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \
-  $(TEST_DIR)/vm.o $(TEST_DIR)/print.o
+  $(TEST_DIR)/vm.o
 
 $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o
 
 $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o
 
-$(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o \
-  $(TEST_DIR)/print.o
+$(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o
 
 $(TEST_DIR)/realmode.flat: $(TEST_DIR)/realmode.o
$(CC) -m32 -nostdlib -o $@ -Wl,-T,$(TEST_DIR)/realmode.lds $^
@@ -66,7 +65,7 @@ $(TEST_DIR)/idt_test.flat: $(cstart.o) $(TEST_DIR)/idt.o 
$(TEST_DIR)/idt_test.o
 $(TEST_DIR)/xsave.flat: $(cstart.o) $(TEST_DIR)/idt.o $(TEST_DIR)/xsave.o
 
 $(TEST_DIR)/rmap_chain.flat: $(cstart.o) $(TEST_DIR)/rmap_chain.o \
-$(TEST_DIR)/print.o $(TEST_DIR)/vm.o
+$(TEST_DIR)/vm.o
 
 $(TEST_DIR)/svm.flat: $(cstart.o) $(TEST_DIR)/vm.o
 
diff --git a/x86/print.S b/x86/print.S
deleted file mode 100644
index b445381..000
--- a/x86/print.S
+++ /dev/null
@@ -1,31 +0,0 @@
-
-#include "print.h"
-
-#define PSEUDO_SERIAL_PORT 0xf1
-
-
-.text
-   PRINT "boo"
-   hlt
-1: jmp 1b
-
-.globl print
-print:
-   push %rax
-   push %rsi
-   push %rdx
-
-   mov %rdi, %rsi
-   mov $(PSEUDO_SERIAL_PORT), %edx
-
-putchar:
-   cmpb $0, (%rsi)
-   jz done
-   outsb
-   jmp putchar
-done:
-
-   pop %rdx
-   pop %rsi
-   pop %rax
-   ret

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 04/10] Correct the path in README

2010-08-24 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 README |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/README b/README
index 6a83831..9c0c518 100644
--- a/README
+++ b/README
@@ -7,17 +7,17 @@ Tests uses a qemu's virtual test device, named testdev, for 
services like printi
 See file testdev.txt for more details.
 
 To create the tests' images just type 'make' in this directory.
-Tests' images created in ./test//*.flat
+Tests' images created in .//*.flat
 
 An example of a test invocation:
-qemu-system-x86_64 -device testdev,chardev=testlog -chardev 
file,id=testlog,path=msr.out -kernel ./test/x86/msr.flat
+qemu-system-x86_64 -device testdev,chardev=testlog -chardev 
file,id=testlog,path=msr.out -kernel ./x86/msr.flat
 This invocation runs the msr test case. The test output is in file msr.out.
 
 
 
 Directory structure:
 .:  Makefile and config files for the tests
-./test/lib: general services for the tests
-./test/lib/: architecture dependent services for the tests
-./test/: the sources of the tests and the created objects/images
+./lib: general services for the tests
+./lib/: architecture dependent services for the tests
+./: the sources of the tests and the created objects/images
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 03/10] Makefile cleanup

2010-08-24 Thread Jason Wang
Remove the obsoleted target and directories.

Signed-off-by: Jason Wang 
---
 Makefile |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index 5347ce8..d25e6f2 100644
--- a/Makefile
+++ b/Makefile
@@ -30,10 +30,6 @@ CFLAGS += -O1
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, "")
 CFLAGS += $(call cc-option, -fno-stack-protector-all, "")
-CFLAGS += -I../include
-CFLAGS += -I ../libkvm
-
-LDFLAGS += $(CFLAGS) -L ../libkvm
 
 CXXFLAGS = $(autodepend-flags)
 
@@ -43,9 +39,6 @@ LDFLAGS += -pthread -lrt
 
 kvmtrace_objs= kvmtrace.o
 
-kvmctl: $(kvmctl_objs)
-   $(CC) $(LDFLAGS) $^ -o $@
-
 kvmtrace: $(kvmtrace_objs)
$(CC) $(LDFLAGS) $^ -o $@
 
@@ -62,4 +55,4 @@ install:
install $(tests_and_config) $(DESTDIR)
 
 clean: arch_clean
-   $(RM) kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs)
+   $(RM) kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 02/10] Remove trailing whitespaces

2010-08-24 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 config-x86-common.mak |   14 +++---
 x86/access.c  |2 +-
 x86/cstart64.S|4 ++--
 x86/print.S   |6 +++---
 x86/sieve.c   |2 +-
 x86/vm.c  |6 +++---
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 19bffd4..2d2965f 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -32,18 +32,18 @@ tests_and_config = $(TEST_DIR)/*.flat 
$(TEST_DIR)/unittests.cfg
 test_cases: $(tests-common) $(tests)
 
 $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I lib -I lib/x86
- 
+
 $(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o
- 
+
 $(TEST_DIR)/hypercall.flat: $(cstart.o) $(TEST_DIR)/hypercall.o
- 
+
 $(TEST_DIR)/sieve.flat: $(cstart.o) $(TEST_DIR)/sieve.o \
$(TEST_DIR)/vm.o
- 
+
 $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o
- 
+
 $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o
- 
+
 $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \
   $(TEST_DIR)/vm.o $(TEST_DIR)/print.o
 
@@ -52,7 +52,7 @@ $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o
 $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o
 
 $(TEST_DIR)/apic.flat: $(cstart.o) $(TEST_DIR)/apic.o $(TEST_DIR)/vm.o \
-  $(TEST_DIR)/print.o 
+  $(TEST_DIR)/print.o
 
 $(TEST_DIR)/realmode.flat: $(TEST_DIR)/realmode.o
$(CC) -m32 -nostdlib -o $@ -Wl,-T,$(TEST_DIR)/realmode.lds $^
diff --git a/x86/access.c b/x86/access.c
index a0d88e1..067565b 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -157,7 +157,7 @@ static void ac_test_show(ac_test_t *at);
 void lidt(idt_entry_t *idt, int nentries)
 {
 descriptor_table_t dt;
-
+
 dt.limit = nentries * sizeof(*idt) - 1;
 dt.linear_addr = (unsigned long)idt;
 asm volatile ("lidt %0" : : "m"(dt));
diff --git a/x86/cstart64.S b/x86/cstart64.S
index cc4c80f..5d358ad 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -19,7 +19,7 @@ stacktop:
 ring0stacktop:
 
 .data
-   
+
 .align 4096
 ptl2:
 i = 0
@@ -38,7 +38,7 @@ ptl3:
 .align 4096
 ptl4:
.quad ptl3 + 7
-   
+
 .align 4096
 
 gdt64_desc:
diff --git a/x86/print.S b/x86/print.S
index c1b1c0d..b445381 100644
--- a/x86/print.S
+++ b/x86/print.S
@@ -3,12 +3,12 @@
 
 #define PSEUDO_SERIAL_PORT 0xf1
 
-   
+
 .text
PRINT "boo"
hlt
 1: jmp 1b
-   
+
 .globl print
 print:
push %rax
@@ -24,7 +24,7 @@ putchar:
outsb
jmp putchar
 done:
-   
+
pop %rdx
pop %rsi
pop %rax
diff --git a/x86/sieve.c b/x86/sieve.c
index ef4a5a0..6cbcd6d 100644
--- a/x86/sieve.c
+++ b/x86/sieve.c
@@ -46,6 +46,6 @@ int main()
test_sieve("virtual", v, VSIZE);
vfree(v);
 }
-
+
 return 0;
 }
diff --git a/x86/vm.c b/x86/vm.c
index 62b3ba8..b34449f 100644
--- a/x86/vm.c
+++ b/x86/vm.c
@@ -30,7 +30,7 @@ void *alloc_page()
 
 if (!free)
return 0;
-
+
 p = free;
 free = *(void **)free;
 
@@ -184,7 +184,7 @@ void *vmalloc(unsigned long size)
 unsigned pages;
 
 size += sizeof(unsigned long);
-
+
 size = (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
 vfree_top -= size;
 mem = p = vfree_top;
@@ -201,7 +201,7 @@ void *vmalloc(unsigned long size)
 void vfree(void *mem)
 {
 unsigned long size = ((unsigned long *)mem)[-1];
-
+
 while (size) {
free_page(phys_to_virt(get_pte(phys_to_virt(read_cr3()), mem) & 
PTE_ADDR));
mem += PAGE_SIZE;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 01/10] Do not track config.mak and kvmtrace

2010-08-24 Thread Jason Wang
config.mak was generated by configure and kvmtrace were compiled
throuh makefile.

Signed-off-by: Jason Wang 
---
 config.mak |8 
 kvmtrace   |  Bin
 2 files changed, 0 insertions(+), 8 deletions(-)
 delete mode 100644 config.mak
 delete mode 100755 kvmtrace

diff --git a/config.mak b/config.mak
deleted file mode 100644
index 333ec7c..000
--- a/config.mak
+++ /dev/null
@@ -1,8 +0,0 @@
-PREFIX=/usr/local
-KERNELDIR=/home/avi/kvm/linux-2.6
-ARCH=x86_64
-PROCESSOR=x86_64
-CC=gcc
-LD=ld
-OBJCOPY=objcopy
-AR=ar
diff --git a/kvmtrace b/kvmtrace
deleted file mode 100755
index 
d4610290bd47922b276f6cd3604972b07bb2e07d..
GIT binary patch
literal 0
HcmV?d1

literal 36834
zcmd6Q3w%_?_5a-6WZ8ryTp&nzC<_S?9tj~lF)thph?k{ib-...@tdxc
ze60}$wf&)Oh1%AlElTSPK_M!N*lMlTM_XH|Rl5-%C_YfD`G3zmcJFSo{r&s...@e}4bz
z|0+9k&YW}R%$YN1=H5GVv(;I=(xhojT_$!OBdAQbbCmdHtfq^D61Rw1nVn5z>C6I5
zJpS2v38f;Z(mp9#rB+F2MjA(XlSG*8A|u...@0%9sv8d6i2g)h6ziZp4u09DGEjU$MU
zbk$|kke00prhmpe...@#;v...@a4wmgIFvvaxrI)qRY3rniFIAxDffT-?N{G-04@)2D}t}J})mAfb1`0AYIZT%lw
z|L_Nm%4XxAvOkNbToShm;j{2>@u$5LbB}&G>cP9eB(XzY4zLYXcu5TW2}l`={`wg7
z12O2Q#Gp?Cy$%1OuL(fdqLF|v...@tu@x+kd7K1)h...@~&=(}Uk?}&jv6hqFg81%(4
z=-Xq^Z;l...@+9fsu#g3zysz<(73?}{P+bPW2ZV&LzOfnN|qer62(+8A=OW6*yWgZ}0i
z_|h11?ukLaI|lyZ7;OjFT4~c6R62jXh
zUd6>!NnaO5&k*;z{1oTrGPk>>-P_^z`Rlv2^**1+N7SAE
zbzPqNCSu895^!JYZS4pFjVSkfAkNd()!}uwc^m8ft=PVtqltAwgtv>ecXvR_dgg0w>8NjG?dzLbyI70I-`U#4
zeEu%K7r6HNPUdS|=...@~ljptit=gkl@|7Osvg&*6(>(I~UdIZEIuAjcs0^
zM~LW#jQTc|q...@jex-4DZm);wd>x8(v#-(H0ef_LJgl&|C@ywzg#*eq5c0d2!z}ei4gye=5h...@ztjyh1fb)uX}uJ0?lwg
zh9A+Tf%-C}R8E(f...@nhdr?j}l2nat2q%x4jisyv@tpwke...@ngi9p{flz73w-$t
zJK<^GRF`!|9nx?%sunau170pci6l...@2{ohunob9v*k-;En6990T6C?pS2N!!V)C
zZonsou#CA3_~8b8nE_98t-5Lq_+*7dT4%tg81U;1_z?ztrvb0}l8...@fnzm+yesp
z+`GYmA8nxDVZhTGNnN`P_%RBJbdLdlmI1%lfFEnX?>FGn4EO^E{5S)?&ww9qz&~fe
z)7nK{FB$M^4Nhfm81QE+Vx%Vw_;U>SpaDP0fInrx+YIl;8;j=Ww`zU>aim
zHV&^Ln1)ur&f#)`...@m4xdjj4w<55-%|VY2(}S?g2PJ)rlHgS5{GjMrXkbc$Kja-
z(@^Q(&*3Qq(-7(3!{PA+)6nSO!QoK^(~#)j#^FSQX(;q...@a&L4T1hTz;iz=3jE;`
zSKw_|&pW5eDvOTwb)I8qu47Mb@a*x`bYyL#j6(`?ljzj...@dg|N
z9mcE56?nIT2)8Zon`p!Qy6QmY^g8D19di(%E6YZ}lV{AX?KK0g#eeoq9WiMebirVG
zN^=Fqd>=#usjqUX-Z9q`%i=$Gz5Q&w4_S?>TYLQJ=H}+?5tFVVHdf{8Sw8kG8w3R2
z^^ZD0DALV(bnk5...@orx`^z?e2xknf!bkt...@d45}v(m(hHzAR|s=CubaN0^%^|4tyMJfxWKn+njV{
z-6U9V2HfD}$S(>qELwMBZ+s%Efrc;iUal7hJ}nM>6kLbu9f6}jo_lTIR!49b*n1WY
z_s?)`U!UrNynh1W3iS7mKHLt0uHNOBULl-~Gu^&<1_&
z#g2;{7dtL_nztkHjU(`xtM^C^hF);PHvU-YTTKc*X
zX>*Nupfk-i<2...@+*x}yuxqny!1dv+j)`0O+kK9Pc`vA}ZiX;6$gn;7SUM^o~Q#`Ae
zs&n...@kcejmqqvkte=pu~xlg6qpr4e0k=;tax0a5lo...@rd%m?f`5wb0p+t#a0&wd
z7!HBA25##_KJXN;{...@rx}o$yft%ba&eB5k!Eu7}3W#(RRS{d5IfrFW~|gce$...@#!
z$vs2{Cqu>uPwvpxb}Q#xOcG&IWNW`zN}#
zKc2AzG>7j%(}xit0c{kv...@mui<6Q$5{59XYD5Dc$|zw!+iTgyyOU3EZ6}C*m_-X
zDpc!PWNO~wPd)NuGL;{Q14Pq5K#Z!c4}l6EKpD7l6*gGN8eRyynG-Wxtru9
zh2%WS<@}1*zPj<9^S56ymGursi~bI_!;gg;@%~o>1bag...@yg039i8*w>vJs`8J}
zoa=Y80bOol!FhtRj*Kge{utp8qE<+HHhB6BDeWJ3@+RS
zABCEa^-$NZ8z~Lzy6Px(0PkWR1Os0?4)o2cV~_2{a^UbQq-^i<<73sJEQEAn;C9p+
z3M0K|zX+0^%%$Powj)g1Nk&
zr#NNL5|f`Mfy1uJyeZ!yq6V)50r&lr`L4iw2T3wU%OS`_gx!1=WClw}+*`pG-~{D3
zK(#~>_~$`dL-5*z*TV|1AxT*KbVbkdM<9N{f00mf0S32gmn#r|FR;DwyYY*D$~qRm
z6FIsLTY0#?8ld2?q5^1;c;rj!%I6{c(6+2P*1I(Lw=XcD%oHxxK206f8^4E(>rOou
ze=Bm_>D$of!3|U^1^SA>H^DyufZ~1NM#R7BgBwsQMq6+aDhfRF7}ORuqgAI}zl75^
zKIvLMfHm31(XJ-...@v^abu{ch2vohcdpozy^;?WW=n
zdKz%TK0gDMu+Q_K4GbKlaqQ}yccc(@6aNY^CS*1d#Xr#tLiuD8*BieQSV-KBH)=+D~%DOr4c<}O2#{(DU9vCp_0y)TLbF$&CWAT%a+eTK~?$6L+
zl+l2L=RqAf=iDQApuS)ZszQAK3UwUbAv)ma$P1ICqfksR%@|0otFFkA65SEH?+G3NSkO8(9`#X(E^7RM
z2It+MqV<8NsSDGwpg6pTx}bOYPvJ1a0+U0!G(iPd;C(sPj|IQ{gjmns1Jc8!-H|82
zE0~hN6nHui??;)4EL0YJ1}X^Kq;Xkq!zo%*Y~&OTpu|!Yh&q9&B?Q*3hb{M8?Y+xq
z0u-|MqgvQ*!jWf5WEVstDE0y=BvXrz#eW00q^9%P%klUQ1m2$I$&kUeY<4yN)z$hJ
zzughgjkd0+...@gc^)NS4kXS%Y=BX6T-!101^#rfZmeC-6TjyQjJT!_6&!h;l=zSv
z...@vy!bx-xdch>2`{lTf>`e&!UA+y|1SSvR3%#5GI7DB40(PH-p9V5Drn3BSP{P#z+M
zx$Z~yT&J<%...@5t1>=...@amg}ux8%|)w)JId1C~lUrGfW*4p1axBW{0k#3RSG=Uwr<
zc=4(3_pW`p`_bU7G?Vt8{mBTMvI>Y{l`nx{Tb8+cZ`(xTu6l~54A+zICnC_iht>&A
zzy5e&pxFFVBEfVor}?E>hroQr(W33QXDF^iH$Zg?Qm!eA_b_<56j#qu=...@1zk(3bH=aiA&Vgll`&CC;|g4qfxE*G78TFHUDLNk
zfiGN7erqlYydHezBN*%25B(G8{)L{06bHU84t!A%_}DR!`lhSrY0b6xWY_zy-iv5e
z)Rv(a_cmpu6(eUW?)7J67WZC}k%M`Ak-Du5-EYFbPYtu-X5-9&f...@o*~>!0h-)%
z(u(_3OtUmvKA|ZecaY*fFjz{RPm2JWrZ8>{7k}(t...@k@rZ(A{>+$%!$wqGmY~E(u
zR9~Xu9RPo%Kh-yfKTc=)^&Qjww)Xn<9...@-`vza;Uz#Ko9fG6zoFgVRp02zp4!AV
z&SL6mdzBAQ{Fgwc_#c#WKsg6b{Qd4OPouZ1$;VbzuXb0KJMx|gf}#p...@ncr|t}i
zsaim^dnk=ml=7%-...@fw;iZ7krj1psBdd+vf<%>v!}~OuODogQ`@>{...@ymaquzu0
zyh}dnsya1vq+?0j67?#f6z...@h)fWwRyHpyo2$9rwOv=GCz-P7l?-F~-tM+0>L_lNrY>)1C;FFOl5m5BEE1~11`BnQRhn!*Pood5
z;TH_q8{9kmz0&4<~u7YO3N$Ur6t8BGo+gmGK?h$eUOU;k
zeXSiWXe?ekwE&E8psWUECB{aNx05)q{...@j??*2pn;>+^8X;^f15URJKm3hkG8)q
zJ2pEXF=BJD)_Aktk(
zFU34d)9%YiUqbp7(jZbRRumLw+pvbQAx))89cdbFmphSWBE11AU8y2r+V{$twn^6}
zBn`9DJvia%;p-ykm`OARDWo!g$2K3S3;5p=A}>YHOG(R1u{lyQ9Vs~lDfavx%XRTL
z#cem=j5_EE9zC;r2lSUhF3~$u(i|za{GPb$%r{9jcztA-X_&Ukkupb$p5J

[PATCH kvm-unit-tests 00/10] Minor cleanup and fix for 32bit test

2010-08-24 Thread Jason Wang
The following series do some minor cleanup and fix for the 32bit test.

---

Jason Wang (10):
  Do not track config.mak and kvmtrace
  Remove trailing whitespaces
  Makefile cleanup
  Correct the path in README
  Drop print.S
  Remove the duplicated rdmsr/wrmsr
  Correct the tss size
  Check whether in long mode before testing vmexit caused by cr8 access
  Do not test IA32_EFER in 32bit mode.
  Add the 32bit smp initialization code


 Makefile  |9 --
 README|   10 +--
 config-x86-common.mak |   21 +++---
 config.mak|8 --
 kvmtrace  |  Bin
 x86/access.c  |2 -
 x86/cstart.S  |  179 +++--
 x86/cstart64.S|6 +-
 x86/msr.c |   18 +
 x86/print.S   |   31 
 x86/sieve.c   |2 -
 x86/vm.c  |6 +-
 x86/vmexit.c  |   34 -
 13 files changed, 215 insertions(+), 111 deletions(-)
 delete mode 100644 config.mak
 delete mode 100755 kvmtrace
 delete mode 100644 x86/print.S

-- 
Jason Wang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 04:37 PM, Gleb Natapov wrote:

On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote:

  On 08/24/2010 02:30 PM, Gleb Natapov wrote:

x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.


So now we have an undocumented -1/0/1 return code?

Better to have an enum for this.


We already have two. First is X86EMUL_ (not enum but close) for
more or less internal emulator use. Second is EMULATE_* for users of
emulate_instruction() now you want one more enum for communication
between emulate_instruction() and x86_emulate_insn(). Lost in enums.
emulate_instruction() and x86_emulate_insn() are tightly coupled right
now should we define formal interface between them? May be comment will
be enough?


Can we reuse one or the other?  Perhaps with extensions?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 04:13:38PM +0300, Avi Kivity wrote:
>  On 08/24/2010 02:30 PM, Gleb Natapov wrote:
> >x86_emulate_insn() will return 1 if instruction can be restarted
> >without re-entering a guest.
> >
> 
> So now we have an undocumented -1/0/1 return code?
> 
> Better to have an enum for this.
> 
We already have two. First is X86EMUL_ (not enum but close) for
more or less internal emulator use. Second is EMULATE_* for users of
emulate_instruction() now you want one more enum for communication
between emulate_instruction() and x86_emulate_insn(). Lost in enums.
emulate_instruction() and x86_emulate_insn() are tightly coupled right
now should we define formal interface between them? May be comment will
be enough?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM timekeeping and TSC virtualization

2010-08-24 Thread David S. Ahern


On 08/23/10 23:47, Zachary Amsden wrote:
> I've heard the rumor that TSC is orders of magnitude faster under VMware
> than under KVM from three people now, and I thought you were part of
> that camp.
> 
> Needless to say, they are either laughably incorrect, or possess some
> great secret knowledge of how to make things under virtualization go
> faster than bare metal.
> 
> I also have a magical talking unicorn, which, btw, is invisible. 
> Extraordinary claims require extraordinary proof (the proof of my
> unicorn is too complex to fit in the margin of this e-mail, however, I
> assure you he is real).

I have put in a lot of time over the past 3 years to understand how the
'magic' of virtualization works; please don't lump me into camps until I
raise my hand as being part of one.


>> My point is that kvmclock is Red Hat's answer for the future -- RHEL6,
>> RHEL5.Y (whenever it proves reliable). What about the present?  What
>> about products based on other distributions newer than RHEL5 but
>> pre-kvmclock?
>>
> 
> It should be obvious from this patchset... PIT or TSC.
> 
> KVM did not have an in-kernel PIT implementation circa 2008, so this
> data is quite old.  It's much faster now and will continue to get faster
> as exit cost goes down and the emulation gets further optimized.

It was in-kernel pit in early 2008 (kernel git entry):

commit 7837699fa6d7adf81f26ab73a5f6897ea1ab9d6a
Author: Sheng Yang 
Date:   Mon Jan 28 05:10:22 2008 +0800

KVM: In kernel PIT model


> 
> Plus, now we have an error-free TSC.
> 
>> There are a lot of moving windows of what to use as a clock source, not
>> just per major number (RHEL4, RHEL5) but minor number (e.g., TSC
>> stability on RHEL4 -- e.g.,
>> https://bugzilla.redhat.com/show_bug.cgi?id=491154) and further
>> maintenance releases (kvmclock requiring RHEL5.5+). That is not very
>> friendly to a product making a transition to virtualization - and with
>> the same code base running bare metal or in a VM.
>>
> 
> If you have old software running on broken hardware you do not get
> hardware performance and error-free time virtualization.  With any
> vendor.  Period.

Sucks to be old *and* broken. But old with fancy new wheels, er hardware
-- like commodity x86 servers running Nehalem-based processors -- is a
different story.

> 
> With this patchset, KVM now has a much stronger guarantee: If you have
> old guest software running on broken hardware, using SMP virtual
> machines, you do not get hardware performance and error-free time
> virtualization.However, if you have new guest software, non-broken
> hardware, or can simply run UP guests instead of SMP, you can have
> hardware performance, and it is now error free.  Alternatively, you can
> sacrifice some accuracy and have hardware performance, even for SMP
> guests, if you can tolerate some minor cross-CPU TSC variation.  No
> other vendor I know of can make that guarantee.
> 
> Zach

If the processor has a stable TSC why trap it? I realize you are trying
to cover a gauntlet of hardware and guests, so maybe a nerd knob is
needed to disable.

David
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call cancelled [was: Re: KVM call agenda for August 24]

2010-08-24 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
> On 08/23/2010 05:30 PM, Chris Wright wrote:
> >Please send in any agenda items you are interested in covering.
> 
> There are quite a few important discussions on the list but I think
> they should stay on the list right now.
> 
> So it sounds like we don't have an agenda for today.

No agenda, so this week's call is cancelled.

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] Rename PCI_FUNC_MAX to PCI_FUNCTIONS_PER_DEVICES in pci.[ch]

2010-08-24 Thread Ken CC
PCI_FUNC_MAX is introduced by
6eab3de16d36c48a983366b09d0a0029a5260bc3
and
6fa84913eccec4266a27c81ae88465f6790742b9
which should be safe to rename to PCI_FUNCTIONS_PER_DEVICES.

Signed-off-by: Ken CC 
---
 hw/pci.c |4 ++--
 hw/pci.h |1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index fc4becd..3901455 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -711,7 +711,7 @@ static int pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev)
 return 0;
 }
 /* function 0 indicates single function, so function > 0 must be NULL */
-for (func = 1; func < PCI_FUNC_MAX; ++func) {
+for (func = 1; func < PCI_FUNCTIONS_PER_DEVICE; ++func) {
 if (bus->devices[PCI_DEVFN(slot, func)]) {
 error_report("PCI: %x.0 indicates single function, "
  "but %x.%x is already populated.",
@@ -750,7 +750,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 assert(devfn / PCI_FUNCTIONS_PER_DEVICE < PCIBUS_MAX_DEVICES);
 if (devfn < 0) {
 for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
-devfn += PCI_FUNC_MAX) {
+devfn += PCI_FUNCTIONS_PER_DEVICE) {
 if (!bus->devices[devfn])
 goto found;
 }
diff --git a/hw/pci.h b/hw/pci.h
index eb97b76..f6fb6d8 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -16,7 +16,6 @@ struct kvm_irq_routing_entry;
 #define PCI_DEVFN(slot, func)   slot) & 0x1f) << 3) | ((func) & 0x07))
 #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn) ((devfn) & 0x07)
-#define PCI_FUNC_MAX8
 #define PCI_FUNCTIONS_PER_DEVICE 8
 #define PCIBUS_MAX_DEVICES 32
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] Check pci slot number against PCIBUS_MAX_DEVICES in parse_pci_devfn

2010-08-24 Thread Ken CC
If pci addr provided from command line is bigger than 32,
PCIBUS_MAX_DEVICES, return error -EINVAL.

32 << 3 | 7 == 256 (PCIBUS_MAX_FUNCTIONS)
PCIBUS_MAX_FUNCTIONS = PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE

Signed-off-by: Ken CC 
---
 hw/qdev-properties.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 9219cd7..565fd08 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -1,5 +1,5 @@
 #include "net.h"
-#include "qdev.h"
+#include "pci.h"
 #include "qerror.h"
 
 void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
@@ -514,6 +514,8 @@ static int parse_pci_devfn(DeviceState *dev, Property 
*prop, const char *str)
 return -EINVAL;
 }
 }
+if (slot >= PCIBUS_MAX_DEVICES)
+return -EINVAL;
 if (str[n] != '\0')
 return -EINVAL;
 if (fn > 7)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] pci init: fail qemu if devfn exceeding the max function number supported on bus

2010-08-24 Thread Ken CC
Check if devfn < PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE

Signed-off-by: Ken CC 
---
 hw/pci.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 9234fe3..fc4becd 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -747,6 +747,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
  PCIConfigWriteFunc *config_write,
  bool is_bridge)
 {
+assert(devfn / PCI_FUNCTIONS_PER_DEVICE < PCIBUS_MAX_DEVICES);
 if (devfn < 0) {
 for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
 devfn += PCI_FUNC_MAX) {

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] PCI: define PCIBUS_MAX_DEVICES and PCI_FUNCTIONS_PER_DEVICE in pci.h

2010-08-24 Thread Ken CC
And update the max function number used in struct PCIBus{} to
PCIBUS_MAX_FUNCTIONS = PCI_FUNCTIONS_PER_DEVICE * PCIBUS_MAX_DEVICES

TODO:
according to Avi Kivity, PCIBus.devices[] should be renamed to functions[]

Signed-off-by: Ken CC 
---
 hw/pci.c |2 +-
 hw/pci.h |2 ++
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 70dbace..9234fe3 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -47,7 +47,7 @@ struct PCIBus {
 pci_hotplug_fn hotplug;
 DeviceState *hotplug_qdev;
 void *irq_opaque;
-PCIDevice *devices[256];
+PCIDevice *devices[PCIBUS_MAX_DEVICES * PCI_FUNCTIONS_PER_DEVICE];
 PCIDevice *parent_dev;
 target_phys_addr_t mem_base;
 
diff --git a/hw/pci.h b/hw/pci.h
index ccb99d0..eb97b76 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -17,6 +17,8 @@ struct kvm_irq_routing_entry;
 #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn) ((devfn) & 0x07)
 #define PCI_FUNC_MAX8
+#define PCI_FUNCTIONS_PER_DEVICE 8
+#define PCIBUS_MAX_DEVICES 32
 
 /* Class, Vendor and Device IDs from Linux's pci_ids.h */
 #include "pci_ids.h"

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Christian Borntraeger
Am 24.08.2010 14:22, schrieb Avi Kivity:
> First of all we need a virtio/s390 specification, like we have a
> virtio/pci spec.

Here is something that I started a year ago but never finished.

Christian




guest/host interface for s390/virtio devices

KVM_DEVICE_DESCRIPTOR PAGE
0816   2431
  +++++
0 | type   | num_vq | featlen| conflen|
  +++++
1 | status |  |
  ++  +-->+
2 | config|   |
  +   +   |
...   |
  |
+-+
|
|   CONFIG ARRAY
|   0816   2431
+>+++++
0 |   |
  +   +
1 |   |
  +   +
2 |   |
  +virtqueue 0+-->+
3 |   |   |
  +   +   |
4 |   |   |
  +   +   |
5 |   |   |
  +++++   |
  |///+   |
  |///+   |
  +++++   |
  |   |   |
  +   +   |
  |   |   |
  +   +   |
  |   |   |
  +virtqueue num_vq - 1   +-->+
  |   |   |
  +   +   |
  |   |   |
  +   +   |
  |   |   |
  +++++   |
 num_vq*6 | feature bits  (featlen * 2 Bytes) |   |
  + +++   |
  | | |   |
  +++ +   |
  | config space (conflen Bytes)  |   |
  +  ++   |
  |  ||
  ++++|
  |
  |
+-+
|
|   VIRTQUEUE
|  0816   2431
|>+++++
0 | interrupt token ...   |
  +   +
1 | ... set by guest  |
  +++++
2 | virtio ring address   |
  +   +
3 | ... set by host   |
  +++++
4 | number of elems |/+
  +++/+
5 |///+
  +++++



COMPLETE KVM_DEVICE_DESCRIPTOR PAGE
0816   2431
  +++++
0 | type   | num_vq | featlen| conflen|
  +++++
1 | status |  interrupt token |
  ++  +
2 | virtqueue 0(set by guest) |
  +++++
3 || virtio ring address  |
  ++  +
4 | virtqueue 0 (set by host) |
  +++++
5 || number of descr ||
  +++++
6 |///|
  +++++
7 ||   interrupt token|
  +---

Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function

2010-08-24 Thread Avi Kivity

 On 08/24/2010 04:20 PM, Gleb Natapov wrote:



+{
+   struct decode_cache *c =&ctxt->decode;
+
+   /* All REP prefixes have the same first termination condition */
+   if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0)
+   return true;

This is checked during the beginning of the instruction, not after
completion.  Why is it here?  it will just be duplicated.


SDM describes REP instruction algorithm this way:

WHILE CountReg ≠ 0
   DO
Service pending interrupts (if any);
Execute associated string instruction;
CountReg ← (CountReg – 1);
IF CountReg = 0
 THEN exit WHILE loop; FI;
IF (Repeat prefix is REPZ or REPE) and (ZF = 0)
or (Repeat prefix is REPNZ or REPNE) and (ZF = 1)
 THEN exit WHILE loop; FI;
   OD;

So CountReg is checked at the beginning and after each iteration.


The second check is meaningless (and ZF checks should be qualified with 
the actual instruction).



Practically it will save us one return to a guest and exit back
to emulator at the end of rep instruction (not a big deal).


Not even that - if we reenter to the beginning of the rep instruction 
the cpu will skip over it without exiting (unless in big real mode with 
eigs=1).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 04:11:20PM +0300, Avi Kivity wrote:
>  On 08/24/2010 02:30 PM, Gleb Natapov wrote:
> >Signed-off-by: Gleb Natapov
> >---
> >  arch/x86/kvm/emulate.c |   42 +-
> >  1 files changed, 29 insertions(+), 13 deletions(-)
> >
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index f9f8353..d34d706 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -2921,6 +2921,32 @@ done:
> > return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
> >  }
> >
> >+static bool string_inst_completed(struct x86_emulate_ctxt *ctxt)
> 
> s/inst/insn/.
> 
> >+{
> >+struct decode_cache *c =&ctxt->decode;
> >+
> >+/* All REP prefixes have the same first termination condition */
> >+if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0)
> >+return true;
> 
> This is checked during the beginning of the instruction, not after
> completion.  Why is it here?  it will just be duplicated.
> 
SDM describes REP instruction algorithm this way:

WHILE CountReg ≠ 0
  DO
   Service pending interrupts (if any);
   Execute associated string instruction;
   CountReg ← (CountReg – 1);
   IF CountReg = 0
THEN exit WHILE loop; FI;
   IF (Repeat prefix is REPZ or REPE) and (ZF = 0)
   or (Repeat prefix is REPNZ or REPNE) and (ZF = 1)
THEN exit WHILE loop; FI;
  OD;

So CountReg is checked at the beginning and after each iteration.
Practically it will save us one return to a guest and exit back
to emulator at the end of rep instruction (not a big deal).

> >+
> >+/* The second termination condition only applies for REPE
> >+ * and REPNE. Test if the repeat string operation prefix is
> >+ * REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
> >+ * corresponding termination condition according to:
> >+ *  - if REPE/REPZ and ZF = 0 then done
> >+ *  - if REPNE/REPNZ and ZF = 1 then done
> >+ */
> >+if (((c->b == 0xa6) || (c->b == 0xa7) ||
> >+ (c->b == 0xae) || (c->b == 0xaf))
> >+&&  (((c->rep_prefix == REPE_PREFIX)&&
> >+ ((ctxt->eflags&  EFLG_ZF) == 0))
> >+|| ((c->rep_prefix == REPNE_PREFIX)&&
> >+((ctxt->eflags&  EFLG_ZF) == EFLG_ZF
> >+return true;
> >+
> >+return false;
> >+}
> >+
> 
> -- 
> error compiling committee.c: too many arguments to function

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 02:30 PM, Gleb Natapov wrote:

x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.



So now we have an undocumented -1/0/1 return code?

Better to have an enum for this.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -kvm] kvm: fix regression from rework KVM mmu_shrink() code

2010-08-24 Thread Marcelo Tosatti
On Tue, Aug 24, 2010 at 10:31:07AM +0800, Xiaotian Feng wrote:
> Latest kvm mmu_shrink code rework makes kernel changes 
> kvm->arch.n_used_mmu_pages/
> kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is 
> called
> by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
> kvm_mmu_available_pages(vcpu->kvm) is unchanged after 
> kvm_mmu_prepare_zap_page(),
> This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever.
> Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.
> 
> Reported-by: Avi Kivity 
> Signed-off-by: Xiaotian Feng 
> Tested-by: Avi Kivity 
> Cc: Marcelo Tosatti 
> Cc: Dave Hansen 
> Cc: Tim Pepper 
> ---
>  arch/x86/kvm/mmu.c |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function

2010-08-24 Thread Avi Kivity

 On 08/24/2010 02:30 PM, Gleb Natapov wrote:

Signed-off-by: Gleb Natapov
---
  arch/x86/kvm/emulate.c |   42 +-
  1 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f9f8353..d34d706 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2921,6 +2921,32 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
  }

+static bool string_inst_completed(struct x86_emulate_ctxt *ctxt)


s/inst/insn/.


+{
+   struct decode_cache *c =&ctxt->decode;
+
+   /* All REP prefixes have the same first termination condition */
+   if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0)
+   return true;


This is checked during the beginning of the instruction, not after 
completion.  Why is it here?  it will just be duplicated.



+
+   /* The second termination condition only applies for REPE
+* and REPNE. Test if the repeat string operation prefix is
+* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
+* corresponding termination condition according to:
+*  - if REPE/REPZ and ZF = 0 then done
+*  - if REPNE/REPNZ and ZF = 1 then done
+*/
+   if (((c->b == 0xa6) || (c->b == 0xa7) ||
+(c->b == 0xae) || (c->b == 0xaf))
+   &&  (((c->rep_prefix == REPE_PREFIX)&&
+((ctxt->eflags&  EFLG_ZF) == 0))
+   || ((c->rep_prefix == REPNE_PREFIX)&&
+   ((ctxt->eflags&  EFLG_ZF) == EFLG_ZF
+   return true;
+
+   return false;
+}
+


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for August 24

2010-08-24 Thread Anthony Liguori

On 08/23/2010 05:30 PM, Chris Wright wrote:

Please send in any agenda items you are interested in covering.
   


There are quite a few important discussions on the list but I think they 
should stay on the list right now.


So it sounds like we don't have an agenda for today.

Regards,

Anthony Liguori


thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having trouble with ballooning

2010-08-24 Thread Moritz Duge

Am 23.08.2010 20:16, schrieb Marcelo Tosatti:

On Thu, Aug 05, 2010 at 01:41:16PM +0200, Moritz Duge wrote:
   

Hi,
I had some trouble while using the ballooning feature of KVM (using
Ubuntu 10.04 with standard software versions).


The first scenario:
1. Having a guest started by this command: qemu -enable-kvm -m 768
-balloon virtio -cdrom linux_2.6.34.iso
The guest is running Linux 2.6.34 including the ballooning driver.
2. Entering "balloon 256" and a few seconds later "info balloon" in
the Qemu monitor. Qemu will report, the guest uses 256mb of memory
now. The guest is reporting the same (using "free" for example).
3. Entering "change ide1-cd0 linux_2.6.18.iso" to change the guests
CD-ROM to another image, containing a Linux kernel without
ballooning driver.
4. Rebooting the guest.
5. After booting the 2.6.18-OS, it will report it has 768mb memory
(using "free). But Qemu monitor will still tell 256, when entering
"info memory".
I know why this happens. But is this a good behaviour? Shouldn't
Qemu tell something like "maybe 256, but there is no more balloon
driver in the guest and maybe it uses the full 768 now"???
 

What version of qemu-kvm are you using? Reporting should work
properly with a recent qemu-kvm version.
   

I tested it on 2 systems:
- Ubuntu 10.04 and it's native version of Qemu+KVM (it's a 0.12.3)

- Debian 5.0.5 on which I compiled a 2.6.34 kernel, it's KVM module
  and Qemu 0.12.4 from sources.

On both systems, the command "info balloon" outputs the value I set,
when the ballooning has been still active. And it continues to report
this value, also if the guest has no more ballooning driver.
It also continues to report this value, when the guest starts to use
more, then the ballooned down memory.
(If you have a look at scenario one: If the guests starts to use more
then 256mb, up to 768, "info balloon" still reports 256)

For testing I actually booted a Debian 5.0.5 guest, ballooned down
the memory to 256, inserted a Kubuntu 6.06 cdrom-iso and rebooted
the guest from that cdrom-iso. Kubuntu 6.06 comes with a 2.6.18
kernel, so it has no virtio support at all.
After that, I created a tmpfs-mount in the Kubuntu guest and filled
it using "dd if=/dev/hdc of=/tmpfs-mount/foobar".



The second scenario:
After the first scenario, the guest can also really start using the
additional 512mb of memory (768 - 256)!!! I think this shouldn't
happen or at least there should be an option to allow or deny this.
Or at least least least this should be printed in big letters in the
man-pages or somewhere else where everyone will read it!
Because before I experienced this, I assumed I can be sure the guest
can't get back the memory which was freed using ballooning. So if I
use the memory freed by ballooning for some other qemu-instances and
the first one starts to use those memory again, all Qemu instances
will crash (this is what actually happens in most cases).
What I'm asking for, is a way to force the guest to stay in the
memory I assigned by ballooning. And if the guest tries to use more
memory (maybe because it just unloaded the ballooning driver) the
guest should crash, but the host shouldn't get in any trouble!!!
This can be really annoying. I think a very common use-case for
virtualization is, to run untrusted software or unsecure webservices
in a vm, so the bad software can't do anything to the host or other
VMs on the host. But when using ballooning, the bad software can!
It's no "remote code execution", but the guest can consume a lot of
memory and cause the host or at least the other VMs on the host to
crash.
 

Ballooning requires guest cooperation. If you want to enforce memory
limits, take a look at cgroups (Documentation/cgroups/memory.txt in the
kernel source tree).

   

Wohooo! That's what I was looking for!
It's just a little sad, that I have to guess the overhead made by the
Qemu process itself, to add it to the memory limit I want to set.
A hard ballooning implemented by Qemu would look a little "cleaner" I
think. But for the beginning this is a nice way to do it, also because
it gives the possibility to use the swap if the guest ignores the
ballooning, so the guest doesn't has to crash.




The third scenario:
1. Booting a machine with a guest not having a ballooning driver.
(e.g. qemu -enable-kvm -m 768 -balloon virtio -cdrom
linux_2.6.18.iso)
2. Adjusting the memory by "balloon 512" in the qemu monitor.
3. Qemu won't report that it couldn't adjust the memory. Instead it
will wait until the guest loads a ballooning driver. Is this a good
behaviour? Shouldn't there be at least a switch in the qemu monitor
for the command "balloon". So if I use the switch when changing the
memory (e.g. "balloon -h 256"), qemu won't try to change the memory
later and it will tell me "error: no ballooning driver found".
 

You see that that guest has not ballooned down with the output
from "info balloon".

   

Thanks for reading and thanks a lot, if there will be a solution for
this, s

Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:32 PM, Alexander Graf wrote:



Perhaps we should freeze virtio/s390 development until someone feels
sufficiently motivated.

Sure, go ahead. I don't think that'll help anyone but if it makes you
feel good...


I don't maintain virtio or the virtio-s390 interface, so I can't freeze 
it, but I think we should do so.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:28 PM, Gleb Natapov wrote:

On Mon, Aug 23, 2010 at 07:17:20PM +0300, Avi Kivity wrote:

+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+   if (unlikely(vcpu->arch.apf_memslot_ver !=
+vcpu->kvm->memslot_version)) {
+   u64 gpa = vcpu->arch.apf_msr_val&   ~0x3f;
+   unsigned long addr;
+   int offset = offset_in_page(gpa);
+
+   addr = gfn_to_hva(vcpu->kvm, gpa>>   PAGE_SHIFT);
+   vcpu->arch.apf_data = (u32 __user *)(addr + offset);
+   if (kvm_is_error_hva(addr)) {
+   vcpu->arch.apf_data = NULL;
+   return -EFAULT;
+   }
+   }
+
+   return put_user(val, vcpu->arch.apf_data);
+}

This nice cache needs to be outside apf to reduce complexity for
reviewers and since it is useful for others.

Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest().


Something like this? (only compile tested)


Yes, exactly.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Alexander Graf
Avi Kivity wrote:
>  On 08/24/2010 03:25 PM, Alexander Graf wrote:
>> Avi Kivity wrote:
>>>   On 08/24/2010 03:14 PM, Christian Borntraeger wrote:
 I have no strong opinion on that, but I think its more a matter of
 where
 to put an interface description. A header file seems just the right
 place.
 I will let you (or Rusty) decide.
>>> First of all we need a virtio/s390 specification, like we have a
>>> virtio/pci spec.
>> Sure, go ahead and write one :). All the bits are open.
>> No seriously, I've wanted to write one for quite a while but this is not
>> the right patch set for this.
>
> Perhaps we should freeze virtio/s390 development until someone feels
> sufficiently motivated.

Sure, go ahead. I don't think that'll help anyone but if it makes you
feel good...


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:25 PM, Alexander Graf wrote:

Avi Kivity wrote:

  On 08/24/2010 03:14 PM, Christian Borntraeger wrote:

I have no strong opinion on that, but I think its more a matter of where
to put an interface description. A header file seems just the right
place.
I will let you (or Rusty) decide.

First of all we need a virtio/s390 specification, like we have a
virtio/pci spec.

Sure, go ahead and write one :). All the bits are open.
No seriously, I've wanted to write one for quite a while but this is not
the right patch set for this.


Perhaps we should freeze virtio/s390 development until someone feels 
sufficiently motivated.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.

2010-08-24 Thread Gleb Natapov
On Mon, Aug 23, 2010 at 07:17:20PM +0300, Avi Kivity wrote:
> >
> >+static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
> >+{
> >+if (unlikely(vcpu->arch.apf_memslot_ver !=
> >+ vcpu->kvm->memslot_version)) {
> >+u64 gpa = vcpu->arch.apf_msr_val&  ~0x3f;
> >+unsigned long addr;
> >+int offset = offset_in_page(gpa);
> >+
> >+addr = gfn_to_hva(vcpu->kvm, gpa>>  PAGE_SHIFT);
> >+vcpu->arch.apf_data = (u32 __user *)(addr + offset);
> >+if (kvm_is_error_hva(addr)) {
> >+vcpu->arch.apf_data = NULL;
> >+return -EFAULT;
> >+}
> >+}
> >+
> >+return put_user(val, vcpu->arch.apf_data);
> >+}
> 
> This nice cache needs to be outside apf to reduce complexity for
> reviewers and since it is useful for others.
> 
> Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest().
> 
Something like this? (only compile tested)


diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c13cc48..9aa3dd2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -168,10 +168,18 @@ struct kvm_irq_routing_table {};
 
 struct kvm_memslots {
int nmemslots;
+   u32 generation;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
KVM_PRIVATE_MEM_SLOTS];
 };
 
+struct gfn_to_hva_cache {
+   u32 generation;
+   gpa_t gpa;
+   unsigned long hva;
+   struct kvm_memory_slot *memslot;
+};
+
 struct kvm {
spinlock_t mmu_lock;
raw_spinlock_t requests_lock;
@@ -315,12 +323,16 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, 
const void *data,
 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
unsigned long len);
+int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len);
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b78b794..512cf9b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -685,6 +685,7 @@ skip_lpage:
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
if (mem->slot >= slots->nmemslots)
slots->nmemslots = mem->slot + 1;
+   slots->generation++;
slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
 
old_memslots = kvm->memslots;
@@ -721,6 +722,7 @@ skip_lpage:
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
if (mem->slot >= slots->nmemslots)
slots->nmemslots = mem->slot + 1;
+   slots->generation++;
 
/* actual memory is freed via old in kvm_free_physmem_slot below */
if (!npages) {
@@ -1175,6 +1177,36 @@ int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const 
void *data,
return 0;
 }
 
+int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+  void *data, unsigned long len)
+{
+   int r;
+   gfn_t gfn = ghc->gpa >> PAGE_SHIFT;
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+
+   if (slots->generation != ghc->generation) {
+   int offset = offset_in_page(ghc->gpa);
+
+   ghc->hva = gfn_to_hva(kvm, gfn);
+   if (!kvm_is_error_hva(ghc->hva))
+   ghc->hva += offset;
+   ghc->generation = slots->generation;
+
+   ghc->memslot = gfn_to_memslot(kvm, gfn);
+   }
+   
+   if (kvm_is_error_hva(ghc->hva))
+   return -EFAULT;
+
+   r = copy_to_user((void __user *)ghc->hva, data, len);
+   if (r)
+   return -EFAULT;
+   mark_page_dirty_in_slot(kvm, ghc->memslot, gfn);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
 {
return kvm_write_guest_page(kvm, gfn, empty_zero_page, offset, len);
@@ -1200,11 +1232,9 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned 
long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+gfn_t gfn)
 {
-   struct kvm_memory_slot *memslot;
-
-

Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Alexander Graf
Avi Kivity wrote:
>  On 08/24/2010 03:14 PM, Christian Borntraeger wrote:
>>
>> I have no strong opinion on that, but I think its more a matter of where
>> to put an interface description. A header file seems just the right
>> place.
>> I will let you (or Rusty) decide.
>
> First of all we need a virtio/s390 specification, like we have a
> virtio/pci spec.

Sure, go ahead and write one :). All the bits are open.
No seriously, I've wanted to write one for quite a while but this is not
the right patch set for this.

> Second, I agree it should be in an exported header file, even if qemu
> doesn't make use of it.  It's an interface and should be exported.

Hrm. *shrug* if you think it makes sense. I'm reasonably indifferent
either way, but I don't see value-add in it.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:14 PM, Christian Borntraeger wrote:


I have no strong opinion on that, but I think its more a matter of where
to put an interface description. A header file seems just the right place.
I will let you (or Rusty) decide.


First of all we need a virtio/s390 specification, like we have a 
virtio/pci spec.


Second, I agree it should be in an exported header file, even if qemu 
doesn't make use of it.  It's an interface and should be exported.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:16 PM, Chen Cao wrote:

On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote:

  On 08/24/2010 03:07 PM, Isaku Yamahata wrote:

On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:

  On 08/24/2010 02:35 PM, Isaku Yamahata wrote:

Add Cc: m...@redhat.com.

MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
And the name should be start with PCI_ prefix for consistency?

Except that, the patches look okay.


These aren't slots, are they?  They are functions.

The function checks if given $slot.$fn (or $slot) is valid.
So it's slots. max 32.

+assert(devfn<   PCIBUS_MAX_DEVICES);


Looks like we're comparing a function number to PCIBUS_MAX_DEVICES.


PCIBUS_MAX_DEVICES is the size of PCIBus.devices[], I have added it in
the first patch at the defination of struct PCIBus, line 50 hw/pci.c.
so i think the better name of the macro should be PCIBUS_MAX_FN,
right?


Or make it 32 and scale it by PCI_FUNCTIONS_PER_DEVICE.

PCIBus.devices[] should be renamed to functions[] (later).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Chen Cao
On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote:
>  On 08/24/2010 03:07 PM, Isaku Yamahata wrote:
> >On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:
> >>  On 08/24/2010 02:35 PM, Isaku Yamahata wrote:
> >>>Add Cc: m...@redhat.com.
> >>>
> >>>MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
> >>>And the name should be start with PCI_ prefix for consistency?
> >>>
> >>>Except that, the patches look okay.
> >>>
> >>These aren't slots, are they?  They are functions.
> >The function checks if given $slot.$fn (or $slot) is valid.
> >So it's slots. max 32.
> 
> +assert(devfn<  PCIBUS_MAX_DEVICES);
> 
> 
> Looks like we're comparing a function number to PCIBUS_MAX_DEVICES.
> 

PCIBUS_MAX_DEVICES is the size of PCIBus.devices[], I have added it in
the first patch at the defination of struct PCIBus, line 50 hw/pci.c.
so i think the better name of the macro should be PCIBUS_MAX_FN,
right?


Ken

> -- 
> error compiling committee.c: too many arguments to function
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:24 PM, Isaku Yamahata wrote:

On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote:

  On 08/24/2010 03:07 PM, Isaku Yamahata wrote:

On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:

   On 08/24/2010 02:35 PM, Isaku Yamahata wrote:

Add Cc: m...@redhat.com.

MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
And the name should be start with PCI_ prefix for consistency?

Except that, the patches look okay.


These aren't slots, are they?  They are functions.

The function checks if given $slot.$fn (or $slot) is valid.
So it's slots. max 32.

+assert(devfn<   PCIBUS_MAX_DEVICES);


Looks like we're comparing a function number to PCIBUS_MAX_DEVICES.

Ah, you're talking about 2/3. I talked about 3/3.
You're right. The name is misleading.
PCIBUS_MAX_FUNCTIONS? Or suggestions?


Or assert(devfn / 8 < PCIBUS_MAX_DEVICES) (and change that to be 32).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Isaku Yamahata
On Tue, Aug 24, 2010 at 03:04:44PM +0300, Avi Kivity wrote:
>  On 08/24/2010 03:07 PM, Isaku Yamahata wrote:
>> On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:
>>>   On 08/24/2010 02:35 PM, Isaku Yamahata wrote:
 Add Cc: m...@redhat.com.

 MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
 And the name should be start with PCI_ prefix for consistency?

 Except that, the patches look okay.

>>> These aren't slots, are they?  They are functions.
>> The function checks if given $slot.$fn (or $slot) is valid.
>> So it's slots. max 32.
>
> +assert(devfn<  PCIBUS_MAX_DEVICES);
>
>
> Looks like we're comparing a function number to PCIBUS_MAX_DEVICES.

Ah, you're talking about 2/3. I talked about 3/3.
You're right. The name is misleading.
PCIBUS_MAX_FUNCTIONS? Or suggestions?
-- 
yamahata
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Christian Borntraeger
Am 24.08.2010 14:06, schrieb Alexander Graf:
>>>  #define VIRTIO_SUBCODE_64 0x0D00
>>> +#define VIRTIO_PARAM_MASK  0xff
>>> +#define VIRTIO_PARAM_VRING_INTERRUPT   0x0
>>> +#define VIRTIO_PARAM_CONFIG_CHANGED0x1
>>> 
>>
>> Maybe this should be exported in a header, something like 
>> arch/s390/include/asm/kvm_virtio.h? In that case this file
>> must be added to Kbuild for make headers_install.
>>   
> 
> While that thought sounds good at first, it's really no use for anyone,
> right? I mean - in qemu we need to define the defines manually anyways
> because we need to potentially be able to build the s390x target on
> non-s390x, possibly on non-Linux.

I have no strong opinion on that, but I think its more a matter of where
to put an interface description. A header file seems just the right place.
I will let you (or Rusty) decide.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] S390: take a full byte as ext_param indicator

2010-08-24 Thread Alexander Graf
Christian Borntraeger wrote:
> Am 23.08.2010 23:31, schrieb Alexander Graf:
>   
>> Currenty the ext_param field only distinguishes between "config change" and
>> "vring interrupt". We can do a lot more with it though, so let's enable a
>> full byte of possible values and constants to #defines while at it.
>> 
>
> Makes a lot of sense.
> [...]
>
>   
>>  #define VIRTIO_SUBCODE_64 0x0D00
>> +#define VIRTIO_PARAM_MASK   0xff
>> +#define VIRTIO_PARAM_VRING_INTERRUPT0x0
>> +#define VIRTIO_PARAM_CONFIG_CHANGED 0x1
>> 
>
> Maybe this should be exported in a header, something like 
> arch/s390/include/asm/kvm_virtio.h? In that case this file
> must be added to Kbuild for make headers_install.
>   

While that thought sounds good at first, it's really no use for anyone,
right? I mean - in qemu we need to define the defines manually anyways
because we need to potentially be able to build the s390x target on
non-s390x, possibly on non-Linux.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Avi Kivity

 On 08/24/2010 03:07 PM, Isaku Yamahata wrote:

On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:

  On 08/24/2010 02:35 PM, Isaku Yamahata wrote:

Add Cc: m...@redhat.com.

MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
And the name should be start with PCI_ prefix for consistency?

Except that, the patches look okay.


These aren't slots, are they?  They are functions.

The function checks if given $slot.$fn (or $slot) is valid.
So it's slots. max 32.


+assert(devfn<  PCIBUS_MAX_DEVICES);


Looks like we're comparing a function number to PCIBUS_MAX_DEVICES.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Isaku Yamahata
On Tue, Aug 24, 2010 at 02:42:18PM +0300, Avi Kivity wrote:
>  On 08/24/2010 02:35 PM, Isaku Yamahata wrote:
>> Add Cc: m...@redhat.com.
>>
>> MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
>> And the name should be start with PCI_ prefix for consistency?
>>
>> Except that, the patches look okay.
>>
>
> These aren't slots, are they?  They are functions.

The function checks if given $slot.$fn (or $slot) is valid.
So it's slots. max 32.

> There's a lot of slot/function confusion in qemu.

Agreed.
-- 
yamahata
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Avi Kivity

 On 08/24/2010 02:35 PM, Isaku Yamahata wrote:

Add Cc: m...@redhat.com.

MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
And the name should be start with PCI_ prefix for consistency?

Except that, the patches look okay.



These aren't slots, are they?  They are functions.

There's a lot of slot/function confusion in qemu.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)

2010-08-24 Thread Avi Kivity

 On 08/24/2010 02:22 PM, Brian Gerst wrote:



+dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code)
+{
+   if (!user_mode_vm(regs)&&  fixup_exception(regs))
+   return;
+   do_divide_error_user(regs, error_code);
+}
+
  #ifdef CONFIG_X86_64
  /* Runs on IST stack */
  dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code)

Kernel mode exceptions should already be handled by do_trap().  This
is unnecessary.


Correct.  Please ignore this patch.  I've verified that the other two 
work without it.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: x86 emulator: get rid of "restart" in emulation context.

2010-08-24 Thread Gleb Natapov
x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 -
 arch/x86/kvm/emulate.c |   47 ---
 arch/x86/kvm/x86.c |   16 ++--
 3 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index f22e5da..5e120a7 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -220,7 +220,6 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
-   bool restart; /* restart string instruction after writeback */
bool perm_ok; /* do not check permissions if true */
 
int exception; /* exception that happens during emulation or -1 */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d34d706..d98fc3b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -438,7 +438,6 @@ static void emulate_exception(struct x86_emulate_ctxt 
*ctxt, int vec,
ctxt->exception = vec;
ctxt->error_code = error;
ctxt->error_code_valid = valid;
-   ctxt->restart = false;
 }
 
 static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
@@ -2618,9 +2617,6 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt)
struct opcode opcode, *g_mod012, *g_mod3;
struct operand memop = { .type = OP_NONE };
 
-   /* we cannot decode insn before we complete previous rep insn */
-   WARN_ON(ctxt->restart);
-
c->eip = ctxt->eip;
c->fetch.start = c->fetch.end = c->eip;
ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
@@ -2977,10 +2973,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
}
 
if (c->rep_prefix && (c->d & String)) {
-   ctxt->restart = true;
-   /* All REP prefixes have the same first termination condition */
+   /*
+* counter == 0 termination condition is checked before
+* instruction execution.
+*/
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
-   ctxt->restart = false;
ctxt->eip = c->eip;
goto done;
}
@@ -3422,26 +3419,26 @@ writeback:
struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 
-
-   if (string_inst_completed(ctxt))
-   ctxt->restart = false;
-   /*
-* Re-enter guest when pio read ahead buffer is empty or,
-* if it is not used, after each 1024 iteration.
-*/
-   else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-(r->end != 0 && r->end == r->pos)) {
-   ctxt->restart = false;
-   c->eip = ctxt->eip;
+   if (!string_inst_completed(ctxt)) {
+   /*
+* Re-enter guest when pio read ahead buffer is empty
+* or, if it is not used, after each 1024 iteration.
+*/
+   if ((r->end != 0 || c->regs[VCPU_REGS_RCX] & 0x3ff) &&
+   (r->end == 0 || r->end != r->pos)) {
+   /*
+* Reset read cache. Usually happens before
+* decode, but since instruction is restarted
+* we have to do it here.
+*/
+   ctxt->decode.mem_read.end = 0;
+   return 1; /* restart instruction */
+   }
+   goto done; /* skip rip writeback */
}
}
-   /*
-* reset read cache here in case string instruction is restared
-* without decoding
-*/
-   ctxt->decode.mem_read.end = 0;
-   if (!ctxt->restart)
-   ctxt->eip = c->eip;
+
+   ctxt->eip = c->eip;
 
 done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e179c5..131f2c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4066,18 +4066,17 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
 
-   if (r) { /* emulation failed */
+   if (r < 0) { /* emulation failed */
if (reexecute_instruction(vcpu, cr2))
return EMULATE_DONE;
 
return handle_emulation_failure(vcpu);
}
 
-   r = EMULATE_DONE;
-
-   if (vcpu->arch.emulate_ctxt.exception >= 0)
+   if (vcpu->arch.emulate_ctxt.exception >= 0) {
   

[PATCH 2/3] KVM: x86 emulator: move string instruction completion check into separate function

2010-08-24 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   42 +-
 1 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f9f8353..d34d706 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2921,6 +2921,32 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static bool string_inst_completed(struct x86_emulate_ctxt *ctxt)
+{
+   struct decode_cache *c = &ctxt->decode;
+
+   /* All REP prefixes have the same first termination condition */
+   if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0)
+   return true;
+
+   /* The second termination condition only applies for REPE
+* and REPNE. Test if the repeat string operation prefix is
+* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
+* corresponding termination condition according to:
+*  - if REPE/REPZ and ZF = 0 then done
+*  - if REPNE/REPNZ and ZF = 1 then done
+*/
+   if (((c->b == 0xa6) || (c->b == 0xa7) ||
+(c->b == 0xae) || (c->b == 0xaf))
+   && (((c->rep_prefix == REPE_PREFIX) &&
+((ctxt->eflags & EFLG_ZF) == 0))
+   || ((c->rep_prefix == REPNE_PREFIX) &&
+   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF
+   return true;
+
+   return false;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 {
@@ -3395,19 +3421,9 @@ writeback:
if (c->rep_prefix && (c->d & String)) {
struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   /* The second termination condition only applies for REPE
-* and REPNE. Test if the repeat string operation prefix is
-* REPE/REPZ or REPNE/REPNZ and if it's the case it tests the
-* corresponding termination condition according to:
-*  - if REPE/REPZ and ZF = 0 then done
-*  - if REPNE/REPNZ and ZF = 1 then done
-*/
-   if (((c->b == 0xa6) || (c->b == 0xa7) ||
-(c->b == 0xae) || (c->b == 0xaf))
-   && (((c->rep_prefix == REPE_PREFIX) &&
-((ctxt->eflags & EFLG_ZF) == 0))
-   || ((c->rep_prefix == REPNE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF
+
+
+   if (string_inst_completed(ctxt))
ctxt->restart = false;
/*
 * Re-enter guest when pio read ahead buffer is empty or,
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: x86 emulator: Rename variable that shadows another local variable.

2010-08-24 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 017ae0c..f9f8353 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3393,7 +3393,7 @@ writeback:
&c->dst);
 
if (c->rep_prefix && (c->d & String)) {
-   struct read_cache *rc = &ctxt->decode.io_read;
+   struct read_cache *r = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
/* The second termination condition only applies for REPE
 * and REPNE. Test if the repeat string operation prefix is
@@ -3413,8 +3413,8 @@ writeback:
 * Re-enter guest when pio read ahead buffer is empty or,
 * if it is not used, after each 1024 iteration.
 */
-   else if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
-(rc->end != 0 && rc->end == rc->pos)) {
+   else if ((r->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+(r->end != 0 && r->end == r->pos)) {
ctxt->restart = false;
c->eip = ctxt->eip;
}
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/3] Check pci slot number in parse_pci_devfn

2010-08-24 Thread Isaku Yamahata
Add Cc: m...@redhat.com.

MAX_PCI_SLOTS should be in pci.h instead of qdev.h?
And the name should be start with PCI_ prefix for consistency?

Except that, the patches look okay.

thanks,

On Tue, Aug 24, 2010 at 02:49:27PM +0800, Ken CC wrote:
> Define MAX_PCI_SLOTS as 0x1f, if pci addr provided from command line
> is bigger than 0x1f, return error -EINVAL.
> 
> 0x1f << 3 | 7 == 255 (PCIBUS_MAX_DEVICES - 1)
> 
> Signed-off-by: Ken CC 
> ---
>  hw/qdev-properties.c |2 ++
>  hw/qdev.h|3 +++
>  2 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 9219cd7..96d814c 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -514,6 +514,8 @@ static int parse_pci_devfn(DeviceState *dev, Property 
> *prop, const char *str)
>  return -EINVAL;
>  }
>  }
> +if (slot > MAX_PCI_SLOTS)
> +return -EINVAL;
>  if (str[n] != '\0')
>  return -EINVAL;
>  if (fn > 7)
> diff --git a/hw/qdev.h b/hw/qdev.h
> index 678f8b7..fcfe52e 100644
> --- a/hw/qdev.h
> +++ b/hw/qdev.h
> @@ -7,6 +7,9 @@
>  #include "qemu-char.h"
>  #include "qemu-option.h"
>  
> +
> +#define MAX_PCI_SLOTS 0x1f
> +
>  typedef struct Property Property;
>  
>  typedef struct PropertyInfo PropertyInfo;
> 
> 

-- 
yamahata
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)

2010-08-24 Thread Brian Gerst
On Tue, Aug 24, 2010 at 7:10 AM, Avi Kivity  wrote:
> KVM wants to emulate the DIV and IDIV instructions by executing them natively;
> this can cause a #DE to be raised.
>
> Allow the exception handling mechanism to process #DE exceptions so KVM can
> catch and process them.
>
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kernel/traps.c |   10 +-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 725ef4d..dd313cf 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -205,7 +205,8 @@ dotraplinkage void do_##name(struct pt_regs *regs, long 
> error_code) \
>        do_trap(trapnr, signr, str, regs, error_code, &info);           \
>  }
>
> -DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip)
> +static DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error_user,
> +                    FPE_INTDIV, regs->ip)
>  DO_ERROR(4, SIGSEGV, "overflow", overflow)
>  DO_ERROR(5, SIGSEGV, "bounds", bounds)
>  DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip)
> @@ -217,6 +218,13 @@ DO_ERROR(12, SIGBUS, "stack segment", stack_segment)
>  #endif
>  DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0)
>
> +dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code)
> +{
> +       if (!user_mode_vm(regs) && fixup_exception(regs))
> +               return;
> +       do_divide_error_user(regs, error_code);
> +}
> +
>  #ifdef CONFIG_X86_64
>  /* Runs on IST stack */
>  dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code)

Kernel mode exceptions should already be handled by do_trap().  This
is unnecessary.

--
Brian Gerst
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] x86: allow kernel exception fixup for divide errors (#DE)

2010-08-24 Thread Avi Kivity
KVM wants to emulate the DIV and IDIV instructions by executing them natively;
this can cause a #DE to be raised.

Allow the exception handling mechanism to process #DE exceptions so KVM can
catch and process them.

Signed-off-by: Avi Kivity 
---
 arch/x86/kernel/traps.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 725ef4d..dd313cf 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -205,7 +205,8 @@ dotraplinkage void do_##name(struct pt_regs *regs, long 
error_code) \
do_trap(trapnr, signr, str, regs, error_code, &info);   \
 }
 
-DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip)
+static DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error_user,
+FPE_INTDIV, regs->ip)
 DO_ERROR(4, SIGSEGV, "overflow", overflow)
 DO_ERROR(5, SIGSEGV, "bounds", bounds)
 DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip)
@@ -217,6 +218,13 @@ DO_ERROR(12, SIGBUS, "stack segment", stack_segment)
 #endif
 DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0)
 
+dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code)
+{
+   if (!user_mode_vm(regs) && fixup_exception(regs))
+   return;
+   do_divide_error_user(regs, error_code);
+}
+
 #ifdef CONFIG_X86_64
 /* Runs on IST stack */
 dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: x86 emulator: trap and propagate #DE from DIV and IDIV

2010-08-24 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   17 ++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f82e43a..a7e26d0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -505,6 +505,12 @@ static void emulate_ts(struct x86_emulate_ctxt *ctxt, int 
err)
emulate_exception(ctxt, TS_VECTOR, err, true);
 }
 
+static int emulate_de(struct x86_emulate_ctxt *ctxt)
+{
+   emulate_exception(ctxt, DE_VECTOR, 0, false);
+   return X86EMUL_PROPAGATE_FAULT;
+}
+
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
  struct x86_emulate_ops *ops,
  unsigned long eip, u8 *dest)
@@ -1459,6 +1465,7 @@ static inline int emulate_grp3(struct x86_emulate_ctxt 
*ctxt,
struct decode_cache *c = &ctxt->decode;
unsigned long *rax = &c->regs[VCPU_REGS_RAX];
unsigned long *rdx = &c->regs[VCPU_REGS_RDX];
+   u8 de = 0;
 
switch (c->modrm_reg) {
case 0 ... 1:   /* test */
@@ -1477,14 +1484,18 @@ static inline int emulate_grp3(struct x86_emulate_ctxt 
*ctxt,
emulate_1op_rax_rdx("imul", c->src, *rax, *rdx, ctxt->eflags);
break;
case 6: /* div */
-   emulate_1op_rax_rdx("div", c->src, *rax, *rdx, ctxt->eflags);
+   emulate_1op_rax_rdx_ex("div", c->src, *rax, *rdx,
+  ctxt->eflags, de);
break;
case 7: /* idiv */
-   emulate_1op_rax_rdx("idiv", c->src, *rax, *rdx, ctxt->eflags);
+   emulate_1op_rax_rdx_ex("idiv", c->src, *rax, *rdx,
+  ctxt->eflags, de);
break;
default:
return X86EMUL_UNHANDLEABLE;
}
+   if (de)
+   return emulate_de(ctxt);
return X86EMUL_CONTINUE;
 }
 
@@ -3363,7 +3374,7 @@ special_insn:
break;
case 0xf6 ... 0xf7: /* Grp3 */
if (emulate_grp3(ctxt, ops) != X86EMUL_CONTINUE)
-   goto cannot_emulate;
+   goto done;
break;
case 0xf8: /* clc */
ctxt->eflags &= ~EFLG_CF;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Trap and propagate divide errors when emulating DIV

2010-08-24 Thread Avi Kivity
KVM recently started emulating DIV and IDIV.  However, those instructions
trap when given the right operands.  Since figuring out when to trap or not
is difficult, we just execute the instruction and see if the processor trapped
or not.

tip: please queue the first patch on fast-forward-only branch kvm.git can
 merge, or we can carry the patch in kvm.git with your ack.

Avi Kivity (3):
  x86: allow kernel exception fixup for divide errors (#DE)
  KVM: x86 emulator: add macros for executing instructions that may
trap
  KVM: x86 emulator: trap and propagate #DE from DIV and IDIV

 arch/x86/kernel/traps.c |   10 +++-
 arch/x86/kvm/emulate.c  |   60 --
 2 files changed, 66 insertions(+), 4 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: x86 emulator: add macros for executing instructions that may trap

2010-08-24 Thread Avi Kivity
Like DIV and IDIV.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 808934c..f82e43a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -331,6 +331,27 @@ struct group_dual {
  "a" (_rax), "d" (_rdx));  \
} while (0)
 
+#define __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, _eflags, _suffix, _ex) 
\
+   do {\
+   unsigned long _tmp; \
+   \
+   __asm__ __volatile__ (  \
+   _PRE_EFLAGS("0", "5", "1")  \
+   "1: \n\t"   \
+   _op _suffix " %6; " \
+   "2: \n\t"   \
+   _POST_EFLAGS("0", "5", "1") \
+   ".pushsection .fixup,\"ax\" \n\t"   \
+   "3: movb $1, %4 \n\t"   \
+   "jmp 2b \n\t"   \
+   ".popsection \n\t"  \
+   _ASM_EXTABLE(1b, 3b)\
+   : "=m" (_eflags), "=&r" (_tmp), \
+ "+a" (_rax), "+d" (_rdx), "+qm"(_ex)  \
+   : "i" (EFLAGS_MASK), "m" ((_src).val),  \
+ "a" (_rax), "d" (_rdx));  \
+   } while (0)
+
 /* instruction has only one source operand, destination is implicit (e.g. mul, 
div, imul, idiv) */
 #define emulate_1op_rax_rdx(_op, _src, _rax, _rdx, _eflags)
\
do {
\
@@ -342,6 +363,28 @@ struct group_dual {
}   \
} while (0)
 
+#define emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, _eflags, _ex)\
+   do {\
+   switch((_src).bytes) {  \
+   case 1: \
+   __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \
+_eflags, "b", _ex);\
+   break;  \
+   case 2: \
+   __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \
+_eflags, "w", _ex);\
+   break;  \
+   case 4: \
+   __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \
+_eflags, "l", _ex);\
+   break;  \
+   case 8: ON64(   \
+   __emulate_1op_rax_rdx_ex(_op, _src, _rax, _rdx, \
+_eflags, "q", _ex));   \
+   break;  \
+   }   \
+   } while (0)
+
 /* Fetch next part of the instruction being emulated. */
 #define insn_fetch(_type, _size, _eip)  \
 ({ unsigned long _x;   \
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 2/3] Allow emulation tests to trap exceptions

2010-08-24 Thread Avi Kivity
Some instructions trap on execution, we need a way to see if they raise an
exception as expected.

Signed-off-by: Avi Kivity 
---
 config-x86-common.mak |3 ++-
 x86/emulator.c|2 ++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 19bffd4..0ed13e4 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -45,7 +45,8 @@ $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o
 $(TEST_DIR)/smptest.flat: $(cstart.o) $(TEST_DIR)/smptest.o
  
 $(TEST_DIR)/emulator.flat: $(cstart.o) $(TEST_DIR)/emulator.o \
-  $(TEST_DIR)/vm.o $(TEST_DIR)/print.o
+  $(TEST_DIR)/vm.o $(TEST_DIR)/print.o \
+  $(TEST_DIR)/idt.o
 
 $(TEST_DIR)/port80.flat: $(cstart.o) $(TEST_DIR)/port80.o
 
diff --git a/x86/emulator.c b/x86/emulator.c
index b0d15c0..5d1659f 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1,6 +1,7 @@
 #include "ioram.h"
 #include "vm.h"
 #include "libcflat.h"
+#include "idt.h"
 
 #define memset __builtin_memset
 #define TESTDEV_IO_PORT 0xe0
@@ -582,6 +583,7 @@ int main()
unsigned long t1, t2;
 
setup_vm();
+   setup_idt();
mem = vmap(IORAM_BASE_PHYS, IORAM_LEN);
 
// test mov reg, r/m and mov r/m, reg
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 1/3] Support #DE (divide error) exception handlers

2010-08-24 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 x86/idt.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/x86/idt.c b/x86/idt.c
index 590839f..4480833 100644
--- a/x86/idt.c
+++ b/x86/idt.c
@@ -100,6 +100,11 @@ asm (".pushsection .text \n\t"
  "pushq $13 \n\t"
  "jmp handle_exception \n\t"
 
+ "de_fault: \n\t"
+ "pushq $0 \n\t"
+ "pushq $0 \n\t"
+ "jmp handle_exception \n\t"
+
  "handle_exception: \n\t"
  "push %r15; push %r14; push %r13; push %r12 \n\t"
  "push %r11; push %r10; push %r9; push %r8 \n\t"
@@ -118,9 +123,10 @@ asm (".pushsection .text \n\t"
 
 void setup_idt(void)
 {
-extern char ud_fault, gp_fault;
+extern char ud_fault, gp_fault, de_fault;
 
 lidt(idt, 256);
+set_idt_entry(&idt[0], &de_fault, 0);
 set_idt_entry(&idt[6], &ud_fault, 0);
 set_idt_entry(&idt[13], &gp_fault, 0);
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 0/3] DIV exception tests

2010-08-24 Thread Avi Kivity
DIV may raise an exception if dividing by zero or causing an overflow.  Test
it.

Avi Kivity (3):
  Support #DE (divide error) exception handlers
  Allow emulation tests to trap exceptions
  Add DIV tests

 config-x86-common.mak |3 ++-
 x86/emulator.c|   20 
 x86/idt.c |8 +++-
 3 files changed, 29 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/3] Add DIV tests

2010-08-24 Thread Avi Kivity
Test divide-by-zero and normal cases.

Signed-off-by: Avi Kivity 
---
 x86/emulator.c |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index 5d1659f..845e7a0 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -577,6 +577,23 @@ static void test_imul(ulong *mem)
 report("imul rax, mem, imm", a == 0x1D950BDE1D950BC8L);
 }
 
+static void test_div(long *mem)
+{
+long a, d;
+u8 ex = 1;
+
+*mem = 0; a = 1; d = 2;
+asm (ASM_TRY("1f") "divq %3; movb $0, %2; 1:"
+: "+a"(a), "+d"(d), "+q"(ex) : "m"(*mem));
+report("divq (fault)", a == 1 && d == 2 && ex);
+
+*mem = 987654321098765UL; a = 123456789012345UL; d = 123456789012345UL;
+asm (ASM_TRY("1f") "divq %3; movb $0, %2; 1:"
+: "+a"(a), "+d"(d), "+q"(ex) : "m"(*mem));
+report("divq (1)",
+  a == 0x1ffb1b963b33ul && d == 0x273ba4384ede2ul && !ex);
+}
+
 int main()
 {
void *mem;
@@ -614,6 +631,7 @@ int main()
test_btc(mem);
test_bsfbsr(mem);
test_imul(mem);
+   test_div(mem);
 
printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
return fails ? 1 : 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: confusion about kvm_mmu_pte_write

2010-08-24 Thread Avi Kivity

 On 08/24/2010 11:16 AM, cs-jerry** wrote:

hello all,
i am reading kvm code and confused about the mmu_pte_write process,
when update a gpte, function kvm_mmu_pte_write is called , and walk the shadow 
page for it.why condition is :
if (sp->gfn != gfn || sp->role.direct || sp->role.invalid)
continue;
and not update those sp->gfn == gfn&&  sp->role.direct,what does it mean when 
role.direct is clear?
thanks.N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¤¾h§¶›¡Ü¨}©ž²Æ 
zÚ&j:+v‰¨¾«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹®w¥¢¸?™¨è­Ú&¢)ߢfl===


Please fix you mail client not to send corrupted emails. Also, read 
Documentation/kvm/mmu.txt.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4 v2] kvm: rework KVM mmu_shrink() code

2010-08-24 Thread Xiaotian Feng
On Tue, Aug 24, 2010 at 10:07 AM, Marcelo Tosatti  wrote:
> On Mon, Aug 23, 2010 at 07:11:11PM +0800, Xiaotian Feng wrote:
>> On Mon, Aug 23, 2010 at 6:27 PM, Avi Kivity  wrote:
>> >  On 08/23/2010 01:22 PM, Avi Kivity wrote:
>> >>
>> >>
>> >> I see a lot of soft lockups with this patchset:
>> >
>> > This is running the emulator.flat test case, with shadow paging.  This test
>> > triggers a lot (millions) of mmu mode switches.
>> >
>>
>> Does following patch fix your issue?
>>
>> Latest kvm mmu_shrink code rework makes kernel changes
>> kvm->arch.n_used_mmu_pages/
>> kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page,
>> which is called
>> by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
>> kvm_mmu_available_pages(vcpu->kvm) is unchanged after 
>> kvm_mmu_commit_zap_page(),
>> This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages looping 
>> forever.
>> Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.
>>
>> ---
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index f52a965..7e09a21 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1726,8 +1726,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm,
>> unsigned int goal_nr_mmu_pages)
>>                                             struct kvm_mmu_page, link);
>>                         kvm_mmu_prepare_zap_page(kvm, page,
>>                                                                
>> &invalid_list);
>> +                       kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>                 }
>> -               kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>                 goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
>>         }
>>
>> @@ -2976,9 +2976,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
>>                 sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
>>                                   struct kvm_mmu_page, link);
>>                 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
>> +               kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>>                 ++vcpu->kvm->stat.mmu_recycled;
>>         }
>> -       kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>>  }
>>
>>  int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
>
> Please resend with a signed-off-by, and proper subject for the patch.

It's available at: https://patchwork.kernel.org/patch/125431/

Thanks
Xiaotian

>
> Thanks
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 10/12] Handle async PF in non preemptable context

2010-08-24 Thread Avi Kivity

 On 08/24/2010 12:36 PM, Gleb Natapov wrote:

On Tue, Aug 24, 2010 at 12:30:25PM +0300, Avi Kivity wrote:

  On 07/19/2010 06:31 PM, Gleb Natapov wrote:

If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.

Acked-by: Rik van Riel
Signed-off-by: Gleb Natapov
---
  arch/x86/kernel/kvm.c |   36 
  1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a6db92e..914b0fc 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
  #include
  #include
  #include
+#include

  #define MMU_QUEUE_SIZE 1024

@@ -68,6 +69,8 @@ struct kvm_task_sleep_node {
wait_queue_head_t wq;
u32 token;
int cpu;
+   bool halted;
+   struct mm_struct *mm;
  };

  static struct kvm_task_sleep_head {
@@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 token)
struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
struct kvm_task_sleep_node n, *e;
DEFINE_WAIT(wait);
+   int cpu, idle;
+
+   cpu = get_cpu();
+   idle = idle_cpu(cpu);
+   put_cpu();

spin_lock(&b->lock);
e = _find_apf_task(b, token);
@@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 
token)

n.token = token;
n.cpu = smp_processor_id();
+   n.mm = current->active_mm;
+   n.halted = idle || preempt_count()>   1;
+   atomic_inc(&n.mm->mm_count);
init_waitqueue_head(&n.wq);
hlist_add_head(&n.link,&b->list);
spin_unlock(&b->lock);

for (;;) {
-   prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
+   if (!n.halted)
+   prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
if (hlist_unhashed(&n.link))
break;
-   schedule();
+
+   if (!n.halted) {
+   schedule();
+   } else {
+   /*
+* We cannot reschedule. So halt.
+*/

If we get the wakeup here, we'll halt and never wake up again.


We will not. IRQs are disabled here. native_safe_halt() enables them.


Ok.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 09/12] Retry fault before vmentry

2010-08-24 Thread Avi Kivity

 On 08/24/2010 12:33 PM, Gleb Natapov wrote:




@@ -505,6 +506,37 @@ out_unlock:
return 0;
  }

+static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3,
+  gva_t addr, u32 error_code)
+{
+   int r = 0;
+   gpa_t curr_cr3 = vcpu->arch.cr3;
+
+   if (curr_cr3 != cr3) {
+   /*
+* We do page fault on behalf of a process that is sleeping
+* because of async PF. PV guest takes reference to mm that cr3
+* belongs too, so it has to be valid here.
+*/
+   kvm_set_cr3(vcpu, cr3);
+   if (kvm_mmu_reload(vcpu))
+   goto switch_cr3;
+   }

With nested virtualization, we need to switch cr0, cr4, and efer as well...


On SVM or VMX or both?


Both.  Let's defer this patch since it's an optimization, this is really 
complicated.



+
+   r = FNAME(page_fault)(vcpu, addr, error_code, true);
+
+   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
+   kvm_mmu_sync_roots(vcpu);

Why is this needed?


http://www.mail-archive.com/kvm@vger.kernel.org/msg37827.html

  KVM_REQ_MMU_SYNC request generated here must be processed before
  switching to a different cr3 (otherwise vcpu_enter_guest will process it
  with the wrong cr3 in place).


Ah, it should be part of the cr3 switch block above.


+
+switch_cr3:
+   if (curr_cr3 != vcpu->arch.cr3) {
+   kvm_set_cr3(vcpu, curr_cr3);
+   kvm_mmu_reload(vcpu);
+   }
+
+   return r;
+}

This has the nasty effect of flushing the TLB on AMD.


What is more expansive reenter the guest and handle one more fault, or
flash TLB here?


No idea.  Probably the reentry.  On Intel the tlb is flushed anyway.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 12/12] Send async PF when guest is not in userspace too.

2010-08-24 Thread Avi Kivity

 On 07/19/2010 06:31 PM, Gleb Natapov wrote:

If guest indicates that it can handle async pf in kernel mode too send
it, but only if interrupt are enabled.

Reviewed-by: Rik van Riel
Signed-off-by: Gleb Natapov
---
  arch/x86/kvm/mmu.c |8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 12d1a7b..ed87b1c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2361,7 +2361,13 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
if (!vcpu->arch.apf_data || kvm_event_needs_reinjection(vcpu))
return false;

-   return !!kvm_x86_ops->get_cpl(vcpu);
+   if (vcpu->arch.apf_send_user_only)
+   return !!kvm_x86_ops->get_cpl(vcpu);


cpl is not a bool.  Compare it with 0.


+
+   if (!kvm_x86_ops->interrupt_allowed(vcpu))
+   return false;
+
+   return true;
  }


Should have commented before, but get_cpl() is not accurate when doing 
nested virtualization.  When L1 intercepts page faults, being in L2 is 
equivalent to CPL 3.  But we need to get the apf information to L1 somehow.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 10/12] Handle async PF in non preemptable context

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 12:30:25PM +0300, Avi Kivity wrote:
>  On 07/19/2010 06:31 PM, Gleb Natapov wrote:
> >If async page fault is received by idle task or when preemp_count is
> >not zero guest cannot reschedule, so do sti; hlt and wait for page to be
> >ready. vcpu can still process interrupts while it waits for the page to
> >be ready.
> >
> >Acked-by: Rik van Riel
> >Signed-off-by: Gleb Natapov
> >---
> >  arch/x86/kernel/kvm.c |   36 
> >  1 files changed, 32 insertions(+), 4 deletions(-)
> >
> >diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >index a6db92e..914b0fc 100644
> >--- a/arch/x86/kernel/kvm.c
> >+++ b/arch/x86/kernel/kvm.c
> >@@ -37,6 +37,7 @@
> >  #include
> >  #include
> >  #include
> >+#include
> >
> >  #define MMU_QUEUE_SIZE 1024
> >
> >@@ -68,6 +69,8 @@ struct kvm_task_sleep_node {
> > wait_queue_head_t wq;
> > u32 token;
> > int cpu;
> >+bool halted;
> >+struct mm_struct *mm;
> >  };
> >
> >  static struct kvm_task_sleep_head {
> >@@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 
> >token)
> > struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
> > struct kvm_task_sleep_node n, *e;
> > DEFINE_WAIT(wait);
> >+int cpu, idle;
> >+
> >+cpu = get_cpu();
> >+idle = idle_cpu(cpu);
> >+put_cpu();
> >
> > spin_lock(&b->lock);
> > e = _find_apf_task(b, token);
> >@@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 
> >token)
> >
> > n.token = token;
> > n.cpu = smp_processor_id();
> >+n.mm = current->active_mm;
> >+n.halted = idle || preempt_count()>  1;
> >+atomic_inc(&n.mm->mm_count);
> > init_waitqueue_head(&n.wq);
> > hlist_add_head(&n.link,&b->list);
> > spin_unlock(&b->lock);
> >
> > for (;;) {
> >-prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
> >+if (!n.halted)
> >+prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
> > if (hlist_unhashed(&n.link))
> > break;
> >-schedule();
> >+
> >+if (!n.halted) {
> >+schedule();
> >+} else {
> >+/*
> >+ * We cannot reschedule. So halt.
> >+ */
> 
> If we get the wakeup here, we'll halt and never wake up again.
> 
We will not. IRQs are disabled here. native_safe_halt() enables them.

> >+native_safe_halt();
> >+local_irq_disable();
> 
> So we need a local_irq_disable() before the hlish_unhashed() check.
We are still in exception handler, so IRQ should be off.

> 
> >+}
> > }
> >-finish_wait(&n.wq,&wait);
> >+if (!n.halted)
> >+finish_wait(&n.wq,&wait);
> >
> > return;
> >  }
> >@@ -127,7 +149,12 @@ static void apf_task_wait(struct task_struct *tsk, u32 
> >token)
> >  static void apf_task_wake_one(struct kvm_task_sleep_node *n)
> >  {
> > hlist_del_init(&n->link);
> >-if (waitqueue_active(&n->wq))
> >+if (!n->mm)
> >+return;
> >+mmdrop(n->mm);
> >+if (n->halted)
> >+smp_send_reschedule(n->cpu);
> >+else if (waitqueue_active(&n->wq))
> > wake_up(&n->wq);
> >  }
> >
> >@@ -157,6 +184,7 @@ again:
> > }
> > n->token = token;
> > n->cpu = smp_processor_id();
> >+n->mm = NULL;
> > init_waitqueue_head(&n->wq);
> > hlist_add_head(&n->link,&b->list);
> > } else
> 
> 
> -- 
> error compiling committee.c: too many arguments to function

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 09/12] Retry fault before vmentry

2010-08-24 Thread Gleb Natapov
On Tue, Aug 24, 2010 at 12:25:33PM +0300, Avi Kivity wrote:
>  On 07/19/2010 06:30 PM, Gleb Natapov wrote:
> >When page is swapped in it is mapped into guest memory only after guest
> >tries to access it again and generate another fault. To save this fault
> >we can map it immediately since we know that guest is going to access
> >the page.
> >
> >
> >
> >-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
> >-u32 error_code)
> >+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> >+  bool sync)
> 
> 'sync' means something else in the shadow mmu.  Please rename to
> something longer, maybe 'apf_completion'.
> 
> Alternatively, split to two functions, a base function that doesn't
> do apf and a wrapper that handles apf.
> 
Will rename to something else.

> >@@ -505,6 +506,37 @@ out_unlock:
> > return 0;
> >  }
> >
> >+static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3,
> >+   gva_t addr, u32 error_code)
> >+{
> >+int r = 0;
> >+gpa_t curr_cr3 = vcpu->arch.cr3;
> >+
> >+if (curr_cr3 != cr3) {
> >+/*
> >+ * We do page fault on behalf of a process that is sleeping
> >+ * because of async PF. PV guest takes reference to mm that cr3
> >+ * belongs too, so it has to be valid here.
> >+ */
> >+kvm_set_cr3(vcpu, cr3);
> >+if (kvm_mmu_reload(vcpu))
> >+goto switch_cr3;
> >+}
> 
> With nested virtualization, we need to switch cr0, cr4, and efer as well...
> 
On SVM or VMX or both?

> >+
> >+r = FNAME(page_fault)(vcpu, addr, error_code, true);
> >+
> >+if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
> >+kvm_mmu_sync_roots(vcpu);
> 
> Why is this needed?
> 
http://www.mail-archive.com/kvm@vger.kernel.org/msg37827.html

 KVM_REQ_MMU_SYNC request generated here must be processed before
 switching to a different cr3 (otherwise vcpu_enter_guest will process it 
 with the wrong cr3 in place).


> >+
> >+switch_cr3:
> >+if (curr_cr3 != vcpu->arch.cr3) {
> >+kvm_set_cr3(vcpu, curr_cr3);
> >+kvm_mmu_reload(vcpu);
> >+}
> >+
> >+return r;
> >+}
> 
> This has the nasty effect of flushing the TLB on AMD.
> 
What is more expansive reenter the guest and handle one more fault, or
flash TLB here?

> >+
> >  static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
> >  {
> > struct kvm_shadow_walk_iterator iterator;
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index 2603cc4..5482db0 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -5743,6 +5743,15 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned 
> >long rflags)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_set_rflags);
> >
> >+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
> >+   struct kvm_async_pf *work)
> >+{
> >+if (!vcpu->arch.mmu.page_fault_other_cr3 || is_error_page(work->page))
> >+return;
> >+vcpu->arch.mmu.page_fault_other_cr3(vcpu, work->arch.cr3, work->gva,
> >+work->arch.error_code);
> >+}
> >+
> >  static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
> >  {
> > if (unlikely(vcpu->arch.apf_memslot_ver !=
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index f56e8ac..de1d5b6 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -1348,6 +1348,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu 
> >*vcpu)
> > spin_lock(&vcpu->async_pf_lock);
> > list_del(&work->link);
> > spin_unlock(&vcpu->async_pf_lock);
> >+kvm_arch_async_page_ready(vcpu, work);
> > put_page(work->page);
> > async_pf_work_free(work);
> > list_del(&work->queue);
> >@@ -1366,6 +1367,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu 
> >*vcpu)
> > list_del(&work->queue);
> > vcpu->async_pf_queued--;
> >
> >+kvm_arch_async_page_ready(vcpu, work);
> > kvm_arch_inject_async_page_present(vcpu, work);
> >
> > put_page(work->page);
> 
> 
> -- 
> error compiling committee.c: too many arguments to function

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 11/12] Let host know whether the guest can handle async PF in non-userspace context.

2010-08-24 Thread Avi Kivity

 On 07/19/2010 06:31 PM, Gleb Natapov wrote:

If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.

Acked-by: Rik van Riel
Signed-off-by: Gleb Natapov
---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/include/asm/kvm_para.h |1 +
  arch/x86/kernel/kvm.c   |3 +++
  arch/x86/kvm/x86.c  |5 +++--
  4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 45e6c12..c675d5d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -367,6 +367,7 @@ struct kvm_vcpu_arch {
cpumask_var_t wbinvd_dirty_mask;

u32 __user *apf_data;
+   bool apf_send_user_only;
u32 apf_memslot_ver;
u64 apf_msr_val;
u32 async_pf_id;


Lots of apf stuff in here.  Make it apg.data etc.?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 10/12] Handle async PF in non preemptable context

2010-08-24 Thread Avi Kivity

 On 07/19/2010 06:31 PM, Gleb Natapov wrote:

If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.

Acked-by: Rik van Riel
Signed-off-by: Gleb Natapov
---
  arch/x86/kernel/kvm.c |   36 
  1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a6db92e..914b0fc 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
  #include
  #include
  #include
+#include

  #define MMU_QUEUE_SIZE 1024

@@ -68,6 +69,8 @@ struct kvm_task_sleep_node {
wait_queue_head_t wq;
u32 token;
int cpu;
+   bool halted;
+   struct mm_struct *mm;
  };

  static struct kvm_task_sleep_head {
@@ -96,6 +99,11 @@ static void apf_task_wait(struct task_struct *tsk, u32 token)
struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
struct kvm_task_sleep_node n, *e;
DEFINE_WAIT(wait);
+   int cpu, idle;
+
+   cpu = get_cpu();
+   idle = idle_cpu(cpu);
+   put_cpu();

spin_lock(&b->lock);
e = _find_apf_task(b, token);
@@ -109,17 +117,31 @@ static void apf_task_wait(struct task_struct *tsk, u32 
token)

n.token = token;
n.cpu = smp_processor_id();
+   n.mm = current->active_mm;
+   n.halted = idle || preempt_count()>  1;
+   atomic_inc(&n.mm->mm_count);
init_waitqueue_head(&n.wq);
hlist_add_head(&n.link,&b->list);
spin_unlock(&b->lock);

for (;;) {
-   prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
+   if (!n.halted)
+   prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
if (hlist_unhashed(&n.link))
break;
-   schedule();
+
+   if (!n.halted) {
+   schedule();
+   } else {
+   /*
+* We cannot reschedule. So halt.
+*/


If we get the wakeup here, we'll halt and never wake up again.


+   native_safe_halt();
+   local_irq_disable();


So we need a local_irq_disable() before the hlish_unhashed() check.


+   }
}
-   finish_wait(&n.wq,&wait);
+   if (!n.halted)
+   finish_wait(&n.wq,&wait);

return;
  }
@@ -127,7 +149,12 @@ static void apf_task_wait(struct task_struct *tsk, u32 
token)
  static void apf_task_wake_one(struct kvm_task_sleep_node *n)
  {
hlist_del_init(&n->link);
-   if (waitqueue_active(&n->wq))
+   if (!n->mm)
+   return;
+   mmdrop(n->mm);
+   if (n->halted)
+   smp_send_reschedule(n->cpu);
+   else if (waitqueue_active(&n->wq))
wake_up(&n->wq);
  }

@@ -157,6 +184,7 @@ again:
}
n->token = token;
n->cpu = smp_processor_id();
+   n->mm = NULL;
init_waitqueue_head(&n->wq);
hlist_add_head(&n->link,&b->list);
} else



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 09/12] Retry fault before vmentry

2010-08-24 Thread Avi Kivity

 On 07/19/2010 06:30 PM, Gleb Natapov wrote:

When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page.



-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
-   u32 error_code)
+static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
+ bool sync)


'sync' means something else in the shadow mmu.  Please rename to 
something longer, maybe 'apf_completion'.


Alternatively, split to two functions, a base function that doesn't do 
apf and a wrapper that handles apf.



@@ -505,6 +506,37 @@ out_unlock:
return 0;
  }

+static int FNAME(page_fault_other_cr3)(struct kvm_vcpu *vcpu, gpa_t cr3,
+  gva_t addr, u32 error_code)
+{
+   int r = 0;
+   gpa_t curr_cr3 = vcpu->arch.cr3;
+
+   if (curr_cr3 != cr3) {
+   /*
+* We do page fault on behalf of a process that is sleeping
+* because of async PF. PV guest takes reference to mm that cr3
+* belongs too, so it has to be valid here.
+*/
+   kvm_set_cr3(vcpu, cr3);
+   if (kvm_mmu_reload(vcpu))
+   goto switch_cr3;
+   }


With nested virtualization, we need to switch cr0, cr4, and efer as well...


+
+   r = FNAME(page_fault)(vcpu, addr, error_code, true);
+
+   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
+   kvm_mmu_sync_roots(vcpu);


Why is this needed?


+
+switch_cr3:
+   if (curr_cr3 != vcpu->arch.cr3) {
+   kvm_set_cr3(vcpu, curr_cr3);
+   kvm_mmu_reload(vcpu);
+   }
+
+   return r;
+}


This has the nasty effect of flushing the TLB on AMD.


+
  static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
  {
struct kvm_shadow_walk_iterator iterator;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2603cc4..5482db0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5743,6 +5743,15 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long 
rflags)
  }
  EXPORT_SYMBOL_GPL(kvm_set_rflags);

+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+  struct kvm_async_pf *work)
+{
+   if (!vcpu->arch.mmu.page_fault_other_cr3 || is_error_page(work->page))
+   return;
+   vcpu->arch.mmu.page_fault_other_cr3(vcpu, work->arch.cr3, work->gva,
+   work->arch.error_code);
+}
+
  static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
  {
if (unlikely(vcpu->arch.apf_memslot_ver !=
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f56e8ac..de1d5b6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1348,6 +1348,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->async_pf_lock);
list_del(&work->link);
spin_unlock(&vcpu->async_pf_lock);
+   kvm_arch_async_page_ready(vcpu, work);
put_page(work->page);
async_pf_work_free(work);
list_del(&work->queue);
@@ -1366,6 +1367,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
list_del(&work->queue);
vcpu->async_pf_queued--;

+   kvm_arch_async_page_ready(vcpu, work);
kvm_arch_inject_async_page_present(vcpu, work);

put_page(work->page);



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 08/12] Inject asynchronous page fault into a guest if page is swapped out.

2010-08-24 Thread Avi Kivity

 On 08/24/2010 10:52 AM, Gleb Natapov wrote:

This nice cache needs to be outside apf to reduce complexity for
reviewers and since it is useful for others.

Would be good to have memslot-cached kvm_put_guest() and kvm_get_guest().

Will look into it.


In the meantime, you can just drop the caching.



+  struct kvm_arch_async_pf *arch)
+{
+   struct kvm_async_pf *work;
+
+   if (vcpu->async_pf_queued>= ASYNC_PF_PER_VCPU)
+   return 0;

100 == too high.  At 16 vcpus, this allows 1600 kernel threads to
wait for I/O.

Number of kernel threads are limited by other means. Slow work subsystem
has its own knobs to tune that. Here we limit how much slow work items
can be queued per vcpu.


OK.


Would have been best if we could ask for a page to be paged in
asynchronously.


You mean to have core kernel facility for that? I agree it would be
nice, but much harder.


Yes, that's what I meant.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/12] Provide special async page fault handler when async PF capability is detected

2010-08-24 Thread Avi Kivity

 On 08/24/2010 10:31 AM, Gleb Natapov wrote:

+

+static void apf_task_wait(struct task_struct *tsk, u32 token)
+{
+   u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+   struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
+   struct kvm_task_sleep_node n, *e;
+   DEFINE_WAIT(wait);
+
+   spin_lock(&b->lock);
+   e = _find_apf_task(b, token);
+   if (e) {
+   /* dummy entry exist ->   wake up was delivered ahead of PF */
+   hlist_del(&e->link);
+   kfree(e);
+   spin_unlock(&b->lock);
+   return;
+   }
+
+   n.token = token;
+   n.cpu = smp_processor_id();

What's the meaning of cpu?  Won't the waiter migrate to other cpus?

Waiter cannot migrate to other cpu since it is sleeping. It may be
scheduled to run on any cpu when it will be waked.


What if you have a spurious wakeup?  Also, nothing prevents the 
scheduler from migrating the thread even if it is sleeping.  It may not 
do so now, but it might do it in the future.


Oh, it probably does now on cpu hotunplug.

Why do you need n.cpu?



+   spin_unlock(&b->lock);
+   cpu_relax();
+   goto again;
+   }

The other cpu might be waiting for us to yield.  We can fix it later
with the the pv spinlock infrastructure.


This busy wait happens only if (very small) allocation fails, so if
a guest ever hits this code path I expect it to be on his way to die
anyway.


Hm.  I don't have a good feel on how rare atomic allocation failures are 
on common workloads.


Note a kmem_cache for apfs will make failures even more rare.


Or, we can avoid the allocation.  If at most one apf can be pending
(is this true?), we can use a per-cpu variable for this dummy entry.


We can have may outstanding apfs.


But, while we're processing an apf, we can't take any more.

So we can have a buffer of one pre-allocated entry per cpu, and do 
something like:


apf:
  disable apf for this cpu
  handle apf using buffered entry
  enable interrupts
  allocate new entry
  buffer it
  enable apf for that cpu

this trades off a bigger apf disabled window for not busy looping.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >