date:20091116

Re: [ANNOUNCE] kvm-kmod-2.6.32-rc7

2009-11-16 Thread Jan Kiszka

[ fixing up the list address ]

John Wong wrote:
 When I install win7_x86 with this kvm-kmod-2.6.32-rc7,

Does win7_x86 mean the 64-bit version?

 the guest crashes with a blue screen.
 
 I can install win7_x86 with my debian-sid/2.6.31-x kernel modules.
 
 uname -a: Linux retro 2.6.31-1-amd64 #1 SMP Sat Oct 24 17:50:31 UTC 2009 x86_64 GNU/Linux

Which qemu-kvm version are you using for these tests?

Does anyone have an idea what could cause this?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-11-16 Thread Dor Laor


On 10/28/2009 08:54 AM, Michael Goldish wrote:


- Dor Laor dl...@redhat.com wrote:


On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:

Hi Michael, I am reviewing your patchset and have just a minor remark
to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish mgold...@redhat.com wrote:

This patch adds a new test that checks the timedrift introduced by
migrations. It uses the same parameters used by the timedrift test to get
the guest time. In addition, the number of migrations the test performs
is controlled by the parameter 'migration_iterations'.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
   client/tests/kvm/kvm_tests.cfg.sample              |   33 ---
   client/tests/kvm/tests/timedrift_with_migration.py |   95
   2 files changed, 115 insertions(+), 13 deletions(-)
   create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py


diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 540d0a2..618c21e 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,19 +100,26 @@ variants:
  type = linux_s3

  - timedrift:install setup
-type = timedrift
 extra_params += " -rtc-td-hack"
-# Pin the VM and host load to CPU #0
-cpu_mask = 0x1
-# Set the load and rest durations
-load_duration = 20
-rest_duration = 20
-# Fail if the drift after load is higher than 50%
-drift_threshold = 50
-# Fail if the drift after the rest period is higher than 10%
-drift_threshold_after_rest = 10
-# For now, make sure this test is executed alone
-used_cpus = 100
+variants:
+- with_load:
+type = timedrift
+# Pin the VM and host load to CPU #0
+cpu_mask = 0x1



Let's use -smp 2 always.


We can also just make -smp 2 the default for all tests. Does that sound
good?


Yes




BTW: we shouldn't run the load test in parallel with standard tests.


We already don't, because the load test has used_cpus = 100 which
forces it to run alone.


Soon I'll have 100 CPUs on my laptop :), so better change it to -1 or MAX_INT.




+# Set the load and rest durations
+load_duration = 20
+rest_duration = 20


Even the default duration seems way too brief here; is there any
reason why 20s was chosen instead of, let's say, 1800s? I am under the
impression that 20s of load won't be enough to cause any noticeable
drift...


+# Fail if the drift after load is higher than 50%
+drift_threshold = 50
+# Fail if the drift after the rest period is higher than 10%
+drift_threshold_after_rest = 10


I am also curious about those thresholds and the reasoning behind them.
Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me is the drift calculation: in the original
timedrift test, the guest drift is normalized against the host drift:


drift = 100.0 * (host_delta - guest_delta) / host_delta

while in the new drift tests, we consider only the guest drift. I
believe it is better to normalize all tests based on one drift
calculation criterion; those values should be reviewed, and at least
a certain level of agreement within our development community should
be reached.


I think we don't need to calculate a drift ratio. We should define a
threshold in seconds, let's say 2 seconds. Beyond that, there should
not be any drift.
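The two approaches under discussion can be compared side by side. The following is a minimal C sketch, not code from the autotest tree: `drift_percent()` mirrors the host-normalized formula quoted above, and `drift_within_seconds()` is a hypothetical helper for an absolute threshold such as the 2 seconds proposed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-normalized drift in percent, as in the original timedrift test:
 * drift = 100.0 * (host_delta - guest_delta) / host_delta */
static double drift_percent(double host_delta, double guest_delta)
{
    assert(host_delta > 0.0);
    return 100.0 * (host_delta - guest_delta) / host_delta;
}

/* Absolute check: pass if guest and host elapsed times differ by at
 * most threshold_sec seconds, independent of the test duration. */
static bool drift_within_seconds(double host_delta, double guest_delta,
                                 double threshold_sec)
{
    double diff = host_delta - guest_delta;

    if (diff < 0.0)
        diff = -diff;
    return diff <= threshold_sec;
}
```

With a 60-second run in which the guest loses 3 seconds, `drift_percent(60.0, 57.0)` yields 5%, while the absolute check with a 2-second threshold fails; the absolute form does not scale with how long the test ran.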


Are you talking about the timedrift with load or timedrift with
migration or reboot tests?  I was told that when running the load test
for e.g. 60 seconds, the drift should be given in % of that duration.
In the case of migration and reboot, absolute durations are used (in
seconds, no %).  Should we do that in the load test too?


Yes, but: during extreme load, we do predict that a guest *without* pv 
clock will drift and won't be able to catch up until the load stops; 
only then will it catch up. So my recommendation is to do the following:
- pvclock guest (you can check with 'cat 
/sys/devices/system/clocksource/clocksource0/current_clocksource'): don't 
allow drift during huge loads.
  Exists (and is safe) for RHEL 5.4 guests and ~2.6.29 (from 2.6.27).
- non-pv clock: run the load, stop the load, wait 5 seconds, measure time.

For both, use absolute times.
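A rough C sketch of how a test could choose between the two cases. It assumes the paravirtual clock shows up in the sysfs file above as `kvm-clock` (the name Linux KVM guests register); `read_clocksource()`, `plan_for()` and `struct drift_plan` are hypothetical names, and "measure during load" is one reading of "don't allow drift during huge loads".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the guest's current clocksource name (newline stripped).
 * Returns 0 on success, -1 if the sysfs file cannot be read. */
static int read_clocksource(char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen(CLOCKSOURCE_PATH, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Assumption: the KVM paravirtual clock registers as "kvm-clock". */
static bool is_pvclock(const char *clocksource)
{
    return strcmp(clocksource, "kvm-clock") == 0;
}

/* How to measure; both cases then compare absolute times in seconds. */
struct drift_plan {
    bool measure_during_load;  /* pvclock: no drift tolerated under load */
    unsigned settle_seconds;   /* non-pv: wait after stopping the load */
};

static struct drift_plan plan_for(const char *clocksource)
{
    struct drift_plan p;

    if (is_pvclock(clocksource)) {
        p.measure_during_load = true;
        p.settle_seconds = 0;
    } else {
        p.measure_during_load = false;
        p.settle_seconds = 5;
    }
    return p;
}
```

Inside the guest, a test would fill a buffer with `read_clocksource()` and pass the name to `plan_for()`; a failed read should be treated as "unknown" rather than silently assumed to be non-pv.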





Do we support migration to a different host? We should, especially in
this test. The destination host reading should also be used.
Apart from that, good patchset, and good thing you refactored some of
the code to shared utils.


We don't, and it would be very messy to implement with the framework
right now.  We should probably do that as some sort of server side test,
but we don't have server side tests right now, so doing it may take a
little time and effort.  I got the

problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Dietmar Maurer

Hi all,

We are testing kvm-kmod-2.6.31.6, and several people reported problems with AMD CPUs:

Nov 14 21:17:59 bigproxmox kernel: Pid: 3616, comm: kvm Not tainted 2.6.24-9-pve #1 ovz005
Nov 14 21:17:59 bigproxmox kernel: RIP: 0010:[88537906]  [88537906] :kvm_amd:svm_get_msr+0x146/0x300
...

see http://www.proxmox.com/forum/showthread.php?t=2591

Any ideas? kvm-kmod-2.6.31.5 worked without problems.

- Dietmar


Re: Virtualization Performance: Intel vs. AMD

2009-11-16 Thread Avi Kivity


On 11/15/2009 05:55 PM, Thomas Treutner wrote:

On Sunday 15 November 2009 14:05:52 Neil Aggarwal wrote:
   

I prefer AMD CPUs; they give you a better bang for the buck.
Besides that, I don't think there would be any technical
differences; they are supposed to be completely compatible.
I have seen no evidence to the contrary.
 

Isn't AMD the only one who has hardware support for nested virtualization? Or
isn't that true any longer?


No, the Core i7 has EPT, which is the Intel equivalent.


Anyway, I'm just curious, as this feature is
primarily interesting for development, IMHO.
   


No, it's primarily interesting for performance.

--
error compiling committee.c: too many arguments to function


Re: monitoring guest sidt execution

2009-11-16 Thread Avi Kivity


On 11/15/2009 05:37 PM, matteo wrote:

Hi to all,

I'm trying to intercept the guest sidt instruction execution from the
host.

I've added the bit to the control structure:


control->intercept |= (1ULL << INTERCEPT_STORE_IDTR);

Then I defined the sidt handler to manage the STORE_IDTR action:


[SVM_EXIT_IDTR_READ] = idtr_write_interception,

So, the idtr_write_interception handler calls
emulate_instruction(&svm->vcpu, kvm_run, 0, 0, 0).
Following the execution flow, I found that the emulation fails in the
x86_emulate.c source file, precisely at the if (c->d == 0)
conditional statement, but I really don't know why it happens or how to
fix it.

Could you please give me some hints with respect to this issue?



You need to fill the appropriate table entry for sidt (most likely 
group_table) and implement the opcode in the emulator.
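Why the emulator bails out at `c->d == 0` can be illustrated with a toy decode table. This is a self-contained C sketch, not KVM's emulator: the flag names, `group7_table`, `decode_group7()` and `sidt_store()` are hypothetical. sidt is opcode 0F 01 with ModRM reg field 1 (Group 7); an all-zero entry carries no decode information, which is the case the check above rejects. Implementing the opcode then means storing the 16-bit IDT limit followed by the base (8 bytes in 64-bit mode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decode flags; an entry of 0 means "not implemented". */
#define F_MODRM   (1u << 0)  /* instruction has a ModRM byte */
#define F_DSTMEM  (1u << 1)  /* destination is a memory operand */
#define F_PRIV    (1u << 2)  /* privileged instruction */

/* Group 7 (0F 01): ModRM.reg selects the instruction.
 * /0 sgdt, /1 sidt, /2 lgdt, /3 lidt, /4 smsw, /6 lmsw, /7 invlpg */
static const uint32_t group7_table[8] = {
    [0] = F_MODRM | F_DSTMEM,   /* sgdt */
    [1] = 0,                    /* sidt: missing entry, decode fails */
    [2] = F_MODRM | F_PRIV,     /* lgdt */
    [3] = F_MODRM | F_PRIV,     /* lidt */
};

/* Decode flags for 0F 01, selected by the reg field (bits 5:3) of ModRM. */
static uint32_t decode_group7(uint8_t modrm)
{
    return group7_table[(modrm >> 3) & 7];
}

/* Execute step for sidt in long mode: 2-byte limit, then 8-byte base,
 * both little-endian, into a 10-byte memory operand. */
static void sidt_store(uint8_t dst[10], uint16_t limit, uint64_t base)
{
    int i;

    assert(dst != NULL);
    dst[0] = (uint8_t)(limit & 0xff);
    dst[1] = (uint8_t)(limit >> 8);
    for (i = 0; i < 8; i++)
        dst[2 + i] = (uint8_t)(base >> (8 * i));
}
```

Filling entry 1 (for example with `F_MODRM | F_DSTMEM`, like sgdt) only gets decoding past the zero check; the execute stage still needs a case that performs the store.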



Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Avi Kivity


On 11/16/2009 11:42 AM, Dietmar Maurer wrote:

Hi all,

We are testing kvm-kmod-2.6.31.6, and several people reported problems with AMD 
cpus:

Nov 14 21:17:59 bigproxmox kernel: Pid: 3616, comm: kvm Not tainted 
2.6.24-9-pve #1 ovz005
Nov 14 21:17:59 bigproxmox kernel: RIP: 0010:[88537906]  
[88537906] :kvm_amd:svm_get_msr+0x146/0x300
...

see http://www.proxmox.com/forum/showthread.php?t=2591

Any ideas? kvm-kmod-2.6.31.5 worked without problems.

   


Nothing changed between these two versions to warrant this.

Can you post a disassembly of svm_get_msr() around the offending address?

Did you change qemu-kvm as well?


Re: git access via http to *bios.git repositories

2009-11-16 Thread Bernhard Kohl


ext Avi Kivity wrote:

On 11/14/2009 02:46 PM, Avi Kivity wrote:

On 11/13/2009 08:16 PM, Jan Kiszka wrote:

Bernhard Kohl wrote:

Hi,

there is something wrong with the new *bios.git repositories.
I need to use git via http because of a firewall.
This works well with the other repos, e.g. kvm.git.

$ git clone http://www.kernel.org/pub/scm/virt/kvm/pcbios.git
Initialized empty Git repository in /home/bernd/src/pcbios/.git/
fatal: http://www.kernel.org/pub/scm/virt/kvm/pcbios.git/info/refs not found:

did you run git update-server-info on the server?
I think that should happen automatically on kernel.org. But the required
files are missing in the bios repositories, at least in the non-public
master that is replicated into the public area. Avi, any idea?



Just like the comment says, I need to run git update-server-info.  
I'll do that and set the hooks to automatically do it from now on.




That's now done.  Please try again.


Thanks, now it works.

RE: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Dietmar Maurer

 Nothing changed between these two versions to warrant this.

Oh, sorry: the one that works is kvm-kmod-2.6.30.1

 Can you post a disassembly of svm_get_msr() around the offending
 address?

Could you please tell me how to do that?

 Did you change qemu-kvm as well?

No, same qemu-kvm version (0.11.0).

- Dietmar


Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Avi Kivity


On 11/16/2009 12:46 PM, Dietmar Maurer wrote:

Nothing changed between these two versions to warrant this.
 

Oh, sorry - the one which works is kvm-kmod-2.6.30.1

   

Can you post a disassembly of svm_get_msr() around the offending
address?
 

Please can you tell me how to do that?

   


objdump -Dr  .../kvm-amd.ko

Look up the start address of svm_get_msr (search for the name), add 
0x146 (from :kvm_amd:svm_get_msr+0x146/0x300), and list ~30 lines above 
and below that.
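The lookup is symbol start plus the offset printed in the oops. A small C sketch (the helper `parse_oops_offset()` is hypothetical, not part of any kernel tool) that extracts the offset and function size from the oops string; with the svm_get_msr start address 0x37c0 from the disassembly posted next in the thread, the target is 0x37c0 + 0x146 = 0x3906.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse the "+OFFSET/SIZE" suffix of an oops symbol such as
 * ":kvm_amd:svm_get_msr+0x146/0x300". Returns 0 on success. */
static int parse_oops_offset(const char *sym, unsigned long *off,
                             unsigned long *size)
{
    const char *plus;
    char *end;

    assert(sym != NULL && off != NULL && size != NULL);
    plus = strrchr(sym, '+');
    if (plus == NULL)
        return -1;
    *off = strtoul(plus + 1, &end, 16);   /* base 16 accepts a "0x" prefix */
    if (*end != '/')
        return -1;
    *size = strtoul(end + 1, &end, 16);
    if (*end != '\0' || *off >= *size)    /* offset must lie inside the function */
        return -1;
    return 0;
}
```

The address to search for in the `objdump -Dr` listing is then the symbol's start address plus `off`.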



RE: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Dietmar Maurer

37c0 <svm_get_msr>:
...

    387e:   66 90                 xchg   %ax,%ax
    3880:   0f 84 8a 00 00 00     je     3910 <svm_get_msr+0x150>
    3886:   66 90                 xchg   %ax,%ax
    3888:   0f 86 c2 01 00 00     jbe    3a50 <svm_get_msr+0x290>
    388e:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    3895:   48 8b 80 08 06 00 00  mov    0x608(%rax),%rax
    389c:   48 89 02              mov    %rax,(%rdx)
    389f:   90                    nop
    38a0:   31 c0                 xor    %eax,%eax
    38a2:   c3                    retq
    38a3:   0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
    38a8:   81 fe d9 01 00 00     cmp    $0x1d9,%esi
    38ae:   0f 84 7c 00 00 00     je     3930 <svm_get_msr+0x170>
    38b4:   0f 86 46 01 00 00     jbe    3a00 <svm_get_msr+0x240>
    38ba:   81 fe db 01 00 00     cmp    $0x1db,%esi
    38c0:   0f 84 ca 01 00 00     je     3a90 <svm_get_msr+0x2d0>
    38c6:   81 fe dc 01 00 00     cmp    $0x1dc,%esi
    38cc:   0f 1f 40 00           nopl   0x0(%rax)
    38d0:   75 98                 jne    386a <svm_get_msr+0xaa>
    38d2:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    38d9:   48 8b 80 80 06 00 00  mov    0x680(%rax),%rax
    38e0:   48 89 02              mov    %rax,(%rdx)
    38e3:   eb bb                 jmp    38a0 <svm_get_msr+0xe0>
    38e5:   0f 1f 00              nopl   (%rax)
    38e8:   48 83 bf 78 28 00 00  cmpq   $0x0,0x2878(%rdi)
    38ef:   00
    38f0:   0f 85 82 01 00 00     jne    3a78 <svm_get_msr+0x2b8>
    38f6:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    38fd:   48 8b 48 50           mov    0x50(%rax),%rcx
    3901:   0f 31                 rdtsc
    3903:   48 01 c8              add    %rcx,%rax

# this is svm_get_msr+0x146
    3906:   48 89 02              mov    %rax,(%rdx)
    3909:   eb 95                 jmp    38a0 <svm_get_msr+0xe0>
    390b:   0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
    3910:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    3917:   48 8b 80 00 06 00 00  mov    0x600(%rax),%rax
    391e:   48 89 02              mov    %rax,(%rdx)
    3921:   e9 7a ff ff ff        jmpq   38a0 <svm_get_msr+0xe0>
    3926:   66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
    392d:   00 00 00
    3930:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    3937:   48 8b 80 70 06 00 00  mov    0x670(%rax),%rax
    393e:   48 89 02              mov    %rax,(%rdx)
    3941:   e9 5a ff ff ff        jmpq   38a0 <svm_get_msr+0xe0>
    3946:   66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
    394d:   00 00 00
    3950:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    3957:   48 8b 80 28 06 00 00  mov    0x628(%rax),%rax
    395e:   48 89 02              mov    %rax,(%rdx)
    3961:   e9 3a ff ff ff        jmpq   38a0 <svm_get_msr+0xe0>
    3966:   66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
    396d:   00 00 00
    3970:   48 c7 02 65 00 00 01  movq   $0x165,(%rdx)
    3977:   e9 24 ff ff ff        jmpq   38a0 <svm_get_msr+0xe0>
    397c:   0f 1f 40 00           nopl   0x0(%rax)
    3980:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    3987:   48 8b 80 10 06 00 00  mov    0x610(%rax),%rax
    398e:   48 89 02              mov    %rax,(%rdx)
    3991:   e9 0a ff ff ff        jmpq   38a0 <svm_get_msr+0xe0>
    3996:   66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
    399d:   00 00 00
...


We use the Ubuntu 2.6.24 kernel 
(http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=summary)

They have a few more patches applied:

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=history;f=include/asm-x86/msr.h;h=cfe169475b5b50a448326ef3c34f50100ac83faf;hb=HEAD

Maybe those last 2 patches could cause the problem?


Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Avi Kivity


On 11/16/2009 01:17 PM, Dietmar Maurer wrote:

    38f0:   0f 85 82 01 00 00     jne    3a78 <svm_get_msr+0x2b8>
    38f6:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
    38fd:   48 8b 48 50           mov    0x50(%rax),%rcx
    3901:   0f 31                 rdtsc
    3903:   48 01 c8              add    %rcx,%rax

# this is svm_get_msr+0x146
    3906:   48 89 02              mov    %rax,(%rdx)
   



Looks like a miscompile of native_read_tsc(): it needs to use %edx:%eax, 
not assume the result is in %rax.


Jan, looks like the culprit is

  static inline unsigned long long kvm_native_read_tsc(void)
  {
      unsigned long long val;
      asm volatile("rdtsc" : "=A" (val));
      return val;
  }

"=A" only works correctly on i386; need to use "=a" and "=d" for portability.
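For comparison, a user-space C sketch of the portable pattern described here, with hypothetical function names (this is not the text of Jan's patch, which is cut off further down). rdtsc leaves the low half in %eax and the high half in %edx. On x86-64, GCC treats a 64-bit "=A" operand as a single register (%rax or %rdx), so it is not told that the other one is overwritten. In the disassembly above, %rdx appears to still hold the `data` pointer that the following `mov %rax,(%rdx)` writes through, which fits the oops at svm_get_msr+0x146.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves that RDTSC returns in EDX:EAX. */
static inline uint64_t tsc_from_edx_eax(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
/* Portable form: request EAX and EDX as separate outputs ("=a", "=d"),
 * so the compiler knows both registers are written by rdtsc. */
static inline uint64_t read_tsc_portable(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return tsc_from_edx_eax(lo, hi);
}
#endif
```

The same two-output form works unchanged on i386 and x86-64, which is why it is preferred over "=A" for code shared between both.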


Re: Virtualization Performance: Intel vs. AMD

2009-11-16 Thread Andi Kleen

Thomas Fjellstrom tfjellst...@shaw.ca writes:

 Hardware context switches aren't free either. 

FWIW, SMT has no hardware context switches; the 'S' stands for
simultaneous: the operations from the different threads are travelling
simultaneously through the CPU's pipeline.

You seem to confuse it with 'CMT' (Coarse-grained Multi Threading),
which has context switches.

-Andi

Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Jan Kiszka

Dietmar Maurer wrote:
 We use the Ubuntu 2.6.24 kernel 
 (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=summary)
 
 They have a few more patches applied:
 
 http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=history;f=include/asm-x86/msr.h;h=cfe169475b5b50a448326ef3c34f50100ac83faf;hb=HEAD
 
 Maybe those last 2 patches could cause the problem?

Nope, it was most probably a kvm-kmod bug. Patch below should fix it.

Jan

-

Fix native_read_tsc wrapping for x86-64

Use register constraint macros so that the return values of rdtsc are
properly picked up and no local variable is overwritten.

This is supposed to fix an oops on x86-64 with a 2.6.24 host kernel.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 x86/external-module-compat.h |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/x86/external-module-compat.h b/x86/external-module-compat.h
index b0b9f21..b0de024 100644
--- a/x86/external-module-compat.h
+++ b/x86/external-module-compat.h
@@ -94,9 +94,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
 
 static inline unsigned long long kvm_native_read_tsc(void)
 {
-   unsigned long long val;
-   asm

Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Jan Kiszka

Avi Kivity wrote:
 On 11/16/2009 01:17 PM, Dietmar Maurer wrote:
     38f0:   0f 85 82 01 00 00     jne    3a78 <svm_get_msr+0x2b8>
     38f6:   48 8b 87 e0 27 00 00  mov    0x27e0(%rdi),%rax
     38fd:   48 8b 48 50           mov    0x50(%rax),%rcx
     3901:   0f 31                 rdtsc
     3903:   48 01 c8              add    %rcx,%rax

 # this is svm_get_msr+0x146
     3906:   48 89 02              mov    %rax,(%rdx)

 
 
 Looks like a miscompile of native_read_tsc(), it needs to use %edx:%eax, 
 not assume the result is in %rax.
 
 Jan, looks like the culprit is
 
   static inline unsigned long long kvm_native_read_tsc(void)
   {
       unsigned long long val;
       asm volatile("rdtsc" : "=A" (val));
       return val;
   }
 
 "=A" only works correctly on i386; need to use "=a" and "=d" for portability.
 

Yes, I already committed a fix; currently propagating it through all series.

Jan


Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Avi Kivity


On 11/16/2009 02:03 PM, Jan Kiszka wrote:

Yes, I already committed a fix; currently propagating it through all series.
   


Naming the fix will be interesting.  kvm-kmod-2.6.31.6.1?


Re: Virtualization Performance: Intel vs. AMD

2009-11-16 Thread Avi Kivity


On 11/16/2009 12:29 AM, Gordan Bobic wrote:

Thomas Fjellstrom wrote:

On Sun November 15 2009, Neil Aggarwal wrote:

The Core i7 has hyperthreading, so you see 8 logical CPUs.

Are you saying the AMD processors do not have hyperthreading?


Course not. Hyperthreading is dubious at best.


That's a rather questionable answer to a rather broad issue. SMT is 
useful, especially on processors with deep pipelines (think Pentium 4 
- and in general, deeper pipelines tend to be required for higher 
clock speeds), because it reduces the number of context switches. 
Context switches are certainly one of the most expensive operations, if 
not the most expensive operation, you can do on a processor, and 
typically require flushing the pipelines. Double the number of 
hardware threads, and you halve the number of context switches.




The real win is in parallelizing memory access.  If a cache miss costs 
200 cycles, no amount of pipelining and out-of-order execution will hide 
this cost.  Running two threads in parallel will at best hide the cost 
by letting another thread execute, or at least issue two memory accesses 
in parallel instead of just one.


This typically isn't useful if your CPU is processing one 
single-threaded application 99% of the time, but on a loaded server it 
can make a significant difference to throughput.


If you are able to saturate the multiple threads (typically easier with 
many small guests rather than a few large ones) then hyperthreading is 
likely a win.



Re: problem wit svm_get_msr on kvm-kmod-2.6.31.6

2009-11-16 Thread Avi Kivity


On 11/16/2009 02:08 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 11/16/2009 02:03 PM, Jan Kiszka wrote:
 

Yes, I already committed a fix; currently propagating it through all series.

   

Naming the fix will be interesting.  kvm-kmod-2.6.31.6.1?

 

Yes, good question. I already thought about kvm-kmod-2.6.31.6b or
kvm-kmod-2.6.31.6-2 as well. Nothing has convinced me yet; I'm still open for
creative ideas.

   


-2 may confuse rpm if someone packages it.  b or .1 ought to work.


[PATCH 00/42] KVM updates for the 2.6.33 merge window (batch 1/2)

2009-11-16 Thread Avi Kivity

Highlights:
- improved kernel context switching speed
- better interoperation with other users of virtualization extensions
- improved irq scaling
- nested svm improvements and tracing
- improved cpufreq integration
- spin loop detection on newer hardware

Notes:
- kvm/ppc64 support will be merged through the powerpc tree
- depends on tip x86/entry branch (user return notifiers)

Alexander Graf (2):
  KVM: Activate Virtualization On Demand
  KVM: SVM: Notify nested hypervisor of lost event injections

Avi Kivity (6):
  core, x86: Add user return notifiers
  x86: Fix user return notifier build
  KVM: Don't wrap schedule() with vcpu_put()/vcpu_load()
  KVM: Don't pass kvm_run arguments
  KVM: Return -ENOTTY on unrecognized ioctls
  KVM: Move assigned device code to own file

Glauber Costa (1):
  KVM: x86: include pvclock MSRs in msrs_to_save

Gleb Natapov (9):
  KVM: Call pic_clear_isr() on pic reset to reuse logic there
  KVM: Move irq sharing information to irqchip level
  KVM: Change irq routing table to use gsi indexed array
  KVM: Maintain back mapping from irqchip/pin to gsi
  KVM: Move irq routing data structure to rcu locking
  KVM: Move irq ack notifier list to arch independent code
  KVM: Convert irq notifiers lists to RCU locking
  KVM: Move IO APIC to its own lock
  KVM: Drop kvm->irq_lock lock from irq injection path

Huang Weiyi (1):
  KVM: remove duplicated #include

Jan Kiszka (2):
  KVM: x86: Refactor guest debug IOCTL handling
  KVM: x86: Rework guest single-step flag injection and filtering

Jiri Slaby (1):
  KVM: fix lock imbalance in kvm_*_irq_source_id()

Joerg Roedel (7):
  KVM: SVM: reorganize svm_interrupt_allowed
  KVM: SVM: don't copy exit_int_info on nested vmrun
  KVM: SVM: Remove remaining occurences of rdtscll
  KVM: SVM: Move INTR vmexit out of atomic code
  KVM: SVM: Add tracepoint for nested vmrun
  KVM: SVM: Add tracepoint for nested #vmexit
  KVM: SVM: Add tracepoint for injected #vmexit

Juan Quintela (1):
  KVM: remove pre_task_link setting in save_state_to_tss16

Marcelo Tosatti (2):
  KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
  KVM: x86: disable paravirt mmu reporting

Mohammed Gamal (5):
  KVM: x86 emulator: Add 'push/pop sreg' instructions
  KVM: x86 emulator: Introduce No64 decode option
  KVM: x86 emulator: Add missing decoder flags for 'or' instructions
  KVM: x86 emulator: Add pusha and popa instructions
  KVM: VMX: Enhance invalid guest state emulation

Stephen Rothwell (1):
  x86: Fix user return notifier put_cpu_var() invocation

Zachary Amsden (4):
  KVM: Separate timer intialization into an indepedent function
  KVM: Kill the confusing tsc_ref_khz and ref_freq variables
  KVM: Fix printk name error in svm.c
  KVM: Fix hotplug of CPUs


[PATCH 03/42] x86: Fix user return notifier put_cpu_var() invocation

2009-11-16 Thread Avi Kivity

From: Stephen Rothwell s...@canb.auug.org.au

Today's linux-next build (x86_64 allmodconfig) failed like this:

  kernel/user-return-notifier.c: In function
  'fire_user_return_notifiers': kernel/user-return-notifier.c:45:
  error: expected expression before ')' token

Introduced by commit 7c68af6e32c73992bad24107311f3433c89016e2
(core, x86: Add user return notifiers) from the tip and kvm trees
but revealed by commit e0fdb0e050eae331046385643618f12452aa7e73
(percpu: add __percpu for sparse) from the percpu tree.

Before that percpu tree commit, put_cpu_var() would compile
without error (even though it really needs a parameter).
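The failure mode (an argument-less `put_cpu_var()` call that compiled until the macro started using its parameter) can be reproduced in plain C. The macros below are simplified, hypothetical stand-ins, not the kernel's definitions; the new-style one references its argument roughly the way a `(void)&(var)` check would.

```c
#include <assert.h>

static int preempt_count_demo;  /* stand-in for the preemption counter */

#define demo_preempt_enable() (preempt_count_demo--)

/* Old shape: the parameter is never used, so an empty invocation such
 * as OLD_put_cpu_var() is still a valid macro call and compiles. */
#define OLD_put_cpu_var(var) demo_preempt_enable()

/* New shape: the parameter is referenced, so an empty invocation would
 * expand to "(void)&();" and fail with "expected expression". */
#define NEW_put_cpu_var(var) do {  \
        (void)&(var);              \
        demo_preempt_enable();     \
    } while (0)
```

Passing the per-CPU variable name, as the patch below does, satisfies both shapes.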

Signed-off-by: Stephen Rothwell s...@canb.auug.org.au
Cc: Avi Kivity a...@redhat.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Tejun Heo t...@kernel.org
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Christoph Lameter c...@linux-foundation.org
LKML-Reference: 20091102161722.eea4358d@canb.auug.org.au
Signed-off-by: Ingo Molnar mi...@elte.hu
---
 kernel/user-return-notifier.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/user-return-notifier.c b/kernel/user-return-notifier.c
index 530ccb8..03e2d6f 100644
--- a/kernel/user-return-notifier.c
+++ b/kernel/user-return-notifier.c
@@ -42,5 +42,5 @@ void fire_user_return_notifiers(void)
     head = &get_cpu_var(return_notifier_list);
     hlist_for_each_entry_safe(urn, tmp1, tmp2, head, link)
         urn->on_user_return(urn);
-    put_cpu_var();
+    put_cpu_var(return_notifier_list);
 }
-- 
1.6.5.2


[PATCH 28/42] KVM: fix lock imbalance in kvm_*_irq_source_id()

2009-11-16 Thread Avi Kivity

From: Jiri Slaby jirisl...@gmail.com

Stanse found 2 lock imbalances in kvm_request_irq_source_id and
kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths.

Fix that by adding unlock labels at the end of the functions and jump
there from the fail paths.
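The single-exit "goto unlock" pattern used by the fix can be shown in isolation. This is a self-contained C sketch with hypothetical types (`struct demo_mutex`, `struct demo_kvm`, `demo_request_source_id()`), not the KVM code: the toy mutex asserts on double locking, so an early `return` that skips the unlock would be caught on the next call.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_SOURCES 8   /* hypothetical size of the source bitmap */

/* Toy mutex that records whether it is held, so imbalances trip asserts. */
struct demo_mutex { int held; };

static void demo_lock(struct demo_mutex *m)
{
    assert(!m->held);   /* locking twice means an earlier unlock was skipped */
    m->held = 1;
}

static void demo_unlock(struct demo_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

struct demo_kvm {
    struct demo_mutex irq_lock;
    unsigned long sources;   /* one bit per allocated source id */
};

/* Allocate the lowest free source id, or -EFAULT when all are taken.
 * Every path after demo_lock() leaves through the single unlock label. */
static int demo_request_source_id(struct demo_kvm *kvm)
{
    int id;

    demo_lock(&kvm->irq_lock);
    for (id = 0; id < DEMO_MAX_SOURCES; id++)
        if (!(kvm->sources & (1UL << id)))
            break;

    if (id >= DEMO_MAX_SOURCES) {
        id = -EFAULT;   /* a bare "return -EFAULT" here would leak the lock */
        goto unlock;
    }
    kvm->sources |= 1UL << id;
unlock:
    demo_unlock(&kvm->irq_lock);
    return id;
}
```

Keeping one unlock site means any later error path added to the function only has to jump to the label instead of repeating the unlock.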

Signed-off-by: Jiri Slaby jirisl...@gmail.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 virt/kvm/irq_comm.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 15a83b9..00c68d2 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -220,11 +220,13 @@ int kvm_request_irq_source_id(struct kvm *kvm)
 
if (irq_source_id = sizeof(kvm-arch.irq_sources_bitmap)) {
printk(KERN_WARNING kvm: exhaust allocatable IRQ sources!\n);
-   return -EFAULT;
+   irq_source_id = -EFAULT;
+   goto unlock;
}
 
ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
set_bit(irq_source_id, bitmap);
+unlock:
mutex_unlock(kvm-irq_lock);
 
return irq_source_id;
@@ -240,7 +242,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int 
irq_source_id)
if (irq_source_id  0 ||
irq_source_id = sizeof(kvm-arch.irq_sources_bitmap)) {
printk(KERN_ERR kvm: IRQ source ID out of range!\n);
-   return;
+   goto unlock;
}
for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++) {
clear_bit(irq_source_id, &kvm->arch.vioapic->irq_states[i]);
@@ -251,6 +253,7 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
 #endif
}
clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
+unlock:
mutex_unlock(&kvm->irq_lock);
 }
 
-- 
1.6.5.2

[PATCH 18/42] KVM: Move assigned device code to own file

2009-11-16 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/ia64/kvm/Makefile   |2 +-
 arch/x86/kvm/Makefile|3 +-
 include/linux/kvm_host.h |   17 +
 virt/kvm/assigned-dev.c  |  818 ++
 virt/kvm/kvm_main.c  |  798 +
 5 files changed, 840 insertions(+), 798 deletions(-)
 create mode 100644 virt/kvm/assigned-dev.c

diff --git a/arch/ia64/kvm/Makefile b/arch/ia64/kvm/Makefile
index 0bb99b7..1089b3e 100644
--- a/arch/ia64/kvm/Makefile
+++ b/arch/ia64/kvm/Makefile
@@ -49,7 +49,7 @@ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/ia64/kvm/
 EXTRA_AFLAGS += -Ivirt/kvm -Iarch/ia64/kvm/
 
 common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-   coalesced_mmio.o irq_comm.o)
+   coalesced_mmio.o irq_comm.o assigned-dev.o)
 
 ifeq ($(CONFIG_IOMMU_API),y)
 common-objs += $(addprefix ../../../virt/kvm/, iommu.o)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 0e7fe78..31a7035 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -6,7 +6,8 @@ CFLAGS_svm.o := -I.
 CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-   coalesced_mmio.o irq_comm.o eventfd.o)
+   coalesced_mmio.o irq_comm.o eventfd.o \
+   assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4aa5e1d..c0a1cc3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -577,4 +577,21 @@ static inline bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
return vcpu-kvm-bsp_vcpu_id == vcpu-vcpu_id;
 }
 #endif
+
+#ifdef __KVM_HAVE_DEVICE_ASSIGNMENT
+
+long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
+ unsigned long arg);
+
+#else
+
+static inline long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
+   unsigned long arg)
+{
+   return -ENOTTY;
+}
+
 #endif
+
+#endif
+
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
new file mode 100644
index 000..fd9c097
--- /dev/null
+++ b/virt/kvm/assigned-dev.c
@@ -0,0 +1,818 @@
+/*
+ * Kernel-based Virtual Machine - device assignment support
+ *
+ * Copyright (C) 2006-9 Red Hat, Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/errno.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include "irq.h"
+
+static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
+ int assigned_dev_id)
+{
+   struct list_head *ptr;
+   struct kvm_assigned_dev_kernel *match;
+
+   list_for_each(ptr, head) {
+   match = list_entry(ptr, struct kvm_assigned_dev_kernel, list);
+   if (match->assigned_dev_id == assigned_dev_id)
+   return match;
+   }
+   return NULL;
+}
+
+static int find_index_from_host_irq(struct kvm_assigned_dev_kernel
+   *assigned_dev, int irq)
+{
+   int i, index;
+   struct msix_entry *host_msix_entries;
+
+   host_msix_entries = assigned_dev->host_msix_entries;
+
+   index = -1;
+   for (i = 0; i < assigned_dev->entries_nr; i++)
+   if (irq == host_msix_entries[i].vector) {
+   index = i;
+   break;
+   }
+   if (index < 0) {
+   printk(KERN_WARNING "Fail to find correlated MSI-X entry!\n");
+   return 0;
+   }
+
+   return index;
+}
+
+static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev;
+   struct kvm *kvm;
+   int i;
+
+   assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
+   interrupt_work);
+   kvm = assigned_dev->kvm;
+
+   spin_lock_irq(&assigned_dev->assigned_dev_lock);
+   if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
+   struct kvm_guest_msix_entry *guest_entries =
+   assigned_dev->guest_msix_entries;
+   for (i = 0; i < assigned_dev->entries_nr; i++) {
+   if (!(guest_entries[i].flags &
+   KVM_ASSIGNED_MSIX_PENDING))
+   continue;
+   guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
+   kvm_set_irq(assigned_dev->kvm,
+

[PATCH 25/42] KVM: SVM: reorganize svm_interrupt_allowed

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

This patch reorganizes the logic in svm_interrupt_allowed to
make it easier to read. This is important because the logic
is a lot more complicated with Nested SVM.
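A small check that the reorganized predicate is equivalent to the old one. The bool fields of struct irq_inputs are hypothetical stand-ins for the vmcb and vcpu state the real code reads; forms_agree() tries all 32 input combinations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the SVM interrupt-window check. */
struct irq_inputs {
	bool rflags_if;		/* guest RFLAGS.IF */
	bool interrupt_shadow;	/* int_state & SVM_INTERRUPT_SHADOW_MASK */
	bool gif;		/* gif_set(svm) */
	bool nested;		/* is_nested(svm) */
	bool vintr_masked;	/* hflags & HF_VINTR_MASK */
};

/* Old style: one compound expression. */
static bool allowed_old(const struct irq_inputs *s)
{
	return s->rflags_if && !s->interrupt_shadow && s->gif &&
	       !(s->nested && s->vintr_masked);
}

/* New style: global blockers first, then the nested special case. */
static bool allowed_new(const struct irq_inputs *s)
{
	bool ret;

	if (!s->gif || s->interrupt_shadow)
		return false;

	ret = s->rflags_if;
	if (s->nested)
		return ret && !s->vintr_masked;
	return ret;
}

/* Returns true if both forms agree on every input combination. */
static bool forms_agree(void)
{
	for (unsigned m = 0; m < 32; m++) {
		struct irq_inputs s = {
			.rflags_if = m & 1, .interrupt_shadow = m & 2,
			.gif = m & 4, .nested = m & 8, .vintr_masked = m & 16,
		};
		if (allowed_old(&s) != allowed_new(&s))
			return false;
	}
	return true;
}
```

The early-return form leaves room to add further nested-only conditions without growing a single expression.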

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 59fe4d5..3f3fe81 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2472,10 +2472,18 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
-   return (vmcb->save.rflags & X86_EFLAGS_IF) &&
-   !(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-   gif_set(svm) &&
-   !(is_nested(svm) && (svm->vcpu.arch.hflags & HF_VINTR_MASK));
+   int ret;
+
+   if (!gif_set(svm) ||
+(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK))
+   return 0;
+
+   ret = !!(vmcb->save.rflags & X86_EFLAGS_IF);
+
+   if (is_nested(svm))
+   return ret && !(svm->vcpu.arch.hflags & HF_VINTR_MASK);
+
+   return ret;
 }
 
 static void enable_irq_window(struct kvm_vcpu *vcpu)
-- 
1.6.5.2

[PATCH 37/42] KVM: x86: include pvclock MSRs in msrs_to_save

2009-11-16 Thread Avi Kivity

From: Glauber Costa glom...@redhat.com

For a while now, we have been issuing an rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, this fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves them to the beginning of the list and skips testing them.
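A sketch of the probing loop with the first entries exempt. SAVE_MSRS_BEGIN, the probe callback type and probe_even() are assumptions for illustration; in the kernel the probe is rdmsr_safe() and the count is KVM_SAVE_MSRS_BEGIN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAVE_MSRS_BEGIN 2	/* first N entries are hypervisor-specific */

/* Hypothetical probe: returns 0 if the host supports the MSR, <0 if not. */
typedef int (*probe_fn)(uint32_t msr);

/*
 * Compact the list in place: keep the first SAVE_MSRS_BEGIN entries
 * unconditionally and drop any later entry the probe rejects.
 * Returns the new number of entries.
 */
static size_t init_msr_list(uint32_t *list, size_t n, probe_fn probe)
{
	size_t i, j;

	assert(n >= SAVE_MSRS_BEGIN);
	for (i = j = SAVE_MSRS_BEGIN; i < n; i++) {
		if (probe(list[i]) < 0)
			continue;
		if (j < i)
			list[j] = list[i];
		j++;
	}
	return j;
}

/* Toy probe for the example: pretend only even-numbered MSRs exist. */
static int probe_even(uint32_t msr)
{
	return (msr % 2 == 0) ? 0 : -1;
}
```

With this toy probe, the odd entries at the front survive because they are never probed, while odd entries later in the list are dropped.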

Cc: sta...@kernel.org
Signed-off-by: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/x86.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 385cd0a..4de5bc0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -503,16 +503,19 @@ static inline u32 bit(int bitno)
  * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
  *
  * This list is modified at module load time to reflect the
- * capabilities of the host cpu.
+ * capabilities of the host cpu. This capabilities test skips MSRs that are
+ * kvm-specific. Those are put in the beginning of the list.
  */
+
+#define KVM_SAVE_MSRS_BEGIN 2
 static u32 msrs_to_save[] = {
+   MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_K6_STAR,
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
-   MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+   MSR_IA32_TSC, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 };
 
 static unsigned num_msrs_to_save;
@@ -2446,7 +2449,8 @@ static void kvm_init_msr_list(void)
u32 dummy[2];
unsigned i, j;
 
-   for (i = j = 0; i < ARRAY_SIZE(msrs_to_save); i++) {
+   /* skip the first msrs in the list. KVM-specific */
+   for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) {
if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0)
continue;
if (j < i)
-- 
1.6.5.2

[PATCH 21/42] KVM: VMX: Enhance invalid guest state emulation

2009-11-16 Thread Avi Kivity

From: Mohammed Gamal m.gamal...@gmail.com

- Change handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
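A simplified sketch of the new control flow, assuming hypothetical types (struct vcpu, enum emu_result, and a scripted sequence of emulation results) in place of the real VMX structures; signal and reschedule handling are left out. It returns 1 to keep running the guest and 0 to hand control to userspace, and it stops at the first failure instead of retrying.

```c
#include <assert.h>
#include <stdbool.h>

enum emu_result { EMU_DONE, EMU_DO_MMIO, EMU_FAIL };
enum exit_reason { EXIT_NONE, EXIT_MMIO, EXIT_INTERNAL_ERROR };

/*
 * Hypothetical vcpu: steps_to_valid counts successful emulation steps
 * still needed before guest state becomes valid; script supplies the
 * result of each step.
 */
struct vcpu {
	int steps_to_valid;
	const enum emu_result *script;
	enum exit_reason exit_reason;
	bool emulation_required;
};

/* Returns 1 to keep running the guest, 0 to return to userspace. */
static int handle_invalid_state(struct vcpu *v)
{
	while (v->steps_to_valid > 0) {
		enum emu_result err = *v->script++;

		if (err == EMU_DO_MMIO) {
			v->exit_reason = EXIT_MMIO;	/* userspace completes the access */
			return 0;
		}
		if (err != EMU_DONE) {
			v->exit_reason = EXIT_INTERNAL_ERROR;	/* do not retry */
			return 0;
		}
		v->steps_to_valid--;
	}
	v->emulation_required = false;	/* state is valid again */
	return 1;
}
```

On both early exits emulation_required stays set, so emulation resumes on the next entry after userspace has handled the exit.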

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/vmx.c |   44 
 1 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4635298..73cb5dd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -107,7 +107,6 @@ struct vcpu_vmx {
} rmode;
int vpid;
bool emulation_required;
-   enum emulation_result invalid_state_emulation_result;
 
/* Support for vnmi-less CPUs */
int soft_vnmi_blocked;
@@ -3322,35 +3321,37 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
return 1;
 }
 
-static void handle_invalid_guest_state(struct kvm_vcpu *vcpu)
+static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
enum emulation_result err = EMULATE_DONE;
-
-   local_irq_enable();
-   preempt_enable();
+   int ret = 1;
 
while (!guest_state_valid(vcpu)) {
err = emulate_instruction(vcpu, 0, 0, 0);
 
-   if (err == EMULATE_DO_MMIO)
-   break;
+   if (err == EMULATE_DO_MMIO) {
+   ret = 0;
+   goto out;
+   }
 
if (err != EMULATE_DONE) {
kvm_report_emulation_failure(vcpu, "emulation failure");
-   break;
+   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+   ret = 0;
+   goto out;
}
 
if (signal_pending(current))
-   break;
+   goto out;
if (need_resched())
schedule();
}
 
-   preempt_disable();
-   local_irq_disable();
-
-   vmx-invalid_state_emulation_result = err;
+   vmx->emulation_required = 0;
+out:
+   return ret;
 }
 
 /*
@@ -3406,13 +3407,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
 
-   /* If we need to emulate an MMIO from handle_invalid_guest_state
-* we just return 0 */
-   if (vmx->emulation_required && emulate_invalid_guest_state) {
-   if (guest_state_valid(vcpu))
-   vmx->emulation_required = 0;
-   return vmx->invalid_state_emulation_result != EMULATE_DO_MMIO;
-   }
+   /* If guest state is invalid, start emulating */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
+   return handle_invalid_guest_state(vcpu);
 
/* Access CR3 don't cause VMExit in paging mode, so we need
 * to sync with guest real CR3. */
@@ -3607,11 +3604,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
vmx->entry_time = ktime_get();
 
-   /* Handle invalid guest state instead of entering VMX */
-   if (vmx->emulation_required && emulate_invalid_guest_state) {
-   handle_invalid_guest_state(vcpu);
+   /* Don't enter VMX if guest state is invalid, let the exit handler
+  start emulation until we arrive back to a valid state */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
return;
-   }
 
if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-- 
1.6.5.2

[PATCH 11/42] KVM: Maintain back mapping from irqchip/pin to gsi

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Maintain a back mapping from irqchip/pin to gsi to speed up
interrupt acknowledgment notifications.
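A sketch of the back mapping, assuming 3 irqchips (as KVM_NR_IRQCHIPS in the patch) and an assumed 24 pins per chip; struct route, struct routing_table and both helpers are hypothetical. The table is filled once when routing changes, with -1 marking unrouted pins, so an acknowledgment becomes a single array lookup instead of a scan over all routing entries.

```c
#include <assert.h>

#define NR_IRQCHIPS 3	/* PIC master, PIC slave, IOAPIC */
#define NR_PINS     24	/* assumed pin count for this sketch */

struct route {
	unsigned gsi, irqchip, pin;
};

struct routing_table {
	int chip[NR_IRQCHIPS][NR_PINS];	/* -1 means no gsi routed here */
};

/* Build the reverse map; reject entries whose chip or pin is out of range. */
static int build_back_map(struct routing_table *rt,
			  const struct route *routes, int nr)
{
	for (int i = 0; i < NR_IRQCHIPS; i++)
		for (int j = 0; j < NR_PINS; j++)
			rt->chip[i][j] = -1;

	for (int k = 0; k < nr; k++) {
		if (routes[k].irqchip >= NR_IRQCHIPS || routes[k].pin >= NR_PINS)
			return -1;
		rt->chip[routes[k].irqchip][routes[k].pin] = (int)routes[k].gsi;
	}
	return 0;
}

/* O(1) lookup on interrupt acknowledgment. */
static int pin_to_gsi(const struct routing_table *rt, unsigned irqchip,
		      unsigned pin)
{
	assert(irqchip < NR_IRQCHIPS && pin < NR_PINS);
	return rt->chip[irqchip][pin];
}
```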

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/ia64/include/asm/kvm.h |1 +
 arch/x86/include/asm/kvm.h  |1 +
 include/linux/kvm_host.h|9 +
 virt/kvm/irq_comm.c |   31 ++-
 4 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/include/asm/kvm.h b/arch/ia64/include/asm/kvm.h
index 18a7e49..bc90c75 100644
--- a/arch/ia64/include/asm/kvm.h
+++ b/arch/ia64/include/asm/kvm.h
@@ -60,6 +60,7 @@ struct kvm_ioapic_state {
 #define KVM_IRQCHIP_PIC_MASTER   0
 #define KVM_IRQCHIP_PIC_SLAVE    1
 #define KVM_IRQCHIP_IOAPIC   2
+#define KVM_NR_IRQCHIPS  3
 
 #define KVM_CONTEXT_SIZE   8*1024
 
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 4a5fe91..f02e87a 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -79,6 +79,7 @@ struct kvm_ioapic_state {
 #define KVM_IRQCHIP_PIC_MASTER   0
 #define KVM_IRQCHIP_PIC_SLAVE    1
 #define KVM_IRQCHIP_IOAPIC   2
+#define KVM_NR_IRQCHIPS  3
 
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f403e66..cc2d749 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -131,7 +131,10 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
 };
 
+#ifdef __KVM_HAVE_IOAPIC
+
 struct kvm_irq_routing_table {
+   int chip[KVM_NR_IRQCHIPS][KVM_IOAPIC_NUM_PINS];
struct kvm_kernel_irq_routing_entry *rt_entries;
u32 nr_rt_entries;
/*
@@ -141,6 +144,12 @@ struct kvm_irq_routing_table {
struct hlist_head map[0];
 };
 
+#else
+
+struct kvm_irq_routing_table {};
+
+#endif
+
 struct kvm {
spinlock_t mmu_lock;
spinlock_t requests_lock;
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 81950f6..59cf8da 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -175,25 +175,16 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
struct kvm_irq_ack_notifier *kian;
struct hlist_node *n;
-   unsigned gsi = pin;
-   int i;
+   int gsi;
 
trace_kvm_ack_irq(irqchip, pin);
 
-   for (i = 0; i < kvm->irq_routing->nr_rt_entries; i++) {
-   struct kvm_kernel_irq_routing_entry *e;
-   e = &kvm->irq_routing->rt_entries[i];
-   if (e->type == KVM_IRQ_ROUTING_IRQCHIP &&
-   e->irqchip.irqchip == irqchip &&
-   e->irqchip.pin == pin) {
-   gsi = e->gsi;
-   break;
-   }
-   }
-
-   hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list, link)
-   if (kian->gsi == gsi)
-   kian->irq_acked(kian);
+   gsi = kvm->irq_routing->chip[irqchip][pin];
+   if (gsi != -1)
+   hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list,
+link)
+   if (kian->gsi == gsi)
+   kian->irq_acked(kian);
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -332,6 +323,9 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
}
e->irqchip.irqchip = ue->u.irqchip.irqchip;
e->irqchip.pin = ue->u.irqchip.pin + delta;
+   if (e->irqchip.pin >= KVM_IOAPIC_NUM_PINS)
+   goto out;
+   rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
break;
case KVM_IRQ_ROUTING_MSI:
e-set = kvm_set_msi;
@@ -356,7 +350,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
unsigned flags)
 {
struct kvm_irq_routing_table *new, *old;
-   u32 i, nr_rt_entries = 0;
+   u32 i, j, nr_rt_entries = 0;
int r;
 
for (i = 0; i < nr; ++i) {
@@ -377,6 +371,9 @@ int kvm_set_irq_routing(struct kvm *kvm,
new->rt_entries = (void *)&new->map[nr_rt_entries];
 
new->nr_rt_entries = nr_rt_entries;
+   for (i = 0; i < 3; i++)
+   for (j = 0; j < KVM_IOAPIC_NUM_PINS; j++)
+   new->chip[i][j] = -1;
 
for (i = 0; i < nr; ++i) {
r = -EINVAL;
-- 
1.6.5.2

[PATCH 10/42] KVM: Change irq routing table to use gsi indexed array

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Use a gsi-indexed array instead of scanning all entries on each interrupt
injection.
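A sketch of the gsi-indexed layout: each slot heads a singly linked list of the routing entries for that gsi, so injection walks only those entries. The types and gsi_fanout() are hypothetical, and a plain pointer list stands in for the kernel's hlist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routing entry: one gsi may fan out to several irqchips. */
struct entry {
	unsigned gsi;
	int chip;		/* which irqchip this entry drives */
	struct entry *next;	/* next entry for the same gsi */
};

struct table {
	unsigned nr;		/* number of gsi slots */
	struct entry **map;	/* map[gsi] -> list of entries for that gsi */
};

/* Link an entry into the list for its gsi. */
static void table_insert(struct table *t, struct entry *e)
{
	assert(e->gsi < t->nr);
	e->next = t->map[e->gsi];
	t->map[e->gsi] = e;
}

/*
 * Count the chips a gsi is connected to by walking only its own list;
 * out-of-range gsis reach nothing, mirroring the bounds check in the patch.
 */
static int gsi_fanout(const struct table *t, unsigned gsi)
{
	int delivered = 0;

	if (gsi >= t->nr)
		return 0;
	for (const struct entry *e = t->map[gsi]; e; e = e->next)
		delivered++;
	return delivered;
}
```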

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 include/linux/kvm_host.h |   21 +--
 virt/kvm/irq_comm.c  |   88 +++--
 virt/kvm/kvm_main.c  |1 -
 3 files changed, 71 insertions(+), 39 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c7f8c4..f403e66 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -128,7 +128,17 @@ struct kvm_kernel_irq_routing_entry {
} irqchip;
struct msi_msg msi;
};
-   struct list_head link;
+   struct hlist_node link;
+};
+
+struct kvm_irq_routing_table {
+   struct kvm_kernel_irq_routing_entry *rt_entries;
+   u32 nr_rt_entries;
+   /*
+* Array indexed by gsi. Each entry contains list of irq chips
+* the gsi is connected to.
+*/
+   struct hlist_head map[0];
 };
 
 struct kvm {
@@ -166,7 +176,7 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
-   struct list_head irq_routing; /* of kvm_kernel_irq_routing_entry */
+   struct kvm_irq_routing_table *irq_routing;
struct hlist_head mask_notifier_list;
 #endif
 
@@ -390,7 +400,12 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
  struct kvm_irq_mask_notifier *kimn);
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
 
-int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
+#ifdef __KVM_HAVE_IOAPIC
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+  union kvm_ioapic_redirect_entry *entry,
+  unsigned long *deliver_bitmask);
+#endif
+int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9783f5c..81950f6 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -144,10 +144,12 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
  *  = 0   Interrupt was coalesced (previous irq is still pending)
  *  > 0   Number of CPUs interrupt was delivered to
  */
-int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
+int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 {
struct kvm_kernel_irq_routing_entry *e;
int ret = -1;
+   struct kvm_irq_routing_table *irq_rt;
+   struct hlist_node *n;
 
trace_kvm_set_irq(irq, level, irq_source_id);
 
@@ -157,8 +159,9 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 * IOAPIC.  So set the bit in both. The guest will ignore
 * writes to the unused one.
 */
-   list_for_each_entry(e, &kvm->irq_routing, link)
-   if (e->gsi == irq) {
+   irq_rt = kvm->irq_routing;
+   if (irq < irq_rt->nr_rt_entries)
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
int r = e->set(e, kvm, irq_source_id, level);
if (r < 0)
continue;
@@ -170,20 +173,23 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
-   struct kvm_kernel_irq_routing_entry *e;
struct kvm_irq_ack_notifier *kian;
struct hlist_node *n;
unsigned gsi = pin;
+   int i;
 
trace_kvm_ack_irq(irqchip, pin);
 
-   list_for_each_entry(e, &kvm->irq_routing, link)
+   for (i = 0; i < kvm->irq_routing->nr_rt_entries; i++) {
+   struct kvm_kernel_irq_routing_entry *e;
+   e = &kvm->irq_routing->rt_entries[i];
if (e->type == KVM_IRQ_ROUTING_IRQCHIP &&
e->irqchip.irqchip == irqchip &&
e->irqchip.pin == pin) {
gsi = e->gsi;
break;
}
+   }
 
hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list, link)
if (kian->gsi == gsi)
@@ -280,26 +286,30 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
kimn->func(kimn, mask);
 }
 
-static void __kvm_free_irq_routing(struct list_head *irq_routing)
-{
-   struct kvm_kernel_irq_routing_entry *e, *n;
-
-   list_for_each_entry_safe(e, n, irq_routing, link)
-   kfree(e);
-}
-
 void kvm_free_irq_routing(struct kvm *kvm)
 {
mutex_lock(&kvm->irq_lock);
-   __kvm_free_irq_routing(&kvm->irq_routing);
+   kfree(kvm->irq_routing);
mutex_unlock(&kvm->irq_lock);
 }
 
-static

[PATCH 09/42] KVM: Move irq sharing information to irqchip level

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

This removes the assumption that the maximum number of GSIs is smaller than
the number of pins. Sharing is now tracked at the pin level, not the GSI level.
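A sketch of per-pin sharing: each pin keeps one bit per interrupt source, and the level driven into the chip is the logical OR of those bits. line_state() is a hypothetical, non-atomic analogue of kvm_irq_line_state() from the patch, which uses set_bit()/clear_bit() instead.

```c
#include <assert.h>

/*
 * Update the bit for one interrupt source on a pin and return the
 * resulting line level: high while any source still asserts it.
 */
static int line_state(unsigned long *irq_state, int irq_source_id, int level)
{
	assert(irq_source_id >= 0 &&
	       irq_source_id < (int)(8 * sizeof(unsigned long)));

	if (level)
		*irq_state |= 1UL << irq_source_id;
	else
		*irq_state &= ~(1UL << irq_source_id);

	return *irq_state != 0;	/* logical OR over all sources */
}
```

Because the bitmap lives with the pin, two GSIs routed to the same pin, or two sources on one GSI, cannot clear each other's assertion.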

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |1 -
 arch/x86/kvm/irq.h  |1 +
 include/linux/kvm_host.h|2 +-
 virt/kvm/ioapic.h   |1 +
 virt/kvm/irq_comm.c |   59 +++---
 5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b113f2..35d3236 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -410,7 +410,6 @@ struct kvm_arch{
gpa_t ept_identity_map_addr;
 
unsigned long irq_sources_bitmap;
-   unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
u64 vm_init_tsc;
 };
 
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 7d6058a..c025a23 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -71,6 +71,7 @@ struct kvm_pic {
int output; /* intr from master PIC */
struct kvm_io_device dev;
void (*ack_notifier)(void *opaque, int irq);
+   unsigned long irq_states[16];
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b7bbb5d..1c7f8c4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -120,7 +120,7 @@ struct kvm_kernel_irq_routing_entry {
u32 gsi;
u32 type;
int (*set)(struct kvm_kernel_irq_routing_entry *e,
-   struct kvm *kvm, int level);
+  struct kvm *kvm, int irq_source_id, int level);
union {
struct {
unsigned irqchip;
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7080b71..6e461ad 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -41,6 +41,7 @@ struct kvm_ioapic {
u32 irr;
u32 pad;
union kvm_ioapic_redirect_entry redirtbl[IOAPIC_NUM_PINS];
+   unsigned long irq_states[IOAPIC_NUM_PINS];
struct kvm_io_device dev;
struct kvm *kvm;
void (*ack_notifier)(void *opaque, int irq);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 001663f..9783f5c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -31,20 +31,39 @@
 
 #include ioapic.h
 
+static inline int kvm_irq_line_state(unsigned long *irq_state,
+int irq_source_id, int level)
+{
+   /* Logical OR for level trig interrupt */
+   if (level)
+   set_bit(irq_source_id, irq_state);
+   else
+   clear_bit(irq_source_id, irq_state);
+
+   return !!(*irq_state);
+}
+
 static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
-  struct kvm *kvm, int level)
+  struct kvm *kvm, int irq_source_id, int level)
 {
 #ifdef CONFIG_X86
-   return kvm_pic_set_irq(pic_irqchip(kvm), e->irqchip.pin, level);
+   struct kvm_pic *pic = pic_irqchip(kvm);
+   level = kvm_irq_line_state(&pic->irq_states[e->irqchip.pin],
+  irq_source_id, level);
+   return kvm_pic_set_irq(pic, e->irqchip.pin, level);
 #else
return -1;
 #endif
 }
 
 static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int level)
+ struct kvm *kvm, int irq_source_id, int level)
 {
-   return kvm_ioapic_set_irq(kvm->arch.vioapic, e->irqchip.pin, level);
+   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+   level = kvm_irq_line_state(&ioapic->irq_states[e->irqchip.pin],
+  irq_source_id, level);
+
+   return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, level);
 }
 
 inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
@@ -96,10 +115,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 }
 
 static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
-  struct kvm *kvm, int level)
+  struct kvm *kvm, int irq_source_id, int level)
 {
struct kvm_lapic_irq irq;
 
+   if (!level)
+   return -1;
+
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
irq.dest_id = (e->msi.address_lo &
@@ -125,34 +147,19 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 {
struct kvm_kernel_irq_routing_entry *e;
-   unsigned long *irq_state, sig_level;
int ret = -1;
 
trace_kvm_set_irq(irq, level, irq_source_id);
 
WARN_ON(!mutex_is_locked(&kvm->irq_lock));
 
-   if (irq < KVM_IOAPIC_NUM_PINS) {
-   irq_state = (unsigned long *)&kvm->arch.irq_states[irq];
-
-

[PATCH 06/42] KVM: x86 emulator: Introduce No64 decode option

2009-11-16 Thread Avi Kivity

From: Mohammed Gamal m.gamal...@gmail.com

Introduces a new decode option No64, which is used for instructions that are
invalid in long mode.
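A sketch of how a single decode flag can replace per-opcode mode checks. Apart from No64's bit position, the flag values and the three-entry table are illustrative assumptions, not the emulator's real opcode table.

```c
#include <assert.h>
#include <stdint.h>

#define ImplicitOps (1u << 1)	/* hypothetical values for this sketch */
#define Stack       (1u << 2)
#define No64        (1u << 28)	/* same bit as in the patch */

enum mode { MODE_PROT32, MODE_PROT64 };

/* push es (0x06) and pop es (0x07) are invalid in long mode. */
static const uint32_t opcode_flags[256] = {
	[0x06] = ImplicitOps | Stack | No64,
	[0x07] = ImplicitOps | Stack | No64,
	[0x50] = ImplicitOps | Stack,	/* push rAX: valid in every mode */
};

/* Returns 0 if the opcode may be decoded in this mode, -1 otherwise. */
static int decode_check(uint8_t opcode, enum mode mode)
{
	if (mode == MODE_PROT64 && (opcode_flags[opcode] & No64))
		return -1;	/* one check instead of per-case mode tests */
	return 0;
}
```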

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |   42 ++
 1 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1cdfec5..1f0ff4a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -75,6 +75,8 @@
 #define Group   (1<<14) /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff    /* Group number stored in bits 0:7 */
+/* Misc flags */
+#define No64   (1<<28)
 /* Source 2 operand type */
 #define Src2None    (0<<29)
 #define Src2CL  (1<<29)
@@ -93,21 +95,21 @@ static u32 opcode_table[256] = {
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
-   ImplicitOps | Stack, ImplicitOps | Stack,
+   ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x08 - 0x0F */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   0, 0, ImplicitOps | Stack, 0,
+   0, 0, ImplicitOps | Stack | No64, 0,
/* 0x10 - 0x17 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
-   ImplicitOps | Stack, ImplicitOps | Stack,
+   ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x18 - 0x1F */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
-   ImplicitOps | Stack, ImplicitOps | Stack,
+   ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
/* 0x20 - 0x27 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -161,7 +163,7 @@ static u32 opcode_table[256] = {
/* 0x90 - 0x97 */
DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
/* 0x98 - 0x9F */
-   0, 0, SrcImm | Src2Imm16, 0,
+   0, 0, SrcImm | Src2Imm16 | No64, 0,
ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
@@ -188,7 +190,7 @@ static u32 opcode_table[256] = {
ByteOp | DstMem | SrcImm | ModRM | Mov, DstMem | SrcImm | ModRM | Mov,
/* 0xC8 - 0xCF */
0, 0, 0, ImplicitOps | Stack,
-   ImplicitOps, SrcImmByte, ImplicitOps, ImplicitOps,
+   ImplicitOps, SrcImmByte, ImplicitOps | No64, ImplicitOps,
/* 0xD0 - 0xD7 */
ByteOp | DstMem | SrcImplicit | ModRM, DstMem | SrcImplicit | ModRM,
ByteOp | DstMem | SrcImplicit | ModRM, DstMem | SrcImplicit | ModRM,
@@ -201,7 +203,7 @@ static u32 opcode_table[256] = {
ByteOp | SrcImmUByte, SrcImmUByte,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
-   SrcImmU | Src2Imm16, SrcImmByte | ImplicitOps,
+   SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
/* 0xF0 - 0xF7 */
@@ -967,6 +969,11 @@ done_prefixes:
}
}
 
+   if (mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
+   kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");;
+   return -1;
+   }
+
if (c->d & Group) {
group = c->d & GroupMask;
c->modrm = insn_fetch(u8, 1, c->eip);
@@ -1739,15 +1746,9 @@ special_insn:
emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
break;
case 0x06:  /* push es */
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   goto cannot_emulate;
-
emulate_push_sreg(ctxt, VCPU_SREG_ES);
break;
case 0x07:  /* pop es */
-if (ctxt->mode == X86EMUL_MODE_PROT64)
-goto cannot_emulate;
-
rc = emulate_pop_sreg(ctxt, ops, VCPU_SREG_ES);
if (rc != 0)
goto done;
@@ -1757,9 +1758,6 @@ special_insn:
emulate_2op_SrcV("or", c->src, c->dst, ctxt->eflags);
break;
case 0x0e:  /* push cs */
-if (ctxt->mode == X86EMUL_MODE_PROT64)
-goto cannot_emulate;
-
emulate_push_sreg(ctxt, VCPU_SREG_CS);
break;
case

[PATCH 02/42] x86: Fix user return notifier build

2009-11-16 Thread Avi Kivity

When CONFIG_USER_RETURN_NOTIFIER is set, we need to link
kernel/user-return-notifier.o.

Signed-off-by: Avi Kivity a...@redhat.com
LKML-Reference: 1256473485-23109-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar mi...@elte.hu
---
 kernel/Makefile |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..0ae57a8 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -95,6 +95,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
 obj-$(CONFIG_SLOW_WORK) += slow-work.o
 obj-$(CONFIG_PERF_EVENTS) += perf_event.o
+obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
 
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra a...@linuxcare.com.au, the -fno-omit-frame-pointer is
-- 
1.6.5.2

[PATCH 04/42] KVM: Don't wrap schedule() with vcpu_put()/vcpu_load()

2009-11-16 Thread Avi Kivity

Preemption notifiers will do that for us automatically.
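A toy model of the idea, assuming a hypothetical preempt_ops structure and fake_schedule() in place of the kernel's preempt notifier machinery: the hooks registered for the vcpu save and restore its state around the switch, so the caller of schedule() no longer needs its own vcpu_put()/vcpu_load() pair.

```c
#include <assert.h>

/* Hypothetical hooks the scheduler calls around a context switch. */
struct preempt_ops {
	void (*sched_out)(void *ctx);
	void (*sched_in)(void *ctx);
};

struct vcpu_model {
	int loaded;	/* 1 while vcpu state is loaded on the cpu */
	int puts;
	int loads;
};

static void model_put(void *ctx)
{
	struct vcpu_model *v = ctx;

	assert(v->loaded);
	v->loaded = 0;
	v->puts++;
}

static void model_load(void *ctx)
{
	struct vcpu_model *v = ctx;

	assert(!v->loaded);
	v->loaded = 1;
	v->loads++;
}

static const struct preempt_ops vcpu_ops = { model_put, model_load };

/* Stand-in for schedule(): runs the registered hooks around the switch. */
static void fake_schedule(const struct preempt_ops *ops, void *ctx)
{
	ops->sched_out(ctx);
	/* ... another task would run here ... */
	ops->sched_in(ctx);
}
```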

Signed-off-by: Avi Kivity a...@redhat.com
---
 virt/kvm/kvm_main.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7495ce3..22b520b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1689,9 +1689,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
if (signal_pending(current))
break;
 
-   vcpu_put(vcpu);
schedule();
-   vcpu_load(vcpu);
}
 
finish_wait(&vcpu->wq, &wait);
-- 
1.6.5.2

[PATCH 01/42] core, x86: Add user return notifiers

2009-11-16 Thread Avi Kivity

Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace.  The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.

This will be used initially to speed up KVM task switching by lazily
updating MSRs.
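A single-CPU model of the mechanism, assuming hypothetical names throughout (struct urn, urn_register(), return_to_user()); a global list and a bool stand in for the per-cpu list and the TIF_USER_RETURN_NOTIFY flag. The return path tests only the flag unless a notifier is armed, which is why the fast path is unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct urn {
	void (*on_user_return)(struct urn *urn);
	struct urn *next;
};

static struct urn *notifier_list;	/* per-cpu in the real design */
static bool tif_user_return_notify;	/* stands in for the thread flag */

static void urn_register(struct urn *urn)
{
	urn->next = notifier_list;
	notifier_list = urn;
	tif_user_return_notify = true;	/* arm the slow-path check */
}

static void urn_unregister(struct urn *urn)
{
	for (struct urn **p = &notifier_list; *p; p = &(*p)->next)
		if (*p == urn) {
			*p = urn->next;
			break;
		}
	if (!notifier_list)
		tif_user_return_notify = false;
}

/* Called on the way back to user mode; the common case is one flag test. */
static void return_to_user(void)
{
	if (!tif_user_return_notify)
		return;
	for (struct urn *u = notifier_list, *next; u; u = next) {
		next = u->next;		/* callback may unregister itself */
		u->on_user_return(u);
	}
}

/* Example callback in the lazy-MSR style: run once, then disarm. */
static int fired;

static void restore_msrs(struct urn *urn)
{
	fired++;
	urn_unregister(urn);
}
```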

Signed-off-by: Avi Kivity a...@redhat.com
LKML-Reference: 1253342422-13811-1-git-send-email-...@redhat.com
Signed-off-by: H. Peter Anvin h...@zytor.com
---
 arch/Kconfig |   10 +++
 arch/x86/Kconfig |1 +
 arch/x86/include/asm/thread_info.h   |7 +++-
 arch/x86/kernel/process.c|2 +
 arch/x86/kernel/signal.c |3 ++
 include/linux/user-return-notifier.h |   42 +++
 kernel/user-return-notifier.c|   46 ++
 7 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/user-return-notifier.h
 create mode 100644 kernel/user-return-notifier.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 7f418bb..4e312ff 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -83,6 +83,13 @@ config KRETPROBES
def_bool y
depends on KPROBES && HAVE_KRETPROBES
 
+config USER_RETURN_NOTIFIER
+   bool
+   depends on HAVE_USER_RETURN_NOTIFIER
+   help
+ Provide a kernel-internal notification when a cpu is about to
+ switch to user mode.
+
 config HAVE_IOREMAP_PROT
bool
 
@@ -126,4 +133,7 @@ config HAVE_DMA_API_DEBUG
 config HAVE_DEFAULT_NO_SPIN_MUTEXES
bool
 
+config HAVE_USER_RETURN_NOTIFIER
+   bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8da9374..1df175d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -50,6 +50,7 @@ config X86
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_LZMA
select HAVE_ARCH_KMEMCHECK
+   select HAVE_USER_RETURN_NOTIFIER
 
 config OUTPUT_FORMAT
string
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d27d0a2..375c917 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -83,6 +83,7 @@ struct thread_info {
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SECCOMP        8   /* secure computing */
 #define TIF_MCE_NOTIFY 10  /* notify userspace of an MCE */
+#define TIF_USER_RETURN_NOTIFY 11  /* notify kernel of userspace return */
 #define TIF_NOTSC  16  /* TSC is not accessible in userland */
 #define TIF_IA32   17  /* 32bit process */
 #define TIF_FORK   18  /* ret_from_fork */
@@ -107,6 +108,7 @@ struct thread_info {
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP       (1 << TIF_SECCOMP)
 #define _TIF_MCE_NOTIFY    (1 << TIF_MCE_NOTIFY)
+#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_NOTSC         (1 << TIF_NOTSC)
 #define _TIF_IA32          (1 << TIF_IA32)
 #define _TIF_FORK          (1 << TIF_FORK)
@@ -142,13 +144,14 @@ struct thread_info {
 
 /* Only used for 64 bit */
 #define _TIF_DO_NOTIFY_MASK \
-   (_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_NOTIFY_RESUME)
+   (_TIF_SIGPENDING | _TIF_MCE_NOTIFY | _TIF_NOTIFY_RESUME |   \
+    _TIF_USER_RETURN_NOTIFY)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW \
     (_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC)
 
-#define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW
+#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG)
 
 #define PREEMPT_ACTIVE 0x1000
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5284cd2..e51b056 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -9,6 +9,7 @@
 #include <linux/pm.h>
 #include <linux/clockchips.h>
 #include <linux/random.h>
+#include <linux/user-return-notifier.h>
 #include <trace/events/power.h>
 #include <asm/system.h>
 #include <asm/apic.h>
@@ -224,6 +225,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 */
memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
}
+   propagate_user_return_notify(prev_p, next_p);
 }
 
 int sys_fork(struct pt_regs *regs)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 6a44a76..c49f90f 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -19,6 +19,7 @@
 #include <linux/stddef.h>
 #include <linux/personality.h>
 #include <linux/uaccess.h>
+#include <linux/user-return-notifier.h>
 
 #include <asm/processor.h>
 #include <asm/ucontext.h>
@@ -872,6 +873,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
if

[PATCH 39/42] KVM: SVM: Move INTR vmexit out of atomic code

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

The nested SVM code emulates a #vmexit caused by a request
to open the irq window directly in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled, but the #vmexit emulation might
sleep, which can trigger a scheduling-while-atomic bug. This
patch fixes it by only recording the request there and
emulating the #vmexit later in handle_exit().

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e372854..884bffc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -85,6 +85,9 @@ struct nested_state {
/* gpa pointers to the real vectors */
u64 vmcb_msrpm;
 
+   /* A VMEXIT is required but not yet emulated */
+   bool exit_required;
+
/* cache for intercepts of the guest */
u16 intercept_cr_read;
u16 intercept_cr_write;
@@ -1379,7 +1382,14 @@ static inline int nested_svm_intr(struct vcpu_svm *svm)
 
svm->vmcb->control.exit_code = SVM_EXIT_INTR;
 
-   if (nested_svm_exit_handled(svm)) {
+   if (svm->nested.intercept & 1ULL) {
+   /*
+* The #vmexit can't be emulated here directly because this
+* code path runs with irqs and preemption disabled. A
+* #vmexit emulation might sleep. Only signal request for
+* the #vmexit here.
+*/
+   svm->nested.exit_required = true;
nsvm_printk("VMexit - INTR\n");
return 1;
}
@@ -2340,6 +2350,13 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 
trace_kvm_exit(exit_code, svm->vmcb->save.rip);
 
+   if (unlikely(svm->nested.exit_required)) {
+   nested_svm_vmexit(svm);
+   svm->nested.exit_required = false;
+
+   return 1;
+   }
+
if (is_nested(svm)) {
int vmexit;
 
@@ -2615,6 +2632,13 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
u16 gs_selector;
u16 ldt_selector;
 
+   /*
+* A vmexit emulation is required before the vcpu can be executed
+* again.
+*/
+   if (unlikely(svm->nested.exit_required))
+   return;
+
svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
-- 
1.6.5.2

[PATCH 20/42] KVM: x86 emulator: Add pusha and popa instructions

2009-11-16 Thread Avi Kivity

From: Mohammed Gamal m.gamal...@gmail.com

This adds the pusha and popa instructions (opcodes 0x60-0x61), which
enables booting MINIX with invalid guest state emulation turned on.
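The semantics being emulated can be checked against a toy model. PUSHA pushes EAX, ECX, EDX, EBX, the ESP value from before the first push, EBP, ESI and EDI; POPA pops them in reverse order and discards the saved ESP. Both opcodes are invalid in 64-bit mode, which is why the opcode table marks them No64. In this sketch struct toy_cpu and its helpers are hypothetical, the register order is assumed to match the VCPU_REGS_* order that emulate_pusha() iterates over, and the stack pointer counts 32-bit slots instead of bytes to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to follow the VCPU_REGS_* order, which is the order PUSHA stores. */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, NR_REGS };

#define STACK_SLOTS 64

struct toy_cpu {
	uint32_t regs[NR_REGS];
	uint32_t stack[STACK_SLOTS];   /* regs[RSP] indexes this array */
};

static void push(struct toy_cpu *c, uint32_t val)
{
	assert(c->regs[RSP] > 0);              /* stack overflow check */
	c->stack[--c->regs[RSP]] = val;
}

static uint32_t pop(struct toy_cpu *c)
{
	assert(c->regs[RSP] < STACK_SLOTS);    /* stack underflow check */
	return c->stack[c->regs[RSP]++];
}

/* PUSHA: EAX..EDI in order; the ESP slot holds ESP from before the first push. */
static void pusha(struct toy_cpu *c)
{
	uint32_t old_esp = c->regs[RSP];

	for (int r = RAX; r <= RDI; r++)
		push(c, r == RSP ? old_esp : c->regs[r]);
}

/* POPA: EDI..EAX in reverse; the saved ESP is popped and discarded. */
static void popa(struct toy_cpu *c)
{
	for (int r = RDI; r >= RAX; r--) {
		uint32_t val = pop(c);

		if (r != RSP)
			c->regs[r] = val;
	}
}
```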

[marcelo: remove unused variable]

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |   48 +++-
 1 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db0820d..d226dff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -139,7 +139,8 @@ static u32 opcode_table[256] = {
DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
DstReg | Stack, DstReg | Stack, DstReg | Stack, DstReg | Stack,
/* 0x60 - 0x67 */
-   0, 0, 0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
+   ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
+   0, DstReg | SrcMem32 | ModRM | Mov /* movsxd (x86/64) */ ,
0, 0, 0, 0,
/* 0x68 - 0x6F */
SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
@@ -1225,6 +1226,43 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static void emulate_pusha(struct x86_emulate_ctxt *ctxt)
+{
+   struct decode_cache *c = &ctxt->decode;
+   unsigned long old_esp = c->regs[VCPU_REGS_RSP];
+   int reg = VCPU_REGS_RAX;
+
+   while (reg <= VCPU_REGS_RDI) {
+   (reg == VCPU_REGS_RSP) ?
+   (c->src.val = old_esp) : (c->src.val = c->regs[reg]);
+
+   emulate_push(ctxt);
+   ++reg;
+   }
+}
+
+static int emulate_popa(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops)
+{
+   struct decode_cache *c = &ctxt->decode;
+   int rc = 0;
+   int reg = VCPU_REGS_RDI;
+
+   while (reg >= VCPU_REGS_RAX) {
+   if (reg == VCPU_REGS_RSP) {
+   register_address_increment(c, &c->regs[VCPU_REGS_RSP],
+   c->op_bytes);
+   --reg;
+   }
+
+   rc = emulate_pop(ctxt, ops, &c->regs[reg], c->op_bytes);
+   if (rc != 0)
+   break;
+   --reg;
+   }
+   return rc;
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops)
 {
@@ -1816,6 +1854,14 @@ special_insn:
if (rc != 0)
goto done;
break;
+   case 0x60:  /* pusha */
+   emulate_pusha(ctxt);
+   break;
+   case 0x61:  /* popa */
+   rc = emulate_popa(ctxt, ops);
+   if (rc != 0)
+   goto done;
+   break;
case 0x63:  /* movsxd */
if (ctxt->mode != X86EMUL_MODE_PROT64)
goto cannot_emulate;
-- 
1.6.5.2

[PATCH 24/42] KVM: remove duplicated #include

2009-11-16 Thread Avi Kivity

From: Huang Weiyi weiyi.hu...@gmail.com

Remove the duplicated #include <asm/apicdef.h> from
  arch/x86/kvm/lapic.c.

Signed-off-by: Huang Weiyi weiyi.hu...@gmail.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/lapic.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8787637..cd60c0b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -32,7 +32,6 @@
 #include <asm/current.h>
 #include <asm/apicdef.h>
 #include <asm/atomic.h>
-#include <asm/apicdef.h>
 #include "kvm_cache_regs.h"
 #include "irq.h"
 #include "trace.h"
-- 
1.6.5.2

[PATCH 26/42] KVM: SVM: don't copy exit_int_info on nested vmrun

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

The exit_int_info field is only written by the hardware and
never read, so it does not need to be copied when emulating
vmrun.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3f3fe81..41c996a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1797,8 +1797,6 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->nested.intercept = nested_vmcb->control.intercept;
 
force_new_asid(&svm->vcpu);
-   svm->vmcb->control.exit_int_info = nested_vmcb->control.exit_int_info;
-   svm->vmcb->control.exit_int_info_err = nested_vmcb->control.exit_int_info_err;
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
if (nested_vmcb->control.int_ctl & V_IRQ_MASK) {
nsvm_printk("nSVM Injecting Interrupt: 0x%x\n",
-- 
1.6.5.2

[PATCH 27/42] KVM: SVM: Remove remaining occurrences of rdtscll

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

This patch replaces them with native_read_tsc(), which
returns the TSC value and can therefore be used directly in
expressions; here it also saves a variable on the stack.
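The code touched here keeps the guest TSC monotonic when a vcpu moves to another physical CPU. As a sketch, assume guest TSC = host TSC + tsc_offset; the hypothetical helpers below take the host TSC as a parameter instead of calling native_read_tsc(). Because the arithmetic is unsigned 64-bit, the same formula works whether the new CPU's TSC is behind or ahead (the subtraction then wraps modulo 2^64).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: guest TSC = host TSC + tsc_offset. */
struct tsc_sketch {
	uint64_t tsc_offset;   /* plays the role of vmcb->control.tsc_offset */
	uint64_t host_tsc;     /* host TSC sampled when the vcpu was put */
};

static uint64_t guest_tsc(const struct tsc_sketch *v, uint64_t host_now)
{
	return host_now + v->tsc_offset;
}

/* Like svm_vcpu_put(): remember the host TSC at the moment we leave. */
static void tsc_vcpu_put(struct tsc_sketch *v, uint64_t host_now)
{
	v->host_tsc = host_now;
}

/* Like the cpu != vcpu->cpu branch of svm_vcpu_load(). */
static void tsc_vcpu_load_other_cpu(struct tsc_sketch *v, uint64_t host_now)
{
	uint64_t delta = v->host_tsc - host_now;  /* wraps modulo 2^64 if negative */

	v->tsc_offset += delta;   /* guest TSC resumes at its value when put */
}
```

With this adjustment the guest sees the TSC resume at the value it had when the vcpu was put, rather than jumping backwards on a CPU whose TSC is behind.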

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 41c996a..9a4daca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -763,14 +763,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
int i;
 
if (unlikely(cpu != vcpu->cpu)) {
-   u64 tsc_this, delta;
+   u64 delta;
 
/*
 * Make sure that the guest sees a monotonically
 * increasing TSC.
 */
-   rdtscll(tsc_this);
-   delta = vcpu->arch.host_tsc - tsc_this;
+   delta = vcpu->arch.host_tsc - native_read_tsc();
svm->vmcb->control.tsc_offset += delta;
if (is_nested(svm))
svm->nested.hsave->control.tsc_offset += delta;
@@ -792,7 +791,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
-   rdtscll(vcpu->arch.host_tsc);
+   vcpu->arch.host_tsc = native_read_tsc();
 }
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
-- 
1.6.5.2

[PATCH 07/42] KVM: Don't pass kvm_run arguments

2009-11-16 Thread Avi Kivity

They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |   10 ++--
 arch/x86/kvm/emulate.c  |    6 +-
 arch/x86/kvm/mmu.c  |    2 +-
 arch/x86/kvm/svm.c  |  102 ++
 arch/x86/kvm/vmx.c  |  113 ++
 arch/x86/kvm/x86.c  |   50 -
 6 files changed, 141 insertions(+), 142 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d838922..0b113f2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -506,8 +506,8 @@ struct kvm_x86_ops {
 
void (*tlb_flush)(struct kvm_vcpu *vcpu);
 
-   void (*run)(struct kvm_vcpu *vcpu, struct kvm_run *run);
-   int (*handle_exit)(struct kvm_run *run, struct kvm_vcpu *vcpu);
+   void (*run)(struct kvm_vcpu *vcpu);
+   int (*handle_exit)(struct kvm_vcpu *vcpu);
void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
@@ -568,7 +568,7 @@ enum emulation_result {
 #define EMULTYPE_NO_DECODE (1 << 0)
 #define EMULTYPE_TRAP_UD   (1 << 1)
 #define EMULTYPE_SKIP      (1 << 2)
-int emulate_instruction(struct kvm_vcpu *vcpu, struct kvm_run *run,
+int emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long cr2, u16 error_code, int emulation_type);
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
@@ -585,9 +585,9 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 
 struct x86_emulate_ctxt;
 
-int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
+int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in,
 int size, unsigned port);
-int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
+int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
   int size, unsigned long count, int down,
gva_t address, int rep, unsigned port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1f0ff4a..0644d3d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1826,7 +1826,7 @@ special_insn:
break;
case 0x6c:  /* insb */
case 0x6d:  /* insw/insd */
-   if (kvm_emulate_pio_string(ctxt->vcpu, NULL,
+   if (kvm_emulate_pio_string(ctxt->vcpu,
1,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
@@ -1842,7 +1842,7 @@ special_insn:
return 0;
case 0x6e:  /* outsb */
case 0x6f:  /* outsw/outsd */
-   if (kvm_emulate_pio_string(ctxt->vcpu, NULL,
+   if (kvm_emulate_pio_string(ctxt->vcpu,
0,
(c->d & ByteOp) ? 1 : c->op_bytes,
c->rep_prefix ?
@@ -2135,7 +2135,7 @@ special_insn:
case 0xef: /* out (e/r)ax,dx */
port = c->regs[VCPU_REGS_RDX];
io_dir_in = 0;
-   do_io:  if (kvm_emulate_pio(ctxt->vcpu, NULL, io_dir_in,
+   do_io:  if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
   (c->d & ByteOp) ? 1 : c->op_bytes,
   port) != 0) {
c->eip = saved_eip;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 818b92a..a902479 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2789,7 +2789,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
if (r)
goto out;
 
-   er = emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
+   er = emulate_instruction(vcpu, cr2, error_code, 0);
 
switch (er) {
case EMULATE_DONE:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c17404a..92048a6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -286,7 +286,7 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
 
if (!svm->next_rip) {
-   if (emulate_instruction(vcpu, vcpu->run, 0, 0, EMULTYPE_SKIP) !=
+   if (emulate_instruction(vcpu, 0, 0, EMULTYPE_SKIP) !=
EMULATE_DONE)
printk(KERN_DEBUG "%s: NOP\n", __func__);
return;
@@ -1180,7 +1180,7 @@ static void svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value,
}
 }
 
-static int pf_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+static

[PATCH 41/42] KVM: SVM: Add tracepoint for nested #vmexit

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

This patch adds a tracepoint for every #vmexit we get from a
nested guest.
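When reading the output of this tracepoint, the ext_int value is the raw EXITINTINFO word. A small decoder sketch, assuming the AMD layout (vector in bits 7:0, event type in bits 10:8, error-code-valid in bit 11, valid in bit 31); decode_exit_int_info() and struct int_info are hypothetical helpers, not part of KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct int_info {
	bool    valid;           /* bit 31 */
	uint8_t vector;          /* bits 7:0 */
	uint8_t type;            /* bits 10:8: 0 INTR, 2 NMI, 3 exception, 4 soft int */
	bool    has_error_code;  /* bit 11; the code itself is in exit_int_info_err */
};

static struct int_info decode_exit_int_info(uint32_t v)
{
	struct int_info d = {
		.valid          = (v >> 31) & 1,
		.vector         = v & 0xff,
		.type           = (v >> 8) & 0x7,
		.has_error_code = (v >> 11) & 1,
	};
	return d;
}
```

For example, an ext_int of 0x80000b0e would decode as a valid exception (type 3) with vector 14 (#PF) and an error code.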

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   36 
 arch/x86/kvm/x86.c   |1 +
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 907af3f..edf6e8b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2366,6 +2366,12 @@ static int handle_exit(struct kvm_vcpu *vcpu)
if (is_nested(svm)) {
int vmexit;
 
+   trace_kvm_nested_vmexit(svm->vmcb->save.rip, exit_code,
+   svm->vmcb->control.exit_info_1,
+   svm->vmcb->control.exit_info_2,
+   svm->vmcb->control.exit_int_info,
+   svm->vmcb->control.exit_int_info_err);
+
nsvm_printk("nested handle_exit: 0x%x | 0x%lx | 0x%lx | 0x%lx\n",
exit_code, svm->vmcb->control.exit_info_1,
svm->vmcb->control.exit_info_2, svm->vmcb->save.rip);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5798e1..a7eb629 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -382,6 +382,42 @@ TRACE_EVENT(kvm_nested_vmrun,
__entry->npt ? "on" : "off")
 );
 
+/*
+ * Tracepoint for #VMEXIT while nested
+ */
+TRACE_EVENT(kvm_nested_vmexit,
+   TP_PROTO(__u64 rip, __u32 exit_code,
+__u64 exit_info1, __u64 exit_info2,
+__u32 exit_int_info, __u32 exit_int_info_err),
+   TP_ARGS(rip, exit_code, exit_info1, exit_info2,
+   exit_int_info, exit_int_info_err),
+
+   TP_STRUCT__entry(
+   __field(__u64,  rip )
+   __field(__u32,  exit_code   )
+   __field(__u64,  exit_info1  )
+   __field(__u64,  exit_info2  )
+   __field(__u32,  exit_int_info   )
+   __field(__u32,  exit_int_info_err   )
+   ),
+
+   TP_fast_assign(
+   __entry->rip                = rip;
+   __entry->exit_code          = exit_code;
+   __entry->exit_info1         = exit_info1;
+   __entry->exit_info2         = exit_info2;
+   __entry->exit_int_info      = exit_int_info;
+   __entry->exit_int_info_err  = exit_int_info_err;
+   ),
+   TP_printk("rip: 0x%016llx reason: %s ext_inf1: 0x%016llx "
+ "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+ __entry->rip,
+ ftrace_print_symbols_seq(p, __entry->exit_code,
+  kvm_x86_ops->exit_reasons_str),
+ __entry->exit_info1, __entry->exit_info2,
+ __entry->exit_int_info, __entry->exit_int_info_err)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ab2f90..192d58e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4985,3 +4985,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
-- 
1.6.5.2

[PATCH 05/42] KVM: x86 emulator: Add 'push/pop sreg' instructions

2009-11-16 Thread Avi Kivity

From: Mohammed Gamal m.gamal...@gmail.com

[avi: avoid buffer overflow]

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |  107 +---
 1 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1be5cd6..1cdfec5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -92,19 +92,22 @@ static u32 opcode_table[256] = {
/* 0x00 - 0x07 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+   ImplicitOps | Stack, ImplicitOps | Stack,
/* 0x08 - 0x0F */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   0, 0, 0, 0,
+   0, 0, ImplicitOps | Stack, 0,
/* 0x10 - 0x17 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+   ImplicitOps | Stack, ImplicitOps | Stack,
/* 0x18 - 0x1F */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
-   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
+   ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
+   ImplicitOps | Stack, ImplicitOps | Stack,
/* 0x20 - 0x27 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -244,11 +247,13 @@ static u32 twobyte_table[256] = {
/* 0x90 - 0x9F */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
/* 0xA0 - 0xA7 */
-   0, 0, 0, DstMem | SrcReg | ModRM | BitOp,
+   ImplicitOps | Stack, ImplicitOps | Stack,
+   0, DstMem | SrcReg | ModRM | BitOp,
DstMem | SrcReg | Src2ImmByte | ModRM,
DstMem | SrcReg | Src2CL | ModRM, 0, 0,
/* 0xA8 - 0xAF */
-   0, 0, 0, DstMem | SrcReg | ModRM | BitOp,
+   ImplicitOps | Stack, ImplicitOps | Stack,
+   0, DstMem | SrcReg | ModRM | BitOp,
DstMem | SrcReg | Src2ImmByte | ModRM,
DstMem | SrcReg | Src2CL | ModRM,
ModRM, 0,
@@ -1186,6 +1191,32 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
+{
+   struct decode_cache *c = &ctxt->decode;
+   struct kvm_segment segment;
+
+   kvm_x86_ops->get_segment(ctxt->vcpu, &segment, seg);
+
+   c->src.val = segment.selector;
+   emulate_push(ctxt);
+}
+
+static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops, int seg)
+{
+   struct decode_cache *c = &ctxt->decode;
+   unsigned long selector;
+   int rc;
+
+   rc = emulate_pop(ctxt, ops, &selector, c->op_bytes);
+   if (rc != 0)
+   return rc;
+
+   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, 1, seg);
+   return rc;
+}
+
 static inline int emulate_grp1a(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops)
 {
@@ -1707,18 +1738,66 @@ special_insn:
  add:  /* add */
emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
break;
+   case 0x06:  /* push es */
+   if (ctxt->mode == X86EMUL_MODE_PROT64)
+   goto cannot_emulate;
+
+   emulate_push_sreg(ctxt, VCPU_SREG_ES);
+   break;
+   case 0x07:  /* pop es */
+   if (ctxt->mode == X86EMUL_MODE_PROT64)
+goto cannot_emulate;
+
+   rc = emulate_pop_sreg(ctxt, ops, VCPU_SREG_ES);
+   if (rc != 0)
+   goto done;
+   break;
case 0x08 ... 0x0d:
  or:   /* or */
emulate_2op_SrcV("or", c->src, c->dst, ctxt->eflags);
break;
+   case 0x0e:  /* push cs */
+   if (ctxt->mode == X86EMUL_MODE_PROT64)
+goto cannot_emulate;
+
+   emulate_push_sreg(ctxt, VCPU_SREG_CS);
+   break;
case 0x10 ... 0x15:
  adc:  /* adc */
emulate_2op_SrcV("adc", c->src, c->dst, ctxt->eflags);
break;
+   case 0x16:  /* push ss */
+   if (ctxt->mode == X86EMUL_MODE_PROT64)
+goto cannot_emulate;
+
+   emulate_push_sreg(ctxt, VCPU_SREG_SS);
+   break;
+   case 0x17:  /* pop ss */
+if

[PATCH 34/42] KVM: x86: Refactor guest debug IOCTL handling

2009-11-16 Thread Avi Kivity

From: Jan Kiszka jan.kis...@web.de

Much of the so-far vendor-specific code for setting up guest debugging
can actually be handled by the generic code. This also fixes a minor
deficiency in the SVM part with respect to processing KVM_GUESTDBG_ENABLE.
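The logic moved into the generic ioctl handler can be summarised as two pure functions. In this sketch DBG_ENABLE and DBG_SINGLESTEP are illustrative stand-ins for KVM_GUESTDBG_ENABLE and KVM_GUESTDBG_SINGLESTEP (bit values chosen for the example), while TF (bit 8) and RF (bit 16) are the architectural EFLAGS bits. effective_debug() shows the KVM_GUESTDBG_ENABLE check that the old SVM code skipped, and update_rflags() mirrors the single-step handling removed from svm.c and vmx.c.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF (1UL << 8)    /* trap flag */
#define EFLAGS_RF (1UL << 16)   /* resume flag */

/* Illustrative stand-ins for KVM_GUESTDBG_ENABLE / KVM_GUESTDBG_SINGLESTEP. */
#define DBG_ENABLE     (1u << 0)
#define DBG_SINGLESTEP (1u << 1)

/* All other control bits are ignored unless ENABLE is set. */
static uint32_t effective_debug(uint32_t control)
{
	return (control & DBG_ENABLE) ? control : 0;
}

/* Set TF|RF while single-stepping; clear them only if we set them before. */
static unsigned long update_rflags(unsigned long rflags,
				   uint32_t old_debug, uint32_t new_debug)
{
	if (new_debug & DBG_SINGLESTEP)
		rflags |= EFLAGS_TF | EFLAGS_RF;
	else if (old_debug & DBG_SINGLESTEP)
		rflags &= ~(EFLAGS_TF | EFLAGS_RF);
	return rflags;
}
```

Clearing TF/RF only when the previous setting requested single-stepping leaves a TF that the guest set itself untouched.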

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |    4 ++--
 arch/x86/kvm/svm.c  |   14 ++
 arch/x86/kvm/vmx.c  |   18 +-
 arch/x86/kvm/x86.c  |   28 +---
 4 files changed, 26 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 295c7c4..e7f8708 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -475,8 +475,8 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
-   int (*set_guest_debug)(struct kvm_vcpu *vcpu,
-  struct kvm_guest_debug *dbg);
+   void (*set_guest_debug)(struct kvm_vcpu *vcpu,
+   struct kvm_guest_debug *dbg);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 02a4269..279a2ae 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1065,26 +1065,16 @@ static void update_db_intercept(struct kvm_vcpu *vcpu)
vcpu->guest_debug = 0;
 }
 
-static int svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void svm_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-   int old_debug = vcpu->guest_debug;
struct vcpu_svm *svm = to_svm(vcpu);
 
-   vcpu->guest_debug = dbg->control;
-
-   update_db_intercept(vcpu);
-
if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
svm->vmcb->save.dr7 = dbg->arch.debugreg[7];
else
svm->vmcb->save.dr7 = vcpu->arch.dr7;
 
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   svm->vmcb->save.rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-   else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-   svm->vmcb->save.rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-
-   return 0;
+   update_db_intercept(vcpu);
 }
 
 static void load_host_msrs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 97f4265..70020e5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1096,30 +1096,14 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
}
 }
 
-static int set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
+static void set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg)
 {
-   int old_debug = vcpu->guest_debug;
-   unsigned long flags;
-
-   vcpu->guest_debug = dbg->control;
-   if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
-   vcpu->guest_debug = 0;
-
if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
vmcs_writel(GUEST_DR7, dbg->arch.debugreg[7]);
else
vmcs_writel(GUEST_DR7, vcpu->arch.dr7);
 
-   flags = vmcs_readl(GUEST_RFLAGS);
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   flags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
-   else if (old_debug & KVM_GUESTDBG_SINGLESTEP)
-   flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-   vmcs_writel(GUEST_RFLAGS, flags);
-
update_exception_bitmap(vcpu);
-
-   return 0;
 }
 
 static __init int cpu_has_kvm_support(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f44d56..a06f88e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4472,12 +4472,19 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
 {
-   int i, r;
+   unsigned long rflags;
+   int old_debug;
+   int i;
 
vcpu_load(vcpu);
 
-   if ((dbg->control & (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) ==
-   (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP)) {
+   old_debug = vcpu->guest_debug;
+
+   vcpu->guest_debug = dbg->control;
+   if (!(vcpu->guest_debug & KVM_GUESTDBG_ENABLE))
+   vcpu->guest_debug = 0;
+
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {
for (i = 0; i < KVM_NR_DB_REGS; ++i)
vcpu->arch.eff_db[i] = dbg->arch.debugreg[i];
vcpu->arch.switch_db_regs =
@@ -4488,16 +4495,23 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
vcpu->arch.switch_db_regs = (vcpu->arch.dr7 & DR7_BP_EN_MASK);
}
 
-   r = kvm_x86_ops->set_guest_debug(vcpu, dbg);
+   rflags = kvm_x86_ops->get_rflags(vcpu);

[PATCH 42/42] KVM: SVM: Add tracepoint for injected #vmexit

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   33 +
 arch/x86/kvm/x86.c   |1 +
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index edf6e8b..369eeb8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1592,6 +1592,12 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
struct vmcb *hsave = svm->nested.hsave;
struct vmcb *vmcb = svm->vmcb;
 
+   trace_kvm_nested_vmexit_inject(vmcb->control.exit_code,
+  vmcb->control.exit_info_1,
+  vmcb->control.exit_info_2,
+  vmcb->control.exit_int_info,
+  vmcb->control.exit_int_info_err);
+
nested_vmcb = nested_svm_map(svm, svm->nested.vmcb, KM_USER0);
if (!nested_vmcb)
return 1;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a7eb629..4d6bb5e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -418,6 +418,39 @@ TRACE_EVENT(kvm_nested_vmexit,
  __entry->exit_int_info, __entry->exit_int_info_err)
 );
 
+/*
+ * Tracepoint for #VMEXIT reinjected to the guest
+ */
+TRACE_EVENT(kvm_nested_vmexit_inject,
+   TP_PROTO(__u32 exit_code,
+__u64 exit_info1, __u64 exit_info2,
+__u32 exit_int_info, __u32 exit_int_info_err),
+   TP_ARGS(exit_code, exit_info1, exit_info2,
+   exit_int_info, exit_int_info_err),
+
+   TP_STRUCT__entry(
+   __field(__u32,  exit_code   )
+   __field(__u64,  exit_info1  )
+   __field(__u64,  exit_info2  )
+   __field(__u32,  exit_int_info   )
+   __field(__u32,  exit_int_info_err   )
+   ),
+
+   TP_fast_assign(
+   __entry->exit_code          = exit_code;
+   __entry->exit_info1         = exit_info1;
+   __entry->exit_info2         = exit_info2;
+   __entry->exit_int_info      = exit_int_info;
+   __entry->exit_int_info_err  = exit_int_info_err;
+   ),
+
+   TP_printk("reason: %s ext_inf1: 0x%016llx "
+ "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+ ftrace_print_symbols_seq(p, __entry->exit_code,
+  kvm_x86_ops->exit_reasons_str),
+   __entry->exit_info1, __entry->exit_info2,
+   __entry->exit_int_info, __entry->exit_int_info_err)
+);
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 192d58e..a522d9b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4986,3 +4986,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
-- 
1.6.5.2

[PATCH 40/42] KVM: SVM: Add tracepoint for nested vmrun

2009-11-16 Thread Avi Kivity

From: Joerg Roedel joerg.roe...@amd.com

This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c   |    6 ++
 arch/x86/kvm/trace.h |   33 +
 arch/x86/kvm/x86.c   |1 +
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 884bffc..907af3f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1726,6 +1726,12 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
/* nested_vmcb is our indicator if nested SVM is activated */
svm->nested.vmcb = svm->vmcb->save.rax;
 
+   trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, svm->nested.vmcb,
+  nested_vmcb->save.rip,
+  nested_vmcb->control.int_ctl,
+  nested_vmcb->control.event_inj,
+  nested_vmcb->control.nested_ctl);
+
/* Clear internal status */
kvm_clear_exception_queue(&svm->vcpu);
kvm_clear_interrupt_queue(&svm->vcpu);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0d480e7..b5798e1 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -349,6 +349,39 @@ TRACE_EVENT(kvm_apic_accept_irq,
  __entry->coalesced ? " (coalesced)" : "")
 );
 
+/*
+ * Tracepoint for nested VMRUN
+ */
+TRACE_EVENT(kvm_nested_vmrun,
+   TP_PROTO(__u64 rip, __u64 vmcb, __u64 nested_rip, __u32 int_ctl,
+__u32 event_inj, bool npt),
+   TP_ARGS(rip, vmcb, nested_rip, int_ctl, event_inj, npt),
+
+   TP_STRUCT__entry(
+   __field(__u64,  rip )
+   __field(__u64,  vmcb)
+   __field(__u64,  nested_rip  )
+   __field(__u32,  int_ctl )
+   __field(__u32,  event_inj   )
+   __field(bool,   npt )
+   ),
+
+   TP_fast_assign(
+   __entry->rip        = rip;
+   __entry->vmcb       = vmcb;
+   __entry->nested_rip = nested_rip;
+   __entry->int_ctl    = int_ctl;
+   __entry->event_inj  = event_inj;
+   __entry->npt        = npt;
+   ),
+
+   TP_printk("rip: 0x%016llx vmcb: 0x%016llx nrip: 0x%016llx int_ctl: 0x%08x "
+             "event_inj: 0x%08x npt: %s\n",
+           __entry->rip, __entry->vmcb, __entry->nested_rip,
+           __entry->int_ctl, __entry->event_inj,
+           __entry->npt ? "on" : "off")
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4de5bc0..3ab2f90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4984,3 +4984,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
-- 
1.6.5.2

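For readers of the trace output, here is a minimal user-space sketch of how the event-specific part of one kvm_nested_vmrun record renders with the TP_printk() format string in the patch above (trailing newline dropped). The struct and function names are hypothetical; only the field list and the format string come from the patch. The `- 3` at the call site presumably backs RIP up over the 3-byte VMRUN opcode (0F 01 D8).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical user-space copy of the fields recorded by kvm_nested_vmrun. */
struct nested_vmrun_record {
	uint64_t rip;        /* L1 RIP of the VMRUN instruction */
	uint64_t vmcb;       /* guest-physical address of the nested VMCB */
	uint64_t nested_rip; /* RIP the nested guest will start at */
	uint32_t int_ctl;
	uint32_t event_inj;
	bool npt;            /* nested paging enabled for the nested guest */
};

/* Format a record like the TP_printk() above, minus the trailing "\n".
 * Returns the snprintf() result (length that would have been written). */
int format_nested_vmrun(char *buf, size_t len,
			const struct nested_vmrun_record *r)
{
	return snprintf(buf, len,
			"rip: 0x%016llx vmcb: 0x%016llx nrip: 0x%016llx "
			"int_ctl: 0x%08x event_inj: 0x%08x npt: %s",
			(unsigned long long)r->rip,
			(unsigned long long)r->vmcb,
			(unsigned long long)r->nested_rip,
			(unsigned)r->int_ctl, (unsigned)r->event_inj,
			r->npt ? "on" : "off");
}
```

The fixed-width `%016llx`/`%08x` fields keep successive records column-aligned, which makes long traces easy to scan or grep.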

[PATCH 38/42] KVM: SVM: Notify nested hypervisor of lost event injections

2009-11-16 Thread Avi Kivity

From: Alexander Graf ag...@suse.de

If event_inj is valid on a #vmexit, the host CPU writes its contents
to exit_int_info, so the hypervisor knows that the event was not
injected.

Nested SVM does not do this yet, which is a bug; this patch fixes it.

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 279a2ae..e372854 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1615,6 +1615,22 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = vmcb->control.exit_int_info_err;
+
+   /*
+    * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
+    * to make sure that we do not lose injected events. So check event_inj
+    * here and copy it to exit_int_info if it is valid.
+    * Exit_int_info and event_inj can't be both valid because the case
+    * below only happens on a VMRUN instruction intercept which has
+    * no valid exit_int_info set.
+    */
+   if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
+           struct vmcb_control_area *nc = &nested_vmcb->control;
+
+           nc->exit_int_info     = vmcb->control.event_inj;
+           nc->exit_int_info_err = vmcb->control.event_inj_err;
+   }
+
nested_vmcb->control.tlb_ctl   = 0;
nested_vmcb->control.event_inj = 0;
nested_vmcb->control.event_inj_err = 0;
-- 
1.6.5.2

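The rule this patch adds can be modelled outside the kernel with a trimmed-down control area. This is a sketch under stated assumptions: `struct vmcb_control_sketch` and `report_lost_injection()` are hypothetical, and only `SVM_EVTINJ_VALID` (bit 31 of EVENTINJ) mirrors a real kernel definition, written here as `1u << 31` so the shift stays well-defined in user space.

```c
#include <stdint.h>

#define SVM_EVTINJ_VALID (1u << 31) /* "valid" bit of EVENTINJ */

/* Hypothetical, trimmed-down stand-in for the VMCB control area. */
struct vmcb_control_sketch {
	uint32_t event_inj;
	uint32_t event_inj_err;
	uint32_t exit_int_info;
	uint32_t exit_int_info_err;
};

/*
 * On an emulated #VMEXIT, copy exit_int_info from the current VMCB and,
 * if an event was still pending injection, report it to the L1
 * hypervisor through exit_int_info instead of silently dropping it.
 */
void report_lost_injection(struct vmcb_control_sketch *nested,
			   const struct vmcb_control_sketch *cur)
{
	nested->exit_int_info = cur->exit_int_info;
	nested->exit_int_info_err = cur->exit_int_info_err;

	if (cur->event_inj & SVM_EVTINJ_VALID) {
		nested->exit_int_info = cur->event_inj;
		nested->exit_int_info_err = cur->event_inj_err;
	}

	/* Nothing is left pending for L1 after the exit. */
	nested->event_inj = 0;
	nested->event_inj_err = 0;
}
```

As the patch comment notes, overwriting exit_int_info is safe here because the two fields cannot both be valid on this path.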

[PATCH 35/42] KVM: x86: disable paravirt mmu reporting

2009-11-16 Thread Avi Kivity

From: Marcelo Tosatti mtosa...@redhat.com

Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and no longer brings significant
advantages over shadow paging.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a06f88e..4693f91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1238,8 +1238,8 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_NR_MEMSLOTS:
r = KVM_MEMORY_SLOTS;
break;
-   case KVM_CAP_PV_MMU:
-   r = !tdp_enabled;
+   case KVM_CAP_PV_MMU:    /* obsolete */
+   r = 0;
break;
case KVM_CAP_IOMMU:
r = iommu_found();
-- 
1.6.5.2


[PATCH 36/42] KVM: x86: Rework guest single-step flag injection and filtering

2009-11-16 Thread Avi Kivity

From: Jan Kiszka jan.kis...@siemens.com

Move TF and RF injection and filtering for guest single-stepping into
wrappers around the vendor get/set_rflags callbacks. This makes the whole
mechanism more robust with respect to the order of user-space IOCTLs and
to instruction emulation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |3 ++
 arch/x86/kvm/x86.c  |   77 +++
 2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e7f8708..179a919 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -614,6 +614,9 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4693f91..385cd0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -235,6 +235,25 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
 }
 EXPORT_SYMBOL_GPL(kvm_require_cpl);
 
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
+{
+   unsigned long rflags;
+
+   rflags = kvm_x86_ops->get_rflags(vcpu);
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   rflags &= ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF);
+   return rflags;
+}
+EXPORT_SYMBOL_GPL(kvm_get_rflags);
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
+   kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+EXPORT_SYMBOL_GPL(kvm_set_rflags);
+
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
@@ -2777,7 +2796,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
 
vcpu->arch.emulate_ctxt.vcpu = vcpu;
-   vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+   vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
vcpu->arch.emulate_ctxt.mode =
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
? X86EMUL_MODE_REAL : cs_l
@@ -2855,7 +2874,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DO_MMIO;
}
 
-   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+   kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 
if (vcpu->mmio_is_write) {
vcpu->mmio_needed = 0;
@@ -3291,7 +3310,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags)
 {
kvm_lmsw(vcpu, msw);
-   *rflags = kvm_x86_ops->get_rflags(vcpu);
+   *rflags = kvm_get_rflags(vcpu);
 }
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
@@ -3329,7 +3348,7 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
switch (cr) {
case 0:
kvm_set_cr0(vcpu, mk_cr_64(vcpu->arch.cr0, val));
-   *rflags = kvm_x86_ops->get_rflags(vcpu);
+   *rflags = kvm_get_rflags(vcpu);
break;
case 2:
vcpu->arch.cr2 = val;
@@ -3460,7 +3479,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
 {
struct kvm_run *kvm_run = vcpu->run;
 
-   kvm_run->if_flag = (kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
+   kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
if (irqchip_in_kernel(vcpu->kvm))
@@ -3840,13 +3859,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 #endif
 
regs->rip = kvm_rip_read(vcpu);
-   regs->rflags = kvm_x86_ops->get_rflags(vcpu);
-
-   /*
-    * Don't leak debug flags in case they were set for guest debugging
-    */
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   regs->rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+   regs->rflags = kvm_get_rflags(vcpu);
 
vcpu_put(vcpu);
 
@@ -3874,12 +3887,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13);
kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14);
kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15);
-
 #endif
 
kvm_rip_write(vcpu, regs->rip);
-   kvm_x86_ops->set_rflags(vcpu, regs->rflags);
-
+

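The effect of the new kvm_get_rflags()/kvm_set_rflags() wrappers can be shown with a small user-space model. The vCPU struct and function names below are hypothetical; the TF (0x100) and RF (0x10000) values are the architectural x86 bit positions. While the host debugger single-steps the guest, TF/RF are forced on in the stored value and hidden from every reader, whether that reader is the emulator or a KVM_GET_REGS caller.

```c
#include <stdbool.h>

#define X86_EFLAGS_TF 0x00000100UL /* trap flag */
#define X86_EFLAGS_RF 0x00010000UL /* resume flag */

/* Hypothetical vCPU model: the value the hardware sees plus a
 * "host debugger is single-stepping this guest" switch. */
struct vcpu_sketch {
	unsigned long hw_rflags;
	bool singlestep;
};

/* Guest-visible RFLAGS: hide the TF/RF that the debugger injected. */
unsigned long get_rflags(const struct vcpu_sketch *v)
{
	unsigned long rflags = v->hw_rflags;

	if (v->singlestep)
		rflags &= ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
	return rflags;
}

/* Store a guest-provided RFLAGS, re-injecting TF/RF while stepping. */
void set_rflags(struct vcpu_sketch *v, unsigned long rflags)
{
	if (v->singlestep)
		rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
	v->hw_rflags = rflags;
}
```

Because every path goes through the same pair of helpers, the order in which user space issues KVM_SET_GUEST_DEBUG and KVM_SET_REGS no longer matters. One consequence, shared with the patch: while single-stepping is active, a TF bit set by the guest itself is hidden as well.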
[PATCH 15/42] KVM: Move IO APIC to its own lock

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

This allows removal of irq_lock from the injection path.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c |7 +---
 arch/x86/kvm/i8259.c |   22 +---
 arch/x86/kvm/lapic.c |5 +--
 arch/x86/kvm/x86.c   |   10 +
 virt/kvm/ioapic.c|   80 +++---
 virt/kvm/ioapic.h|4 ++
 virt/kvm/irq_comm.c  |   23 -
 7 files changed, 100 insertions(+), 51 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 0ad09f0..4a98314 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -851,8 +851,7 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm,
r = 0;
switch (chip->chip_id) {
case KVM_IRQCHIP_IOAPIC:
-   memcpy(&chip->chip.ioapic, ioapic_irqchip(kvm),
-   sizeof(struct kvm_ioapic_state));
+   r = kvm_get_ioapic(kvm, &chip->chip.ioapic);
break;
default:
r = -EINVAL;
@@ -868,9 +867,7 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
r = 0;
switch (chip->chip_id) {
case KVM_IRQCHIP_IOAPIC:
-   memcpy(ioapic_irqchip(kvm),
-   &chip->chip.ioapic,
-   sizeof(struct kvm_ioapic_state));
+   r = kvm_set_ioapic(kvm, &chip->chip.ioapic);
break;
default:
r = -EINVAL;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index ccc941a..d057c0c 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -38,7 +38,15 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
s->isr_ack |= (1 << irq);
if (s != &s->pics_state->pics[0])
irq += 8;
+   /*
+    * We are dropping lock while calling ack notifiers since ack
+    * notifier callbacks for assigned devices call into PIC recursively.
+    * Other interrupt may be delivered to PIC while lock is dropped but
+    * it should be safe since PIC state is already updated at this stage.
+    */
+   spin_unlock(&s->pics_state->lock);
kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
+   spin_lock(&s->pics_state->lock);
 }
 
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
@@ -176,16 +184,18 @@ int kvm_pic_set_irq(void *opaque, int irq, int level)
 static inline void pic_intack(struct kvm_kpic_state *s, int irq)
 {
s->isr |= 1 << irq;
-   if (s->auto_eoi) {
-   if (s->rotate_on_auto_eoi)
-   s->priority_add = (irq + 1) & 7;
-   pic_clear_isr(s, irq);
-   }
/*
 * We don't clear a level sensitive interrupt here
 */
if (!(s->elcr & (1 << irq)))
s->irr &= ~(1 << irq);
+
+   if (s->auto_eoi) {
+   if (s->rotate_on_auto_eoi)
+   s->priority_add = (irq + 1) & 7;
+   pic_clear_isr(s, irq);
+   }
+
 }
 
 int kvm_pic_read_irq(struct kvm *kvm)
@@ -294,9 +304,9 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val)
priority = get_priority(s, s->isr);
if (priority != 8) {
irq = (priority + s->priority_add) & 7;
-   pic_clear_isr(s, irq);
if (cmd == 5)
s->priority_add = (irq + 1) & 7;
+   pic_clear_isr(s, irq);
pic_update_irq(s->pics_state);
}
break;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 23c2176..df8bcb0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -471,11 +471,8 @@ static void apic_set_eoi(struct kvm_lapic *apic)
trigger_mode = IOAPIC_LEVEL_TRIG;
else
trigger_mode = IOAPIC_EDGE_TRIG;
-   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)) {
-   mutex_lock(&apic->vcpu->kvm->irq_lock);
+   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI))
kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
-   mutex_unlock(&apic->vcpu->kvm->irq_lock);
-   }
 }
 
 static void apic_send_ipi(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1687d12..fdf989f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2038,9 +2038,7 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
sizeof(struct kvm_pic_state));
break;
case KVM_IRQCHIP_IOAPIC:
-   memcpy(&chip->chip.ioapic,
-   ioapic_irqchip(kvm),
-

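The reason pic_clear_isr() now drops the PIC lock around kvm_notify_acked_irq() can be reproduced with any non-recursive lock. The sketch below is hypothetical user-space code that uses a pthread mutex in place of the kernel spinlock: the ISR is updated first, the lock is released for the callback, and it is reacquired before returning, so a notifier that calls back into the PIC does not deadlock on itself.

```c
#include <pthread.h>

/* Hypothetical PIC model: a non-recursive lock protecting the ISR. */
struct pic_sketch {
	pthread_mutex_t lock;
	unsigned char isr;
	void (*ack_notifier)(struct pic_sketch *pic, int irq);
};

/* Called with pic->lock held; returns with it held again. */
static void pic_clear_isr(struct pic_sketch *pic, int irq)
{
	pic->isr &= (unsigned char)~(1u << irq); /* update state first */

	/* The notifier may re-enter the PIC and take the lock itself,
	 * so release it for the duration of the callback. */
	pthread_mutex_unlock(&pic->lock);
	if (pic->ack_notifier)
		pic->ack_notifier(pic, irq);
	pthread_mutex_lock(&pic->lock);
}

/* End-of-interrupt entry point: takes the lock around the ISR update. */
void pic_eoi(struct pic_sketch *pic, int irq)
{
	pthread_mutex_lock(&pic->lock);
	pic_clear_isr(pic, irq);
	pthread_mutex_unlock(&pic->lock);
}
```

As the patch comment says, another interrupt may reach the PIC while the lock is dropped; updating the ISR before unlocking is what keeps that window consistent.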
[PATCH 31/42] KVM: Fix printk name error in svm.c

2009-11-16 Thread Avi Kivity

From: Zachary Amsden zams...@redhat.com

Signed-off-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9a4daca..d1036ce 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -330,13 +330,14 @@ static int svm_hardware_enable(void *garbage)
return -EBUSY;
 
if (!has_svm()) {
-   printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me);
+   printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n",
+  me);
return -EINVAL;
}
svm_data = per_cpu(svm_data, me);
 
if (!svm_data) {
-   printk(KERN_ERR "svm_cpu_init: svm_data is NULL on %d\n",
+   printk(KERN_ERR "svm_hardware_enable: svm_data is NULL on %d\n",
   me);
return -EINVAL;
}
-- 
1.6.5.2


[PATCH 29/42] KVM: Separate timer initialization into an independent function

2009-11-16 Thread Avi Kivity

From: Zachary Amsden zams...@redhat.com

Signed-off-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/x86.c |   23 +++
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d83de8..6a31dfb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3118,9 +3118,22 @@ static struct notifier_block kvmclock_cpufreq_notifier_block = {
 .notifier_call  = kvmclock_cpufreq_notifier
 };
 
+static void kvm_timer_init(void)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu)
+   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+   if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+   tsc_khz_ref = tsc_khz;
+   cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER);
+   }
+}
+
 int kvm_arch_init(void *opaque)
 {
-   int r, cpu;
+   int r;
struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
 
if (kvm_x86_ops) {
@@ -3152,13 +3165,7 @@ int kvm_arch_init(void *opaque)
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0);
 
-   for_each_possible_cpu(cpu)
-   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
-   if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-   tsc_khz_ref = tsc_khz;
-   cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
- CPUFREQ_TRANSITION_NOTIFIER);
-   }
+   kvm_timer_init();
 
return 0;
 
-- 
1.6.5.2


[PATCH 22/42] KVM: SVM: remove needless mmap_sem acquisition from nested_svm_map

2009-11-16 Thread Avi Kivity

From: Marcelo Tosatti mtosa...@redhat.com

nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Acked-by: Alexander Graf ag...@suse.de
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/svm.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 92048a6..f54c4f9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1396,10 +1396,7 @@ static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, enum km_type idx)
 {
struct page *page;
 
-   down_read(&current->mm->mmap_sem);
page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
-   up_read(&current->mm->mmap_sem);
-
if (is_error_page(page))
goto error;
 
-- 
1.6.5.2


[PATCH 33/42] KVM: remove pre_task_link setting in save_state_to_tss16

2009-11-16 Thread Avi Kivity

From: Juan Quintela quint...@redhat.com

Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37a149e8b56436fabf093532483bff13b0
  Author: Gleb Natapov g...@redhat.com
  Date:   Mon Mar 30 16:03:24 2009 +0300

KVM: Fix task switch back link handling.

CC: Gleb Natapov g...@redhat.com
Signed-off-by: Juan Quintela quint...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/x86.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f75856..5f44d56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4203,7 +4203,6 @@ static void save_state_to_tss16(struct kvm_vcpu *vcpu,
tss->ss = get_segment_selector(vcpu, VCPU_SREG_SS);
tss->ds = get_segment_selector(vcpu, VCPU_SREG_DS);
tss->ldt = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-   tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }
 
 static int load_state_from_tss16(struct kvm_vcpu *vcpu,
-- 
1.6.5.2


[PATCH 32/42] KVM: Fix hotplug of CPUs

2009-11-16 Thread Avi Kivity

From: Zachary Amsden zams...@redhat.com

Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, but only for the CPUs online at that point.

The backends were not allocating these structures for all possible CPUs,
so CPUs brought online later could not be hardware-enabled.

Signed-off-by: Zachary Amsden zams...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 arch/x86/kvm/svm.c |4 ++--
 arch/x86/kvm/vmx.c |6 --
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1036ce..02a4269 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -482,7 +482,7 @@ static __init int svm_hardware_setup(void)
kvm_enable_efer_bits(EFER_SVME);
}
 
-   for_each_online_cpu(cpu) {
+   for_each_possible_cpu(cpu) {
r = svm_cpu_init(cpu);
if (r)
goto err;
@@ -516,7 +516,7 @@ static __exit void svm_hardware_unsetup(void)
 {
int cpu;
 
-   for_each_online_cpu(cpu)
+   for_each_possible_cpu(cpu)
svm_cpu_uninit(cpu);
 
__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a187570..97f4265 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1350,15 +1350,17 @@ static void free_kvm_area(void)
 {
int cpu;
 
-   for_each_online_cpu(cpu)
+   for_each_possible_cpu(cpu) {
free_vmcs(per_cpu(vmxarea, cpu));
+   per_cpu(vmxarea, cpu) = NULL;
+   }
 }
 
 static __init int alloc_kvm_area(void)
 {
int cpu;
 
-   for_each_online_cpu(cpu) {
+   for_each_possible_cpu(cpu) {
struct vmcs *vmcs;
 
vmcs = alloc_vmcs_cpu(cpu);
-- 
1.6.5.2

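The fix boils down to sizing per-CPU state by the set of possible CPUs rather than the set online at module init time. A rough user-space analog follows; it assumes glibc's `sysconf(_SC_NPROCESSORS_CONF)` as a stand-in for for_each_possible_cpu() (it reports configured CPUs, so it is only an approximation of the kernel's possible mask), and the struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-CPU area, standing in for a VMCS or host save area. */
struct percpu_area {
	int cpu;
	void *page;
};

/*
 * Allocate one area for every configured CPU, not just the ones online
 * right now, so that a CPU brought online later already has its area.
 */
struct percpu_area *alloc_percpu_areas(long *count)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);
	struct percpu_area *areas;

	if (n < 1)
		n = 1;
	areas = calloc((size_t)n, sizeof(*areas));
	if (!areas)
		return NULL;
	for (long i = 0; i < n; i++) {
		areas[i].cpu = (int)i;
		areas[i].page = calloc(1, 4096);
		if (!areas[i].page) {
			/* Unwind the pages allocated so far. */
			while (i-- > 0)
				free(areas[i].page);
			free(areas);
			return NULL;
		}
	}
	*count = n;
	return areas;
}

/* Free every area, clearing each pointer as free_kvm_area() now does. */
void free_percpu_areas(struct percpu_area *areas, long count)
{
	for (long i = 0; i < count; i++) {
		free(areas[i].page);
		areas[i].page = NULL;
	}
	free(areas);
}
```

The trade-off is the same as in the patch: memory is spent up front for CPUs that may never come online, in exchange for hotplug not needing an allocation path.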

[PATCH 12/42] KVM: Move irq routing data structure to rcu locking

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 virt/kvm/irq_comm.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 59cf8da..fb861dd 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -159,7 +159,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 * IOAPIC.  So set the bit in both. The guest will ignore
 * writes to the unused one.
 */
-   irq_rt = kvm->irq_routing;
+   rcu_read_lock();
+   irq_rt = rcu_dereference(kvm->irq_routing);
if (irq < irq_rt->nr_rt_entries)
hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
int r = e->set(e, kvm, irq_source_id, level);
@@ -168,6 +169,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 
ret = r + ((ret < 0) ? 0 : ret);
}
+   rcu_read_unlock();
return ret;
 }
 
@@ -179,7 +181,10 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
trace_kvm_ack_irq(irqchip, pin);
 
-   gsi = kvm->irq_routing->chip[irqchip][pin];
+   rcu_read_lock();
+   gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+   rcu_read_unlock();
+
if (gsi != -1)
hlist_for_each_entry(kian, n, &kvm->arch.irq_ack_notifier_list,
 link)
@@ -279,9 +284,9 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
 
 void kvm_free_irq_routing(struct kvm *kvm)
 {
-   mutex_lock(&kvm->irq_lock);
+   /* Called only during vm destruction. Nobody can use the pointer
+      at this stage */
kfree(kvm->irq_routing);
-   mutex_unlock(&kvm->irq_lock);
 }
 
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
@@ -387,8 +392,9 @@ int kvm_set_irq_routing(struct kvm *kvm,
 
mutex_lock(&kvm->irq_lock);
old = kvm->irq_routing;
-   kvm->irq_routing = new;
+   rcu_assign_pointer(kvm->irq_routing, new);
mutex_unlock(&kvm->irq_lock);
+   synchronize_rcu();
 
new = old;
r = 0;
-- 
1.6.5.2

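The pattern this patch applies to kvm->irq_routing (rcu_dereference() on the read side; rcu_assign_pointer() and then synchronize_rcu() before freeing on the update side) can be approximated in user space with C11 atomics. This is only a sketch: acquire/release atomics give the publish/subscribe ordering, but nothing here implements grace periods, so the caller must itself guarantee that no reader still holds the old table before freeing it. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical routing table; KVM uses struct kvm_irq_routing_table. */
struct routing_table {
	int nr_entries;
	int gsi_for_pin[24];
};

/* Shared pointer that readers follow (kvm->irq_routing in the patch). */
static _Atomic(struct routing_table *) current_table;

/* Reader side: a single acquire load, the analog of rcu_dereference(). */
int lookup_gsi(int pin)
{
	struct routing_table *rt =
		atomic_load_explicit(&current_table, memory_order_acquire);

	if (!rt || pin < 0 || pin >= rt->nr_entries)
		return -1;
	return rt->gsi_for_pin[pin];
}

/*
 * Updater side: the caller fully initializes new_rt first; the release
 * half of this exchange then publishes it (the analog of
 * rcu_assign_pointer()). The old table is returned, not freed: the
 * caller must wait until no reader can still be using it (what
 * synchronize_rcu() provides in the kernel) before calling free().
 */
struct routing_table *publish_table(struct routing_table *new_rt)
{
	return atomic_exchange_explicit(&current_table, new_rt,
					memory_order_acq_rel);
}
```

In the patch, updaters are additionally serialized by kvm->irq_lock; here the atomic exchange alone keeps concurrent publishers from losing an old pointer.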

[PATCH 14/42] KVM: Convert irq notifiers lists to RCU locking

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Use RCU locking for mask/ack notifiers lists.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 virt/kvm/irq_comm.c |   22 --
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index f019725..6c94614 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -183,19 +183,19 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
rcu_read_lock();
gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
-   rcu_read_unlock();
-
if (gsi != -1)
-   hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
+   hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
+                            link)
if (kian->gsi == gsi)
kian->irq_acked(kian);
+   rcu_read_unlock();
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian)
 {
mutex_lock(&kvm->irq_lock);
-   hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
+   hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
mutex_unlock(&kvm->irq_lock);
 }
 
@@ -203,8 +203,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian)
 {
mutex_lock(&kvm->irq_lock);
-   hlist_del_init(&kian->link);
+   hlist_del_init_rcu(&kian->link);
mutex_unlock(&kvm->irq_lock);
+   synchronize_rcu();
 }
 
 int kvm_request_irq_source_id(struct kvm *kvm)
@@ -257,7 +258,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
 {
mutex_lock(&kvm->irq_lock);
kimn->irq = irq;
-   hlist_add_head(&kimn->link, &kvm->mask_notifier_list);
+   hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list);
mutex_unlock(&kvm->irq_lock);
 }
 
@@ -265,8 +266,9 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
  struct kvm_irq_mask_notifier *kimn)
 {
mutex_lock(&kvm->irq_lock);
-   hlist_del(&kimn->link);
+   hlist_del_rcu(&kimn->link);
mutex_unlock(&kvm->irq_lock);
+   synchronize_rcu();
 }
 
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
@@ -274,11 +276,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
struct kvm_irq_mask_notifier *kimn;
struct hlist_node *n;
 
-   WARN_ON(!mutex_is_locked(&kvm->irq_lock));
-
-   hlist_for_each_entry(kimn, n, &kvm->mask_notifier_list, link)
+   rcu_read_lock();
+   hlist_for_each_entry_rcu(kimn, n, &kvm->mask_notifier_list, link)
if (kimn->irq == irq)
kimn->func(kimn, mask);
+   rcu_read_unlock();
 }
 
 void kvm_free_irq_routing(struct kvm *kvm)
-- 
1.6.5.2


[PATCH 17/42] KVM: Return -ENOTTY on unrecognized ioctls

2009-11-16 Thread Avi Kivity

Return -ENOTTY for unrecognized ioctls, not the incorrect -EINVAL.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c   |2 +-
 arch/powerpc/kvm/powerpc.c |2 +-
 arch/s390/kvm/kvm-s390.c   |2 +-
 arch/x86/kvm/x86.c |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index f534e0f..f6471c8 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -941,7 +941,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 {
struct kvm *kvm = filp->private_data;
void __user *argp = (void __user *)arg;
-   int r = -EINVAL;
+   int r = -ENOTTY;
 
switch (ioctl) {
case KVM_SET_MEMORY_REGION: {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2a4551f..95af622 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -421,7 +421,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
switch (ioctl) {
default:
-   r = -EINVAL;
+   r = -ENOTTY;
}
 
return r;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 07ced89..00e2ce8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -150,7 +150,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
break;
}
default:
-   r = -EINVAL;
+   r = -ENOTTY;
}
 
return r;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5beb4c1..829e306 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2176,7 +2176,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 {
struct kvm *kvm = filp->private_data;
void __user *argp = (void __user *)arg;
-   int r = -EINVAL;
+   int r = -ENOTTY;
/*
 * This union makes it completely explicit to gcc-3.x
 * that these two variables' stack usage should be
-- 
1.6.5.2

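A minimal, hypothetical dispatcher showing the convention this patch adopts: unknown commands fall through to -ENOTTY ("inappropriate ioctl for device"), which leaves -EINVAL free to mean "recognized command, bad argument". The command numbers and the function name are made up for illustration.

```c
#include <errno.h>

/* Hypothetical ioctl numbers, for illustration only. */
enum { SKETCH_IOCTL_GET = 1, SKETCH_IOCTL_SET = 2 };

/* VM ioctl handler in the style of kvm_arch_vm_ioctl(): the default
 * result is -ENOTTY, and only recognized commands can change it. */
long sketch_vm_ioctl(unsigned int cmd, unsigned long arg)
{
	long r = -ENOTTY;

	switch (cmd) {
	case SKETCH_IOCTL_GET:
		r = 0;
		break;
	case SKETCH_IOCTL_SET:
		r = arg ? 0 : -EINVAL; /* known command, invalid argument */
		break;
	default:
		break; /* unknown command: keep -ENOTTY */
	}
	return r;
}
```

Initializing `r` to -ENOTTY, as the x86 and ia64 hunks do, means a newly added case cannot accidentally fall through with a misleading error.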

[PATCH 08/42] KVM: Call pic_clear_isr() on pic reset to reuse logic there

2009-11-16 Thread Avi Kivity

From: Gleb Natapov g...@redhat.com

Also move call of ack notifiers after pic state change.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/i8259.c |   22 +-
 1 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 01f1516..ccc941a 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -225,22 +225,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s)
 {
-   int irq, irqbase, n;
+   int irq;
struct kvm *kvm = s->pics_state->irq_request_opaque;
struct kvm_vcpu *vcpu0 = kvm->bsp_vcpu;
+   u8 irr = s->irr, isr = s->imr;
 
-   if (s == &s->pics_state->pics[0])
-   irqbase = 0;
-   else
-   irqbase = 8;
-
-   for (irq = 0; irq < PIC_NUM_PINS/2; irq++) {
-   if (vcpu0 && kvm_apic_accept_pic_intr(vcpu0))
-   if (s->irr & (1 << irq) || s->isr & (1 << irq)) {
-   n = irq + irqbase;
-   kvm_notify_acked_irq(kvm, SELECT_PIC(n), n);
-   }
-   }
s->last_irr = 0;
s->irr = 0;
s->imr = 0;
@@ -256,6 +245,13 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
s->rotate_on_auto_eoi = 0;
s->special_fully_nested_mode = 0;
s->init4 = 0;
+
+   for (irq = 0; irq < PIC_NUM_PINS/2; irq++) {
+   if (vcpu0 && kvm_apic_accept_pic_intr(vcpu0))
+   if (irr & (1 << irq) || isr & (1 << irq)) {
+   pic_clear_isr(s, irq);
+   }
+   }
 }
 
 static void pic_ioport_write(void *opaque, u32 addr, u32 val)
-- 
1.6.5.2

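The reordering in kvm_pic_reset() (snapshot the pending bits, reset the chip, then acknowledge) can be modelled with a toy 8259. In this hypothetical sketch the vcpu0/kvm_apic_accept_pic_intr() check is omitted and the acknowledgement is just a counter; the ISR snapshot is taken from `isr`, whereas the hunk as posted initializes its `isr` copy from `s->imr`.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down 8259 model. */
struct kpic_sketch {
	uint8_t irr, isr, imr;
	int acked[8]; /* ack notifications per pin, for observation */
};

/* Stand-in for pic_clear_isr(): clear the bit, then notify listeners. */
static void clear_isr_and_notify(struct kpic_sketch *s, int irq)
{
	s->isr &= (uint8_t)~(1u << irq);
	s->acked[irq]++;
}

/* Snapshot outstanding bits, reset the chip, then ack each pin that had
 * an interrupt pending or in service, so notifiers see the reset state. */
void pic_reset_sketch(struct kpic_sketch *s)
{
	uint8_t irr = s->irr, isr = s->isr;

	s->irr = 0;
	s->isr = 0;
	s->imr = 0;

	for (int irq = 0; irq < 8; irq++)
		if ((irr | isr) & (1u << irq))
			clear_isr_and_notify(s, irq);
}
```

Running the notifiers last means any callback that inspects or re-raises an interrupt on the PIC already observes the post-reset state.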

Re: [RFC] KVM Fault Tolerance: Kemari for KVM

2009-11-16 Thread Fernando Luis Vázquez Cao


Avi Kivity wrote:

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:


Kemari runs paired virtual machines in an active-passive configuration
and achieves whole-system replication by continuously copying the
state of the system (dirty pages and the state of the virtual devices)
from the active node to the passive node. An interesting implication
of this is that during normal operation only the active node is
actually executing code.



Can you characterize the performance impact for various workloads?  I 
assume you are running continuously in log-dirty mode.  Doesn't this 
make memory intensive workloads suffer?


Yes, we're running continuously in log-dirty mode.

We still do not have numbers to show for KVM, but
the snippets below from several runs of lmbench
using Xen+Kemari will give you an idea of what you
can expect in terms of overhead. All the tests were
run using a fully virtualized Debian guest with
hardware nested paging enabled.

                       fork   exec     sh     P/F    C/S   [us]
------------------------------------------------------------------
Base                    114    349   1197  1.2845    8.2
Kemari(10GbE) + FC      141    403   1280  1.2835   11.6
Kemari(10GbE) + DRBD    161    415   1388  1.3145   11.6
Kemari(1GbE) + FC       151    410   1335  1.3370   11.5
Kemari(1GbE) + DRBD     162    413   1318  1.3239   11.6
* P/F=page fault, C/S=context switch

The benchmarks above are memory intensive and, as you
can see, the overhead varies widely from 7% to 40%.
We also measured CPU bound operations, but, as expected,
Kemari incurred almost no overhead.
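The overhead range quoted above can be checked directly against the table. The sketch below computes (Kemari - Base) / Base for the fork, exec, sh and C/S columns; the P/F column is left out because it moves by only about 4% at most. The array and function names are hypothetical; the numbers are copied from the table.

```c
#include <stddef.h>

/* Latencies from the lmbench table above, in us: fork, exec, sh, C/S. */
static const double base_us[4] = { 114, 349, 1197, 8.2 };
static const double kemari_us[4][4] = {
	{ 141, 403, 1280, 11.6 }, /* Kemari(10GbE) + FC   */
	{ 161, 415, 1388, 11.6 }, /* Kemari(10GbE) + DRBD */
	{ 151, 410, 1335, 11.5 }, /* Kemari(1GbE) + FC    */
	{ 162, 413, 1318, 11.6 }, /* Kemari(1GbE) + DRBD  */
};

/* Relative overhead of a measurement over its baseline, in percent. */
double overhead_pct(double base, double measured)
{
	return (measured - base) / base * 100.0;
}

/* Smallest and largest overhead over all configurations and columns. */
void overhead_range(double *min, double *max)
{
	*min = overhead_pct(base_us[0], kemari_us[0][0]);
	*max = *min;
	for (size_t r = 0; r < 4; r++) {
		for (size_t c = 0; c < 4; c++) {
			double o = overhead_pct(base_us[c], kemari_us[r][c]);

			if (o < *min)
				*min = o;
			if (o > *max)
				*max = o;
		}
	}
}
```

Over those four columns this gives roughly 6.9% (sh, 10GbE + FC) up to 42.1% (fork, 1GbE + DRBD), consistent with the 7% to 40% range stated above.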


The synchronization process can be broken down as follows:

  - Event tapping: On KVM all I/O generates a VMEXIT that is
synchronously handled by the Linux kernel monitor, i.e. KVM (it is
worth noting that this applies to virtio devices too, because they
use MMIO and PIO just like a regular PCI device).


Some I/O (virtio-based) is asynchronous, but you still have well-known 
tap points within qemu.


Yep, and in some cases we have polling from the backend, which I forgot to
mention in the RFC.


  - Notification to qemu: Taking a page from live migration's
playbook, the synchronization process is user-space driven, which
means that qemu needs to be woken up at each synchronization
point. That is already the case for qemu-emulated devices, but we
also have in-kernel emulators. To compound the problem, even for
user-space emulated devices, accesses to coalesced MMIO areas cannot
be detected. As a consequence we need a mechanism to
communicate KVM-handled events to qemu.


Do you mean the ioapic, pic, and lapic?


Well, I was more worried about the in-kernel backends currently in the
works. To save the state of those devices we could leverage qemu's vmstate
infrastructure and even reuse struct VMStateDescription's pre_save()
callback, but we would like to pass the device state through the kvm_run
area to avoid an ioctl call right after returning to user space.


Perhaps it's best to start with those in userspace (-no-kvm-irqchip).


That's precisely what we were planning to do. Once we get a working
prototype we will take care of existing optimizations such as in-kernel
emulators and add our own.


Why is access to those chips considered a synchronization point?


The main problem with those is that to get the chip state we
use an ioctl when we could have copied it to qemu's memory
before going back to user space. Not all accesses to those chips
need to be treated as synchronization points.


  - Virtual machine synchronization: All the dirty pages since the
last synchronization point and the state of the virtual devices are
sent to the fallback node from the user-space qemu process. For this
the existing savevm infrastructure and KVM's dirty page tracking
capabilities can be reused. Regarding in-kernel devices, with the
likely advent of in-kernel virtio backends we need a generic way
to access their state from user-space, for which, again, the kvm_run
shared memory area could be used.


I wonder if you can pipeline dirty memory synchronization.  That is, 
write-protect those pages that are dirty, start copying them to the 
other side, and continue execution, copying memory if the guest faults 
it again.


Asynchronous transmission of dirty pages would be really helpful to
eliminate the performance hiccups that tend to occur at synchronization
points. What we can do is to copy dirty pages asynchronously until we reach
a synchronization point, where we need to stop the guest and send the
remaining dirty pages and the state of devices to the other side.

However, we cannot delay the transmission of a dirty page across a
synchronization point, because if the primary node crashed before the
page reached the fallback node, the I/O operation that caused the
synchronization point could not be replayed reliably.
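
The scheme described above (copy dirty pages in the background, then stop the guest at the synchronization point and flush whatever is still dirty) can be illustrated with a toy, single-threaded C model. Everything in it is an assumption for illustration: the `vm_t` layout, the tiny 4-word "pages" and all function names are hypothetical and are not taken from Kemari or KVM; the only idea borrowed from KVM is a dirty log with one bit per page. The invariant it checks is the one stated above: no dirty page may cross a synchronization point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes and names are illustrative, not Kemari's. */
#define NPAGES     128  /* guest "RAM" in pages */
#define PAGE_WORDS 4    /* tiny pages keep the example small */

typedef struct {
    uint64_t mem[NPAGES][PAGE_WORDS];    /* active (primary) node */
    uint64_t remote[NPAGES][PAGE_WORDS]; /* fallback node's copy */
    uint64_t dirty[NPAGES / 64];         /* one bit per page, like a dirty log */
} vm_t;

/* Guest store: update memory and mark the page dirty. */
static void guest_write(vm_t *vm, int page, int word, uint64_t val)
{
    assert(page >= 0 && page < NPAGES && word >= 0 && word < PAGE_WORDS);
    vm->mem[page][word] = val;
    vm->dirty[page / 64] |= UINT64_C(1) << (page % 64);
}

/* Background transfer: copy at most `budget` dirty pages while the guest
   keeps running. A page written again later simply becomes dirty again. */
static int send_dirty(vm_t *vm, int budget)
{
    int sent = 0;
    for (int p = 0; p < NPAGES && sent < budget; p++) {
        uint64_t bit = UINT64_C(1) << (p % 64);
        if (vm->dirty[p / 64] & bit) {
            vm->dirty[p / 64] &= ~bit;
            memcpy(vm->remote[p], vm->mem[p], sizeof vm->mem[p]);
            sent++;
        }
    }
    return sent;
}

/* Number of pages currently marked dirty. */
static int dirty_count(const vm_t *vm)
{
    int n = 0;
    for (int p = 0; p < NPAGES; p++)
        if (vm->dirty[p / 64] & (UINT64_C(1) << (p % 64)))
            n++;
    return n;
}

/* Synchronization point: with the guest stopped nothing can be redirtied,
   so every remaining dirty page is flushed before the pending I/O (and the
   device state, not modelled here) is released. */
static void sync_point(vm_t *vm)
{
    send_dirty(vm, NPAGES);
    assert(dirty_count(vm) == 0);
}
```

For instance, dirtying pages 3 and 70, sending one page in the background and then redirtying page 3 leaves two pages dirty; `sync_point()` flushes both, after which `remote` matches `mem`.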

How many pages do you copy per synchronization point for reasonably 
difficult workloads?


That is very

kvm segmentation fault on pci-hotplug add/del/re-add

2009-11-16 Thread Thomas Besser

Hi,

I'm running kvm 0.11.0 from http://www.corpit.ru/debian/tls/ on a Debian lenny
host; the guest is also Debian lenny (kernel 2.6.30, with virtio and pci_hotplug).

Starting the guest with:
/usr/bin/kvm -name test -vnc :5 \
    -net nic,vlan=0,model=virtio,macaddr=00:08:15:00:09:95 \
    -net tap,vlan=0,ifname=tap05,script=no -m 256 \
    -drive file=/vsimages/test/test.raw,if=virtio,boot=on \
    -monitor tcp:127.0.0.1:3,server,nowait -vga std -k de -tdf

1. Adding hdd

On qemu-console: 
pci_add auto storage file=test2.raw,if=virtio
OK domain 0, bus 0, slot 6, function 0

In guest:
pci 0000:00:06.0: reg 10 io port: [0x00-0x3ff]
pci 0000:00:02.0: BAR 6: bogus alignment [0x0-0x0] flags 0x2
decode_hpp: Could not get hotplug parameters. Use defaults
virtio-pci 0000:00:06.0: enabling device (0000 -> 0001)
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
virtio-pci 0000:00:06.0: PCI INT A -> Link[LNKB] -> GSI 11 (level, high) -> IRQ 11
  vdb: vdb1

2. Deleting hdd

On qemu-console: pci_del 0:6
In guest: virtio-pci 0000:00:06.0: PCI INT A disabled

3. Re-adding the same hdd fails

qemu-console: pci_add auto storage file=test2.raw,if=virtio
host: Segmentation fault

Running with gdb gives:
dbg:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc15550a6e0 (LWP 9085)]
virtio_blk_init (dev=0xcb2010)
at /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c:439
439 /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c: No such file or directory.
in /build/kvm/qemu-kvm-0.11.0/build/hw/virtio-blk.c

Is this problem known? 
Is pci-hotplug a qemu or a kvm feature? 
Where should I file a bug?

Regards
Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] KVM Fault Tolerance: Kemari for KVM

2009-11-16 Thread Avi Kivity


On 11/16/2009 04:18 PM, Fernando Luis Vázquez Cao wrote:

Avi Kivity wrote:

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:


Kemari runs paired virtual machines in an active-passive configuration
and achieves whole-system replication by continuously copying the
state of the system (dirty pages and the state of the virtual devices)
from the active node to the passive node. An interesting implication
of this is that during normal operation only the active node is
actually executing code.



Can you characterize the performance impact for various workloads?  I 
assume you are running continuously in log-dirty mode.  Doesn't this 
make memory intensive workloads suffer?


Yes, we're running continuously in log-dirty mode.

We still do not have numbers to show for KVM, but
the snippets below from several runs of lmbench
using Xen+Kemari will give you an idea of what you
can expect in terms of overhead. All the tests were
run using a fully virtualized Debian guest with
hardware nested paging enabled.

                      fork  exec    sh     P/F   C/S  [us]
----------------------------------------------------------
Base                   114   349  1197  1.2845   8.2
Kemari(10GbE) + FC     141   403  1280  1.2835  11.6
Kemari(10GbE) + DRBD   161   415  1388  1.3145  11.6
Kemari(1GbE) + FC      151   410  1335  1.3370  11.5
Kemari(1GbE) + DRBD    162   413  1318  1.3239  11.6
* P/F=page fault, C/S=context switch; all latencies in microseconds

The benchmarks above are memory intensive and, as you
can see, the overhead varies widely from 7% to 40%.
We also measured CPU bound operations, but, as expected,
Kemari incurred almost no overhead.
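
As a worked check on the percentages quoted above, the relative overhead of each configuration is (Kemari - Base) / Base * 100, column by column. The small C helper below applies that formula to the lmbench numbers from the table; the array layout and names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

#define NCOLS 5

/* lmbench latencies in microseconds, copied from the table above. */
static const char *const col_names[NCOLS] = { "fork", "exec", "sh", "P/F", "C/S" };
static const double base_us[NCOLS] = { 114, 349, 1197, 1.2845, 8.2 };

struct kemari_run {
    const char *config;
    double us[NCOLS];
};

static const struct kemari_run runs[] = {
    { "Kemari(10GbE) + FC",   { 141, 403, 1280, 1.2835, 11.6 } },
    { "Kemari(10GbE) + DRBD", { 161, 415, 1388, 1.3145, 11.6 } },
    { "Kemari(1GbE) + FC",    { 151, 410, 1335, 1.3370, 11.5 } },
    { "Kemari(1GbE) + DRBD",  { 162, 413, 1318, 1.3239, 11.6 } },
};

/* Relative overhead in percent: (kemari - base) / base * 100. */
static double overhead_pct(double kemari, double base)
{
    assert(base > 0);
    return (kemari - base) / base * 100.0;
}

/* Print one line per configuration with the overhead for every column. */
static void print_overheads(void)
{
    for (size_t r = 0; r < sizeof runs / sizeof runs[0]; r++) {
        printf("%-22s", runs[r].config);
        for (int c = 0; c < NCOLS; c++)
            printf("  %s %+6.1f%%", col_names[c],
                   overhead_pct(runs[r].us[c], base_us[c]));
        putchar('\n');
    }
}
```

By this arithmetic the overhead ranges from about +7% (sh, 10GbE + FC) to about +42% (fork, 1GbE + DRBD), C/S sits at about +40% to +41% in every configuration, and page-fault latency stays within roughly 4% of the baseline.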


Is lmbench fork that memory intensive?

Do you have numbers for benchmarks that use significant anonymous RSS?  
Say, a parallel kernel build.


Note that scaling vcpus will increase a guest's memory-dirtying power 
but snapshot rate will not scale in the same way.



  - Notification to qemu: Taking a page from live migration's
playbook, the synchronization process is user-space driven, which
means that qemu needs to be woken up at each synchronization
point. That is already the case for qemu-emulated devices, but we
also have in-kernel emulators. To compound the problem, even for
user-space emulated devices, accesses to coalesced MMIO areas cannot
be detected. As a consequence, we need a mechanism to
communicate KVM-handled events to qemu.


Do you mean the ioapic, pic, and lapic?


Well, I was more worried about the in-kernel backends currently in the
works. To save the state of those devices we could leverage qemu's vmstate
infrastructure and even reuse struct VMStateDescription's pre_save()
callback, but we would like to pass the device state through the kvm_run
area to avoid an ioctl call right after returning to user space.


Hm, let's defer all that until we have something working so we can 
estimate the impact of userspace virtio in those circumstances.



Why is access to those chips considered a synchronization point?


The main problem with those is that to get the chip state we
use an ioctl when we could have copied it to qemu's memory
before going back to user space. Not all accesses to those chips
need to be treated as synchronization points.


Ok.  Note that piggybacking on an exit will work for the lapic, but not 
for the global irqchips (ioapic, pic) since they can still be modified 
by another vcpu.


I wonder if you can pipeline dirty memory synchronization.  That is, 
write-protect those pages that are dirty, start copying them to the 
other side, and continue execution, copying memory if the guest 
faults it again.


Asynchronous transmission of dirty pages would be really helpful to
eliminate the performance hiccups that tend to occur at synchronization
points. What we can do is to copy dirty pages asynchronously until we
reach a synchronization point, where we need to stop the guest and send the
remaining dirty pages and the state of devices to the other side.

However, we cannot delay the transmission of a dirty page across a
synchronization point, because if the primary node crashed before the
page reached the fallback node, the I/O operation that caused the
synchronization point could not be replayed reliably.


What I mean is:

- choose synchronization point A
- start copying memory for synchronization point A
  - output is delayed
- choose synchronization point B
- copy memory for A and B
   if guest touches memory not yet copied for A, COW it
- once A copying is complete, release A output
- continue copying memory for B
- choose synchronization point C

By keeping two synchronization points active, you don't have any 
pauses.  The cost is maintaining copy-on-write so we can copy dirty 
pages for A while execution continues.
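
A minimal sketch of the copy-on-write part of this proposal, assuming a toy memory of one `int` per page. To keep it short it tracks only one in-flight point at a time: while point A is being copied the guest keeps running, the first write to a page that A still needs saves the old contents (the COW), and new writes accumulate as the dirty set of the next point. Overlapping two points, as proposed above, would need a second pending/COW set. All names (`pipe_t`, `choose_point`, `copy_step`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 16   /* hypothetical toy guest size, one int per "page" */

typedef struct {
    int  mem[NPAGES];      /* live guest memory */
    int  remote[NPAGES];   /* fallback node, as of the last completed point */
    bool pending[NPAGES];  /* dirty at point A and not yet copied */
    bool has_cow[NPAGES];  /* old contents preserved for point A */
    int  cow[NPAGES];
    bool dirty[NPAGES];    /* dirtied since point A (the next point's set) */
} pipe_t;

/* Guest store while A is still being copied: copy-on-write if A needs it. */
static void guest_write(pipe_t *s, int p, int val)
{
    assert(p >= 0 && p < NPAGES);
    if (s->pending[p] && !s->has_cow[p]) {
        s->cow[p] = s->mem[p];   /* keep A's version of the page */
        s->has_cow[p] = true;
    }
    s->mem[p] = val;
    s->dirty[p] = true;
}

/* Choose a synchronization point: everything dirty so far belongs to it.
   In this simplified model the previous point must be fully copied. */
static void choose_point(pipe_t *s)
{
    for (int p = 0; p < NPAGES; p++) {
        assert(!s->pending[p]);
        s->pending[p] = s->dirty[p];
        s->dirty[p] = false;
    }
}

/* Copy up to `budget` pages for the current point; returns true once the
   point is complete, i.e. when its delayed output could be released. */
static bool copy_step(pipe_t *s, int budget)
{
    for (int p = 0; p < NPAGES && budget > 0; p++) {
        if (!s->pending[p])
            continue;
        s->remote[p] = s->has_cow[p] ? s->cow[p] : s->mem[p];
        s->pending[p] = s->has_cow[p] = false;
        budget--;
    }
    for (int p = 0; p < NPAGES; p++)
        if (s->pending[p])
            return false;
    return true;
}
```

The guest never waits for the copy here; the price is the extra `cow` storage for pages that A still needs when the guest writes them.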


How many pages do you copy per synchronization point for reasonably 
difficult workloads?


That is very workload-dependent, but if you take a look at the examples
below you can get a feel for how Kemari behaves.

IOzoneKemari sync

buildbot failure in qemu-kvm on default_i386_debian_5_0

2009-11-16 Thread qemu-kvm

The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/158

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: 
Build Source Stamp: [branch next] HEAD
Blamelist: Avi Kivity a...@redhat.com

BUILD FAILED: failed git

sincerely,
 -The Buildbot


[RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state

2009-11-16 Thread Jan Kiszka

This patch aims at addressing the mp_state writeback issue in a cleaner
fashion. By introducing additional information about the scope of the
scheduled vcpu state writeback, we can simply skip mp_state (and maybe
other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It
accumulates, i.e. once a full writeback has been requested, it sticks
until it has been performed.

This unbreaks --disable-kvm builds of qemu-kvm again.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

A corresponding upstream patch is ready to be posted as well, just
waiting for comments on the general direction from KVM POV.

 cpu-defs.h|2 +-
 exec.c|4 ++--
 gdbstub.c |8 
 hw/apic.c |5 ++---
 hw/pc.c   |2 +-
 monitor.c |6 ++
 qemu-kvm-ia64.c   |2 ++
 qemu-kvm-x86.c|6 --
 qemu-kvm.c|   44 +---
 qemu-kvm.h|   13 ++---
 target-i386/helper.c  |2 +-
 target-i386/machine.c |7 ++-
 target-ppc/machine.c  |4 ++--
 13 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..b7cda81 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -142,7 +142,7 @@ struct KVMCPUState {
 pthread_t thread;
 int signalled;
 struct qemu_work_item *queued_work_first, *queued_work_last;
-int regs_modified;
+int writeback_scope;
 };
 
 #define CPU_TEMP_BUF_NLONGS 128
diff --git a/exec.c b/exec.c
index fcffb0f..290a565 100644
--- a/exec.c
+++ b/exec.c
@@ -529,14 +529,14 @@ static void cpu_common_pre_save(void *opaque)
 {
 CPUState *env = opaque;
 
-cpu_synchronize_state(env);
+cpu_synchronize_state(env, CPU_SYNC_RUNTIME);
 }
 
 static int cpu_common_pre_load(void *opaque)
 {
 CPUState *env = opaque;
 
-cpu_synchronize_state(env);
+cpu_synchronize_state(env, CPU_SYNC_RESET);
 return 0;
 }
 
diff --git a/gdbstub.c b/gdbstub.c
index ad7cdca..5a3e5ee 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1598,7 +1598,7 @@ static void gdb_breakpoint_remove_all(void)
 static void gdb_set_cpu_pc(GDBState *s, target_ulong pc)
 {
 #if defined(TARGET_I386)
-cpu_synchronize_state(s->c_cpu);
+cpu_synchronize_state(s->c_cpu, CPU_SYNC_RUNTIME);
 s->c_cpu->eip = pc;
 #elif defined (TARGET_PPC)
 s->c_cpu->nip = pc;
@@ -1785,7 +1785,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
 }
 break;
 case 'g':
-cpu_synchronize_state(s->g_cpu);
+cpu_synchronize_state(s->g_cpu, CPU_SYNC_RUNTIME);
 len = 0;
 for (addr = 0; addr < num_g_regs; addr++) {
 reg_size = gdb_read_register(s->g_cpu, mem_buf + len, addr);
@@ -1795,7 +1795,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
 put_packet(s, buf);
 break;
 case 'G':
-cpu_synchronize_state(s->g_cpu);
+cpu_synchronize_state(s->g_cpu, CPU_SYNC_RUNTIME);
 registers = mem_buf;
 len = strlen(p) / 2;
 hextomem((uint8_t *)registers, p, len);
@@ -1959,7 +1959,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
 thread = strtoull(p+16, (char **)&p, 16);
 env = find_cpu(thread);
 if (env != NULL) {
-cpu_synchronize_state(env);
+cpu_synchronize_state(env, CPU_SYNC_RUNTIME);
 len = snprintf((char *)mem_buf, sizeof(mem_buf),
 "CPU#%d [%s]", env->cpu_index,
 env->halted ? "halted " : "running");
diff --git a/hw/apic.c b/hw/apic.c
index f7cb9d2..abebde3 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -488,7 +488,7 @@ void apic_init_reset(CPUState *env)
 if (!s)
 return;
 
-cpu_synchronize_state(env);
+cpu_synchronize_state(env, CPU_SYNC_RESET);
 s->tpr = 0;
 s->spurious_vec = 0xff;
 s->log_dest = 0;
@@ -512,7 +512,6 @@ void apic_init_reset(CPUState *env)
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 env->mp_state
 = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
-kvm_load_mpstate(env);
 }
 #endif
 }
@@ -1070,7 +1069,7 @@ static void apic_reset(void *opaque)
 APICState *s = opaque;
 int bsp;
 
-cpu_synchronize_state(s->cpu_env);
+cpu_synchronize_state(s->cpu_env, CPU_SYNC_RESET);
 
 bsp = cpu_is_bsp(s->cpu_env);
 s->apicbase = 0xfee00000 |
diff --git a/hw/pc.c b/hw/pc.c
index 5d90f8c..23d4a8e 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1021,7 +1021,7 @@ CPUState *pc_new_cpu(const char *cpu_model)
 fprintf(stderr, "Unable to find x86 CPU definition\n");
 exit(1);
 }
-env->kvm_cpu_state.regs_modified = 1;
+env->kvm_cpu_state.writeback_scope = CPU_SYNC_RESET;
 if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
 env->cpuid_apic_id =

[PATCH] Fix qemu user mode build

2009-11-16 Thread Jan Kiszka

This hunk is bogus, probably some merge left-over.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 Makefile.target |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 9e7481e..17ffece 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -102,7 +102,6 @@ VPATH+=:$(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
 QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user -I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR)
 obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
   elfload.o linuxload.o uaccess.o gdbstub.o
-obj-$(CONFIG_CPU_EMULATION) += tcg-runtime.o
 obj-y += host-utils.o
 
 obj-$(TARGET_HAS_BFLT) += flatload.o

Re: virtio disk slower than IDE?

2009-11-16 Thread john cooper

[prior attempts from elsewhere kept bouncing, apologies for any replication]

Gordan Bobic wrote:
 The test is building the Linux kernel (only taking the second run to give the 
 test the benefit of local cache):

 make clean; make -j8 all; make clean; sync; time make -j8 all

 This takes about 10 minutes with IDE disk emulation and about 13 minutes with 
 virtio. I ran the tests multiple times with most non-essential services on the 
 host switched off (including cron/atd), and the guest in single-user mode to 
 reduce the noise in the test to the minimum, and the results are pretty 
 consistent, with virtio being about 30% behind.

For an observed 30% wall-clock difference in an operation
as complex as a kernel build, I'd expect the base i/o
throughput disparity to be substantially greater.  Did you
try a simpler, more regular load, e.g. a streaming dd read
of various block sizes from guest raw disk devices?
That is also considerably easier to debug than the complex
i/o load generated by a build.
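
A rough C equivalent of such a streaming read, for anyone who wants to control block size and timing directly: it reads a file or raw device sequentially in `bs`-byte requests, much like `dd if=... of=/dev/null bs=...`, and reports bytes read and MB/s. It is a sketch, not a benchmark tool; the function name is hypothetical, and reads go through the guest page cache, so repeated runs over the same range will look faster unless caches are dropped (or dd's `iflag=direct` is used instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read `path` sequentially in `bs`-byte requests and return the number of
   bytes read, or -1 on error. If `mb_per_s` is non-NULL it receives the
   throughput in MB/s (10^6 bytes per second). Stops at EOF or at the
   first failed read. */
static long long stream_read(const char *path, size_t bs, double *mb_per_s)
{
    assert(bs > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *buf = malloc(bs);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, bs)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(buf);
    close(fd);
    if (n < 0)
        return -1;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (mb_per_s != NULL)
        *mb_per_s = secs > 0 ? (double)total / secs / 1e6 : 0.0;
    return total;
}
```

Running it with, say, bs = 4096, 65536 and 1048576 against the same data on the virtio and IDE disks would give a per-block-size comparison that is much easier to interpret than a kernel build.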

One way to chop up the problem space is to use blktrace
on the host to observe both the i/o patterns coming out
of qemu and the host's response to them in terms of
turnaround time.  I expect you'll see requests of a somewhat
different nature generated by qemu with respect to blocking
and the number of threads serving virtio_blk requests,
relative to ide, but the host response should be essentially
the same in terms of data returned per unit time.

If the host looks to be turning around i/o requests with
similar latency in both cases, the problem would be a lower
frequency of requests generated by qemu in the case of
virtio_blk.  Here it would be useful to know the host
load generated by the guest in both cases.

-john


-- 
john.coo...@redhat.com

Re: virtio disk slower than IDE?

2009-11-16 Thread Charles Duffy


Gordan Bobic wrote:
Lastly, do you use cache=wb on qemu? it's just a fun mode, we use 
cache=off only.


I don't see the option being set in the logs, so I'd guess it's whatever 
qemu-kvm defaults to.


You can set this through libvirt by putting an element such as the 
following within your <disk> element:

  <driver name='qemu' type='qcow2' cache='none'/>

(Setting the type explicitly is preferred to avoid a security issue: a guest
could write an arbitrary qcow2 header to the beginning of a raw disk, reboot,
and let qemu's format autodetection treat this formerly raw disk as a delta
against a backing file the guest otherwise might not have access to read.
This is particularly important if you intend the type to be raw.)


Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state

2009-11-16 Thread Alexander Graf



On 16.11.2009, at 18:00, Jan Kiszka jan.kis...@siemens.com wrote:

This patch aims at addressing the mp_state writeback issue in a cleaner
fashion. By introducing additional information about the scope of the
scheduled vcpu state writeback, we can simply skip mp_state (and maybe
other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It
accumulates, i.e. once a full writeback has been requested, it sticks
until it has been performed.

This unbreaks --disable-kvm builds of qemu-kvm again.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

A corresponding upstream patch is ready to be posted as well, just
waiting for comments on the general direction from KVM POV.


I think I'd rather have a sync function that implicitly does the
RUNTIME sync, the way it is now, and an 'advanced' one to which you can
pass a constant specifying what it syncs.

That way most code continues to work the way it is now. It also makes
it more readable IMHO.
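
Alex's suggestion, combined with the accumulation rule from the patch description, can be modelled in a few lines. This is a hypothetical sketch, not the patch: `cpu_synchronize_state_scope`, `ToyCPU`, `flush_writeback` and the numeric enum values are assumptions (only `CPU_SYNC_RUNTIME`, `CPU_SYNC_RESET` and the `writeback_scope` field appear in the patch), and it models only the writeback bookkeeping, not the reading of vcpu state that synchronization also performs.

```c
#include <assert.h>

/* Assumed ordering: a larger value means a wider writeback. */
enum sync_scope { CPU_SYNC_NONE = 0, CPU_SYNC_RUNTIME = 1, CPU_SYNC_RESET = 2 };

typedef struct {
    int writeback_scope;   /* pending writeback, as in struct KVMCPUState */
    int regs_written;      /* counters standing in for the KVM set ioctls */
    int mpstate_written;
} ToyCPU;

/* "Advanced" variant: scopes accumulate, so a pending RESET writeback is
   never downgraded to RUNTIME before it has been performed. */
static void cpu_synchronize_state_scope(ToyCPU *env, enum sync_scope scope)
{
    assert(scope != CPU_SYNC_NONE);
    if ((int)scope > env->writeback_scope)
        env->writeback_scope = scope;
}

/* The plain call keeps today's signature and implies RUNTIME. */
static void cpu_synchronize_state(ToyCPU *env)
{
    cpu_synchronize_state_scope(env, CPU_SYNC_RUNTIME);
}

/* Roughly the point before re-entering the guest: write back what the
   accumulated scope requires; mp_state only for a RESET-scope writeback. */
static void flush_writeback(ToyCPU *env)
{
    if (env->writeback_scope >= CPU_SYNC_RUNTIME)
        env->regs_written++;
    if (env->writeback_scope >= CPU_SYNC_RESET)
        env->mpstate_written++;
    env->writeback_scope = CPU_SYNC_NONE;
}
```

Keeping the one-argument call as the RUNTIME case leaves existing callers such as gdbstub and the monitor untouched; only reset paths would need the scoped variant.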



Alex

Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state

2009-11-16 Thread Avi Kivity


On 11/16/2009 07:00 PM, Jan Kiszka wrote:

This patch aims at addressing the mp_state writeback issue in a cleaner
fashion.


What's the issue?  the fact that mp_state is updated whenever state is 
synchronized, while it could be simultaneously updated from other vcpus 
(which latter updates are then lost)?



By introducing additional information about the scope of the
scheduled vcpu state writeback, we can simply skip mp_state (and maybe
other specific states in the future) when updating the in-kernel state.

The writeback scope is defined when calling cpu_synchronize_state. It
accumulates, i.e. once a full writeback has been requested, it sticks
until it has been performed.
   


Maybe it's just simpler to divorce mp_state from the rest of the state.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


[PATCH] viostor driver. XP driver performance improvement.

2009-11-16 Thread Vadim Rozenfeld




repository: /home/vadimr/shares/kvm-guest-drivers-windows
branch: XP
commit e58183a0df0fd398e31e3f079b6c301520b2a5a2
Author: Vadim Rozenfeld vroze...@redhat.com
Date:   Sun Nov 15 22:50:55 2009 +0200

[PATCH] viostor driver. XP driver performance improvement.

Signed-off-by: Vadim Rozenfeld vroze...@redhat.com

diff --git a/viostor/virtio_ring.c b/viostor/virtio_ring.c
index 2911cef..de69557 100644
--- a/viostor/virtio_ring.c
+++ b/viostor/virtio_ring.c
@@ -39,7 +39,6 @@ initialize_virtqueue(
 IN VOID (*notify)(struct virtqueue *));


-//#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 #define to_vvq(_vq) (struct vring_virtqueue *)_vq

 static
@@ -119,7 +118,7 @@ vring_add_buf(
 RhelDbgPrint(TRACE_LEVEL_VERBOSE, ("%s: Added buffer head %i to %p\n",
 __FUNCTION__, head, vq) );

-return 0;
+return vq->num_free;
 }

 static
diff --git a/viostor/virtio_stor_hw_helper.c b/viostor/virtio_stor_hw_helper.c
index 21d27cd..2e61b30 100644
--- a/viostor/virtio_stor_hw_helper.c
+++ b/viostor/virtio_stor_hw_helper.c
@@ -54,6 +54,8 @@ RhelDoReadWrite(PVOID DeviceExtension,
 PVOID DataBuffer;
 PADAPTER_EXTENSION    adaptExt;
 PRHEL_SRB_EXTENSION   srbExt;
+int   num_free;
+
 cdb  = (PCDB)&Srb->Cdb[0];
 srbExt   = (PRHEL_SRB_EXTENSION)Srb->SrbExtension;
 adaptExt = (PADAPTER_EXTENSION)DeviceExtension;
@@ -88,20 +90,21 @@ RhelDoReadWrite(PVOID DeviceExtension,

 srbExt->vbr.sg[sgElement].physAddr = ScsiPortGetPhysicalAddress(DeviceExtension, NULL, &srbExt->vbr.status, &fragLen);
 srbExt->vbr.sg[sgElement].ulSize = sizeof(srbExt->vbr.status);
-
-
-if (adaptExt->pci_vq_info.vq->vq_ops->add_buf(adaptExt->pci_vq_info.vq,
+num_free = adaptExt->pci_vq_info.vq->vq_ops->add_buf(adaptExt->pci_vq_info.vq,
   &srbExt->vbr.sg[0],
   srbExt->out, srbExt->in,
-&srbExt->vbr) == 0) {
+&srbExt->vbr);
+if ( num_free >= 0) {
 InsertTailList(&adaptExt->list_head, &srbExt->vbr.list_entry);
 adaptExt->pci_vq_info.vq->vq_ops->kick(adaptExt->pci_vq_info.vq);
-if(++adaptExt->requests < adaptExt->queue_depth) {
+srbExt->call_next = FALSE;
+if(num_free < VIRTIO_MAX_SG) {
+   srbExt->call_next = TRUE;
+} else {
 ScsiPortNotification(NextLuRequest, DeviceExtension, Srb->PathId, Srb->TargetId, Srb->Lun);
 }
-return TRUE;
 }
-return FALSE;
+return TRUE;
 }
 #endif
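
Our reading of the new flow control in RhelDoReadWrite: add_buf now reports how many ring descriptors remain free, and the driver asks the port driver for the next request (NextLuRequest) only if a maximally fragmented request (VIRTIO_MAX_SG entries) would still fit; otherwise it sets call_next and defers. The toy model below illustrates just that decision. `toy_ring`, `toy_add_buf`, `may_request_next` and the `-1` failure value are hypothetical, and the real vring is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring: only the free-descriptor count matters. */
typedef struct {
    int num_free;
} toy_ring;

/* Like vring_add_buf after the patch: negative on failure, otherwise the
   number of descriptors still free. -1 is an assumed failure value. */
static int toy_add_buf(toy_ring *r, int nsg)
{
    assert(nsg > 0);
    if (nsg > r->num_free)
        return -1;
    r->num_free -= nsg;
    return r->num_free;
}

/* True: request the next SRB now (NextLuRequest).
   False: defer (call_next = TRUE) until completions free descriptors. */
static bool may_request_next(int num_free, int max_sg)
{
    assert(num_free >= 0 && max_sg > 0);
    return num_free >= max_sg;
}
```

Compared with the old queue_depth counter, this ties the decision directly to ring occupancy, so the driver stops accepting requests exactly when the next one might not fit.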



Re: [ANNOUNCE] kvm-kmod-2.6.32-rc7

2009-11-16 Thread John Wong


Jan Kiszka wrote:

win7_x86 means 64-bit version?
  

win7_x86 means the 32-bit version of Windows 7.



Which qemu-kvm version are you using for these tests?
I downloaded qemu-kvm from 
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=summary on 09-Nov-2009:

QEMU 0.11.50 monitor - type 'help' for more information


Re: virtio disk slower than IDE?

2009-11-16 Thread Dor Laor


On 11/16/2009 08:11 PM, Charles Duffy wrote:

Gordan Bobic wrote:

Lastly, do you use cache=wb on qemu? it's just a fun mode, we use
cache=off only.


I don't see the option being set in the logs, so I'd guess it's
whatever qemu-kvm defaults to.


You can set this through libvirt by putting an element such as the
following within your <disk> element:

<driver name='qemu' type='qcow2' cache='none'/>


It's not needed with the RHEL 5.4 qemu; there, cache=none is already the default.



(Setting the type explicitly is preferred to avoid a security issue: a guest
could write an arbitrary qcow2 header to the beginning of a raw disk, reboot,
and let qemu's format autodetection treat this formerly raw disk as a delta
against a backing file the guest otherwise might not have access to read.
This is particularly important if you intend the type to be raw.)

Re: [RFC][PATCH] qemu-kvm: Introduce writeback scope for cpu_synchronize_state

2009-11-16 Thread Jan Kiszka

Avi Kivity wrote:
 On 11/16/2009 07:00 PM, Jan Kiszka wrote:
 This patch aims at addressing the mp_state writeback issue in a cleaner
 fashion.
 
 What's the issue?  the fact that mp_state is updated whenever state is
 synchronized, while it could be simultaneously updated from other vcpus
 (which latter updates are then lost)?

Right, the issue b8a7857071 addressed. But that approach spreads more
kvm_* fragments in unrelated qemu code, e.g. the monitor, and fails to
update other parts (gdbstub). And it doesn't care about what happens if
kvm is off at build or runtime. Such things are better addressed in
upstream by encapsulating kvm calls in synchronization points.

 
 By introducing additional information about the scope of the
 scheduled vcpu state writeback, we can simply skip mp_state (and maybe
 other specific states in the future) when updating the in-kernel state.

 The writeback scope is defined when calling cpu_synchronize_state. It
 accumulates, i.e. once a full writeback has been requested, it sticks
 until it has been performed.

 
 Maybe it's just simpler to divorce mp_state from the rest of the state.

That won't solve the core issue. mp_state *is* part of the state, and
needs to be read (to update halted) and sometimes also written when the
state was hard reset.

Jan




[ kvm-Bugs-2897679 ] strange mouse behavior when connecting to kvm-sesion via vnc

2009-11-16 Thread SourceForge.net

Bugs item #2897679, was opened at 2009-11-14 14:19
Message generated for change (Comment added) made by d3vi0n
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2897679&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michael Mair-Keimberger (d3vi0n)
Assigned to: Nobody/Anonymous (nobody)
Summary: strange mouse behavior when connecting to kvm-sesion via vnc

Initial Comment:
I have a few kvm sessions running on my server (Fedora, Ubuntu and WinXP). I can 
connect to them via VNC, which I enabled in kvm (via kvm -vnc).
The problem is that the mouse pointer in a kvm window is never where it should 
be. This is really annoying: for example, if I try to press the Start 
button (in Windows), most of the time my mouse has already left the window, and I 
always have to play around with the mouse to reach the button. The same goes for 
everything else I do with the mouse in Windows, and for the Linux guests too.

It seems to depend on the point at which I enter the window. Also, the 
distance between the local mouse pointer and the pointer in Windows changes while I move 
the mouse in Windows.
I already filed a bug report on bugs.kde.org because I thought it was the fault 
of krdc (my VNC client), but I have the same issue with other 
clients too.
Here is the link to the bug report: https://bugs.kde.org/show_bug.cgi?id=212498

Some info about the system:
It's a stable, fully 64-bit (no multilib) Gentoo system:

Portage 2.1.6.13 (default/linux/amd64/10.0/no-multilib, gcc-4.3.4, 
glibc-2.9_p20081201-r2, 2.6.30-gentoo-r4 x86_64)
System uname: 
linux-2.6.30-gentoo-r4-x86_64-intel-r-_xeon-r-_cpu_e54...@_2.00ghz-with-gentoo-2.0.1
 
Timestamp of tree: Sat, 14 Nov 2009 05:20:01 +
app-shells/bash: 4.0_p28
dev-lang/python: 2.6.2-r1
sys-apps/baselayout: 2.0.1
sys-apps/openrc: 0.5.2-r2
sys-apps/sandbox:1.6-r2
sys-devel/autoconf:  2.63-r1
sys-devel/automake:  1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS=amd64
CBUILD=x86_64-pc-linux-gnu
CFLAGS=-O2 -pipe -march=nocona -msse4.1
CHOST=x86_64-pc-linux-gnu
CONFIG_PROTECT=/etc
CONFIG_PROTECT_MASK=/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf 
/etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo 
/etc/udev/rules.d
CXXFLAGS=-O2 -pipe -march=nocona -msse4.1
DISTDIR=/usr/portage/distfiles
FEATURES=distlocks fixpackages parallel-fetch protect-owned sandbox sfperms 
strict unmerge-orphans userfetch
GENTOO_MIRRORS=http://gentoo.supp.name/
http://ftp.fi.muni.cz/pub/linux/gentoo/ 
http://gentoo.mirror.web4u.cz/  
http://gentoo.mirror.dkm.cz/pub/gentoo/ 
http://gentoo.ynet.sk/pub
LANG=de_DE.utf8
LDFLAGS=-Wl,-O1
LINGUAS=de
MAKEOPTS=-j9
PKGDIR=/usr/portage/packages
PORTAGE_CONFIGROOT=/
PORTAGE_RSYNC_OPTS=--recursive --links --safe-links --perms --times --compress 
--force --whole-file --delete --stats --timeout=180 --exclude=/distfiles 
--exclude=/local --exclude=/packages
PORTAGE_TMPDIR=/var/tmp/tunafix
PORTDIR=/usr/portage
PORTDIR_OVERLAY=/home/clown/overlays/local /home/clown/overlays/layman/x11 
/home/clown/overlays/layman/sunrise
SYNC=rsync://rsync.europe.gentoo.org/gentoo-portage/
USE=acl acpi amd64 berkdb bzip2 cli cracklib crypt cups dbus dri fortran gdbm 
gpm iconv ipv6 mmx modules mudflap ncurses nls nptl nptlonly openmp pam pcre 
perl pppd python readline reflection session spl sse sse2 ssl ssse3 sysfs tcpd 
unicode xorg zlib ALSA_CARDS=ali5451 als4000 atiixp atiixp-modem bt87x ca0106 
cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 
intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci 
ALSA_PCM_PLUGINS=adpcm alaw asym copy dmix dshare dsnoop empty extplug file 
hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug 
rate route share shm softvol APACHE2_MODULES=actions alias auth_basic 
authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm 
authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache 
dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache 
filter headers include info log_config logio mem_cache mime mime_magic 
negotiation rewrite setenvif speling status unique_id userdir usertrack 
vhost_alias DVB_CARDS=ttpci ELIBC=glibc INPUT_DEVICES=keyboard mouse 
evdev KERNEL=linux LCD_DEVICES=bayrad cfontz cfontz633 glk hd44780 lb216 
lcdm001 mtxorb ncurses text LINGUAS=de USERLAND=GNU

[ kvm-Bugs-2897679 ] strange mouse behavior when connecting to kvm-sesion via vnc

2009-11-16 Thread SourceForge.net

Bugs item #2897679, was opened at 2009-11-14 16:19
Message generated for change (Comment added) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2897679&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michael Mair-Keimberger (d3vi0n)
Assigned to: Nobody/Anonymous (nobody)
Summary: strange mouse behavior when connecting to kvm-sesion via vnc

Initial Comment:
I have a few KVM sessions running on my server (Fedora, Ubuntu and WinXP). I can 
connect to them via VNC, which I enabled in KVM (via kvm -vnc).
The problem is that the mouse pointer in a KVM window is never where it should 
be. This is really annoying because, for example, if I try to press the Start 
button (in Windows), most of the time my mouse is already out of the window. I 
always have to play with the mouse to reach the button. Generally it's like this with 
everything I do in Windows with the mouse. It's the same with the other 
Linux OSes.

It seems to depend on the point at which I enter the window. Also, the 
distance between the local mouse pointer and the pointer in Windows changes while I move 
the mouse in Windows.
I already filed a bug report on bugs.kde.org, because I thought it was the fault 
of krdc (my VNC client), but I have the same issue with other 
clients too.
Here is the link of the bug-report: https://bugs.kde.org/show_bug.cgi?id=212498

Some info about the system:
It's a stable, pure 64-bit (no multilib) Gentoo system:

Portage 2.1.6.13 (default/linux/amd64/10.0/no-multilib, gcc-4.3.4, 
glibc-2.9_p20081201-r2, 2.6.30-gentoo-r4 x86_64)
=================================================================
System uname: 
linux-2.6.30-gentoo-r4-x86_64-intel-r-_xeon-r-_cpu_e54...@_2.00ghz-with-gentoo-2.0.1
 
Timestamp of tree: Sat, 14 Nov 2009 05:20:01 +
app-shells/bash: 4.0_p28
dev-lang/python: 2.6.2-r1
sys-apps/baselayout: 2.0.1
sys-apps/openrc: 0.5.2-r2
sys-apps/sandbox:1.6-r2
sys-devel/autoconf:  2.63-r1
sys-devel/automake:  1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS=amd64
CBUILD=x86_64-pc-linux-gnu
CFLAGS=-O2 -pipe -march=nocona -msse4.1
CHOST=x86_64-pc-linux-gnu
CONFIG_PROTECT=/etc
CONFIG_PROTECT_MASK=/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf 
/etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo 
/etc/udev/rules.d
CXXFLAGS=-O2 -pipe -march=nocona -msse4.1
DISTDIR=/usr/portage/distfiles
FEATURES=distlocks fixpackages parallel-fetch protect-owned sandbox sfperms 
strict unmerge-orphans userfetch
GENTOO_MIRRORS=http://gentoo.supp.name/
http://ftp.fi.muni.cz/pub/linux/gentoo/ 
http://gentoo.mirror.web4u.cz/  
http://gentoo.mirror.dkm.cz/pub/gentoo/ 
http://gentoo.ynet.sk/pub
LANG=de_DE.utf8
LDFLAGS=-Wl,-O1
LINGUAS=de
MAKEOPTS=-j9
PKGDIR=/usr/portage/packages
PORTAGE_CONFIGROOT=/
PORTAGE_RSYNC_OPTS=--recursive --links --safe-links --perms --times --compress 
--force --whole-file --delete --stats --timeout=180 --exclude=/distfiles 
--exclude=/local --exclude=/packages
PORTAGE_TMPDIR=/var/tmp/tunafix
PORTDIR=/usr/portage
PORTDIR_OVERLAY=/home/clown/overlays/local /home/clown/overlays/layman/x11 
/home/clown/overlays/layman/sunrise
SYNC=rsync://rsync.europe.gentoo.org/gentoo-portage/
USE=acl acpi amd64 berkdb bzip2 cli cracklib crypt cups dbus dri fortran gdbm 
gpm iconv ipv6 mmx modules mudflap ncurses nls nptl nptlonly openmp pam pcre 
perl pppd python readline reflection session spl sse sse2 ssl ssse3 sysfs tcpd 
unicode xorg zlib ALSA_CARDS=ali5451 als4000 atiixp atiixp-modem bt87x ca0106 
cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 
intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci 
ALSA_PCM_PLUGINS=adpcm alaw asym copy dmix dshare dsnoop empty extplug file 
hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug 
rate route share shm softvol APACHE2_MODULES=actions alias auth_basic 
authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm 
authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache 
dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache 
filter headers include info log_config logio mem_cache mime mime_magic 
negotiation rewrite setenvif speling status unique_id userdir usertrack 
vhost_alias DVB_CARDS=ttpci ELIBC=glibc INPUT_DEVICES=keyboard mouse 
evdev KERNEL=linux LCD_DEVICES=bayrad cfontz cfontz633 glk hd44780 lb216 
lcdm001 mtxorb ncurses text LINGUAS=de USERLAND=GNU

Windows XP Bluescreen when unplugging printer

2009-11-16 Thread Erik Rull


Hi all,

I want to run an Epson inkjet printer in my Windows XP guest. My host has 
USB 2.0 enabled, and the USB flash drive works without any problems. When I plug 
in the printer (it works with the same drivers on native Windows XP!), it is 
recognized and the status monitor also shows the ink levels, until I start 
printing. Then the ink level display disappears and the status monitor hangs. The 
printer itself doesn't do anything: no LED blinks and no printing starts. When 
I shut down Windows or unplug the printer, I get a bluescreen in XP on 
usbuhci.sys.


Interesting: when I switch off USB 2.0 and enable only USB 1.x in the host 
BIOS, everything related to USB is SLOW (the USB flash drive, too!) but the 
printer works (also slowly, but it prints).


Any ideas what could cause this behaviour? It happens with kvm-88 and kvm-77 
as well.


I tested it on two different systems, both with the same behaviour.

Best regards,

Erik
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Serial Port Driver does not handle interrupt

2009-11-16 Thread Erik Rull


Erik Rull wrote:

Hi all,

I've tested two kvm versions, 77 and 88, both with the same behaviour:
I added a serial device with -serial /dev/ttyS0 to my guest and launched 
HyperTerm on my Windows guest.
Additionally, I plugged a loopback plug into the serial connector that 
just routes the transmitted data back to the receive line (TXD to RXD).
When I enter characters in HyperTerm, all characters are displayed 
except the last one that was sent.
When I plug in a real serial device and request its status via HyperTerm, 
I get back only the first 16 chars (I expect 50) when I send an 
additional dummy character (16 bytes being the size of the 16550 FIFO buffer). On my normal 
Windows PC it works as expected, so it seems to be an issue with kvm.
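The loopback test above can also be run from a Linux guest (or the host) without HyperTerm. The following is a minimal Python sketch, assuming a loopback plug is fitted so that whatever is written to the port comes back on the same file descriptor; `open_raw_serial` and `loopback_check` are hypothetical helper names, and the baud rate is left at whatever the port is currently configured to. If bytes go missing (for example everything except the last character, or nothing beyond the first 16), the returned data is shorter than the payload.

```python
import os
import select
import tty


def open_raw_serial(path):
    """Open a serial device read/write in raw mode (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)  # raw mode: no canonical input, no echo, no CR/LF mapping
    return fd


def loopback_check(write_fd, read_fd, payload, timeout=1.0):
    """Send payload, then collect echoed bytes until all have arrived or the
    line stays silent for `timeout` seconds. Returns the bytes received."""
    view = memoryview(payload)
    while view:  # os.write may accept only part of the buffer
        written = os.write(write_fd, view)
        view = view[written:]
    received = b""
    while len(received) < len(payload):
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            break  # timed out: the remaining bytes never came back
        chunk = os.read(read_fd, len(payload) - len(received))
        if not chunk:
            break  # end of file
        received += chunk
    return received
```

With a loopback plug both descriptors are the same, e.g. `fd = open_raw_serial("/dev/ttyS0")` followed by `loopback_check(fd, fd, b"x" * 50)`; on a healthy port the result equals the 50-byte payload.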


In the Linux host, I don't see any changes in /proc/interrupts; the 
driver is open and is the only user of this interrupt line.


Within the host the serial line works without any problems, and there 
the interrupt counts increase during normal operation with software 
that runs on the host and uses this serial line.


What must I do to bring the serial line to life, with complete 
communication and no data lost due to missing interrupts? At the 
moment this is a key issue that has to be solved before I can continue my work 
with kvm.


Any ideas? I also tested other IRQ lines and other ttyS* devices on the system, 
with the same behaviour.


Thanks in advance,

Erik


Fixed: APIC on the host side was disabled; kvm/qemu seems to need it.


[ kvm-Bugs-2896992 ] Intel PCI NIC passthrough problem

2009-11-16 Thread SourceForge.net

Bugs item #2896992, was opened at 2009-11-13 05:58
Message generated for change (Comment added) made by 
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2896992&group_id=180599

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sergey Cheperis ()
Assigned to: Nobody/Anonymous (nobody)
Summary: Intel PCI NIC passthrough problem

Initial Comment:
Host: 
- CPU Core2Duo E6300, Intel q45 chipset, VT-d enabled
- Ubuntu 9.10 x86_64, kernel 2.6.31.4 recompiled with CONFIG_DMAR=y and 
CONFIG_INTR_REMAP=y according to 
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
- in-kernel kvm module
- qemu-kvm commit c04b2aebf50c7d8cba883b86d1b872ccfc8f2249
- intel_iommu=igfx_off
- Qemu command line: /usr/local/bin/qemu-system-x86_64  -m 512 -k en-us -drive 
if=ide,file=/dev/server2/ubuntu,boot=on  -cdrom 
/home/install/Linux/i386/ubuntu-9.10-desktop-i386.iso -boot d  -vga std  
-pcidevice host=01:00.0 -net none  -vnc :15 -daemonize

Guest OSes:
- Ubuntu 9.10 live CD i386, Windows Server 2003 i386, Windows Server 2008 
x86_64, MacOS X 10.5.x i386

The device on the host:
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
Subsystem: Intel Corporation Device 002e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- 
TAbort- MAbort- SERR- PERR- INTx-
Latency: 32 (63750ns min), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 31
Region 0: Memory at d054 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at d052 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at d000 [size=64]
Expansion ROM at bf00 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device
Command: DPERE- ERO+ RBC=512 OST=1
Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple 
DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
Capabilities: [f0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 
Enable+
Address: fee0300c  Data: 41e9
Kernel driver in use: pci-stub
Kernel modules: e1000

Symptoms:
The device is found and initialized in all OSes. The driver properly detects 
the link speed, and does detect when I plug the cable in or out. However, it 
does not send or receive any packets: all the counters always stay at 0, although 
there should be at least some traffic due to DHCP activity. dmesg does not show any 
errors on either the host or the guest.

ifconfig on the guest:
eth0  Link encap:Ethernet  HWaddr 00:07:e9:0f:c8:10  
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lspci -vvv on the guest:

00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
Subsystem: Intel Corporation Device 002e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- 
TAbort- MAbort- SERR- PERR- INTx-
Latency: 32 (63750ns min), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at f000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at f002 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at c040 [size=64]
Expansion ROM at 2000 [disabled] [size=128K]
Capabilities: [40] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 
Enable-
Address:   Data: 
Kernel driver in use: e1000
Kernel modules: e1000

cat /proc/interrupts on the guest:
   CPU0   
  0: 89   IO-APIC-edge  timer
  1:   1088   IO-APIC-edge  i8042
  4:  2   IO-APIC-edge
  6:  2   IO-APIC-edge  floppy
  7:  4   IO-APIC-edge  parport0
  8:  0   IO-APIC-edge  rtc0
  9:  0   IO-APIC-fasteoi   acpi
 11:  0   IO-APIC-fasteoi   eth0
 12:   1333   IO-APIC-edge  i8042
 14: 98   IO-APIC-edge  ata_piix
 15:   9874   IO-APIC-edge  ata_piix
NMI:  0   Non-maskable interrupts
LOC:  59892   Local timer interrupts
SPU:  0   Spurious
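The eth0 row above (IRQ 11, count 0) is the telling detail: the guest driver sees the link but never receives an interrupt. A small parser makes it easy to compare /proc/interrupts snapshots taken before and after generating traffic. This is a minimal Python sketch; the function names are hypothetical, and it assumes the usual layout of one count column per CPU followed by the interrupt chip and device names.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = []
        for tok in fields[1:1 + ncpu]:
            if not tok.isdigit():
                break  # fewer count columns than CPUs on this row
            counts.append(int(tok))
        # Everything after the counts: chip type, then device name(s)
        rows[fields[0][:-1]] = (counts, " ".join(fields[1 + len(counts):]))
    return rows


def irqs_for_device(rows, device):
    """Labels of interrupt lines whose last description token is `device`."""
    return [label for label, (_, desc) in rows.items()
            if desc.split() and desc.split()[-1] == device]
```

Diffing the counts for `irqs_for_device(rows, "eth0")` between two snapshots shows whether interrupts are delivered at all. The chip column is worth a glance too: entries such as `IO-APIC-edge` mean the line is routed through an IO-APIC, while `XT-PIC` means the legacy PIC is in use.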

Re: Windows XP Bluescreen when unplugging printer

2009-11-16 Thread Jim Paris

Erik Rull wrote:
 Hi all,

 I want to run an Epson inkjet printer in my Windows XP guest. My host has
 USB 2.0 enabled, and the USB flash drive works without any problems. When I
 plug in the printer (it works with the same drivers on native Windows
 XP!), it is recognized and the status monitor also shows the ink levels,
 until I start printing. Then the ink level display disappears and the status
 monitor hangs. The printer itself doesn't do anything: no LED blinks and no
 printing starts. When I shut down Windows or unplug the printer, I get a
 bluescreen in XP on usbuhci.sys.

 Interesting: when I switch off USB 2.0 and enable only USB 1.x in the
 host BIOS, everything related to USB is SLOW (the USB flash drive, too!) but
 the printer works (also slowly, but it prints).

 Any ideas what could cause this behaviour? It happens with kvm-88 and
 kvm-77 as well.

 I tested it on two different systems, both with the same behaviour.

You might try this usb fix:
  
http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=c4c0e236beabb9de5ff472f77aeb811ec5484615

It's been around for a while but hasn't made it into any qemu or kvm
releases yet.

-jim

1:1 mapping

2009-11-16 Thread Marty Kurtz

Hi all

I work in the embedded world and am evaluating potential hypervisors,
and kvm looks suitable for what we want.
However, I have seen some old references to 1:1 mapping, which would allow a
guest to see a PCI device even without an IOMMU.

Is this currently incorporated into the qemu-0.11 and kvm-88 releases?

Is dma=none equivalent to this by any chance, or is it only meant for
devices that do not use DMA?

If 1:1 mapping is not in the tree, is there a reasonable chance that
the old patch would work with the latest version?

Thanks
Marty

[KVM-AUTOTEST] KSM-overcommit test v.2 (python version)

2009-11-16 Thread Jiri Zupka

Hi,
  based on your requirements, we have created a new version 
of the KSM-overcommit patch (submitted in September). 

Description:
  It tests KSM (kernel shared memory) with memory overcommit.

Changelog:
  1) Based only on Python (C code removed)
  2) Added a new test (check the last 96B)
  3) Split the test into (serial, parallel, both)
  4) Improved log and documentation 
  5) Added a perf constant to change the time limit for waiting (slow computer problem).

Functionality:
  The KSM test starts the guests and connects to them over ssh.
  It copies allocator.py to the guests and runs it. 
  The host can run any Python command through the allocator.py loop on the client side. 
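The host-to-guest control channel described above can be pictured as a line-oriented loop in the guest that executes each received Python statement and answers with a status line. The sketch below is only an illustration under that assumption: the `PASS`/`FAIL` replies, the `exit` command and the function names are hypothetical and do not describe the actual allocator.py protocol. Executing arbitrary statements like this is only acceptable on disposable test guests.

```python
import sys
import traceback


def execute_command(cmd, namespace):
    """Run one Python statement in a persistent namespace (hypothetical
    protocol): reply 'PASS', or 'FAIL: <exception>' on error."""
    try:
        exec(cmd, namespace)
        return "PASS"
    except Exception:
        # Report only the final "ExceptionType: message" line to the host
        return "FAIL: " + traceback.format_exc().strip().splitlines()[-1]


def serve(stream_in=None, stream_out=None):
    """Read commands line by line until 'exit', replying after each one."""
    stream_in = sys.stdin if stream_in is None else stream_in
    stream_out = sys.stdout if stream_out is None else stream_out
    namespace = {}  # state (e.g. allocated buffers) survives between commands
    for line in stream_in:
        cmd = line.strip()
        if cmd == "exit":
            break
        if cmd:
            stream_out.write(execute_command(cmd, namespace) + "\n")
            stream_out.flush()
    return namespace
```

Keeping one namespace across commands is what lets the host allocate memory with one command and check it with a later one.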

  Start run_ksm_overcommit.
  Define host and guest reserve variables (host_reserver, guest_reserver).
  Calculate the number of virtual machines and their memory based on the variables
  host_mem and overcommit. 
  Check KSM status.
  Create and start virtual guests.
  Test:
   a] serial
      1) initialize: merge all memory into a single page
      2) separate the memory of the first guest
      3) separate the memory of the remaining guests until all memory is filled
      4) kill all guests except the last one
      5) check that the memory of the last guest is OK
      6) kill the guest
   b] parallel 
      1) initialize: merge all memory into a single page
      2) separate the memory of the guest
      3) verify the guest memory
      4) merge the memory into one block
      5) verify the guests' memory
      6) separate the memory of the guests by 96B
      7) check that the memory is all right 
      8) kill the guest
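For the "Check KSM status" step, and for confirming that the merge and separate phases really change what KSM shares, the host-side counters can be polled. This is a minimal sketch assuming the sysfs interface that mainline kernels expose under /sys/kernel/mm/ksm/ starting with 2.6.32; the patch itself may query KSM differently, and `read_ksm_counters` and `ksm_saved_bytes` are hypothetical names.

```python
import os

# Counter files of the mainline KSM sysfs interface (assumption: 2.6.32+)
KSM_COUNTERS = ("run", "pages_shared", "pages_sharing",
                "pages_unshared", "pages_volatile", "full_scans")


def read_ksm_counters(ksm_dir="/sys/kernel/mm/ksm"):
    """Return the integer KSM counters that exist in ksm_dir."""
    counters = {}
    for name in KSM_COUNTERS:
        path = os.path.join(ksm_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                counters[name] = int(f.read().strip())
    return counters


def ksm_saved_bytes(counters, page_size=None):
    """Approximate memory saved: each 'pages_sharing' entry is one more
    mapping of an already-shared page, i.e. one page that needs no copy."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return counters.get("pages_sharing", 0) * page_size
```

After the "merge all memory into a single page" phase, `pages_sharing` should climb toward the total number of guest pages while `pages_shared` stays small; after the "separate" phases it should fall again. `run` must be 1 for KSM to scan at all.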
  allocator.py (client-side script)
After starting, it waits for commands, which it executes on the client side.
The mem_fill class implements commands to fill and check memory and to report
errors to the host.

We need a client-side script because we need to generate many GB of special 
data. 
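The point of generating special data in the guests is to control exactly which pages KSM can merge. Below is a minimal Python sketch of that idea, assuming 4 KiB pages: filling a buffer with one repeated byte makes every page identical (and mergeable), while stamping a guest id and page index into the last 96 bytes of each page, as in the "separate by 96B" step, makes every page unique again. The names and the marker layout are hypothetical; this is not the mem_fill class from the patch.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages
TAIL = 96         # bytes rewritten at the end of each page


def fill_same(n_pages, value=0x55):
    """Buffer whose pages are all identical, so KSM can merge them into one."""
    return bytearray([value]) * (PAGE_SIZE * n_pages)


def _tail_marker(guest_id, page_index):
    marker = struct.pack("<QQ", guest_id, page_index)  # 16 bytes
    return marker * (TAIL // len(marker))  # 96 is a multiple of 16


def make_pages_unique(buf, guest_id):
    """Overwrite the last TAIL bytes of every page with a per-page marker."""
    for i, off in enumerate(range(0, len(buf), PAGE_SIZE)):
        buf[off + PAGE_SIZE - TAIL:off + PAGE_SIZE] = _tail_marker(guest_id, i)


def bad_pages(buf, guest_id):
    """Indices of pages whose tail no longer holds the expected marker."""
    return [i for i, off in enumerate(range(0, len(buf), PAGE_SIZE))
            if buf[off + PAGE_SIZE - TAIL:off + PAGE_SIZE]
            != _tail_marker(guest_id, i)]


def distinct_pages(buf):
    """Number of distinct page contents (1 means fully mergeable)."""
    return len({bytes(buf[off:off + PAGE_SIZE])
                for off in range(0, len(buf), PAGE_SIZE)})
```

Because the marker encodes both guest and page, a verification pass also catches pages that KSM or the host swapped back incorrectly, not just pages that were never written.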

Future plans:
  We want to add information about the time spent in each task to the log. 
  We want to use the information from the log to compute the perf constant automatically.
  And add new tests.
  






  

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index ac9ef66..90f62bb 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -118,6 +118,23 @@ variants:
                 test_name = npb
                 test_control_file = npb.control
 
+    - ksm_overcommit:
+        # Don't preprocess any vms as we need to change their params
+        vms = ''
+        image_snapshot = yes
+        kill_vm_gracefully = no
+        type = ksm_overcommit
+        ksm_swap = yes   # yes | no
+        no hugepages
+        # Overcommit of host memory
+        ksm_overcommit_ratio = 3
+        # Maximum number of machines run in parallel
+        ksm_paralel_ratio = 4
+        variants:
+            - serial
+                ksm_test_size = serial
+            - paralel
+                ksm_test_size = paralel
 
     - linux_s3: install setup unattended_install
         type = linux_s3
diff --git a/client/tests/kvm/tests/ksm_overcommit.py b/client/tests/kvm/tests/ksm_overcommit.py
new file mode 100644
index 000..fb7ded6
--- /dev/null
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -0,0 +1,605 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+import kvm_preprocessing
+import random, string, math, os
+
+def run_ksm_overcommit(test, params, env):
+    """
+    Test how KSM (Kernel Shared Memory) acts when more than the physical
+    memory is used. The second part also tests how KVM handles the situation
+    when the host runs out of memory (expected: pause the guest system, wait
+    until some process returns memory and bring the guest back to life).
+
+    @param test: kvm test object.
+    @param params: Dictionary with test parameters.
+    @param env: Dictionary with the test environment.
+    """
+
+    def parse_meminfo(rowName):
+        """
+        Get a row (name, amount, unit) from /proc/meminfo.
+
+        @param rowName: Name of line in meminfo
+        """
+        for line in open('/proc/meminfo').readlines():
+            if line.startswith(rowName + ":"):
+                name, amt, unit = line.split()
+                return name, amt, unit
+
+    def parse_meminfo_value(rowName):
+        """
+        Convert a meminfo value to int.
+
+        @param rowName: Name of line in meminfo
+        """
+        name, amt, unit = parse_meminfo(rowName)
+        return int(amt)
+
+
+    def get_stat(lvms):
+        """
+        Get statistics in format:
+        Host: memfree = XXXM; Guests memsh = {XXX,XXX,...}
+
+        @param lvms: List of VMs
+        """
+        if not isinstance(lvms, list):
+            raise error.TestError("get_stat: parameter has to be a proper list")
+
+        try:
+            stat = "Host: memfree = "
+            stat += str(int(parse_meminfo_value("MemFree")) / 1024) + "M; "
+            stat += "swapfree = "
+            stat += str(int(parse_meminfo_value("SwapFree")) / 1024) + "M; "
+        except:
+            raise error.TestFail("Could not fetch free memory

Re: Serial Port Driver does not handle interrupt

2009-11-16 Thread Rodrigo Campos

On Tue, Nov 17, 2009 at 12:01:08AM +0100, Erik Rull wrote:
 Erik Rull wrote:
 Any ideas? I also tested other IRQ lines and other ttyS* on the
 system, with the same behaviour.
 
 Fixed: APIC on the host side was disabled; kvm/qemu seems to need it.

I think I hit the same issue. What exactly did you do to solve it? Enable a
kernel option? May I ask which one? :)

Sorry, I don't have the hardware right now (so I can't play with APIC options). I
will have it in a few weeks, which is why I am asking :)




Thanks a lot,
Rodrigo

Re: virtio disk slower than IDE?

2009-11-16 Thread Gordan Bobic


john cooper wrote:


The test is building the Linux kernel (only taking the second run to give the 
test the benefit of local cache):

make clean; make -j8 all; make clean; sync; time make -j8 all

This takes about 10 minutes with IDE disk emulation and about 13 minutes with virtio. I 
ran the tests multiple times with most non-essential services on the host switched off 
(including cron/atd), and the guest in single-user mode to reduce the noise 
in the test to a minimum, and the results are pretty consistent, with virtio being 
about 30% behind.
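A back-of-the-envelope model helps interpret the 10 vs. 13 minute result. Assume a fraction f of the IDE run is spent waiting on disk I/O and that only that part gets slower under virtio; then the I/O part has to absorb all 3 extra minutes, so the I/O slowdown factor is (13 - 10(1 - f)) / (10f). This is a simplification (it ignores overlap between CPU work and I/O), and f is an assumed input, not a measured value.

```python
def implied_io_slowdown(t_fast, t_slow, io_fraction):
    """Factor by which the I/O portion must have slowed down, assuming only
    the I/O portion (io_fraction of t_fast) changed between the two runs."""
    if not 0 < io_fraction <= 1:
        raise ValueError("io_fraction must be in (0, 1]")
    non_io = (1 - io_fraction) * t_fast  # CPU-bound part, assumed unchanged
    io_fast = io_fraction * t_fast       # I/O time in the faster (IDE) run
    io_slow = t_slow - non_io            # I/O time in the slower (virtio) run
    return io_slow / io_fast
```

For t_fast = 10, t_slow = 13 and f = 1.0, 0.5 and 0.25 it returns 1.3, 1.6 and 2.2 respectively: the smaller the share of the build that is really I/O-bound, the larger the raw I/O gap hiding behind the same 30% wall-clock difference.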


I'd expect that, for an observed 30% wall clock time difference
in an operation as complex as a kernel build, the base i/o
throughput disparity is substantially greater.  Did you
try a simpler, more regular load, e.g. a streaming dd read
of various block sizes from guest raw disk devices?
This is also considerably easier to debug than the complex
i/o load generated by a build.
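A streaming read test in the spirit of the suggested dd run, measuring throughput for several block sizes, could look like the following minimal Python sketch. The function name is hypothetical. When it is pointed at a raw guest disk device, the page cache should be dropped or bypassed between runs; this sketch does not do that, so repeated reads of the same data may be served from cache.

```python
import os
import time


def streaming_read(path, block_sizes=(4096, 65536, 1 << 20), limit=None):
    """Read `path` sequentially once per block size (like dd bs=N), stopping
    at end of file or once at least `limit` bytes were read.
    Returns {block_size: (bytes_read, MB_per_second)}."""
    results = {}
    for bs in block_sizes:
        fd = os.open(path, os.O_RDONLY)
        try:
            total = 0
            start = time.perf_counter()
            while limit is None or total < limit:
                chunk = os.read(fd, bs)
                if not chunk:
                    break  # end of file/device
                total += len(chunk)
            elapsed = max(time.perf_counter() - start, 1e-9)
        finally:
            os.close(fd)
        results[bs] = (total, total / elapsed / 1e6)
    return results
```

Running it once against the virtio disk and once against the IDE disk gives directly comparable numbers per block size.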


I'm not convinced it's the read performance, since it's the second pass 
that is timed, by which time all the source files will be in the guest's 
cache. I verified this by doing just one pass and priming the cache with:


find . -type f -exec cat '{}' > /dev/null \;

The execution times are indistinguishable from the second pass in the 
two-pass test.


To me that would indicate that the problem is with write performance, 
rather than read performance.



One way to chop up the problem space is to use blktrace
on the host to observe both the i/o patterns coming out
of qemu and the host's response to them in terms of
turnaround time.  I expect you'll see requests of a somewhat
different nature generated by qemu w/r/t blocking and the
number of threads serving virtio_blk requests relative
to ide, but the host response should be essentially the
same in terms of data returned per unit time.

If the host looks to be turning around i/o requests with
similar latency in both cases, the problem would be a lower
frequency of requests generated by qemu in the case of
virtio_blk.   Here it would be useful to know the host
load generated by the guest in both cases.
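Comparing turnaround time between the IDE and virtio_blk runs comes down to pairing each request's dispatch (D) event with its completion (C) event in the blkparse output. Below is a minimal Python sketch. It assumes blkparse's default text format (device, CPU, sequence, timestamp, PID, action, RWBS, then `sector + blocks`) and matches requests by device and start sector. The function name is hypothetical, and the btt tool shipped with blktrace reports the same D2C figure while handling merges and requeues properly.

```python
def d2c_latencies(lines):
    """Dispatch-to-completion latencies (seconds) from blkparse output lines,
    pairing D and C events by (device, start sector)."""
    dispatched = {}
    latencies = []
    for line in lines:
        fields = line.split()
        # expected: dev cpu seq time pid action rwbs sector + nblocks [...]
        if len(fields) < 10 or fields[8] != "+":
            continue  # not a sector-carrying event (or a summary line)
        dev, ts, action, sector = fields[0], float(fields[3]), fields[5], fields[7]
        key = (dev, sector)
        if action == "D":
            dispatched[key] = ts
        elif action == "C" and key in dispatched:
            latencies.append(ts - dispatched.pop(key))
    return latencies
```

Comparing the mean or median of these latencies between the two configurations separates a slower host response from qemu simply issuing fewer requests.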


With virtio the CPU usage did seem to be noticeably lower. I figured 
that was because it was spending more time waiting for I/O to finish, 
since it was clearly bottlenecking on disk I/O (that being the only 
thing that changed).


I'll try iozone's write tests and see how that compares. If I'm right 
about write performance being the problem, iozone might show the same 
performance deterioration on write tests compared to the IDE emulation.
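Until iozone numbers are in, a crude sequential write test can already show whether writes are the bottleneck. The following minimal Python sketch writes a fixed amount of data per block size and includes an fsync in the timing so the page cache does not hide the device speed. The function name and defaults are hypothetical; the idea is to run it inside the guest on the virtio disk and the IDE disk in turn.

```python
import os
import time


def sequential_write(directory, total_bytes=64 << 20,
                     block_sizes=(4096, 65536, 1 << 20)):
    """Write total_bytes per block size into a scratch file, fsync, delete it,
    and return {block_size: (bytes_written, MB_per_second)}."""
    results = {}
    for bs in block_sizes:
        block = os.urandom(bs)
        path = os.path.join(directory, "write_test_%d.tmp" % bs)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            start = time.perf_counter()
            written = 0
            while written < total_bytes:
                written += os.write(fd, block[:min(bs, total_bytes - written)])
            os.fsync(fd)  # include the time to reach the disk, not just the cache
            elapsed = max(time.perf_counter() - start, 1e-9)
        finally:
            os.close(fd)
            os.remove(path)
        results[bs] = (written, written / elapsed / 1e6)
    return results
```

If the virtio numbers trail the IDE numbers here while the streaming reads are on par, that supports the write-performance theory.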


Gordan
