Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/16/2010 02:20 AM, Jason wrote:


> In comparing KVM 2.6.31.6b to XenServer 5.5.0, it seems KVM has fewer overall
> VMREADs and VMWRITEs, but there are a lot of VMWRITEs to Host FS_SEL, Host
> GS_SEL, Host FS_BASE, and Host GS_BASE that don't appear in Xen.


Ugh, these should definitely be eliminated, they keep writing the same 
value most of the time.



> Also, KVM has a lot of MSR accesses to 0xc081-0xc084 that Xen doesn't have.


These are unavoidable.  Those msrs are used for system calls and we need 
them to keep ordinary userspace going.


2.6.33 should reduce their frequency though.

Usually it doesn't make sense to pass them through, but if we detect the 
guest is writing them often, we can do so and eliminate the exits.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2971075 ] Assertion `bmdma->unit != (uint8_t)-1' failed.

2010-03-15 Thread SourceForge.net
Bugs item #2971075, was opened at 2010-03-16 07:02
Message generated for change (Tracker Item Submitted) made by zaphodbrx
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2971075&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: v1.0 (example)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Andre Weidemann (zaphodbrx)
Assigned to: Nobody/Anonymous (nobody)
Summary: Assertion `bmdma->unit != (uint8_t)-1' failed.

Initial Comment:
Hi,
I cloned the qemu-kvm git repository with "git clone 
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2010-03-14", 
ran configure and compiled it and did a "make install". Everything went 
fine without warnings or errors.
For configure output take a look here: http://pastebin.com/BL4DYCRY

Here is my Server Hardware:
Asus P5Q Mainboard
Intel Q9300
8GB RAM

I am running Ubuntu 9.10 x86_64 with kernel 2.6.31-20-server.
RAID5 with mdadm consisting of 4x 1TB disks
The volume /dev/storage/Windows7test mentioned below is on this RAID5.

I ran my virtual machine with the following command:

qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc 
192.168.3.42:2 -k de -smp 4,cores=4 -drive 
file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m 
1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net 
tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name 
Windows7test,process=Windows7test -drive 
file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native

Windows7Test_600G.img is a qcow2 file and contains a Windows 7 Pro image.
/dev/storage/Windows7test is formatted with XFS (inside the VM)

After starting the machine with the above command line, I booted into an 
Ubuntu 9.10 x86_64 Live Image via PXE and mounted /dev/sdb1 
(/dev/storage/Windows7test) under /mnt. I then did "cd /mnt/" and ran 
"iozone -Ra -g 2G -b /tmp/iozone-aoi-linux.xls"

iozone ran some tests and then kvm simply quit with the following error 
message:
qemu-system-x86_64: 
/usr/local/src/qemu-kvm-2010-03-10/hw/ide/internal.h:510: 
bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.

/var/log/syslog contained the following:
Mar 14 09:18:14 server kernel: [318080.627468] kvm: 1361: cpu0 
kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server  kernel: [318080.627473] kvm: 1361: cpu0 
kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627476] kvm: 1361: cpu0 unhandled 
wrmsr: 0x400 data 
Mar 14 09:18:14 server kernel: [318080.627506] kvm: 1361: cpu1 
kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server  kernel: [318080.627509] kvm: 1361: cpu1 
kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627511] kvm: 1361: cpu1 unhandled 
wrmsr: 0x400 data 
Mar 14 09:18:14 server kernel: [318080.627538] kvm: 1361: cpu2 
kvm_set_msr_common: MSR_IA32_MCG_STATUS 0x0, nop
Mar 14 09:18:14 server kernel: [318080.627540] kvm: 1361: cpu2 
kvm_set_msr_common: MSR_IA32_MCG_CTL 0x, nop
Mar 14 09:18:14 server kernel: [318080.627543] kvm: 1361: cpu2 unhandled 
wrmsr: 0x400 data 

I was able to reproduce this error 3 times in a row.



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/16/2010 03:21 AM, Anthony Liguori wrote:

> On 03/15/2010 10:06 AM, Avi Kivity wrote:
>> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
>>> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>>>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>>>> I will add another project - iommu emulation.  Could be very useful
>>>>>>> for doing device assignment to nested guests, which could make
>>>>>>> testing a lot easier.
>>>>>> Our experiments show that nested device assignment is pretty much
>>>>>> required for I/O performance in nested scenarios.
>>>>> Really? I did a small test with virtio-blk in a nested guest (disk read
>>>>> with dd, so not a real benchmark) and got a reasonable read-performance
>>>>> of around 25MB/s from the disk in the l2-guest.
>>>>
>>>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>>>
>>>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we
>>>> can do for other guests.
>>>
>>> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend
>>> to be costly.  KVM is a bit unusual in terms of how many times the
>>> instructions are executed per exit.
>>
>> Do you know offhand of any unnecessary read/writes?  There's
>> update_cr8_intercept(), but on normal exits, I don't see what else we
>> can remove.
>
> Yeah, there are a number of examples.
>
> vmcs_clear_bits() and vmcs_set_bits() read a field of the VMCS and
> then immediately write it.  This is unnecessary as the same
> information could be kept in a shadow variable.  In vmx_fpu_activate,
> we call vmcs_clear_bits() followed immediately by vmcs_set_bits(),
> which means we're reading GUEST_CR0 twice and writing it twice.


This should be much better these days (2.6.34-rc1) as vmx_fpu_activate() 
is called at most once per heavyweight exit (and I have evil plans to 
reduce it even further).  Still, that code should be optimized.


> vmx_get_rflags() reads from the VMCS and we frequently call
> get_rflags() followed by a set_rflags() to update a bit.  We also
> don't cache the value between calls and there's a few spots in the
> code that make multiple calls.


We definitely should cache that (and segment access from the emulator as 
well).  But I'd have thought this to be relatively infrequent.  At least 
with Linux, using x2apic and virtio allows you to eliminate most 
emulator access, if you have npt or ept.
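
A rough sketch of what caching rflags could look like: read the field lazily once per exit, update the cached copy, and write it back once before re-entering the guest. All names below (struct rflags_cache, cached_get_rflags() and so on) are hypothetical, and the fake_* functions merely stand in for VMREAD/VMWRITE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static unsigned long rflags_reads, rflags_writes;
static uint64_t fake_vmcs_rflags;       /* stands in for the VMCS rflags field */

static uint64_t fake_vmread_rflags(void)
{
    rflags_reads++;
    return fake_vmcs_rflags;
}

static void fake_vmwrite_rflags(uint64_t v)
{
    rflags_writes++;
    fake_vmcs_rflags = v;
}

struct rflags_cache {
    bool valid;     /* cached value reflects the VMCS or a pending update */
    bool dirty;     /* cached value still has to be written back */
    uint64_t value;
};

/* First call after an exit reads the VMCS; later calls hit the cache. */
static uint64_t cached_get_rflags(struct rflags_cache *c)
{
    if (!c->valid) {
        c->value = fake_vmread_rflags();
        c->valid = true;
    }
    return c->value;
}

/* Updates only touch the cache; the write is deferred. */
static void cached_set_rflags(struct rflags_cache *c, uint64_t v)
{
    c->value = v;
    c->valid = true;
    c->dirty = true;
}

/* Called once before guest entry: at most one write per exit. */
static void flush_rflags(struct rflags_cache *c)
{
    if (c->dirty) {
        fake_vmwrite_rflags(c->value);
        c->dirty = false;
    }
}

/* Called after each exit: the guest may have changed rflags. */
static void invalidate_rflags(struct rflags_cache *c)
{
    c->valid = false;
}
```

A get/set pair that flips one bit then costs one read and one deferred write, no matter how many times the value is consulted in between.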


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-15 Thread Avi Kivity

On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:

> From: Zhang, Yanmin
>
> Based on the discussion in KVM community, I worked out the patch to support
> perf to collect guest os statistics from host side. This patch is implemented
> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> critical bug and provided good suggestions with other guys. I really appreciate
> their kind help.
>
> The patch adds new subcommand kvm to perf.
>
>    perf kvm top
>    perf kvm record
>    perf kvm report
>    perf kvm diff
>
> The new perf could profile guest os kernel except guest os user space, but it
> could summarize guest os user space utilization per guest os.
>
> Below are some examples.
> 1) perf kvm top
> [r...@lkp-ne01 norm]# perf kvm --host --guest
> --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top


Excellent, support for guest kernel != host kernel is critical (I can't 
remember the last time I ran same kernels).


How would we support multiple guests with different kernels?  Perhaps a 
symbol server that perf can connect to (and that would connect to guests 
in turn)?



> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c  2010-03-16 08:59:11.825295404 +0800
> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c  2010-03-16 09:01:09.976084492 +0800
> @@ -26,6 +26,7 @@
>   #include
>   #include
>   #include
> +#include
>   #include "kvm_cache_regs.h"
>   #include "x86.h"
>
> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> vmcs_write32(TPR_THRESHOLD, irr);
>   }
>
> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> +
> +static void kvm_set_in_guest(void)
> +{
> +   percpu_write(kvm_in_guest, 1);
> +}
> +
> +static int kvm_is_in_guest(void)
> +{
> +   return percpu_read(kvm_in_guest);
> +}


There is already PF_VCPU for this.


> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> +   .is_in_guest    = kvm_is_in_guest,
> +   .is_user_mode   = kvm_is_user_mode,
> +   .get_guest_ip   = kvm_get_guest_ip,
> +   .reset_in_guest = kvm_reset_in_guest,
> +};


Should be in common code, not vmx specific.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Randy Dunlap  [2010-03-15 08:46:31]:

> On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote:
> 
> Hi,
> If you go ahead with this, please add the boot parameter & its description
> to Documentation/kernel-parameters.txt.
>

I certainly will, thanks for keeping a watch. 

-- 
Three Cheers,
Balbir


Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Chris Webb  [2010-03-15 20:23:54]:

> Avi Kivity  writes:
> 
> > On 03/15/2010 10:07 AM, Balbir Singh wrote:
> >
> > >Yes, it is a virtio call away, but is the cost of paying twice in
> > >terms of memory acceptable?
> > 
> > Usually, it isn't, which is why I recommend cache=off.
> 
> Hi Avi. One observation about your recommendation for cache=none:
> 
> We run hosts of VMs accessing drives backed by logical volumes carved out
> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
> twenty virtual machines, which pretty much fill the available memory on the
> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
> caching turned on get advertised to the guest as having a write-cache, and
> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
> isn't acting as cache=neverflush like it would have done a year ago. I know
> that comparing performance for cache=none against that unsafe behaviour
> would be somewhat unfair!)
> 
> Wasteful duplication of page cache between guest and host notwithstanding,
> turning on cache=writeback is a spectacular performance win for our guests.
> For example, even IDE with cache=writeback easily beats virtio with
> cache=none in most of the guest filesystem performance tests I've tried. The
> anecdotal feedback from clients is also very strongly in favour of
> cache=writeback.
> 
> With a host full of cache=none guests, IO contention between guests is
> hugely problematic with non-stop seek from the disks to service tiny
> O_DIRECT writes (especially without virtio), many of which needn't have been
> synchronous if only there had been some way for the guest OS to tell qemu
> that. Running with cache=writeback seems to reduce the frequency of disk
> flush per guest to a much more manageable level, and to allow the host's
> elevator to optimise writing out across the guests in between these flushes.

Thanks for the inputs above, they are extremely useful. The goal of
these patches is that with cache != none, we allow double caching when
needed and then slowly take away unmapped pages, pushing the caching
to the host. There are knobs to control how much, etc and the whole
feature is enabled via a boot parameter.

-- 
Three Cheers,
Balbir


Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori

On 03/15/2010 07:43 PM, Christoph Hellwig wrote:

> On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote:
>> I knew someone would do this...
>>
>> This really gets down to your definition of "safe" behaviour.  As it
>> stands, if you suffer a power outage, it may lead to guest corruption.
>>
>> While we are correct in advertising a write-cache, write-caches are
>> volatile and should a drive lose power, it could lead to data
>> corruption.  Enterprise disks tend to have battery backed write caches
>> to prevent this.
>>
>> In the set up you're emulating, the host is acting as a giant write
>> cache.  Should your host fail, you can get data corruption.
>>
>> cache=writethrough provides a much stronger data guarantee.  Even in the
>> event of a host failure, data integrity will be preserved.
>
> Actually cache=writeback is as safe as any normal host is with a
> volatile disk cache, except that in this case the disk cache is
> actually a lot larger.  With a properly implemented filesystem this
> will never cause corruption.


Metadata corruption, not necessarily corruption of data stored in a file.


> You will lose recent updates after
> the last sync/fsync/etc up to the size of the cache, but filesystem
> metadata should never be corrupted, and data that has been forced to
> disk using fsync/O_SYNC should never be lost either.


Not all software uses fsync as much as they should.  And often times, 
it's for good reason (like ext3).  This is mitigated by the fact that 
there's usually a short window of time before metadata is flushed to 
disk.  Adding another layer increases that delay.


IIUC, an O_DIRECT write using cache=writeback is not actually on the 
spindle when the write() completes.  Rather, an explicit fsync() would 
be required.  That will cause data corruption in many applications (like 
databases) regardless of whether the fs gets metadata corruption.
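
For illustration, the application-side pattern this implies is "write, then fsync() before treating the data as durable". A minimal POSIX sketch follows; open_for_write() and write_durably() are hypothetical helper names, and the sketch deliberately uses a plain descriptor rather than O_DIRECT, which adds buffer-alignment requirements.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open (or create) a file for writing. Returns the fd, or -1 on error. */
static int open_for_write(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
}

/*
 * Write the whole buffer, then fsync() so the data is flushed through
 * any volatile cache below us. Returns 0 on success, -errno on failure.
 */
static int write_durably(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -errno;
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)
        return -errno;
    return 0;
}
```

Only after write_durably() returns 0 should the caller treat the data as being on stable storage, and even that relies on every layer below honouring the flush.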


You could argue that the software should disable writeback caching on 
the virtual disk, but we don't currently support that so even if the 
application did, it's not going to help.


Regards,

Anthony Liguori


> If it is that's
> a bug somewhere in the stack, but in my powerfail testing we never did
> so using xfs or ext3/4 after I fixed up the fsync code in the latter
> two.

   




Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Anthony Liguori

On 03/15/2010 10:06 AM, Avi Kivity wrote:

> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
>> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>>> I will add another project - iommu emulation.  Could be very useful
>>>>>> for doing device assignment to nested guests, which could make
>>>>>> testing a lot easier.
>>>>> Our experiments show that nested device assignment is pretty much
>>>>> required for I/O performance in nested scenarios.
>>>> Really? I did a small test with virtio-blk in a nested guest (disk read
>>>> with dd, so not a real benchmark) and got a reasonable read-performance
>>>> of around 25MB/s from the disk in the l2-guest.
>>>
>>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>>
>>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we
>>> can do for other guests.
>>
>> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend
>> to be costly.  KVM is a bit unusual in terms of how many times the
>> instructions are executed per exit.
>
> Do you know offhand of any unnecessary read/writes?  There's
> update_cr8_intercept(), but on normal exits, I don't see what else we
> can remove.


Yeah, there are a number of examples.

vmcs_clear_bits() and vmcs_set_bits() read a field of the VMCS and then 
immediately write it.  This is unnecessary as the same information 
could be kept in a shadow variable.  In vmx_fpu_activate, we call 
vmcs_clear_bits() followed immediately by vmcs_set_bits(), which means 
we're reading GUEST_CR0 twice and writing it twice.


vmx_get_rflags() reads from the VMCS and we frequently call get_rflags() 
followed by a set_rflags() to update a bit.  We also don't cache the 
value between calls and there's a few spots in the code that make 
multiple calls.


Regards,

Anthony Liguori



Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Christoph Hellwig
On Mon, Mar 15, 2010 at 06:43:06PM -0500, Anthony Liguori wrote:
> I knew someone would do this...
>
> This really gets down to your definition of "safe" behaviour.  As it  
> stands, if you suffer a power outage, it may lead to guest corruption.
>
> While we are correct in advertising a write-cache, write-caches are  
> volatile and should a drive lose power, it could lead to data  
> corruption.  Enterprise disks tend to have battery backed write caches  
> to prevent this.
>
> In the set up you're emulating, the host is acting as a giant write  
> cache.  Should your host fail, you can get data corruption.
>
> cache=writethrough provides a much stronger data guarantee.  Even in the  
> event of a host failure, data integrity will be preserved.

Actually cache=writeback is as safe as any normal host is with a
volatile disk cache, except that in this case the disk cache is
actually a lot larger.  With a properly implemented filesystem this
will never cause corruption.  You will lose recent updates after
the last sync/fsync/etc up to the size of the cache, but filesystem
metadata should never be corrupted, and data that has been forced to
disk using fsync/O_SYNC should never be lost either.  If it is that's
a bug somewhere in the stack, but in my powerfail testing we never did
so using xfs or ext3/4 after I fixed up the fsync code in the latter
two.



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Jason
Avi Kivity writes:
> 
> On 03/15/2010 03:23 PM, Anthony Liguori wrote:
> > On 03/15/2010 08:11 AM, Avi Kivity wrote:
> >> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
> >>
> >> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can 
> >> do for other guests.
> >
> > VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to 
> > be costly.  KVM is a bit unusual in terms of how many times the 
> > instructions are executed per exit.
> 
> Do you know offhand of any unnecessary read/writes?  There's 
> update_cr8_intercept(), but on normal exits, I don't see what else we 
> can remove.
> 

In comparing KVM 2.6.31.6b to XenServer 5.5.0, it seems KVM has fewer overall
VMREADs and VMWRITEs, but there are a lot of VMWRITEs to Host FS_SEL, Host
GS_SEL, Host FS_BASE, and Host GS_BASE that don't appear in Xen. Also, KVM has a
lot of MSR accesses to 0xc081-0xc084 that Xen doesn't have.





Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori

On 03/15/2010 03:23 PM, Chris Webb wrote:

> Avi Kivity writes:
>
>> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>>
>>> Yes, it is a virtio call away, but is the cost of paying twice in
>>> terms of memory acceptable?
>>
>> Usually, it isn't, which is why I recommend cache=off.
>
> Hi Avi. One observation about your recommendation for cache=none:
>
> We run hosts of VMs accessing drives backed by logical volumes carved out
> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
> twenty virtual machines, which pretty much fill the available memory on the
> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
> caching turned on get advertised to the guest as having a write-cache, and
> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
> isn't acting as cache=neverflush like it would have done a year ago. I know
> that comparing performance for cache=none against that unsafe behaviour
> would be somewhat unfair!)


I knew someone would do this...

This really gets down to your definition of "safe" behaviour.  As it 
stands, if you suffer a power outage, it may lead to guest corruption.


While we are correct in advertising a write-cache, write-caches are 
volatile and should a drive lose power, it could lead to data 
corruption.  Enterprise disks tend to have battery backed write caches 
to prevent this.


In the set up you're emulating, the host is acting as a giant write 
cache.  Should your host fail, you can get data corruption.


cache=writethrough provides a much stronger data guarantee.  Even in the 
event of a host failure, data integrity will be preserved.


Regards,

Anthony Liguori



Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Chris Webb
Avi Kivity  writes:

> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>
> >Yes, it is a virtio call away, but is the cost of paying twice in
> >terms of memory acceptable?
> 
> Usually, it isn't, which is why I recommend cache=off.

Hi Avi. One observation about your recommendation for cache=none:

We run hosts of VMs accessing drives backed by logical volumes carved out
from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
twenty virtual machines, which pretty much fill the available memory on the
host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
caching turned on get advertised to the guest as having a write-cache, and
FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
isn't acting as cache=neverflush like it would have done a year ago. I know
that comparing performance for cache=none against that unsafe behaviour
would be somewhat unfair!)

Wasteful duplication of page cache between guest and host notwithstanding,
turning on cache=writeback is a spectacular performance win for our guests.
For example, even IDE with cache=writeback easily beats virtio with
cache=none in most of the guest filesystem performance tests I've tried. The
anecdotal feedback from clients is also very strongly in favour of
cache=writeback.

With a host full of cache=none guests, IO contention between guests is
hugely problematic with non-stop seek from the disks to service tiny
O_DIRECT writes (especially without virtio), many of which needn't have been
synchronous if only there had been some way for the guest OS to tell qemu
that. Running with cache=writeback seems to reduce the frequency of disk
flush per guest to a much more manageable level, and to allow the host's
elevator to optimise writing out across the guests in between these flushes.

Cheers,

Chris.


Re: [PATCH] kvm: clean up assigned_device_enable_host_msix

2010-03-15 Thread Marcelo Tosatti
On Sat, Mar 13, 2010 at 03:00:45PM +0800, jing zhang wrote:
> From: Jing Zhang 
> 
> Date: Sat Mar 13 14:05:27 2010
> 
> Cc: Avi Kivity 
> Signed-off-by: Jing Zhang 

Applied (with a better description), thanks.



Re: [PATCH v4 0/2] qemu-kvm: Save&restore debug registers

2010-03-15 Thread Marcelo Tosatti
On Fri, Mar 12, 2010 at 03:20:48PM +0100, Jan Kiszka wrote:
> Patch 1 is for upstream and should be applied to uq/master as well, patch
> 2 is for qemu-kvm only.
> 
> Jan Kiszka (2):
>   KVM: x86: Add debug register saving and restoring
>   qemu-kvm: x86: Add support for saving&restoring debug registers
> 
>  kvm-all.c         | 11
>  kvm.h             |  1
>  qemu-kvm-x86.c    |  2
>  qemu-kvm.c        |  5
>  qemu-kvm.h        |  1
>  target-i386/kvm.c | 55
>  6 files changed, 75 insertions(+), 0 deletions(-)

Applied, thanks.



Re: [PATCH] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure

2010-03-15 Thread Marcelo Tosatti
On Fri, Mar 12, 2010 at 12:59:06PM +0800, Wei Yongjun wrote:
> This patch change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
> from -EINVAL to -ENXIO if no coalesced mmio dev exists.
> 
> Signed-off-by: Wei Yongjun 

Applied all, thanks.
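
For readers skimming the archive, the behaviour the quoted description asks for can be sketched as below. This is a userspace illustration, not the KVM source: struct vm_sketch, struct coalesced_dev_sketch and register_zone_sketch() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM and its coalesced MMIO device. */
struct coalesced_dev_sketch {
    int nr_zones;
};

struct vm_sketch {
    struct coalesced_dev_sketch *coalesced_mmio_dev;
};

/*
 * Report "no such device" (-ENXIO) when the device does not exist,
 * instead of "invalid argument" (-EINVAL) as before the patch.
 */
static int register_zone_sketch(struct vm_sketch *vm)
{
    assert(vm != NULL);
    if (vm->coalesced_mmio_dev == NULL)
        return -ENXIO;
    vm->coalesced_mmio_dev->nr_zones++;
    return 0;
}
```

Distinguishing the two errno values lets userspace tell a missing device apart from a malformed request.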



Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Anthony Liguori

On 03/15/2010 04:27 AM, Avi Kivity wrote:


That's only beneficial if the cache is shared.  Otherwise, you could 
use the balloon to evict cache when memory is tight.


Shared cache is mostly a desktop thing where users run similar 
workloads.  For servers, it's much less likely.  So a modified-guest 
doesn't help a lot here.


Not really.  In many cloud environments, there's a set of common images 
that are instantiated on each node.  Usually this is because you're 
running a horizontally scalable application or because you're supporting 
an ephemeral storage model.


In fact, with ephemeral storage, you typically want to use 
cache=writeback since you aren't providing data guarantees across 
shutdown/failure.


Regards,

Anthony Liguori


nfs and db servers under kvm?

2010-03-15 Thread Mike Diehl
Hi all,

I'm considering changing much of my current infrastructure so that it runs 
under an array of VM's.

I'm wondering how well database and nfs servers run under KVM.  Should I put 
the data on a host filesystem, or can I put it on the guest filesystem?


-- 

Take care and have fun,
Mike Diehl.


Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 04:46:20PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >If LOCK prefix is used dest arg should be memory, otherwise instruction
> >should generate #UD.
> Well, there is one exception:
> There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where
> there is no memory involved (and we intercept this). I am not sure
> if anyone actually uses this code sequence, but it is definitely
> legal.
> 
Even without this patch "lock mov cr0" will cause #UD to be injected by
emulator since mov does not have Lock in opcode table. Also it looks like
Intel does not support this extension so no portable program can use
it.
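
The condition from the patch quoted below can be written as a small standalone predicate. A sketch only: the enum values and the LOCK_FLAG_SKETCH bit are assumptions for illustration and do not match the emulator's real encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative operand kinds; the emulator's own enum may differ. */
enum operand_type_sketch { OP_REG_SKETCH, OP_MEM_SKETCH, OP_IMM_SKETCH };

/* Hypothetical decode flag meaning "this opcode accepts a LOCK prefix". */
#define LOCK_FLAG_SKETCH (1u << 0)

/*
 * Returns true when #UD should be injected: a LOCK prefix is present and
 * either the opcode does not allow it or the destination is not memory.
 */
static bool lock_prefix_raises_ud(bool lock_prefix, unsigned int decode_flags,
                                  enum operand_type_sketch dst_type)
{
    return lock_prefix &&
           (!(decode_flags & LOCK_FLAG_SKETCH) || dst_type != OP_MEM_SKETCH);
}
```

As noted above, a mov without the Lock flag in the opcode table already takes the first branch, so "lock mov cr0" is rejected with or without the extra destination check.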

> Regards,
> Andre.
> 
> >
> >Signed-off-by: Gleb Natapov 
> >---
> > arch/x86/kvm/emulate.c |2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index b89a8f2..46a7ee3 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
> >x86_emulate_ops *ops)
> > }
> > /* LOCK prefix is allowed only with some instructions */
> >-if (c->lock_prefix && !(c->d & Lock)) {
> >+if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
> > kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
> > goto done;
> > }
> 
> 
> -- 
> Andre Przywara
> AMD-OSRC (Dresden)
> Tel: x29712

--
Gleb.


Re: [PATCH v3 00/30] emulator cleanup

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 04:51:35PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >This is the first series of patches that tries to cleanup emulator code.
> >This is mix of bug fixes and moving code that does emulation from x86.c
> >to emulator.c while making it KVM independent. The status of the patches:
> >works for me. realtime.flat test now also pass where it failed before.
> 
> Patch 1..13, 17:
> Reviewed-by: Andre Przywara 
> 
> I am still investigating a corner case in patch 14 (calling
> syscall/sysenter from real mode), and there is the issue in patch
> 16. I have only shortly looked over the others.
> 
Patch 14 is only mechanical change. It doesn't change behaviour of
syscall/sysenter emulation.

--
Gleb.


Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Joerg Roedel
On Mon, Mar 15, 2010 at 08:14:29AM -0500, Anthony Liguori wrote:
> On 03/15/2010 07:42 AM, Avi Kivity wrote:
>> On 03/15/2010 02:38 PM, Joerg Roedel wrote:
>>> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>>>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>>>> Hi there,
>>>>>
>>>>> Our wiki page for the Summer of Code 2010 is doing quite well:
>>>>>
>>>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>>>>
>>>> I will add another project - iommu emulation.  Could be very useful for
>>>> doing device assignment to nested guests, which could make testing a lot
>>>> easier.
>>> Good idea. If there is interest I could help to mentor this project.
>>
>> Thanks.  I volunteered Anthony, but he may be a little overcommitted.
>
> Joerg, feel free to put your name against too.

[x] Done.

Thanks,
Joerg



Re: [PATCH v3 00/30] emulator cleanup

2010-03-15 Thread Andre Przywara

Gleb Natapov wrote:

> This is the first series of patches that tries to cleanup emulator code.
> This is mix of bug fixes and moving code that does emulation from x86.c
> to emulator.c while making it KVM independent. The status of the patches:
> works for me. realtime.flat test now also pass where it failed before.


Patch 1..13, 17:
Reviewed-by: Andre Przywara 

I am still investigating a corner case in patch 14 (calling
syscall/sysenter from real mode), and there is the issue in patch 16. I
have only briefly looked over the others.


Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-15 Thread Avi Kivity

On 03/15/2010 05:46 PM, Andre Przywara wrote:
> Gleb Natapov wrote:
>> If LOCK prefix is used dest arg should be memory, otherwise instruction
>> should generate #UD.
> Well, there is one exception:
> There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where
> there is no memory involved (and we intercept this). I am not sure if
> anyone actually uses this code sequence, but it is definitely legal.


It's better to trap on this instead of emulating it incorrectly as mov 
cr8.  If/when someone adds 32-bit mov cr8 handling, this will need to be 
addressed.
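
To make the rule under discussion concrete, here is a minimal sketch of the
check, assuming a cut-down decode structure. The type, flag and function names
below are hypothetical and are not the emulator's own; only the rule itself
(LOCK requires a lockable instruction and a memory destination, otherwise #UD)
comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the emulator's decode state. */
enum operand_type { OPERAND_NONE, OPERAND_REG, OPERAND_MEM };

#define INSN_LOCKABLE (1u << 0) /* instruction may legally take a LOCK prefix */

struct decoded_insn {
    unsigned int flags;          /* INSN_* properties from an opcode table */
    bool lock_prefix;            /* a LOCK prefix was seen while decoding */
    enum operand_type dst_type;  /* kind of destination operand */
};

/* Returns true if the LOCK prefix makes this instruction raise #UD. */
static bool lock_prefix_raises_ud(const struct decoded_insn *insn)
{
    if (!insn->lock_prefix)
        return false;
    /* LOCK needs a lockable instruction and a memory destination. */
    return !(insn->flags & INSN_LOCKABLE) || insn->dst_type != OPERAND_MEM;
}
```

Under this rule a register-only form such as the AMD "lock mov cr0" encoding
has no memory destination and is reported as #UD, which is the "trap instead
of emulating it incorrectly" outcome.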


--
error compiling committee.c: too many arguments to function



Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Randy Dunlap
On Mon, 15 Mar 2010 12:52:15 +0530 Balbir Singh wrote:

> Selectively control Unmapped Page Cache (nospam version)
> 
> From: Balbir Singh 
> 
> This patch implements unmapped page cache control via preferred
> page cache reclaim. The current patch hooks into kswapd and reclaims
> page cache if the user has requested for unmapped page control.
> This is useful in the following scenario
> 
> - In a virtualized environment with cache!=none, we see
>   double caching - (one in the host and one in the guest). As
>   we try to scale guests, cache usage across the system grows.
>   The goal of this patch is to reclaim page cache when Linux is running
>   as a guest and get the host to hold the page cache and manage it.
>   There might be temporary duplication, but in the long run, memory
>   in the guests would be used for mapped pages.
> - The option is controlled via a boot option and the administrator
>   can selectively turn it on, on a need to use basis.
> 
> A lot of the code is borrowed from zone_reclaim_mode logic for
> __zone_reclaim(). One might argue that with ballooning and
> KSM this feature is not very useful, but even with ballooning,
> we need extra logic to balloon multiple VM machines and it is hard
> to figure out the correct amount of memory to balloon. With these
> patches applied, each guest has a sufficient amount of free memory
> available, that can be easily seen and reclaimed by the balloon driver.
> The additional memory in the guest can be reused for additional
> applications or used to start additional guests/balance memory in
> the host.
> 
> KSM currently does not de-duplicate host and guest page cache. The goal
> of this patch is to help automatically balance unmapped page cache when
> instructed to do so.
> 
> There are some magic numbers in use in the code, UNMAPPED_PAGE_RATIO
> and the number of pages to reclaim when unmapped_page_control argument
> is supplied. These numbers were chosen to avoid aggressiveness in
> reaping page cache ever so frequently, at the same time providing control.
> 
> The sysctl for min_unmapped_ratio provides further control from
> within the guest on the amount of unmapped pages to reclaim.
> 
> The patch is applied against mmotm feb-11-2010.

Hi,
If you go ahead with this, please add the boot parameter & its description
to Documentation/kernel-parameters.txt.


> TODOS
> -
> 1. Balance slab cache as well
> 2. Invoke the balance routines from the balloon driver

---
~Randy


Re: [PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-15 Thread Andre Przywara

Gleb Natapov wrote:
> If LOCK prefix is used dest arg should be memory, otherwise instruction
> should generate #UD.

Well, there is one exception:
There is an AMD specific "lock mov cr0 = mov cr8" equivalence, where
there is no memory involved (and we intercept this). I am not sure if
anyone actually uses this code sequence, but it is definitely legal.

Regards,
Andre.

> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/kvm/emulate.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index b89a8f2..46a7ee3 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
>     }
>
>     /* LOCK prefix is allowed only with some instructions */
> -   if (c->lock_prefix && !(c->d & Lock)) {
> +   if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
>         kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
>         goto done;
>     }



--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/15/2010 03:23 PM, Anthony Liguori wrote:
> On 03/15/2010 08:11 AM, Avi Kivity wrote:
>> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>>> I will add another project - iommu emulation.  Could be very useful
>>>>> for doing device assignment to nested guests, which could make
>>>>> testing a lot easier.
>>>> Our experiments show that nested device assignment is pretty much
>>>> required for I/O performance in nested scenarios.
>>> Really? I did a small test with virtio-blk in a nested guest (disk read
>>> with dd, so not a real benchmark) and got a reasonable read-performance
>>> of around 25MB/s from the disk in the l2-guest.
>>
>> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>>
>> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
>> do for other guests.
>
> VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to
> be costly.  KVM is a bit unusual in terms of how many times the
> instructions are executed per exit.


Do you know offhand of any unnecessary read/writes?  There's 
update_cr8_intercept(), but on normal exits, I don't see what else we 
can remove.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v3 00/30] emulator cleanup

2010-03-15 Thread Avi Kivity

On 03/15/2010 04:38 PM, Gleb Natapov wrote:
> This is the first series of patches that tries to cleanup emulator code.
> This is mix of bug fixes and moving code that does emulation from x86.c
> to emulator.c while making it KVM independent. The status of the patches:
> works for me. realtime.flat test now also pass where it failed before.


Reviewed-by: Avi Kivity 

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Muli Ben-Yehuda
On Mon, Mar 15, 2010 at 02:03:11PM +0100, Joerg Roedel wrote:

> On Mon, Mar 15, 2010 at 05:53:13AM -0700, Muli Ben-Yehuda wrote:

> > On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:

> > > On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> > 
> > > >  Hi there,
> > > >
> > > >  Our wiki page for the Summer of Code 2010 is doing quite well:
> > > >
> > > >http://wiki.qemu.org/Google_Summer_of_Code_2010
> > > 
> > > I will add another project - iommu emulation.  Could be very
> > > useful for doing device assignment to nested guests, which could
> > > make testing a lot easier.
> > 
> > Our experiments show that nested device assignment is pretty much
> > required for I/O performance in nested scenarios.
> 
> Really? I did a small test with virtio-blk in a nested guest (disk
> read with dd, so not a real benchmark) and got a reasonable
> read-performance of around 25MB/s from the disk in the l2-guest.

Netperf running in L1 with direct access: ~950 Mbps throughput with
25% CPU utilization. Netperf running in L2 with virtio between L2 and
L1 and direct assignment between L1 and L0: roughly the same
throughput, but over 90% CPU utilization! Now extrapolate to 10GbE.
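
The extrapolation can be sketched as simple arithmetic, assuming CPU cost
grows linearly with throughput. The linear model and the function name are
assumptions made for illustration, not something measured in this thread.

```c
/*
 * Estimate the CPU cost (in percent of one CPU) at target_gbps, assuming
 * the cost scales linearly with throughput. The linear model is an
 * assumption, not a measurement.
 */
static double cpu_percent_at(double measured_cpu_percent,
                             double measured_gbps, double target_gbps)
{
    return measured_cpu_percent * target_gbps / measured_gbps;
}
```

With the figures above (0.95 Gbps measured), the L1 case at 25% would need
roughly 263% at 10 Gbps, while the L2 case at 90% would need roughly 947%,
that is, more than nine fully busy CPUs.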

Cheers,
Muli


[PATCH v3 08/30] KVM: Provide current eip as part of emulator context.

2010-03-15 Thread Gleb Natapov
Eliminate the need for the emulator to call back into KVM to get it.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |3 ++-
 arch/x86/kvm/emulate.c |   12 ++--
 arch/x86/kvm/x86.c |1 +
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b048fd2..0765725 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -141,7 +141,7 @@ struct decode_cache {
u8 seg_override;
unsigned int d;
unsigned long regs[NR_VCPU_REGS];
-   unsigned long eip, eip_orig;
+   unsigned long eip;
/* modrm */
u8 modrm;
u8 modrm_mod;
@@ -160,6 +160,7 @@ struct x86_emulate_ctxt {
struct kvm_vcpu *vcpu;
 
unsigned long eflags;
+   unsigned long eip; /* eip before instruction emulation */
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
int mode;
u32 cs_base;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8bd0557..2c27aa4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
int rc;
 
/* x86 instructions are limited to 15 bytes. */
-   if (eip + size - ctxt->decode.eip_orig > 15)
+   if (eip + size - ctxt->eip > 15)
return X86EMUL_UNHANDLEABLE;
eip += ctxt->cs_base;
while (size--) {
@@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
/* Shadow copy of register state. Committed on successful emulation. */
 
memset(c, 0, sizeof(struct decode_cache));
-   c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
@@ -1878,7 +1878,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
}
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
}
 
if (c->src.type == OP_MEM) {
@@ -2447,7 +2447,7 @@ twobyte_insn:
goto done;
 
/* Let the processor re-execute the fixed hypercall */
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
@@ -2551,7 +2551,7 @@ twobyte_insn:
| ((u64)c->regs[VCPU_REGS_RDX] << 32);
if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
}
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
@@ -2560,7 +2560,7 @@ twobyte_insn:
/* rdmsr */
if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = kvm_rip_read(ctxt->vcpu);
+   c->eip = ctxt->eip;
} else {
c->regs[VCPU_REGS_RAX] = (u32)msr_data;
c->regs[VCPU_REGS_RDX] = msr_data >> 32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3b6848e..022d28e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3494,6 +3494,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+   vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-- 
1.6.5
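
As an illustration of what keeping the starting eip in the context buys, here
is a minimal sketch of the 15-byte instruction length check done against a
caller-supplied start address, with no callback into the host. The structure
and function names are hypothetical; only the 15-byte limit and the idea of a
pre-emulation eip field come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_INSN_BYTES 15 /* x86 instructions are limited to 15 bytes */

/* Hypothetical, cut-down emulator context. */
struct emu_ctxt {
    uint64_t eip; /* eip before instruction emulation; set by the caller */
};

/* True if fetching `size` bytes at `fetch_eip` stays within the limit. */
static bool fetch_within_limit(const struct emu_ctxt *ctxt,
                               uint64_t fetch_eip, unsigned int size)
{
    return fetch_eip + size - ctxt->eip <= MAX_INSN_BYTES;
}
```

The caller fills in `eip` once before decoding, so the check depends only on
data the emulator already holds.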



[PATCH v3 15/30] KVM: x86 emulator: do not call writeback if msr access fails.

2010-03-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1393bf0..b89a8f2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2563,7 +2563,7 @@ twobyte_insn:
| ((u64)c->regs[VCPU_REGS_RDX] << 32);
if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
}
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
@@ -2572,7 +2572,7 @@ twobyte_insn:
/* rdmsr */
if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
} else {
c->regs[VCPU_REGS_RAX] = (u32)msr_data;
c->regs[VCPU_REGS_RDX] = msr_data >> 32;
-- 
1.6.5



[PATCH v3 20/30] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()

2010-03-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index db4776c..702bfff 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1508,7 +1508,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
if (rc != X86EMUL_CONTINUE)
return rc;
 
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+   rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
return rc;
 }
 
@@ -1683,7 +1683,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+   rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
return rc;
 }
 
@@ -2717,7 +2717,7 @@ special_insn:
if (c->modrm_reg == VCPU_SREG_SS)
toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+   rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
@@ -2892,8 +2892,8 @@ special_insn:
goto jmp;
case 0xea: /* jmp far */
jump_far:
-   if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-   VCPU_SREG_CS))
+   if (load_segment_descriptor(ctxt, ops, c->src2.val,
+   VCPU_SREG_CS))
goto done;
 
c->eip = c->src.val;
-- 
1.6.5



[PATCH v3 13/30] KVM: x86 emulator: fix mov dr to inject #UD when needed.

2010-03-15 Thread Gleb Natapov
If CR4.DE=1, accesses to registers DR4/DR5 cause #UD.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 836e97b..5afddcf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,9 +2531,12 @@ twobyte_insn:
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
@@ -2541,9 +2544,12 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x30:
-- 
1.6.5
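
The condition added by this patch can be sketched on its own, assuming the
caller already has the CR4 value and the debug register number. The function
and macro names are hypothetical; CR4.DE being bit 3 is the architectural
definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_DE (UINT64_C(1) << 3) /* Debugging Extensions bit in CR4 */

/* True if a mov to/from debug register `dr` must raise #UD. */
static bool dr_access_raises_ud(uint64_t cr4, unsigned int dr)
{
    /* With CR4.DE set, DR4 and DR5 are no longer aliases and fault. */
    return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```

With CR4.DE clear the same register numbers are accepted, so the check only
rejects DR4/DR5 when debugging extensions are enabled.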



[PATCH v3 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl

2010-03-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   15 ---
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..b048fd2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -110,6 +110,7 @@ struct x86_emulate_ops {
struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+   int (*cpl)(struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5e2fa61..8bd0557 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
int rc;
unsigned long val, change_mask;
int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+   int cpl = ops->cpl(ctxt->vcpu);
 
rc = emulate_pop(ctxt, ops, &val, len);
if (rc != X86EMUL_CONTINUE)
@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops)
 {
int iopl;
if (ctxt->mode == X86EMUL_MODE_REAL)
@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_VM86)
return true;
iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+   return ops->cpl(ctxt->vcpu) > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
 struct x86_emulate_ops *ops,
 u16 port, u16 len)
 {
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
return false;
return true;
@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
/* Privileged instruction can be executed only in CPL=0 */
-   if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+   if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
@@ -2378,7 +2379,7 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xfa: /* cli */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
ctxt->eflags &= ~X86_EFLAGS_IF;
@@ -2386,7 +2387,7 @@ special_insn:
}
break;
case 0xfb: /* sti */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b139334..3b6848e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3442,6 +3442,11 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
}
 }
 
+static int emulator_get_cpl(struct kvm_vcpu *vcpu)
+{
+   return kvm_x86_ops->get_cpl(vcpu);
+}
+
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.fetch   = kvm_fetch_guest_virt,
@@ -3450,6 +3455,7 @@ static struct x86_emulate_ops emulate_ops = {
.cmpxchg_emulated= emulator_cmpxchg_emulated,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .cpl = emulator_get_cpl,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)
-- 
1.6.5
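
The pattern used here, reaching host state only through a callback table, can
be sketched in isolation. This is a simplified illustration with hypothetical
names; it omits the real-mode and VM86 special cases handled in the patch and
keeps only the CPL versus IOPL comparison.

```c
#include <stdbool.h>

#define EFLAGS_IOPL_MASK  0x3000ul /* EFLAGS bits 12-13 hold IOPL */
#define EFLAGS_IOPL_SHIFT 12

/* Hypothetical ops table: the host supplies cpl(), the emulator calls it. */
struct emu_ops {
    int (*cpl)(void *vcpu);
};

/* True if the current privilege level is numerically above IOPL. */
static bool cpl_exceeds_iopl(const struct emu_ops *ops, void *vcpu,
                             unsigned long eflags)
{
    int iopl = (int)((eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT);

    return ops->cpl(vcpu) > iopl;
}

/* Example backend for experiments: always report CPL 3 (user mode). */
static int always_cpl3(void *vcpu)
{
    (void)vcpu;
    return 3;
}
```

Because the emulator side only sees `struct emu_ops`, a different host (or a
test harness such as `always_cpl3`) can be plugged in without touching the
emulation logic.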



[PATCH v3 16/30] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-15 Thread Gleb Natapov
If LOCK prefix is used dest arg should be memory, otherwise instruction
should generate #UD.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b89a8f2..46a7ee3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1842,7 +1842,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
/* LOCK prefix is allowed only with some instructions */
-   if (c->lock_prefix && !(c->d & Lock)) {
+   if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}
-- 
1.6.5



[PATCH v3 23/30] KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM

2010-03-15 Thread Gleb Natapov
Add decoding of X,Y parameters from Intel SDM which are used by string
instructions to specify source and destination. Use this new decoding
to implement movs, cmps, stos, lods in a generic way.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |  125 +---
 1 files changed, 44 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 55b8a8b..6ebd642 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -51,6 +51,7 @@
 #define DstReg  (2<<1) /* Register operand. */
 #define DstMem  (3<<1) /* Memory operand. */
 #define DstAcc  (4<<1)  /* Destination Accumulator */
+#define DstDI   (5<<1) /* Destination is in ES:(E)DI */
 #define DstMask (7<<1)
 /* Source operand type. */
 #define SrcNone (0<<4) /* No source operand. */
@@ -64,6 +65,7 @@
 #define SrcOne  (7<<4) /* Implied '1' */
 #define SrcImmUByte (8<<4)  /* 8-bit unsigned immediate operand. */
 #define SrcImmU (9<<4)  /* Immediate operand, unsigned */
+#define SrcSI   (0xa<<4)   /* Source is in the DS:RSI */
 #define SrcMask (0xf<<4)
 /* Generic ModRM decode. */
 #define ModRM   (1<<8)
@@ -177,12 +179,12 @@ static u32 opcode_table[256] = {
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
ByteOp | DstMem | SrcReg | Mov | MemAbs, DstMem | SrcReg | Mov | MemAbs,
-   ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | String, ImplicitOps | String,
+   ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String,
+   ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String,
/* 0xA8 - 0xAF */
-   0, 0, ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String,
-   ByteOp | ImplicitOps | String, ImplicitOps | String,
+   0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String,
+   ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String,
+   ByteOp | DstDI | String, DstDI | String,
/* 0xB0 - 0xB7 */
ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
ByteOp | DstReg | SrcImm | Mov, ByteOp | DstReg | SrcImm | Mov,
@@ -1145,6 +1147,14 @@ done_prefixes:
c->src.bytes = 1;
c->src.val = 1;
break;
+   case SrcSI:
+   c->src.type = OP_MEM;
+   c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->src.ptr = (unsigned long *)
+   register_address(c,  seg_override_base(ctxt, c),
+c->regs[VCPU_REGS_RSI]);
+   c->src.val = 0;
+   break;
}
 
/*
@@ -1230,6 +1240,14 @@ done_prefixes:
}
c->dst.orig_val = c->dst.val;
break;
+   case DstDI:
+   c->dst.type = OP_MEM;
+   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->dst.ptr = (unsigned long *)
+   register_address(c, es_base(ctxt),
+c->regs[VCPU_REGS_RDI]);
+   c->dst.val = 0;
+   break;
}
 
 done:
@@ -2388,6 +2406,16 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
+static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base,
+   int reg, unsigned long **ptr)
+{
+   struct decode_cache *c = &ctxt->decode;
+   int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;
+
+   register_address_increment(c, &c->regs[reg], df * c->src.bytes);
+   *ptr = (unsigned long *)register_address(c,  base, c->regs[reg]);
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
@@ -2750,89 +2778,16 @@ special_insn:
c->dst.val = (unsigned long)c->regs[VCPU_REGS_RAX];
break;
case 0xa4 ... 0xa5: /* movs */
-   c->dst.type = OP_MEM;
-   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
-   c->dst.ptr = (unsigned long *)register_address(c,
-  es_base(ctxt),
-  c->regs[VCPU_REGS_RDI]);
-   rc = ops->read_emulated(register_address(c,
-   seg_override_base(ctxt, c),
-   c->regs[VCPU_REGS_RSI]),
-   &c->dst.val,
-   c->dst.bytes, ctxt->vcpu);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
-   register_address_increment(c, &c->regs[VCPU_REGS_RSI],
-  (ctxt->eflags & EFLG_DF) ? -c->dst.bytes
-  

[PATCH v3 02/30] KVM: x86 emulator: fix RCX access during rep emulation

2010-03-15 Thread Gleb Natapov
During rep emulation, the access length to RCX depends on the current
address mode.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0b70a36..4dce805 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1852,7 +1852,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
if (c->rep_prefix && (c->d & String)) {
/* All REP prefixes have the same first termination condition */
-   if (c->regs[VCPU_REGS_RCX] == 0) {
+   if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
kvm_rip_write(ctxt->vcpu, c->eip);
goto done;
}
@@ -1876,7 +1876,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
}
-   c->regs[VCPU_REGS_RCX]--;
+   register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
c->eip = kvm_rip_read(ctxt->vcpu);
}
 
-- 
1.6.5
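
A minimal sketch of the idea, assuming the address size is given in bytes
(2, 4 or 8) and the counter is held in a 64-bit variable. The helper names
are hypothetical and only model the masking; they are not the emulator's
address_mask() or register_address_increment().

```c
#include <stdint.h>

/* Mask covering the low ad_bytes bytes of a 64-bit register. */
static uint64_t ad_mask(unsigned int ad_bytes)
{
    return ad_bytes >= 8 ? UINT64_MAX
                         : (UINT64_C(1) << (ad_bytes * 8)) - 1;
}

/* The rep count as seen under the current address size. */
static uint64_t masked_count(uint64_t rcx, unsigned int ad_bytes)
{
    return rcx & ad_mask(ad_bytes);
}

/* Decrement only the part of the register covered by the address size. */
static uint64_t decrement_count(uint64_t rcx, unsigned int ad_bytes)
{
    uint64_t mask = ad_mask(ad_bytes);

    return (rcx & ~mask) | ((rcx - 1) & mask);
}
```

For example, with a 16-bit address size a register value of 0x10000 counts as
zero, so the rep loop terminates even though the full register is non-zero.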



[PATCH v3 19/30] KVM: x86 emulator: Emulate task switch in emulator.c

2010-03-15 Thread Gleb Natapov
Implement emulation of 16/32 bit task switch in emulator.c

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |5 +
 arch/x86/kvm/emulate.c |  563 
 2 files changed, 568 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f901467..bd46929 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -11,6 +11,8 @@
 #ifndef _ASM_X86_KVM_X86_EMULATE_H
 #define _ASM_X86_KVM_X86_EMULATE_H
 
+#include 
+
 struct x86_emulate_ctxt;
 
 /*
@@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
 struct x86_emulate_ops *ops);
+int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 tss_selector, int reason);
 
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d696cbd..db4776c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -33,6 +33,7 @@
 #include 
 
 #include "x86.h"
+#include "tss.h"
 
 /*
  * Opcode effective-address decode tables.
@@ -1221,6 +1222,198 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static u32 desc_limit_scaled(struct desc_struct *desc)
+{
+   u32 limit = get_desc_limit(desc);
+
+   return desc->g ? (limit << 12) | 0xfff : limit;
+}
+
+static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 selector, struct desc_ptr *dt)
+{
+   if (selector & 1 << 2) {
+   struct desc_struct desc;
+   memset (dt, 0, sizeof *dt);
+   if (!ops->get_cached_descriptor(&desc, VCPU_SREG_LDTR, ctxt->vcpu))
+   return;
+
+   dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */
+   dt->address = get_desc_base(&desc);
+   } else
+   ops->get_gdt(dt, ctxt->vcpu);
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   int ret;
+   u32 err;
+   ulong addr;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+   addr = dt.address + index * 8;
+   ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+/* allowed just for 8 bytes segments */
+static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops,
+   u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   u32 err;
+   ulong addr;
+   int ret;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+
+   addr = dt.address + index * 8;
+   ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, int seg)
+{
+   struct desc_struct seg_desc;
+   u8 dpl, rpl, cpl;
+   unsigned err_vec = GP_VECTOR;
+   u32 err_code = 0;
+   bool null_selector = !(selector & ~0x3); /* -0003 are null */
+   int ret;
+
+   memset(&seg_desc, 0, sizeof seg_desc);
+
+   if ((seg <= VCPU_SREG_GS && ctxt->mode == X86EMUL_MODE_VM86)
+   || ctxt->mode == X86EMUL_MODE_REAL) {
+   /* set real mode segment descriptor */
+   set_desc_base(&seg_desc, selector << 4);
+   set_desc_limit(&seg_desc, 0x);
+   seg_desc.type = 3;
+   seg_desc.p = 1;
+   seg_desc.s = 1;
+   goto load;
+   }
+
+   /* NULL selector is not valid for TR, CS and SS */
+   if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR)
+   && null_selector)
+   goto exception;
+
+   /* TR should be in GDT

[PATCH v3 27/30] KVM: x86 emulator: remove saved_eip

2010-03-15 Thread Gleb Natapov
c->eip is never written back when emulation fails, so there is no need
to restore it to its old value.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1bedbb6..541f3c9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2420,7 +2420,6 @@ int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
u64 msr_data;
-   unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
int rc = X86EMUL_CONTINUE;
 
@@ -2432,7 +2431,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 */
 
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-   saved_eip = c->eip;
 
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2923,11 +2921,7 @@ writeback:
kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-   if (rc == X86EMUL_UNHANDLEABLE) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
switch (c->b) {
@@ -3204,6 +3198,5 @@ twobyte_insn:
 
 cannot_emulate:
DPRINTF("Cannot emulate %02x\n", c->b);
-   c->eip = saved_eip;
return -1;
 }
-- 
1.6.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 18/30] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-15 Thread Gleb Natapov
Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |   16 +
 arch/x86/kvm/x86.c |  130 +++
 2 files changed, 131 insertions(+), 15 deletions(-)
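
The x86.c part of this patch splits the guest-virtual write path into a
helper that takes an access mask, a CPL-based wrapper, and a "system"
wrapper that always passes 0. A minimal sketch of how the two masks
differ is given here. The function names guest_write_access() and
system_write_access() are hypothetical and exist only for illustration.
The PFERR_* values are assumed to follow the x86 page-fault error code
layout (bit 1 = write, bit 2 = user).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the x86 page-fault error code bits. */
#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_USER_MASK  (1u << 2)

/*
 * Hypothetical helper: access mask for a write performed on behalf of
 * guest code running at privilege level cpl.
 */
static uint32_t guest_write_access(int cpl)
{
    uint32_t access;

    assert(cpl >= 0 && cpl <= 3);
    access = (cpl == 3) ? PFERR_USER_MASK : 0;
    return access | PFERR_WRITE_MASK;
}

/*
 * Hypothetical helper: access mask for a system write, such as a
 * descriptor table update, which ignores the current CPL.
 */
static uint32_t system_write_access(void)
{
    return 0 | PFERR_WRITE_MASK;
}
```

With these definitions a CPL 3 write is checked as a user write, while
a descriptor write is checked as a supervisor write regardless of CPL.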

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0765725..f901467 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -63,6 +63,15 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
/*
+* write_std: Write bytes of standard (non-emulated/special) memory.
+*Used for descriptor writing.
+*  @addr:  [IN ] Linear address to which to write.
+*  @val:   [IN ] Value to write to memory.
+*  @bytes: [IN ] Number of bytes to write to memory.
+*/
+   int (*write_std)(unsigned long addr, void *val,
+unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+   /*
 * fetch: Read bytes of standard (non-emulated/special) memory.
 *Used for instruction fetch.
 *  @addr:  [IN ] Linear address from which to read.
@@ -108,6 +117,13 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
+   bool (*get_cached_descriptor)(struct desc_struct *desc,
+ int seg, struct kvm_vcpu *vcpu);
+   void (*set_cached_descriptor)(struct desc_struct *desc,
+ int seg, struct kvm_vcpu *vcpu);
+   u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
+   void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+   void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 022d28e..2ef83db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3050,6 +3050,18 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
 }
 
+static void kvm_set_segment(struct kvm_vcpu *vcpu,
+   struct kvm_segment *var, int seg)
+{
+   kvm_x86_ops->set_segment(vcpu, var, seg);
+}
+
+void kvm_get_segment(struct kvm_vcpu *vcpu,
+struct kvm_segment *var, int seg)
+{
+   kvm_x86_ops->get_segment(vcpu, var, seg);
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -3130,14 +3142,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }
 
-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-   struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 access,
+  u32 *error)
 {
void *data = val;
int r = X86EMUL_CONTINUE;
 
+   access |= PFERR_WRITE_MASK;
+
while (bytes) {
-   gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+   gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
unsigned offset = addr & (PAGE_SIZE-1);
unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
int ret;
@@ -3160,6 +3176,19 @@ out:
return r;
 }
 
+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+   struct kvm_vcpu *vcpu, u32 *error)
+{
+   u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 *error)
+{
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
 
 static int emulator_read_emulated(unsigned long addr,
  void *val,
@@ -3447,12 +3476,95 @@ static int emulator_get_cpl(struct kvm_vcpu *vcpu)
return kvm_x86_ops->get_cpl(vcpu);
 }
 
+static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
+{
+   kvm_x86_ops->get_g

[PATCH v3 26/30] KVM: x86 emulator: Move string pio emulation into emulator.c

2010-03-15 Thread Gleb Natapov
Currently string pio emulation is done outside of the emulator, so
things like ins/outs to/from MMIO are broken. It also makes it hard (if
not impossible) to implement single stepping in the future. The
implementation in this patch is not efficient, since it exits to
userspace for each IO, while the previous implementation did 'ins' in
batches. A later patch that implements pio read ahead for string
instructions addresses this problem.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |8 --
 arch/x86/kvm/emulate.c  |   48 +++--
 arch/x86/kvm/x86.c  |  204 +++
 3 files changed, 31 insertions(+), 229 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a4fb8d..c072401 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer {
 
 struct kvm_pio_request {
unsigned long count;
-   int cur_count;
-   gva_t guest_gva;
int in;
int port;
int size;
-   int string;
-   int down;
-   int rep;
 };
 
 /*
@@ -590,9 +585,6 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 struct x86_emulate_ctxt;
 
 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
-int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
-  int size, unsigned long count, int down,
-   gva_t address, int rep, unsigned port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 873da58..1bedbb6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -153,8 +153,8 @@ static u32 opcode_table[256] = {
0, 0, 0, 0,
/* 0x68 - 0x6F */
SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* insb, insw/insd */
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* outsb, outsw/outsd */
+   DstDI | ByteOp | Mov | String, DstDI | Mov | String, /* insb, insw/insd */
+   SrcSI | ByteOp | ImplicitOps | String, SrcSI | ImplicitOps | String, /* outsb, outsw/outsd */
/* 0x70 - 0x77 */
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
@@ -2611,47 +2611,29 @@ special_insn:
break;
case 0x6c:  /* insb */
case 0x6d:  /* insw/insd */
+   c->dst.bytes = min(c->dst.bytes, 4u);
if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
- (c->d & ByteOp) ? 1 : c->op_bytes)) {
+ c->dst.bytes)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   1,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-   register_address(c, es_base(ctxt),
-c->regs[VCPU_REGS_RDI]),
-   c->rep_prefix,
-   c->regs[VCPU_REGS_RDX]) == 0) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
+ &c->dst.val, 1, ctxt->vcpu))
+   goto done; /* IO is needed, skip writeback */
+   break;
case 0x6e:  /* outsb */
case 0x6f:  /* outsw/outsd */
+   c->src.bytes = min(c->src.bytes, 4u);
if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
- (c->d & ByteOp) ? 1 : c->op_bytes)) {
+ c->src.bytes)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   0,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-register_address(c,
- seg_override_base(ctxt, c),
-c->regs[VCPU_REGS_RSI]),
- 

[PATCH v3 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Gleb Natapov
Currently, when a string instruction is only partially complete, we go
back to guest mode, the guest tries to re-execute the instruction and
exits again, and only at that point does emulation continue. Avoid all
of this by restarting the instruction without going back to guest mode,
but return to guest mode every 1024 iterations to allow interrupt
injection. A pending exception causes immediate guest entry too.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   34 +++---
 arch/x86/kvm/x86.c |   19 ++-
 3 files changed, 42 insertions(+), 12 deletions(-)
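
The restart logic described above can be sketched in isolation. This is
a simplified model, not the kernel code: struct rep_ctxt,
emulate_one_iteration() and run_rep_insn() are hypothetical names, the
outer loop stands in for the caller in x86.c, and pending exceptions
are not modelled. Only the restart flag and the "RCX & 0x3ff" test are
taken from the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the emulator context. */
struct rep_ctxt {
    unsigned long rcx;  /* remaining iterations of the rep prefix */
    bool restart;       /* restart the insn without guest entry */
};

/* One emulated iteration of a rep-prefixed string instruction. */
static void emulate_one_iteration(struct rep_ctxt *ctxt)
{
    assert(ctxt);
    ctxt->restart = true;
    if (ctxt->rcx == 0) {           /* first termination condition */
        ctxt->restart = false;
        return;
    }
    ctxt->rcx--;                    /* writeback of one iteration */
    if (!(ctxt->rcx & 0x3ff))       /* every 1024 iterations */
        ctxt->restart = false;
}

/*
 * Hypothetical caller: keeps re-entering the emulator while restart is
 * set and counts how often control would return to the guest.
 */
static unsigned int run_rep_insn(struct rep_ctxt *ctxt)
{
    unsigned int guest_entries = 0;

    for (;;) {
        do {
            emulate_one_iteration(ctxt);
        } while (ctxt->restart);
        guest_entries++;            /* window for interrupt injection */
        if (ctxt->rcx == 0)
            break;
    }
    return guest_entries;
}
```

In this model a count of 3000 returns to the guest three times (at
RCX = 2048, 1024 and 0) instead of once per iteration.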

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 679245c..7fda16f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
+   bool restart; /* restart string instruction after writeback */
/* decode cache */
struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 541f3c9..c4da60e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -927,8 +927,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
int mode = ctxt->mode;
int def_op_bytes, def_ad_bytes, group;
 
-   /* Shadow copy of register state. Committed on successful emulation. */
 
+   /* we cannot decode insn before we complete previous rep insn */
+   WARN_ON(ctxt->restart);
+
+   /* Shadow copy of register state. Committed on successful emulation. */
memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
@@ -2422,6 +2425,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
u64 msr_data;
struct decode_cache *c = &ctxt->decode;
int rc = X86EMUL_CONTINUE;
+   int saved_dst_type = c->dst.type;
 
ctxt->interruptibility = 0;
 
@@ -2450,8 +2454,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
if (c->rep_prefix && (c->d & String)) {
+   ctxt->restart = true;
/* All REP prefixes have the same first termination condition */
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
+   string_done:
+   ctxt->restart = false;
kvm_rip_write(ctxt->vcpu, c->eip);
goto done;
}
@@ -2463,17 +2470,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 *  - if REPNE/REPNZ and ZF = 1 then done
 */
if ((c->b == 0xa6) || (c->b == 0xa7) ||
-   (c->b == 0xae) || (c->b == 0xaf)) {
+   (c->b == 0xae) || (c->b == 0xaf)) {
if ((c->rep_prefix == REPE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == 0)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == 0))
+   goto string_done;
if ((c->rep_prefix == REPNE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
+   goto string_done;
}
c->eip = ctxt->eip;
}
@@ -2906,6 +2909,12 @@ writeback:
if (rc != X86EMUL_CONTINUE)
goto done;
 
+   /*
+* restore dst type in case the decoding will be reused
+* (happens for string instruction )
+*/
+   c->dst.type = saved_dst_type;
+
if ((c->d & SrcMask) == SrcSI)
string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI,
&c->src);
@@ -2913,8 +2922,11 @@ writeback:
if ((c->d & DstMask) == DstDI)
string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
 
-   if (c->rep_prefix && (c->d & String))
+   if (c->rep_prefix && (c->d & String)) {
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
+   if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+   ctxt->restart = false;
+   }
 
/* Commit shadow register state. */
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8237ac..cd0043a 100644
--- a/arch/x86/kvm/

[PATCH v3 30/30] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

2010-03-15 Thread Gleb Natapov
Unify all conditions that get us back into the emulator after returning
from userspace.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |   32 ++--
 1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd0043a..1c00c06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4505,33 +4505,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (!irqchip_in_kernel(vcpu->kvm))
kvm_set_cr8(vcpu, kvm_run->cr8);
 
-   if (vcpu->arch.pio.count) {
-   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r == EMULATE_DO_MMIO) {
-   r = 0;
-   goto out;
+   if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+   vcpu->arch.emulate_ctxt.restart) {
+   if (vcpu->mmio_needed) {
+   memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+   vcpu->mmio_read_completed = 1;
+   vcpu->mmio_needed = 0;
}
-   }
-   if (vcpu->mmio_needed) {
-   memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-   vcpu->mmio_read_completed = 1;
-   vcpu->mmio_needed = 0;
-
-   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-   r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-   EMULTYPE_NO_DECODE);
-   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   if (r == EMULATE_DO_MMIO) {
-   /*
-* Read-modify-write.  Back to userspace.
-*/
-   r = 0;
-   goto out;
-   }
-   }
-   if (vcpu->arch.emulate_ctxt.restart) {
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-- 
1.6.5



[PATCH v3 21/30] KVM: Use task switch from emulator.c

2010-03-15 Thread Gleb Natapov
Remove old task switch code from x86.c

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |  557 ++--
 1 files changed, 17 insertions(+), 540 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ef83db..7d1b481 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4795,553 +4795,30 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return 0;
 }
 
-static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
-  struct kvm_segment *kvm_desct)
-{
-   kvm_desct->base = get_desc_base(seg_desc);
-   kvm_desct->limit = get_desc_limit(seg_desc);
-   if (seg_desc->g) {
-   kvm_desct->limit <<= 12;
-   kvm_desct->limit |= 0xfff;
-   }
-   kvm_desct->selector = selector;
-   kvm_desct->type = seg_desc->type;
-   kvm_desct->present = seg_desc->p;
-   kvm_desct->dpl = seg_desc->dpl;
-   kvm_desct->db = seg_desc->d;
-   kvm_desct->s = seg_desc->s;
-   kvm_desct->l = seg_desc->l;
-   kvm_desct->g = seg_desc->g;
-   kvm_desct->avl = seg_desc->avl;
-   if (!selector)
-   kvm_desct->unusable = 1;
-   else
-   kvm_desct->unusable = 0;
-   kvm_desct->padding = 0;
-}
-
-static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu,
- u16 selector,
- struct desc_ptr *dtable)
-{
-   if (selector & 1 << 2) {
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, &kvm_seg, VCPU_SREG_LDTR);
-
-   if (kvm_seg.unusable)
-   dtable->size = 0;
-   else
-   dtable->size = kvm_seg.limit;
-   dtable->address = kvm_seg.base;
-   }
-   else
-   kvm_x86_ops->get_gdt(vcpu, dtable);
-}
-
-/* allowed just for 8 bytes segments */
-static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector >> 3;
-   int ret;
-   u32 err;
-   gva_t addr;
-
-   get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-   if (dtable.size < index * 8 + 7) {
-   kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
-   return X86EMUL_PROPAGATE_FAULT;
-   }
-   addr = dtable.address + index * 8;
-   ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc),
-vcpu,  &err);
-   if (ret == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(vcpu, addr, err);
-
-   return ret;
-}
-
-/* allowed just for 8 bytes segments */
-static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector >> 3;
-
-   get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-   if (dtable.size < index * 8 + 7)
-   return 1;
-   return kvm_write_guest_virt(dtable.address + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL);
-}
-
-static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
-  struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
-}
-
-static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
-struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
-}
-
-static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
-{
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, &kvm_seg, seg);
-   return kvm_seg.selector;
-}
-
-static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-   struct kvm_segment segvar = {
-   .base = selector << 4,
-   .limit = 0xffff,
-   .selector = selector,
-   .type = 3,
-   .present = 1,
-   .dpl = 3,
-   .db = 0,
-   .s = 1,
-   .l = 0,
-   .g = 0,
-   .avl = 0,
-   .unusable = 0,
-   };
-   kvm_x86_ops->set_segment(vcpu, &segvar, seg);
-   return X86EMUL_CONTINUE;
-}
-
-static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg)
-{
-   return (seg != VCPU_SREG_LDTR) &&
-   (seg != VCPU_SREG_TR) &&
-   (kvm_get_rflags(vcpu) & X86_EFLAGS_VM);
-}
-
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-   struct kvm_segment kvm_seg;
-   struct desc_struct seg_desc;
-   u8 dpl, rpl, cpl;
-   unsigned err_vec = GP_VECTOR;
-   

[PATCH v3 24/30] KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded

2010-03-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6ebd642..a166235 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2407,13 +2407,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 }
 
 static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base,
-   int reg, unsigned long **ptr)
+   int reg, struct operand *op)
 {
struct decode_cache *c = &ctxt->decode;
int df = (ctxt->eflags & EFLG_DF) ? -1 : 1;
 
-   register_address_increment(c, &c->regs[reg], df * c->src.bytes);
-   *ptr = (unsigned long *)register_address(c,  base, c->regs[reg]);
+   register_address_increment(c, &c->regs[reg], df * op->bytes);
+   op->ptr = (unsigned long *)register_address(c,  base, c->regs[reg]);
 }
 
 int
@@ -2479,7 +2479,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
}
-   register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
c->eip = ctxt->eip;
}
 
@@ -2932,11 +2931,13 @@ writeback:
 
if ((c->d & SrcMask) == SrcSI)
string_addr_inc(ctxt, seg_override_base(ctxt, c), VCPU_REGS_RSI,
-   &c->src.ptr);
+   &c->src);
 
if ((c->d & DstMask) == DstDI)
-   string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI,
-   &c->dst.ptr);
+   string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
+
+   if (c->rep_prefix && (c->d & String))
+   register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
 
/* Commit shadow register state. */
memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
-- 
1.6.5



[PATCH v3 22/30] KVM: x86 emulator: populate OP_MEM operand during decoding.

2010-03-15 Thread Gleb Natapov
All struct operand fields are initialized during decoding for all
operand types except OP_MEM, but there is no reason for that. Move
OP_MEM operand initialization into the decoding stage for consistency.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   66 +---
 1 files changed, 29 insertions(+), 37 deletions(-)
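
One detail of the OP_MEM setup in the diff below is the BitOp case,
where the destination address is advanced by (src.val & mask) / 8 with
mask = ~(dst.bytes * 8 - 1). A small sketch of that arithmetic, under
the assumption that the bit index is non-negative (negative bit offsets
are not handled here), might look like this. The function names are
hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: byte offset added to the memory operand address
 * for a bit-test style instruction with an op_bytes-wide operand.
 */
static unsigned long bitop_byte_offset(unsigned long bit_index,
                                       unsigned int op_bytes)
{
    unsigned long mask;

    assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);
    mask = ~((unsigned long)op_bytes * 8 - 1);
    return (bit_index & mask) / 8;
}

/* Hypothetical helper: bit position inside the selected operand. */
static unsigned int bitop_bit_in_operand(unsigned long bit_index,
                                         unsigned int op_bytes)
{
    assert(op_bytes == 2 || op_bytes == 4 || op_bytes == 8);
    return (unsigned int)(bit_index & ((unsigned long)op_bytes * 8 - 1));
}
```

For a 4-byte operand and bit index 100, the operand moves 12 bytes
forward and bit 4 of that dword is the one being tested.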

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 702bfff..55b8a8b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1057,6 +1057,10 @@ done_prefixes:
 
if (c->ad_bytes != 8)
c->modrm_ea = (u32)c->modrm_ea;
+
+   if (c->rip_relative)
+   c->modrm_ea += c->eip;
+
/*
 * Decode and fetch the source operand: register, memory
 * or immediate.
@@ -1091,6 +1095,8 @@ done_prefixes:
break;
}
c->src.type = OP_MEM;
+   c->src.ptr = (unsigned long *)c->modrm_ea;
+   c->src.val = 0;
break;
case SrcImm:
case SrcImmU:
@@ -1169,8 +1175,10 @@ done_prefixes:
c->src2.val = 1;
break;
case Src2Mem16:
-   c->src2.bytes = 2;
c->src2.type = OP_MEM;
+   c->src2.bytes = 2;
+   c->src2.ptr = (unsigned long *)(c->modrm_ea + c->src.bytes);
+   c->src2.val = 0;
break;
}
 
@@ -1192,6 +1200,15 @@ done_prefixes:
break;
}
c->dst.type = OP_MEM;
+   c->dst.ptr = (unsigned long *)c->modrm_ea;
+   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+   c->dst.val = 0;
+   if (c->d & BitOp) {
+   unsigned long mask = ~(c->dst.bytes * 8 - 1);
+
+   c->dst.ptr = (void *)c->dst.ptr +
+  (c->src.val & mask) / 8;
+   }
break;
case DstAcc:
c->dst.type = OP_REG;
@@ -1215,9 +1232,6 @@ done_prefixes:
break;
}
 
-   if (c->rip_relative)
-   c->modrm_ea += c->eip;
-
 done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
@@ -1638,14 +1652,13 @@ static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
 }
 
 static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
-  struct x86_emulate_ops *ops,
-  unsigned long memop)
+  struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
u64 old, new;
int rc;
 
-   rc = ops->read_emulated(memop, &old, 8, ctxt->vcpu);
+   rc = ops->read_emulated(c->modrm_ea, &old, 8, ctxt->vcpu);
if (rc != X86EMUL_CONTINUE)
return rc;
 
@@ -1660,7 +1673,7 @@ static inline int emulate_grp9(struct x86_emulate_ctxt *ctxt,
new = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
   (u32) c->regs[VCPU_REGS_RBX];
 
-   rc = ops->cmpxchg_emulated(memop, &old, &new, 8, ctxt->vcpu);
+   rc = ops->cmpxchg_emulated(c->modrm_ea, &old, &new, 8, ctxt->vcpu);
if (rc != X86EMUL_CONTINUE)
return rc;
ctxt->eflags |= EFLG_ZF;
@@ -2378,7 +2391,6 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
-   unsigned long memop = 0;
u64 msr_data;
unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
@@ -2413,9 +2425,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
 
-   if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs))
-   memop = c->modrm_ea;
-
if (c->rep_prefix && (c->d & String)) {
/* All REP prefixes have the same first termination condition */
if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
@@ -2447,8 +2456,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
if (c->src.type == OP_MEM) {
-   c->src.ptr = (unsigned long *)memop;
-   c->src.val = 0;
rc = ops->read_emulated((unsigned long)c->src.ptr,
&c->src.val,
c->src.bytes,
@@ -2459,8 +2466,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
if (c->src2.type == OP_MEM) {
-   c->src2.ptr = (unsigned long *)(memop + c->src.bytes);
-   c->src2.val = 0;
rc = ops->read_emulated((unsigned long)c->src2.ptr,
&c->src2.val,
c->src2.bytes,
@@ -2473,25 +2478,12 @@ 

[PATCH v3 29/30] KVM: x86 emulator: introduce pio in string read ahead.

2010-03-15 Thread Gleb Natapov
To optimize the "rep ins" instruction, do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |7 ++
 arch/x86/kvm/emulate.c |   43 +++
 2 files changed, 45 insertions(+), 5 deletions(-)
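
The read-ahead idea can be sketched outside the kernel as follows.
This is a simplified model under stated assumptions: backend_pio_in()
and backend_calls are hypothetical stand-ins for the device callback,
the direction flag is reduced to a "room" argument giving the bytes
left in the current page, and the guard that forces at least one item
per refill is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <string.h>

#define CACHE_BYTES 1024

struct read_cache {
    unsigned char data[CACHE_BYTES];
    unsigned long pos;  /* next unread byte */
    unsigned long end;  /* number of valid bytes */
};

/* Hypothetical device backend, counts how often it is called. */
static unsigned int backend_calls;

static int backend_pio_in(unsigned int size, void *buf, unsigned int count)
{
    backend_calls++;
    memset(buf, 0xab, (size_t)size * count);
    return 1;
}

/*
 * Read one item of "size" bytes into dest. When the cache is empty,
 * refill it with as many items as fit in the cache, in the current
 * page ("room" bytes) and in the remaining iteration count.
 */
static int cached_pio_in(struct read_cache *rc, unsigned int size,
                         unsigned int remaining, unsigned int room,
                         void *dest)
{
    assert(size == 1 || size == 2 || size == 4);

    if (rc->pos == rc->end) {       /* refill pio read ahead */
        unsigned int n = (room < CACHE_BYTES ? room : CACHE_BYTES) / size;

        if (n > remaining)
            n = remaining;
        if (n == 0)
            n = 1;                  /* sketch-only guard */
        rc->pos = rc->end = 0;
        if (!backend_pio_in(size, rc->data, n))
            return 0;
        rc->end = (unsigned long)n * size;
    }

    memcpy(dest, rc->data + rc->pos, size);
    rc->pos += size;
    return 1;
}
```

In this model only the first iteration of a "rep ins" reaches the
device. The following ones are served from the buffer until it runs
dry.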

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7fda16f..b5e12c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -151,6 +151,12 @@ struct fetch_cache {
unsigned long end;
 };
 
+struct read_cache {
+   u8 data[1024];
+   unsigned long pos;
+   unsigned long end;
+};
+
 struct decode_cache {
u8 twobyte;
u8 b;
@@ -178,6 +184,7 @@ struct decode_cache {
void *modrm_ptr;
unsigned long modrm_val;
struct fetch_cache fetch;
+   struct read_cache io_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c4da60e..d9cf93b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,6 +1257,34 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  unsigned int size, unsigned short port,
+  void *dest)
+{
+   struct read_cache *rc = &ctxt->decode.io_read;
+
+   if (rc->pos == rc->end) { /* refill pio read ahead */
+   struct decode_cache *c = &ctxt->decode;
+   unsigned int in_page, n;
+   unsigned int count = c->rep_prefix ?
+   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1;
+   in_page = (ctxt->eflags & EFLG_DF) ?
+   offset_in_page(c->regs[VCPU_REGS_RDI]) :
+   PAGE_SIZE - offset_in_page(c->regs[VCPU_REGS_RDI]);
+   n = min(min(in_page, (unsigned int)sizeof(rc->data)) / size,
+   count);
+   rc->pos = rc->end = 0;
+   if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu))
+   return 0;
+   rc->end = n * size;
+   }
+
+   memcpy(dest, rc->data + rc->pos, size);
+   rc->pos += size;
+   return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
u32 limit = get_desc_limit(desc);
@@ -2618,8 +2646,8 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
- &c->dst.val, 1, ctxt->vcpu))
+   if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+c->regs[VCPU_REGS_RDX], &c->dst.val))
goto done; /* IO is needed, skip writeback */
break;
case 0x6e:  /* outsb */
@@ -2835,8 +2863,7 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   ops->pio_in_emulated(c->dst.bytes, c->src.val, &c->dst.val, 1,
-ctxt->vcpu);
+   pio_in_emulated(ctxt, ops, c->dst.bytes, c->src.val, &c->dst.val);
break;
case 0xee: /* out al,dx */
case 0xef: /* out (e/r)ax,dx */
@@ -2923,8 +2950,14 @@ writeback:
string_addr_inc(ctxt, es_base(ctxt), VCPU_REGS_RDI, &c->dst);
 
if (c->rep_prefix && (c->d & String)) {
+   struct read_cache *rc = &ctxt->decode.io_read;
register_address_increment(c, &c->regs[VCPU_REGS_RCX], -1);
-   if (!(c->regs[VCPU_REGS_RCX] & 0x3ff))
+   /*
+* Re-enter guest when pio read ahead buffer is empty or,
+* if it is not used, after each 1024 iteration.
+*/
+   if ((rc->end == 0 && !(c->regs[VCPU_REGS_RCX] & 0x3ff)) ||
+   (rc->end != 0 && rc->end == rc->pos))
ctxt->restart = false;
}
 
-- 
1.6.5



[PATCH v3 17/30] KVM: x86 emulator: cleanup grp3 return value

2010-03-15 Thread Gleb Natapov
When x86_emulate_insn() does not know how to emulate an instruction, it
exits via the cannot_emulate label in all cases except when emulating
grp3. Fix that.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 46a7ee3..d696cbd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1397,7 +1397,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
-   int rc = X86EMUL_CONTINUE;
 
switch (c->modrm_reg) {
case 0 ... 1:   /* test */
@@ -1410,11 +1409,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
emulate_1op("neg", c->dst, ctxt->eflags);
break;
default:
-   DPRINTF("Cannot emulate %02x\n", c->b);
-   rc = X86EMUL_UNHANDLEABLE;
-   break;
+   return 0;
}
-   return rc;
+   return 1;
 }
 
 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2374,9 +2371,8 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xf6 ... 0xf7: /* Grp3 */
-   rc = emulate_grp3(ctxt, ops);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
+   if (!emulate_grp3(ctxt, ops))
+   goto cannot_emulate;
break;
case 0xf8: /* clc */
ctxt->eflags &= ~EFLG_CF;
-- 
1.6.5



[PATCH v3 25/30] KVM: x86 emulator: fix in/out emulation.

2010-03-15 Thread Gleb Natapov
in/out emulation is currently broken. The breakage differs depending
on where the IO device resides. If it is in userspace, the emulator
reports an emulation failure because it misinterprets the
kvm_emulate_pio() return value. If the IO device is in the kernel,
emulation of 'in' does nothing: kvm_emulate_pio() stores the result
directly into the vcpu registers, so the emulator overwrites it when it
commits the shadowed registers.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |7 +
 arch/x86/include/asm/kvm_host.h|3 +-
 arch/x86/kvm/emulate.c |   49 -
 arch/x86/kvm/svm.c |   20 +--
 arch/x86/kvm/vmx.c |   18 ++--
 arch/x86/kvm/x86.c |  213 ++--
 6 files changed, 177 insertions(+), 133 deletions(-)
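
The in-kernel failure mode described above comes down to the order of
two copies. The following sketch models it with a two-register vcpu.
struct vcpu, in_via_vcpu_regs() and in_via_shadow() are hypothetical
names used only to illustrate the shadow-register commit. They are not
functions from the patch.

```c
#include <assert.h>
#include <string.h>

#define NR_REGS 2
enum { REG_RAX, REG_RDX };

/* Hypothetical vcpu with just two registers. */
struct vcpu {
    unsigned long regs[NR_REGS];
};

/*
 * Old behaviour as described in the commit message: the device result
 * goes straight into the vcpu registers and is then overwritten when
 * the emulator commits its shadow copy.
 */
static unsigned long in_via_vcpu_regs(struct vcpu *v, unsigned long dev_val)
{
    unsigned long shadow[NR_REGS];

    assert(v);
    memcpy(shadow, v->regs, sizeof shadow);   /* take shadow copy */
    v->regs[REG_RAX] = dev_val;               /* device bypasses shadow */
    memcpy(v->regs, shadow, sizeof shadow);   /* commit: result lost */
    return v->regs[REG_RAX];
}

/*
 * Fixed ordering: the result is written to the shadow copy (the
 * destination operand) and therefore survives the commit.
 */
static unsigned long in_via_shadow(struct vcpu *v, unsigned long dev_val)
{
    unsigned long shadow[NR_REGS];

    assert(v);
    memcpy(shadow, v->regs, sizeof shadow);
    shadow[REG_RAX] = dev_val;
    memcpy(v->regs, shadow, sizeof shadow);
    return v->regs[REG_RAX];
}
```

This is why the opcode table entries for in/out gain DstAcc in the diff
below: the value read from the port travels through the destination
operand and is written back together with the other shadowed state.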

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index bd46929..679245c 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -119,6 +119,13 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
+
+   int (*pio_in_emulated)(int size, unsigned short port, void *val,
+  unsigned int count, struct kvm_vcpu *vcpu);
+
+   int (*pio_out_emulated)(int size, unsigned short port, const void *val,
+   unsigned int count, struct kvm_vcpu *vcpu);
+
bool (*get_cached_descriptor)(struct desc_struct *desc,
  int seg, struct kvm_vcpu *vcpu);
void (*set_cached_descriptor)(struct desc_struct *desc,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 72997aa..4a4fb8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -589,8 +589,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 
data);
 
 struct x86_emulate_ctxt;
 
-int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in,
-int size, unsigned port);
+int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
 int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
   int size, unsigned long count, int down,
gva_t address, int rep, unsigned port);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a166235..873da58 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -210,13 +210,13 @@ static u32 opcode_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0,
/* 0xE0 - 0xE7 */
0, 0, 0, 0,
-   ByteOp | SrcImmUByte, SrcImmUByte,
-   ByteOp | SrcImmUByte, SrcImmUByte,
+   ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
+   ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
+   SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
+   SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
/* 0xF0 - 0xF7 */
0, 0, 0, 0,
ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3,
@@ -2422,8 +2422,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
u64 msr_data;
unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
-   unsigned int port;
-   int io_dir_in;
int rc = X86EMUL_CONTINUE;
 
ctxt->interruptibility = 0;
@@ -2819,14 +2817,10 @@ special_insn:
break;
case 0xe4:  /* inb */
case 0xe5:  /* in */
-   port = c->src.val;
-   io_dir_in = 1;
-   goto do_io;
+   goto do_io_in;
case 0xe6: /* outb */
case 0xe7: /* out */
-   port = c->src.val;
-   io_dir_in = 0;
-   goto do_io;
+   goto do_io_out;
case 0xe8: /* call (near) */ {
long int rel = c->src.val;
c->src.val = (unsigned long) c->eip;
@@ -2851,25 +2845,28 @@ special_insn:
break;
case 0xec: /* in al,dx */
case 0xed: /* in (e/r)ax,dx */
-   port = c->regs[VCPU_REGS_RDX];
-   io_dir_in = 1;
-   goto do_io;
+   c->src.val = c->regs[VCPU_REGS_RDX];
+   do_io_in:
+   c->dst.bytes = min(c->dst.bytes, 4u);
+   if (!emulator_io_permited(ctxt, ops, c->src.val, c->dst.bytes)) 
{
+   kvm_inject_gp(ctxt->vcpu, 0);
+   goto done;
+   }
+   ops->pio_in_emulated(c->dst.bytes, c->src.val, &c->dst.val, 1,
+ctxt->vcpu);
+ 

[PATCH v3 05/30] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-15 Thread Gleb Natapov
Use this callback instead of calling the kvm function directly. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr, since the functions have
nothing to do with real mode.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|2 -
 arch/x86/kvm/emulate.c |7 +-
 arch/x86/kvm/x86.c |  114 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 2666d7a..0c5caa4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -108,7 +108,8 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
-
+   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
+   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8567107..9725856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, 
unsigned long address);
 void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 91450b5..5b060e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2483,7 +2483,7 @@ twobyte_insn:
break;
case 4: /* smsw */
c->dst.bytes = 2;
-   c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
+   c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
@@ -2519,8 +2519,7 @@ twobyte_insn:
case 0x20: /* mov cr, reg */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   c->regs[c->modrm_rm] =
-   realmode_get_cr(ctxt->vcpu, c->modrm_reg);
+   c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
@@ -2534,7 +2533,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
+   ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56cdaa5..fb00ed5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3386,12 +3386,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu 
*vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
+static u64 mk_cr_64(u64 curr_cr, u32 new_val)
+{
+   return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
+}
+
+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
+{
+   unsigned long value;
+
+   switch (cr) {
+   case 0:
+   value = kvm_read_cr0(vcpu);
+   break;
+   case 2:
+   value = vcpu->arch.cr2;
+   break;
+   case 3:
+   value = vcpu->arch.cr3;
+   break;
+   case 4:
+   value = kvm_read_cr4(vcpu);
+   break;
+   case 8:
+   value = kvm_get_cr8(vcpu);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   return 0;
+   }
+
+   return value;
+}
+
+static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
+{
+   switch (cr) {
+   case 0:
+   kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
+   break;
+   case 2:
+   vcpu->arch.cr2 = val;
+   break;
+   case 3:
+   kvm_set_cr3(vcpu, val);
+   break;
+   case 4:
+   kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val));
+   break;
+   case 8:
+   kvm_set_cr8(vcpu, val & 0xfUL);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   }
+}
+
 static struct x86_emulate_ops emulate_
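
Note on mk_cr_64() from the hunk above: the helper keeps the upper 32 bits
of the current register value and replaces the lower 32 bits with the new
value. A minimal stand-alone sketch follows, assuming only standard C
fixed-width types in place of the kernel's u64/u32; it is an illustration,
not the kernel source.

```c
#include <stdint.h>

/*
 * Same expression as mk_cr_64() in the patch, written with <stdint.h>
 * types (an assumption made so the sketch builds outside the kernel).
 * Upper 32 bits come from curr_cr, lower 32 bits from new_val.
 */
static uint64_t mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
	return (curr_cr & ~((UINT64_C(1) << 32) - 1)) | new_val;
}
```

Because the second parameter is 32 bits wide, any upper bits in the value a
caller passes are dropped before the merge.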

[PATCH v3 14/30] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations

2010-03-15 Thread Gleb Natapov
Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD
for those instructions when appropriate.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5afddcf..1393bf0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1600,8 +1600,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
u64 msr_data;
 
/* syscall is not available in real mode */
-   if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_REAL ||
+   ctxt->mode == X86EMUL_MODE_VM86) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1651,14 +1654,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
/* inject #GP if in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
/* XXX sysenter/sysexit have not been tested in 64bit mode.
* Therefore, we inject an #UD.
*/
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1713,7 +1718,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_REAL ||
ctxt->mode == X86EMUL_MODE_VM86) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
setup_syscalls_segments(ctxt, &cs, &ss);
-- 
1.6.5



[PATCH v3 12/30] KVM: x86 emulator: inject #UD on access to non-existing CR

2010-03-15 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fa4604e..836e97b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,6 +2520,13 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
+   switch (c->modrm_reg) {
+   case 1:
+   case 5 ... 7:
+   case 9 ... 15:
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
-- 
1.6.5
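
Note: the set of control registers the patch above rejects (CR1, CR5-CR7,
CR9-CR15) can be restated in portable C without the GCC case-range
extension. The sketch below is illustrative only; cr_is_valid() is a
hypothetical helper name that does not appear in the patch or in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true for the control registers the patch lets
 * through (CR0, CR2, CR3, CR4, CR8), false for the ones on which it
 * queues #UD (CR1, CR5-CR7, CR9-CR15).
 */
static bool cr_is_valid(unsigned int cr)
{
	assert(cr <= 15);	/* the sketch only covers register numbers 0..15 */

	switch (cr) {
	case 0:
	case 2:
	case 3:
	case 4:
	case 8:
		return true;
	default:
		return false;
	}
}
```

A caller would queue the exception and stop emulation when the helper
returns false, as the added switch in the patch does.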



[PATCH v3 11/30] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.

2010-03-15 Thread Gleb Natapov
A recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough, an older spec says that 11 is the only
valid encoding.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |8 
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c7debb..fa4604e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2520,28 +2520,20 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;
-- 
1.6.5



[PATCH v3 06/30] KVM: remove realmode_lmsw function.

2010-03-15 Thread Gleb Natapov
Use (get|set)_cr callback to emulate lmsw inside emulator.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/emulate.c  |4 ++--
 arch/x86/kvm/x86.c  |7 ---
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9725856..72997aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5b060e4..5e2fa61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,8 +2486,8 @@ twobyte_insn:
c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
-   realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
- &ctxt->eflags);
+   ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+   (c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb00ed5..b139334 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,13 +4061,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, 
unsigned long base)
kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags)
-{
-   kvm_lmsw(vcpu, msw);
-   *rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
-- 
1.6.5
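
Note: the lmsw replacement in the patch above merges the low four bits of
the source operand into CR0 and leaves every other CR0 bit unchanged. A
minimal stand-alone sketch of that merge follows; lmsw_merge() is a
hypothetical name used only for illustration, and plain integers stand in
for the get_cr/set_cr callbacks.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression in the patch:
 *   (cr0 & ~0x0f) | (src & 0x0f)
 * Bits 0-3 come from src, all remaining bits from the current cr0.
 */
static uint64_t lmsw_merge(uint64_t cr0, uint64_t src)
{
	return (cr0 & ~UINT64_C(0x0f)) | (src & 0x0f);
}
```

In the patch the result of this expression is handed to ops->set_cr(0, ...).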



[PATCH v3 09/30] KVM: x86 emulator: fix mov r/m, sreg emulation.

2010-03-15 Thread Gleb Natapov
mov r/m, sreg generates #UD if sreg is incorrect.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2c27aa4..c3b9334 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2126,12 +2126,11 @@ special_insn:
case 0x8c: { /* mov r/m, sreg */
struct kvm_segment segreg;
 
-   if (c->modrm_reg <= 5)
+   if (c->modrm_reg <= VCPU_SREG_GS)
kvm_get_segment(ctxt->vcpu, &segreg, c->modrm_reg);
else {
-   printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 
0x%02x\n",
-  c->modrm);
-   goto cannot_emulate;
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
}
c->dst.val = segreg.selector;
break;
-- 
1.6.5



[PATCH v3 10/30] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-15 Thread Gleb Natapov
It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c3b9334..7c7debb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2490,6 +2490,9 @@ twobyte_insn:
(c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
+   case 5: /* not defined */
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
case 7: /* invlpg*/
emulate_invlpg(ctxt->vcpu, memop);
/* Disable writeback. */
-- 
1.6.5



[PATCH v3 00/30] emulator cleanup

2010-03-15 Thread Gleb Natapov
This is the first series of patches that tries to clean up the emulator code.
It is a mix of bug fixes and moving code that does emulation from x86.c
to emulator.c while making it KVM independent. The status of the patches:
works for me. The realtime.flat test now also passes where it failed before.

ChangeLog:

v1->v2:
  - A couple of new bugs fixed
  - cpl is now x86_emulator_ops callback
  - during string instruction re-enter guest on each page boundary
  - retain fast path for pio out (do not go through emulator)
v2->v3:
  - use correct operand length for pio instruction with REX prefix
  - check for string instruction before decrementing ecx
  - change guest re-entry condition for string instruction

Gleb Natapov (30):
  KVM: x86 emulator: Fix DstAcc decoding.
  KVM: x86 emulator: fix RCX access during rep emulation
  KVM: x86 emulator: check return value against correct define
  KVM: Remove pointer to rflags from realmode_set_cr parameters.
  KVM: Provide callback to get/set control registers in emulator ops.
  KVM: remove realmode_lmsw function.
  KVM: Provide x86_emulate_ctxt callback to get current cpl
  KVM: Provide current eip as part of emulator context.
  KVM: x86 emulator: fix mov r/m, sreg emulation.
  KVM: x86 emulator: fix 0f 01 /5 emulation
  KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
  KVM: x86 emulator: inject #UD on access to non-existing CR
  KVM: x86 emulator: fix mov dr to inject #UD when needed.
  KVM: x86 emulator: fix return values of syscall/sysenter/sysexit
emulations
  KVM: x86 emulator: do not call writeback if msr access fails.
  KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
  KVM: x86 emulator: cleanup grp3 return value
  KVM: x86 emulator: Provide more callbacks for x86 emulator.
  KVM: x86 emulator: Emulate task switch in emulator.c
  KVM: x86 emulator: Use load_segment_descriptor() instead of
kvm_load_segment_descriptor()
  KVM: Use task switch from emulator.c
  KVM: x86 emulator: populate OP_MEM operand during decoding.
  KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
  KVM: x86 emulator: during rep emulation decrement ECX only if
emulation succeeded
  KVM: x86 emulator: fix in/out emulation.
  KVM: x86 emulator: Move string pio emulation into emulator.c
  KVM: x86 emulator: remove saved_eip
  KVM: x86 emulator: restart string instruction without going back to a
guest.
  KVM: x86 emulator: introduce pio in string read ahead.
  KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

 arch/x86/include/asm/kvm_emulate.h |   41 ++-
 arch/x86/include/asm/kvm_host.h|   16 +-
 arch/x86/kvm/emulate.c | 1062 ++-
 arch/x86/kvm/svm.c |   20 +-
 arch/x86/kvm/vmx.c |   18 +-
 arch/x86/kvm/x86.c | 1121 +---
 6 files changed, 1146 insertions(+), 1132 deletions(-)



[PATCH v3 01/30] KVM: x86 emulator: Fix DstAcc decoding.

2010-03-15 Thread Gleb Natapov
Set correct operation length. Add RAX (64bit) handling.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2832a8c..0b70a36 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1194,9 +1194,9 @@ done_prefixes:
break;
case DstAcc:
c->dst.type = OP_REG;
-   c->dst.bytes = c->op_bytes;
+   c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
c->dst.ptr = &c->regs[VCPU_REGS_RAX];
-   switch (c->op_bytes) {
+   switch (c->dst.bytes) {
case 1:
c->dst.val = *(u8 *)c->dst.ptr;
break;
@@ -1206,6 +1206,9 @@ done_prefixes:
case 4:
c->dst.val = *(u32 *)c->dst.ptr;
break;
+   case 8:
+   c->dst.val = *(u64 *)c->dst.ptr;
+   break;
}
c->dst.orig_val = c->dst.val;
break;
-- 
1.6.5
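
Note: the DstAcc fix above does two things: it picks the operand size from
the ByteOp flag, and it adds the missing 8-byte read of RAX. A stand-alone
sketch of both steps follows. The function names are hypothetical, and
truncating casts are used instead of the pointer casts in the patch so the
sketch does not depend on host byte order.

```c
#include <stdint.h>

/* Operand size selection as in the patch: ByteOp forces a 1-byte operand. */
static unsigned int dst_acc_bytes(int byte_op, unsigned int op_bytes)
{
	return byte_op ? 1 : op_bytes;
}

/*
 * Hypothetical helper: return the low `bytes` bytes of the accumulator.
 * The 8-byte case is the one the patch adds for 64-bit operands.
 */
static uint64_t read_acc(uint64_t rax, unsigned int bytes)
{
	switch (bytes) {
	case 1:
		return (uint8_t)rax;
	case 2:
		return (uint16_t)rax;
	case 4:
		return (uint32_t)rax;
	case 8:
		return rax;
	default:
		return 0;	/* not a valid operand size in this sketch */
	}
}
```

Without the 8-byte case, a 64-bit accumulator operand would fall through
the switch and leave the destination value unset.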



[PATCH v3 03/30] KVM: x86 emulator: check return value against correct define

2010-03-15 Thread Gleb Natapov
Check the return value against the correct define instead of open-coding
the value.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4dce805..670ca8f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -566,7 +566,7 @@ static u32 group2_table[] = {
 #define insn_fetch(_type, _size, _eip)  \
 ({ unsigned long _x;   \
rc = do_insn_fetch(ctxt, ops, (_eip), &_x, (_size));\
-   if (rc != 0)\
+   if (rc != X86EMUL_CONTINUE) \
goto done;  \
(_eip) += (_size);  \
(_type)_x;  \
-- 
1.6.5



[PATCH v3 04/30] KVM: Remove pointer to rflags from realmode_set_cr parameters.

2010-03-15 Thread Gleb Natapov
The mov reg, cr instruction doesn't change flags in any meaningful way, so
there is no need to update rflags after instruction execution.

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |3 +--
 arch/x86/kvm/emulate.c  |3 +--
 arch/x86/kvm/x86.c  |4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..8567107 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -586,8 +586,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 670ca8f..91450b5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2534,8 +2534,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu,
-   c->modrm_reg, c->modrm_val, &ctxt->eflags);
+   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d02cc7..56cdaa5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4043,13 +4043,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, 
int cr)
return value;
 }
 
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
switch (cr) {
case 0:
kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-   *rflags = kvm_get_rflags(vcpu);
break;
case 2:
vcpu->arch.cr2 = val;
-- 
1.6.5



Re: Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.

2010-03-15 Thread Daniel P. Berrange
On Mon, Mar 15, 2010 at 08:59:10AM -0500, Anthony Liguori wrote:
> On 03/15/2010 08:46 AM, Espen Berg wrote:
> >In our KVM system we have two iSCSI backends (master/slave
> >configuration) with failover and two KVM hosts supporting live migration.
> >
> >The iSCSI volumes are shared by the host as a block device in KVM, and
> >the volumes are available on both frontends.  After a reboot one of the
> >KVMs was not able to start again due to file system corruption.  We
> >use XFS and have trouble understanding what caused the corruption.
> >
> >We have ruled out the iSCSI backend as both the master and slave data
> >were consistent at the time.
> >
> >Has anyone else had similar problems?  What is the recommended way to
> >share an iSCSI drive among the two host machines?
> >
> >Should XFS be ok as a file system for live migration?  I'm not able to
> >find any documentation stating that a clustered file system (GFS2 etc.)
> >is recommended.  Are there any concurrent writes on the two host
> >machines during a live migration?
> >
> 
> You need to use cache=off if you've got one iscsi drive mounted on two 
> separate physical machines.

FYI, this can be done by changing the disk XML driver

  

to be

  

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.

2010-03-15 Thread Anthony Liguori

On 03/15/2010 08:46 AM, Espen Berg wrote:

In our KVM system we have two iSCSI backends (master/slave
configuration) with failover and two KVM hosts supporting live migration.

The iSCSI volumes are shared by the host as a block device in KVM, and
the volumes are available on both frontends.  After a reboot one of the
KVMs was not able to start again due to file system corruption.  We
use XFS and have trouble understanding what caused the corruption.

We have ruled out the iSCSI backend as both the master and slave data
were consistent at the time.

Has anyone else had similar problems?  What is the recommended way to
share an iSCSI drive among the two host machines?

Should XFS be ok as a file system for live migration?  I'm not able to
find any documentation stating that a clustered file system (GFS2 etc.)
is recommended.  Are there any concurrent writes on the two host
machines during a live migration?









You need to use cache=off if you've got one iscsi drive mounted on two 
separate physical machines.


The additional layer of caching will result in inconsistency because 
iSCSI doesn't have a mechanism to provide cache coherence between two nodes.


Regards,

Anthony Liguori


#virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

#uname -a
Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64
GNU/Linux

Regards
Espen





Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Anthony Liguori

On 03/15/2010 08:24 AM, Joerg Roedel wrote:

On Mon, Mar 15, 2010 at 03:11:42PM +0200, Avi Kivity wrote:
   

On 03/15/2010 03:03 PM, Joerg Roedel wrote:
 
   

I will add another project - iommu emulation.  Could be very useful
for doing device assignment to nested guests, which could make
testing a lot easier.

   

Our experiments show that nested device assignment is pretty much
required for I/O performance in nested scenarios.

 

Really? I did a small test with virtio-blk in a nested guest (disk read
with dd, so not a real benchmark) and got a reasonable read-performance
of around 25MB/s from the disk in the l2-guest.


   

Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.

I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do
for other guests.
 

Does it matter for the ept-on-ept case? The initial patchset of
nested-vmx implemented it and they reported a performance drop of around
12% between levels which is reasonable. So I expected the loss of
io-performance for l2 also reasonable in this case. My small measurement
was also done using npt-on-npt.
   


But that was something like kernbench IIRC which is actually exit light 
once ept is enabled.


Network IO is typically exit heavy and becomes something more of a 
pathological work load (both for nested ept and nested npt).


Regards,

Anthony Liguori


Joerg



Re: how to tweak kernel to get the best out of kvm?

2010-03-15 Thread Harald Dunkel
On 03/13/10 09:54, Avi Kivity wrote:
> 
> If the slowdown is indeed due to I/O, LVM (with cache=off) should
> eliminate it completely.
> 
As promised I have installed LVM: The difference is remarkable.
My test case (running 8 vhosts in parallel, each building a Linux
kernel) just works. There is no blocking job (by now), all
vhosts can be pinged, great.

Many thanx for your help, and for the nice software, of course.


Regards

Harri


Fwd: Corrupted filesystem, possible after livemigration with iSCSI storagebackend.

2010-03-15 Thread Espen Berg

In our KVM system we have two iSCSI backends (master/slave
configuration) with failover and two KVM hosts supporting live migration.

The iSCSI volumes are shared by the host as a block device in KVM, and
the volumes are available on both frontends.  After a reboot one of the
KVMs was not able to start again due to file system corruption.  We
use XFS and have trouble understanding what caused the corruption.

We have ruled out the iSCSI backend as both the master and slave data
were consistent at the time.

Has anyone else had similar problems?  What is the recommended way to
share an iSCSI drive among the two host machines?

Should XFS be ok as a file system for live migration?  I'm not able to
find any documentation stating that a clustered file system (GFS2 etc.)
is recommended.  Are there any concurrent writes on the two host
machines during a live migration?

 
  
  
  
  
 

#virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

#uname -a
Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64
GNU/Linux

Regards
Espen





Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Joerg Roedel
On Mon, Mar 15, 2010 at 03:11:42PM +0200, Avi Kivity wrote:
> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>
>>>> I will add another project - iommu emulation.  Could be very useful
>>>> for doing device assignment to nested guests, which could make
>>>> testing a lot easier.
>>>>
>>> Our experiments show that nested device assignment is pretty much
>>> required for I/O performance in nested scenarios.
>>>  
>> Really? I did a small test with virtio-blk in a nested guest (disk read
>> with dd, so not a real benchmark) and got a reasonable read-performance
>> of around 25MB/s from the disk in the l2-guest.
>>
>>
>
> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>
> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do  
> for other guests.

Does it matter for the ept-on-ept case? The initial patchset of
nested-vmx implemented it and they reported a performance drop of around
12% between levels which is reasonable. So I expected the loss of
io-performance for l2 also reasonable in this case. My small measurement
was also done using npt-on-npt.

Joerg



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Anthony Liguori

On 03/15/2010 08:11 AM, Avi Kivity wrote:
> On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>>> I will add another project - iommu emulation.  Could be very useful
>>>> for doing device assignment to nested guests, which could make
>>>> testing a lot easier.
>>>
>>> Our experiments show that nested device assignment is pretty much
>>> required for I/O performance in nested scenarios.
>>
>> Really? I did a small test with virtio-blk in a nested guest (disk read
>> with dd, so not a real benchmark) and got a reasonable read-performance
>> of around 25MB/s from the disk in the l2-guest.
>
> Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.
>
> I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can
> do for other guests.


VMREAD/VMWRITEs are generally optimized by hypervisors as they tend to 
be costly.  KVM is a bit unusual in terms of how many times the 
instructions are executed per exit.
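
One common way to cut this cost is to skip writes whose value has not changed since the last write. The block below is only a sketch of that idea, not KVM's code: hw_vmwrite(), struct cached_field and the field id are made-up stand-ins, and the "hardware" write merely counts calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a costly VMWRITE; it only counts calls. */
static unsigned long hw_writes;

static void hw_vmwrite(unsigned field, uint64_t value)
{
    (void)field;
    (void)value;
    hw_writes++;
}

/* One cached field, e.g. a host segment selector. */
struct cached_field {
    unsigned field;   /* made-up field id, not a real VMCS encoding */
    uint64_t last;    /* value most recently written */
    bool valid;       /* false until the first write has happened */
};

/* Issue the write only when the value differs from the cached one. */
static void cached_vmwrite(struct cached_field *f, uint64_t value)
{
    if (f->valid && f->last == value)
        return;
    hw_vmwrite(f->field, value);
    f->last = value;
    f->valid = true;
}
```

Calling cached_vmwrite() repeatedly with the same value reaches hw_vmwrite() once; a real implementation would also have to invalidate the cache whenever the underlying structure is reloaded.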


Regards,

Anthony Liguori



Re: [PATCH v2 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 02:16:01PM +0100, Andre Przywara wrote:
> Gleb,
> 
> what is the purpose of this patch? Is this a preparation for
> something upcoming? I don't see a reason to change this, in my eyes
> it is not a simplification.
> 
To make emulator independent of KVM. All direct calls from emulator to
KVM will be changed to callbacks.
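
A reduced sketch of that pattern, assuming simplified stand-in types (struct vcpu, struct emulate_ops and struct emulate_ctxt here are hypothetical, and the real emulator_bad_iopl() also checks the CPU mode first): the emulator side only calls through the ops table, and the host side supplies the implementation.

```c
#include <stdbool.h>

#define EFLAGS_IOPL_MASK  0x3000UL   /* EFLAGS bits 12-13 hold IOPL */
#define EFLAGS_IOPL_SHIFT 12

struct vcpu { int cpl; };            /* stand-in for struct kvm_vcpu */

/* Callback table: the emulator only sees these function pointers. */
struct emulate_ops {
    int (*cpl)(struct vcpu *vcpu);
};

struct emulate_ctxt {
    struct vcpu *vcpu;
    unsigned long eflags;
};

/* Emulator side: no direct call into the hypervisor. */
static bool bad_iopl(struct emulate_ctxt *ctxt, const struct emulate_ops *ops)
{
    int iopl = (int)((ctxt->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT);

    return ops->cpl(ctxt->vcpu) > iopl;
}

/* Host side: one possible implementation of the callback. */
static int host_get_cpl(struct vcpu *vcpu)
{
    return vcpu->cpl;
}

static const struct emulate_ops host_ops = { .cpl = host_get_cpl };
```

With this split the emulator code can be compiled and tested against any ops table, which is the independence the patch series is after.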

> Regards,
> Andre.
> 
> 
> Gleb Natapov wrote:
> >Signed-off-by: Gleb Natapov 
> >---
> > arch/x86/include/asm/kvm_emulate.h |1 +
> > arch/x86/kvm/emulate.c |   15 ---
> > arch/x86/kvm/x86.c |6 ++
> > 3 files changed, 15 insertions(+), 7 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_emulate.h 
> >b/arch/x86/include/asm/kvm_emulate.h
> >index 0c5caa4..b048fd2 100644
> >--- a/arch/x86/include/asm/kvm_emulate.h
> >+++ b/arch/x86/include/asm/kvm_emulate.h
> >@@ -110,6 +110,7 @@ struct x86_emulate_ops {
> > struct kvm_vcpu *vcpu);
> > ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> > void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
> >+int (*cpl)(struct kvm_vcpu *vcpu);
> > };
> > /* Type, address-of, and value of an instruction's operand. */
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index 5e2fa61..8bd0557 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
> > int rc;
> > unsigned long val, change_mask;
> > int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> >-int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
> >+int cpl = ops->cpl(ctxt->vcpu);
> > rc = emulate_pop(ctxt, ops, &val, len);
> > if (rc != X86EMUL_CONTINUE)
> >@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
> > return X86EMUL_CONTINUE;
> > }
> >-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
> >+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
> >+  struct x86_emulate_ops *ops)
> > {
> > int iopl;
> > if (ctxt->mode == X86EMUL_MODE_REAL)
> >@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt 
> >*ctxt)
> > if (ctxt->mode == X86EMUL_MODE_VM86)
> > return true;
> > iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
> >-return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
> >+return ops->cpl(ctxt->vcpu) > iopl;
> > }
> > static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
> >@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct 
> >x86_emulate_ctxt *ctxt,
> >  struct x86_emulate_ops *ops,
> >  u16 port, u16 len)
> > {
> >-if (emulator_bad_iopl(ctxt))
> >+if (emulator_bad_iopl(ctxt, ops))
> > if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
> > return false;
> > return true;
> >@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
> >x86_emulate_ops *ops)
> > }
> > /* Privileged instruction can be executed only in CPL=0 */
> >-if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
> >+if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
> > kvm_inject_gp(ctxt->vcpu, 0);
> > goto done;
> > }
> >@@ -2378,7 +2379,7 @@ special_insn:
> > c->dst.type = OP_NONE;  /* Disable writeback. */
> > break;
> > case 0xfa: /* cli */
> >-if (emulator_bad_iopl(ctxt))
> >+if (emulator_bad_iopl(ctxt, ops))
> > kvm_inject_gp(ctxt->vcpu, 0);
> > else {
> > ctxt->eflags &= ~X86_EFLAGS_IF;
> >@@ -2386,7 +2387,7 @@ special_insn:
> > }
> > break;
> > case 0xfb: /* sti */
> >-if (emulator_bad_iopl(ctxt))
> >+if (emulator_bad_iopl(ctxt, ops))
> > kvm_inject_gp(ctxt->vcpu, 0);
> > else {
> > toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index b08f8a1..3f2a8d3 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -3426,6 +3426,11 @@ static void emulator_set_cr(int cr, unsigned long 
> >val, struct kvm_vcpu *vcpu)
> > }
> > }
> >+static int emulator_get_cpl(struct kvm_vcpu *vcpu)
> >+{
> >+return kvm_x86_ops->get_cpl(vcpu);
> >+}
> >+
> > static struct x86_emulate_ops emulate_ops = {
> > .read_std= kvm_read_guest_virt_system,
> > .fetch   = kvm_fetch_guest_virt,
> >@@ -3434,6 +3439,7 @@ static struct x86_emulate_ops emulate_ops = {
> > .cmpxchg_emulated= emulator_cmpxchg_emulated,
> > .get_cr  = emulator_get_cr,
> > .set_cr  = emulator_set_cr,
> >+.cpl = emulator_get_cpl,
> > };
> > static void cache_all_regs(struct kvm_vcpu *vcpu)
> 
> 
> -- 
> Andre Przywara
> 

Re: [PATCH v2 07/30] KVM: Provide x86_emulate_ctxt callback to get current cpl

2010-03-15 Thread Andre Przywara

Gleb,

what is the purpose of this patch? Is this a preparation for something 
upcoming? I don't see a reason to change this, in my eyes it is not a 
simplification.


Regards,
Andre.


Gleb Natapov wrote:

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   15 ---
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..b048fd2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -110,6 +110,7 @@ struct x86_emulate_ops {
struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+   int (*cpl)(struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5e2fa61..8bd0557 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1257,7 +1257,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
int rc;
unsigned long val, change_mask;
int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+   int cpl = ops->cpl(ctxt->vcpu);
 
 	rc = emulate_pop(ctxt, ops, &val, len);

if (rc != X86EMUL_CONTINUE)
@@ -1758,7 +1758,8 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
-static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)

+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops)
 {
int iopl;
if (ctxt->mode == X86EMUL_MODE_REAL)
@@ -1766,7 +1767,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt 
*ctxt)
if (ctxt->mode == X86EMUL_MODE_VM86)
return true;
iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+   return ops->cpl(ctxt->vcpu) > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,

@@ -1803,7 +1804,7 @@ static bool emulator_io_permited(struct x86_emulate_ctxt 
*ctxt,
 struct x86_emulate_ops *ops,
 u16 port, u16 len)
 {
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
return false;
return true;
@@ -1842,7 +1843,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct 
x86_emulate_ops *ops)
}
 
 	/* Privileged instruction can be executed only in CPL=0 */

-   if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+   if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
@@ -2378,7 +2379,7 @@ special_insn:
c->dst.type = OP_NONE;   /* Disable writeback. */
break;
case 0xfa: /* cli */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
ctxt->eflags &= ~X86_EFLAGS_IF;
@@ -2386,7 +2387,7 @@ special_insn:
}
break;
case 0xfb: /* sti */
-   if (emulator_bad_iopl(ctxt))
+   if (emulator_bad_iopl(ctxt, ops))
kvm_inject_gp(ctxt->vcpu, 0);
else {
toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b08f8a1..3f2a8d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3426,6 +3426,11 @@ static void emulator_set_cr(int cr, unsigned long val, 
struct kvm_vcpu *vcpu)
}
 }
 
+static int emulator_get_cpl(struct kvm_vcpu *vcpu)

+{
+   return kvm_x86_ops->get_cpl(vcpu);
+}
+
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.fetch   = kvm_fetch_guest_virt,
@@ -3434,6 +3439,7 @@ static struct x86_emulate_ops emulate_ops = {
.cmpxchg_emulated= emulator_cmpxchg_emulated,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .cpl = emulator_get_cpl,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)



--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Anthony Liguori

On 03/15/2010 07:42 AM, Avi Kivity wrote:
> On 03/15/2010 02:38 PM, Joerg Roedel wrote:
>> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>>>   Hi there,
>>>>
>>>>   Our wiki page for the Summer of Code 2010 is doing quite well:
>>>>
>>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>>
>>> I will add another project - iommu emulation.  Could be very useful for
>>> doing device assignment to nested guests, which could make testing a lot
>>> easier.
>>
>> Good idea. If there is interest I could help to mentor this project.
>
> Thanks.  I volunteered Anthony, but he may be a little overcommitted.

Joerg, feel free to put your name against it too.

Regards,

Anthony Liguori



Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-15 Thread Avi Kivity

On 03/15/2010 03:06 PM, Andre Przywara wrote:
> Gleb Natapov wrote:
>> Use this callback instead of directly call kvm function. Also rename
>> realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
>> to do with real mode.
> Do you mind removing the static before emulator_{set,get}_cr and
> marking it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon in
> svm.c ;-) while handling MOV-CR intercepts. Currently most of the code
> is actually duplicated.

Just do that in your patch, that's standard practice.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/15/2010 03:03 PM, Joerg Roedel wrote:
>>> I will add another project - iommu emulation.  Could be very useful
>>> for doing device assignment to nested guests, which could make
>>> testing a lot easier.
>>
>> Our experiments show that nested device assignment is pretty much
>> required for I/O performance in nested scenarios.
>
> Really? I did a small test with virtio-blk in a nested guest (disk read
> with dd, so not a real benchmark) and got a reasonable read-performance
> of around 25MB/s from the disk in the l2-guest.

Your guest wasn't doing a zillion VMREADs and VMWRITEs every exit.

I plan to reduce VMREAD/VMWRITE overhead for kvm, but not much we can do
for other guests.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 02:06:48PM +0100, Andre Przywara wrote:
> Gleb Natapov wrote:
> >Use this callback instead of directly call kvm function. Also rename
> >realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
> >to do with real mode.
> Do you mind removing the static before emulator_{set,get}_cr and
I don't, but this is not the goal of this patch series.

> marking it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon
> in svm.c ;-) while handling MOV-CR intercepts. Currently most of the
> code is actually duplicated.
> 
> Also, shouldn't mk_cr_64() be called mask_cr_64() for better
> readability?
This is how it is called now, the patch only moves it. But this code
will be reworked by later patches anyway since functions called from
emulator should not inject exceptions behind emulator's back.
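
For reference, a standalone version of the helper under discussion, using the same expression as the quoted patch but with <stdint.h> types in place of the kernel's u64/u32 (that substitution is an assumption for portability): it keeps the upper 32 bits of the current register value and substitutes the lower 32 bits.

```c
#include <stdint.h>

/* Same expression as mk_cr_64() in the quoted patch. */
static uint64_t mk_cr_64(uint64_t curr_cr, uint32_t new_val)
{
    /* ~((1ULL << 32) - 1) selects bits 63..32 of the current value. */
    return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
}
```

So the function builds a new 64-bit value rather than only masking one, which may be why either name can be argued for.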

> 
> Regards,
> Andre.
> 
> >Signed-off-by: Gleb Natapov 
> >---
> > arch/x86/include/asm/kvm_emulate.h |3 +-
> > arch/x86/include/asm/kvm_host.h|2 -
> > arch/x86/kvm/emulate.c |7 +-
> > arch/x86/kvm/x86.c |  114 
> > ++--
> > 4 files changed, 63 insertions(+), 63 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_emulate.h 
> >b/arch/x86/include/asm/kvm_emulate.h
> >index 2666d7a..0c5caa4 100644
> >--- a/arch/x86/include/asm/kvm_emulate.h
> >+++ b/arch/x86/include/asm/kvm_emulate.h
> >@@ -108,7 +108,8 @@ struct x86_emulate_ops {
> > const void *new,
> > unsigned int bytes,
> > struct kvm_vcpu *vcpu);
> >-
> >+ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> >+void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
> > };
> > /* Type, address-of, and value of an instruction's operand. */
> >diff --git a/arch/x86/include/asm/kvm_host.h 
> >b/arch/x86/include/asm/kvm_host.h
> >index 3b178d8..e8e108a 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, 
> >unsigned long address);
> > void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
> >unsigned long *rflags);
> >-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
> >-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
> > void kvm_enable_efer_bits(u64);
> > int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
> > int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >index 91450b5..5b060e4 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -2483,7 +2483,7 @@ twobyte_insn:
> > break;
> > case 4: /* smsw */
> > c->dst.bytes = 2;
> >-c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
> >+c->dst.val = ops->get_cr(0, ctxt->vcpu);
> > break;
> > case 6: /* lmsw */
> > realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
> >@@ -2519,8 +2519,7 @@ twobyte_insn:
> > case 0x20: /* mov cr, reg */
> > if (c->modrm_mod != 3)
> > goto cannot_emulate;
> >-c->regs[c->modrm_rm] =
> >-realmode_get_cr(ctxt->vcpu, c->modrm_reg);
> >+c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
> > c->dst.type = OP_NONE;  /* no writeback */
> > break;
> > case 0x21: /* mov from dr to reg */
> >@@ -2534,7 +2533,7 @@ twobyte_insn:
> > case 0x22: /* mov reg, cr */
> > if (c->modrm_mod != 3)
> > goto cannot_emulate;
> >-realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
> >+ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
> > c->dst.type = OP_NONE;
> > break;
> > case 0x23: /* mov from reg to dr */
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index a1e671a..bf714df 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -3370,12 +3370,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu 
> >*vcpu, const char *context)
> > }
> > EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
> >+static u64 mk_cr_64(u64 curr_cr, u32 new_val)
> >+{
> >+return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
> >+}
> >+
> >+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
> >+{
> >+unsigned long value;
> >+
> >+switch (cr) {
> >+case 0:
> >+value = kvm_read_cr0(vcpu);
> >+break;
> >+case 2:
> >+value = vcpu->arch.cr2;
> >+break;
> >+case 3:
> >+value = vcpu->arch.cr3;
> >+break;
> >+case 4:
> >+value = kvm_read_cr4(vcpu);
> >+break;
> >+case 8:
> >+value = kvm_get_cr8(vcpu);
> >+break;
> >+def

[PATCH rework] KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling

2010-03-15 Thread Takuya Yoshikawa
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced
mmio ring page and dev even after it has freed them.

Also, if this function fails, which should be rare, it suggests the
system is in a serious state, so we had better stop the work that
follows kvm_create_vm().

This patch fixes both problems.

  We move the coalesced mmio's initialization out of kvm_create_vm().
  This seems natural because it includes a registration which
  can be done only when the vm has been successfully created.

Signed-off-by: Takuya Yoshikawa 
---
 virt/kvm/coalesced_mmio.c |2 ++
 virt/kvm/kvm_main.c   |   12 
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
return ret;
 
 out_free_dev:
+   kvm->coalesced_mmio_dev = NULL;
kfree(dev);
 out_free_page:
+   kvm->coalesced_mmio_ring = NULL;
__free_page(page);
 out_err:
return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bcd08b8..c7053aa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -418,9 +418,6 @@ static struct kvm *kvm_create_vm(void)
spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   kvm_coalesced_mmio_init(kvm);
-#endif
 out:
return kvm;
 
@@ -1748,12 +1745,19 @@ static struct file_operations kvm_vm_fops = {
 
 static int kvm_dev_ioctl_create_vm(void)
 {
-   int fd;
+   int fd, r;
struct kvm *kvm;
 
kvm = kvm_create_vm();
if (IS_ERR(kvm))
return PTR_ERR(kvm);
+#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
+   r = kvm_coalesced_mmio_init(kvm);
+   if (r < 0) {
+   kvm_put_kvm(kvm);
+   return r;
+   }
+#endif
fd = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
if (fd < 0)
kvm_put_kvm(kvm);
-- 
1.6.3.3
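
The error-path idea can be tried outside the kernel with a small userspace sketch. Everything in it is a stand-in chosen for illustration: struct vm, vm_mmio_init() and the fail_register flag are hypothetical, and malloc/free replace the page and kmalloc allocations.

```c
#include <stdlib.h>

struct vm {
    void *ring;   /* stands in for kvm->coalesced_mmio_ring */
    void *dev;    /* stands in for kvm->coalesced_mmio_dev */
};

/* Hypothetical init; 'fail_register' simulates a failing final step. */
static int vm_mmio_init(struct vm *vm, int fail_register)
{
    void *page, *dev;
    int ret = -1;   /* generic failure code for this sketch */

    page = malloc(4096);
    if (!page)
        goto out_err;
    vm->ring = page;

    dev = calloc(1, 64);
    if (!dev)
        goto out_free_page;
    vm->dev = dev;

    if (fail_register)
        goto out_free_dev;
    return 0;

out_free_dev:
    vm->dev = NULL;     /* do not leave a pointer to freed memory */
    free(dev);
out_free_page:
    vm->ring = NULL;
    free(page);
out_err:
    return ret;
}
```

After a failed call both fields are NULL again, so a later teardown path cannot free them a second time, and the caller can treat the nonzero return as fatal for VM creation.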



Re: [PATCH v2 05/30] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-15 Thread Andre Przywara

Gleb Natapov wrote:
> Use this callback instead of directly call kvm function. Also rename
> realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
> to do with real mode.

Do you mind removing the static before emulator_{set,get}_cr and marking
it EXPORT_SYMBOL? Then one could use it in vmx.c (and soon in svm.c ;-)
while handling MOV-CR intercepts. Currently most of the code is actually
duplicated.

Also, shouldn't mk_cr_64() be called mask_cr_64() for better
readability?


Regards,
Andre.


Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|2 -
 arch/x86/kvm/emulate.c |7 +-
 arch/x86/kvm/x86.c |  114 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 2666d7a..0c5caa4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -108,7 +108,8 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
-
+   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
+   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3b178d8..e8e108a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, 
unsigned long address);
 void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);

-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 91450b5..5b060e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2483,7 +2483,7 @@ twobyte_insn:
break;
case 4: /* smsw */
c->dst.bytes = 2;
-   c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
+   c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
@@ -2519,8 +2519,7 @@ twobyte_insn:
case 0x20: /* mov cr, reg */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   c->regs[c->modrm_rm] =
-   realmode_get_cr(ctxt->vcpu, c->modrm_reg);
+   c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;   /* no writeback */
break;
case 0x21: /* mov from dr to reg */
@@ -2534,7 +2533,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
+   ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1e671a..bf714df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3370,12 +3370,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu 
*vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
+static u64 mk_cr_64(u64 curr_cr, u32 new_val)

+{
+   return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
+}
+
+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
+{
+   unsigned long value;
+
+   switch (cr) {
+   case 0:
+   value = kvm_read_cr0(vcpu);
+   break;
+   case 2:
+   value = vcpu->arch.cr2;
+   break;
+   case 3:
+   value = vcpu->arch.cr3;
+   break;
+   case 4:
+   value = kvm_read_cr4(vcpu);
+   break;
+   case 8:
+   value = kvm_get_cr8(vcpu);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   return 0;
+   }
+
+   return value;
+}
+
+static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
+{
+   switch (cr) {
+   case 0:
+   kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
+   break;
+   case 2:
+   vcpu->arch.cr2 = val;
+   break;
+   case 3:
+   kvm_set_cr3(vcpu, val);
+ 

Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Joerg Roedel
On Mon, Mar 15, 2010 at 05:53:13AM -0700, Muli Ben-Yehuda wrote:
> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> > On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> 
> > >  Hi there,
> > >
> > >  Our wiki page for the Summer of Code 2010 is doing quite well:
> > >
> > >http://wiki.qemu.org/Google_Summer_of_Code_2010
> > 
> > I will add another project - iommu emulation.  Could be very useful
> > for doing device assignment to nested guests, which could make
> > testing a lot easier.
> 
> Our experiments show that nested device assignment is pretty much
> required for I/O performance in nested scenarios.

Really? I did a small test with virtio-blk in a nested guest (disk read
with dd, so not a real benchmark) and got a reasonable read-performance
of around 25MB/s from the disk in the l2-guest.

Joerg



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Muli Ben-Yehuda
On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:

> >  Hi there,
> >
> >  Our wiki page for the Summer of Code 2010 is doing quite well:
> >
> >http://wiki.qemu.org/Google_Summer_of_Code_2010
> 
> I will add another project - iommu emulation.  Could be very useful
> for doing device assignment to nested guests, which could make
> testing a lot easier.

Our experiments show that nested device assignment is pretty much
required for I/O performance in nested scenarios.

Cheers,
Muli


Re: [PATCH 15/18] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa

2010-03-15 Thread Joerg Roedel
On Mon, Mar 15, 2010 at 04:30:47AM +, Daniel K. wrote:
> Joerg Roedel wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2883ce8..9f8b02d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -314,6 +314,19 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, 
>> unsigned long addr,
>>  kvm_queue_exception_e(vcpu, PF_VECTOR, error_code)
>>  }
>>  +void kvm_propagate_fault(struct kvm_vcpu *vcpu, unsigned long addr, 
>> u32 error_code)
>> +{
>> +u32 nested, error;
>> +
>> +nested = error_code &  PFERR_NESTED_MASK;
>> +error  = error_code & ~PFERR_NESTED_MASK;
>> +
>> +if (vcpu->arch.mmu.nested && !(error_code && PFERR_NESTED_MASK))
>
> This looks incorrect, nested is unused.
>
> At the very least it should be a binary & operation
>
>   if (vcpu->arch.mmu.nested && !(error_code & PFERR_NESTED_MASK))
>
> which can be simplified to
>
>   if (vcpu->arch.mmu.nested && !nested)
>
> but it seems wrong that the condition is that it is nested and not nested 
> at the same time.

Yes, this is already fixed in my local patch-stack. I found it during
further testing (while fixing another bug). But thanks for your feedback
:-)
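
The difference between the two operators is easy to check in isolation. In this sketch the mask is passed as a parameter and the bit used in the usage note is an arbitrary choice; the real value of PFERR_NESTED_MASK does not appear in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logical AND: true whenever both operands are nonzero, whatever bits are set. */
static bool flagged_logical(uint32_t error_code, uint32_t mask)
{
    return error_code && mask;
}

/* Bitwise AND: true only when a bit selected by the mask is set. */
static bool flagged_bitwise(uint32_t error_code, uint32_t mask)
{
    return (error_code & mask) != 0;
}
```

With a mask of 1u << 31 and an error code of 0x2, the logical form reports the flag as set while the bitwise form does not, which is the mistake pointed out in the review.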

Joerg



Re: [long] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

2010-03-15 Thread Avi Kivity

On 03/15/2010 12:54 PM, Antoine Leca wrote:



When doing switch, the cached segment selectors are preserved,
which allows one to use protected mode segments in real-address mode
(this is called unreal mode).
 

Now this is a by-product of the implementation inside the BIOS.
In fact, even if the BIOS enters unreal mode (or the similar big real,
more useful with segmentation-less architectures), before turning back
to the client it (should) reset things to normal real mode, as service
15/87 is not a usual way to enter unreal mode (for example, this effect
is not even mentioned in Ralf Brown's list).
   


The entry into unreal mode is unintentional; the bios is transitioning 
to protected mode and 'unreal mode' only exists for a few instructions, 
IIRC.



As a result (and also and foremost because of 80286 compatibility),
instead of directly using unreal or big real mode if possible (as done
e.g. in himem.sys), the Minix monitor goes to great pains to go back to
square #1, and since blocks are at most 64 KB in size and several
iterations are needed, on the next block Minix sets up the (very
similar) GDT then does another call to the same BIOS service 15/87.


   

I knew these parts before, but this is where Avi's answer came in: KVM
on Intel does not yet support unreal mode and requires the cached
segment descriptors to be valid in real-address mode.
 

I do not know which virtual BIOS KVM is using, but I notice while
reading http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c:
[ Slightly edited to fit the width of my post. AL. ]
3555 case 0x87:
3556 #if BX_CPU<  3
3557 #  error "Int15 function 87h not supported on<  80386"
3558 #endif
3559   // +++ should probably have descriptor checks
3560   // +++ should have exception handlers
...
3640   mov  eax, cr0
3641   or   al, #0x01
3642   mov  cr0, eax
3643   ;; far jump to flush CPU queue after transition to prot. mode
3644   JMP_AP(0x0020, protected_mode)
3645
3646 protected_mode:
3647   ;; GDT points to valid descriptor table, now load SS, DS, ES
3648   mov  ax, #0x28 ;; 101 000 = 5th desc.in table, TI=GDT,RPL=00
3649   mov  ss, ax
3650   mov  ax, #0x10 ;; 010 000 = 2nd desc.in table, TI=GDT,RPL=00
3651   mov  ds, ax
3652   mov  ax, #0x18 ;; 011 000 = 3rd desc.in table, TI=GDT,RPL=00
3653   mov  es, ax
3654   xor  si, si
3655   xor  di, di
3656   cld
3657   rep
3658 movsw  ;; move CX words from DS:SI to ES:DI
3659
3660   ;; make sure DS and ES limits are 64KB
3661   mov ax, #0x28
3662   mov ds, ax
3663   mov es, ax
3664
3665   ;; reset PG bit in CR0 ???
3666   mov  eax, cr0
3667   and  al, #0xFE
3668   mov  cr0, eax

I must be missing something here... There is no unreal mode at any
moment, is there?

[ ... some web browsing occurring meanwhile ... Later: ]

Okay, now I got another picture. 8-|
Until recently, KVM (and qemu) used Bochs BIOS, shown above; but they
switched recently to SeaBIOS... where the applicable code is in
src/system.c, and looks like (now this is AT&T assembly):
   83 static void
   84 handle_1587(struct bregs *regs)
   85 {
   86 // +++ should probably have descriptor checks
   87 // +++ should have exception handlers

  127 // Enable protected mode
  128 "  movl %%cr0, %%eax\n"
  129 "  orl $" __stringify(CR0_PE) ", %%eax\n"
  130 "  movl %%eax, %%cr0\n"
  131
  132  // far jump to flush CPU queue after transition to prot. mode
  133 "  ljmpw $(4<<3), $1f\n"
  134
  135 // GDT points to valid descriptor table, now load DS, ES
  136 "1:movw $(2<<3), %%ax\n"
 // 2nd descriptor in table, TI=GDT, RPL=00
  137 "  movw %%ax, %%ds\n"
  138 "  movw $(3<<3), %%ax\n"
 // 3rd descriptor in table, TI=GDT, RPL=00
  139 "  movw %%ax, %%es\n"
  140
  141 // move CX words from DS:SI to ES:DI
  142 "  xorw %%si, %%si\n"
  143 "  xorw %%di, %%di\n"
  144 "  rep movsw\n"
  145
  146 // Disable protected mode
  147 "  movl %%cr0, %%eax\n"
  148 "  andl $~" __stringify(CR0_PE) ", %%eax\n"
  149 "  movl %%eax, %%cr0\n"
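
The selector constants in both listings (0x28, 0x10, 0x18, 4<<3) follow the usual x86 selector layout: index in bits 15..3, table indicator in bit 2, RPL in bits 1..0. A small decoder, with struct and function names made up for this sketch, makes the comments in the listings easy to verify:

```c
#include <stdint.h>

struct selector {
    unsigned index;  /* descriptor table index */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3 */
};

/* Split a 16-bit segment selector into its three fields. */
static struct selector decode_selector(uint16_t sel)
{
    struct selector s;

    s.index = sel >> 3;
    s.ti = (sel >> 2) & 1;
    s.rpl = sel & 3;
    return s;
}
```

Decoding 0x28 gives index 5 in the GDT with RPL 0, matching the "101 000" comment in the Bochs listing.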

Note that while the basic scheme is the same, the "cleaning up" of lines
3660-3663 "make sure DS and ES limits are 64KB" is not present.
IIUC, the virtualized CPU goes back to real mode with those segments
set as they are in protected mode, and yes with Minix boot monitor they
happened to NOT be paragraph-aligned.


Is it possible to add back this "cleaning up" to the BIOS used in KVM?
   


I think so.  This is a longstanding kvm bug, but I can't see any 
downsides to a workaround in the BIOS.
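
To make "valid in real-address mode" concrete: a plain real-mode segment load yields base = selector * 16 and a 64 KB limit. The check below is a sketch under that assumption only; struct seg_cache and looks_like_real_mode() are hypothetical names, and the exact conditions KVM applies may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cached (hidden) part of a segment register, simplified. */
struct seg_cache {
    uint16_t selector;
    uint32_t base;
    uint32_t limit;
};

/* True if the cached state is what a real-mode load of 'selector' produces. */
static bool looks_like_real_mode(const struct seg_cache *s)
{
    return s->base == ((uint32_t)s->selector << 4) && s->limit == 0xFFFF;
}
```

A segment left over from protected mode with a base that is not a multiple of 16 fails this check, which corresponds to the "not paragraph-aligned" case described above.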



--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/15/2010 02:38 PM, Joerg Roedel wrote:
> On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>>   Hi there,
>>>
>>>   Our wiki page for the Summer of Code 2010 is doing quite well:
>>>
>>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>
>> I will add another project - iommu emulation.  Could be very useful for
>> doing device assignment to nested guests, which could make testing a lot
>> easier.
>
> Good idea. If there is interest I could help to mentor this project.

Thanks.  I volunteered Anthony, but he may be a little overcommitted.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Joerg Roedel
On Mon, Mar 15, 2010 at 02:25:41PM +0200, Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>   Hi there,
>>
>>   Our wiki page for the Summer of Code 2010 is doing quite well:
>>
>> http://wiki.qemu.org/Google_Summer_of_Code_2010
>>
>
> I will add another project - iommu emulation.  Could be very useful for  
> doing device assignment to nested guests, which could make testing a lot  
> easier.

Good idea. If there is interest I could help to mentor this project.

Joerg



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-15 Thread Avi Kivity

On 03/10/2010 11:30 PM, Luiz Capitulino wrote:

  Hi there,

  Our wiki page for the Summer of Code 2010 is doing quite well:

http://wiki.qemu.org/Google_Summer_of_Code_2010
   


I will add another project - iommu emulation.  Could be very useful for 
doing device assignment to nested guests, which could make testing a lot 
easier.


--
error compiling committee.c: too many arguments to function



[PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)

2010-03-15 Thread Avi Kivity
Currently when we emulate a locked operation into a shadowed guest page
table, we perform a write rather than a true atomic.  This is indicated
by the "emulating exchange as write" message that shows up in dmesg.

In addition, the pte prefetch operation during invlpg suffered from a
race.  This was fixed by removing the operation.

This patchset fixes both issues and reinstates pte prefetch on invlpg.

v3:
   - rebase against next branch (resolves conflicts via hypercall patch)

v2:
   - fix truncated description for patch 1
   - add new patch 4, which fixes a bug in patch 5

Avi Kivity (5):
  KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
  KVM: Make locked operations truly atomic
  KVM: Don't follow an atomic operation by a non-atomic one
  KVM: MMU: Do not instantiate nontrapping spte on unsync page
  KVM: MMU: Reinstate pte prefetch on invlpg

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   78 +
 arch/x86/kvm/paging_tmpl.h  |   25 ++-
 arch/x86/kvm/x86.c  |   90 +++
 4 files changed, 127 insertions(+), 67 deletions(-)



[PATCH 1/5] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()

2010-03-15 Thread Avi Kivity
kvm_mmu_pte_write() reads guest ptes on two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes.  Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/mmu.c |   69 +++
 1 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b137515..f63c9ad 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2556,36 +2556,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *new, int bytes)
+ u64 gpte)
 {
gfn_t gfn;
-   int r;
-   u64 gpte = 0;
pfn_t pfn;
 
-   if (bytes != 4 && bytes != 8)
-   return;
-
-   /*
-* Assume that the pte write on a page table of the same type
-* as the current vcpu paging mode.  This is nearly always true
-* (might be false while changing modes).  Note it is verified later
-* by update_pte().
-*/
-   if (is_pae(vcpu)) {
-   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   if ((bytes == 4) && (gpa % 4 == 0)) {
-   r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
-   if (r)
-   return;
-   memcpy((void *)&gpte + (gpa % 8), new, 4);
-   } else if ((bytes == 8) && (gpa % 8 == 0)) {
-   memcpy((void *)&gpte, new, 8);
-   }
-   } else {
-   if ((bytes == 4) && (gpa % 4 == 0))
-   memcpy((void *)&gpte, new, 4);
-   }
if (!is_present_gpte(gpte))
return;
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2636,7 +2611,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int r;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
-   mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
+   }
+
+   /*
+* Assume that the pte write on a page table of the same type
+* as the current vcpu paging mode.  This is nearly always true
+* (might be false while changing modes).  Note it is verified later
+* by update_pte().
+*/
+   if (is_pae(vcpu) && bytes == 4) {
+   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+   gpa &= ~(gpa_t)7;
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (r)
+   gentry = 0;
+   }
+
+   mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
@@ -2701,20 +2703,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
continue;
}
spte = &sp->spt[page_offset / sizeof(*spte)];
-   if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
-   gentry = 0;
-   r = kvm_read_guest_atomic(vcpu->kvm,
- gpa & ~(u64)(pte_size - 1),
- &gentry, pte_size);
-   new = (const void *)&gentry;
-   if (r < 0)
-   new = NULL;
-   }
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
-   if (new)
-   mmu_pte_write_new_pte(vcpu, sp, spte, new);
+   if (gentry)
+   mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
mmu_pte_write_flush_tlb(vcpu, entry, *spte);
++spte;
}
-- 
1.7.0.2
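
For readers following the consolidation, here is a minimal user-space
sketch in C of how the single guest pte value is derived after this patch.
It assumes the guest write has already landed in memory, which is modelled
here by a plain byte array. guess_gentry() and its parameters are
hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the consolidated read in kvm_mmu_pte_write().
 * 'mem' models guest memory that already contains the write.
 */
static uint64_t guess_gentry(const uint8_t *mem, uint64_t gpa,
                             const void *new_val, int bytes, bool pae)
{
    uint64_t gentry = 0;
    uint32_t half;

    assert(mem != NULL && new_val != NULL);

    if (pae && bytes == 4) {
        /* A PAE guest wrote one half: fetch the whole 8-byte gpte. */
        gpa &= ~(uint64_t)7;
        memcpy(&gentry, mem + gpa, 8);
        return gentry;
    }

    switch (bytes) {
    case 4:
        memcpy(&half, new_val, 4);
        gentry = half;
        break;
    case 8:
        memcpy(&gentry, new_val, 8);
        break;
    default:
        gentry = 0;     /* odd-sized write: make no guess */
        break;
    }
    return gentry;
}
```

A 4-byte write to the upper half of a PAE entry therefore yields the full
64-bit entry, while a 2-byte write yields 0, meaning no guess is made.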



[PATCH 3/5] KVM: Don't follow an atomic operation by a non-atomic one

2010-03-15 Thread Avi Kivity
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked.  This updates the mmu
but undoes the whole point of doing things atomically.

Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d724a52..2c0f632 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3227,7 +3227,8 @@ static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated)
+  bool guest_initiated,
+  bool mmu_only)
 {
gpa_t gpa;
u32 error_code;
@@ -3247,6 +3248,10 @@ static int emulator_write_emulated_onepage(unsigned long addr,
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
goto mmio;
 
+   if (mmu_only) {
+   kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
+   return X86EMUL_CONTINUE;
+   }
if (emulator_write_phys(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;
 
@@ -3271,7 +3276,8 @@ int __emulator_write_emulated(unsigned long addr,
   const void *val,
   unsigned int bytes,
   struct kvm_vcpu *vcpu,
-  bool guest_initiated)
+  bool guest_initiated,
+  bool mmu_only)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
@@ -3279,7 +3285,7 @@ int __emulator_write_emulated(unsigned long addr,
 
now = -addr & ~PAGE_MASK;
rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-guest_initiated);
+guest_initiated, mmu_only);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
@@ -3287,7 +3293,7 @@ int __emulator_write_emulated(unsigned long addr,
bytes -= now;
}
return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-  guest_initiated);
+  guest_initiated, mmu_only);
 }
 
 int emulator_write_emulated(unsigned long addr,
@@ -3295,7 +3301,7 @@ int emulator_write_emulated(unsigned long addr,
   unsigned int bytes,
   struct kvm_vcpu *vcpu)
 {
-   return __emulator_write_emulated(addr, val, bytes, vcpu, true);
+   return __emulator_write_emulated(addr, val, bytes, vcpu, true, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3359,6 +3365,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
 
+   return __emulator_write_emulated(addr, new, bytes, vcpu, true, true);
+
 emul_write:
printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
@@ -4013,7 +4021,8 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
+   return __emulator_write_emulated(rip, instruction, 3, vcpu,
+false, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
-- 
1.7.0.2
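
The page-crossing arithmetic in __emulator_write_emulated(), which the new
mmu_only flag is threaded through, is easy to misread. Below is a minimal
user-space sketch of the same test and split in C. split_at_page(),
struct split and the SKETCH_* macros are hypothetical names, and the 4 KB
page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Sizes of the two pieces a write is split into (second may be 0). */
struct split {
    unsigned long first;
    unsigned long second;
};

static struct split split_at_page(unsigned long addr, unsigned long bytes)
{
    struct split s = { bytes, 0 };

    assert(bytes > 0 && bytes <= SKETCH_PAGE_SIZE);

    /* First and last byte differ in their page number: crossing. */
    if (((addr + bytes - 1) ^ addr) & SKETCH_PAGE_MASK) {
        /* Bytes left until the next page boundary. */
        s.first = -addr & ~SKETCH_PAGE_MASK;
        s.second = bytes - s.first;
    }
    return s;
}
```

For example, a 4-byte write at 0xffe is split into 2 + 2 bytes, and each
piece is then handled by the one-page helper with the same flags.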



[PATCH 4/5] KVM: MMU: Do not instantiate nontrapping spte on unsync page

2010-03-15 Thread Avi Kivity
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written.  This is fine since at present it is only
used on sync pages.  However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/paging_tmpl.h |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
pt_element_t gpte;
unsigned pte_access;
pfn_t pfn;
+   u64 new_spte;
 
gpte = *(const pt_element_t *)pte;
if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-   if (!is_present_gpte(gpte))
-   __set_spte(spte, shadow_notrap_nonpresent_pte);
+   if (!is_present_gpte(gpte)) {
+   if (page->unsync)
+   new_spte = shadow_trap_nonpresent_pte;
+   else
+   new_spte = shadow_notrap_nonpresent_pte;
+   __set_spte(spte, new_spte);
+   }
return;
}
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-- 
1.7.0.2



[PATCH 2/5] KVM: Make locked operations truly atomic

2010-03-15 Thread Avi Kivity
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest.  This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |   69 
 1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d02cc7..d724a52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3299,41 +3299,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+   (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+#  define CMPXCHG64(ptr, old, new) \
+   (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 const void *old,
 const void *new,
 unsigned int bytes,
 struct kvm_vcpu *vcpu)
 {
-   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-   /* guests cmpxchg8b have to be emulated atomically */
-   if (bytes == 8) {
-   gpa_t gpa;
-   struct page *page;
-   char *kaddr;
-   u64 val;
+   gpa_t gpa;
+   struct page *page;
+   char *kaddr;
+   bool exchanged;
 
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+   /* guests cmpxchg8b have to be emulated atomically */
+   if (bytes > 8 || (bytes & (bytes - 1)))
+   goto emul_write;
 
-   if (gpa == UNMAPPED_GVA ||
-  (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-   goto emul_write;
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-   goto emul_write;
+   if (gpa == UNMAPPED_GVA ||
+   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+   goto emul_write;
 
-   val = *(u64 *)new;
+   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+   goto emul_write;
 
-   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-   kunmap_atomic(kaddr, KM_USER0);
-   kvm_release_page_dirty(page);
+   kaddr = kmap_atomic(page, KM_USER0);
+   kaddr += offset_in_page(gpa);
+   switch (bytes) {
+   case 1:
+   exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+   break;
+   case 2:
+   exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+   break;
+   case 4:
+   exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+   break;
+   case 8:
+   exchanged = CMPXCHG64(kaddr, old, new);
+   break;
+   default:
+   BUG();
}
+   kunmap_atomic(kaddr, KM_USER0);
+   kvm_release_page_dirty(page);
+
+   if (!exchanged)
+   return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
return emulator_write_emulated(addr, new, bytes, vcpu);
 }
-- 
1.7.0.2
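
As an illustration of the size dispatch above, here is a minimal user-space
sketch in C. It is not the kernel code: it assumes the GCC/Clang
__atomic_compare_exchange_n builtin in place of the kernel's cmpxchg(), and
cmpxchg_sized() and the DEFINE_CMPXCHG() helper are hypothetical names. The
pointer is assumed to be naturally aligned for the operand size, and on
some 32-bit targets the 8-byte case may need libatomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Generate one compare-and-exchange helper per operand width. */
#define DEFINE_CMPXCHG(name, type)                                        \
    static bool cmpxchg_##name(void *ptr, const void *old_val,            \
                               const void *new_val)                       \
    {                                                                     \
        type expected, desired;                                           \
        memcpy(&expected, old_val, sizeof(type));                         \
        memcpy(&desired, new_val, sizeof(type));                          \
        /* True only if *ptr still held 'expected'. */                    \
        return __atomic_compare_exchange_n((type *)ptr, &expected,        \
                                           desired, false,                \
                                           __ATOMIC_SEQ_CST,              \
                                           __ATOMIC_SEQ_CST);             \
    }

DEFINE_CMPXCHG(u8, uint8_t)
DEFINE_CMPXCHG(u16, uint16_t)
DEFINE_CMPXCHG(u32, uint32_t)
DEFINE_CMPXCHG(u64, uint64_t)

/*
 * Returns 1 if the exchange happened, 0 if the compare failed, and -1 if
 * the size is not handled (the patch falls back to a plain write there).
 */
static int cmpxchg_sized(void *ptr, const void *old_val,
                         const void *new_val, unsigned int bytes)
{
    /* Size filter as in the patch, plus a guard against zero. */
    if (bytes == 0 || bytes > 8 || (bytes & (bytes - 1)))
        return -1;

    switch (bytes) {
    case 1:
        return cmpxchg_u8(ptr, old_val, new_val);
    case 2:
        return cmpxchg_u16(ptr, old_val, new_val);
    case 4:
        return cmpxchg_u32(ptr, old_val, new_val);
    case 8:
        return cmpxchg_u64(ptr, old_val, new_val);
    default:
        assert(0);      /* unreachable after the filter, like BUG() */
        return -1;
    }
}
```

A second attempt with the same stale "old" value returns 0, which
corresponds to the X86EMUL_CMPXCHG_FAILED path in the patch.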



[PATCH 5/5] KVM: MMU: Reinstate pte prefetch on invlpg

2010-03-15 Thread Avi Kivity
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  "The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   37 +++--
 arch/x86/kvm/paging_tmpl.h  |   15 +++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea1b6c6..28826c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_arch {
unsigned int n_free_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_alloc_mmu_pages;
+   atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f63c9ad..b3edc46 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2609,20 +2609,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int flooded = 0;
int npte;
int r;
+   int invlpg_counter;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-   switch (bytes) {
-   case 4:
-   gentry = *(const u32 *)new;
-   break;
-   case 8:
-   gentry = *(const u64 *)new;
-   break;
-   default:
-   gentry = 0;
-   break;
-   }
+   invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
 
/*
 * Assume that the pte write on a page table of the same type
@@ -2630,16 +2621,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 * (might be false while changing modes).  Note it is verified later
 * by update_pte().
 */
-   if (is_pae(vcpu) && bytes == 4) {
+   if ((is_pae(vcpu) && bytes == 4) || !new) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   gpa &= ~(gpa_t)7;
-   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (is_pae(vcpu)) {
+   gpa &= ~(gpa_t)7;
+   bytes = 8;
+   }
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
if (r)
gentry = 0;
+   new = (const u8 *)&gentry;
+   }
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
}
 
mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
+   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+   gentry = 0;
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
struct kvm_shadow_walk_iterator iterator;
+   gpa_t pte_gpa = -1;
int level;
u64 *sptep;
int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (level == PT_PAGE_TABLE_LEVEL  ||
((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   pte_gpa = (sp->gfn << PAGE_SHIFT);
+   pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
if (is_shadow_present_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
if (need_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
+
+   atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
spin_unlock(&vcpu->kvm->mmu_lock);
+
+   if (pte_gpa == -1)
+   return;
+
+   if (mmu_topup_memory_caches(vcpu)
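
The counter scheme described in the commit message can be sketched in
user-space C as follows. This is an illustration under stated assumptions,
not the kvm code: struct mmu_sketch and the functions below are
hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the
mmu lock is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mmu_sketch {
    atomic_int invlpg_counter;  /* bumped by every invalidation */
    uint64_t   installed;       /* stands in for the shadow pte */
};

static void mmu_sketch_init(struct mmu_sketch *m)
{
    atomic_init(&m->invlpg_counter, 0);
    m->installed = 0;
}

/* Step 1, without the lock: snapshot the counter before reading the pte. */
static int prefetch_begin(struct mmu_sketch *m)
{
    return atomic_load(&m->invlpg_counter);
}

/* An invalidation: drop the entry and bump the counter. */
static void invalidate(struct mmu_sketch *m)
{
    m->installed = 0;
    atomic_fetch_add(&m->invlpg_counter, 1);
}

/* Step 2, where the lock would be held: install only if nothing raced. */
static bool prefetch_commit(struct mmu_sketch *m, int snapshot,
                            uint64_t gentry)
{
    assert(m != NULL);
    if (atomic_load(&m->invlpg_counter) != snapshot)
        return false;   /* another invlpg ran: the pte may be stale */
    m->installed = gentry;
    return true;
}
```

If an invalidation runs between prefetch_begin() and prefetch_commit(), the
commit is refused and the older pte is not installed over the newer invlpg.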

Re: [PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)

2010-03-15 Thread Avi Kivity

On 03/15/2010 12:16 PM, Marcelo Tosatti wrote:

On Sun, Mar 14, 2010 at 09:03:47AM +0200, Avi Kivity wrote:
   

On 03/10/2010 04:50 PM, Avi Kivity wrote:
 

Currently when we emulate a locked operation into a shadowed guest page
table, we perform a write rather than a true atomic.  This is indicated
by the "emulating exchange as write" message that shows up in dmesg.

In addition, the pte prefetch operation during invlpg suffered from a
race.  This was fixed by removing the operation.

This patchset fixes both issues and reinstates pte prefetch on invlpg.

v2:
- fix truncated description for patch 1
- add new patch 4, which fixes a bug in patch 5
   

No comments, but looks like last week's maintainer neglected to merge this.
 

Looks fine. Can you please regenerate against next branch? (just
pushed).
   


Will send out shortly.


For the invlpg prefetch it would be good to confirm the original bug is
not reproducible.
   


I tried to reproduce the problem with the original revert reverted, but 
couldn't.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 06/30] KVM: remove realmode_lmsw function.

2010-03-15 Thread Avi Kivity

On 03/15/2010 01:02 PM, Andre Przywara wrote:

Gleb Natapov wrote:

Use (get|set)_cr callback to emulate lmsw inside emulator.
I see that vmx.c:handle_cr() is the only other user of kvm_lmsw(). If 
we fix this place similarly to what you did below, we could get rid of 
kvm_lmsw() entirely. But I am not sure whether it's OK to remove an 
exported symbol.


Exported symbols can be changed or removed at will.

--
error compiling committee.c: too many arguments to function



Re: [long] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

2010-03-15 Thread Antoine Leca
Avi Kivity wrote on 2010-03-10 13:03:25 +0200:
> On 03/10/2010 12:26 PM, Erik van der Kouwe wrote:
>> I've submitted this bug report a week ago:
>> http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831
>>  
> 
> MINIX is using big real mode which is currently not well supported by kvm on 
> Intel hardware: 
> 
>> (qemu) info registers
>> EAX=0010 EBX=0009 ECX=4920 EDX=a796
>> ESI=0200 EDI=49200200 EBP=0009 ESP=a762
>> EIP=f4a7 EFL=00023002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0
>> ES =   f300
>> CS =f000 000f  f300
>> SS =9492 00094920  f300
>> DS =97ce 00097cec  f300
> 
> A ds.base of 0x97cec cannot be translated to a real mode segment.
> 
> There is some work to get this to work, but it is proceeding really slowly.
> It should work on AMD hardware though.

Hi guys,

I searched the issue, and Erik was kind enough to point me to this list
where there are knowledgeable people.


Erik van der Kouwe wrote in
http://groups.google.com/group/minix3/msg/40f44df0c434cfa6:
> The situation is as follows:
>
> The boot monitor runs in real-address mode, but has to copy parts of
> the boot image into high memory (>= 1 MB) which is not accessible from
> that mode as only 20 bits are available. It calls the BIOS (int 0x15)
> to perform the copy. This is done under the ext_copy label in boot/
> boothead.s.

Okay. It is my understanding this is where Minix' involvement stops.


> The BIOS switches to protected mode, loading a GDT which it receives
> from the caller. Before returning to the caller, it copies data using
> the segment descriptors in the GDT and switches back to real-address
> mode.

This is the description of BIOS service 15/87, which has to be
implemented (using whatever solution it pleases) by the BIOS.


> When doing switch, the cached segment selectors are preserved,
> which allows one to use protected mode segments in real-address mode
> (this is called unreal mode).

Now this is a by-product of the implementation inside the BIOS.
In fact, even if the BIOS enters unreal mode (or the similar big real
mode, more useful with segmentation-less architectures), before returning
to the client it (should) reset things to normal real mode, as service
15/87 is not a usual way to enter unreal mode (for example, this effect
is not even mentioned in Ralf Brown's list).

As a result (and also, above all, because of 80286 compatibility),
instead of directly using unreal or big real mode where possible (as done
e.g. in himem.sys), the Minix monitor goes to the great pain of going back
to square one: since blocks are at most 64 KB in size and several
iterations are needed, for the next block Minix sets up the (very
similar) GDT again and makes another call to the same BIOS service 15/87.


> I knew these parts before, but this is where Avi's answer came in: KVM
> on Intel does not yet support unreal mode and requires the cached
> segment descriptors to be valid in real-address mode.

I do not know which virtual BIOS KVM is using, but I notice while
reading http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c:
[ Slightly edited to fit the width of my post. AL. ]
3555 case 0x87:
3556 #if BX_CPU < 3
3557 #  error "Int15 function 87h not supported on < 80386"
3558 #endif
3559   // +++ should probably have descriptor checks
3560   // +++ should have exception handlers
...
3640   mov  eax, cr0
3641   or   al, #0x01
3642   mov  cr0, eax
3643   ;; far jump to flush CPU queue after transition to prot. mode
3644   JMP_AP(0x0020, protected_mode)
3645
3646 protected_mode:
3647   ;; GDT points to valid descriptor table, now load SS, DS, ES
3648   mov  ax, #0x28 ;; 101 000 = 5th desc.in table, TI=GDT,RPL=00
3649   mov  ss, ax
3650   mov  ax, #0x10 ;; 010 000 = 2nd desc.in table, TI=GDT,RPL=00
3651   mov  ds, ax
3652   mov  ax, #0x18 ;; 011 000 = 3rd desc.in table, TI=GDT,RPL=00
3653   mov  es, ax
3654   xor  si, si
3655   xor  di, di
3656   cld
3657   rep
3658 movsw  ;; move CX words from DS:SI to ES:DI
3659
3660   ;; make sure DS and ES limits are 64KB
3661   mov ax, #0x28
3662   mov ds, ax
3663   mov es, ax
3664
3665   ;; reset PG bit in CR0 ???
3666   mov  eax, cr0
3667   and  al, #0xFE
3668   mov  cr0, eax

I must be missing something here... There is no unreal mode at any
point, is there?

[ ... some web browsing occurring meanwhile ... Later: ]

Okay, now I got another picture. 8-|
Until recently, KVM (and qemu) used the Bochs BIOS, shown above; but they
switched recently to SeaBIOS, where the applicable code is in
src/system.c and looks like this (now in AT&T assembly):
  83 static void
  84 handle_1587(struct bregs *regs)
  85 {
  86 // +++ should probably have descriptor checks
  87 // +++ should have exception handlers

 127 // Enable protected mode
 128 "  mov

Re: [PATCH v2 06/30] KVM: remove realmode_lmsw function.

2010-03-15 Thread Andre Przywara

Gleb Natapov wrote:

Use (get|set)_cr callback to emulate lmsw inside emulator.
I see that vmx.c:handle_cr() is the only other user of kvm_lmsw(). If we 
fix this place similarly to what you did below, we could get rid of 
kvm_lmsw() entirely. But I am not sure whether it's OK to remove an 
exported symbol.


Regards,
Andre.



Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/emulate.c  |4 ++--
 arch/x86/kvm/x86.c  |7 ---
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8e108a..1e15a0a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);

 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5b060e4..5e2fa61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,8 +2486,8 @@ twobyte_insn:
c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
-   realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
- &ctxt->eflags);
+   ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+   (c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf714df..b08f8a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4045,13 +4045,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags)
-{
-   kvm_lmsw(vcpu, msw);
-   *rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];



--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
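
For reference, the bit manipulation in the lmsw hunk above can be tried in
isolation with the minimal C sketch below. lmsw_merge() is a hypothetical
name; it only mirrors the expression in the patch as written (replace
CR0[3:0] with the low four bits of the source operand) and does not model
any further checks done by set_cr().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace the low four bits of CR0 (PE, MP, EM, TS) with the low four
 * bits of the LMSW source operand and keep every other bit unchanged.
 */
static uint64_t lmsw_merge(uint64_t cr0, uint16_t msw)
{
    uint64_t result = (cr0 & ~(uint64_t)0x0f) | (msw & 0x0f);

    /* Bits 4 and up must be untouched by LMSW. */
    assert((result >> 4) == (cr0 >> 4));
    return result;
}
```

Note that, taken literally, this expression lets the operand clear PE as
well; as far as I recall from the SDM, LMSW on hardware can set PE but not
clear it, so that case may deserve a separate look.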



Re: [RFC] Moving dirty bitmaps to userspace - Double buffering approach

2010-03-15 Thread Takuya Yoshikawa

Avi Kivity wrote:

On 03/15/2010 10:33 AM, Marcelo Tosatti wrote:



Are there any good ways to solve this kind of problems?
 

You can introduce a new get_dirty_log ioctl that passes the address
of the next bitmap in userspace, and use it (after pinning with
get_user_pages), instead of vmalloc'ing.



Thank you for your advice!

   


No pinning please, put_user_bit() or set_bit_user().

(can be implemented generically using get_user_pages() and 
kmap_atomic(), but x86 should get an optimized implementation)




Given your advice last time, I started this with my colleague.
 -- We were just talking about how to deal with every architecture.

As you suggested, we'll start with the generic implementation plus an
optimized one for x86.

Thanks
  Takuya
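
A minimal user-space sketch in C of the double-buffering idea from the
subject line, under the assumption that the caller supplies the next bitmap
and receives the filled one back. struct dirty_log, mark_dirty(),
swap_bitmap() and test_dirty() are hypothetical names; nothing here is an
existing kvm interface, and locking and user-memory access are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct dirty_log {
    unsigned long *active;  /* bitmap currently being written */
    size_t         nbits;   /* number of pages it covers */
};

/* Record that 'page' was written since the last swap. */
static void mark_dirty(struct dirty_log *log, size_t page)
{
    assert(page < log->nbits);
    log->active[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/* Hand the filled bitmap to the caller and keep logging into 'next'. */
static unsigned long *swap_bitmap(struct dirty_log *log, unsigned long *next)
{
    unsigned long *full = log->active;

    log->active = next;
    return full;
}

static int test_dirty(const unsigned long *bitmap, size_t page)
{
    return (bitmap[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1;
}
```

The point of the swap is that the reader can scan the returned bitmap at
leisure while new writes are recorded in the other buffer, so no copy of
the log is needed.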


Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

2010-03-15 Thread Balbir Singh
* Avi Kivity  [2010-03-15 11:27:56]:

> >>>The knobs are for
> >>>
> >>>1. Selective enablement
> >>>2. Selective control of the % of unmapped pages
> >>An alternative path is to enable KSM for page cache.  Then we have
> >>direct read-only guest access to host page cache, without any guest
> >>modifications required.  That will be pretty difficult to achieve
> >>though - will need a readonly bit in the page cache radix tree, and
> >>teach all paths to honour it.
> >>
> >Yes, it is, I've taken a quick look. I am not sure if de-duplication
> >would be the best approach; maybe dropping the page in the page cache
> >might be a good first step. Data consistency would be much easier to
> >maintain that way, as long as the guest is not writing frequently to
> >that page, we don't need the page cache in the host.
> 
> Trimming the host page cache should happen automatically under
> pressure.  Since the page is cached by the guest, it won't be
> re-read, so the host page is not frequently used and then dropped.
>

Yes, agreed, but dropping is easier than tagging cache as read-only
and getting everybody to understand read-only cached pages. 

-- 
Three Cheers,
Balbir


Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 12:24:43PM +0200, Avi Kivity wrote:
> On 03/15/2010 12:19 PM, Gleb Natapov wrote:
> >On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote:
> >>On 03/15/2010 12:07 PM, Gleb Natapov wrote:
> >>>>Or we can make the buffer larger for everyone (outside this patchset
> >>>>though).
> >>>>
> >>>I am not sure what do you mean here. INS read ahead and MMIO read cache are
> >>>different beasts. Former is needed to speed-up string pio reads, later
> >>>(not yet implemented) is needed to reread previous MMIO read results in
> >>>case instruction emulation is restarted due to need to exit to userspace.
> >>>MMIO read cache need to be invalidated on each iteration of string
> >>>instruction.
> >>Instructions with multiple reads or writes need an mmio read/write
> >>buffer that can be replayed on re-execution.
> >>
> >>buffer != cache!  A cache can be dropped (perhaps after flushing it
> >>to a backing store), but a buffer in general cannot.
> >>
> >That is just naming. Call it "buffer" if you want.
> >
> >I still don't understand what do you mean by "Or we can make the buffer
> >larger for everyone". Who is this "everyone"? Different instruction need
> >different kind of buffers.
> 
> Many instructions can issue multiple reads, ins is just one of them.
> A generic mmio buffer can be used by everyone.
> 
No, ins can issue only _one_ io read per iteration (i.e. between each
pair of reads there is a commit point). But this is slow, so we use a
non-architectural hack: do many reads ahead of time into a buffer and
use the results from this buffer to emulate subsequent iterations.
Other instructions can do multiple reads between instruction fetch and
commit of the emulation result, and need a different kind of buffering
(actually caching is the more appropriate term here, since we "cache"
results of reads from past attempts to emulate the same instruction).
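
The read-ahead idea described above can be sketched roughly as follows.
This is only an illustration, not the code from the patchset: the names
pio_read_ahead, port_read_bulk, ins_one_iteration and PIO_AHEAD_MAX are
hypothetical, the port is replaced by a counter, and the page-boundary
limit mentioned earlier in the thread is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIO_AHEAD_MAX 64   /* hypothetical buffer capacity in bytes */

/* Hypothetical read-ahead state; not the actual KVM structure. */
struct pio_read_ahead {
    uint8_t data[PIO_AHEAD_MAX];
    size_t pos;   /* index of the next unread byte */
    size_t end;   /* number of valid bytes in data[] */
};

static unsigned port_reads;     /* how many bulk port reads were issued */
static uint8_t next_port_byte;  /* stand-in device: returns 0, 1, 2, ... */

/* Stand-in for one exit that reads `count` bytes from the port. */
static void port_read_bulk(uint8_t *dst, size_t count)
{
    port_reads++;
    for (size_t i = 0; i < count; i++)
        dst[i] = next_port_byte++;
}

/* Non-zero once every read-ahead byte has been consumed. This is the
 * only point at which re-entering the guest loses no data. */
static int read_ahead_empty(const struct pio_read_ahead *ra)
{
    return ra->pos == ra->end;
}

/* Emulate one iteration of a byte-sized INS. `remaining` is the number
 * of iterations left, so we never read more than the guest asked for. */
static uint8_t ins_one_iteration(struct pio_read_ahead *ra, size_t remaining)
{
    assert(remaining > 0);
    if (read_ahead_empty(ra)) {
        size_t n = remaining < PIO_AHEAD_MAX ? remaining : PIO_AHEAD_MAX;
        port_read_bulk(ra->data, n);   /* one slow read fills the buffer */
        ra->pos = 0;
        ra->end = n;
    }
    return ra->data[ra->pos++];        /* later iterations are served here */
}
```

With a 64-byte buffer, a 100-iteration INS needs two bulk reads instead
of 100 single ones, and read_ahead_empty() tells the caller when it is
safe to go back to the guest.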

--
Gleb.


Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Avi Kivity

On 03/15/2010 12:19 PM, Gleb Natapov wrote:
>On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote:
>>On 03/15/2010 12:07 PM, Gleb Natapov wrote:
>>>>Or we can make the buffer larger for everyone (outside this patchset
>>>>though).
>>>>
>>>I am not sure what do you mean here. INS read ahead and MMIO read cache are
>>>different beasts. Former is needed to speed-up string pio reads, later
>>>(not yet implemented) is needed to reread previous MMIO read results in
>>>case instruction emulation is restarted due to need to exit to userspace.
>>>MMIO read cache need to be invalidated on each iteration of string
>>>instruction.
>>Instructions with multiple reads or writes need an mmio read/write
>>buffer that can be replayed on re-execution.
>>
>>buffer != cache!  A cache can be dropped (perhaps after flushing it
>>to a backing store), but a buffer in general cannot.
>>
>That is just naming. Call it "buffer" if you want.
>
>I still don't understand what do you mean by "Or we can make the buffer
>larger for everyone". Who is this "everyone"? Different instruction need
>different kind of buffers.

Many instructions can issue multiple reads, ins is just one of them.  A
generic mmio buffer can be used by everyone.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 12:15:22PM +0200, Avi Kivity wrote:
> On 03/15/2010 12:07 PM, Gleb Natapov wrote:
> >
> >>Or we can make the buffer larger for everyone (outside this patchset
> >>though).
> >>
> >I am not sure what do you mean here. INS read ahead and MMIO read cache are
> >different beasts. Former is needed to speed-up string pio reads, later
> >(not yet implemented) is needed to reread previous MMIO read results in
> >case instruction emulation is restarted due to need to exit to userspace.
> >MMIO read cache need to be invalidated on each iteration of string
> >instruction.
> 
> Instructions with multiple reads or writes need an mmio read/write
> buffer that can be replayed on re-execution.
> 
> buffer != cache!  A cache can be dropped (perhaps after flushing it
> to a backing store), but a buffer in general cannot.
> 
That is just naming. Call it "buffer" if you want.

I still don't understand what you mean by "Or we can make the buffer
larger for everyone". Who is this "everyone"? Different instructions need
different kinds of buffers.

--
Gleb.


Re: [PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)

2010-03-15 Thread Marcelo Tosatti
On Sun, Mar 14, 2010 at 09:03:47AM +0200, Avi Kivity wrote:
> On 03/10/2010 04:50 PM, Avi Kivity wrote:
> >Currently when we emulate a locked operation into a shadowed guest page
> >table, we perform a write rather than a true atomic.  This is indicated
> >by the "emulating exchange as write" message that shows up in dmesg.
> >
> >In addition, the pte prefetch operation during invlpg suffered from a
> >race.  This was fixed by removing the operation.
> >
> >This patchset fixes both issues and reinstates pte prefetch on invlpg.
> >
> >v2:
> >- fix truncated description for patch 1
> >- add new patch 4, which fixes a bug in patch 5
> 
> No comments, but looks like last week's maintainer neglected to merge this.

Looks fine. Can you please regenerate against next branch? (just
pushed).

For the invlpg prefetch it would be good to confirm the original bug is
not reproducible.




Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Avi Kivity

On 03/15/2010 12:07 PM, Gleb Natapov wrote:
>>Or we can make the buffer larger for everyone (outside this patchset
>>though).
>>
>I am not sure what do you mean here. INS read ahead and MMIO read cache are
>different beasts. Former is needed to speed-up string pio reads, later
>(not yet implemented) is needed to reread previous MMIO read results in
>case instruction emulation is restarted due to need to exit to userspace.
>MMIO read cache need to be invalidated on each iteration of string
>instruction.

Instructions with multiple reads or writes need an mmio read/write
buffer that can be replayed on re-execution.

buffer != cache!  A cache can be dropped (perhaps after flushing it to a
backing store), but a buffer in general cannot.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 11:56:32AM +0200, Avi Kivity wrote:
> On 03/15/2010 11:44 AM, Gleb Natapov wrote:
> >On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote:
> >>On 03/14/2010 08:06 PM, Gleb Natapov wrote:
> >>>>Suggest simply reentering every N executions.
> >>>>
> >>>This restart mechanism is, in fact, needed for ins read ahead to work.
> >>>After reading ahead from IO port we need to avoid entering decoder
> >>>until entire cache is consumed otherwise decoder will clear cache and
> >>>data will be lost. So we can't just enter guest in arbitrary times, only
> >>>when read ahead cache is empty. Since read ahead is never done across
> >>>page boundary this is save place to re-enter guest.
> >>Please make the two depend on each other directly then.  We can't
> >>expect the reader of the emulator code know that.
> >>
> >We can document that. I wouldn't want to have different conditions for
> >guest re-entry for different opcodes.
> 
> We now have a write buffer size of one.  It's just a matter of
> making the emulator know the size of the buffer (extra parameter to
> ->write_emulated).
> 
The buffer is maintained inside the emulator, so the emulator knows about
it and can check it, but then for all string instructions except INS we
will re-enter the guest on each iteration.

> >>Have the emulator ask the buffer when it is empty.
> >>
> >It will be always empty for all string ops except INS.
> >
> 
> Or we can make the buffer larger for everyone (outside this patchset
> though).
> 
I am not sure what you mean here. INS read-ahead and the MMIO read cache
are different beasts. The former is needed to speed up string pio reads;
the latter (not yet implemented) is needed to reread previous MMIO read
results in case instruction emulation is restarted due to the need to
exit to userspace. The MMIO read cache needs to be invalidated on each
iteration of a string instruction.
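
Since the MMIO read cache is not implemented yet, the following is only
a guess at what such a replay log could look like. Everything in it
(mmio_read_cache, device_read, mmio_cache_restart, mmio_cache_invalidate,
MMIO_CACHE_MAX) is a hypothetical name made up for illustration, and the
device is faked with a simple function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MMIO_CACHE_MAX 8   /* hypothetical limit on reads per instruction */

/* Hypothetical per-instruction read log; not an existing KVM structure. */
struct mmio_read_cache {
    uint64_t val[MMIO_CACHE_MAX];
    size_t filled;   /* reads completed during earlier attempts */
    size_t cursor;   /* reads issued so far in the current attempt */
};

static unsigned device_reads;   /* counts real device accesses */

/* Stand-in for a device read that would need an exit to userspace. */
static uint64_t device_read(uint64_t addr)
{
    device_reads++;
    return addr * 2;
}

/* Emulation of the same instruction starts over: replay from the top. */
static void mmio_cache_restart(struct mmio_read_cache *c)
{
    c->cursor = 0;
}

/* Commit point (e.g. one string iteration done): old results are stale. */
static void mmio_cache_invalidate(struct mmio_read_cache *c)
{
    c->filled = 0;
    c->cursor = 0;
}

/* Return the logged value if this read already happened in an earlier
 * attempt; otherwise do the real read and log its result. */
static uint64_t mmio_read(struct mmio_read_cache *c, uint64_t addr)
{
    assert(c->cursor < MMIO_CACHE_MAX);
    if (c->cursor == c->filled)
        c->val[c->filled++] = device_read(addr);
    return c->val[c->cursor++];
}
```

The point of the sketch is the difference from read-ahead: nothing is
fetched in advance, results are only remembered so that a restarted
instruction sees the same values without touching the device twice.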

--
Gleb.


Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Avi Kivity

On 03/15/2010 11:44 AM, Gleb Natapov wrote:
>On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote:
>>On 03/14/2010 08:06 PM, Gleb Natapov wrote:
>>>>Suggest simply reentering every N executions.
>>>>
>>>This restart mechanism is, in fact, needed for ins read ahead to work.
>>>After reading ahead from IO port we need to avoid entering decoder
>>>until entire cache is consumed otherwise decoder will clear cache and
>>>data will be lost. So we can't just enter guest in arbitrary times, only
>>>when read ahead cache is empty. Since read ahead is never done across
>>>page boundary this is save place to re-enter guest.
>>Please make the two depend on each other directly then.  We can't
>>expect the reader of the emulator code know that.
>>
>We can document that. I wouldn't want to have different conditions for
>guest re-entry for different opcodes.

We now have a write buffer size of one.  It's just a matter of making
the emulator know the size of the buffer (extra parameter to
->write_emulated).

>>Have the emulator ask the buffer when it is empty.
>>
>It will be always empty for all string ops except INS.

Or we can make the buffer larger for everyone (outside this patchset
though).


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 28/30] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-15 Thread Gleb Natapov
On Mon, Mar 15, 2010 at 09:44:26AM +0200, Avi Kivity wrote:
> On 03/14/2010 08:06 PM, Gleb Natapov wrote:
> >>Suggest simply reentering every N executions.
> >>
> >This restart mechanism is, in fact, needed for ins read ahead to work.
> >After reading ahead from IO port we need to avoid entering decoder
> >until entire cache is consumed otherwise decoder will clear cache and
> >data will be lost. So we can't just enter guest in arbitrary times, only
> >when read ahead cache is empty. Since read ahead is never done across
> >page boundary this is save place to re-enter guest.
> 
> Please make the two depend on each other directly then.  We can't
> expect the reader of the emulator code know that.
> 
We can document that. I wouldn't want to have different conditions for
guest re-entry for different opcodes.

> Have the emulator ask the buffer when it is empty.
> 
It will always be empty for all string ops except INS.

--
Gleb.

