Re: [RFC][ PATCH 3/3] vhost-net: Add mergeable RX buffer support to vhost-net

2010-03-08 Thread Michael S. Tsirkin
On Sun, Mar 07, 2010 at 06:06:51PM -0800, David Stevens wrote:
 Michael S. Tsirkin m...@redhat.com wrote on 03/07/2010 08:26:33 AM:
 
  On Tue, Mar 02, 2010 at 05:20:34PM -0700, David Stevens wrote:
   This patch glues them all together and makes sure we
   notify whenever we don't have enough buffers to receive
   a max-sized packet, and adds the feature bit.
   
   Signed-off-by: David L Stevens dlstev...@us.ibm.com
  
  Maybe split this up?
 
 I can. I was looking mostly at size (and this is the smallest
 of the bunch). But the feature requires all of them together, of course.
 This last one is just everything left over from the other two.
 
   @@ -110,6 +90,7 @@
   size_t len, total_len = 0;
   int err, wmem;
   struct socket *sock = rcu_dereference(vq->private_data);
   +
  
  I tend not to add empty lines if line below it is already short.
 
 This leaves no blank line between the declarations and the start
 of code. It's habit for me-- not sure if kernel coding standards address
 that or not, but I don't think I've seen it anywhere else.
 
  
   if (!sock)
   return;
   
   @@ -166,11 +147,11 @@
   /* Skip header. TODO: support TSO. */
   msg.msg_iovlen = out;
   head.iov_len = len = iov_length(vq->iov, out);
   +
  
  I tend not to add empty lines if line below it is a comment.
 
 I added this to separate the logical skip header block from
 the next, unrelated piece. Not important to me, though.
 
  
   /* Sanity check */
   if (!len) {
   vq_err(vq, "Unexpected header len for TX: "
   -  "%zd expected %zd\n",
   -  len, vq->guest_hlen);
   +  "%zd expected %zd\n", len, vq->guest_hlen);
   break;
   }
   /* TODO: Check specific error and bomb out unless ENOBUFS? */
 
 
   /* TODO: Should check and handle checksum. */
   +   if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) {
   +   struct virtio_net_hdr_mrg_rxbuf *vhdr =
   +   (struct virtio_net_hdr_mrg_rxbuf *)
   +   vq->iov[0].iov_base;
   +   /* add num_bufs */
   +   vq->iov[0].iov_len = vq->guest_hlen;
   +   vhdr->num_buffers = headcount;
  
  I don't understand this. iov_base is a userspace pointer, isn't it.
  How can you assign values to it like that?
  Rusty also commented earlier that it's not a good idea to assume
  specific layout, such as first chunk being large enough to
  include virtio_net_hdr_mrg_rxbuf.
  
  I think we need to use memcpy to/from iovec etc.
 
 I guess you mean put_user() or copy_to_user(); yes, I suppose
 it could be paged since we read it.
 The code doesn't assume that it'll fit so much as arranged for
 it to fit. We allocate guest_hlen bytes in the buffer, but set the
 iovec to the (smaller) sock_hlen; do the read, then this code adds
 back the 2 bytes in the middle that we didn't read into (where
 num_buffers goes). But the allocator does require that guest_hlen
 will fit in a single buffer (and reports error if it doesn't). The
 alternative is significantly more complicated,

I'm not sure why. Can't we just call memcpy_from_iovec
and then read the structure as usual?

 and only fails if
 the guest doesn't give us at least the buffer size the guest header
 requires (a truly lame guest). I'm not sure it's worth a lot of
 complexity in vhost to support the guest giving us 12 byte buffers;
 those guests don't exist now and maybe they never should?
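
To make the layout question above concrete, here is a minimal user-space sketch of copying a header out of an iovec array without assuming it fits in the first chunk. The helper name copy_from_iov is hypothetical and is not the kernel's iovec API; it only illustrates the gather step being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not a kernel function): gather up to `len` bytes
 * from an iovec array into a flat buffer, crossing chunk boundaries as
 * needed. Returns the number of bytes actually copied, which is less
 * than `len` if the iovec array is too short.
 */
size_t copy_from_iov(void *dst, const struct iovec *iov, int iovcnt,
                     size_t len)
{
    unsigned char *out = dst;
    size_t copied = 0;

    assert(dst != NULL && iov != NULL);
    for (int i = 0; i < iovcnt && copied < len; i++) {
        size_t chunk = iov[i].iov_len;

        if (chunk > len - copied)
            chunk = len - copied;   /* do not read past the header */
        if (chunk == 0)
            continue;               /* skip empty chunks */
        memcpy(out + copied, iov[i].iov_base, chunk);
        copied += chunk;
    }
    return copied;
}
```

With a helper of this shape the caller can read the header into a local struct, modify a field such as num_buffers, and write it back with the mirror-image scatter routine, regardless of how the guest split its buffers.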
 
 
/* This actually signals the guest, using eventfd. */
void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
   __u16 flags = 0;
   +
  
  I tend not to add empty lines if a line above it is already short.
 
 Again, separating declarations from code-- never seen different
 in any other kernel code.
 
  
   if (get_user(flags, &vq->avail->flags)) {
   vq_err(vq, "Failed to get flags");
   return;
   @@ -1125,7 +1140,7 @@
   
   /* If they don't want an interrupt, don't signal, unless empty. */
   if ((flags & VRING_AVAIL_F_NO_INTERRUPT) &&
   -   (vq->avail_idx != vq->last_avail_idx ||
   +   (vhost_available(vq) > vq->maxheadcount ||
  
  I don't understand this change. It seems to make
  code not match the comments.
 
 It redefines empty. Without mergeable buffers, we can empty
 the ring down to nothing before we require notification. With
 mergeable buffers, if the packet requires, say, 3 buffers, and we
 have only 2 left, we are empty and require notification and new
 buffers to read anything. In both cases, we notify when we can't
 read another packet 
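
The redefinition of "empty" described above can be sketched in user-space C as follows. The struct and function names are stand-ins, not the vhost ones, and the comparison follows the prose (3 buffers needed, 2 left, so the ring counts as empty); the exact comparison used in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the virtqueue state under discussion (not the kernel struct). */
struct ring_state {
    uint16_t avail_idx;      /* next index the guest will publish */
    uint16_t last_avail_idx; /* next index the host will consume */
    uint16_t max_headcount;  /* buffers needed for a max-sized packet */
};

/*
 * Buffers published by the guest and not yet consumed by the host.
 * The 16-bit unsigned result stays correct across index wrap-around.
 */
uint16_t ring_available(const struct ring_state *r)
{
    return (uint16_t)(r->avail_idx - r->last_avail_idx);
}

/*
 * "Empty" in the mergeable-buffer sense: too few buffers remain to
 * receive one max-sized packet. With max_headcount == 1 this reduces
 * to the classic avail_idx == last_avail_idx test.
 */
bool ring_is_empty(const struct ring_state *r)
{
    assert(r->max_headcount >= 1);
    return ring_available(r) < r->max_headcount;
}
```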

Re: kvm-kmod-2.6.33 (or 2.6.32) messes up pages on guest exit

2010-03-08 Thread Henrik Holst
2010/3/8 Jan Kiszka jan.kis...@siemens.com:
 Henrik Holst wrote:
 Hi,

  I'm running a few Debian Lenny host machines with kernel 2.6.26, in
 production we use kvm-kmod-2.6.31.5 without any problems. Today I
 tried switching to kvm-kmod-2.6.33 and everything went just fine up
 to the moment when a guest exited; when it did, the kernel started
 to log thousands of lines about page errors on the host.

 modprobe -r kvm-intel (and kvm) and modprobe of the 2.6.31.5 version
 made the problems go away again. Could it be that 2.6.26 is a little
 too old kernel to run as host for the newer kvm?

 Maybe. I'm only testing against 2.6.27 as oldest host, down to 2.6.24 is
 solely build-tested. Maybe the missing MMU notifiers in <= 2.6.26 cause
 troubles, though this used to work before.

 Can't promise that I find the time to look into this (such old kernels
 are out of official scope). If I managed to, I would try to bisect over
 kvm-kmod-2.6.32 what import from kvm.git or what kvm-kmod wrapping
 brought us the breakage. But maybe someone else finds the time, setup
 support would be provided...
I figured as much, I'll try to bisect when I get the time.

/Henrik Holst
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Moving dirty bitmaps to userspace - Double buffering approach

2010-03-08 Thread Takuya Yoshikawa

Hi, I would like to hear your comments about the following plan:

  Moving dirty bitmaps to userspace
- Double buffering approach

especially I would be glad if I can hear some advice about how
to keep the compatibility.

Thanks in advance,
  Takuya


---
Overview:

Last time, I submitted a patch
  make get dirty log ioctl return the first dirty page's position
   http://www.spinics.net/lists/kvm/msg29724.html
and got some new better ideas from Avi.

As a result, I agreed to try to eliminate the bitmap allocation
that x86 KVM performs every time we execute get dirty log, by
using a double buffering approach.


Here is my plan:

- move the dirty bitmap allocation to userspace

We allocate bitmaps in the userspace and register them by ioctl.
Once a bitmap is registered, we do not touch it from userspace
and let the kernel modify it directly until we switch to the next
bitmap. We use double buffering at this switch point: userspace
gives the kernel a new bitmap by ioctl and the kernel switches the
bitmap atomically to the new one.

After this switch succeeds, we can read the old bitmap freely
in the userspace and free it if we want: needless to say we can
also reuse it at the next switch.
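
A minimal user-space sketch of the switch described above, using C11 atomics. The names dirty_log, mark_dirty and switch_bitmap are hypothetical and there is no ioctl here; the sketch only shows the pointer exchange that makes the old bitmap safe to read once the swap has happened.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for one memory slot's dirty log. */
struct dirty_log {
    _Atomic(unsigned long *) active; /* bitmap the writer currently uses */
};

/*
 * Writer side (the kernel in the proposal): set the bit for `page` in
 * whichever bitmap is active. The bit update itself is not atomic; this
 * sketch assumes a single writer.
 */
void mark_dirty(struct dirty_log *log, size_t page)
{
    unsigned long *bm = atomic_load(&log->active);

    bm[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/*
 * Reader side (userspace in the proposal): install a fresh, zeroed
 * bitmap and get the previous one back in a single atomic exchange.
 */
unsigned long *switch_bitmap(struct dirty_log *log, unsigned long *fresh)
{
    assert(fresh != NULL);
    return atomic_exchange(&log->active, fresh);
}
```

A real implementation would also have to make sure no writer is still holding the old pointer when the reader starts scanning it; that synchronization is left out here.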


- implementation details

Although it may be possible to touch the bitmap from the kernel
side without doing kmap, I think kmapping the bitmap is better.
So we may use the following functions paying enough attention to
the preemption control.
  - get_user_pages()
  - kmap_atomic()


- compatibility issues

What I am facing now are the compatibility issues. We have to
support both the userspace and kernel side bitmap allocations
to let the current qemu and KVM work properly.

1. From the kernel side, we have to take care of bitmap allocations done in both
the kvm_vm_ioctl_set_memory_region() and kvm_vm_ioctl_get_dirty_log().

2. From the userspace side, we have to check the new api's availability
and determine which way we use, e.g. by using check extension ioctl.

The most problematic is 1, the kernel side. We have to be able to know
which way the current bitmap allocation is being done, using flags or
something. In the case of set memory region, we have to judge whether
we allocate a bitmap, and if not we have to register a bitmap later
by another api: set memory region is not restricted to the dirty log
issues and needs more care than get dirty log.

Are there any good ways to solve this kind of problem?


Re: linux-aio usable?

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:46 AM, Bernhard Schmidt wrote:

Hi,

sorry for this pretty generic question, I did not find any real pros and
cons on the net anywhere, but I might just have missed them.

In a pure x86_64 environment (~2.6.32 vanilla kernel, 0.12.3 qemu-kvm),
is enabling linux-aio in KVM a good idea?


Yes.


What are the
advantages/disadvantages?


It's faster.


Are there any potential pitfalls?
   


It won't work well unless running on a block device (partition or LVM).


The reason I'm asking is that there has been some traffic on the list
about it, so it seems to be something people want to get working.
qemu-kvm in Ubuntu Lucid is currently not compiled with that option.
I've made a local version with aio and it seems to work fine (and
performs a bit better at first glance).

Is there any reason one should not compile that feature by default?
   


Not to my knowledge.


Does it do anything if not explicitly run with aio=native?
   


IIUC, no.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-08 Thread Avi Kivity

On 03/03/2010 09:12 PM, Joerg Roedel wrote:

This patch changes the tdp_enabled flag from its global
meaning to the mmu-context. This is necessary for Nested SVM
with emulation of Nested Paging where we need an extra MMU
context to shadow the Nested Nested Page Table.


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..e7bef19 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_mmu {
int root_level;
int shadow_root_level;
union kvm_mmu_page_role base_role;
+   bool tdp_enabled;

   


This needs a different name, since the old one is still around.  Perhaps 
we could call it parent_mmu and make it a kvm_mmu pointer.




Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker

2010-03-08 Thread Avi Kivity

On 03/03/2010 09:12 PM, Joerg Roedel wrote:

This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/mmu.c  |7 +++
  arch/x86/kvm/paging_tmpl.h  |   19 +++
  include/linux/kvm_host.h|5 +
  4 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c0b5576..76c8b5f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -250,6 +250,7 @@ struct kvm_mmu {
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
u32 *error);
+   gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
void (*prefetch_page)(struct kvm_vcpu *vcpu,
  struct kvm_mmu_page *page);
int (*sync_page)(struct kvm_vcpu *vcpu,
   


I think placing this here means we will miss a few translations, namely 
when we do a physical access (say, reading PDPTEs or similar).


We need to do this on the level of kvm_read_guest() so we capture 
physical accesses:


kvm_read_guest_virt
  -> walk_addr
     -> kvm_read_guest_tdp
        -> kvm_read_guest_virt
           -> walk_addr
              -> kvm_read_guest_tdp
                 -> kvm_read_guest

Of course, not all accesses will use kvm_read_guest_tdp; for example 
kvmclock accesses should still go untranslated.




Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED

2010-03-08 Thread Avi Kivity

On 03/03/2010 09:12 PM, Joerg Roedel wrote:

This capability shows userspace that it can trust the values
of cpuid[0x8000000A] that it gets from the kernel. Old
behavior was to just return the host cpuid values which is
broken because all additional svm-features need support in
the svm emulation code.

   


I think we can simply fix the bug and push the fix to the various stable 
queues.




Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-08 Thread Avi Kivity

On 03/03/2010 09:12 PM, Joerg Roedel wrote:

Hi,

here are the patches that implement nested paging support for nested
svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
in the first round to get feedback about the general direction of the
changes.  Nevertheless I am proud to report that with these patches the
famous kernel-compile benchmark runs only 4% slower in the l2 guest than
in the l1 guest when l2 is single-processor. With SMP guests the
situation is very different. The more vcpus the guest has, the larger
the performance drop from l1 to l2.
Anyway, this post is to get feedback about the overall concept of these
patches.  Please review and give feedback :-)

Thanks,

Joerg

Diffstat:

  arch/x86/include/asm/kvm_host.h |   21 ++
  arch/x86/kvm/mmu.c  |  152 ++-
  arch/x86/kvm/mmu.h  |2 +
  arch/x86/kvm/paging_tmpl.h  |   81 ++---
  arch/x86/kvm/svm.c  |  126 +++-
  arch/x86/kvm/vmx.c  |9 +++
  arch/x86/kvm/x86.c  |   19 +-
  include/linux/kvm.h |1 +
  include/linux/kvm_host.h|5 ++
  9 files changed, 354 insertions(+), 62 deletions(-)
   


Okay, this looks excellent overall, it's nice to see how well this fits 
with the existing mmu infrastructure (only ~300 lines added).  The 
performance results are impressive.




Re: linux-aio usable?

2010-03-08 Thread Bernhard Schmidt
On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:

 Are there any potential pitfalls?
 It won't work well unless running on a block device (partition or LVM).

What does work well mean in this context? Potential dataloss?

 Is there any reason one should not compile that feature by default?
 Not to my knowledge.

Thanks, I've filed a bug with Ubuntu to get it enabled.

Bernhard


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Alexander Graf


Am 08.03.2010 um 02:45 schrieb Jamie Lokier ja...@shareable.org:


Paul Brook wrote:
Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guest by communicating over a unix domain socket.  This patch applies to
the qemu-kvm repository.


No. All new devices should be fully qdev based.

I suspect you've also ignored a load of coherency issues, especially when not
using KVM. As soon as you have shared memory in more than one host
thread/process you have to worry about memory barriers.


Yes. Guest-observable behaviour is likely to be quite different on
different hosts, especially between x86 and non-x86 hosts, which is not
good at all for emulation.

Memory barriers performed by the guest would help, but would not
remove the fact that behaviour would vary between different host types
if a guest doesn't call them.  I.e. you could accidentally have some
guests working fine for years on x86 hosts, which gain subtle
memory corruption as soon as you run them on a different host.

This is acceptable when recompiling code for different architectures,
but it's asking for trouble with binary guest images which aren't
supposed to depend on host architecture.

However, coherence could be made host-type-independent by the host
mapping and unmapping pages, so that each page is only mapped into one
guest (or guest CPU) at a time.  Just like some clustering filesystems
do to maintain coherence.


Or we could put in some code that tells the guest the host shm  
architecture and only accept x86 on x86 for now. If anyone cares for  
other combinations, they're free to implement them.


Seriously, we're looking at an interface designed for kvm here. Let's  
please keep it as simple and fast as possible for the actual use case,  
not some theoretically possible ones.



Alex


Re: linux-aio usable?

2010-03-08 Thread Avi Kivity

On 03/08/2010 11:48 AM, Bernhard Schmidt wrote:

On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:

   

Are there any potential pitfalls?
   

It won't work well unless running on a block device (partition or LVM).
 

What does work well mean in this context? Potential dataloss?

   


No, it becomes synchronous (=extra slow).



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Avi Kivity

On 03/08/2010 12:53 AM, Paul Brook wrote:

Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guest by communicating over a unix domain socket.  This patch applies to
  the qemu-kvm repository.
 

No. All new devices should be fully qdev based.

I suspect you've also ignored a load of coherency issues, especially when not
using KVM. As soon as you have shared memory in more than one host
thread/process you have to worry about memory barriers.
   


Shouldn't it be sufficient to require the guest to issue barriers (and 
to ensure tcg honours the barriers, if someone wants this with tcg)?.




Re: [PATCH] Support adding a file to qemu's ram allocation

2010-03-08 Thread Avi Kivity

On 03/06/2010 01:52 AM, Cam Macdonell wrote:

This avoids the need of using qemu_ram_alloc and mmap with MAP_FIXED to map a 
host file into guest RAM.  This function mmaps the opened file anywhere and 
adds the memory to the ram blocks.

Usage is

qemu_add_file_to_ram(fd, size, MAP_SHARED);
   


A traditional name would be qemu_ram_mmap() as a counterpart to 
qemu_ram_alloc().  Would be nice to accept an offset.




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Jamie Lokier
Alexander Graf wrote:
 Or we could put in some code that tells the guest the host shm  
 architecture and only accept x86 on x86 for now. If anyone cares for  
 other combinations, they're free to implement them.
 
 Seriously, we're looking at an interface designed for kvm here. Let's  
 please keep it as simple and fast as possible for the actual use case,  
 not some theoretically possible ones.

The concern is that a perfectly working guest image running on kvm,
the guest being some OS or app that uses this facility (_not_ a
kvm-only guest driver), is later run on qemu on a different host, and
then mostly works except for some silent data corruption.

That is not a theoretical scenario.

Well, the bit with this driver is theoretical, obviously :-)
But not the bit about moving to a different host.

-- Jamie


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Avi Kivity

On 03/06/2010 01:52 AM, Cam Macdonell wrote:

Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guest by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte messages to
trigger interrupts.  Writes are triggered by writing to the Doorbell register
on the shared memory PCI device.  The lower 8-bits of the value written to this
register are sent as the 1-byte message so different meanings of interrupts can
be supported.
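
The doorbell behaviour described above amounts to masking the written value down to one byte. A minimal sketch, with a hypothetical function name and a 32-bit register width assumed for illustration:

```c
#include <stdint.h>

/*
 * Derive the 1-byte interrupt message from a value written to the
 * Doorbell register: only the low 8 bits are kept, the rest is dropped.
 * The 32-bit register width is an assumption of this sketch.
 */
uint8_t doorbell_message(uint32_t written)
{
    return (uint8_t)(written & 0xffu);
}
```

A receiver could then use the byte as an index into a table of up to 256 interrupt meanings.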

Interrupts are supported between multiple VMs by using a shared memory server

-ivshmem <size in MB>,[unix:<path>][file]

Interrupts can also be used between host and guest as well by implementing a
listener on the host that talks to shared memory server.  The shared memory
server passes file descriptors for the shared memory object and eventfds (our
interrupt mechanism) to the respective qemu instances.

   


Can you provide a spec that describes the device?  This would be useful 
for maintaining the code, writing guest drivers, and as a framework for 
review.




Corrupted filesystem, possible after livemigration with iSCSI storagebackend.

2010-03-08 Thread Espen Berg
In our KVM system we have two iSCSI backends (master/slave 
configuration) with failover and two KVM hosts supporting live migration.


The iSCSI volumes are shared by the host as a block device in KVM, and 
the volumes are available on both frontends.  After a reboot one of the 
KVMs was not able to start again due to file system corruption.  We 
use XFS and have trouble understanding what caused the corruption.


We have ruled out the iSCSI backend as both the master and slave data 
were consistent at the time.


Anyone else had similar problems?  What is the recommended way to share 
an iSCSI drive among the two host machines?


Should XFS be ok as a file system for live migration?  I'm not able to 
find any documentation stating that a clustered file system (GFS2 etc.) 
is recommended.  Are there any concurrent writes on the two host 
machines during a live migration?


 <disk type='block' device='disk'>
  <driver name='qemu'/>
  <source dev='/dev/disk/by-path/ip-ip:3260-iscsi-test2-lun-0'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
 </disk>

#virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

#uname -a
Linux vm01 2.6.32-bpo.2-amd64 #1 SMP Fri Feb 12 16:50:27 UTC 2010 x86_64 
GNU/Linux


Regards
Espen





Re: KVM PMU virtualization

2010-03-08 Thread Avi Kivity

On 03/01/2010 07:17 PM, Peter Zijlstra wrote:




2. For every emulated performance counter the guest activates kvm
allocates a perf_event and configures it for the guest (we may allow
kvm to specify the counter index, the guest would be able to use
rdpmc unintercepted then). Event filtering is also done in this step.
 

rdpmc can never be used unintercepted, for perf might be multiplexing
the actual hw.
   


How often is rdpmc used?  If it is invoked on high frequency 
software-only events (like context switches), then this may be a 
performance issue.  If it is only issued on perf interrupts, we may be 
able to live with it (since we already took an exit for the interrupt).




extended vga modes?

2010-03-08 Thread Michael Tokarev
After updating qemu-kvm Debian package to 0.12
we've a bugreport about missing video modes which
were present in previous versions.  Big thanks to
the original reporter, Bjørn Mork, who found what
the issue is.  See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572991
for the bugreport in question, together with the
resolution.

In short, when vgabios were dropped from qemu-kvm
(for whatever yet unknown reason), all local changes
to it were dropped too, including this patch:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=ebfac597cf

Note that this patch is present in upstream 0.6c version
of vgabios, but not in 0.6b+ which is currently used in
qemu.

Should vgabios in qemu include that patch?

See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545

Thanks!

/mjt


Re: extended vga modes?

2010-03-08 Thread Avi Kivity

On 03/08/2010 12:20 PM, Michael Tokarev wrote:

After updating qemu-kvm Debian package to 0.12
we've a bugreport about missing video modes which
were present in previous versions.  Big thanks to
the original reporter, Bjørn Mork, who found what
the issue is.  See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572991
for the bugreport in question, together with the
resolution.

In short, when vgabios were dropped from qemu-kvm
(for whatever yet unknown reason),


What do you mean?  qemu-kvm still carries a local vgabios (see 
kvm/vgabios in qemu-kvm.git).



all local changes
to it were dropped too, including this patch:
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commitdiff;h=ebfac597cf

Note that this patch is present in upstream 0.6c version
of vgabios, but not in 0.6b+ which is currently used in
qemu.

Should vgabios in qemu include that patch?

See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545

   


Looks like Fedora was using the upstream vgabios for a while, not the 
version in qemu-kvm.git.




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Alexander Graf
Jamie Lokier wrote:
 Alexander Graf wrote:
   
 Or we could put in some code that tells the guest the host shm  
 architecture and only accept x86 on x86 for now. If anyone cares for  
 other combinations, they're free to implement them.

 Seriously, we're looking at an interface designed for kvm here. Let's  
 please keep it as simple and fast as possible for the actual use case,  
 not some theoretically possible ones.
 

 The concern is that a perfectly working guest image running on kvm,
 the guest being some OS or app that uses this facility (_not_ a
 kvm-only guest driver), is later run on qemu on a different host, and
 then mostly works except for some silent data corruption.

 That is not a theoretical scenario.

 Well, the bit with this driver is theoretical, obviously :-)
 But not the bit about moving to a different host.
   

I agree. Hence there should be a safety check so people can't corrupt
their data silently.

Alex


Re: extended vga modes?

2010-03-08 Thread Michael Tokarev
Avi Kivity wrote:
[]
 In short, when vgabios were dropped from qemu-kvm
 (for whatever yet unknown reason),
 
 What do you mean?  qemu-kvm still carries a local vgabios (see
 kvm/vgabios in qemu-kvm.git).

Oh my.  So we all overlooked it.  I asked you several times
about the bios sources, in 0.12 seabios were supposed to be
in roms/seabios (which is still empty in the release), and
I thought vgabios should be in roms/vgabios (which is empty
too), and concluded it were dropped from qemu-kvm tarball.
But you're right, and I by mistake take vgabios sources from
upstream qemu when building Debian package, instead of using
the old'good sources from kvm/vgabios.  What a mess!... :(

And it looks like that it's time to remove at least parts of
this mess, don't you think?  How about pushing the vgabios
changes to qemu and moving it to the same place where it is
in qemu?  Does it make sense?

There were another patch mentioned recently when I asked for
bios sources origin a few days ago which probably should be
applied as well...

[]
 Should vgabios in qemu include that patch?

 See also: https://bugzilla.redhat.com/show_bug.cgi?id=501545
 
 Looks like Fedora was using the upstream vgabios for a while, not the
 version in qemu-kvm.git.

Yes they are still using upstream but they also
applied the patches missing in there.

Thank you for the prompt response.  Now I think all
confusion is cleared.

/mjt


Re: extended vga modes?

2010-03-08 Thread Avi Kivity

On 03/08/2010 01:07 PM, Michael Tokarev wrote:

Avi Kivity wrote:
[]
   

In short, when vgabios were dropped from qemu-kvm
(for whatever yet unknown reason),
   

What do you mean?  qemu-kvm still carries a local vgabios (see
kvm/vgabios in qemu-kvm.git).
 

Oh my.  So we all overlooked it.  I asked you several times
about the bios sources, in 0.12 seabios were supposed to be
in roms/seabios (which is still empty in the release), and
I thought vgabios should be in roms/vgabios (which is empty
too), and concluded it were dropped from qemu-kvm tarball.
But you're right, and I by mistake take vgabios sources from
upstream qemu when building Debian package, instead of using
the old'good sources from kvm/vgabios.  What a mess!... :(

And it looks like that it's time to remove at least parts of
this mess, don't you think?  How about pushing the vgabios
changes to qemu and moving it to the same place where it is
in qemu?  Does it make sense?
   


We can't push the changes to qemu since qemu.git doesn't have a vgabios 
fork.  We might push the changes upstream.  Best of all if the seabios 
thing repeats itself with vgabios so we have maintainable and maintained 
vga firmware.


--
error compiling committee.c: too many arguments to function



request: please merge docs for -netdev in stable

2010-03-08 Thread xming
hi,

with version 0.12.x there is a new -netdev option, but the docs cannot
be found anywhere.
It seems that this commit
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=96560cb34c3183a4fb1769e4eff4d860a24579a8
is only applied to the unstable but not stable, is it possible to
merge this to stable?

Thanks

xming


Re: request: please merge docs for -netdev in stable

2010-03-08 Thread Avi Kivity

Copying qemu-devel

On 03/08/2010 01:11 PM, xming wrote:

hi,

with version 0.12.x there is a new -netdev option, but the docs cannot
be found anywhere.
It seems that this commit
http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=96560cb34c3183a4fb1769e4eff4d860a24579a8
is only applied to the unstable but not stable, is it possible to
merge this to stable?

Thanks

xming
   



--
error compiling committee.c: too many arguments to function



Re: [PATCH v1 3/3] Let host NIC driver to DMA to guest user space.

2010-03-08 Thread Michael S. Tsirkin
On Sat, Mar 06, 2010 at 05:38:38PM +0800, xiaohui@intel.com wrote:
 From: Xin Xiaohui xiaohui@intel.com
 
 The patch let host NIC driver to receive user space skb,
 then the driver has chance to directly DMA to guest user
 space buffers thru single ethX interface.
 
 Signed-off-by: Xin Xiaohui xiaohui@intel.com
 Signed-off-by: Zhao Yu yzha...@gmail.com
 Signed-off-by: Jeff Dike jd...@c2.user-mode-linux.org


I have a feeling I commented on some of the below issues already.
Do you plan to send a version with comments addressed?

 ---
  include/linux/netdevice.h |   76 ++-
  include/linux/skbuff.h|   30 +++--
  net/core/dev.c|   32 ++
  net/core/skbuff.c |   79 
 +
  4 files changed, 205 insertions(+), 12 deletions(-)
 
 diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
 index 94958c1..97bf12c 100644
 --- a/include/linux/netdevice.h
 +++ b/include/linux/netdevice.h
 @@ -485,6 +485,17 @@ struct netdev_queue {
   unsigned long   tx_dropped;
  } cacheline_aligned_in_smp;
  
 +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE)
 +struct mpassthru_port{
 + int hdr_len;
 + int data_len;
 + int npages;
 + unsigned flags;
 + struct socket   *sock;
 + struct skb_user_page *(*ctor)(struct mpassthru_port *,
 + struct sk_buff *, int);
 +};
 +#endif
  
  /*
   * This structure defines the management hooks for network devices.
 @@ -636,6 +647,10 @@ struct net_device_ops {
   int (*ndo_fcoe_ddp_done)(struct net_device *dev,
u16 xid);
  #endif
 +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE)
 + int (*ndo_mp_port_prep)(struct net_device *dev,
 + struct mpassthru_port *port);
 +#endif
  };
  
  /*
 @@ -891,7 +906,8 @@ struct net_device
   struct macvlan_port *macvlan_port;
   /* GARP */
   struct garp_port *garp_port;
 -
 + /* mpassthru */
 + struct mpassthru_port   *mp_port;
   /* class/net/name entry */
   struct device   dev;
   /* space for optional statistics and wireless sysfs groups */
 @@ -2013,6 +2029,62 @@ static inline u32 dev_ethtool_get_flags(struct net_device *dev)
   return 0;
   return dev->ethtool_ops->get_flags(dev);
  }
 -#endif /* __KERNEL__ */
  
 +#if defined(CONFIG_VHOST_PASSTHRU) || defined(CONFIG_VHOST_PASSTHRU_MODULE)
 +static inline int netdev_mp_port_prep(struct net_device *dev,
 + struct mpassthru_port *port)
 +{

This function lacks documentation.

 + int rc;
 + int npages, data_len;
 + const struct net_device_ops *ops = dev->netdev_ops;
 +
 + /* needed by packet split */
 + if (ops->ndo_mp_port_prep) {
 + rc = ops->ndo_mp_port_prep(dev, port);
 + if (rc)
 + return rc;
 + } else {  /* should be temp */
 + port->hdr_len = 128;
 + port->data_len = 2048;
 + port->npages = 1;

where do the numbers come from?

 + }
 +
 + if (port->hdr_len <= 0)
 + goto err;
 +
 + npages = port->npages;
 + data_len = port->data_len;
 + if (npages <= 0 || npages > MAX_SKB_FRAGS ||
 + (data_len < PAGE_SIZE * (npages - 1) ||
 +  data_len > PAGE_SIZE * npages))
 + goto err;
 +
 + return 0;
 +err:
 + dev_warn(&dev->dev, "invalid page constructor parameters\n");
 +
 + return -EINVAL;
 +}
 +
 +static inline int netdev_mp_port_attach(struct net_device *dev,
 + struct mpassthru_port *port)
 +{
 + if (rcu_dereference(dev->mp_port))
 + return -EBUSY;
 +
 + rcu_assign_pointer(dev->mp_port, port);
 +
 + return 0;
 +}
 +
 +static inline void netdev_mp_port_detach(struct net_device *dev)
 +{
 + if (!rcu_dereference(dev->mp_port))
 + return;
 +
 + rcu_assign_pointer(dev->mp_port, NULL);
 + synchronize_rcu();
 +}

The above looks wrong, rcu_dereference should be called
under rcu read side, rcu_assign_pointer usually should not,
synchronize_rcu definitely should not.

As I suggested already, these functions are better opencoded,
rcu is tricky as is without hiding it in inline helpers.
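
To make the attach/detach ordering concrete, here is a minimal userspace sketch, not kernel code. It models the pattern with C11 atomics: the release store stands in for rcu_assign_pointer() and the acquire load for rcu_dereference(). There is no grace period in this model, so the place where kernel code would call synchronize_rcu() is only marked by a comment. All names (port_slot, slot_attach, slot_detach, slot_hdr_len) are hypothetical, and writers are assumed to be serialized by some update-side lock that is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct port {
	int hdr_len;
};

/* Hypothetical stand-in for the dev->mp_port pointer. */
struct port_slot {
	_Atomic(struct port *) port;
};

/* Writer side. Assumes the caller holds an update-side lock (not shown). */
static int slot_attach(struct port_slot *slot, struct port *port)
{
	assert(slot && port);
	/* Writers are serialized, so a relaxed check is enough here. */
	if (atomic_load_explicit(&slot->port, memory_order_relaxed))
		return -EBUSY;
	/* Publish the pointer: plays the role of rcu_assign_pointer(). */
	atomic_store_explicit(&slot->port, port, memory_order_release);
	return 0;
}

/* Writer side. Returns the old pointer so the caller can release it. */
static struct port *slot_detach(struct port_slot *slot)
{
	struct port *old = atomic_load_explicit(&slot->port, memory_order_relaxed);

	atomic_store_explicit(&slot->port, NULL, memory_order_release);
	/*
	 * Kernel code would wait for readers here with synchronize_rcu(),
	 * outside any read-side critical section, before freeing 'old'.
	 */
	return old;
}

/* Reader side: plays the role of rcu_dereference() under rcu_read_lock(). */
static int slot_hdr_len(struct port_slot *slot)
{
	struct port *p = atomic_load_explicit(&slot->port, memory_order_acquire);

	return p ? p->hdr_len : -ENODEV;
}
```

The split into writer-only and reader-only functions is the point of the sketch: the dereference belongs to the read side, while publishing and waiting for readers belong to the update side.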

 +#endif /* CONFIG_VHOST_PASSTHRU */
 +#endif /* __KERNEL__ */
  #endif   /* _LINUX_NETDEVICE_H */
 diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
 index df7b23a..e59fa57 100644
 --- a/include/linux/skbuff.h
 +++ b/include/linux/skbuff.h
 @@ -209,6 +209,13 @@ struct skb_shared_info {
   void *  destructor_arg;
  };
  
 +struct skb_user_page {
 + u8  *start;
 + int size;
 + struct skb_frag_struct *frags;
 

Re: [PATCH v1 1/3] A device for zero-copy based on KVM virtio-net.

2010-03-08 Thread Michael S. Tsirkin
On Sat, Mar 06, 2010 at 05:38:36PM +0800, xiaohui@intel.com wrote:
 From: Xin Xiaohui xiaohui@intel.com
 
 Add a device to utilize the vhost-net backend driver for
 copy-less data transfer between guest FE and host NIC.
 It pins the guest user space to the host memory and
 provides proto_ops as sendmsg/recvmsg to vhost-net.
 
 Signed-off-by: Xin Xiaohui xiaohui@intel.com
 Signed-off-by: Zhao Yu yzha...@gmail.com
 Signed-off-by: Jeff Dike jd...@c2.user-mode-linux.org

I think some of the comments below are repeated.
Do you plan addressing them?

 ---
  drivers/vhost/Kconfig |5 +
  drivers/vhost/Makefile|2 +
  drivers/vhost/mpassthru.c | 1202 
 +
  include/linux/mpassthru.h |   29 ++

I'm not sure it's wise to limit the device to
vhost even if that's the only mode that you are going to
support in the first version.
How about locating the char device under drivers/net/?

  4 files changed, 1238 insertions(+), 0 deletions(-)
  create mode 100644 drivers/vhost/mpassthru.c
  create mode 100644 include/linux/mpassthru.h
 
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index 9f409f4..ee32a3b 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -9,3 +9,8 @@ config VHOST_NET
 To compile this driver as a module, choose M here: the module will
 be called vhost_net.
  
 +config VHOST_PASSTHRU
 + tristate "Zerocopy network driver (EXPERIMENTAL)"
 + depends on VHOST_NET
 + ---help---
 +   zerocopy network I/O support
 diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
 index 72dd020..3f79c79 100644
 --- a/drivers/vhost/Makefile
 +++ b/drivers/vhost/Makefile
 @@ -1,2 +1,4 @@
  obj-$(CONFIG_VHOST_NET) += vhost_net.o
  vhost_net-y := vhost.o net.o
 +
 +obj-$(CONFIG_VHOST_PASSTHRU) += mpassthru.o
 diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
 new file mode 100644
 index 000..744d6cd
 --- /dev/null
 +++ b/drivers/vhost/mpassthru.c
 @@ -0,0 +1,1202 @@
 +/*
 + *  MPASSTHRU - Mediate passthrough device.
 + *  Copyright (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License as published by
 + *  the Free Software Foundation; either version 2 of the License, or
 + *  (at your option) any later version.
 + *
 + *  This program is distributed in the hope that it will be useful,
 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 + *  GNU General Public License for more details.
 + *
 + */
 +
 +#define DRV_NAME        "mpassthru"
 +#define DRV_DESCRIPTION "Mediate passthru device driver"
 +#define DRV_COPYRIGHT   "(C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G"
 +
 +#include <linux/module.h>
 +#include <linux/errno.h>
 +#include <linux/kernel.h>
 +#include <linux/major.h>
 +#include <linux/slab.h>
 +#include <linux/smp_lock.h>
 +#include <linux/poll.h>
 +#include <linux/fcntl.h>
 +#include <linux/init.h>
 +#include <linux/skbuff.h>
 +#include <linux/netdevice.h>
 +#include <linux/etherdevice.h>
 +#include <linux/miscdevice.h>
 +#include <linux/ethtool.h>
 +#include <linux/rtnetlink.h>
 +#include <linux/if.h>
 +#include <linux/if_arp.h>
 +#include <linux/if_ether.h>
 +#include <linux/crc32.h>
 +#include <linux/nsproxy.h>
 +#include <linux/uaccess.h>
 +#include <linux/virtio_net.h>
 +#include <linux/mpassthru.h>
 +#include <net/net_namespace.h>
 +#include <net/netns/generic.h>
 +#include <net/rtnetlink.h>
 +#include <net/sock.h>
 +
 +#include <asm/system.h>
 +
 +#include "vhost.h"
 +
 +/* Uncomment to enable debugging */
 +/* #define MPASSTHRU_DEBUG 1 */
 +
 +#ifdef MPASSTHRU_DEBUG
 +static int debug;
 +
 +#define DBG  if (mp->debug) printk
 +#define DBG1 if (debug == 2) printk
 +#else
 +#define DBG(a...)
 +#define DBG1(a...)
 +#endif
 +
 +#define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
 +#define COPY_HDR_LEN   (L1_CACHE_BYTES < 64 ? 64 : L1_CACHE_BYTES)
 +
 +struct frag {
 + u16 offset;
 + u16 size;
 +};
 +
 +struct page_ctor {
 + struct list_head readq;
 + int w_len;
 + int r_len;
 + spinlock_t  read_lock;
 + atomic_t refcnt;
 + struct kmem_cache   *cache;
 + struct net_device   *dev;
 + struct mpassthru_port   port;
 + void *sendctrl;
 + void *recvctrl;
 +};
 +
 +struct page_info {
 + struct list_head list;
 + int header;
 + /* indicate the actual length of bytes
 +  * send/recv in the user space buffers
 +  */
 + int total;
 + int offset;
 + struct page *pages[MAX_SKB_FRAGS+1];
 + struct skb_frag_struct  frag[MAX_SKB_FRAGS+1];
 + struct sk_buff  *skb;
 + struct page_ctor *ctor;
 +
 + /* The pointer relayed to skb, 

Re: [RFC] Moving dirty bitmaps to userspace - Double buffering approach

2010-03-08 Thread Avi Kivity

On 03/08/2010 10:22 AM, Takuya Yoshikawa wrote:

Hi, I would like to hear your comments about the following plan:

  Moving dirty bitmaps to userspace
- Double buffering approach

especially I would be glad if I can hear some advice about how
to keep the compatibility.

Thanks in advance,
  Takuya


---
Overview:

Last time, I submitted a patch
  make get dirty log ioctl return the first dirty page's position
   http://www.spinics.net/lists/kvm/msg29724.html
and got some new better ideas from Avi.

As a result, I agreed to try to eliminate the bitmap allocation
done in the x86 KVM every time when we execute get dirty log by
using double buffering approach.
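
As a rough illustration of the double buffering idea, the following userspace sketch keeps two bitmaps that are allocated once and swaps between them on each "get dirty log" call, so nothing is allocated per call. It is a single-threaded model under stated assumptions: the names (dirty_log, mark_dirty, get_dirty_log) and the fixed bitmap size are hypothetical, and real code would need to synchronize the swap against concurrent writers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BITMAP_WORDS 4

struct dirty_log {
	unsigned long buf[2][BITMAP_WORDS]; /* both buffers exist up front */
	int active;                         /* buffer currently being written */
};

static void dirty_log_init(struct dirty_log *log)
{
	memset(log, 0, sizeof(*log));
}

/* Record a write to the given page in the active buffer. */
static void mark_dirty(struct dirty_log *log, size_t page)
{
	const size_t bits = 8 * sizeof(unsigned long);

	assert(page < BITMAP_WORDS * bits);
	log->buf[log->active][page / bits] |= 1UL << (page % bits);
}

/*
 * Hand out the buffer collected so far and switch writers to the other
 * one. The returned buffer stays stable until the next call.
 */
static const unsigned long *get_dirty_log(struct dirty_log *log)
{
	int old = log->active;
	int next = !old;

	/* Clear the spare buffer before it becomes the active one. */
	memset(log->buf[next], 0, sizeof(log->buf[next]));
	log->active = next;
	return log->buf[old];
}
```

A caller would read the returned snapshot while new writes land in the other buffer; the snapshot is only recycled two calls later.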




[...]


Although it may be possible to touch the bitmap from the kernel
side without doing kmap, I think kmapping the bitmap is better.
So we may use the following functions paying enough attention to
the preemption control.
  - get_user_pages()
  - kmap_atomic()


Although direct access is more difficult (you need to implement 
put_user_bit() or similar) I think it is worthwhile.  
get_user_pages_fast() is fast, but nowhere near as fast as 
put_user_bit() (or set_bit_user()), which can be just two instructions 
in the fast path.
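
For reference, the arithmetic such a helper has to do is small. The sketch below is a userspace model only: model_put_user_bit() is a hypothetical name, the bitmap is an ordinary byte array rather than a user pointer, and the bounds check merely stands in for the fault handling a real user access would need. It assumes bit nr lives in byte nr / 8 at position nr % 8.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a put_user_bit()-style helper. */
static int model_put_user_bit(unsigned char *bitmap, size_t nbits, size_t nr)
{
	assert(bitmap);
	if (nr >= nbits)
		return -EFAULT; /* stands in for a faulting user access */
	/* Fast path: one address computation and one OR into memory. */
	bitmap[nr / 8] |= (unsigned char)(1u << (nr % 8));
	return 0;
}
```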




- compatibility issues

What I am facing now are the compatibility issues. We have to
support both the userspace and kernel side bitmap allocations
to let the current qemu and KVM work properly.

1. From the kernel side, we have to care bitmap allocations done in both
the kvm_vm_ioctl_set_memory_region() and kvm_vm_ioctl_get_dirty_log().


One way to handle this is to call do_mmap() from the kernel, so that now 
the bitmap really lives in user space.  This is a bit ugly but I think 
acceptable.  We already do this for KVM_SET_MEMORY_REGION (which was 
replaced by KVM_SET_USER_MEMORY_REGION, which moved allocation to 
userspace).




2. From the userspace side, we have to check the new api's availability
and determine which way we use, e.g. by using check extension ioctl.

The most problematic is 1, kernel side. We have to be able to know
by which way current bitmap allocation is being done using flags or
something. In the case of set memory region, we have to judge whether
we allocate a bitmap, and if not we have to register a bitmap later
by another api: set memory region is not restricted to the dirty log
issues and need more care than get dirty log.

Are there any good ways to solve this kind of problems?


I believe that do_mmap() will simplify this.

--
error compiling committee.c: too many arguments to function



Re: linux-aio usable?

2010-03-08 Thread Christoph Hellwig
On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:
 Are there any potential pitfalls?


 It won't work well unless running on a block device (partition or LVM).

It will actually work well on pre-allocated filesystem images, at least
on XFS and NFS.

The real pitfall is that cache=none is required for kernel support as
it only supports O_DIRECT.

 Is there any reason one should not compile that feature by default?

It's compiled by default if libaio and its development headers are
found.

 Does it do anything if not explicitly run with aio=native?

No.



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Paul Brook
 On 03/08/2010 12:53 AM, Paul Brook wrote:
  Support an inter-vm shared memory device that maps a shared-memory
  object as a PCI device in the guest.  This patch also supports
  interrupts between guest by communicating over a unix domain socket. 
  This patch applies to the qemu-kvm repository.
 
  No. All new devices should be fully qdev based.
 
  I suspect you've also ignored a load of coherency issues, especially when
  not using KVM. As soon as you have shared memory in more than one host
  thread/process you have to worry about memory barriers.
 
 Shouldn't it be sufficient to require the guest to issue barriers (and
 to ensure tcg honours the barriers, if someone wants this with tcg)?.

In a cross environment that becomes extremely hairy.  For example the x86 
architecture effectively has an implicit write barrier before every store, and 
an implicit read barrier before every load.
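
To show what "requiring the guest to issue barriers" would look like for a shared region, here is a small C11 sketch. It is only an illustration with hypothetical names (mailbox, mailbox_send, mailbox_try_recv), not code from the patch. The release store and acquire load are the explicit ordering a weakly ordered machine needs; on x86 both compile to ordinary stores and loads, which is why code written only for x86 can get away with omitting them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
	int payload;       /* plain data in the shared region */
	atomic_bool ready; /* flag the other side polls */
};

static void mailbox_send(struct mailbox *mb, int value)
{
	assert(mb);
	mb->payload = value;
	/* Release: the payload store cannot be reordered after this store. */
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

static bool mailbox_try_recv(struct mailbox *mb, int *out)
{
	assert(mb && out);
	/* Acquire: the payload load cannot be reordered before this load. */
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return false;
	*out = mb->payload;
	/* Release again so the sender sees the slot free only after the read. */
	atomic_store_explicit(&mb->ready, false, memory_order_release);
	return true;
}
```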

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Paul Brook
 However, coherence could be made host-type-independent by the host
 mapping and unmapping pages, so that each page is only mapped into one
 guest (or guest CPU) at a time.  Just like some clustering filesystems
 do to maintain coherence.

You're assuming that a TLB flush implies a write barrier, and a TLB miss 
implies a read barrier.  I'd be surprised if this were true in general.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:03 PM, Paul Brook wrote:

On 03/08/2010 12:53 AM, Paul Brook wrote:
 

Support an inter-vm shared memory device that maps a shared-memory
object as a PCI device in the guest.  This patch also supports
interrupts between guest by communicating over a unix domain socket.
This patch applies to the qemu-kvm repository.
 

No. All new devices should be fully qdev based.

I suspect you've also ignored a load of coherency issues, especially when
not using KVM. As soon as you have shared memory in more than one host
thread/process you have to worry about memory barriers.
   

Shouldn't it be sufficient to require the guest to issue barriers (and
to ensure tcg honours the barriers, if someone wants this with tcg)?.
 

In a cross environment that becomes extremely hairy.  For example the x86
architecture effectively has an implicit write barrier before every store, and
an implicit read barrier before every load.
   


Ah yes.  For cross tcg environments you can map the memory using mmio 
callbacks instead of directly, and issue the appropriate barriers there.


--
error compiling committee.c: too many arguments to function



Re: KVM Guest mmap.c bug

2010-03-08 Thread Avi Kivity

On 03/02/2010 10:25 PM, BRUNO CESAR RIBAS wrote:

Hi,

I run a bunch of virtual servers using KVM. And I hit a mmap.c bug on the guest
machine. The virtual machines are desktop servers for Thin Clients.

My host is running a 2.6.33 kernel and has 32GB of RAM, Opteron with
amd-v.

The guest is running 2.6.27.45 (tried 2.6.31.12, 2.6.32.9, 2.6.33), some
guests are using 10GB, 4GB or 20GB of ram.

My qemu-kvm version is 0.12.3

All guests are using NFSROOT as the ROOT FS and virtio as the network
driver.

I run the guest with:
kvm  -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM \
-net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0 \
-net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121 \
-net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112 \
-kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \
-append "root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh"


I have a machine running an identical kernel (without virtio stuff) for a
dedicated machine (as it does not have amd-v) and it stays up for days and
even months. But when running a guest machine with qemu-kvm i get some bug
message and lots of process in D state and i can't 'ps aux' or look inside
/proc and /sys without losing my shell (it hangs).

   



In `console` I get the following message, repeated for different processors,
different PIDs and different mmap.c lines (line 486 appears too).

[ cut here ]
kernel BUG at mm/mmap.c:869!
invalid opcode:  [1] SMP
CPU 2
Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267
  #2
RIP: 0010:[8027b2e1]  [8027b2e1] find_mergeable_ano
f1/0x200
RSP: :8804d933fb38  EFLAGS: 00010283
RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088
RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138
RBP: 88049fa56138 R08: 8804d933e000 R09: 
R10:  R11:  R12: 00100073
R13: 00100073 R14: f4803000 R15: 806ce6c0
FS:  () GS:88051cc7d440(0063) knlGS:f41
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880
)
Stack:  8052e62d   88049fa5
  88051a5aac40 80280382 8804cb41b790 880498919018
   88049f8dad20 3000 802770aa
Call Trace:
  [8052e62d] ? _spin_lock_irq+0xd/0x10
  [80280382] ? anon_vma_prepare+0x52/0x100
  [802770aa] ? handle_mm_fault+0x65a/0x900
  [802de6d8] ? proc_alloc_inode+0x58/0x90
  [8052e545] ? __down_read+0x85/0xbc
  [80223331] ? do_page_fault+0x2a1/0xab0
  [803d6899] ? vsnprintf+0x4d9/0x750
  [8029d7a1] ? do_lookup+0x81/0x240
  [8027265d] ? zone_statistics+0x7d/0x80
  [8052ea3a] ? error_exit+0x0/0x70
  [803d706d] ? copy_user_generic_string+0x2d/0x40
  [802e35ec] ? proc_file_read+0x12c/0x2e0
  [802e34c0] ? proc_file_read+0x0/0x2e0
  [802dec1a] ? proc_reg_read+0x8a/0xe0
  [80295995] ? vfs_read+0xb5/0x160
  [80295b2e] ? sys_read+0x4e/0x90
  [80227004] ? ia32_sysret+0x0/0x5


Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff
  78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff0f  0b eb fe 66
  1f 84 00 00 00 00 00 48 83 ec 08 48 8b
RIP  [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
  RSP8804d933fb38
---[ end trace e5ca25224cd7d1d4 ]---


Does anyone have a suggestion? Where to look? What else should I trace?

   


It looks unrelated to kvm, though of course random memory corruption 
cannot be ruled out.


Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?

Andrea, any idea?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Avi Kivity

On 03/05/2010 06:50 PM, Alexander Graf wrote:

We have wrappers to do for example gpr read/write accesses with,
because the contents of registers could be either in the PACA
or in the VCPU struct.

There's nothing that says we have to have the guest vcpu loaded
when using these wrappers though, so let's introduce a flag that
tells us whether we're inside a vcpu_load context.

   


On x86 we always access registers within vcpu_load() context.  That 
simplifies things.  Does this not apply here?


Even so, sometimes guest registers are present on the cpu, and sometimes 
in shadow variables (for example, msrs might be loaded or not).  The 
approach here is to always unload and access the variable data.  See for 
example vmx_set_msr() calling vmx_load_host_state() before accessing msrs.


Seems like this could reduce the if () tree?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Avi Kivity

On 03/05/2010 06:50 PM, Alexander Graf wrote:

Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf ag...@suse.de
---
  arch/powerpc/include/asm/kvm.h |3 +++
  arch/powerpc/include/asm/kvm_ppc.h |2 ++
  arch/powerpc/kvm/book3s.c  |6 ++
  arch/powerpc/kvm/powerpc.c |5 -
  4 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index 19bae31..6c5547d 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -84,4 +84,7 @@ struct kvm_guest_debug_arch {
  #define KVM_REG_QPR   0x0040
  #define KVM_REG_FQPR  0x0060

+#define KVM_INTERRUPT_SET  -1U
+#define KVM_INTERRUPT_UNSET  -2U
   


Funny choice of numbers.

How does userspace know they exist?

Can you use KVM_IRQ_LINE?



--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/05/2010 06:50 PM, Alexander Graf wrote:
 We have wrappers to do for example gpr read/write accesses with,
 because the contents of registers could be either in the PACA
 or in the VCPU struct.

 There's nothing that says we have to have the guest vcpu loaded
 when using these wrappers though, so let's introduce a flag that
 tells us whether we're inside a vcpu_load context.



 On x86 we always access registers within vcpu_load() context.  That
 simplifies things.  Does this not apply here?

 Even so, sometimes guest registers are present on the cpu, and
 sometimes in shadow variables (for example, msrs might be loaded or
 not).  The approach here is to always unload and access the variable
 data.  See for example vmx_set_msr() calling vmx_load_host_state()
 before accessing msrs.

 Seems like this could reduce the if () tree?

Well - it would probably render this particular patch void. In fact, I
think it is already useless thanks to the other always do vcpu_load patch.

As far as the already existing if goes, we can't really get rid of that.
I want to be fast in the instruction emulation. Copying around the
registers won't help there.



Alex


Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/05/2010 06:50 PM, Alexander Graf wrote:
 Userspace can tell us that it wants to trigger an interrupt. But
 so far it can't tell us that it wants to stop triggering one.

 So let's interpret the parameter to the ioctl that we have anyways
 to tell us if we want to raise or lower the interrupt line.

 Signed-off-by: Alexander Graf ag...@suse.de
 ---
   arch/powerpc/include/asm/kvm.h |3 +++
   arch/powerpc/include/asm/kvm_ppc.h |2 ++
   arch/powerpc/kvm/book3s.c  |6 ++
   arch/powerpc/kvm/powerpc.c |5 -
   4 files changed, 15 insertions(+), 1 deletions(-)

 diff --git a/arch/powerpc/include/asm/kvm.h
 b/arch/powerpc/include/asm/kvm.h
 index 19bae31..6c5547d 100644
 --- a/arch/powerpc/include/asm/kvm.h
 +++ b/arch/powerpc/include/asm/kvm.h
 @@ -84,4 +84,7 @@ struct kvm_guest_debug_arch {
   #define KVM_REG_QPR   0x0040
   #define KVM_REG_FQPR  0x0060

 +#define KVM_INTERRUPT_SET    -1U
 +#define KVM_INTERRUPT_UNSET  -2U


 Funny choice of numbers.

Qemu currently does explicitly set -1U and is the only user.

 How does userspace know they exist?

#ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that
won't work without the hypervisor call anyways.

 Can you use KVM_IRQ_LINE?

I'd rather like to keep that around for when we get an in-kernel-mpic,
which is what we probably ultimately want for qemu.
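
For readers puzzled by the "funny" numbers: -1U and -2U are simply the two largest unsigned values. The sketch below checks that and models how an ioctl argument could be mapped to a line level. The two macros are copied from the patch; model_irq_line() and its "anything else leaves the line alone" behaviour are hypothetical, chosen only for illustration, and a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_INTERRUPT_SET    -1U /* from the patch */
#define KVM_INTERRUPT_UNSET  -2U /* from the patch */

/* With a 32-bit unsigned int these wrap to the top of the range. */
static_assert(KVM_INTERRUPT_SET == 0xffffffffu, "expects 32-bit unsigned int");
static_assert(KVM_INTERRUPT_UNSET == 0xfffffffeu, "expects 32-bit unsigned int");

/* Hypothetical model: returns the new level of the interrupt line. */
static int model_irq_line(uint32_t irq, int current_level)
{
	switch (irq) {
	case KVM_INTERRUPT_SET:
		return 1;
	case KVM_INTERRUPT_UNSET:
		return 0;
	default:
		return current_level; /* assumption: unknown values are ignored */
	}
}
```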



Alex


Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:44 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/05/2010 06:50 PM, Alexander Graf wrote:
 

We have wrappers to do for example gpr read/write accesses with,
because the contents of registers could be either in the PACA
or in the VCPU struct.

There's nothing that says we have to have the guest vcpu loaded
when using these wrappers though, so let's introduce a flag that
tells us whether we're inside a vcpu_load context.


   

On x86 we always access registers within vcpu_load() context.  That
simplifies things.  Does this not apply here?

Even so, sometimes guest registers are present on the cpu, and
sometimes in shadow variables (for example, msrs might be loaded or
not).  The approach here is to always unload and access the variable
data.  See for example vmx_set_msr() calling vmx_load_host_state()
before accessing msrs.

Seems like this could reduce the if () tree?
 

Well - it would probably render this particular patch void. In fact, I
think it is already useless thanks to the other always do vcpu_load patch.

As far as the already existing if goes, we can't really get rid of that.
I want to be fast in the instruction emulation. Copying around the
registers won't help there.
   


So do it the other way around.  Always load the registers (of course, do 
nothing if already loaded) and then access them in just one way.  I 
assume during emulation the registers will always be loaded?


--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/05/2010 06:50 PM, Alexander Graf wrote:
   }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index ce28767..c7ed3cb 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -400,6 +400,12 @@ struct kvm_ioeventfd {
   __u8  pad[36];
   };

 +/* for KVM_ENABLE_CAP */
 +struct kvm_enable_cap {
 +/* in */
 +__u32 cap;


 Reserve space here.  Add a flags field and check it for zeros.

Flags? How about something like

u64 args[4]

That way the capability enabling code could decide what to do with the
arguments. We don't always only need flags I suppose?.
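
The two suggestions are not mutually exclusive. The following userspace sketch combines a reserved flags field that must be zero with a per-capability args array. It is an assumption about how the pieces could fit together, not the layout from the patch: enable_cap_sketch, CAP_EXAMPLE and enable_cap() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Sketch only: this layout is an assumption, not the struct in the patch. */
struct enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;   /* reserved for now, must be zero */
	uint64_t args[4]; /* meaning is defined per capability */
};

#define CAP_EXAMPLE 1u /* hypothetical capability number */

static int enable_cap(const struct enable_cap_sketch *req, uint64_t *state)
{
	assert(req && state);
	/* Reject nonzero flags so they can be given a meaning later. */
	if (req->flags)
		return -EINVAL;

	switch (req->cap) {
	case CAP_EXAMPLE:
		*state = req->args[0]; /* this capability takes one argument */
		return 0;
	default:
		return -EINVAL;
	}
}
```

Checking the reserved field for zero is what keeps it usable: old userspace that passes garbage there fails today instead of silently changing behaviour once the field is defined.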


Alex


Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:48 PM, Alexander Graf wrote:




How does userspace know they exist?
 

#ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that
won't work without the hypervisor call anyways.
   


We generally compile on one machine, and run on another.


Can you use KVM_IRQ_LINE?
 

I'd rather like to keep that around for when we get an in-kernel-mpic,
which is what we probably ultimately want for qemu.
   


Yes.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:51 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/05/2010 06:50 PM, Alexander Graf wrote:
 

   }
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..c7ed3cb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -400,6 +400,12 @@ struct kvm_ioeventfd {
   __u8  pad[36];
   };

+/* for KVM_ENABLE_CAP */
+struct kvm_enable_cap {
+/* in */
+__u32 cap;

   

Reserve space here.  Add a flags field and check it for zeros.
 

Flags? How about something like

u64 args[4]

That way the capability enabling code could decide what to do with the
arguments. We don't always only need flags I suppose?.
   


If you interpret these as bit flags anyway, that would be redundant.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:44 PM, Alexander Graf wrote:
 Avi Kivity wrote:
   
 On 03/05/2010 06:50 PM, Alexander Graf wrote:
 
 We have wrappers to do for example gpr read/write accesses with,
 because the contents of registers could be either in the PACA
 or in the VCPU struct.

 There's nothing that says we have to have the guest vcpu loaded
 when using these wrappers though, so let's introduce a flag that
 tells us whether we're inside a vcpu_load context.



 On x86 we always access registers within vcpu_load() context.  That
 simplifies things.  Does this not apply here?

 Even so, sometimes guest registers are present on the cpu, and
 sometimes in shadow variables (for example, msrs might be loaded or
 not).  The approach here is to always unload and access the variable
 data.  See for example vmx_set_msr() calling vmx_load_host_state()
 before accessing msrs.

 Seems like this could reduce the if () tree?
  
 Well - it would probably render this particular patch void. In fact, I
 think it is already useless thanks to the other "always do vcpu_load"
 patch.

 As far as the already existing if goes, we can't really get rid of that.
 I want to be fast in the instruction emulation. Copying around the
 registers won't help there.


 So do it the other way around.  Always load the registers (of course,
 do nothing if already loaded) and then access them in just one way.  I
 assume during emulation the registers will always be loaded?

During emulation we're always in VCPU_RUN, so the vcpu is loaded.

Do you mean something like:

read_register(num) {
  vcpu_load();
  read register from PACA(num);
  vcpu_put();
}

? Does vcpu_load incur overhead when it doesn't need to do anything?


Alex


Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:48 PM, Alexander Graf wrote:


 How does userspace know they exist?
  
 #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that
 won't work without the hypervisor call anyways.


 We generally compile on one machine, and run on another.

So? Then IRQ unsetting doesn't work. Without this series you won't get
much further than booting the kernel anyways because XER is broken, TLB
flushes are broken and FPU loading is broken. So not being able to unset
an IRQ line is the least of your problems :).


Alex


Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:51 PM, Alexander Graf wrote:
 Avi Kivity wrote:
   
 On 03/05/2010 06:50 PM, Alexander Graf wrote:
 
}
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index ce28767..c7ed3cb 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -400,6 +400,12 @@ struct kvm_ioeventfd {
__u8  pad[36];
};

 +/* for KVM_ENABLE_CAP */
 +struct kvm_enable_cap {
 +/* in */
 +__u32 cap;


 Reserve space here.  Add a flags field and check it for zeros.
  
 Flags? How about something like

 u64 args[4]

 That way the capability enabling code could decide what to do with the
 arguments. We don't always need only flags, I suppose?


 If you interpret these as bit flags anyway, that would be redundant.


I think I just don't understand what you're trying to say with flags.
For the OSI enabling we don't need any flags. For later additions we
don't know what we'll need.


Alex


Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:55 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/08/2010 03:48 PM, Alexander Graf wrote:
 


   

How does userspace know they exist?

 

#ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And that
won't work without the hypervisor call anyways.

   

We generally compile on one machine, and run on another.
 

So? Then IRQ unsetting doesn't work. Without this series you won't get
much further than booting the kernel anyways because XER is broken, TLB
flushes are broken and FPU loading is broken. So not being able to unset
an IRQ line is the least of your problems :).
   


There's a difference between an error message telling you to upgrade to 
a kernel with KVM_CAP_BLAH and a failure.  It's the difference between a 
bug report and silence.
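
The run-time check being argued for here can be sketched roughly as
follows. This is a minimal model under stated assumptions, not real KVM
userspace code: probe_cap_fn, require_cap, old_kernel, new_kernel and
DEMO_CAP_IRQ_UNSET are hypothetical names, and the callback stands in for
asking the running kernel (real userspace would use the
KVM_CHECK_EXTENSION ioctl on the /dev/kvm file descriptor).

```c
#include <stdio.h>

/* Hypothetical capability number, for illustration only. */
#define DEMO_CAP_IRQ_UNSET 1

/*
 * Stand-in for the run-time query. A callback plays the role of the
 * running kernel so the sketch stays self-contained.
 * Returns > 0 if the capability is present, 0 otherwise.
 */
typedef int (*probe_cap_fn)(int cap);

static int old_kernel(int cap) { (void)cap; return 0; }
static int new_kernel(int cap) { return cap == DEMO_CAP_IRQ_UNSET; }

/*
 * Check at run time, not with #ifdef: the binary may have been built
 * against newer headers than the kernel it ends up running on.
 * Returns 0 on success, -1 after printing an actionable message.
 */
static int require_cap(probe_cap_fn probe, int cap, const char *name)
{
    if (probe(cap) > 0)
        return 0;
    fprintf(stderr, "kernel lacks %s, please upgrade\n", name);
    return -1;
}
```

With this shape, a binary built against new headers but run on an old
kernel prints a clear message instead of misbehaving silently.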


--
error compiling committee.c: too many arguments to function



Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:55 PM, Alexander Graf wrote:
 Avi Kivity wrote:
   
 On 03/08/2010 03:48 PM, Alexander Graf wrote:
 

   
 How does userspace know they exist?

  
 #ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And
 that
 won't work without the hypervisor call anyways.


 We generally compile on one machine, and run on another.
  
 So? Then IRQ unsetting doesn't work. Without this series you won't get
 much further than booting the kernel anyways because XER is broken, TLB
 flushes are broken and FPU loading is broken. So not being able to unset
 an IRQ line is the least of your problems :).


 There's a difference between an error message telling you to upgrade
 to a kernel with KVM_CAP_BLAH and a failure.  It's the difference
 between a bug report and silence.

I see. So we can check for KVM_CAP_PPC_OSI and know that it's in the
same patch series, also making KVM_INTERRUPT_XXX work, right? Or do you
really want to have 500 capabilities for every single patch?


Alex


Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:56 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/08/2010 03:51 PM, Alexander Graf wrote:
 

Avi Kivity wrote:

   

On 03/05/2010 06:50 PM, Alexander Graf wrote:

 

}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..c7ed3cb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -400,6 +400,12 @@ struct kvm_ioeventfd {
__u8  pad[36];
};

+/* for KVM_ENABLE_CAP */
+struct kvm_enable_cap {
+/* in */
+__u32 cap;


   

Reserve space here.  Add a flags field and check it for zeros.

 

Flags? How about something like

u64 args[4]

That way the capability enabling code could decide what to do with the
arguments. We don't always need only flags, I suppose?

   

If you interpret these as bit flags anyway, that would be redundant.

 

I think I just don't understand what you're trying to say with flags.
For the OSI enabling we don't need any flags. For later additions we
don't know what we'll need.
   


When we have reserved fields which are later used for something new, the 
kernel needs a way to know if the reserved fields are known or not by 
userspace.  One way to do this is to assume a value of zero means the 
field is unknown to userspace so ignore it.  Another is to require 
userspace to set a bit in an already-known flags field, and only act on 
the new field if its bit was set.  This has the advantage that the old 
kernel checks for unknown flags and errors out, improving forwards and 
backwards compatibility.


I thought ->cap was already a bit field, so this isn't necessary, but if 
it isn't, then a flags field is helpful.
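
The second approach (a flags field that the kernel checks for unknown
bits) can be sketched like this. It is an illustration only: the struct
layout, demo_enable_cap and DEMO_KNOWN_FLAGS are hypothetical and are not
taken from the patch under review.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout, for illustration; not the struct from the patch. */
struct demo_enable_cap {
    uint32_t cap;
    uint32_t flags;     /* each bit announces one optional field */
    uint64_t args[4];   /* meaning depends on cap and flags */
};

/* Flag bits this (old) kernel understands: none yet. */
#define DEMO_KNOWN_FLAGS 0u

/*
 * Returns 0 on success or -EINVAL if userspace set a flag bit the
 * kernel does not know, so a newer userspace fails loudly on an
 * older kernel instead of having its new field silently ignored.
 */
static int demo_enable_cap(const struct demo_enable_cap *req)
{
    if (req->flags & ~DEMO_KNOWN_FLAGS)
        return -EINVAL;
    /* ... act on req->cap here ... */
    return 0;
}
```

A later kernel that learns a new field would add its bit to
DEMO_KNOWN_FLAGS and read the field only when that bit is set.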


--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Avi Kivity

On 03/08/2010 03:53 PM, Alexander Graf wrote:



So do it the other way around.  Always load the registers (of course,
do nothing if already loaded) and then access them in just one way.  I
assume during emulation the registers will always be loaded?
 

During emulation we're always in VCPU_RUN, so the vcpu is loaded.

Do you mean something like:

read_register(num) {
   vcpu_load();
   read register from PACA(num);
   vcpu_put();
}

? Does vcpu_load incur overhead when it doesn't need to do anything?
   


If the vcpu is always loaded, this would be redundant, no?

The situation is that a piece of data is in one of two places.  Instead 
of checking and loading it from either, force it to the place where it 
normally is, and load it from there.


So instead of

if (x)
y = p1;
else
y = p2;

in a zillion places, just do

force_to_p2(); // the common case anyway
y = p2;

which results in cleaner code.  Assuming that you have a common case of 
course.
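
Spelled out a little more, the two styles could look like the sketch
below. All names here (demo_state, read_either, read_forced) are
hypothetical; only force_to_p2 echoes the pseudocode above, and the
"loaded" flag is an assumption about how the current location is tracked.

```c
#include <stdbool.h>

/* Hypothetical state: a value lives either in a fast "loaded" slot
 * (p1) or in its canonical home (p2). */
struct demo_state {
    bool loaded;   /* true while the live copy is in p1 */
    int p1;
    int p2;
};

/* Branchy accessor: every reader has to know about both places. */
static int read_either(const struct demo_state *s)
{
    return s->loaded ? s->p1 : s->p2;
}

/* Write the live copy back to p2; does nothing if already there. */
static void force_to_p2(struct demo_state *s)
{
    if (s->loaded) {
        s->p2 = s->p1;
        s->loaded = false;
    }
}

/* Single-path accessor: sync once, then always read p2. */
static int read_forced(struct demo_state *s)
{
    force_to_p2(s);
    return s->p2;
}
```

The branch does not disappear, but it moves into one helper instead of
being repeated at every access site.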


--
error compiling committee.c: too many arguments to function



Re: [PATCH 03/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:01 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/08/2010 03:55 PM, Alexander Graf wrote:
 

Avi Kivity wrote:

   

On 03/08/2010 03:48 PM, Alexander Graf wrote:

 


   

How does userspace know they exist?


 

#ifdef KVM_INTERRUPT_SET? MOL is the only user of this so far. And
that
won't work without the hypervisor call anyways.


   

We generally compile on one machine, and run on another.

 

So? Then IRQ unsetting doesn't work. Without this series you won't get
much further than booting the kernel anyways because XER is broken, TLB
flushes are broken and FPU loading is broken. So not being able to unset
an IRQ line is the least of your problems :).

   

There's a difference between an error message telling you to upgrade
to a kernel with KVM_CAP_BLAH and a failure.  It's the difference
between a bug report and silence.
 

I see. So we can check for KVM_CAP_PPC_OSI and know that it's in the
same patch series, also making KVM_INTERRUPT_XXX work, right? Or do you
really want to have 500 capabilities for every single patch?
   


Having individual capabilities makes backporting a lot easier (otherwise 
you have to backport the whole thing).  If the changes are logically 
separate, I prefer 500 separate capabilities.


However, for a platform bringup, it's okay to have just one capability, 
assuming none of the changes are applicable to other platforms.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation

2010-03-08 Thread Stefan Bader
Avi Kivity wrote:
 On 03/06/2010 03:53 PM, Stefan Bader wrote:
 Hi Avi,

 we currently try to integrate this patch for an update into a 2.6.32
 based
 system (amongst other kvm updates). But as soon as this patch gets
 added kvm
 will die on startup in kvm_leave_lazy_mmu. This has been documented here:

 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823

 I have placed the backports of your patches, which are currently in
 linux-next
 and marked for stable here:

 git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm

 I have tested the failure with a version that got only the following
 patches in:
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation
 KVM: x86 emulator: Check IOPL level during io instruction emulation
 KVM: x86 emulator: Fix popf emulation
 KVM: x86 emulator: Check CPL level during privilege instruction emulation

 and also with a version that takes all stable patches up to the bad one:
 KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
 KVM: x86 emulator: Add group8 instruction decoding
 KVM: x86 emulator: Add group9 instruction decoding
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation

 But as soon as the fix for memory access gets added, the bug will
 occur. Would
 you have an idea what might be causing this?

 
 Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?
 
The test was done with a kvm user-space package based on 0.12.3 (which seems to
be the current upstream version). I will try to run a test on the git version.

Stefan


Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:56 PM, Alexander Graf wrote:
 Avi Kivity wrote:
   
 On 03/08/2010 03:51 PM, Alexander Graf wrote:
 
 Avi Kivity wrote:

   
 On 03/05/2010 06:50 PM, Alexander Graf wrote:

 
 }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index ce28767..c7ed3cb 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -400,6 +400,12 @@ struct kvm_ioeventfd {
 __u8  pad[36];
 };

 +/* for KVM_ENABLE_CAP */
 +struct kvm_enable_cap {
 +/* in */
 +__u32 cap;



 Reserve space here.  Add a flags field and check it for zeros.

  
 Flags? How about something like

 u64 args[4]

 That way the capability enabling code could decide what to do with the
 arguments. We don't always need only flags, I suppose?


 If you interpret these as bit flags anyway, that would be redundant.

  
 I think I just don't understand what you're trying to say with flags.
 For the OSI enabling we don't need any flags. For later additions we
 don't know what we'll need.


 When we have reserved fields which are later used for something new,
 the kernel needs a way to know if the reserved fields are known or not
 by userspace.  One way to do this is to assume a value of zero means
 the field is unknown to userspace so ignore it.  Another is to require
 userspace to set a bit in an already-known flags field, and only act
 on the new field if its bit was set.  This has the advantage that the
 old kernel checks for unknown flags and errors out, improving forwards
 and backwards compatibility.

 I thought ->cap was already a bit field, so this isn't necessary, but
 if it isn't, then a flags field is helpful.

->cap is the capability number. So you want something like:

struct kvm_enable_cap {
  __u32 cap;
  __u32 flags;
  __u64 args[4];
  __u8 pad[64];
};

And then check for flags == 0 in the ioctl handler? Flags could later on
define if the padding changed to a different position, adding new fields
in between args and pad?


Alex


Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:10 PM, Stefan Bader wrote:

Avi Kivity wrote:
   

On 03/06/2010 03:53 PM, Stefan Bader wrote:
 

Hi Avi,

we currently try to integrate this patch for an update into a 2.6.32
based
system (amongst other kvm updates). But as soon as this patch gets
added kvm
will die on startup in kvm_leave_lazy_mmu. This has been documented here:

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823

I have placed the backports of your patches, which are currently in
linux-next
and marked for stable here:

git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm

I have tested the failure with a version that got only the following
patches in:
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: fix memory access during x86 emulation
KVM: x86 emulator: Check IOPL level during io instruction emulation
KVM: x86 emulator: Fix popf emulation
KVM: x86 emulator: Check CPL level during privilege instruction emulation

and also with a version that takes all stable patches up to the bad one:
KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
KVM: x86 emulator: Add group8 instruction decoding
KVM: x86 emulator: Add group9 instruction decoding
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: fix memory access during x86 emulation

But as soon as the fix for memory access gets added, the bug will
occur. Would
you have an idea what might be causing this?

   

Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?

 

The test was done with a kvm user-space package based on 0.12.3 (which seems to
be the current upstream version). I will try to run a test on the git version.
   


I meant keep the same userspace without change, and try it on a Linus 
kernel or kvm.git master 
(http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 03:53 PM, Alexander Graf wrote:

 So do it the other way around.  Always load the registers (of course,
 do nothing if already loaded) and then access them in just one way.  I
 assume during emulation the registers will always be loaded?
  
 During emulation we're always in VCPU_RUN, so the vcpu is loaded.

 Do you mean something like:

 read_register(num) {
vcpu_load();
read register from PACA(num);
vcpu_put();
 }

 ? Does vcpu_load incur overhead when it doesn't need to do anything?


 If the vcpu is always loaded, this would be redundant, no?

 The situation is that a piece of data is in one of two places. 
 Instead of checking and loading it from either, force it to the place
 where it normally is, and load it from there.

 So instead of

 if (x)
 y = p1;
 else
 y = p2;

 in a zillion places, just do

 force_to_p2(); // the common case anyway
 y = p2;

 which results in cleaner code.  Assuming that you have a common case
 of course.


We're looking at two different ifs here.

1) GPR Inside the PACA or not (volatile vs non-volatile)

This is constant. Volatile registers go to the PACA; non-volatiles go to
the vcpu struct.

2) GPR actually loaded in the PACA

When we're in vcpu_load context the registers are in the PACA; when not,
they're in the vcpu struct.


If you have a really easy and fast way to assure that we're always
inside a vcpu_load context, all is great. I could probably even just put
in a BUG_ON(not in vcpu_load context) and make the callers safe. But
some check needs to be done.


Alex


Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:10 PM, Alexander Graf wrote:



When we have reserved fields which are later used for something new,
the kernel needs a way to know if the reserved fields are known or not
by userspace.  One way to do this is to assume a value of zero means
the field is unknown to userspace so ignore it.  Another is to require
userspace to set a bit in an already-known flags field, and only act
on the new field if its bit was set.  This has the advantage that the
old kernel checks for unknown flags and errors out, improving forwards
and backwards compatibility.

I thought ->cap was already a bit field, so this isn't necessary, but
if it isn't, then a flags field is helpful.
 

->cap is the capability number. So you want something like:

struct kvm_enable_cap {
   __u32 cap;
   __u32 flags;
   __u64 args[4];
   __u8 pad[64];
};

And then check for flags == 0 in the ioctl handler? Flags could later on
define if the padding changed to a different position, adding new fields
in between args and pad?
   


Exactly, we do so in several places.  Can be useful if, for example, 
some new capability comes with a resource count value.


What's this thing anyway?  like cpuid bits for x86?


--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:14 PM, Alexander Graf wrote:


We're looking at two different ifs here.

1) GPR Inside the PACA or not (volatile vs non-volatile)

This is constant. Volatile registers go to the PACA; non-volatiles go to
the vcpu struct.
   


Okay - so no if ().


2) GPR actually loaded in the PACA

When we're in vcpu_load context the registers are in the PACA; when not,
they're in the vcpu struct.


If you have a really easy and fast way to assure that we're always
inside a vcpu_load context, all is great. I could probably even just put
in a BUG_ON(not in vcpu_load context) and make the callers safe. But
some check needs to be done.
   


x86 assumes in vcpu_load() context (without even a BUG_ON()).  
KVM_GET_REGS and friends are responsible for this.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation

2010-03-08 Thread Stefan Bader
Avi Kivity wrote:
 On 03/08/2010 04:10 PM, Stefan Bader wrote:
 Avi Kivity wrote:
   
 On 03/06/2010 03:53 PM, Stefan Bader wrote:
 
 Hi Avi,

 we currently try to integrate this patch for an update into a 2.6.32
 based
 system (amongst other kvm updates). But as soon as this patch gets
 added kvm
 will die on startup in kvm_leave_lazy_mmu. This has been documented
 here:

 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823

 I have placed the backports of your patches, which are currently in
 linux-next
 and marked for stable here:

 git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm

 I have tested the failure with a version that got only the following
 patches in:
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation
 KVM: x86 emulator: Check IOPL level during io instruction emulation
 KVM: x86 emulator: Fix popf emulation
 KVM: x86 emulator: Check CPL level during privilege instruction
 emulation

 and also with a version that takes all stable patches up to the bad
 one:
 KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
 KVM: x86 emulator: Add group8 instruction decoding
 KVM: x86 emulator: Add group9 instruction decoding
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation

 But as soon as the fix for memory access gets added, the bug will
 occur. Would
 you have an idea what might be causing this?


 Does the same guest, using the same qemu-kvm, work on kvm.git or
 upstream?

  
 The test was done with a kvm user-space package based on 0.12.3 (which
 seems to
 be the current upstream version). I will try to run a test on the git version.

 
 I meant keep the same userspace without change, and try it on a Linus
 kernel or kvm.git master
 (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).
 

Ok, sorry I misunderstood that. As I see Linus just pulled your patches in, I
will get that compiled and tested.



Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 04:10 PM, Alexander Graf wrote:

 When we have reserved fields which are later used for something new,
 the kernel needs a way to know if the reserved fields are known or not
 by userspace.  One way to do this is to assume a value of zero means
 the field is unknown to userspace so ignore it.  Another is to require
 userspace to set a bit in an already-known flags field, and only act
 on the new field if its bit was set.  This has the advantage that the
 old kernel checks for unknown flags and errors out, improving forwards
 and backwards compatibility.

 I thought ->cap was already a bit field, so this isn't necessary, but
 if it isn't, then a flags field is helpful.
  
 ->cap is the capability number. So you want something like:

 struct kvm_enable_cap {
__u32 cap;
__u32 flags;
__u64 args[4];
__u8 pad[64];
 };

 And then check for flags == 0 in the ioctl handler? Flags could later on
 define if the padding changed to a different position, adding new fields
 in between args and pad?


 Exactly, we do so in several places.  Can be useful if, for example,
 some new capability comes with a resource count value.

 What's this thing anyway?  like cpuid bits for x86?

What thing? This ioctl or the OSI call?

The ioctl is a way to enable a feature on a per-vcpu basis. MOL overlays
the syscall interface with a hypercall interface, so a normal OS syscall
magically becomes a hypercall when magic constants get passed in r3 and r4.

Because for obvious reasons we don't want to enable that when not using
MOL, I figured I'd go in and have userspace decide if it wants to get a
hypercall exit or not. Qemu couldn't really do anything with it after
all. And while at it, I figured let's better make the interface generic.
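
As a rough illustration of the per-vcpu switch described above: the
sketch below only models the decision, it is not the code from the patch.
The struct, the enum, demo_handle_syscall and the two magic values are
placeholders invented for this example; the real constants are not given
in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic values, for illustration only. */
#define DEMO_MAGIC_R3 0x1111u
#define DEMO_MAGIC_R4 0x2222u

enum demo_exit { DEMO_GUEST_SYSCALL, DEMO_EXIT_HYPERCALL };

struct demo_osi_vcpu {
    bool osi_enabled;     /* set by userspace through the enable ioctl */
    uint64_t r3, r4;      /* guest registers at the time of the syscall */
};

/*
 * A syscall becomes a hypercall exit only if userspace opted in for
 * this vcpu AND the guest passed both magic constants. Otherwise it
 * is delivered to the guest OS as an ordinary syscall.
 */
static enum demo_exit demo_handle_syscall(const struct demo_osi_vcpu *v)
{
    if (v->osi_enabled && v->r3 == DEMO_MAGIC_R3 && v->r4 == DEMO_MAGIC_R4)
        return DEMO_EXIT_HYPERCALL;
    return DEMO_GUEST_SYSCALL;
}
```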


Alex


Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Alexander Graf
Avi Kivity wrote:
 On 03/08/2010 04:14 PM, Alexander Graf wrote:

 We're looking at two different ifs here.

 1) GPR Inside the PACA or not (volatile vs non-volatile)

 This is constant. Volatile registers go to the PACA; non-volatiles go to
 the vcpu struct.


 Okay - so no if ().

Eh.

r[0 - 12] are volatile
r[13 - 31] are non-volatile

So if we want a common gpr access function we need an if. And we need
one, because the opcodes just use register numbers and don't care
where they are.
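
A common accessor along these lines might look like the sketch below. It
is a simplified model, not the kernel code: demo_paca, demo_vcpu,
demo_get_gpr and demo_set_gpr are hypothetical names, and only the
r0-r12 / r13-r31 split comes from the description above.

```c
#include <stdint.h>

#define NUM_GPRS 32
#define NUM_VOLATILE 13   /* r0..r12, per the split described above */

/* Hypothetical stand-ins for the two storage locations. */
struct demo_paca { uint64_t gpr[NUM_VOLATILE]; };
struct demo_vcpu {
    struct demo_paca *paca;
    uint64_t gpr[NUM_GPRS];   /* only r13..r31 are used here */
};

/* Callers must pass num < NUM_GPRS (a 5-bit register field). */
static uint64_t demo_get_gpr(const struct demo_vcpu *v, unsigned int num)
{
    if (num < NUM_VOLATILE)
        return v->paca->gpr[num];
    return v->gpr[num];
}

static void demo_set_gpr(struct demo_vcpu *v, unsigned int num, uint64_t val)
{
    if (num < NUM_VOLATILE)
        v->paca->gpr[num] = val;
    else
        v->gpr[num] = val;
}
```

The branch depends only on the register number, so it is the constant
kind of if, independent of whether the vcpu is currently loaded.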


 2) GPR actually loaded in the PACA

 When we're in vcpu_load context the registers are in the PACA; when not,
 they're in the vcpu struct.


 If you have a really easy and fast way to assure that we're always
 inside a vcpu_load context, all is great. I could probably even just put
 in a BUG_ON(not in vcpu_load context) and make the callers safe. But
 some check needs to be done.


 x86 assumes in vcpu_load() context (without even a BUG_ON()). 
 KVM_GET_REGS and friends are responsible for this.

Oh, interesting. Just drop this patch then :).


Alex
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:18 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/08/2010 04:10 PM, Alexander Graf wrote:
 
   

When we have reserved fields which are later used for something new,
the kernel needs a way to know if the reserved fields are known or not
by userspace.  One way to do this is to assume a value of zero means
the field is unknown to userspace so ignore it.  Another is to require
userspace to set a bit in an already-known flags field, and only act
on the new field if its bit was set.  This has the advantage that the
old kernel checks for unknown flags and errors out, improving forwards
and backwards compatibility.

I thought ->cap was already a bit field, so this isn't necessary, but
if it isn't, then a flags field is helpful.

 

->cap is the capability number. So you want something like:

struct kvm_enable_cap {
__u32 cap;
__u32 flags;
__u64 args[4];
__u8 pad[64];
};

And then check for flags == 0 in the ioctl handler? Flags could later on
define if the padding changed to a different position, adding new fields
in between args and pad?

   

Exactly, we do so in several places.  Can be useful if, for example,
some new capability comes with a resource count value.

What's this thing anyway?  like cpuid bits for x86?
 

What thing? This ioctl or the OSI call?

The ioctl is a way to enable a feature on a per-vcpu basis. MOL overlays
the syscall interface with a hypercall interface, so a normal OS syscall
magically becomes a hypercall when magic constants get passed in r3 and r4.

Because for obvious reasons we don't want to enable that when not using
MOL, I figured I'd go in and have userspace decide if it wants to get a
hypercall exit or not. Qemu couldn't really do anything with it after
all. And while at it, I figured let's better make the interface generic.
   


That's reasonable.  Thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/15] KVM: PPC: Make register read/write wrappers always work

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:20 PM, Alexander Graf wrote:

Avi Kivity wrote:
   

On 03/08/2010 04:14 PM, Alexander Graf wrote:
 

We're looking at two different ifs here.

1) GPR Inside the PACA or not (volatile vs non-volatile)

This is constant. Volatile registers go to the PACA; non-volatiles go to
the vcpu struct.

   

Okay - so no if ().
 

Eh.

r[0 - 12] are volatile
r[13 - 31] are non-volatile

So if we want a common gpr access function we need an if. And we need
one, because the opcodes just use register numbers and don't care
where they are.
   


I see - we have something similar on x86 (where vmx keeps rsp/rip in a 
register and lets us save everything else manually).


--
error compiling committee.c: too many arguments to function



Re: linux-aio usable?

2010-03-08 Thread Dustin Kirkland
On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote:
 On 03/08/2010 11:48 AM, Bernhard Schmidt wrote:

 On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:
 Are there any potential pitfalls?

 It won't work well unless running on a block device (partition or LVM).

 What does "work well" mean in this context? Potential data loss?

 No, it becomes synchronous (=extra slow).

But for this to happen, the user would have had to consciously enter
into the situation by creating/using a non block device,
non-pre-allocated backing disk AND specify the aio=native option,
correct?

:-Dustin


Re: linux-aio usable?

2010-03-08 Thread Avi Kivity

On 03/08/2010 04:25 PM, Dustin Kirkland wrote:

On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote:
   

On 03/08/2010 11:48 AM, Bernhard Schmidt wrote:
 

On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:
   

Are there any potential pitfalls?
   

It won't work well unless running on a block device (partition or LVM).
 

What does "work well" mean in this context? Potential data loss?
   

No, it becomes synchronous (=extra slow).
 

But for this to happen, the user would have had to consciously enter
into the situation by creating/using a non block device,
non-pre-allocated backing disk AND specify the aio=native option,
correct?

   


I thought there was some autodetection involved, but perhaps I just 
imagined it.


--
error compiling committee.c: too many arguments to function



Re: KVM Guest mmap.c bug

2010-03-08 Thread Andrea Arcangeli
On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
 It looks unrelated to kvm, though of course random memory corruption 
 cannot be ruled out.
 
 Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?
 
 Andrea, any idea?

Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma,
even though vma is the one with the smallest vm_end for which the
comparison vma->vm_start < vma->vm_end is true (the next vma is null
and the prev will have vma->vm_start == prev->vm_end, not <).

The bug check looks right; it doesn't seem to be a false positive, and
it indicates that the vma rbtree is memory-corrupted somehow.

So yes, fiddling with npt on and off sounds like a good start: if it's a
bug in shadow paging, it's unlikely that the exact same bug materializes
both with and without npt. If the crash happens with npt on and off, then
maybe it's not hypervisor related. It could also be bad RAM if it only
happens on a single host and all other hosts are fine with the same binary
guest/host kernels (an rbtree walk might stress the memory bus more than
other operations). That said, vm_next being null (and if it's null, the
vm_next pointer likely has no RAM bitflip) is a bit weird and not a
common scenario, and this page fault seems to be triggered by a procfs
copy_user call, which is non-standard, so maybe this is a guest bug. It
would be interesting to know what the vm_start address is; at the end
of the address space there are the stack, vdso and vsyscall areas.
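
The invariant behind that bug check can be sketched with a toy model. This is not kernel code: it replaces the rbtree with a sorted Python list, and the names Vma and lookup_is_consistent are made up. Only the "first VMA with addr < vm_end" rule is the real find_vma contract.

```python
# Toy model of the lookup being discussed; not kernel code.
from collections import namedtuple

Vma = namedtuple("Vma", ["vm_start", "vm_end"])


def find_vma(vmas, addr):
    """Return the first VMA with addr < vm_end, or None.

    `vmas` must be sorted by address and non-overlapping.
    """
    for vma in vmas:
        if addr < vma.vm_end:
            return vma
    return None


def lookup_is_consistent(vmas):
    # Looking up a VMA's own start address must give that VMA back;
    # a mismatch means the lookup structure disagrees with the VMA.
    return all(find_vma(vmas, v.vm_start) is v for v in vmas)
```

With two adjacent areas, say Vma(0x1000, 0x3000) and Vma(0x3000, 0x5000), looking up 0x3000 skips the first one because its vm_end equals the address rather than exceeding it. That is the "==, not <" case mentioned above.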


Re: linux-aio usable?

2010-03-08 Thread Anthony Liguori

On 03/08/2010 08:26 AM, Avi Kivity wrote:

On 03/08/2010 04:25 PM, Dustin Kirkland wrote:

On Mon, Mar 8, 2010 at 3:48 AM, Avi Kivity a...@redhat.com wrote:

On 03/08/2010 11:48 AM, Bernhard Schmidt wrote:

On Mon, Mar 08, 2010 at 11:10:29AM +0200, Avi Kivity wrote:

Are there any potential pitfalls?
It won't work well unless running on a block device (partition or 
LVM).

What does "work well" mean in this context? Potential data loss?

No, it becomes synchronous (=extra slow).

But for this to happen, the user would have had to consciously enter
into the situation by creating/using a non block device,
non-pre-allocated backing disk AND specify the aio=native option,
correct?



I thought there was some autodetection involved, but perhaps I just 
imagined it.


There's no autodetection.

linux-aio support in the kernel downgrades to synchronous IO if the 
underlying storage does not support linux-aio.  There is no indication 
to userspace that this has happened.


If this happens, besides having a slow guest, the guest VCPU will be 
starved during the I/O requests potentially resulting in things like 
soft lockups and time drift.


Generally speaking, linux-aio will work well under the following 
circumstances:


 - cache=off is specified
 - the underlying file system is XFS or you are using a block device

We cannot detect this reliably, though, so it's really up to the user to 
decide whether to use it.  We're working on improving the linux-aio 
kernel interface to eliminate this detectability problem, after which 
we can enable it in a more automatic fashion.


Regards,

Anthony Liguori




Re: linux-aio usable?

2010-03-08 Thread Avi Kivity

On 03/08/2010 06:28 PM, Anthony Liguori wrote:
I thought there was some autodetection involved, but perhaps I just 
imagined it.



There's no autodetection.

linux-aio support in the kernel downgrades to synchronous IO if the 
underlying storage does not support linux-aio.  There is no indication 
to userspace that this has happened.


If this happens, besides having a slow guest, the guest VCPU will be 
starved during the I/O requests potentially resulting in things like 
soft lockups and time drift.


Generally speaking, linux-aio will work well under the following 
circumstances:


 - cache=off is specified
 - the underlying file system is XFS or you are using a block device

We cannot detect this reliably, though, so it's really up to the user to 
decide whether to use it.  We're working on improving the linux-aio 
kernel interface to eliminate this detectability problem, after which 
we can enable it in a more automatic fashion.


Well, the common case of cache=none on a block device certainly can be 
autodetected.
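
A minimal sketch of what such a check could look like, assuming the caller already knows the image path and the cache mode. The function name aio_native_looks_safe is hypothetical and this is not what qemu does; it covers only the block-device case and makes no attempt to recognise file systems such as XFS.

```python
import os
import stat


def aio_native_looks_safe(path, cache_mode):
    """Heuristic from the thread: cache disabled and a block device.

    Hypothetical helper; anything it cannot confirm is reported as unsafe.
    """
    if cache_mode not in ("off", "none"):
        return False
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or unreadable path: do not enable native AIO.
        return False
    return stat.S_ISBLK(mode)
```

Returning False whenever the answer is unknown keeps the fallback on the ordinary thread-based path, which is slower but does not stall the VCPU.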


--
error compiling committee.c: too many arguments to function



Re: raw disks no longer work in latest kvm (kvm-88 was fine)

2010-03-08 Thread Anthony Liguori

On 03/07/2010 10:21 AM, Avi Kivity wrote:

On 03/07/2010 12:00 PM, Christoph Hellwig wrote:



I can only guess that the info collected so far is not sufficient to
understand what's going on: except for "I/O error writing block NNN"
we do not have anything at all.  So it's impossible to know where
the problem is.

Actually it is, and the bug has been fixed long ago in:

commit e2a305fb13ff0f5cf6ff80aaa90a5ed5954c
Author: Christoph Hellwig h...@lst.de
Date:   Tue Jan 26 14:49:08 2010 +0100

 block: avoid creating too large iovecs in multiwrite_merge


I've asked for it to be added to the -stable series but that hasn't
happened so far.


Anthony, this looks critical.



It's in stable now.  Sounds like a good time to do a 0.12.4.

Regards,

Anthony Liguori



[PATCH] KVM test: Exposing boot and reboot timeouts in config files

2010-03-08 Thread Lucas Meneghel Rodrigues
Some guests may take longer to boot/reboot on some hosts,
so let's expose the boot and reboot timeouts in the tests
config file. Also, print the timeouts in the debug
messages.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_test_utils.py |   13 +++--
 client/tests/kvm/tests/boot.py |   10 ++
 client/tests/kvm/tests_base.cfg.sample |2 ++
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
index 7d96d6e..564ff35 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -53,7 +53,7 @@ def wait_for_login(vm, nic_index=0, timeout=240, start=0, step=2):
     @param timeout: Time to wait before giving up.
     @return: A shell session object.
     """
-    logging.info("Trying to log into guest '%s'..." % vm.name)
+    logging.info("Trying to log into guest '%s', timeout %ds", vm.name, timeout)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
                                  timeout, start, step)
     if not session:
@@ -80,16 +80,16 @@ def reboot(vm, session, method="shell", sleep_before_reset=10, nic_index=0,
     if method == "shell":
         # Send a reboot command to the guest's shell
         session.sendline(vm.get_params().get("reboot_command"))
-        logging.info("Reboot command sent; waiting for guest to go down...")
+        logging.info("Reboot command sent. Waiting for guest to go down")
     elif method == "system_reset":
         # Sleep for a while before sending the command
         time.sleep(sleep_before_reset)
         # Send a system_reset monitor command
         vm.send_monitor_cmd("system_reset")
-        logging.info("system_reset monitor command sent; waiting for guest to "
-                     "go down...")
+        logging.info("Monitor command system_reset sent. Waiting for guest to "
+                     "go down")
     else:
-        logging.error("Unknown reboot method: %s" % method)
+        logging.error("Unknown reboot method: %s", method)
 
     # Wait for the session to become unresponsive and close it
     if not kvm_utils.wait_for(lambda: not session.is_responsive(timeout=30),
@@ -98,7 +98,8 @@ def reboot(vm, session, method="shell", sleep_before_reset=10, nic_index=0,
     session.close()
 
     # Try logging into the guest until timeout expires
-    logging.info("Guest is down; waiting for it to go up again...")
+    logging.info("Guest is down. Waiting for it to go up again, timeout %ds",
+                 timeout)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
                                  timeout, 0, 2)
     if not session:
diff --git a/client/tests/kvm/tests/boot.py b/client/tests/kvm/tests/boot.py
index cd1f1d4..9b3f392 100644
--- a/client/tests/kvm/tests/boot.py
+++ b/client/tests/kvm/tests/boot.py
@@ -16,7 +16,9 @@ def run_boot(test, params, env):
     @param env: Dictionary with test environment.
     """
     vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
-    session = kvm_test_utils.wait_for_login(vm)
+    session = kvm_test_utils.wait_for_login(vm, 0,
+                                         float(params.get("boot_timeout", 240)),
+                                         0, 2)
 
     try:
         if not params.get("reboot_method"):
@@ -24,9 +26,9 @@ def run_boot(test, params, env):
 
         # Reboot the VM
         session = kvm_test_utils.reboot(vm, session,
-                                        params.get("reboot_method"),
-                                        float(params.get("sleep_before_reset",
-                                                         10)))
+                                    params.get("reboot_method"),
+                                    float(params.get("sleep_before_reset", 10)),
+                                    0, float(params.get("reboot_timeout", 240)))
 
     finally:
         session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 040d0c3..340b0c0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -75,11 +75,13 @@ variants:
         type = boot
         restart_vm = yes
         kill_vm_on_error = yes
+        boot_timeout = 240
 
     - reboot:       install setup unattended_install
         type = boot
         reboot_method = shell
         kill_vm_on_error = yes
+        reboot_timeout = 240
 
     - migrate:      install setup unattended_install
         type = migration
-- 
1.6.6.1
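
The timeout lookup used in the patch above can be tried in isolation. This is a sketch under assumptions: get_timeout is a hypothetical helper, params is modelled as a plain dict, and config values are assumed to arrive as strings (which is why the patch wraps them in float()).

```python
DEFAULT_TIMEOUT = 240  # seconds; same default the patch passes to params.get()


def get_timeout(params, key, default=DEFAULT_TIMEOUT):
    # Convert before handing the number to a wait loop; fall back to the
    # default if the key is not set in the config.
    return float(params.get(key, default))


params = {"boot_timeout": "600"}          # as if set in the tests config file
boot_timeout = get_timeout(params, "boot_timeout")
reboot_timeout = get_timeout(params, "reboot_timeout")
```

Here boot_timeout comes from the config and reboot_timeout falls back to 240, so a slow or heavily loaded host only needs a one-line config change.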



Re: [Autotest] [PATCH] KVM test: Exposing boot and reboot timeouts in config files

2010-03-08 Thread sudhir kumar
On Mon, Mar 8, 2010 at 10:58 PM, Lucas Meneghel Rodrigues
l...@redhat.com wrote:
 Some guests may take longer to boot/reboot in some hosts,
 so let's expose the boot and reboot timeouts in the tests
 config file. Also, print the timeouts on the debug
Fine. It seems we missed it during the major development cycle. We
faced this situation back when we had the kvm_autotest git tree, and the
patch that I sent then was merged. This patch makes perfect sense for
cases like stress boot, slow machines, highly loaded machines, etc.

 messages.

 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/kvm_test_utils.py     |   13 +++--
  client/tests/kvm/tests/boot.py         |   10 ++
  client/tests/kvm/tests_base.cfg.sample |    2 ++
  3 files changed, 15 insertions(+), 10 deletions(-)


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Cam Macdonell
On Mon, Mar 8, 2010 at 2:56 AM, Avi Kivity a...@redhat.com wrote:
 On 03/06/2010 01:52 AM, Cam Macdonell wrote:

 Support an inter-vm shared memory device that maps a shared-memory object
 as a PCI device in the guest.  This patch also supports interrupts between
 guests by communicating over a unix domain socket.  This patch applies to
 the qemu-kvm repository.

 This device now creates a qemu character device and sends 1-byte messages
 to trigger interrupts.  Interrupts are triggered by writing to the Doorbell
 register on the shared memory PCI device.  The lower 8 bits of the value
 written to this register are sent as the 1-byte message, so different
 meanings of interrupts can be supported.

 Interrupts are supported between multiple VMs by using a shared memory
 server:

 -ivshmem <size in MB>,[unix:<path>][file]

 Interrupts can also be used between host and guest by implementing a
 listener on the host that talks to the shared memory server.  The shared
 memory server passes file descriptors for the shared memory object and
 eventfds (our interrupt mechanism) to the respective qemu instances.



 Can you provide a spec that describes the device?  This would be useful for
 maintaining the code, writing guest drivers, and as a framework for review.

I'm not sure if you want the Qemu command-line part as part of the
spec here, but I've included it for completeness.

Device Specification for Inter-VM shared memory device
---

Qemu Command-line
---

The command-line for inter-vm shared memory is as follows

-ivshmem size,[unix:]name

The size argument specifies the size of the shared memory object.  The second
argument specifies either a unix domain socket (when using the unix: prefix) or
a name for the shared memory object.

If a unix domain socket is specified, the guest will receive the shared object
from the shared memory server listening on that socket and will support
interrupts with the other guests using that server.  Each server only serves
one memory object.

If a name is specified on the command line (without 'unix:'), then the guest
will open the POSIX shared memory object with that name (in /dev/shm) and the
specified size.  The guest will NOT support interrupts but the shared memory
object can be shared between multiple guests.
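
The two forms of the option can be told apart as sketched below. parse_ivshmem_arg is a hypothetical helper written for this mail, not qemu's parser, and the size is returned as a raw string because the text above does not say how it is interpreted.

```python
def parse_ivshmem_arg(arg):
    """Split 'size,[unix:]name' into (size, use_server, name).

    Hypothetical helper for illustration only.
    """
    size, sep, target = arg.partition(",")
    if not sep or not size or not target:
        raise ValueError("expected size,[unix:]name")
    if target.startswith("unix:"):
        # Shared memory server on a unix domain socket: interrupts available.
        return size, True, target[len("unix:"):]
    # Plain POSIX shared memory object: no interrupts.
    return size, False, target
```

For example, parse_ivshmem_arg("512,unix:/tmp/ivshmem.sock") selects the server case, while parse_ivshmem_arg("512,myshm") selects the plain shared memory object; both the socket path and the object name here are made-up values.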

The Inter-VM Shared Memory PCI device
---

BARs

The device supports two BARs.  BAR0 is a 256-byte MMIO region to support
registers and BAR1 is used to map the shared memory object from the host.
The size of BAR1 is specified on the command-line and must be a power of 2.

Registers

BAR0 currently supports five 16-bit registers.  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to their ID and may use the others
for sending interrupts to other guests.

enum ivshmem_registers {
IntrMask = 0,
IntrStatus = 2,
Doorbell = 4,
IVPosition = 6,
IVLiveList = 8
};

The first two registers are the interrupt mask and status registers.
Interrupts are triggered when a message is received on the guest's eventfd from
another VM.  Writing to the 'Doorbell' register is how synchronization messages
are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number.  The
IVLiveList register is also read-only and reports a bit vector of currently
live VM IDs.

The Doorbell register is 16 bits wide, but is treated as two 8-bit values.
The upper 8 bits are used for the destination VM ID.  The lower 8 bits are
the value which will be written to the destination VM and what the guest
status register will be set to when the interrupt is triggered in the
destination guest.  A value of 255 in the upper 8 bits will trigger a
broadcast where the message will be sent to all other guests.
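
The Doorbell layout can be illustrated with a few lines of Python. The helper names are made up, and the IVLiveList decoding assumes that bit n stands for VM ID n, which the spec above does not state explicitly.

```python
# Register offsets within BAR0, as listed in the enum above.
INTR_MASK, INTR_STATUS, DOORBELL, IV_POSITION, IV_LIVE_LIST = 0, 2, 4, 6, 8

BROADCAST_ID = 255  # upper byte value that addresses all other guests


def doorbell_value(dest_id, msg):
    """Pack a destination VM ID and an 8-bit message into 16 bits."""
    if not 0 <= dest_id <= 0xFF or not 0 <= msg <= 0xFF:
        raise ValueError("dest_id and msg must each fit in 8 bits")
    return (dest_id << 8) | msg


def split_doorbell(value):
    """Inverse of doorbell_value(): return (dest_id, msg)."""
    return (value >> 8) & 0xFF, value & 0xFF


def live_ids(live_list):
    """Decode the IVLiveList bit vector, assuming bit n means ID n is live."""
    return [n for n in range(16) if live_list & (1 << n)]
```

A guest driver would write doorbell_value(3, 0x42) to offset DOORBELL to send message 0x42 to VM 3, or use BROADCAST_ID as the destination to reach every other guest.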

Cheers,
Cam


 --
 error compiling committee.c: too many arguments to function




[PATCH 03/15] KVM: PPC: Make DSISR 32 bits wide

2010-03-08 Thread Alexander Graf
DSISR is only defined as 32 bits wide. So let's reflect that in the
structs too.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h   |2 +-
 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/kvm/book3s_64_interrupts.S |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 14d0262..9f5a992 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -84,8 +84,8 @@ struct kvmppc_vcpu_book3s {
 	u64 hid[6];
 	u64 gqr[8];
 	int slb_nr;
+	u32 dsisr;
 	u64 sdr1;
-	u64 dsisr;
 	u64 hior;
 	u64 msr_mask;
 	u64 vsid_first;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 119deb4..0ebda67 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -260,7 +260,7 @@ struct kvm_vcpu_arch {
 
 	u32 last_inst;
 #ifdef CONFIG_PPC64
-	ulong fault_dsisr;
+	u32 fault_dsisr;
 #endif
 	ulong fault_dear;
 	ulong fault_esr;
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index c1584d0..faca876 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -171,7 +171,7 @@ kvmppc_handler_highmem:
 	std	r3, VCPU_PC(r7)
 	std	r4, VCPU_SHADOW_SRR1(r7)
 	std	r5, VCPU_FAULT_DEAR(r7)
-	std	r6, VCPU_FAULT_DSISR(r7)
+	stw	r6, VCPU_FAULT_DSISR(r7)
 
 	ld	r5, VCPU_HFLAGS(r7)
 	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
-- 
1.6.0.2



[PATCH 05/15] KVM: PPC: Split instruction reading out

2010-03-08 Thread Alexander Graf
The current check_ext function reads the instruction and then does
the checking. Let's split the reading out so we can reuse it for
different functions.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |   24 
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9e0bc47..400ae0a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -650,26 +650,34 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 	kvmppc_recalc_shadow_msr(vcpu);
 }
 
-static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
 {
 	ulong srr0 = vcpu->arch.pc;
 	int ret;
 
-	/* Need to do paired single emulation? */
-	if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
-		return EMULATE_DONE;
-
-	/* Read out the instruction */
 	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &vcpu->arch.last_inst, false);
 	if (ret == -ENOENT) {
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 33, 33, 1);
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 34, 36, 0);
 		vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 42, 47, 0);
 		kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
-	} else if(ret == EMULATE_DONE) {
+		return EMULATE_AGAIN;
+	}
+
+	return EMULATE_DONE;
+}
+
+static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+{
+
+	/* Need to do paired single emulation? */
+	if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
+		return EMULATE_DONE;
+
+	/* Read out the instruction */
+	if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
 		/* Need to emulate */
 		return EMULATE_FAIL;
-	}
 
 	return EMULATE_AGAIN;
 }
-- 
1.6.0.2



[PATCH 00/15] KVM: PPC: MOL bringup patches

2010-03-08 Thread Alexander Graf
Mac-on-Linux has always lacked PPC64 host support. This is going to
change now!

This patchset contains minor patches to enable MOL, but is mostly about
bug fixes that came out of running Mac OS X. With this set and a pretty
small patch to MOL I have 10.4.11 running as a guest on a 970MP host.

I'll send the MOL patches to the respective ML in the next few days.


v1 -> v2:

 - Add documentation for EXIT_OSI and ENABLE_CAP
 - Add flags to enable_cap
 - Add build fix for !CONFIG_VSX
 - Remove in-paca register check

Alexander Graf (15):
  KVM: PPC: Ensure split mode works
  KVM: PPC: Allow userspace to unset the IRQ line
  KVM: PPC: Make DSISR 32 bits wide
  KVM: PPC: Book3S_32 guest MMU fixes
  KVM: PPC: Split instruction reading out
  KVM: PPC: Don't reload FPU with invalid values
  KVM: PPC: Load VCPU for register fetching
  KVM: PPC: Implement mfsr emulation
  KVM: PPC: Implement BAT reads
  KVM: PPC: Make XER load 32 bit
  KVM: PPC: Implement emulation for lbzux and lhax
  KVM: PPC: Implement alignment interrupt
  KVM: Add support for enabling capabilities per-vcpu
  KVM: PPC: Add OSI hypercall interface
  KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC

 Documentation/kvm/api.txt   |   28 +++
 arch/powerpc/include/asm/kvm.h  |3 +
 arch/powerpc/include/asm/kvm_book3s.h   |   18 +++-
 arch/powerpc/include/asm/kvm_host.h |4 +-
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/kvm/book3s.c   |  130 ++-
 arch/powerpc/kvm/book3s_32_mmu.c|   30 ++--
 arch/powerpc/kvm/book3s_64_emulate.c|   88 +
 arch/powerpc/kvm/book3s_64_interrupts.S |2 +-
 arch/powerpc/kvm/book3s_64_slb.S|2 +-
 arch/powerpc/kvm/emulate.c  |   20 +
 arch/powerpc/kvm/powerpc.c  |   43 ++-
 include/linux/kvm.h |   17 
 13 files changed, 335 insertions(+), 52 deletions(-)



[PATCH 01/15] KVM: PPC: Ensure split mode works

2010-03-08 Thread Alexander Graf
On PowerPC we can go into MMU Split Mode. That means that either
data relocation is on but instruction relocation is off or vice
versa.

That mode didn't work properly, as we weren't always flushing
entries when going into a new split mode, potentially mapping
different code or data than we're supposed to.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |9 +++---
 arch/powerpc/kvm/book3s.c |   46 +---
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e6ea974..14d0262 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -99,10 +99,11 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST		1
 #define CONTEXT_GUEST_END	2
 
-#define VSID_REAL	0xfff0
-#define VSID_REAL_DR	0xffe0
-#define VSID_REAL_IR	0xffd0
-#define VSID_BAT	0xffc0
+#define VSID_REAL_DR	0x7ff0
+#define VSID_REAL_IR	0x7fe0
+#define VSID_SPLIT_MASK	0x7fe0
+#define VSID_REAL	0x7fc0
+#define VSID_BAT	0x7fb0
 #define VSID_PR		0x8000
 
 extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 94c229d..c2ffb91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -133,6 +133,14 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 
 	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
 	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+		bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
+
+		/* Flush split mode PTEs */
+		if (dr != ir)
+			kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK,
+					      VSID_SPLIT_MASK);
+
 		kvmppc_mmu_flush_segments(vcpu);
 		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
 	}
@@ -395,15 +403,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
 	} else {
 		pte->eaddr = eaddr;
 		pte->raddr = eaddr & 0x;
-		pte->vpage = eaddr >> 12;
-		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-		case 0:
-			pte->vpage |= VSID_REAL;
-		case MSR_DR:
-			pte->vpage |= VSID_REAL_DR;
-		case MSR_IR:
-			pte->vpage |= VSID_REAL_IR;
-		}
+		pte->vpage = VSID_REAL | eaddr >> 12;
 		pte->may_read = true;
 		pte->may_write = true;
 		pte->may_execute = true;
@@ -512,12 +512,10 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	int page_found = 0;
 	struct kvmppc_pte pte;
 	bool is_mmio = false;
+	bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+	bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
 
-	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
-		relocated = (vcpu->arch.msr & MSR_DR);
-	} else {
-		relocated = (vcpu->arch.msr & MSR_IR);
-	}
+	relocated = data ? dr : ir;
 
 	/* Resolve real address if translation turned on */
 	if (relocated) {
@@ -529,14 +527,18 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		pte.raddr = eaddr & 0x;
 		pte.eaddr = eaddr;
 		pte.vpage = eaddr >> 12;
-		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-		case 0:
-			pte.vpage |= VSID_REAL;
-		case MSR_DR:
-			pte.vpage |= VSID_REAL_DR;
-		case MSR_IR:
-			pte.vpage |= VSID_REAL_IR;
-		}
+	}
+
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		pte.vpage |= VSID_REAL;
+		break;
+	case MSR_DR:
+		pte.vpage |= VSID_REAL_DR;
+		break;
+	case MSR_IR:
+		pte.vpage |= VSID_REAL_IR;
+		break;
 	}
 
 	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
-- 
1.6.0.2
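
Two things the patch above touches, the split mode test and the switch without break statements, can be modelled in a few lines of Python. This is a sketch under assumptions: the MSR_DR and MSR_IR bit values are the usual PowerPC ones as far as I know, the VSID_* values are small placeholders and NOT the real constants, and the function names are made up.

```python
MSR_DR = 0x10  # data relocation bit (assumed PowerPC MSR layout)
MSR_IR = 0x20  # instruction relocation bit (assumed PowerPC MSR layout)

# Placeholder flag values, NOT the real VSID_* constants from the patch.
VSID_REAL, VSID_REAL_DR, VSID_REAL_IR = 0x1, 0x2, 0x4


def is_split_mode(msr):
    # Split mode: exactly one of data/instruction relocation is enabled.
    return bool(msr & MSR_DR) != bool(msr & MSR_IR)


def vsid_flag_fallthrough(msr):
    """Model of the old switch: without break, later cases also run."""
    cases = [(0, VSID_REAL), (MSR_DR, VSID_REAL_DR), (MSR_IR, VSID_REAL_IR)]
    flags, matched = 0, False
    for value, flag in cases:
        matched = matched or (msr & (MSR_DR | MSR_IR)) == value
        if matched:
            flags |= flag
    return flags


def vsid_flag_fixed(msr):
    """Model of the new switch: each case stops after its own flag."""
    return {0: VSID_REAL, MSR_DR: VSID_REAL_DR, MSR_IR: VSID_REAL_IR}.get(
        msr & (MSR_DR | MSR_IR), 0)
```

In this model a fully real-mode MSR (neither bit set) picks up all three flags through fall-through, whereas the version with break statements yields only the VSID_REAL placeholder.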



[PATCH 08/15] KVM: PPC: Implement mfsr emulation

2010-03-08 Thread Alexander Graf
We emulate the mfsrin instruction already, which passes the SR number
in a register value. But we lacked support for mfsr, which encodes the
SR number in the opcode.

So let's implement it.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_emulate.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index c989214..8d7a78d 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -35,6 +35,7 @@
 #define OP_31_XOP_SLBMTE	402
 #define OP_31_XOP_SLBIE		434
 #define OP_31_XOP_SLBIA		498
+#define OP_31_XOP_MFSR		595
 #define OP_31_XOP_MFSRIN	659
 #define OP_31_XOP_SLBMFEV	851
 #define OP_31_XOP_EIOIO		854
@@ -90,6 +91,18 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		case OP_31_XOP_MTMSR:
 			kvmppc_set_msr(vcpu, kvmppc_get_gpr(vcpu, get_rs(inst)));
 			break;
+		case OP_31_XOP_MFSR:
+		{
+			int srnum;
+
+			srnum = kvmppc_get_field(inst, 12 + 32, 15 + 32);
+			if (vcpu->arch.mmu.mfsrin) {
+				u32 sr;
+				sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
+				kvmppc_set_gpr(vcpu, get_rt(inst), sr);
+			}
+			break;
+		}
 		case OP_31_XOP_MFSRIN:
 		{
 			int srnum;
-- 
1.6.0.2
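
The field extraction in the patch above uses IBM bit numbering, where bit 0 is the most significant bit of a 64-bit value, hence the "+ 32" for a 32-bit instruction word. A Python model of it, written for illustration rather than copied from the kernel helper; get_field and encode_mfsr are hypothetical names:

```python
def get_field(inst, msb, lsb):
    """Extract bits msb..lsb of a 64-bit value, bit 0 being the MSB."""
    if msb > lsb:
        raise ValueError("msb must not be greater than lsb")
    mask = (1 << (lsb - msb + 1)) - 1
    return (inst >> (63 - lsb)) & mask


def encode_mfsr(rt, sr):
    """Build an mfsr instruction word: opcode 31, extended opcode 595."""
    return (31 << 26) | (rt << 21) | (sr << 16) | (595 << 1)


inst = encode_mfsr(rt=3, sr=5)
srnum = get_field(inst, 12 + 32, 15 + 32)   # bits 12..15 of the 32-bit word
```

With mfsrin the same number would come out of a GPR at run time; with mfsr it is fixed in the instruction word, so decoding those four bits is all the emulation needs before calling the existing mfsrin hook.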



[PATCH 07/15] KVM: PPC: Load VCPU for register fetching

2010-03-08 Thread Alexander Graf
When trying to read or store vcpu register data, we should also make
sure the vcpu is actually loaded, so we're 100% sure we get the correct
values.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 029e1be..585dc91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -955,6 +955,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
regs->pc = vcpu->arch.pc;
regs->cr = kvmppc_get_cr(vcpu);
regs->ctr = vcpu->arch.ctr;
@@ -975,6 +977,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
regs->gpr[i] = kvmppc_get_gpr(vcpu, i);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
@@ -982,6 +986,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
vcpu->arch.pc = regs->pc;
kvmppc_set_cr(vcpu, regs->cr);
vcpu->arch.ctr = regs->ctr;
@@ -1001,6 +1007,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
kvmppc_set_gpr(vcpu, i, regs->gpr[i]);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
-- 
1.6.0.2



[PATCH 10/15] KVM: PPC: Make XER load 32 bit

2010-03-08 Thread Alexander Graf
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.

Welcome to the Big Endian world.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_slb.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 35b7627..0919679 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -145,7 +145,7 @@ slb_do_enter:
lwz r11, (PACA_KVM_CR)(r13)
mtcr    r11
 
-   ld  r11, (PACA_KVM_XER)(r13)
+   lwz r11, (PACA_KVM_XER)(r13)
mtxer   r11
 
ld  r11, (PACA_KVM_R11)(r13)
-- 
1.6.0.2
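
To illustrate why the ld was wrong: a small host-independent sketch that
models big-endian memory as a byte array. The helper names are made up, and
the "next" word stands in for whatever happens to follow the 32-bit XER slot
in the PACA (an assumption for illustration only). A 32-bit store followed by
a 64-bit load puts the stored value in the upper half of the register and
drags the neighbouring field into the lower half.

```c
#include <stdint.h>

/* Store 32 bits most-significant byte first, like stw on big endian. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load 'len' bytes most-significant byte first (lwz: 4, ld: 8). */
static uint64_t load_be(const uint8_t *p, int len)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < len; i++)
		v = (v << 8) | p[i];
	return v;
}

/* What a load of 'len' bytes sees at a 32-bit slot followed by 'next'. */
static uint64_t load_from_slot(uint32_t xer, uint32_t next, int len)
{
	uint8_t mem[8];

	store_be32(mem, xer);
	store_be32(mem + 4, next);
	return load_be(mem, len);
}
```

With len == 4 the stored value comes back unchanged; with len == 8 it does
not, which is the corruption the patch removes.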



[PATCH 06/15] KVM: PPC: Don't reload FPU with invalid values

2010-03-08 Thread Alexander Graf
When the guest activates the FPU, we load it up. That's fine when
it wasn't activated before on the host, but if it was we end up
reloading FPU values from last time the FPU was deactivated on the
host without writing the proper values back to the vcpu struct.

This patch checks if the FPU is enabled already and if so just doesn't
bother activating it, making FPU operations survive guest context switches.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 400ae0a..029e1be 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -701,6 +701,11 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
return RESUME_GUEST;
}
 
+   /* We already own the ext */
+   if (vcpu->arch.guest_owned_ext & msr) {
+   return RESUME_GUEST;
+   }
+
 #ifdef DEBUG_EXT
printk(KERN_INFO "Loading up ext 0x%lx\n", msr);
 #endif
-- 
1.6.0.2
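
A rough model of the ownership check, for illustration only. The struct and
function names are hypothetical and the "load" is reduced to setting a bit;
the EXT_* values are meant to mirror the MSR FP/VEC/VSX bits but should be
read as assumptions here.

```c
#include <stdint.h>

#define EXT_FP   0x00002000u	/* assumed to mirror MSR[FP]  */
#define EXT_VSX  0x00800000u	/* assumed to mirror MSR[VSX] */
#define EXT_VEC  0x02000000u	/* assumed to mirror MSR[VEC] */

struct ext_state {
	uint32_t guest_owned_ext;	/* facilities currently live for the guest */
	unsigned loads;			/* how often state was (re)loaded */
};

/* Returns 1 if the facility had to be loaded, 0 if it was already owned. */
static int handle_ext(struct ext_state *s, uint32_t msr)
{
	/* Already owned: reloading would overwrite live register values. */
	if (s->guest_owned_ext & msr)
		return 0;

	s->loads++;
	s->guest_owned_ext |= msr;
	return 1;
}
```

The second request for the same facility becomes a no-op, so stale values
from the vcpu struct never clobber what is live in the registers.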



[PATCH 04/15] KVM: PPC: Book3S_32 guest MMU fixes

2010-03-08 Thread Alexander Graf
This patch makes the VSID of mapped pages always reflect all the special
cases we have, like split mode.

It also changes the tlbie mask to 0x0FFFF000 according to the spec. The mask
we used before was incorrect.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |1 +
 arch/powerpc/kvm/book3s_32_mmu.c  |   30 +++---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 9f5a992..b47b2f5 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -44,6 +44,7 @@ struct kvmppc_sr {
bool Ks;
bool Kp;
bool nx;
+   bool valid;
 };
 
 struct kvmppc_bat {
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 1483a9b..7071e22 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -57,6 +57,8 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data);
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+u64 *vsid);
 
 static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
 {
@@ -66,13 +68,14 @@ static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t e
 static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
 bool data)
 {
-   struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+   u64 vsid;
struct kvmppc_pte pte;
 
if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
return pte.vpage;
 
-   return (((u64)eaddr >> 12) & 0xffff) | (((u64)sre->vsid) << 16);
+   kvmppc_mmu_book3s_32_esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
+   return (((u64)eaddr >> 12) & 0xffff) | (vsid << 16);
 }
 
 static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
@@ -142,8 +145,13 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
bat->bepi_mask);
}
if ((eaddr & bat->bepi_mask) == bat->bepi) {
+   u64 vsid;
+   kvmppc_mmu_book3s_32_esid_to_vsid(vcpu,
+   eaddr >> SID_SHIFT, &vsid);
+   vsid <<= 16;
+   pte->vpage = (((u64)eaddr >> 12) & 0xffff) | vsid;
+
pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
-   pte->vpage = (eaddr >> 12) | VSID_BAT;
pte->may_read = bat->pp;
pte->may_write = bat->pp > 1;
pte->may_execute = true;
@@ -302,6 +310,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
/* And then put in the new SR */
sre->raw = value;
sre->vsid = (value & 0x0fffffff);
+   sre->valid = (value & 0x80000000) ? false : true;
sre->Ks = (value & 0x40000000) ? true : false;
sre->Kp = (value & 0x20000000) ? true : false;
sre->nx = (value & 0x10000000) ? true : false;
@@ -312,7 +321,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
 
 static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
 {
-   kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, ea, 0x0FFFF000);
 }
 
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
@@ -333,15 +342,22 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
break;
case MSR_DR|MSR_IR:
{
-   ulong ea;
-   ea = esid << SID_SHIFT;
-   *vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+   ulong ea = esid << SID_SHIFT;
+   struct kvmppc_sr *sr = find_sr(to_book3s(vcpu), ea);
+
+   if (!sr->valid)
+   return -1;
+
+   *vsid = sr->vsid;
break;
}
default:
BUG();
}
 
+   if (vcpu->arch.msr & MSR_PR)
+   *vsid |= VSID_PR;
+
+
return 0;
 }
 
-- 
1.6.0.2
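
A standalone sketch of the segment register decoding that mtsrin performs,
including the new valid flag. The struct and function names are hypothetical,
and the masks follow my understanding of the 32-bit segment register layout
(top bit set means the entry is not an ordinary memory segment, then Ks, Kp
and no-execute); treat them as assumptions rather than as a quote from the
spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields kept per segment register. */
struct sr_sketch {
	uint32_t raw;
	uint32_t vsid;
	bool valid;
	bool Ks;
	bool Kp;
	bool nx;
};

static struct sr_sketch decode_sr(uint32_t value)
{
	struct sr_sketch sr;

	sr.raw   = value;
	sr.vsid  = value & 0x0fffffffu;
	sr.valid = (value & 0x80000000u) == 0;	/* top bit set: not usable */
	sr.Ks    = (value & 0x40000000u) != 0;
	sr.Kp    = (value & 0x20000000u) != 0;
	sr.nx    = (value & 0x10000000u) != 0;
	return sr;
}
```

A lookup that hits an entry with valid == false can then fail early instead
of handing back a VSID that means nothing.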



[PATCH 11/15] KVM: PPC: Implement emulation for lbzux and lhax

2010-03-08 Thread Alexander Graf
We get MMIOs with the weirdest instructions. But every time we do,
we need to improve our emulator to implement them.

So let's do that - this time it's lbzux and lhax's round.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/emulate.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 2410ec2..dbb5d68 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -38,10 +38,12 @@
 #define OP_31_XOP_LBZX  87
 #define OP_31_XOP_STWX  151
 #define OP_31_XOP_STBX  215
+#define OP_31_XOP_LBZUX 119
 #define OP_31_XOP_STBUX 247
 #define OP_31_XOP_LHZX  279
 #define OP_31_XOP_LHZUX 311
 #define OP_31_XOP_MFSPR 339
+#define OP_31_XOP_LHAX  343
 #define OP_31_XOP_STHX  407
 #define OP_31_XOP_STHUX 439
 #define OP_31_XOP_MTSPR 467
@@ -173,6 +175,19 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
 
+   case OP_31_XOP_LBZUX:
+   rt = get_rt(inst);
+   ra = get_ra(inst);
+   rb = get_rb(inst);
+
+   ea = kvmppc_get_gpr(vcpu, rb);
+   if (ra)
+   ea += kvmppc_get_gpr(vcpu, ra);
+
+   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
+   kvmppc_set_gpr(vcpu, ra, ea);
+   break;
+
case OP_31_XOP_STWX:
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
@@ -202,6 +217,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, rs, ea);
break;
 
+   case OP_31_XOP_LHAX:
+   rt = get_rt(inst);
+   emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1);
+   break;
+
case OP_31_XOP_LHZX:
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
-- 
1.6.0.2
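
The difference between lhzx and lhax is only in how the halfword is widened,
which is presumably why lhax goes through kvmppc_handle_loads rather than
kvmppc_handle_load. A tiny sketch of the two widenings, with made-up function
names:

```c
#include <stdint.h>

/* lhz-style load: the halfword is zero-extended into the register. */
static uint64_t load_half_zero(uint16_t raw)
{
	return (uint64_t)raw;
}

/* lha-style ("algebraic") load: the halfword is sign-extended. */
static int64_t load_half_algebraic(uint16_t raw)
{
	if (raw & 0x8000u)
		return (int64_t)raw - 0x10000;
	return (int64_t)raw;
}
```

For MMIO emulation this matters whenever a device register is read with the
top bit set: the guest expects a negative value in the target GPR after lhax.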



[PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-08 Thread Alexander Graf
Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm.h |3 +++
 arch/powerpc/include/asm/kvm_ppc.h |2 ++
 arch/powerpc/kvm/book3s.c  |6 ++
 arch/powerpc/kvm/powerpc.c |5 -
 4 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index 19bae31..6c5547d 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -84,4 +84,7 @@ struct kvm_guest_debug_arch {
 #define KVM_REG_QPR    0x0040
 #define KVM_REG_FQPR   0x0060
 
+#define KVM_INTERRUPT_SET      -1U
+#define KVM_INTERRUPT_UNSET    -2U
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c7fcdd7..6a2464e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -92,6 +92,8 @@ extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
struct kvm_interrupt *irq);
+extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+ struct kvm_interrupt *irq);
 
 extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
   unsigned int op, int *advance);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c2ffb91..9e0bc47 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -230,6 +230,12 @@ void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
 }
 
+void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+  struct kvm_interrupt *irq)
+{
+   kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
 int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 {
int deliver = 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5a8eb95..a28a512 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -449,7 +449,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
 {
-   kvmppc_core_queue_external(vcpu, irq);
+   if (irq->irq == KVM_INTERRUPT_UNSET)
+   kvmppc_core_dequeue_external(vcpu, irq);
+   else
+   kvmppc_core_queue_external(vcpu, irq);
 
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(&vcpu->wq);
-- 
1.6.0.2
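
The resulting semantics, modelled outside the kernel. Everything named
SKETCH_* or *_sketch is hypothetical; only the two values -1U and -2U are
taken from the patch. Note that any value other than UNSET still raises the
line, so callers that passed arbitrary values before keep working.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INTERRUPT_SET	((uint32_t)-1)
#define SKETCH_INTERRUPT_UNSET	((uint32_t)-2)

struct irq_sketch {
	bool external_pending;	/* stands in for the queued external irq */
};

static void ioctl_interrupt(struct irq_sketch *v, uint32_t irq)
{
	if (irq == SKETCH_INTERRUPT_UNSET)
		v->external_pending = false;	/* lower the line */
	else
		v->external_pending = true;	/* raise the line */
}
```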



[PATCH 09/15] KVM: PPC: Implement BAT reads

2010-03-08 Thread Alexander Graf
BATs can not only be written to, they can also be read back!
So let's implement emulation for reading BAT values.

While at it, I also made BAT setting flush the segment cache,
so we're absolutely sure there's no MMU state left when writing
BATs.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_emulate.c |   35 ++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index 8d7a78d..39d5003 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -239,6 +239,34 @@ void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, bool upper,
}
 }
 
+static u32 kvmppc_read_bat(struct kvm_vcpu *vcpu, int sprn)
+{
+   struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+   struct kvmppc_bat *bat;
+
+   switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+   break;
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   bat = &vcpu_book3s->ibat[4 + ((sprn - SPRN_IBAT4U) / 2)];
+   break;
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+   break;
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   bat = &vcpu_book3s->dbat[4 + ((sprn - SPRN_DBAT4U) / 2)];
+   break;
+   default:
+   BUG();
+   }
+
+   if (sprn % 2)
+   return bat->raw >> 32;
+   else
+   return bat->raw;
+}
+
 static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u32 val)
 {
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
@@ -290,6 +318,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
/* BAT writes happen so rarely that we're ok to flush
 * everything here */
kvmppc_mmu_pte_flush(vcpu, 0, 0);
+   kvmppc_mmu_flush_segments(vcpu);
break;
case SPRN_HID0:
to_book3s(vcpu)->hid[0] = spr_val;
@@ -373,6 +402,12 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
int emulated = EMULATE_DONE;
 
switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   kvmppc_set_gpr(vcpu, rt, kvmppc_read_bat(vcpu, sprn));
+   break;
case SPRN_SDR1:
kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)->sdr1);
break;
-- 
1.6.0.2
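
The index arithmetic in kvmppc_read_bat relies on the upper/lower SPRs of
each BAT being numbered consecutively. A standalone sketch of just that
arithmetic; the function names are made up and the SPR number 528 for IBAT0U
is an assumption used for the example.

```c
#include <stdint.h>

#define SKETCH_SPRN_IBAT0U 528	/* assumed SPR number, for illustration */

/* Two consecutive SPRs (upper, lower) map to one BAT array slot. */
static int bat_index(int sprn, int base)
{
	return (sprn - base) / 2;
}

/* Mirrors the patch: odd SPR numbers read the high 32 bits of raw. */
static uint32_t bat_read_half(uint64_t raw, int sprn)
{
	if (sprn % 2)
		return (uint32_t)(raw >> 32);
	return (uint32_t)raw;
}
```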



[PATCH 15/15] KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC

2010-03-08 Thread Alexander Graf
The FPU/Altivec/VSX enablement also brought access to some structure
elements that are only defined when the respective config options
are enabled.

Unfortunately I forgot to check for the config options in some places,
so let's do that now.

Unbreaks the build when CONFIG_VSX is not set.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index e752a59..00e9684 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -608,7 +608,9 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 {
struct thread_struct *t = &current->thread;
u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
u64 *thread_fpr = (u64*)t->fpr;
int i;
 
@@ -688,7 +690,9 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 {
struct thread_struct *t = &current->thread;
u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
u64 *thread_fpr = (u64*)t->fpr;
int i;
 
@@ -1218,8 +1222,12 @@ int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret;
struct thread_struct ext_bkp;
+#ifdef CONFIG_ALTIVEC
bool save_vec = current->thread.used_vr;
+#endif
+#ifdef CONFIG_VSX
bool save_vsx = current->thread.used_vsr;
+#endif
ulong ext_msr;
 
/* No need to go into the guest when all we do is going out */
-- 
1.6.0.2



[PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-08 Thread Alexander Graf
Sometimes we don't want all capabilities to be available to all
our vcpus. One example of that is the OSI interface, implemented
in the next patch.

To have a generic mechanism for enabling capabilities individually,
this patch introduces a new ioctl that can be used for this purpose.
That way features we don't want in all guests or userspace
configurations can simply stay disabled.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - Add flags to enable_cap
  - Update documentation for kvm_enable_cap
---
 Documentation/kvm/api.txt  |   15 +++
 arch/powerpc/kvm/powerpc.c |   26 ++
 include/linux/kvm.h|   11 +++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index d170cb4..6a19ab6 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
 See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
 yet and must be cleared on entry.
 
+4.34 KVM_ENABLE_CAP
+
+Capability: basic
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
+Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+As of writing this the only enablement enabled extension is KVM_CAP_PPC_OSI.
+
 
 5. The kvm_run structure
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a28a512..8bd8204 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -462,6 +462,23 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
return 0;
 }
 
+static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
+struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap->flags)
+   return -EINVAL;
+
+   switch (cap->cap) {
+   default:
+   r = -EINVAL;
+   break;
+   }
+
+   return r;
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 struct kvm_mp_state *mp_state)
 {
@@ -490,6 +507,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_interrupt(vcpu, irq);
break;
}
+   case KVM_ENABLE_CAP:
+   {
+   struct kvm_enable_cap cap;
+   r = -EFAULT;
+   if (copy_from_user(&cap, argp, sizeof(cap)))
+   goto out;
+   r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap);
+   break;
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..a18ac92 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -400,6 +400,15 @@ struct kvm_ioeventfd {
__u8  pad[36];
 };
 
+/* for KVM_ENABLE_CAP */
+struct kvm_enable_cap {
+   /* in */
+   __u32 cap;
+   __u32 flags;
+   __u64 args[4];
+   __u8  pad[64];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -696,6 +705,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_DEBUGREGS */
 #define KVM_GET_DEBUGREGS _IOR(KVMIO,  0xa1, struct kvm_debugregs)
 #define KVM_SET_DEBUGREGS _IOW(KVMIO,  0xa2, struct kvm_debugregs)
+/* No need for CAP, because then it just always fails */
+#define KVM_ENABLE_CAP    _IOW(KVMIO,  0xa3, struct kvm_enable_cap)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
 
-- 
1.6.0.2
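
For illustration, a userspace-compilable model of the dispatch this patch
sets up. The struct layout copies the one in the patch; everything else
(the *_sketch names and SKETCH_CAP_EXAMPLE, which stands in for a capability
a later patch would add) is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Same field layout as struct kvm_enable_cap in the patch. */
struct enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

#define SKETCH_CAP_EXAMPLE 1	/* hypothetical capability number */

struct vcpu_caps_sketch {
	int example_enabled;
};

static int enable_cap(struct vcpu_caps_sketch *v,
		      const struct enable_cap_sketch *cap)
{
	/* No flags are defined yet, so any set bit is rejected. */
	if (cap->flags)
		return -EINVAL;

	switch (cap->cap) {
	case SKETCH_CAP_EXAMPLE:
		v->example_enabled = 1;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Rejecting unknown flags up front keeps the field usable for later extensions
without breaking old userspace.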



[PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-08 Thread Alexander Graf
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - Add documentation for OSI exit struct
---
 Documentation/kvm/api.txt |   13 +
 arch/powerpc/include/asm/kvm_book3s.h |5 +
 arch/powerpc/include/asm/kvm_host.h   |2 ++
 arch/powerpc/kvm/book3s.c |   24 ++--
 arch/powerpc/kvm/powerpc.c|   12 
 include/linux/kvm.h   |6 ++
 6 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 6a19ab6..b2129e8 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -932,6 +932,19 @@ s390 specific.
 
 powerpc specific.
 
+   /* KVM_EXIT_OSI */
+   struct {
+   __u64 gprs[32];
+   } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 1a169f3..54929cd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -147,6 +147,11 @@ static inline ulong dsisr(void)
 
 extern void kvm_return_point(void);
 
+/* Magic register values loaded into r3 and r4 before the 'sc' assembly
+ * instruction for the OSI hypercalls */
+#define OSI_SC_MAGIC_R3    0x113724FA
+#define OSI_SC_MAGIC_R4    0x77810F9B
+
 #define INS_DCBZ   0x7c0007ec
 
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0ebda67..486f1ca 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -273,6 +273,8 @@ struct kvm_vcpu_arch {
u8 mmio_sign_extend;
u8 dcr_needed;
u8 dcr_is_write;
+   u8 osi_needed;
+   u8 osi_enabled;
 
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6b8b5ed..e752a59 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -871,12 +871,24 @@ program_interrupt:
break;
}
case BOOK3S_INTERRUPT_SYSCALL:
-#ifdef EXIT_DEBUG
-   printk(KERN_INFO "Syscall Nr %d\n", (int)kvmppc_get_gpr(vcpu, 0));
-#endif
-   vcpu->stat.syscall_exits++;
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   r = RESUME_GUEST;
+   // XXX make user settable
+   if (vcpu->arch.osi_enabled &&
+   (((u32)kvmppc_get_gpr(vcpu, 3)) == OSI_SC_MAGIC_R3) &&
+   (((u32)kvmppc_get_gpr(vcpu, 4)) == OSI_SC_MAGIC_R4)) {
+   u64 *gprs = run->osi.gprs;
+   int i;
+
+   run->exit_reason = KVM_EXIT_OSI;
+   for (i = 0; i < 32; i++)
+   gprs[i] = kvmppc_get_gpr(vcpu, i);
+   vcpu->arch.osi_needed = 1;
+   r = RESUME_HOST_NV;
+
+   } else {
+   vcpu->stat.syscall_exits++;
+   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   r = RESUME_GUEST;
+   }
break;
case BOOK3S_INTERRUPT_FP_UNAVAIL:
case BOOK3S_INTERRUPT_ALTIVEC:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8bd8204..035bad4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -148,6 +148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
switch (ext) {
case KVM_CAP_PPC_SEGSTATE:
case KVM_CAP_PPC_PAIRED_SINGLES:
+   case KVM_CAP_PPC_OSI:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -429,6 +430,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
if (!vcpu->arch.dcr_is_write)
kvmppc_complete_dcr_load(vcpu, run);
vcpu->arch.dcr_needed = 0;
+   } else if (vcpu->arch.osi_needed) {
+   u64 *gprs = run->osi.gprs;
+   int i;
+
+   for (i = 0; i < 32; i++)
+   kvmppc_set_gpr(vcpu, i, gprs[i]);
+   

[PATCH 12/15] KVM: PPC: Implement alignment interrupt

2010-03-08 Thread Alexander Graf
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

But the spec for 970 and 750 also looks different. While 750 requires the
DSISR fields to reflect some instruction bits, the 970 declares this as an
optional feature. So we need to reconstruct DSISR manually.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |1 +
 arch/powerpc/kvm/book3s.c |9 +++
 arch/powerpc/kvm/book3s_64_emulate.c  |   40 +
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index b47b2f5..1a169f3 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -131,6 +131,7 @@ extern void kvmppc_rmcall(ulong srr0, ulong srr1);
 extern void kvmppc_load_up_fpu(void);
 extern void kvmppc_load_up_altivec(void);
 extern void kvmppc_load_up_vsx(void);
+extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 585dc91..6b8b5ed 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -905,6 +905,15 @@ program_interrupt:
}
break;
}
+   case BOOK3S_INTERRUPT_ALIGNMENT:
+   vcpu->arch.dear = vcpu->arch.fault_dear;
+   if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
+   to_book3s(vcpu)->dsisr = kvmppc_alignment_dsisr(vcpu,
+   vcpu->arch.last_inst);
+   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   }
+   r = RESUME_GUEST;
+   break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
case BOOK3S_INTERRUPT_TRACE:
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index 39d5003..c401dd4 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -44,6 +44,8 @@
 /* DCBZ is actually 1014, but we patch it to 1010 so we get a trap */
 #define OP_31_XOP_DCBZ 1010
 
+#define OP_LFS 48
+
 #define SPRN_GQR0  912
 #define SPRN_GQR1  913
 #define SPRN_GQR2  914
@@ -474,3 +476,41 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
return emulated;
 }
 
+u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
+{
+   u32 dsisr = 0;
+
+   /*
+* This is what the spec says about DSISR bits (not mentioned = 0):
+*
+* 12:13   [DS]    Set to bits 30:31
+* 15:16   [X]     Set to bits 29:30
+* 17      [X]     Set to bit 25
+*         [D/DS]  Set to bit 5
+* 18:21   [X]     Set to bits 21:24
+*         [D/DS]  Set to bits 1:4
+* 22:26           Set to bits 6:10 (RT/RS/FRT/FRS)
+* 27:31           Set to bits 11:15 (RA)
+*/
+
+   switch (get_op(inst)) {
+   /* D-form */
+   case OP_LFS:
+   dsisr |= (inst >> 12) & 0x4000; /* bit 17 */
+   dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
+   break;
+   /* X-form */
+   case 31:
+   dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
+   dsisr |= (inst << 8)  & 0x04000; /* bit 17 */
+   dsisr |= (inst << 3)  & 0x03c00; /* bits 18:21 */
+   break;
+   default:
+   printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
+   break;
+   }
+
+   dsisr |= (inst >> 16) & 0x03ff; /* bits 22:31 */
+
+   return dsisr;
+}
-- 
1.6.0.2
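
A worked example of the DSISR reconstruction, as a standalone sketch. The
function below restates the bit table from the patch comment in plain C
(names are made up; the unknown-opcode case is simply ignored here), and the
two encodings are hand-assembled under the assumption of the standard D-form
and X-form layouts: lfs f1,8(r3) as 0xC0230008 and lwzx r1,r3,r4 as
0x7C23202E.

```c
#include <stdint.h>

#define SKETCH_OP_LFS 48

static uint32_t alignment_dsisr(uint32_t inst)
{
	uint32_t dsisr = 0;

	switch (inst >> 26) {			/* primary opcode */
	case SKETCH_OP_LFS:			/* D-form */
		dsisr |= (inst >> 12) & 0x4000;	/* bit 17     <- inst bit 5 */
		dsisr |= (inst >> 17) & 0x3c00;	/* bits 18:21 <- inst bits 1:4 */
		break;
	case 31:				/* X-form */
		dsisr |= (inst << 14) & 0x18000;	/* bits 15:16 <- 29:30 */
		dsisr |= (inst << 8)  & 0x04000;	/* bit 17     <- 25 */
		dsisr |= (inst << 3)  & 0x03c00;	/* bits 18:21 <- 21:24 */
		break;
	default:
		break;
	}

	dsisr |= (inst >> 16) & 0x03ff;		/* bits 22:31 <- inst bits 6:15 */
	return dsisr;
}
```

For the lfs encoding the register fields contribute 0x023 (FRT = 1, RA = 3)
and opcode bits 1:4 contribute 0x2000. For the lwzx encoding the low two
extended opcode bits land in DSISR bits 15:16.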



Re: [PATCH 00/15] KVM: PPC: MOL bringup patches

2010-03-08 Thread Alexander Graf
Alexander Graf wrote:
 Mac-on-Linux has always lacked PPC64 host support. This is going to
 change now!

 This patchset contains minor patches to enable MOL, but is mostly about
 bug fixes that came out of running Mac OS X. With this set and a pretty
 small patch to MOL I have 10.4.11 running as a guest on a 970MP host.

 I'll send the MOL patches to the respective ML in the next days.
   

The patches for MOL are integrated in their SVN already. Forgot to
change the description.


Alex


Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1

2010-03-08 Thread Chris Webb
Chris Webb ch...@arachsys.com writes:

 During boot, the screen gets resized to height 1 and a mouse click at this
 point will cause a division by zero when calculating the absolute pointer
 position from the pixel (x, y). Return a click in the middle of the screen
 instead in this case.

I think this probably ought to be a candidate for 0.12-stable too. We're
seeing these crashes for real from time-to-time so it's not just a
theoretical problem.

Cheers,

Chris.


Re: KVM PMU virtualization

2010-03-08 Thread Avi Kivity

On 02/26/2010 04:42 PM, Peter Zijlstra wrote:


Also, intel debugstore things requires a host linear address,


It requires a linear address, not a host linear address.  Of course, it 
might not like the linear address mappings changing under its feet.  If 
it has a private tlb, then this won't work.



  again, not
something a vcpu can easily provide (although that might be worked
around with an msr trap, but that still limits you to 1 page data sizes,
not a limitation all software will respect).
   


If you're willing to pin pages, you can map the guest's buffer.  That 
won't work if BTS can happen in parallel with a #VMEXIT, or if there are 
interactions with npt/ept.  Will have to ask the vendors.



All that said, what we really want is for Intel+AMD to come up with
proper hw PMU virtualization support that makes it easy to rotate the
full PMU in and out for a guest. Then this whole discussion will become
a non issue.
 

As it stands there simply are a number of PMU features that defy being
virtualized, simply because the virt stuff doesn't do system topology.
So even if they were to support a virtualized pmu, it would likely be a
different beast than the native hardware is, and it will be several
hardware models in the future, coming up with a paravirt interface and
getting !linux hosts to adapt and !linux guests to use is probably as
'easy'.
   


!linux hosts are someone else's problem, but how would we get !linux 
guests to use a soft pmu?


The only way I see that happening is if a soft pmu is standardized 
across hypervisors, which is unfortunately unlikely.


--
error compiling committee.c: too many arguments to function



Re: linux-aio usable?

2010-03-08 Thread Michael Tokarev
Avi Kivity wrote:
 On 03/08/2010 03:46 AM, Bernhard Schmidt wrote:
 Hi,

 sorry for this pretty generic question, I did not find any real pros and
 cons on the net anywhere, but I might just have missed them.

 In a pure x86_64 environment (~2.6.32 vanilla kernel, 0.12.3 qemu-kvm),
 is enabling linux-aio in KVM a good idea?
 
 Yes.

Apparently that does not quite work.  I just re-compiled kvm with
--enable-linux-aio (actually I just installed libaio-dev on debian
and qemu-kvm's configure picked it up automatically), and tried
a guest.  But any I/O fails.

  kvm-0.12.3 ... -drive file=/dev/sda10,if=virtio,cache=none,aio=native

(/dev/sda10 is a (spare) partition on my hard drive I use for testing).
Here's the resulting dmesg in the guest (2.6.32):

 vdb:
end_request: I/O error, dev vdb, sector 0
Buffer I/O error on device vdb, logical block 0
Buffer I/O error on device vdb, logical block 1
Buffer I/O error on device vdb, logical block 2
Buffer I/O error on device vdb, logical block 3
Buffer I/O error on device vdb, logical block 4
Buffer I/O error on device vdb, logical block 5
Buffer I/O error on device vdb, logical block 6
Buffer I/O error on device vdb, logical block 7
end_request: I/O error, dev vdb, sector 0
Buffer I/O error on device vdb, logical block 0
Buffer I/O error on device vdb, logical block 1
end_request: I/O error, dev vdb, sector 0
 unable to read partition table

And any I/O, be it reads or writes, fails.

I see some io_submit() etc. calls happening in strace,
but no errors.  Unfortunately my strace does not
decode the io_*() routines.

# fgrep io_ trc
...
1227  io_submit(4152147968, 1, {...})   = 1
1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
1227  io_submit(4152147968, 1, {...})   = 1
1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
1227  io_submit(4152147968, 1, {...})   = 1
1226  io_getevents(-142819328, 1, 128, {...}{0, 0}) = 1
...

Oh well ;)

/mjt
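
A note on the strace output above: a return value of 1 from io_getevents()
only says that one completion event was reaped.  Whether the request itself
succeeded is recorded in the res field of struct io_event, which this strace
build does not decode.  The following is a minimal sketch, not taken from
qemu-kvm, that submits one read through the raw Linux AIO syscalls and
inspects res.  The helper names aio_result_ok(), aio_pread_once() and
aio_pread_file() are made up for illustration, and the code assumes kernel
headers that provide <linux/aio_abi.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: a completion counts as success only if res is
 * non-negative and covers the whole request.  A negative res is -errno. */
static int aio_result_ok(long long res, unsigned long long requested)
{
    return res >= 0 && (unsigned long long)res == requested;
}

/* Hypothetical helper: submit one pread through kernel AIO and wait for it.
 * Returns the completion's res (bytes read, or -errno), or -errno if one
 * of the AIO syscalls itself failed. */
static long long aio_pread_once(int fd, void *buf, size_t len, long long off)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long long ret;

    assert(buf != NULL);
    if (syscall(SYS_io_setup, 1L, &ctx) < 0)
        return -errno;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (__u64)(unsigned long)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (syscall(SYS_io_submit, ctx, 1L, list) != 1)
        ret = -errno;
    else if (syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) != 1)
        ret = -errno;
    else
        ret = ev.res;   /* this is the value strace did not show */

    syscall(SYS_io_destroy, ctx);
    return ret;
}

/* Hypothetical convenience wrapper: read the first len bytes of a file. */
static long long aio_pread_file(const char *path, void *buf, size_t len)
{
    long long ret;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    ret = aio_pread_once(fd, buf, len, 0);
    close(fd);
    return ret;
}
```

A caller would treat the read as good only when aio_result_ok(res, len) is
true; a res such as -EINVAL or -EIO would match the guest seeing I/O errors
while every syscall in the trace returns 1.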


Re: linux-aio usable?

2010-03-08 Thread Michael Tokarev
Michael Tokarev wrote:
[]
 Apparently that does not quite work.  I just re-compiled kvm with
 --enable-linux-aio (actually I just installed libaio-dev on debian
 and qemu-kvm's configure picked it up automatically), and tried
 a guest.  But any I/O fails.

It has nothing to do with kvm.  It is the compat (compat_ioctl32) handling
of the aio calls in the kernel.  For historical reasons I run a 64bit kernel
with a 32bit userland, so I tried a 32bit kvm too, and that does not work.
But a 64bit kvm works just fine with aio, and the performance numbers are
indeed better.

Thanks!

/mjt
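
For anyone wanting to check whether they are in the same situation (a 32bit
binary on a 64bit kernel), here is a small sketch.  It is a heuristic only:
the helper names are invented for this example, it looks at x86 only, and a
process started under the linux32 personality sees "i686" from uname() and
would be misreported.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: 1 if the uname machine string names 64-bit x86. */
static int machine_is_x86_64(const char *machine)
{
    assert(machine != NULL);
    return strcmp(machine, "x86_64") == 0;
}

/* Hypothetical helper: 1 if this is a 32-bit process running on an x86_64
 * kernel, 0 if not, -1 if uname() fails. */
static int is_compat_x86_process(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    /* 4-byte pointers mean the binary itself was built as 32-bit. */
    return sizeof(void *) == 4 && machine_is_x86_64(u.machine);
}
```

If is_compat_x86_process() returns 1, syscalls from that binary go through
the kernel's 32-bit compat entry points, which is the path that failed
with aio=native here.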


Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation

2010-03-08 Thread Stefan Bader
Avi Kivity wrote:
 On 03/08/2010 04:10 PM, Stefan Bader wrote:
 Avi Kivity wrote:
   
 On 03/06/2010 03:53 PM, Stefan Bader wrote:
 
 Hi Avi,

 we are currently trying to integrate this patch for an update into a
 2.6.32-based system (amongst other kvm updates). But as soon as this patch
 gets added, kvm will die on startup in kvm_leave_lazy_mmu. This has been
 documented here:

 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823

 I have placed the backports of your patches, which are currently in
 linux-next and marked for stable, here:

 git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm

 I have tested the failure with a version that got only the following
 patches in:
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation
 KVM: x86 emulator: Check IOPL level during io instruction emulation
 KVM: x86 emulator: Fix popf emulation
 KVM: x86 emulator: Check CPL level during privilege instruction emulation

 and also with a version that takes all stable patches up to the bad one:
 KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
 KVM: x86 emulator: Add group8 instruction decoding
 KVM: x86 emulator: Add group9 instruction decoding
 KVM: x86 emulator: Add Virtual-8086 mode of emulation
 KVM: x86 emulator: fix memory access during x86 emulation

 But as soon as the fix for memory access gets added, the bug will occur.
 Would you have an idea what might be causing this?


 Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?

  
 The test was done with a kvm user-space package based on 0.12.3 (which
 seems to be the current upstream version). I will try to do a test on the
 git version.

 
 I meant keep the same userspace without change, and try it on a Linus
 kernel or kvm.git master
 (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).
 
HEAD of kvm.git tree works (with same client and userspace)
Stable 2.6.32.y tree plus all patches marked cc: stable fails.

(32bit host/guest)
Host dmesg:
kvm: emulating exchange as write

Guest dmesg:
...
[3.053503] Freeing initrd memory: 8843k freed
[3.059863] Freeing unused kernel memory: 660k freed
[3.076657] Write protecting the kernel text: 4780k
[3.082863] Write protecting the kernel read-only data: 1912k
[3.08] BUG: unable to handle kernel paging request at c01292e3
[3.088025] IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
[3.088025] *pde = 00910067 *pte = 00129161
[3.088025] Oops: 0003 [#1] SMP
[3.088025] last sysfs file:
[3.088025] Modules linked in:
[3.088025]
[3.088025] Pid: 1, comm: init Not tainted (2.6.32-15-generic #22-Ubuntu) 
Bochs
[3.088025] EIP: 0060:[c01292e3] EFLAGS: 00010246 CPU: 0
[3.088025] EIP is at kvm_leave_lazy_mmu+0x43/0x70
[3.088025] EAX: 0002 EBX: 0018 ECX: 01802c20 EDX: 
[3.088025] ESI: c1802c20 EDI: c1802c20 EBP: df071cb4 ESP: df071ca8
[3.088025]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[3.088025] Process init (pid: 1, ti=df07 task=df068000 task.ti=df07)
[3.088025] Stack:
[3.088025]  c000 dce2b000 dce2a844 df071cf0 c01e8b6d  0001
b000
[3.088025] 0  db7ed000 c139d54c c139d54c df133000 db7ed000
1ffef067 b000
[3.088025] 0 bfe1 db44bbfc df071d2c c01e8ce0 c000 df133000
db44bbfc bfe1
[3.088025] Call Trace:
[3.088025]  [c01e8b6d] ? move_ptes+0x1ad/0x270
[3.088025]  [c01e8ce0] ? move_page_tables+0xb0/0x130
[3.088025]  [c020b614] ? shift_arg_pages+0x94/0x180
[3.088025]  [c020b885] ? setup_arg_pages+0x185/0x1b0
[3.088025]  [c0241243] ? load_elf_binary+0x3c3/0xac0
[3.088025]  [c02f1654] ? security_file_permission+0x14/0x20
[3.088025]  [c02052f4] ? rw_verify_area+0x64/0xe0
[3.088025]  [c0240e80] ? load_elf_binary+0x0/0xac0
[3.088025]  [c020bd9f] ? search_binary_handler+0xef/0x2f0
[3.088025]  [c020b465] ? kernel_read+0x35/0x50
[3.088025]  [c023f7b2] ? load_script+0x1e2/0x270
[3.088025]  [c01e4160] ? get_user_pages+0x50/0x60
[3.088025]  [c020a662] ? get_arg_page+0x52/0xb0
[3.088025]  [c023f5d0] ? load_script+0x0/0x270
[3.088025]  [c020bd9f] ? search_binary_handler+0xef/0x2f0
[3.088025]  [c020a834] ? copy_strings+0x174/0x190
[3.088025]  [c020c2c7] ? do_execve+0x1f7/0x2c0
[3.088025]  [c034ed6a] ? strncpy_from_user+0x3a/0x70
[3.088025]  [c0101a1d] ? sys_execve+0x2d/0x60
[3.088025]  [c01033ec] ? syscall_call+0x7/0xb
[3.088025]  [c01070a4] ? kernel_execve+0x24/0x30
[3.088025]  [c01012ac] ? run_init_process+0x1c/0x20
[3.088025]  [c0101396] ? init_post+0xe6/0x100
[3.088025]  [c07d83d0] ? kernel_init+0xb8/0xbf
[3.088025]  [c07d8318] ? kernel_init+0x0/0xbf
[3.088025]  [c0104087] ? kernel_thread_helper+0x7/0x10
[3.088025] Code: 6c 87 c0 64 a1 40 6a 87 c0 03 3c 85 80 4a 7d c0 8b 9f 00 04
00 00 85 db 74 24 89 fe 31 d2 66 90 8d 8e 00 00 00 40 b8 02 00 00 00 0f 01 c1
01 c6 29 c3 75 ec c7 87 
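
As a reading aid for the trace above: "Oops: 0003" is the x86 page-fault
error code, and "*pte = 00129161" is the page-table entry for the faulting
address.  The sketch below decodes both using the architectural bit
positions (error code bit 0 = page present, bit 1 = write, bit 2 = user
mode; PTE bit 0 = present, bit 1 = writable).  The function names are
invented for this example and it is not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* x86 page-fault error code bits, as pushed by the CPU. */
#define PF_PRESENT 0x1UL   /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2UL   /* 0: read access,      1: write access */
#define PF_USER    0x4UL   /* 0: kernel mode,      1: user mode */

/* Low flag bits of a 32-bit (non-PAE) x86 page-table entry. */
#define PTE_PRESENT  0x1UL
#define PTE_WRITABLE 0x2UL

/* Illustrative helper: true for a kernel-mode write to a present page. */
static int pf_is_kernel_write_protection(unsigned long code)
{
    return (code & PF_PRESENT) && (code & PF_WRITE) && !(code & PF_USER);
}

static int pte_is_present(unsigned long pte)
{
    return (pte & PTE_PRESENT) != 0;
}

static int pte_is_writable(unsigned long pte)
{
    return (pte & PTE_WRITABLE) != 0;
}

/* Illustrative helper: format the error code as human-readable text. */
static int describe_pf_error(unsigned long code, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%s, %s, %s mode",
                    (code & PF_PRESENT) ? "protection violation"
                                        : "page not present",
                    (code & PF_WRITE) ? "write" : "read",
                    (code & PF_USER) ? "user" : "kernel");
}
```

With the values from the trace, 0003 decodes as a kernel-mode write to a
present page, and 00129161 is a present, read-only entry.  The faulting
address equals EIP, and the oops comes right after "Write protecting the
kernel text", so this looks like a write to kernel text that had just been
made read-only.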

Re: [PATCH 00/10] uq/master: irqchip-in-kernel support

2010-03-08 Thread Marcelo Tosatti
On Thu, Mar 04, 2010 at 05:33:16PM +0100, Jan Kiszka wrote:
 Glauber Costa wrote:
  Hi guys,
  
  This is the same in-kernel irqchip support already posted to qemu-devel,
  just rebased, retested, etc. It passes my basic tests, so it seems to be
  still in good shape.
  
  It is provided against uq/master as part of the integration efforts
 
 Just as another heads-up:
 
 host-guest networking performance over slirp and non-virtio NICs
 suffers with this irqchip support the same way as in qemu-kvm. It's not
 a bug I expect to be directly related to these changes, but it is at
 least triggered by them and should now really be addressed.

Isn't it triggered by enabling the iothread (in which case irqchip
support is unrelated to the problem)?


Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-08 Thread Marcelo Tosatti
On Fri, Mar 05, 2010 at 09:37:26PM -0500, Kevin O'Connor wrote:
 On Thu, Mar 04, 2010 at 03:35:52PM -0300, Marcelo Tosatti wrote:
  On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
   On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
The regression seems to be caused by seabios commit d7e998f. Kevin, the
failure can be seen on the attached screenshot, which happens on the
first reboot of WinXP 32 installation (after copying files etc).
   
   Sorry - I also noticed a bug in that commit recently.  I pushed the
   fix I had in my local tree.
  
  Thanks, it does fix the issue here. Anthony can you please update
  seabios?
 
 Neither commit d7e998f nor the fix 8f469b96 are on the SeaBIOS stable
 branch.  Is qemu ready to pull in bigger changes now?

Anthony pulls in seabios master into qemu.git master periodically.



Re: linux-aio usable?

2010-03-08 Thread Nikola Ciprich
 It's faster.
Hi Avi,
Could you give a rough estimate of how much faster?
I'm stuck with glibc-2.5 now, but I'm always eager to improve performance,
so I wonder if it would make sense to either port eventfd + aio stuff, or
switch to glibc-2.8 for me...



-- 
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


Re: linux-aio usable?

2010-03-08 Thread Brian Jackson
On Monday 08 March 2010 03:27:36 pm Nikola Ciprich wrote:
  It's faster.
 
 Hi Avi,
 Could you give a rough estimate of how much faster?
 I'm stuck with glibc-2.5 now, but I'm always eager to improve performance,
 so I wonder if it would make sense to either port eventfd + aio stuff, or
 switch to glibc-2.8 for me...


I saw approx. 10% improvement in sequential i/o. Random i/o was only 
marginally faster in our setup. We generally have problems with random i/o 
here... Something to do with our setup.


[patch 5/7] kvm-tpr-opt: remove dead code

2010-03-08 Thread Marcelo Tosatti
Simplify code around kvm_enable_tpr_access_reporting.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-tpr/qemu-kvm-x86.c
===
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -597,30 +597,16 @@ int kvm_get_shadow_pages(kvm_context_t k
 }
 
 #ifdef KVM_CAP_VAPIC
-
-static int tpr_access_reporting(CPUState *env, int enabled)
-{
-   int r;
-   struct kvm_tpr_access_ctl tac = {
-   .enabled = enabled,
-   };
-
-   r = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_VAPIC);
-   if (r <= 0)
-   return -ENOSYS;
-   return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, &tac);
-}
-
-int kvm_enable_tpr_access_reporting(CPUState *env)
+static int kvm_enable_tpr_access_reporting(CPUState *env)
 {
-   return tpr_access_reporting(env, 1);
-}
+int r;
+struct kvm_tpr_access_ctl tac = { .enabled = 1 };
 
-int kvm_disable_tpr_access_reporting(CPUState *env)
-{
-   return tpr_access_reporting(env, 0);
+r = kvm_ioctl(env->kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_VAPIC);
+if (r <= 0)
+return -ENOSYS;
+return kvm_vcpu_ioctl(env, KVM_TPR_ACCESS_REPORTING, &tac);
 }
-
 #endif
 
 int kvm_qemu_create_memory_alias(uint64_t phys_start,
@@ -1319,7 +1305,7 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #endif
 
 #ifdef KVM_EXIT_TPR_ACCESS
-kvm_tpr_vcpu_start(cenv);
+kvm_enable_tpr_access_reporting(cenv);
 #endif
 kvm_reset_mpstate(cenv);
 return 0;
Index: qemu-kvm-tpr/qemu-kvm.h
===
--- qemu-kvm-tpr.orig/qemu-kvm.h
+++ qemu-kvm-tpr/qemu-kvm.h
@@ -601,27 +601,6 @@ int kvm_get_pit2(kvm_context_t kvm, stru
 
 #ifdef KVM_CAP_VAPIC
 
-/*!
- * \brief Enable kernel tpr access reporting
- *
- * When tpr access reporting is enabled, the kernel will call the
- * ->tpr_access() callback every time the guest vcpu accesses the tpr.
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu vcpu to enable tpr access reporting on
- */
-int kvm_enable_tpr_access_reporting(CPUState *env);
-
-/*!
- * \brief Disable kernel tpr access reporting
- *
- * Undoes the effect of kvm_enable_tpr_access_reporting().
- *
- * \param kvm Pointer to the current kvm_context
- * \param vcpu vcpu to disable tpr access reporting on
- */
-int kvm_disable_tpr_access_reporting(CPUState *env);
-
 int kvm_enable_vapic(CPUState *env, uint64_t vapic);
 
 #endif
@@ -895,7 +874,6 @@ void qemu_kvm_aio_wait_end(void);
 void qemu_kvm_notify_work(void);
 
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
-void kvm_tpr_vcpu_start(CPUState *env);
 
 int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
 
Index: qemu-kvm-tpr/kvm-tpr-opt.c
===
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -318,11 +318,6 @@ void kvm_tpr_access_report(CPUState *env
 patch_instruction(env, rip);
 }
 
-void kvm_tpr_vcpu_start(CPUState *env)
-{
-kvm_enable_tpr_access_reporting(env);
-}
-
 static void tpr_save(QEMUFile *f, void *s)
 {
 int i;




[patch 4/7] kvm-tpr-opt: clean up usage of bios_enabled

2010-03-08 Thread Marcelo Tosatti
1. bios_enabled must already be set when enable_vapic is called.
2. kvm_tpr_vcpu_start is called during vcpu creation, when bios_enabled
is always zero.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -250,7 +250,6 @@ int kvm_tpr_enable_vapic(CPUState *env)
 
 static int enable_vapic(CPUState *env)
 {
-bios_enabled = 1;
 env->update_vapic = 1;
 return 1;
 }
@@ -322,8 +321,6 @@ void kvm_tpr_access_report(CPUState *env
 void kvm_tpr_vcpu_start(CPUState *env)
 {
 kvm_enable_tpr_access_reporting(env);
-if (bios_enabled)
-   kvm_tpr_enable_vapic(env);
 }
 
 static void tpr_save(QEMUFile *f, void *s)




[patch 0/7] kvm-tpr-opt cleanups

2010-03-08 Thread Marcelo Tosatti
Prepare kvm-tpr-opt.c for upstream merge.




[patch 2/7] kvm-tpr-opt: use device_init

2010-03-08 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -401,10 +401,12 @@ static void vtpr_ioport_write(void *opaq
 kvm_tpr_enable_vapic(env);
 }
 
-void kvm_tpr_opt_setup(void)
+static void kvm_tpr_opt_setup(void)
 {
 register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
 register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
 register_ioport_write(0x7e, 2, 2, vtpr_ioport_write16, NULL);
 }
 
+device_init(kvm_tpr_opt_setup);
+
Index: qemu-kvm-tpr/qemu-kvm-x86.c
===
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -157,10 +157,6 @@ int kvm_arch_create(kvm_context_t kvm, u
if (r < 0)
return r;
 
-#ifdef KVM_EXIT_TPR_ACCESS
-kvm_tpr_opt_setup();
-#endif
-
return 0;
 }
 
Index: qemu-kvm-tpr/qemu-kvm.h
===
--- qemu-kvm-tpr.orig/qemu-kvm.h
+++ qemu-kvm-tpr/qemu-kvm.h
@@ -894,7 +894,6 @@ void qemu_kvm_aio_wait_end(void);
 
 void qemu_kvm_notify_work(void);
 
-void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 void kvm_tpr_vcpu_start(CPUState *env);
 




[patch 3/7] kvm-tpr-opt: qemu-kvm.h -> kvm.h

2010-03-08 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-tpr/kvm-tpr-opt.c
===
--- qemu-kvm-tpr.orig/kvm-tpr-opt.c
+++ qemu-kvm-tpr/kvm-tpr-opt.c
@@ -14,7 +14,7 @@
 #include "hw/hw.h"
 #include "hw/isa.h"
 #include "sysemu.h"
-#include "qemu-kvm.h"
+#include "kvm.h"
 #include "cpu.h"
 
 #include <stdio.h>




[patch 1/7] qemu-kvm: move vapic enablement to kvm_arch_load_regs

2010-03-08 Thread Marcelo Tosatti
update_vapic is used for enabling vcpu's vapic on migration. 
Use the new writeback states for that.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-tpr/qemu-kvm-x86.c
===
--- qemu-kvm-tpr.orig/qemu-kvm-x86.c
+++ qemu-kvm-tpr/qemu-kvm-x86.c
@@ -988,6 +988,10 @@ void kvm_arch_load_regs(CPUState *env, i
 kvm_arch_load_mpstate(env);
 kvm_load_lapic(env);
 }
+if (level == KVM_PUT_FULL_STATE) {
+if (env->update_vapic)
+kvm_tpr_enable_vapic(env);
+}
 if (kvm_irqchip_in_kernel()) {
 /* Avoid deadlock: no user space IRQ will ever clear it. */
 env->halted = 0;
@@ -1338,9 +1342,6 @@ int kvm_arch_halt(CPUState *env)
 
 int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 {
-if (env->update_vapic) {
-kvm_tpr_enable_vapic(env);
-}
 if (!kvm_irqchip_in_kernel())
kvm_set_cr8(env, cpu_get_apic_tpr(env));
 return 0;
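
To illustrate the effect of the change: before, the update_vapic flag was
tested on every pass through kvm_arch_pre_run(); after, it is only looked
at when a full state writeback is requested.  The following stand-alone
sketch models that with mock types.  All demo_* names are hypothetical, the
numeric level values are assumptions, and whether the real
kvm_tpr_enable_vapic() clears the flag is not modelled here.

```c
#include <assert.h>

/* Writeback levels; the names mirror the KVM_PUT_* constant used in the
 * patch, the numeric values are assumptions for this sketch. */
enum demo_put_level {
    DEMO_PUT_RUNTIME_STATE = 1,
    DEMO_PUT_RESET_STATE = 2,
    DEMO_PUT_FULL_STATE = 3
};

/* Mock vcpu with just the fields the sketch needs. */
struct demo_vcpu {
    int update_vapic;        /* set when vapic must be (re)enabled */
    int vapic_enable_calls;  /* counts calls, for demonstration only */
};

/* Stand-in for kvm_tpr_enable_vapic(). */
static void demo_enable_vapic(struct demo_vcpu *cpu)
{
    cpu->vapic_enable_calls++;
}

/* Stand-in for kvm_arch_load_regs(): vapic is enabled only as part of a
 * full state writeback, and only if the flag is set. */
static void demo_load_regs(struct demo_vcpu *cpu, int level)
{
    assert(level >= DEMO_PUT_RUNTIME_STATE && level <= DEMO_PUT_FULL_STATE);
    if (level == DEMO_PUT_FULL_STATE && cpu->update_vapic)
        demo_enable_vapic(cpu);
}

/* Stand-in for kvm_arch_pre_run() after the patch: no vapic check left on
 * the per-entry path. */
static void demo_pre_run(struct demo_vcpu *cpu)
{
    (void)cpu;
}
```

In this model, any number of demo_pre_run() calls and runtime or reset
level writebacks leave vapic_enable_calls at zero; a single
DEMO_PUT_FULL_STATE writeback with the flag set raises it to one.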



