Re: [PATCH v3 10/11] KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling

2015-03-27 Thread Marc Zyngier
On Fri, 27 Mar 2015 00:14:12 +
Andre Przywara  wrote:

> On 03/26/2015 10:06 PM, Marc Zyngier wrote:
> > On Thu, 26 Mar 2015 14:39:37 +
> > Andre Przywara  wrote:
> > 
> >> Using the framework provided by the recent vgic.c changes, we
> >> register a kvm_io_bus device on mapping the virtual GICv3 resources.
> >> The distributor mapping is pretty straight forward, but the
> >> redistributors need some more love, since they need to be tagged with
> >> the respective redistributor (read: VCPU) they are connected with.
> >> We use the kvm_io_bus framework to register one devices per VCPU.
> >>
> >> Signed-off-by: Andre Przywara 
> >> ---
> >>  include/kvm/arm_vgic.h  |1 +
> >>  virt/kvm/arm/vgic-v3-emul.c |   39 ++-
> >>  2 files changed, 39 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >> index 4523984..d6705f4 100644
> >> --- a/include/kvm/arm_vgic.h
> >> +++ b/include/kvm/arm_vgic.h
> >> @@ -252,6 +252,7 @@ struct vgic_dist {
> >>  
> >>struct vgic_vm_ops  vm_ops;
> >>struct vgic_io_device   dist_iodev;
> >> +  struct vgic_io_device   *redist_iodevs;
> >>  };
> >>  
> >>  struct vgic_v2_cpu_if {
> >> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
> >> index 2f03a36..eb1a797 100644
> >> --- a/virt/kvm/arm/vgic-v3-emul.c
> >> +++ b/virt/kvm/arm/vgic-v3-emul.c
> >> @@ -758,6 +758,9 @@ static int vgic_v3_map_resources(struct kvm *kvm,
> >>  {
> >>int ret = 0;
> >>struct vgic_dist *dist = &kvm->arch.vgic;
> >> +  gpa_t rdbase = dist->vgic_redist_base;
> >> +  struct vgic_io_device *iodevs = NULL;
> >> +  int i;
> >>  
> >>if (!irqchip_in_kernel(kvm))
> >>return 0;
> >> @@ -783,7 +786,41 @@ static int vgic_v3_map_resources(struct kvm *kvm,
> >>goto out;
> >>}
> >>  
> >> -  kvm->arch.vgic.ready = true;
> >> +  ret = vgic_register_kvm_io_dev(kvm, dist->vgic_dist_base,
> >> + GIC_V3_DIST_SIZE, vgic_v3_dist_ranges,
> >> + -1, &dist->dist_iodev);
> > 
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  iodevs = kcalloc(dist->nr_cpus, sizeof(iodevs[0]), GFP_KERNEL);
> >> +  if (!iodevs) {
> >> +  ret = -ENOMEM;
> >> +  goto out_unregister;
> >> +  }
> >> +
> >> +  for (i = 0; i < dist->nr_cpus; i++) {
> >> +  ret = vgic_register_kvm_io_dev(kvm, rdbase,
> >> + SZ_128K, vgic_redist_ranges,
> >> + i, &iodevs[i]);
> > 
> > This looks really weird. You seems to be mapping all redistributors at
> > the same IPA. Have you actually tested this with an SMP guest?
> 
> But the patch continues:
> 
> + if (ret)
> + goto out_unregister;
> + rdbase += GIC_V3_REDIST_SIZE;
> 
> Is that too confusing to re-use the rdbase variable? The spec speaks of
> RD_base for each redistributor, so that seemed sane to me.
> Shall I add a comment or use "rdbase + i * GIC_V3_REDIST_SIZE" for clarity?

That would have been nicer, but let's face the harsh truth: I can't
read, ignore me (well, for this time only ;-).

Reviewed-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: vfio: fix fail to set error notification

2015-03-27 Thread Wanpeng Li
After commit 6140a8f56238 ("vfio-pci: Add device request interface") merged,
there is error dump when run guest as below:

qemu-system-x86_64 -enable-kvm -m 2G -smp 2  -net none /root/cathy/rhel6u5.qcow 
-device vfio-pci,host=03:10.5
qemu-system-x86_64: -device vfio-pci,host=03:10.5: vfio: Failed to set up error 
notification

This is caused by missing one break, this patch fix it.

Reported-and-tested-by: Liu Rongrong 
Signed-off-by: Wanpeng Li 
---
 drivers/vfio/pci/vfio_pci_intrs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index f88bfdf..d1b6845 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -868,6 +868,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, 
uint32_t flags,
func = vfio_pci_set_err_trigger;
break;
}
+   break;
case VFIO_PCI_REQ_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_TRIGGER:
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 12/12] KVM: arm/arm64: remove now obsolete VGIC specific MMIO handling code

2015-03-27 Thread Marc Zyngier
On 23/03/15 15:58, Andre Przywara wrote:
> With all of the virtual GIC emulation code now being registered with
> the kvm_io_bus, we can remove all of the old MMIO handling code and
> its dispatching functionality.
> 
> Signed-off-by: Andre Przywara 
> ---
>  include/kvm/arm_vgic.h  |2 --
>  virt/kvm/arm/vgic-v2-emul.c |   19 
>  virt/kvm/arm/vgic-v3-emul.c |   39 
>  virt/kvm/arm/vgic.c |   71 
> ---
>  virt/kvm/arm/vgic.h |5 ---
>  5 files changed, 136 deletions(-)

Hi Andre,

I've given this some more thoughts, and one thing really worries me. Up
to now, we've only filled the vcpu->run structure when we were about to
give it to userspace, and would never use it ourselves.

Now, we seem to be using much more extensively at various points in the
code. What if userspace changes it under our feet? What guarantee do we
have that this is always safe?

That makes me feel very uncomfortable. I'd rather see an intermediate
structure being used to pass the parameters around, and only fill run at
the last moment. I'd probably sleep better... ;-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #12 from Thomas Stein  ---
Hello Igor.

Is this bug in 3.18 also present? I'm asking because i consider a downgrade.

thanks and cheers
t.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] E5-2620v2 - emulation stop error

2015-03-27 Thread Andrey Korolyov
On Fri, Mar 27, 2015 at 12:03 AM, Bandan Das  wrote:
> Radim Krčmář  writes:
>
>> 2015-03-26 21:24+0300, Andrey Korolyov:
>>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Krčmář  wrote:
>>> > 2015-03-26 20:08+0300, Andrey Korolyov:
>>> >> KVM internal error. Suberror: 2
>>> >> extra data[0]: 80ef
>>> >> extra data[1]: 8b0d
>>> >
>>> > Btw. does this part ever change?
>>> >
>>> > I see that first report had:
>>> >
>>> >   KVM internal error. Suberror: 2
>>> >   extra data[0]: 80d1
>>> >   extra data[1]: 8b0d
>>> >
>>> > Was that a Windows guest by any chance?
>>>
>>> Yes, exactly, different extra data output was from a Windows VMs.
>>
>> Windows uses vector 0xd1 for timer interrupts.
>
>> I second Bandan -- checking that it reproduces on other machine would be
>> great for sanity :)  (Although a bug in our APICv is far more likely.)
>
> If it's APICv related, a run without apicv enabled could give more hints.
>
> Your "devices not getting reset" hypothesis makes the most sense to me,
> maybe the timer vector in the error message is just one part of
> the whole story. Another misbehaving interrupt from the dark comes in at the
> same time and leads to a double fault.

Default trace (APICv enabled, first reboot introduced the issue):
http://xdel.ru/downloads/kvm-e5v2-issue/hanged-reboot-apic-on.dat.gz

Trace without APICv (three reboots, just to make sure to hit the
problematic condition of supposed DF, as it still have not one hundred
percent reproducibility):
http://xdel.ru/downloads/kvm-e5v2-issue/apic-off.dat.gz

It would be great of course to reproduce this somewhere else,
otherwise all this thread may end in fixing a bug which exists only at
my particular platform. Right now I have no hardware except a lot of
well-known (in terms of existing issues) Supermicro boards of one
model.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ARM: KVM/XEN: how should we support virt-what?

2015-03-27 Thread Andrew Jones
On Thu, Mar 26, 2015 at 07:50:06PM +0100, Ard Biesheuvel wrote:
> On 26 March 2015 at 19:49, Ard Biesheuvel  wrote:
> > On 26 March 2015 at 19:45, Stefano Stabellini
> >  wrote:
> >> On Thu, 26 Mar 2015, Andrew Jones wrote:
> >>> On Wed, Mar 25, 2015 at 10:44:42AM +0100, Andrew Jones wrote:
> >>> > Hello ARM virt maintainers,
> >>> >
> >>> > I'd like to start a discussion about supporting virt-what[1]. virt-what
> >>> > allows userspace to determine if the system it's running on is running
> >>> > in a guest, and of what type (KVM, Xen, etc.). Despite it being a best
> >>> > effort tool, see the Caveat emptor in [1], it has become quite a useful
> >>> > tool, and is showing up in different places, such as OpenStack. If you
> >>> > look at the code[2], specifically [3], then you'll see how it works on
> >>> > x86, which is to use the dedicated hypervisor cpuid leaves. I'm
> >>> > wondering what equivalent we have, or can develop, for arm.
> >>> > Here are some thoughts;
> >>> > 0) there's already something we can use, and I just need to be told
> >>> >about it.
> >>> > 1) be as similar as possible to x86 by dedicating some currently
> >>> >undefined sysreg bits. This would take buy-in from lots of parties,
> >>> >so is not likely the way to go.
> >>> > 2) create a specific DT node that will get exposed through sysfs, or
> >>> >somewhere.
> >>> > 3) same as (2), but just use the nodes currently in mach-virt's DT
> >>> >as the indication we're a guest. This would just be a heuristic,
> >>> >i.e. "have virtio mmio" && psci.method == hvc, or something,
> >>> >and we'd still need a way to know if we're kvm vs. xen vs. ??.
> >>> >
> >>> > Thanks,
> >>> > drew
> >>> >
> >>> > [1] http://people.redhat.com/~rjones/virt-what/
> >>> > [2] http://git.annexia.org/?p=virt-what.git;a=summary
> >>> > [3] 
> >>> > http://git.annexia.org/?p=virt-what.git;a=blob_plain;f=virt-what-cpuid-helper.c;hb=HEAD
> >>>
> >>> Thanks everyone for their responses. So, the current summary seems to
> >>> be;
> >>> 1) Xen has both a DT node and an ACPI table, virt-what can learn how
> >>>to probe those.
> >>> 2) We don't have anything yet for KVM, and we're reluctant to create a
> >>>specific DT node. Anyway, we'd still need to address ACPI booted
> >>>guests some other way.
> >>>
> >>> For a short-term, DT-only, approach we could go with a heuristic, one
> >>> that includes Marc's "if hypervisor node exists, then xen, else kvm"
> >>> condition.
> >>>
> >>> How about SMBIOS for a long-term solution that works for both DT and
> >>> ACPI? We're not populating SMBIOS for arm guests yet in qemu, but now
> >>> that AAVMF has fw_cfg, we should be able to. On x86 we already have
> >>> smbios populated from qemu, although not in a way that allows us to
> >>> determine kvm vs. xen vs. tcg.
> >>
> >> I don't think that SMBIOS works with DT.
> >>
> >
> > SMBIOS works fine with DT
> 
> ... but it needs UEFI ...

Yes. Perhaps the short-term solution will be a long-term solution for
DT-only, non-UEFI guests, but we can do better for the guests with UEFI,
that may or may not be ACPI, by using SMBIOS.

Unless somebody objects to either/both of these paths, then I guess
we'll start heading down them both.

Thanks,
drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-27 Thread Paul Mackerras
On Fri, Mar 27, 2015 at 02:29:46PM +1100, Paul Mackerras wrote:
> This reads the timebase at various points in the real-mode guest
> entry/exit code and uses that to accumulate total, minimum and
> maximum time spent in those parts of the code.  Currently these
> times are accumulated per vcpu in 5 parts of the code:

I just realized that this is going to give bogus results if we have a
non-zero timebase offset, so ignore this patch for now.  I'll fix it
and repost.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] E5-2620v2 - emulation stop error

2015-03-27 Thread Andrey Korolyov
On Thu, Mar 26, 2015 at 11:40 PM, Radim Krčmář  wrote:
> 2015-03-26 21:24+0300, Andrey Korolyov:
>> On Thu, Mar 26, 2015 at 8:40 PM, Radim Krčmář  wrote:
>> > 2015-03-26 20:08+0300, Andrey Korolyov:
>> >> KVM internal error. Suberror: 2
>> >> extra data[0]: 80ef
>> >> extra data[1]: 8b0d
>> >
>> > Btw. does this part ever change?
>> >
>> > I see that first report had:
>> >
>> >   KVM internal error. Suberror: 2
>> >   extra data[0]: 80d1
>> >   extra data[1]: 8b0d
>> >
>> > Was that a Windows guest by any chance?
>>
>> Yes, exactly, different extra data output was from a Windows VMs.
>
> Windows uses vector 0xd1 for timer interrupts.
>
> I second Bandan -- checking that it reproduces on other machine would be
> great for sanity :)  (Although a bug in our APICv is far more likely.)

Trace with new bits:

KVM internal error. Suberror: 2
extra data[0]: 80ef
extra data[1]: 8b0d
extra data[2]: 77b
EAX= EBX= ECX= EDX=
ESI= EDI= EBP= ESP=6d24
EIP=d331 EFL=00010202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT= 000f6cb0 0037
IDT=  03ff
CR0=0010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb 
19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f
ba 2d d4 fe fb 3f
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: vfio: fix fail to set error notification

2015-03-27 Thread Alex Williamson
On Fri, 2015-03-27 at 16:48 +0800, Wanpeng Li wrote:
> After commit 6140a8f56238 ("vfio-pci: Add device request interface") merged,
> there is error dump when run guest as below:
> 
> qemu-system-x86_64 -enable-kvm -m 2G -smp 2  -net none 
> /root/cathy/rhel6u5.qcow -device vfio-pci,host=03:10.5
> qemu-system-x86_64: -device vfio-pci,host=03:10.5: vfio: Failed to set up 
> error notification
> 
> This is caused by missing one break, this patch fix it.
> 
> Reported-and-tested-by: Liu Rongrong 
> Signed-off-by: Wanpeng Li 
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
> b/drivers/vfio/pci/vfio_pci_intrs.c
> index f88bfdf..d1b6845 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -868,6 +868,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, 
> uint32_t flags,
>   func = vfio_pci_set_err_trigger;
>   break;
>   }
> + break;
>   case VFIO_PCI_REQ_IRQ_INDEX:
>   switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>   case VFIO_IRQ_SET_ACTION_TRIGGER:

Thanks for the report.  This was already fixed in v4.0-rc4:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/vfio/pci/vfio_pci_intrs.c?id=ec76f4007079469e86e2e44c3e5d1d11086de9d6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iscsi multipath failure with "libvirtError: Failed to open file '/dev/mapper/Mar': No such file or directory"

2015-03-27 Thread Stefan Hajnoczi
On Mon, Mar 23, 2015 at 10:14:31PM +0530, mad Engineer wrote:
> hello All,
>   I know the issue is related to libvirt,but i dont know
> where to ask.

The libvirt mailing list is the place to ask libvirt questions.  I have
CCed it.

> i have centos 6.6 running KVM as compute node in openstack icehouse
> 
> when i try to attach volume to instance it shows
> 
> 2596: error : virStorageFileGetMetadataRecurse:952 : Failed to open
> file '/dev/mapper/Mar': No such file or directory
> 
> in libvirt log
> 
> This does not always happen when it happens no one will be able to
> attach volume to instance
> 
> 
> using EMC VNX as storage backend.
> 
> 
> multipath.conf
> 
> 
> # Skip the files uner /dev that are definitely not FC/iSCSI devices
> # Different system may need different customization
> devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> devnode "^hd[a-z][0-9]*"
> devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
> 
> # Skip LUNZ device from VNX
> device {
> vendor "DGC"
> product "LUNZ"
> }
> }
> 
> defaults {
> user_friendly_names no
> flush_on_last_del yes
> }
> 
> devices {
> # Device attributed for EMC CLARiiON and VNX series ALUA
> device {
> vendor "DGC"
> product ".*"
> product_blacklist "LUNZ"
> path_grouping_policy group_by_prio
> path_selector "round-robin 0"
> path_checker emc_clariion
> features "1 queue_if_no_path"
> hardware_handler "1 alua"
> prio alua
> failback immediate
> }
> }
> 
> 
> Can any one help me with this issue

You may need to check dmesg or logs related to the EMC storage.  In
particular, check for LUNs going offline, coming online, or the
multipath device changing state.

Stefan


pgpM6PeZXrDpl.pgp
Description: PGP signature


[PATCH] kvmtool: Add minimal support for macvtap

2015-03-27 Thread Marc Zyngier
In order to be useable by kvmtool, a macvtap interface requires
some minimal configuration (basically setting up the offload bits).
This requires skipping some of the low level TUN/TAP setup.

To avoid adding yet another option, we extend the 'tapif' option
to detect the use of a file (such as /dev/tap23).

Assuming you've run the following as root:

# ip link add link eth0 name kvmtap0 type macvtap mode bridge
# chgrp kvm /dev/tap$(< /sys/class/net/kvmtap0/ifindex)
# chmod g+rw /dev/tap$(< /sys/class/net/kvmtap0/ifindex)

it is fairly easy to have a script that does the following:

#!/bin/sh
addr=$(< /sys/class/net/kvmtap0/address)
tap=/dev/tap$(< /sys/class/net/kvmtap0/ifindex)

kvmtool/lkvm run --console virtio   \
-k /boot/zImage \
-p "console=hvc0 earlyprintk"   \
-n trans=mmio,mode=tap,tapif=$tap,guest_mac=$addr

and you now have your VM running, directly attached to the network.

This patch also removes the TUNSETNOCSUM ioctl that has declared
obsolete for quite some time now...

Signed-off-by: Marc Zyngier 
---
 tools/kvm/virtio/net.c | 40 ++--
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index ecdb94e..25b9496 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -276,6 +276,23 @@ static void virtio_net_handle_callback(struct kvm *kvm, 
struct net_dev *ndev, in
mutex_unlock(&ndev->io_lock[queue]);
 }
 
+static int virtio_net_request_tap(struct net_dev *ndev, struct ifreq *ifr,
+ const char *tapname)
+{
+   int ret;
+
+   memset(ifr, 0, sizeof(*ifr));
+   ifr->ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+   if (tapname)
+   strncpy(ifr->ifr_name, tapname, sizeof(ifr->ifr_name));
+
+   ret = ioctl(ndev->tap_fd, TUNSETIFF, &ifr);
+
+   if (ret >= 0)
+   strncpy(ndev->tap_name, ifr->ifr_name, sizeof(ndev->tap_name));
+   return ret;
+}
+
 static bool virtio_net__tap_init(struct net_dev *ndev)
 {
int sock = socket(AF_INET, SOCK_STREAM, 0);
@@ -284,6 +301,8 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
struct ifreq ifr;
const struct virtio_net_params *params = ndev->params;
bool skipconf = !!params->tapif;
+   bool macvtap = skipconf && (params->tapif[0] == '/');
+   const char *tap_file = "/dev/net/tun";
 
/* Did the user already gave us the FD? */
if (params->fd) {
@@ -291,28 +310,21 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
return 1;
}
 
-   ndev->tap_fd = open("/dev/net/tun", O_RDWR);
+   if (macvtap)
+   tap_file = params->tapif;
+
+   ndev->tap_fd = open(tap_file, O_RDWR);
if (ndev->tap_fd < 0) {
-   pr_warning("Unable to open /dev/net/tun");
+   pr_warning("Unable to open %s", tap_file);
goto fail;
}
 
-   memset(&ifr, 0, sizeof(ifr));
-   ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
-   if (params->tapif)
-   strncpy(ifr.ifr_name, params->tapif, sizeof(ifr.ifr_name));
-   if (ioctl(ndev->tap_fd, TUNSETIFF, &ifr) < 0) {
+   if (!macvtap &&
+   virtio_net_request_tap(ndev, &ifr, params->tapif) < 0) {
pr_warning("Config tap device error. Are you root?");
goto fail;
}
 
-   strncpy(ndev->tap_name, ifr.ifr_name, sizeof(ndev->tap_name));
-
-   if (ioctl(ndev->tap_fd, TUNSETNOCSUM, 1) < 0) {
-   pr_warning("Config tap device TUNSETNOCSUM error");
-   goto fail;
-   }
-
hdr_len = has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF) ?
sizeof(struct virtio_net_hdr_mrg_rxbuf) :
sizeof(struct virtio_net_hdr);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/9] qspinlock stuff -v15

2015-03-27 Thread Konrad Rzeszutek Wilk
On Thu, Mar 26, 2015 at 09:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 25, 2015 at 03:47:39PM -0400, Konrad Rzeszutek Wilk wrote:
> > Ah nice. That could be spun out as a seperate patch to optimize the existing
> > ticket locks I presume.
> 
> Yes I suppose we can do something similar for the ticket and patch in
> the right increment. We'd need to restructure the code a bit, but
> its not fundamentally impossible.
> 
> We could equally apply the head hashing to the current ticket
> implementation and avoid the current bitmap iteration.
> 
> > Now with the old pv ticketlock code an vCPU would only go to sleep once and
> > be woken up when it was its turn. With this new code it is woken up twice 
> > (and twice it goes to sleep). With an overcommit scenario this would imply
> > that we will have at least twice as many VMEXIT as with the previous code.
> 
> An astute observation, I had not considered that.

Thank you.
> 
> > I presume when you did benchmarking this did not even register? Thought
> > I wonder if it would if you ran the benchmark for a week or so.
> 
> You presume I benchmarked :-) I managed to boot something virt and run
> hackbench in it. I wouldn't know a representative virt setup if I ran
> into it.
> 
> The thing is, we want this qspinlock for real hardware because its
> faster and I really want to avoid having to carry two spinlock
> implementations -- although I suppose that if we really really have to
> we could.

In some way you already have that - for virtualized environments where you
don't have an PV mechanism you just use the byte spinlock - which is good.

And switching to PV ticketlock implementation after boot.. ugh. I feel your 
pain.

What if you used an PV bytelock implemenation? The code you posted already
'sprays' all the vCPUS to wake up. And that is exactly what you need for PV
bytelocks - well, you only need to wake up the vCPUS that have gone to sleep
waiting on an specific 'struct spinlock' and just stash those in an per-cpu
area. The old Xen spinlock code (Before 3.11?) had this.

Just an idea thought.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #13 from Igor Mammedov  ---
Nope, it's only since 3.19.

Could you test patch in comment 11?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #14 from Thomas Stein  ---
I have patch from comment 11 already running on two machines. No problems so
far.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Guest memory backed by PCI BAR (x86)

2015-03-27 Thread Nate Case
> >0x90249: movax,0x1
> >0x9024c: lmsw   ax
> >0x9024f: jmp0x90251
> >0x90251: movax,0x18
> >0x90254: movds,ax
> >0x90256: moves,ax
> >0x90258: movss,ax  <-- the "real" IP
> >0x9025a: movfs,ax
> >0x9025c: movgs,ax
> >0x9025e: jmp0x10:0x1
> 
> This makes more sense.  The processor is looking at this code at least
> until 0x9024c, because of this in the trace:
> 
>  qemu-system-x86-3937  [002] 1474032.001887: kvm_exit: reason
>  CR_ACCESS rip 0x4c
>  qemu-system-x86-3937  [002] 1474032.001887: kvm_cr:   cr_write 0
>  = 0x11
> 
> (bit 4 is always 1 so you see 0x11).
> 
> However, the trace then shows a crash (triple fault) at 0x63, not 0x58.

I was curious about the crash at 0x63 instead of 0x58, and I realized that
the first trace I uploaded had some debug code in the memtest86 setup.S
which would have moved the instruction addresses around.  So the trace
addresses wouldn't have matched the assembly dump exactly.

I uploaded a cleaner trace here:

  http://oss.xes-inc.com/xtmp/trace-pcimem-memtest86-stock-reset.dat.gz

This was used with the stock memtest86 code and also with "-no-reboot"
so you don't see the subsequent boot in the trace.

In this trace, at the end the last guest_rip reference I see is 0x58 now:

 kvm_exit: [FAILED TO PARSE] exit_reason=30 guest_rip=0x69
 kvm_pio:  pio_read at 0x64 size 1 count 1
 kvm_entry:vcpu 0
 kvm_exit: [FAILED TO PARSE] exit_reason=28 guest_rip=0x4c
 kvm_cr:   cr_write 0 = 0x11
 kvm_mmu_get_page: [FAILED TO PARSE] gfn=0 role=983104 root_count=0 
unsync=0 created=0
 kvm_entry:vcpu 0
 kvm_exit: [FAILED TO PARSE] exit_reason=2 guest_rip=0x58

QEMU register dump after the failure looks the same as my last post:

(qemu) info registers
EAX=0018 EBX= ECX=2000 EDX=0092
ESI=5a00 EDI=3ff4 EBP=01d0 ESP=0800
EIP=0058 EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018   00f0ff00 DPL=3 CS64 [CRA]
CS =9020 00090200  00809b00 DPL=0 CS16 [-RA]
SS =9000 0009  00809300 DPL=0 DS16 [-WA]
DS =0018   00f0ff00 DPL=3 CS64 [CRA]
FS =9000 0009  00809300 DPL=0 DS16 [-WA]
GS =9000 0009  00809300 DPL=0 DS16 [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 00090282 0800
IDT=  
CR0=0011 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=

Instruction dump (matches setup.S code from memtest86+):

(qemu) x/60i 0x90200
0x00090200:  cli
0x00090201:  mov$0x80,%al
0x00090203:  out%al,$0x70
0x00090205:  mov$0x9000,%ax
0x00090208:  mov%ax,%ds
0x0009020a:  mov%ax,%es
0x0009020c:  mov%ax,%fs
0x0009020e:  mov%ax,%ss
0x00090210:  mov%dx,%sp
0x00090212:  push   %cs
0x00090213:  pop%ds
0x00090214:  lidtw  0xa2
0x00090219:  lgdtw  0xa8
0x0009021e:  mov$0x92,%dx
0x00090221:  in (%dx),%al
0x00090222:  cmp$0xff,%al
0x00090224:  je 0x90238
0x00090226:  addr32 mov 0x4(%esp),%ah
0x0009022b:  test   %ah,%ah
0x0009022d:  je 0x90233
0x0009022f:  or $0x2,%al
0x00090231:  jmp0x90235
0x00090233:  and$0xfd,%al
0x00090235:  and$0xfe,%al
0x00090237:  out%al,(%dx)
0x00090238:  call   0x90266
0x0009023b:  mov$0xd1,%al
0x0009023d:  out%al,$0x64
0x0009023f:  call   0x90266
0x00090242:  mov$0xdf,%al
0x00090244:  out%al,$0x60
0x00090246:  call   0x90266
0x00090249:  mov$0x1,%ax
0x0009024c:  lmsw   %ax
0x0009024f:  jmp0x90251
0x00090251:  mov$0x18,%ax
0x00090254:  mov%ax,%ds
0x00090256:  mov%ax,%es
0x00090258:  mov%ax,%ss  <- pc
0x0009025a:  mov%ax,%fs
0x0009025c:  mov%ax,%gs
0x0009025e:  ljmpl  $0x10,$0x1
0x00090266:  ca

Re: [PATCH v3 07/11] KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC

2015-03-27 Thread Christoffer Dall
On Thu, Mar 26, 2015 at 02:39:34PM +, Andre Przywara wrote:
> Currently we use a lot of VGIC specific code to do the MMIO
> dispatching.
> Use the previous reworks to add kvm_io_bus style MMIO handlers.
> 
> Those are not yet called by the MMIO abort handler, also the actual
> VGIC emulator function do not make use of it yet, but will be enabled
> with the following patches.
> 
> Signed-off-by: Andre Przywara 
> Reviewed-by: Marc Zyngier 

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 11/11] KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus

2015-03-27 Thread Christoffer Dall
On Thu, Mar 26, 2015 at 02:39:38PM +, Andre Przywara wrote:
> Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
> data to be passed on from syndrome decoding all the way down to the
> VGIC register handlers. Now as we switch the MMIO handling to be
> routed through the KVM MMIO bus, it does not make sense anymore to
> use that structure already from the beginning. So we put the data into
> kvm_run very early and use that encapsulation till the MMIO bus call.
> Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
> structure. On that way we replace the data buffer in that structure
> with a pointer pointing to a single location in kvm_run, so we get
> rid of some copying on the way.
> With all of the virtual GIC emulation code now being registered with
> the kvm_io_bus, we can remove all of the old MMIO handling code and
> its dispatching functionality.
> 
> I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
> because that touches a lot of code lines without any good reason.
> 
> This is based on an original patch by Nikolay.
> 
> Signed-off-by: Andre Przywara 
> Cc: Nikolay Nikolaev 
> ---
>  arch/arm/include/asm/kvm_mmio.h   |   22 -
>  arch/arm/kvm/mmio.c   |   60 ++---
>  arch/arm64/include/asm/kvm_mmio.h |   22 -
>  include/kvm/arm_vgic.h|6 ---
>  virt/kvm/arm/vgic-v2-emul.c   |   21 +
>  virt/kvm/arm/vgic-v3-emul.c   |   35 ---
>  virt/kvm/arm/vgic.c   |   89 
> ++---
>  virt/kvm/arm/vgic.h   |   13 +++---
>  8 files changed, 57 insertions(+), 211 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
> index 3f83db2..d8e90c8 100644
> --- a/arch/arm/include/asm/kvm_mmio.h
> +++ b/arch/arm/include/asm/kvm_mmio.h
> @@ -28,28 +28,6 @@ struct kvm_decode {
>   bool sign_extend;
>  };
>  
> -/*
> - * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
> - * which is an anonymous type. Use our own type instead.
> - */
> -struct kvm_exit_mmio {
> - phys_addr_t phys_addr;
> - u8  data[8];
> - u32 len;
> - boolis_write;
> - void*private;
> -};
> -
> -static inline void kvm_prepare_mmio(struct kvm_run *run,
> - struct kvm_exit_mmio *mmio)
> -{
> - run->mmio.phys_addr = mmio->phys_addr;
> - run->mmio.len   = mmio->len;
> - run->mmio.is_write  = mmio->is_write;
> - memcpy(run->mmio.data, mmio->data, mmio->len);
> - run->exit_reason= KVM_EXIT_MMIO;
> -}
> -
>  int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>phys_addr_t fault_ipa);
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> index 5d3bfc0..bb2ab44 100644
> --- a/arch/arm/kvm/mmio.c
> +++ b/arch/arm/kvm/mmio.c
> @@ -122,7 +122,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>  }
>  
>  static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -   struct kvm_exit_mmio *mmio)
> +   struct kvm_run *run)
>  {
>   unsigned long rt;
>   int len;
> @@ -148,9 +148,9 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t 
> fault_ipa,
>   sign_extend = kvm_vcpu_dabt_issext(vcpu);
>   rt = kvm_vcpu_dabt_get_rd(vcpu);
>  
> - mmio->is_write = is_write;
> - mmio->phys_addr = fault_ipa;
> - mmio->len = len;
> + run->mmio.is_write = is_write;
> + run->mmio.phys_addr = fault_ipa;
> + run->mmio.len = len;
>   vcpu->arch.mmio_decode.sign_extend = sign_extend;
>   vcpu->arch.mmio_decode.rt = rt;
>  
> @@ -162,23 +162,49 @@ static int decode_hsr(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   return 0;
>  }
>  
> +/**
> + * handle_kernel_mmio - handle an in-kernel MMIO access
> + * @vcpu:pointer to the vcpu performing the access
> + * @run: pointer to the kvm_run structure
> + *
> + * returns true if the MMIO access has been performed in kernel space,
> + * and false if it needs to be emulated in user space.
> + */
> +static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> + int ret;
> +
> + if (run->mmio.is_write) {
> + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, run->mmio.phys_addr,
> +run->mmio.len, run->mmio.data);
> +
> + } else {
> + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, run->mmio.phys_addr,
> +   run->mmio.len, run->mmio.data);
> + }
> + if (!ret) {
> + kvm_handle_mmio_return(vcpu, run);
> + return true;
> + }
> +
> + return false;
> +}
> +
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>phys_addr_t fault_ipa)

Re: [PATCH v3 10/11] KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling

2015-03-27 Thread Christoffer Dall
On Thu, Mar 26, 2015 at 02:39:37PM +, Andre Przywara wrote:
> Using the framework provided by the recent vgic.c changes, we
> register a kvm_io_bus device on mapping the virtual GICv3 resources.
> The distributor mapping is pretty straight forward, but the
> redistributors need some more love, since they need to be tagged with
> the respective redistributor (read: VCPU) they are connected with.
> We use the kvm_io_bus framework to register one devices per VCPU.
> 
> Signed-off-by: Andre Przywara 

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 08/11] KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus

2015-03-27 Thread Christoffer Dall
On Thu, Mar 26, 2015 at 02:39:35PM +, Andre Przywara wrote:
> Using the framework provided by the recent vgic.c changes we register
> a kvm_io_bus device when initializing the virtual GICv2.
> 
> Signed-off-by: Andre Przywara 

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 09/11] KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames

2015-03-27 Thread Christoffer Dall
On Thu, Mar 26, 2015 at 02:39:36PM +, Andre Przywara wrote:
> Currently we handle the redistributor registers in two separate MMIO
> regions, one for the overall behaviour and SPIs and one for the
> SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus
> devices for each redistributor.
> Since the spec mandates those two pages to be contigious, we could as
> well merge them and save the churn with the second KVM I/O bus device.
> 
> Signed-off-by: Andre Przywara 

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-03-27 Thread Marcelo Tosatti
On Fri, Mar 27, 2015 at 06:34:14AM +, Wu, Feng wrote:
> > > Currently, the following code is executed before local_irq_disable() is 
> > > called,
> > > so do you mean 1)moving local_irq_disable() to the place before it. 2) 
> > > after
> > interrupt
> > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > 
> > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > is set.
> 
> Here is my understanding about your comments here:
> - Disable interrupts
> - Check 'ON'
> - Set KVM_REQ_EVENT if 'ON' is set
> 
> Then we can put the above code inside " if (kvm_check_request(KVM_REQ_EVENT, 
> vcpu) || req_int_win) "
> just like it used to be. However, I still have some questions about this 
> comment:
> 
> 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(), or other 
> places?

See below:

> If in vcpu_enter_guest(), since currently local_irq_disable() is called after 
> 'KVM_REQ_EVENT'
> is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is 
> called?

local_irq_disable();

*** add code here ***

if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
^^
|| need_resched() || signal_pending(current)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
local_irq_enable();
preempt_enable();
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
r = 1;
goto cancel_injection;
}

> 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is 
> disabled (the related bit in PIR is also set).

Yes, we are checking if the HW has set an interrupt in PIR while
outside VM (which requires PIR->VIRR transfer by software).

If the interrupt it set by hardware after local_irq_disable(), 
VMX-entry will handle the interrupt and perform the PIR->VIRR
transfer and reevaluate interrupts, injecting to guest 
if necessary, is that correct ?

> So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly after 
> interrupt is disabled?

To replace the costly 

+*/
+   if (kvm_x86_ops->hwapic_irr_update)
+   kvm_x86_ops->hwapic_irr_update(vcpu,
+   kvm_lapic_find_highest_irr(vcpu));

Yes, i think so.

> I might miss something in your comments, if so please point out. Thanks a lot!
> 
> Thanks,
> Feng
> 
> > 
> > >
> > > "if (kvm_x86_ops->hwapic_irr_update)
> > >   kvm_x86_ops->hwapic_irr_update(vcpu,
> > >   kvm_lapic_find_highest_irr(vcpu));
> > >
> > > > kvm_lapic_find_highest_irr(vcpu) eats some cache
> > > > (4 cachelines) versus 1 cacheline for reading ON bit.
> > > >
> > > > > > > > Please remove blocked and wakeup_cpu, they should not be
> > necessary.
> > > > > > >
> > > > > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > > > > is woken up, it can run on a different cpu, so we need wakeup_cpu 
> > > > > > > to
> > > > > > > find the right list to wake up the vCPU.
> > > > > >
> > > > > > If the vCPU was moved it should have updated IRTE destination field
> > > > > > to the pCPU which it has moved to?
> > > > >
> > > > > Every time a vCPU is scheduled to a new pCPU, the IRTE destination 
> > > > > filed
> > > > > would be updated accordingly.
> > > > >
> > > > > When vCPU is blocked. To wake up the blocked vCPU, we need to find
> > which
> > > > > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> > > >
> > > > Right, perhaps prev_vcpu is a better name.
> > >
> > > Do you mean "prev_pcpu"?
> > 
> > Yes.
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Error upgrading vm from windows 8 to 8.1

2015-03-27 Thread coreys
When I try to upgrade my guest windows 8 vm to window 8.1. I get the 
error of processor doesn't support CompareExchange128.  Haven not been 
able to find any information about this error.


--
Corey W. Scherr
Affinity Global Solutions
812 Burlington Dr, STE 300
cor...@affinitygs.com
701-223-3565, ext. 31


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Error upgrading vm from windows 8 to 8.1

2015-03-27 Thread Bandan Das
coreys  writes:

> When I try to upgrade my guest windows 8 vm to window 8.1. I get the
> error of processor doesn't support CompareExchange128.  Haven not been
> able to find any information about this error.

I think that's because cmpxchg16b is not emulated yet:

static int em_cmpxchg8b(struct x86_emulate_ctxt *ctxt)
{
u64 old = ctxt->dst.orig_val64;

if (ctxt->dst.bytes == 16)
return X86EMUL_UNHANDLEABLE;
...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM fixes for 4.0-rc5

2015-03-27 Thread Marcelo Tosatti

Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the following PPC KVM bug fixes


Marcelo Tosatti (1):
  Merge tag 'signed-for-4.0' of git://github.com/agraf/linux-2.6

Paul Mackerras (3):
  KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in 
kvmppc_set_lpcr()
  KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
  KVM: PPC: Book3S HV: Fix instruction emulation

 arch/powerpc/kvm/book3s_hv.c|8 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |1 +
 2 files changed, 5 insertions(+), 4 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3a 11/11] KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus

2015-03-27 Thread Andre Przywara
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
data to be passed on from syndrome decoding all the way down to the
VGIC register handlers. Now as we switch the MMIO handling to be
routed through the KVM MMIO bus, it does not make sense anymore to
use that structure already from the beginning. So we keep the data in
local variables until we put them into the kvm_io_bus framework.
Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private
structure. On that way we replace the data buffer in that structure
with a pointer pointing to a single location in a local variable, so
we get rid of some copying on the way.
With all of the virtual GIC emulation code now being registered with
the kvm_io_bus, we can remove all of the old MMIO handling code and
its dispatching functionality.

I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that touches a lot of code lines without any good reason.

This is based on an original patch by Nikolay.

Signed-off-by: Andre Przywara 
Cc: Nikolay Nikolaev 
---
Hi,

this is a new version of PATCH v3 11/11, which does not use kvm_run
in the write case anymore. That should fix the problem that Marc and
Christoffer mentioned. The trick is to not use a structure at all,
but instead just pass on the needed values as parameters. By folding
handle_kernel_mmio() into io_mem_abort() we save some code and avoid
passing too many parameters.
As this is the last patch and the only one changed, I just send out
this one and rely on v3 for the other patches. If a complete respin
on the list is preferred, let me know.

Cheers,
Andre.

 arch/arm/include/asm/kvm_mmio.h   |   22 -
 arch/arm/kvm/mmio.c   |   64 ++---
 arch/arm64/include/asm/kvm_mmio.h |   22 -
 include/kvm/arm_vgic.h|6 ---
 virt/kvm/arm/vgic-v2-emul.c   |   21 +
 virt/kvm/arm/vgic-v3-emul.c   |   35 --
 virt/kvm/arm/vgic.c   |   93 -
 virt/kvm/arm/vgic.h   |   13 --
 8 files changed, 55 insertions(+), 221 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
index 3f83db2..d8e90c8 100644
--- a/arch/arm/include/asm/kvm_mmio.h
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -28,28 +28,6 @@ struct kvm_decode {
bool sign_extend;
 };
 
-/*
- * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
- * which is an anonymous type. Use our own type instead.
- */
-struct kvm_exit_mmio {
-   phys_addr_t phys_addr;
-   u8  data[8];
-   u32 len;
-   boolis_write;
-   void*private;
-};
-
-static inline void kvm_prepare_mmio(struct kvm_run *run,
-   struct kvm_exit_mmio *mmio)
-{
-   run->mmio.phys_addr = mmio->phys_addr;
-   run->mmio.len   = mmio->len;
-   run->mmio.is_write  = mmio->is_write;
-   memcpy(run->mmio.data, mmio->data, mmio->len);
-   run->exit_reason= KVM_EXIT_MMIO;
-}
-
 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 phys_addr_t fault_ipa);
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 5d3bfc0..974b1c6 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -121,12 +121,11 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return 0;
 }
 
-static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
- struct kvm_exit_mmio *mmio)
+static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len)
 {
unsigned long rt;
-   int len;
-   bool is_write, sign_extend;
+   int access_size;
+   bool sign_extend;
 
if (kvm_vcpu_dabt_isextabt(vcpu)) {
/* cache operation on I/O addr, tell guest unsupported */
@@ -140,17 +139,15 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t 
fault_ipa,
return 1;
}
 
-   len = kvm_vcpu_dabt_get_as(vcpu);
-   if (unlikely(len < 0))
-   return len;
+   access_size = kvm_vcpu_dabt_get_as(vcpu);
+   if (unlikely(access_size < 0))
+   return access_size;
 
-   is_write = kvm_vcpu_dabt_iswrite(vcpu);
+   *is_write = kvm_vcpu_dabt_iswrite(vcpu);
sign_extend = kvm_vcpu_dabt_issext(vcpu);
rt = kvm_vcpu_dabt_get_rd(vcpu);
 
-   mmio->is_write = is_write;
-   mmio->phys_addr = fault_ipa;
-   mmio->len = len;
+   *len = access_size;
vcpu->arch.mmio_decode.sign_extend = sign_extend;
vcpu->arch.mmio_decode.rt = rt;
 
@@ -165,20 +162,20 @@ static int decode_hsr(struct kvm_vcpu *vcpu, phys_addr_t 
fault_ipa,
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 phys_addr_t fault_ipa)
 {
-   struct kvm_exit_mmio mmio;
unsigne

[PATCH] KVM: arm/arm64: avoid using kvm_run for in-kernel emulation

2015-03-27 Thread Andre Przywara
Our in-kernel VGIC emulation still uses struct kvm_run briefly before
writing back the emulation result into the guest register. Using a
userspace mapped data structure within the kernel sounds dodgy, also
we do some extra copying at the moment at the end of the VGIC
emulation code.
Replace the usage of struct kvm_run in favour of passing separate
parameters into kvm_handle_mmio_return (and rename the function on
the way) to optimise the VGIC emulation. The real userland MMIO code
path does not change much.

Signed-off-by: Andre Przywara 
---
Hi,

this is an optimization of the VGIC code totally removing struct
kvm_run from the VGIC emulation. In my eyes it provides a nice
cleanup and is a logical consequence of the kvm_io_bus patches (on
which it goes on top). On the other hand it is optional and I didn't
want to merge it with the already quite large last patch 11.
Marc, I leave it up to you whether you take this as part of the
kvm_io_bus series or not.

Cheers,
Andre.

 arch/arm/include/asm/kvm_mmio.h   |3 +-
 arch/arm/kvm/arm.c|6 ++--
 arch/arm/kvm/mmio.c   |   55 ++---
 arch/arm64/include/asm/kvm_mmio.h |3 +-
 virt/kvm/arm/vgic.c   |8 ++
 5 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmio.h b/arch/arm/include/asm/kvm_mmio.h
index d8e90c8..53461a6 100644
--- a/arch/arm/include/asm/kvm_mmio.h
+++ b/arch/arm/include/asm/kvm_mmio.h
@@ -28,7 +28,8 @@ struct kvm_decode {
bool sign_extend;
 };
 
-int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_writeback_mmio_data(struct kvm_vcpu *vcpu, unsigned int len,
+   void *val, gpa_t phys_addr);
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 phys_addr_t fault_ipa);
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e98370c..b837aef 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -506,8 +506,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
if (ret)
return ret;
 
-   if (run->exit_reason == KVM_EXIT_MMIO) {
-   ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+   if (run->exit_reason == KVM_EXIT_MMIO && !run->mmio.is_write) {
+   ret = kvm_writeback_mmio_data(vcpu, run->mmio.len,
+ run->mmio.data,
+ run->mmio.phys_addr);
if (ret)
return ret;
}
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 974b1c6..3c57f96 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -86,38 +86,36 @@ static unsigned long mmio_read_buf(char *buf, unsigned int 
len)
 }
 
 /**
- * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation
- * @vcpu: The VCPU pointer
- * @run:  The VCPU run struct containing the mmio data
+ * kvm_writeback_mmio_data -- Handle MMIO loads after user space emulation
+ * @vcpu:  The VCPU pointer
+ * @len:   The length in Bytes of the MMIO access
+ * @data_ptr:  Pointer to the data to be written back into the guest
+ * @phys_addr: Physical address of the originating MMIO access
  *
  * This should only be called after returning from userspace for MMIO load
- * emulation.
+ * emulation. phys_addr is only used for the tracepoint output.
  */
-int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_writeback_mmio_data(struct kvm_vcpu *vcpu, unsigned int len,
+   void *data_ptr, gpa_t phys_addr)
 {
unsigned long data;
-   unsigned int len;
int mask;
 
-   if (!run->mmio.is_write) {
-   len = run->mmio.len;
-   if (len > sizeof(unsigned long))
-   return -EINVAL;
+   if (len > sizeof(unsigned long))
+   return -EINVAL;
 
-   data = mmio_read_buf(run->mmio.data, len);
+   data = mmio_read_buf(data_ptr, len);
 
-   if (vcpu->arch.mmio_decode.sign_extend &&
-   len < sizeof(unsigned long)) {
-   mask = 1U << ((len * 8) - 1);
-   data = (data ^ mask) - mask;
-   }
-
-   trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
-  data);
-   data = vcpu_data_host_to_guest(vcpu, data, len);
-   *vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
+   if (vcpu->arch.mmio_decode.sign_extend &&
+   len < sizeof(unsigned long)) {
+   mask = 1U << ((len * 8) - 1);
+   data = (data ^ mask) - mask;
}
 
+   trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, phys_addr, data);
+   data = vcpu_data_host_to_guest(vcpu, data, len);
+   *vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
+
return 0;
 }
 
@@ -201,18 +199,19 @@ int io_mem_abort(struc

[PATCH 02/12] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-27 Thread Paul Mackerras
This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spend in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  21 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  13 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/Kconfig|  14 +++
 arch/powerpc/kvm/book3s_hv.c| 150 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 141 +-
 7 files changed, 346 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..d2068bb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -657,6 +665,19 @@ struct kvm_vcpu_arch {
 
u32 emul_inst;
 #endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
+#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
 #define VCPU_FPR(vcpu, i)  (vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 4717859..3fea721 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -459,6 +459,19 @@ int main(void)
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 #endif
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, 
arch.cur_tb_start));
+   DEFINE(TAS_SEQCOUNT, offsetof(struct k

[PATCH 07/12] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-27 Thread Paul Mackerras
When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9a2ad8f..f3fef6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2128,10 +2129,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
bl  kvmhv_accumulate_time
 #endif
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2143,7 +2147,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   orisr5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/12] KVM: PPC: Book3S HV: Minor cleanups

2015-03-27 Thread Paul Mackerras
* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2f339ff..3eecd88 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 3fea721..92ec3fc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -505,7 +505,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f05ae0c..29190af 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -245,9 +245,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
 10:ld  r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1074,7 +1072,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1082,26 +1081,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+ext_interrupt_to_host
+   bne+guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1762,8 +1756,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2023,10 +2019,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-   mr  r4,r9
-   b   fast_guest_return
-
 _GLOBAL(kvmppc_h_set_xdabr)
andi.   r

[PATCH 08/12] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-27 Thread Paul Mackerras
This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f3fef6c..1c5d052 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1492,10 +1501,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = 
paca */
cmpwi   r3,0x100/* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2123,6 +2132,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftbr5
+   cmpwr3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subfr3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
addir3, r4, VCPU_TB_CEDE
@@ -2180,6 +2210,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftbr7
+   subfr3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 00/12] Remaining improvements for HV KVM

2015-03-27 Thread Paul Mackerras
This is the rest of my current patch queue for HV KVM on PPC.  This
series is based on Alex Graf's kvm-ppc-queue branch.  The only change
from the previous version of this series is that patch 2 has been
updated to take account of the timebase offset.

The last patch in this series needs a definition of PPC_MSGCLR that is
added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
handling", which has now gone upstream into Linus' tree as commit
755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
to handle that?  You could pull in the master branch of the kvm tree,
which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
let the subsequent merge fix it up.

I would like to see these patches go into 4.1.

Paul.

 arch/powerpc/include/asm/kvm_book3s_64.h |   4 +
 arch/powerpc/include/asm/kvm_host.h  |  44 ++-
 arch/powerpc/include/asm/time.h  |   3 +
 arch/powerpc/kernel/asm-offsets.c|  20 +-
 arch/powerpc/kernel/time.c   |   6 +
 arch/powerpc/kvm/Kconfig |  14 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 ++
 arch/powerpc/kvm/book3s_hv.c | 413 
 arch/powerpc/kvm/book3s_hv_builtin.c |  85 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 444 +--
 virt/kvm/kvm_main.c  |   1 +
 12 files changed, 909 insertions(+), 273 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/12] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-27 Thread Paul Mackerras
We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 83c4425..1517faa 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 92ec3fc..8aa8246 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -563,7 +563,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5a1abf6..6741505 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu)
tpaca = &paca[cpu];
 
/* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
cpu = vc->pcpu + vcpu->arch.ptid;
tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.kvm_vcore = vc;
tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
xics_wake_cpu(cpu);
-   if (vcpu->arch.ptid)
-   ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-   if (++i >= 100) {
-   pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-  vc->nap_count, vc->n_woken);
-   break;
+   for (loops = 0; loops < 100; ++loops) {
+   /*
+* Check if all threads are finished.
+* We set the vcpu pointer when starting a thread
+* and the thread clears it when finished, so we look
+* for any threads that still have a non-NULL vcpu ptr.
+*/
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   break;
+   if (i == threads_per_subcore) {
+   HMT_medium();
+   return;
}
-   cpu_relax();
+   HMT_low();
}
HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1942,8 +1952,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->n_woken = 0;
-   vc->nap_count = 0;
vc->entry_exit_count = 0;
vc-

[PATCH 01/12] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-27 Thread Paul Mackerras
This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm, where  is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third entry contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address.  (The format of the
second through fourth fields are described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct 
kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm  *kvm;
+   struct mutexmutex;
+   unsigned long   hpt_index;
+   int chars_left;
+   int buf_index;
+   charbuf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   continue;
+
+   /* lock the HPTE so it's stable and read it */
+   p

[PATCH 05/12] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu

2015-03-27 Thread Paul Mackerras
Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c| 92 +
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3eecd88..83c4425 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_RUNNING  2
-#define VCORE_EXITING  3
+#define VCORE_PREEMPT  2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1a6ea6e..5a1abf6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+   u64 now;
+   long ret;
+   struct kvm_vcpu *vcpu, *vnext;
+
+   now = get_tb();
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   /* cancel pending dec exception if dec is positive */
+   if (now < vcpu->arch.dec_expires &&
+   kvmppc_core_pending_dec(vcpu))
+   kvmppc_core_dequeue_dec(vcpu);
+
+   trace_kvm_guest_exit(vcpu);
+
+   ret = RESUME_GUEST;
+   if (vcpu->arch.trap)
+   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+   vcpu->arch.run_task);
+
+   vcpu->arch.ret = ret;
+   vcpu->arch.trap = 0;
+
+   if (vcpu->arch.ceded) {
+   if (!is_kvmppc_resume_guest(ret))
+   kvmppc_end_cede(vcpu);
+   else
+   kvmppc_set_timer(vcpu);
+   }
+   if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu, *vnext;
-   long ret;
-   u64 now;
+   struct kvm_vcpu *vcpu;
int i;
int srcu_idx;
 
@@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) 
{
vcpu->arch.ret = -EBUSY;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
goto out;
}
 
@@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvm_guest_exit();
 
preempt_enable();
-   cond_resched();
 
spin_lock(&vc->lock);
-   now = get_tb();
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   /* cancel pending dec exception if dec is positive */
-   if (now < vcpu->arch.dec_expires &&
-   kvmppc_core_pending_dec(vcpu))
-   kvmppc_core_dequeue_dec(vcpu);
-
-   trace_kvm_guest_exit(vcpu);
-
-   ret = RESUME_GUEST;
-   if (vcpu->arch.trap)
-   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-   vcpu->arch.run_task);
-
-   vcpu->arch.ret = ret;
-   vcpu->arch.trap = 0;
-
-   if (vcpu->arch.ceded) {
-   if (!is_kvmppc_resume_g

[PATCH 12/12] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-27 Thread Paul Mackerras
This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry: 3387.6ns (228 - 86600, 1008969 samples)
  rm_exit: 4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr: 1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry: 3060.1ns (212 - 65138, 953873 samples)
  rm_exit: 4244.1ns (12 - 9693408, 954331 samples)
  rm_intr: 1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras 
---
Note that this patch depends on the patch "powerpc/powernv: Fixes for
hypervisor doorbell handling", which is now upstream in Linus' tree as
commit 755563bc79c7, for the definition of PPC_MSGCLR().

 arch/powerpc/kernel/asm-offsets.c   |  3 ++
 arch/powerpc/kvm/book3s_hv.c| 51 ++---
 arch/powerpc/kvm/book3s_hv_builtin.c| 16 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 --
 4 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0d07efb..0034b6b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -759,5 +760,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1426459..bb29e75 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   preempt_disable();
+   if (cpu_first_thread_sibling(cpu) ==
+   cpu_first_thread_sibling(smp_processor_id())) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu_thread_in_core(cpu);
+   smp_mb();
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   preempt_enable();
+   return true;
+   }
+   preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+   xics_wake_cpu(cpu);
+   return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
int cpu = vcpu->cpu;
wait_queue_head_t *wqp;
 
@@ -96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+   return;
 
/* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-   int real_cpu = cpu + vcpu->arch.ptid;
-   if (paca[real_cpu].kvm_hstate.xics_phys)
-   xics_wake_cpu(real_cpu);
-   else
-#endif
-   if (cpu_online(cpu))
-   smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+   smp_send_reschedule(cpu);
 }
 
 /*
@@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
if (cpu != smp_processor_id())
-   xics_wake_cpu(cpu);
-#endif

[PATCH 03/12] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-03-27 Thread Paul Mackerras
Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d2068bb..2f339ff 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c7b18ac..1a6ea6e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct 
kvmppc_vcore *vc)
mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   if (signal_pending(vcpu->arch.run_task))
+   vcpu->arch.ret = -EINTR;
+   else if (vcpu->arch.vpa.update_pending ||
+vcpu->arch.slb_shadow.update_pending ||
+vcpu->arch.dtl.update_pending)
+   vcpu->arch.ret = RESUME_GUEST;
+   else
+   continue;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
struct kvm_vcpu *vcpu, *vnext;
long ret;
u64 now;
-   int i, need_vpa_update;
+   int i;
int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   if (signal_pending(vcpu->arch.run_task))
-   return;
-   if (vcpu->arch.vpa.update_pending ||
-   vcpu->arch.slb_shadow.update_pending ||
-   vcpu->arch.dtl.update_pending)
-   vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+* Remove from the list any threads that have a signal pending
+* or need a VPA update done
+*/
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+   return;
 
/*
-* Initialize *vc, in particular vc->vcore_state, so we can
-* drop the vcore lock if necessary.
+* Initialize *vc.
 */
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
vc->conferring_threads = 0;
 
/*
-* Updating any of the vpas requires calling kvmppc_pin_guest_page,
-* which can't be called with any spinlocks held.
-*/
-   if (need_vpa_update) {
-   spin_unlock(&vc->lock);
-   for (i = 0; i < need_vpa_update; ++i)
-   kvmppc_update_vpas(vcpus_to_update[i]);
-   spin_lock(&vc->lock);
-   }
-
-   /*
 * Make sure we are running on primary threads, and that secondary
 * threads are offline.  Also check if 

[PATCH 10/12] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-03-27 Thread Paul Mackerras
On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real
mode guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++-
 1 file changed, 126 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 063c235..1de596f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -244,7 +257,15 @@ kvm_novcpu_wakeup:
b   kvmppc_got_guest
 
 kvm_novcpu_exit:
-   b   hdec_soon
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   cmpdi   r4, 0
+   beq 13f
+   addir3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+#endif
+13:bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -422,7 +443,7 @@ kvmppc_hv_entry:
/* Primary thread switches to guest partition. */
ld  r9,VCORE_KVM(r5)/* pointer to struct kvm */
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD/* switch to reserved LPID */
@@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:lbz r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   lbz r6, HSTATE_PTID(r13

[PATCH 11/12] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-27 Thread Paul Mackerras
This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 66 
 4 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..c42aa55 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER18
 
@@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* Poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..00e45b6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -26,12 +26,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0, we adjust it */
cpu += vcpu->arch.ptid;
 
-   /* Not too hard, then poke the target */
-   xics_phys = paca[cpu].kvm_hstate.xics_phys;
-   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+   smp_mb();
+   kvmhv_rm_send_ipi(cpu);
 }
 
 static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 1de596f..6c6d030 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhan

[PATCH 09/12] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-27 Thread Paul Mackerras
Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++--
 5 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1517faa..d67a838 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8aa8246..0d07efb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -562,7 +562,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6741505..1426459 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight32(vc->conferring_threads);
-   if (threads_ceded + threads_conferring >= threads_running) {
+