[PULL] virtio-rng: add derating factor for use by hwrng core

2014-08-14 Thread Amit Shah
Hi Linus,

Sending directly to you with the commit log changes Ted Ts'o pointed
out.  Not sure if Rusty's back after his travel, but this already has
his s-o-b.

Please pull.

The following changes since commit c9d26423e56ce1ab4d786f92aebecf859d419293:

  Merge tag 'pm+acpi-3.17-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2014-08-14 
18:13:46 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio.git rng-queue

for you to fetch changes up to 34679ec7a0c45da8161507e1f2e1f72749dfd85c:

  virtio: rng: add derating factor for use by hwrng core (2014-08-15 10:26:01 
+0530)



Amit Shah (1):
  virtio: rng: add derating factor for use by hwrng core

 drivers/char/hw_random/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-14 Thread Amit Shah
The khwrngd thread is started when a hwrng device of sufficient
quality is registered.  The virtio-rng device is backed by the
hypervisor, and we trust the hypervisor to provide real entropy.

A malicious or badly-implemented hypervisor is a scenario that's
irrelevant -- such a setup is bound to cause all sorts of badness, and a
compromised hwrng is the least of the user's worries.

Given this, we might as well assume that the quality of randomness we
receive is perfectly trustworthy.  Hence, we use 100% for the factor,
indicating maximum confidence in the source.

Signed-off-by: Amit Shah 
Reviewed-by: H. Peter Anvin 
Reviewed-by: Amos Kong 
Signed-off-by: Rusty Russell 

---
Pretty small and contained patch; would be great if it is picked up for
3.17.

v2: re-word commit msg (hpa)
v3: re-word commit msg (tytso)
---
 drivers/char/hw_random/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 0027137..2e3139e 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -116,6 +116,7 @@ static int probe_common(struct virtio_device *vdev)
.cleanup = virtio_cleanup,
.priv = (unsigned long)vi,
.name = vi->name,
+   .quality = 1000,
};
vdev->priv = vi;
 
-- 
1.9.3



[PATCH net-next] vhost_net: stop rx net polling when possible

2014-08-14 Thread Jason Wang
After the rx vq is enabled, we never stop polling its socket. This is
suboptimal, since it may lead to unnecessary wake-ups after the rx net work has
already been queued. This can be optimized by stopping polling of the rx net
sock while processing both rx and tx, and restarting it afterwards. This saves
unnecessary wake-ups, and even unnecessary spin lock acquisitions with the help
of commit 9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 ("net-tun: restructure
tun_do_read for better sleep/wakeup efficiency").

Tests show significant CPU% savings in almost all cases:

Guest rx stream:
size(B)/sessions/throughput/cpu/normalized thru/
64/1/+0.7773%   -8.6224% +10.2866%
64/2/+0.6335%   -13.9109%+16.8946%
64/4/-0.8182%   -14.8336%+16.4565%
64/8/+0.4830%   -13.7675%+16.5256%
256/1/-7.0963%  -12.6880%+6.4043%
256/2/-1.3982%  -11.5424%+11.4678%
256/4/-0.0350%  -11.8323%+13.3806%
256/8/-1.5830%  -12.7693%+12.8238%
1024/1/-7.4895% -19.1449%   +14.4152%
1024/2/-7.4575% -19.4018%   +14.8195%
1024/4/-0.3881% -9.1183%+9.6061%
1024/8/+0.4713% -11.0155%   +12.9087%
4096/1/+0.8786%  -8.4050%+10.1355%
4096/2/+0.0098%  -15.3094%   +18.0885%
4096/4/+0.0445%  -10.8247%   +12.1886%
4096/8/-2.1317%  -12.5111%   +11.8637%
16384/1/-0.0008% -6.1891%+6.5966%
16384/2/-0.0117% -16.2716%   +19.4198%
16384/4/+0.0001% -5.9197%+6.2923%
16384/8/+0.0173% -7.6681%+8.3236%
65535/1/+0.0011% -10.3594%   +11.5578%
65535/2/-0.4108%  -14.4304%   +16.3838%
65535/4/+0.0011%  -10.3594%   +11.5578%
65535/8/-0.4108%  -14.4304%   +16.3838%

Guest tx stream:
size(B)/sessions/throughput/cpu/normalized thru/
64/1/-0.6228% -2.1936% +1.6060%
64/2/+0.8646% -3.5063% +4.5297%
64/4/+0.8733% -3.2495% +4.2613%
64/8/+1.4290% -3.5593% +5.1724%
256/1/+7.2098%-3.1122% +10.6535%
256/2/-10.1408%   -6.8230% -3.5607%
256/4/-11.3531%   -6.7085% -4.9785%
256/8/-10.2723%   -6.5628% -3.9701%
1024/1/-18.9329%  -13.6162%-6.1547%
1024/2/-0.3728%   -1.3181% +0.9580%
1024/4/+0.0125%   -3.6338% +3.7838%
1024/8/-0.0030%   -2.7282% +2.8017%
4096/1/+16.9367%  -1.9435% +19.2543%
4096/2/+0.0121%   -6.1682% +6.5866%
4096/4/+0.0019%   -3.8510% +4.0072%
4096/8/-0.0222%   -4.1368% +4.2922%
16384/1/-0.0026%  -8.6892% +9.5132%
16384/2/-0.0012%  -10.1676%+11.3171%
16384/4/+0.0196%  -1.2551% +1.2908%
16384/8/+0.1303%  -3.2634% +3.5082%
65535/1/+0.0019%  -3.4694% +3.5961%
65535/2/-0.0003%  -0.7635% +0.7690%
65535/4/-0.0219%  -2.7875% +2.8448%
65535/8/+0.1137%  -2.7922% +2.9894%

TCP_RR:
size(B)/sessions/throughput/cpu/normalized thru/
256/1/+1.9004%-4.7985% +7.0366%
256/25/-4.7366%   -11.0809%+7.1349%
256/50/+3.9808%   -5.2037% +9.6887%
4096/1/+2.1619%   -0.7303% +2.9134%
4096/25/-13.1836% -14.7298%+1.8134%
4096/50/-11.1990% -15.4763%+5.0605%

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8dae2f7..d4a9742 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -334,6 +334,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+   struct vhost_virtqueue *rx_vq = &net->vqs[VHOST_NET_VQ_RX].vq;
+   struct vhost_poll *rx_poll = &net->poll[VHOST_NET_VQ_RX];
struct vhost_virtqueue *vq = &nvq->vq;
unsigned out, in, s;
int head;
@@ -348,15 +350,18 @@ static void handle_tx(struct vhost_net *net)
size_t len, total_len = 0;
int err;
size_t hdr_size;
-   struct socket *sock;
+   struct socket *sock, *rxsock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
-   bool zcopy, zcopy_used;
+   bool zcopy, zcopy_used, poll = false;
 
mutex_lock(&vq->mutex);
+   mutex_lock(&rx_vq->mutex);
sock = vq->private_data;
+   rxsock = rx_vq->private_data;
if (!sock)
goto out;
 
+   vhost_poll_stop(rx_poll);
vhost_disable_notify(&net->dev, vq);
 
hdr_size = nvq->vhost_hlen;
@@ -451,11 +456,17 @@ static void handle_tx(struct vhost_net *net)
total_len += len;
vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
-   vhost_poll_queue(&vq->poll);
+   poll = true;
break;
}
}
+
+   if (rxsock)
+   vhost_poll_start(rx_poll, rxsock->file);
+   if (poll)
+   vhost_poll_queue(&vq->poll);
 out:
+   mutex_unlock(&rx_vq->mutex);
mutex_unlock(&vq->mutex);
 }
 
@@ -554,6 +565,7 @@ err:
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_net_v

Re: [Qemu-devel] The status about vhost-net on kvm-arm?

2014-08-14 Thread Li Liu
Hi Ying-Shiuan Pan,

I don't know why your mail went missing from my mailbox. Sorry about that.
The results of vhost-net performance have been attached in another mail.

Do you have a plan to refresh your patchset to support irqfd? If not,
we will try to finish it based on yours.

On 2014/8/14 11:50, Li Liu wrote:
> 
> 
> On 2014/8/13 19:25, Nikolay Nikolaev wrote:
>> On Wed, Aug 13, 2014 at 12:10 PM, Nikolay Nikolaev
>>  wrote:
>>> On Tue, Aug 12, 2014 at 6:47 PM, Nikolay Nikolaev
>>>  wrote:

 Hello,


 On Tue, Aug 12, 2014 at 5:41 AM, Li Liu  wrote:
>
> Hi all,
>
> Can anyone tell me the current status of vhost-net on kvm-arm?
>
> Half a year has passed since Isa Ansharullah asked this question:
> http://www.spinics.net/lists/kvm-arm/msg08152.html
>
> I have found two patches which have provided the kvm-arm support of
> eventfd and irqfd:
>
> 1) [RFC PATCH 0/4] ARM: KVM: Enable the ioeventfd capability of KVM on ARM
> http://lists.gnu.org/archive/html/qemu-devel/2014-01/msg01770.html
>
> 2) [RFC,v3] ARM: KVM: add irqfd and irq routing support
> https://patches.linaro.org/32261/
>
> And there's a rough patch for qemu to support eventfd from Ying-Shiuan 
> Pan:
>
> [Qemu-devel] [PATCH 0/4] ioeventfd support for virtio-mmio
> https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html
>
> But there aren't any comments on this patch, and I can find nothing about
> qemu supporting irqfd. Did I lose track?
>
> If nobody is trying to fix it, we have a plan to complete virtio-mmio
> support for irqfd and multiqueue.
>
>

 we at Virtual Open Systems did some work and tested vhost-net on ARM
 back in March.
 The setup was based on:
  - host kernel with our ioeventfd patches:
 http://www.spinics.net/lists/kvm-arm/msg08413.html

 - qemu with the aforementioned patches from Ying-Shiuan Pan
 https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

 The testbed was ARM Chromebook with Exynos 5250, using a 1Gbps USB3
 Ethernet adapter connected to a 1Gbps switch. I can't find the actual
 numbers but I remember that with multiple streams the gain was clearly
 seen. Note that it used the minimum required ioeventfd implementation
 and not irqfd.

 I guess it is feasible to think that it can all be put together and
 rebased on top of the recent irqfd work. One could achieve even better
 performance (because of the irqfd).

>>>
>>> Managed to replicate the setup with the old versions we used in March:
>>>
>>> Single stream from another machine to chromebook with 1Gbps USB3
>>> Ethernet adapter.
>>> iperf -c  -P 1 -i 1 -p 5001 -f k -t 10
>>> to HOST: 858316 Kbits/sec
>>> to GUEST: 761563 Kbits/sec
>> to GUEST vhost=off: 508150 Kbits/sec
>>>
>>> 10 parallel streams
>>> iperf -c  -P 10 -i 1 -p 5001 -f k -t 10
>>> to HOST: 842420 Kbits/sec
>>> to GUEST: 625144 Kbits/sec
>> to GUEST vhost=off: 425276 Kbits/sec
> 
> I have tested the same cases on a Hisilicon board (Cortex-A15@1G)
> with Integrated 1Gbps Ethernet adapter.
> 
> iperf -c  -P 1 -i 1 -p 5001 -f M -t 10
> to HOST: 906 Mbits/sec
> to GUEST: 562 Mbits/sec
> to GUEST vhost=off: 340 Mbits/sec
> 
> With 10 parallel streams, performance improves by a further <10%:
> iperf -c  -P 10 -i 1 -p 5001 -f M -t 10
> to HOST: 923 Mbits/sec
> to GUEST: 592 Mbits/sec
> to GUEST vhost=off: 364 Mbits/sec
> 
> It's easy to see that vhost-net brings great performance improvements,
> almost 50%+.
> 
> Li.
> 


 regards,
 Nikolay Nikolaev
 Virtual Open Systems



Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-14 Thread Alexey Kardashevskiy
On 08/14/2014 11:40 PM, Alexander Graf wrote:
> 
> On 14.08.14 07:13, Aneesh Kumar K.V wrote:
>> Alexey Kardashevskiy  writes:
>>
>>> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
>>> functional change but this is not true as it calls get_order() (which
>>> takes bytes) where it should have called ilog2() and the kernel stops
>>> on VM_BUG_ON().
>>>
>>> This replaces get_order() with order_base_2() (round-up version of ilog2).
>>>
>>> Suggested-by: Paul Mackerras 
>>> Cc: Alexander Graf 
>>> Cc: Aneesh Kumar K.V 
>>> Cc: Joonsoo Kim 
>>> Cc: Benjamin Herrenschmidt 
>>> Signed-off-by: Alexey Kardashevskiy 
>> Reviewed-by: Aneesh Kumar K.V 
> 
> So this affects 3.17?

Yes.


-- 
Alexey


Re: Query: Is it possible to lose interrupts between vhost and virtio_net during migration?

2014-08-14 Thread Jason Wang
On 08/14/2014 06:02 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 14, 2014 at 04:52:40PM +0800, Jason Wang wrote:
>> On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
>>> On 2014/8/5 20:14, Zhangjie (HZ) wrote:
 On 2014/8/5 17:49, Michael S. Tsirkin wrote:
> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>> Jason is right, the new order is not the cause of the network being
>> unreachable. Changing the order seems not to work. After about 40 times,
>> the problem occurs again.
>> Maybe there are other hidden reasons for that.
 I modified the code to change the order myself yesterday.
 That result is from my code.
> To make sure, you tested the patch that I posted to list:
> "vhost_net: stop guest notifiers after backend"?
>
> Please confirm.
>
 OK, I will test with your patch "vhost_net: stop guest notifiers after 
 backend".

>>> Unfortunately, after using the patch "vhost_net: stop guest notifiers after 
>>> backend",
>>> Linux VMs stopped themselves a few minutes after they were started.
 @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
 *ncs,
 goto err;
 }

 +r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
 +if (r < 0) {
 +error_report("Error binding guest notifier: %d", -r);
 +goto err;
 +}
 +
 for (i = 0; i < total_queues; i++) {
 r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);

 @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
 *ncs,
 }
 }

 -r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
 -if (r < 0) {
 -error_report("Error binding guest notifier: %d", -r);
 -goto err;
 -}
 -
 return 0;
>>> I wonder if k->set_guest_notifiers should be called after "hdev->started = 
>>> true;" in vhost_dev_start.
>> Michael, can we just remove those assertions? Since you may want to set
>> guest notifiers before starting the backend.
> Which assertions?

I mean the assert(hdev->started) assertions in vhost.c. Your patch may hit them.

>> Another question about virtio_pci_vector_poll(): why not use
>> msix_notify() instead of msix_set_pending()?
> We can do that but the effect will be same since we know
> vector is masked.

Perhaps not during the current vhost start sequence. We now start the backend
before setting guest notifiers, so the backend is using the masked notifier at
that time even though the vector was not masked.

>
>> If so, there's no need to
>> change the vhost_net_start() ?
> Confused, don't see the connection.

If we use msix_notify(), it will raise the irq if the backend wants it before
the guest notifiers are set. So there's no need to care about the order of
setting guest notifiers and starting the backend in vhost_net_start().
>
>> Zhang Jie, is this a regression? If yes, could you please do a bisection
>> to find the first bad commit.
>>
>> Thanks
> Pretty sure it's the mq patch: a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>
> Since we may have many vhost/net devices for a virtio-net device, the
> setting of guest notifiers was moved out of the starting/stopping of a
> specific vhost thread. vhost_net_{start|stop}() were renamed to
> vhost_net_{start|stop}_one(), and a new vhost_net_{start|stop}() was
> introduced to configure the guest notifiers and start/stop all
> vhost/vhost_net devices.
>

Ok.


Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support

2014-08-14 Thread Mario Smarduch
On 08/13/2014 06:20 PM, Mario Smarduch wrote:
> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
>> On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
>>> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
 On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
>>
>> [...]
>>
>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, 
>>> gpa_t gpa, void *data)
>>>  {
>>> pte_t *pte = (pte_t *)data;
>>>  
>>> -   stage2_set_pte(kvm, NULL, gpa, pte, false);
>>> +   stage2_set_pte(kvm, NULL, gpa, pte, false, false);
>>
>> why is logging never active if we are called from MMU notifiers?
>
[...]

>> The comment is because when you look at this function it is not obvious
>> why we pass logging_active=false, despite logging may actually be
>> active.  This could suggest that the parameter to stage2_set_pte()
>> should be named differently (break_huge_pmds) or something like that,
>> but we can also be satisfied with the comment.
> 
> Ok I see, I was thinking you thought it was breaking something.
> Yeah, I'll add the comment; in reality this is another use case
> where a PMD may need to be converted to a page table, so it makes sense
> to contrast the use cases.
> 
>>
>>>
>>> Should I add comments on flag use in other places as well?
>>>
>>
>> It's always a judgement call.  I didn't find it necessarry to put a
>> comment elsewhere because I think it's pretty obivous that we would
>> never care about logging writes to device regions.
>>
>> However, this made me think, are we making sure that we are not marking
>> device mappings as read-only in the wp_range functions?  I think it's
> 
> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
> these regions wind up in KVMState->KVMSlot[], when
> memory_region_add_subregion() is called KVM listener installs it.
> For migration and dirty page logging QEMU walks the KVMSlot[] array.
> 
> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
> causes the memory region to be added to KVMState->KVMSlot[].
> In that case it's possible to walk KVMState->KVMSlot[], issue
> the ioctl, and come across a device mapping with normal memory and
> write-protect its s2ptes (VFIO sets unmigratable state though).
> 
> But I'm not sure what's there to stop someone from calling the ioctl and
> installing a region with device memory type. Most likely, though, if you
> installed that kind of region, migration would be disabled.
> 
> But for logging-only use, not checking the memory type could be an issue.

Clarifying the above a bit: KVM structures like kvm_run or the vgic don't go
through the KVM_SET_USER_MEMORY_REGION interface (I can't think of any
other KVM structures). VFIO uses KVM_SET_USER_MEMORY_REGION, and
user_mem_abort() should resolve the fault. I recall the VFIO patch
series adds that support.

It should be ok to write-protect MMIO regions installed through
KVM_SET_USER_MEMORY_REGION. That said, at this time I don't know
of a use case for logging without migration, so this may not be
an issue at all right now.

> 
>> quite bad if we mark the VCPU interface as read-only for example.
>>
>> -Christoffer
>>
> 



Re: [PATCH v3 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Laszlo Ersek
On 08/14/14 23:39, Alex Williamson wrote:
> The MTRR state in KVM currently runs completely independent of the
> QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
> target loses MTRR state from the source.  Generally that's ok though
> because KVM ignores it and maps everything as write-back anyway.  The
> exception to this rule is when we have an assigned device and an IOMMU
> that doesn't promote NoSnoop transactions from that device to be cache
> coherent.  In that case KVM trusts the guest mapping of memory as
> configured in the MTRR.
> 
> This patch updates kvm_get|put_msrs() so that we retrieve the actual
> vCPU MTRR settings and therefore keep CPUX86State synchronized for
> migration.  kvm_put_msrs() is also used on vCPU reset and therefore
> allows future modifications of MTRR state at reset to be realized.
> 
> Note that the entries array used by both functions was already
> slightly undersized for holding every possible MSR, so this patch
> increases it beyond the 28 new entries necessary for MTRR state.
> 
> Signed-off-by: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: qemu-sta...@nongnu.org
> ---
> 
>  target-i386/cpu.h |2 +
>  target-i386/kvm.c |  101 -
>  2 files changed, 101 insertions(+), 2 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index d37d857..3460b12 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -337,6 +337,8 @@
>  #define MSR_MTRRphysBase(reg)   (0x200 + 2 * (reg))
>  #define MSR_MTRRphysMask(reg)   (0x200 + 2 * (reg) + 1)
>  
> +#define MSR_MTRRphysIndex(addr) ((((addr) & ~1u) - 0x200) / 2)
> +
>  #define MSR_MTRRfix64K_00000    0x250
>  #define MSR_MTRRfix16K_80000    0x258
>  #define MSR_MTRRfix16K_A0000    0x259
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 097fe11..ddedc73 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -79,6 +79,7 @@ static int lm_capable_kernel;
>  static bool has_msr_hv_hypercall;
>  static bool has_msr_hv_vapic;
>  static bool has_msr_hv_tsc;
> +static bool has_msr_mtrr;
>  
>  static bool has_msr_architectural_pmu;
>  static uint32_t num_architectural_pmu_counters;
> @@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
>  }
>  
> +if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
> +has_msr_mtrr = true;
> +}
> +
>  return 0;
>  }
>  
> @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  CPUX86State *env = &cpu->env;
>  struct {
>  struct kvm_msrs info;
> -struct kvm_msr_entry entries[100];
> +struct kvm_msr_entry entries[150];
>  } msr_data;
>  struct kvm_msr_entry *msrs = msr_data.entries;
>  int n = 0, i;
> @@ -1278,6 +1283,37 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
>env->msr_hv_tsc);
>  }
> +if (has_msr_mtrr) {
> +kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix16K_A0000, env->mtrr_fixed[2]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_C0000, env->mtrr_fixed[3]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_C8000, env->mtrr_fixed[4]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_D0000, env->mtrr_fixed[5]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_D8000, env->mtrr_fixed[6]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_E0000, env->mtrr_fixed[7]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_E8000, env->mtrr_fixed[8]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
> +for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRphysBase(i), env->mtrr_var[i].base);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
> +}
> +}
>  
>  /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
>   *   kvm_put_msr_feature_control. */

[PATCH v3 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Alex Williamson
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
target loses MTRR state from the source.  Generally that's ok though
because KVM ignores it and maps everything as write-back anyway.  The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent.  In that case KVM trusts the guest mapping of memory as
configured in the MTRR.

This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration.  kvm_put_msrs() is also used on vCPU reset and therefore
allows future modifications of MTRR state at reset to be realized.

Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.

Signed-off-by: Alex Williamson 
Cc: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.h |2 +
 target-i386/kvm.c |  101 -
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index d37d857..3460b12 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -337,6 +337,8 @@
 #define MSR_MTRRphysBase(reg)   (0x200 + 2 * (reg))
 #define MSR_MTRRphysMask(reg)   (0x200 + 2 * (reg) + 1)
 
+#define MSR_MTRRphysIndex(addr) ((((addr) & ~1u) - 0x200) / 2)
+
 #define MSR_MTRRfix64K_00000    0x250
 #define MSR_MTRRfix16K_80000    0x258
 #define MSR_MTRRfix16K_A0000    0x259
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 097fe11..ddedc73 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -79,6 +79,7 @@ static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
 static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
+static bool has_msr_mtrr;
 
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
@@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
 }
 
+if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
+has_msr_mtrr = true;
+}
+
 return 0;
 }
 
@@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 CPUX86State *env = &cpu->env;
 struct {
 struct kvm_msrs info;
-struct kvm_msr_entry entries[100];
+struct kvm_msr_entry entries[150];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
 int n = 0, i;
@@ -1278,6 +1283,37 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
   env->msr_hv_tsc);
 }
+if (has_msr_mtrr) {
+kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix16K_A0000, env->mtrr_fixed[2]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_C0000, env->mtrr_fixed[3]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_C8000, env->mtrr_fixed[4]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_D0000, env->mtrr_fixed[5]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_D8000, env->mtrr_fixed[6]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_E0000, env->mtrr_fixed[7]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_E8000, env->mtrr_fixed[8]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
+for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRphysBase(i), env->mtrr_var[i].base);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
+}
+}
 
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
@@ -1484,7 +1520,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 struct {
 struct kvm_msrs info;
-struct kvm_msr_entry entries[100];
+struct kvm_msr_entry entries[150];
 } msr_data;
 struct kvm_msr_entry *msr

[PATCH v3 1/3] x86: Use common variable range MTRR counts

2014-08-14 Thread Alex Williamson
We currently define the number of variable range MTRR registers as 8
in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
(also 8) to report to guests the number available.  Change this to
use MSR_MTRRcap_VCNT consistently.

Signed-off-by: Alex Williamson 
Reviewed-by: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.h |2 +-
 target-i386/machine.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e634d83..d37d857 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -930,7 +930,7 @@ typedef struct CPUX86State {
 /* MTRRs */
 uint64_t mtrr_fixed[11];
 uint64_t mtrr_deftype;
-MTRRVar mtrr_var[8];
+MTRRVar mtrr_var[MSR_MTRRcap_VCNT];
 
 /* For KVM */
 uint32_t mp_state;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 16d2f6a..fb89065 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -677,7 +677,7 @@ VMStateDescription vmstate_x86_cpu = {
 /* MTRRs */
 VMSTATE_UINT64_ARRAY_V(env.mtrr_fixed, X86CPU, 11, 8),
 VMSTATE_UINT64_V(env.mtrr_deftype, X86CPU, 8),
-VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, 8, 8),
+VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, MSR_MTRRcap_VCNT, 8),
 /* KVM-related states */
 VMSTATE_INT32_V(env.interrupt_injected, X86CPU, 9),
 VMSTATE_UINT32_V(env.mp_state, X86CPU, 9),



[PATCH v3 3/3] x86: Clear MTRRs on vCPU reset

2014-08-14 Thread Alex Williamson
The SDM specifies (June 2014 Vol3 11.11.5):

On a hardware reset, the P6 and more recent processors clear the
valid flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
MTRRs are undefined.

We currently do none of that, so whatever MTRR settings you had prior
to reset is what you have after reset.  Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway.  However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset.  The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.

Signed-off-by: Alex Williamson 
Reviewed-by: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6d008ab..9768be1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2588,6 +2588,16 @@ static void x86_cpu_reset(CPUState *s)
 
 env->xcr0 = 1;
 
+/*
+ * SDM 11.11.5 requires:
+ *  - IA32_MTRR_DEF_TYPE MSR.E = 0
+ *  - IA32_MTRR_PHYSMASKn.V = 0
+ * All other bits are undefined.  For simplification, zero it all.
+ */
+env->mtrr_deftype = 0;
+memset(env->mtrr_var, 0, sizeof(env->mtrr_var));
+memset(env->mtrr_fixed, 0, sizeof(env->mtrr_fixed));
+
 #if !defined(CONFIG_USER_ONLY)
 /* We hard-wire the BSP to the first CPU. */
 if (s->cpu_index == 0) {



[PATCH v3 0/3] Sync MTRRs with KVM and disable on reset

2014-08-14 Thread Alex Williamson
v3:
 - Fix off-by-one identified by Laszlo in 2/3
 - Add R-b in 1 & 3

It turns out that not only do we not follow the SDM guidelines for
 resetting MTRR state on vCPU reset, but we really don't even attempt
to keep KVM MTRR state synchronized with QEMU, which affects not
only reset, but migration.  This series implements the get/put MSR
support for KVM, then goes on to properly re-initialize the state on
vCPU reset.  This resolves the problem described in the last patch
as well as some potential mismatches around migration.  The migration
state is unchanged, other than actually passing valid data.

Thanks to Laszlo for his help debugging this and for realizing how
terribly broken MTRR synchronization is.  Thanks,

Alex

---

Alex Williamson (3):
  x86: Clear MTRRs on vCPU reset
  x86: kvm: Add MTRR support for kvm_get|put_msrs()
  x86: Use common variable range MTRR counts


 target-i386/cpu.c |   10 +
 target-i386/cpu.h |4 +-
 target-i386/kvm.c |  101 -
 target-i386/machine.c |2 -
 4 files changed, 113 insertions(+), 4 deletions(-)


Re: [PATCH v2 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Alex Williamson
On Thu, 2014-08-14 at 23:20 +0200, Laszlo Ersek wrote:
> You're going to use my name in contexts that I won't wish to be privy
> to. :) I like everything about this patch except:
> 
> > +case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT):
> 
> ... the off-by-one in this case range. Everything is cool and the range
> conforms to
>  (ie. the
> range is inclusive), but the *argument* of the MSR_MTRRphysMask() macro
> is off-by-one. You should say
> 
> case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 1):
> 
> Peek up to the for loops: the greatest argument you ever pass to
> MSR_MTRRphysMask() is (MSR_MTRRcap_VCNT - 1).
> 
> Of course this causes no visible bug, because we don't use those
> register indices at all (and if we *did* use them, then we'd add new
> case labels for them, and then gcc would be required by the standard to
> complain about duplicated case labels [*]).

Nope, legitimate bug.  v3 on the way...



Re: [PATCH v2 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Laszlo Ersek
On 08/14/14 21:24, Alex Williamson wrote:
> The MTRR state in KVM currently runs completely independent of the
> QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
> target loses MTRR state from the source.  Generally that's ok though
> because KVM ignores it and maps everything as write-back anyway.  The
> exception to this rule is when we have an assigned device and an IOMMU
> that doesn't promote NoSnoop transactions from that device to be cache
> coherent.  In that case KVM trusts the guest mapping of memory as
> configured in the MTRR.
> 
> This patch updates kvm_get|put_msrs() so that we retrieve the actual
> vCPU MTRR settings and therefore keep CPUX86State synchronized for
> migration.  kvm_put_msrs() is also used on vCPU reset and therefore
> allows future modifications of MTRR state at reset to be realized.
> 
> Note that the entries array used by both functions was already
> slightly undersized for holding every possible MSR, so this patch
> increases it beyond the 28 new entries necessary for MTRR state.
> 
> Signed-off-by: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: qemu-sta...@nongnu.org
> ---
> 
>  target-i386/cpu.h |2 +
>  target-i386/kvm.c |  101 -
>  2 files changed, 101 insertions(+), 2 deletions(-)

Another (positive) remark I wanted to add: if we migrate from an
MTRR-capable KVM host that lacks these patches, to an MTRR-capable KVM
host that has these patches, then the migration stream will simply
contain zeros (because the patch-less source never fetched those from
the source-side KVM), so when we send those zeros to the target KVM, we
won't regress (because those zeroes should match the "initial KVM MTRR
state" that the target comes up in anyway).

If we migrate from patchful to patchless (ie. reverse direction), then
we lose MTRR state, which is the current status quo; not bad.

Thanks
Laszlo


Re: [PATCH v2 3/3] x86: Clear MTRRs on vCPU reset

2014-08-14 Thread Laszlo Ersek
On 08/14/14 21:24, Alex Williamson wrote:
> The SDM specifies (June 2014 Vol3 11.11.5):
> 
> On a hardware reset, the P6 and more recent processors clear the
> valid flags in variable-range MTRRs and clear the E flag in the
> IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
> MTRRs are undefined.
> 
> We currently do none of that, so whatever MTRR settings you had prior
> to reset are what you have after reset.  Usually this doesn't matter
> because KVM often ignores the guest mappings and uses write-back
> anyway.  However, if you have an assigned device and an IOMMU that
> allows NoSnoop for that device, KVM defers to the guest memory
> mappings which are now stale after reset.  The result is that OVMF
> rebooting on such a configuration takes a full minute to LZMA
> decompress the firmware volume, a process that is nearly instant on
> the initial boot.
> 
> Signed-off-by: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: qemu-sta...@nongnu.org
> ---
> 
>  target-i386/cpu.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 6d008ab..9768be1 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2588,6 +2588,16 @@ static void x86_cpu_reset(CPUState *s)
>  
>  env->xcr0 = 1;
>  
> +/*
> + * SDM 11.11.5 requires:
> + *  - IA32_MTRR_DEF_TYPE MSR.E = 0
> + *  - IA32_MTRR_PHYSMASKn.V = 0
> + * All other bits are undefined.  For simplification, zero it all.
> + */
> +env->mtrr_deftype = 0;
> +memset(env->mtrr_var, 0, sizeof(env->mtrr_var));
> +memset(env->mtrr_fixed, 0, sizeof(env->mtrr_fixed));
> +
>  #if !defined(CONFIG_USER_ONLY)
>  /* We hard-wire the BSP to the first CPU. */
>  if (s->cpu_index == 0) {
> 

I like this heavy-handed approach.

Reviewed-by: Laszlo Ersek 


Re: [PATCH v2 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Laszlo Ersek
You're going to use my name in contexts that I won't wish to be privy
to. :) I like everything about this patch except:

On 08/14/14 21:24, Alex Williamson wrote:
> The MTRR state in KVM currently runs completely independent of the
> QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
> target loses MTRR state from the source.  Generally that's ok though
> because KVM ignores it and maps everything as write-back anyway.  The
> exception to this rule is when we have an assigned device and an IOMMU
> that doesn't promote NoSnoop transactions from that device to be cache
> coherent.  In that case KVM trusts the guest mapping of memory as
> configured in the MTRR.
> 
> This patch updates kvm_get|put_msrs() so that we retrieve the actual
> vCPU MTRR settings and therefore keep CPUX86State synchronized for
> migration.  kvm_put_msrs() is also used on vCPU reset and therefore
> allows future modifications of MTRR state at reset to be realized.
> 
> Note that the entries array used by both functions was already
> slightly undersized for holding every possible MSR, so this patch
> increases it beyond the 28 new entries necessary for MTRR state.
> 
> Signed-off-by: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: qemu-sta...@nongnu.org
> ---
> 
>  target-i386/cpu.h |2 +
>  target-i386/kvm.c |  101 -
>  2 files changed, 101 insertions(+), 2 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index d37d857..3460b12 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -337,6 +337,8 @@
>  #define MSR_MTRRphysBase(reg)   (0x200 + 2 * (reg))
>  #define MSR_MTRRphysMask(reg)   (0x200 + 2 * (reg) + 1)
>  
> +#define MSR_MTRRphysIndex(addr) ((((addr) & ~1u) - 0x200) / 2)
> +
>  #define MSR_MTRRfix64K_00000    0x250
>  #define MSR_MTRRfix16K_80000    0x258
>  #define MSR_MTRRfix16K_A0000    0x259
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 097fe11..3c46d4a 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -79,6 +79,7 @@ static int lm_capable_kernel;
>  static bool has_msr_hv_hypercall;
>  static bool has_msr_hv_vapic;
>  static bool has_msr_hv_tsc;
> +static bool has_msr_mtrr;
>  
>  static bool has_msr_architectural_pmu;
>  static uint32_t num_architectural_pmu_counters;
> @@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
>  }
>  
> +if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
> +has_msr_mtrr = true;
> +}
> +
>  return 0;
>  }
>  
> @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  CPUX86State *env = &cpu->env;
>  struct {
>  struct kvm_msrs info;
> -struct kvm_msr_entry entries[100];
> +struct kvm_msr_entry entries[150];
>  } msr_data;
>  struct kvm_msr_entry *msrs = msr_data.entries;
>  int n = 0, i;
> @@ -1278,6 +1283,37 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
>env->msr_hv_tsc);
>  }
> +if (has_msr_mtrr) {
> +kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix16K_A0000, env->mtrr_fixed[2]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_C0000, env->mtrr_fixed[3]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_C8000, env->mtrr_fixed[4]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_D0000, env->mtrr_fixed[5]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_D8000, env->mtrr_fixed[6]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_E0000, env->mtrr_fixed[7]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_E8000, env->mtrr_fixed[8]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
> +for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRphysBase(i), env->mtrr_var[i].base);
> +kvm_msr_entry_set(&msrs[n++],
> +  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
> +}
> +}
>  
>  

[PATCH] kvm-unit-tests: x86: pmu: call measure for every counter in check_counters_many

2014-08-14 Thread Chris J Arges
In the check_counters_many function measure was only being called on the last
counter, causing the pmu test to fail. This ensures that measure is called for
each counter in the array before calling verify_counter.

Signed-off-by: Chris J Arges 
---
 x86/pmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 5c85146..3402d1e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -287,11 +287,11 @@ static void check_counters_many(void)
n++;
}
 
-   measure(cnt, n);
-
-   for (i = 0; i < n; i++)
+   for (i = 0; i < n; i++) {
+   measure(&cnt[i], 1);
if (!verify_counter(&cnt[i]))
break;
+   }
 
report("all counters", i == n);
 }
-- 
1.9.1



Re: [PATCH v2 1/3] x86: Use common variable range MTRR counts

2014-08-14 Thread Laszlo Ersek
On 08/14/14 21:24, Alex Williamson wrote:
> We currently define the number of variable range MTRR registers as 8
> in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
> (also 8) to report to guests the number available.  Change this to
> use MSR_MTRRcap_VCNT consistently.
> 
> Signed-off-by: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: qemu-sta...@nongnu.org
> ---
> 
>  target-i386/cpu.h |2 +-
>  target-i386/machine.c |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index e634d83..d37d857 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -930,7 +930,7 @@ typedef struct CPUX86State {
>  /* MTRRs */
>  uint64_t mtrr_fixed[11];
>  uint64_t mtrr_deftype;
> -MTRRVar mtrr_var[8];
> +MTRRVar mtrr_var[MSR_MTRRcap_VCNT];
>  
>  /* For KVM */
>  uint32_t mp_state;
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index 16d2f6a..fb89065 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -677,7 +677,7 @@ VMStateDescription vmstate_x86_cpu = {
>  /* MTRRs */
>  VMSTATE_UINT64_ARRAY_V(env.mtrr_fixed, X86CPU, 11, 8),
>  VMSTATE_UINT64_V(env.mtrr_deftype, X86CPU, 8),
> -VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, 8, 8),
> +VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, MSR_MTRRcap_VCNT, 8),
>  /* KVM-related states */
>  VMSTATE_INT32_V(env.interrupt_injected, X86CPU, 9),
>  VMSTATE_UINT32_V(env.mp_state, X86CPU, 9),
> 

Reviewed-by: Laszlo Ersek 


[PATCH v2 3/3] x86: Clear MTRRs on vCPU reset

2014-08-14 Thread Alex Williamson
The SDM specifies (June 2014 Vol3 11.11.5):

On a hardware reset, the P6 and more recent processors clear the
valid flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
MTRRs are undefined.

We currently do none of that, so whatever MTRR settings you had prior
to reset are what you have after reset.  Usually this doesn't matter
because KVM often ignores the guest mappings and uses write-back
anyway.  However, if you have an assigned device and an IOMMU that
allows NoSnoop for that device, KVM defers to the guest memory
mappings which are now stale after reset.  The result is that OVMF
rebooting on such a configuration takes a full minute to LZMA
decompress the firmware volume, a process that is nearly instant on
the initial boot.

Signed-off-by: Alex Williamson 
Cc: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6d008ab..9768be1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2588,6 +2588,16 @@ static void x86_cpu_reset(CPUState *s)
 
 env->xcr0 = 1;
 
+/*
+ * SDM 11.11.5 requires:
+ *  - IA32_MTRR_DEF_TYPE MSR.E = 0
+ *  - IA32_MTRR_PHYSMASKn.V = 0
+ * All other bits are undefined.  For simplification, zero it all.
+ */
+env->mtrr_deftype = 0;
+memset(env->mtrr_var, 0, sizeof(env->mtrr_var));
+memset(env->mtrr_fixed, 0, sizeof(env->mtrr_fixed));
+
 #if !defined(CONFIG_USER_ONLY)
 /* We hard-wire the BSP to the first CPU. */
 if (s->cpu_index == 0) {



[PATCH v2 2/3] x86: kvm: Add MTRR support for kvm_get|put_msrs()

2014-08-14 Thread Alex Williamson
The MTRR state in KVM currently runs completely independent of the
QEMU state in CPUX86State.mtrr_*.  This means that on migration, the
target loses MTRR state from the source.  Generally that's ok though
because KVM ignores it and maps everything as write-back anyway.  The
exception to this rule is when we have an assigned device and an IOMMU
that doesn't promote NoSnoop transactions from that device to be cache
coherent.  In that case KVM trusts the guest mapping of memory as
configured in the MTRR.

This patch updates kvm_get|put_msrs() so that we retrieve the actual
vCPU MTRR settings and therefore keep CPUX86State synchronized for
migration.  kvm_put_msrs() is also used on vCPU reset and therefore
allows future modifications of MTRR state at reset to be realized.

Note that the entries array used by both functions was already
slightly undersized for holding every possible MSR, so this patch
increases it beyond the 28 new entries necessary for MTRR state.

Signed-off-by: Alex Williamson 
Cc: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.h |2 +
 target-i386/kvm.c |  101 -
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index d37d857..3460b12 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -337,6 +337,8 @@
 #define MSR_MTRRphysBase(reg)   (0x200 + 2 * (reg))
 #define MSR_MTRRphysMask(reg)   (0x200 + 2 * (reg) + 1)
 
+#define MSR_MTRRphysIndex(addr) ((((addr) & ~1u) - 0x200) / 2)
+
 #define MSR_MTRRfix64K_00000    0x250
 #define MSR_MTRRfix16K_80000    0x258
 #define MSR_MTRRfix16K_A0000    0x259
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 097fe11..3c46d4a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -79,6 +79,7 @@ static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
 static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
+static bool has_msr_mtrr;
 
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
@@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
 }
 
+if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
+has_msr_mtrr = true;
+}
+
 return 0;
 }
 
@@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 CPUX86State *env = &cpu->env;
 struct {
 struct kvm_msrs info;
-struct kvm_msr_entry entries[100];
+struct kvm_msr_entry entries[150];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
 int n = 0, i;
@@ -1278,6 +1283,37 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
   env->msr_hv_tsc);
 }
+if (has_msr_mtrr) {
+kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix16K_A0000, env->mtrr_fixed[2]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_C0000, env->mtrr_fixed[3]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_C8000, env->mtrr_fixed[4]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_D0000, env->mtrr_fixed[5]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_D8000, env->mtrr_fixed[6]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_E0000, env->mtrr_fixed[7]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_E8000, env->mtrr_fixed[8]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
+for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRphysBase(i), env->mtrr_var[i].base);
+kvm_msr_entry_set(&msrs[n++],
+  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
+}
+}
 
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
@@ -1484,7 +1520,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 struct {
 struct kvm_msrs info;
-struct kvm_msr_entry entries[100];
+struct kvm_msr_entry entries[150];
 } msr_data;
 struct kvm_msr_entry *msr

[PATCH v2 1/3] x86: Use common variable range MTRR counts

2014-08-14 Thread Alex Williamson
We currently define the number of variable range MTRR registers as 8
in the CPUX86State structure and vmstate, but use MSR_MTRRcap_VCNT
(also 8) to report to guests the number available.  Change this to
use MSR_MTRRcap_VCNT consistently.

Signed-off-by: Alex Williamson 
Cc: Laszlo Ersek 
Cc: qemu-sta...@nongnu.org
---

 target-i386/cpu.h |2 +-
 target-i386/machine.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index e634d83..d37d857 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -930,7 +930,7 @@ typedef struct CPUX86State {
 /* MTRRs */
 uint64_t mtrr_fixed[11];
 uint64_t mtrr_deftype;
-MTRRVar mtrr_var[8];
+MTRRVar mtrr_var[MSR_MTRRcap_VCNT];
 
 /* For KVM */
 uint32_t mp_state;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 16d2f6a..fb89065 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -677,7 +677,7 @@ VMStateDescription vmstate_x86_cpu = {
 /* MTRRs */
 VMSTATE_UINT64_ARRAY_V(env.mtrr_fixed, X86CPU, 11, 8),
 VMSTATE_UINT64_V(env.mtrr_deftype, X86CPU, 8),
-VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, 8, 8),
+VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, MSR_MTRRcap_VCNT, 8),
 /* KVM-related states */
 VMSTATE_INT32_V(env.interrupt_injected, X86CPU, 9),
 VMSTATE_UINT32_V(env.mp_state, X86CPU, 9),



[PATCH v2 0/3] Sync MTRRs with KVM and disable on reset

2014-08-14 Thread Alex Williamson
It turns out that not only do we not follow the SDM guidelines for
resetting MTRR state on vCPU reset, but we really don't even attempt
to keep KVM MTRR state synchronized with QEMU, which affects not
only reset, but migration.  This series implements the get/put MSR
support for KVM, then goes on to properly re-initialize the state on
vCPU reset.  This resolves the problem described in the last patch
as well as some potential mismatches around migration.  The migration
state is unchanged, other than actually passing valid data.

Thanks to Laszlo for his help debugging this and realization of how
terribly broken MTRR synchronization is.  Thanks,

Alex

---

Alex Williamson (3):
  x86: Clear MTRRs on vCPU reset
  x86: kvm: Add MTRR support for kvm_get|put_msrs()
  x86: Use common variable range MTRR counts


 target-i386/cpu.c |   10 +
 target-i386/cpu.h |4 +-
 target-i386/kvm.c |  101 -
 target-i386/machine.c |2 -
 4 files changed, 113 insertions(+), 4 deletions(-)


Re: [PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code

2014-08-14 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 10:43 PM, Andy Lutomirski  wrote:
> Currently, init_std_data calls ktime_get_real().  This imposes
> awkward constraints on when init_std_data can be called, and
> init_std_data is unlikely to collect the full unpredictable data
> available to the timekeeping code, especially after resume.
>
> Remove this code from random.c and add the appropriate
> add_device_randomness calls to timekeeping.c instead.

*sigh* this is buggy:


> +   add_device_randomness(tk, sizeof(tk));

sizeof(*tk)

> +   add_device_randomness(tk, sizeof(tk));

ditto.

I'll fix this for v7, but I'll wait awhile for other comments to reduce spam.

--Andy


Re: [PATCH 2/2] kvm: x86: fix stale mmio cache bug

2014-08-14 Thread David Matlack
On Thu, Aug 14, 2014 at 12:01 AM, Xiao Guangrong wrote:
> From: David Matlack 
>
> The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
> up to userspace:
>
> (1) Guest accesses gpa X without a memory slot. The gfn is cached in
> struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
> the SPTE write-execute-noread so that future accesses cause
> EPT_MISCONFIGs.
>
> (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
> covering the page just accessed.
>
> (3) Guest attempts to read or write to gpa X again. On Intel, this
> generates an EPT_MISCONFIG. The memory slot generation number that
> was incremented in (2) would normally take care of this but we fast
> path mmio faults through quickly_check_mmio_pf(), which only checks
> the per-vcpu mmio cache. Since we hit the cache, KVM passes a
> KVM_EXIT_MMIO up to userspace.
>
> This patch fixes the issue by using the memslot generation number
> to validate the mmio cache.
>
> [ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]

The adjustments look good. Thanks!


Re: [Qemu-devel] The status about vhost-net on kvm-arm?

2014-08-14 Thread Joel Schopp



We at Virtual Open Systems did some work and tested vhost-net on ARM
back in March.
The setup was based on:
  - host kernel with our ioeventfd patches:
http://www.spinics.net/lists/kvm-arm/msg08413.html

- qemu with the aforementioned patches from Ying-Shiuan Pan
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00715.html

The testbed was ARM Chromebook with Exynos 5250, using a 1Gbps USB3
Ethernet adapter connected to a 1Gbps switch. I can't find the actual
numbers but I remember that with multiple streams the gain was clearly
seen. Note that it used the minimum required ioeventfd implementation
and not irqfd.

I guess it is feasible to think that it can all be put together and
rebased on top of the recent irqfd work. One can achieve even better
performance (because of the irqfd).


Managed to replicate the setup with the old versions we used in March:

Single stream from another machine to chromebook with 1Gbps USB3
Ethernet adapter.
iperf -c  -P 1 -i 1 -p 5001 -f k -t 10
to HOST: 858316 Kbits/sec
to GUEST: 761563 Kbits/sec

to GUEST vhost=off: 508150 Kbits/sec

10 parallel streams
iperf -c  -P 10 -i 1 -p 5001 -f k -t 10
to HOST: 842420 Kbits/sec
to GUEST: 625144 Kbits/sec

to GUEST vhost=off: 425276 Kbits/sec

I have tested the same cases on a Hisilicon board (Cortex-A15@1G)
with Integrated 1Gbps Ethernet adapter.

iperf -c  -P 1 -i 1 -p 5001 -f M -t 10
to HOST: 906 Mbits/sec
to GUEST: 562 Mbits/sec
to GUEST vhost=off: 340 Mbits/sec

With 10 parallel streams, the performance gains less than an additional 10%:
iperf -c  -P 10 -i 1 -p 5001 -f M -t 10
to HOST: 923 Mbits/sec
to GUEST: 592 Mbits/sec
to GUEST vhost=off: 364 Mbits/sec

It's easy to see that vhost-net brings great performance improvements,
almost 50%+.
That's pretty impressive for not even having irqfd.  I guess we should 
renew some effort to get these patches merged upstream.



Re: [PATCH] arm/arm64: KVM: Support KVM_CAP_READONLY_MEM

2014-08-14 Thread Marc Zyngier
On Thu, Jul 10 2014 at 3:42:31 pm BST, Christoffer Dall wrote:
> When userspace loads code and data in a read-only memory regions, KVM
> needs to be able to handle this on arm and arm64.  Specifically this is
> used when running code directly from a read-only flash device; the
> common scenario is a UEFI blob loaded with the -bios option in QEMU.
>
> To avoid looking through the memslots twice and to reuse the hva error
> checking of gfn_to_hva_prot(), add a new gfn_to_hva_memslot_prot()
> function and refactor gfn_to_hva_prot() to use this function.
>
> Signed-off-by: Christoffer Dall 

This looks good to me, but you may want to split the patch in two
(generic stuff, and the ARM code).

One question though...

> ---
> Note that if you want to test this with QEMU, you need to update the
> uapi headers.  You can also grab the branch below from my qemu git tree
> with the temporary update headers patch applied on top of Peter
> Maydell's -bios in -M virt support patches:
>
> git://git.linaro.org/people/christoffer.dall/qemu-arm.git virt-for-uefi
>
>  arch/arm/include/uapi/asm/kvm.h   |  1 +
>  arch/arm/kvm/arm.c|  1 +
>  arch/arm/kvm/mmu.c| 15 ---
>  arch/arm64/include/uapi/asm/kvm.h |  1 +
>  include/linux/kvm_host.h  |  2 ++
>  virt/kvm/kvm_main.c   | 11 +--
>  6 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index e6ebdd3..51257fd 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -25,6 +25,7 @@
>  
>  #define __KVM_HAVE_GUEST_DEBUG
>  #define __KVM_HAVE_IRQ_LINE
> +#define __KVM_HAVE_READONLY_MEM
>  
>  #define KVM_REG_SIZE(id) \
>   (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d7424ef..037adda 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -188,6 +188,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_ONE_REG:
>   case KVM_CAP_ARM_PSCI:
>   case KVM_CAP_ARM_PSCI_0_2:
> + case KVM_CAP_READONLY_MEM:
>   r = 1;
>   break;
>   case KVM_CAP_COALESCED_MMIO:
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 0f6f642..d606d86 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -745,14 +745,13 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -   struct kvm_memory_slot *memslot,
> +   struct kvm_memory_slot *memslot, unsigned long hva,
> unsigned long fault_status)
>  {
>   int ret;
>   bool write_fault, writable, hugetlb = false, force_pte = false;
>   unsigned long mmu_seq;
>   gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> - unsigned long hva = gfn_to_hva(vcpu->kvm, gfn);
>   struct kvm *kvm = vcpu->kvm;
>   struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>   struct vm_area_struct *vma;
> @@ -861,7 +860,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   unsigned long fault_status;
>   phys_addr_t fault_ipa;
>   struct kvm_memory_slot *memslot;
> - bool is_iabt;
> + unsigned long hva;
> + bool is_iabt, write_fault, writable;
>   gfn_t gfn;
>   int ret, idx;
>  
> @@ -882,7 +882,10 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
>   gfn = fault_ipa >> PAGE_SHIFT;
> - if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
> + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
> + write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> + if (kvm_is_error_hva(hva) || (write_fault && !writable)) {

So the consequence of a write to a ROM region would be to do an IO
emulation? That seems a bit weird. Shouldn't we have a separate error
path for this (possibly ignoring the write entirely)?

>   if (is_iabt) {
>   /* Prefetch Abort on I/O address */
>   kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> @@ -908,9 +911,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   goto out_unlock;
>   }
>  
> - memslot = gfn_to_memslot(vcpu->kvm, gfn);
> -
> - ret = user_mem_abort(vcpu, fault_ipa, memslot, fault_status);
> + ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>   if (ret == 0)
>   ret = 1;
>  out_unlock:
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index e633ff8..f4ec5a6 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -37,6 +37,7 @@
>  

Re: [PATCH] x86: Reset MTRR on vCPU reset

2014-08-14 Thread Alex Williamson
On Thu, 2014-08-14 at 01:44 +0200, Laszlo Ersek wrote:
> On 08/14/14 01:17, Laszlo Ersek wrote:
> 
> > - With KVM, the lack of loading MTRR state from KVM, combined with the
> >   (partial) storing of MTRR state to KVM, has two consequences:
> >   - migration invalidates (loses) MTRR state,
> 
> I'll concede that migration *already* loses MTRR state (on KVM), even
> before your patch. On the incoming host, the difference is that
> pre-patch, the guest continues running (after migration) with MTRRs in
> the "initial" KVM state, while post-patch, the guest continues running
> after an explicit zeroing of the variable MTRR masks and the deftype.
> 
> I admit that it wouldn't be right to say that the patch "causes" MTRR
> state loss.
> 
> With that, I think I've actually convinced myself that your patch is
> correct:
> 
> The x86_cpu_reset() hunk is correct in any case, independently of KVM
> vs. TCG. (On TCG it even improves MTRR conformance.) Splitting that hunk
> into a separate patch might be worthwhile, but not overly important.
> 
> The kvm_put_msrs() hunk forces a zero write to the variable MTRR
> PhysMasks and the DefType, on both reset and on incoming migration. For
> reset, this is correct behavior. For incoming migration, it is not, but
> it certainly shouldn't qualify as a regression, relative to the current
> status (where MTRR state is simply lost and replaced with initial MTRR
> state on the incoming host).
> 
> I think the above "end results" could be expressed more clearly in the
> code, but I'm already wondering if you'll ever talk to me again, so I'm
> willing to give my R-b if you think that's useful... :)

Heh, I think you've highlighted an important point, perhaps several.  I
was assuming my kvm_put_msrs() was only for reset, but it's clearly not.
So I agree that we need both get and put support.  It probably makes
sense to create one patch cleaning up the hardcoded variable register
array vs guest advertised, another implementing the reset path, and a
final one adding KVM get/put.  I'll get started.  Thanks for the review.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-14 Thread Alexander Graf


On 14.08.14 07:13, Aneesh Kumar K.V wrote:

Alexey Kardashevskiy  writes:


fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
functional change, but this is not true: it calls get_order() (which
takes a size in bytes) where it should have called ilog2() (which takes
a count), and the kernel stops on a VM_BUG_ON().

This replaces get_order() with order_base_2() (the round-up version of ilog2()).

Suggested-by: Paul Mackerras 
Cc: Alexander Graf 
Cc: Aneesh Kumar K.V 
Cc: Joonsoo Kim 
Cc: Benjamin Herrenschmidt 
Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Aneesh Kumar K.V 


So this affects 3.17?


Alex



Re: [questions] about using vfio to assign sr-iov vf to vm

2014-08-14 Thread Alex Williamson
On Thu, 2014-08-14 at 16:22 +0800, Zhang Haoyu wrote:
> Hi, all
> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
> the 82599 PF and its VFs belong to the same iommu_group, but I want to
> assign some VFs to one VM, some other VFs to another VM, and so on.
> How do I unbind only (some of) the VFs, but not the PF?
> I read the kernel doc vfio.txt; I'm not sure whether I should unbind all of
> the devices that belong to one iommu_group.
> If so, because the PF and its VFs belong to the same iommu_group, unbinding
> the PF makes its VFs disappear as well.
> I think I misunderstand something;
> any advice?

This occurs when the PF is installed behind components in the system
that do not support PCIe Access Control Services (ACS).  The IOMMU group
contains both the PF and the VF because upstream transactions can be
re-routed downstream by these non-ACS components before being translated
by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
version and we might be able to give you some advice on how to work
around the problem.  Thanks,
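While gathering that output, the grouping itself can be read straight out of sysfs; a minimal sketch (the sysfs layout is standard, but the helper name is ours and the root is parameterized only to make it testable):

```shell
# Print each IOMMU group and the PCI devices it contains.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for g in "$root"/*; do
        [ -d "$g" ] || continue
        echo "group ${g##*/}:"
        for d in "$g"/devices/*; do
            [ -e "$d" ] && echo "  ${d##*/}"
        done
    done
}

list_iommu_groups
```

On a system where the PF sits behind non-ACS components, the PF and all of its VFs show up in a single group, which is exactly the symptom described above.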

Alex



Re: Query: Is it possible to lose interrupts between vhost and virtio_net during migration?

2014-08-14 Thread Michael S. Tsirkin
On Thu, Aug 14, 2014 at 04:52:40PM +0800, Jason Wang wrote:
> On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
> > On 2014/8/5 20:14, Zhangjie (HZ) wrote:
> >> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
> >>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
>  Jason is right, the new order is not the cause of the network being unreachable.
>  Changing the order seems not to work. After about 40 times, the problem occurs 
>  again.
>  Maybe there are other hidden reasons for that.
> >> I modified the code to change the order myself yesterday.
> >> This result is about my code.
> >>> To make sure, you tested the patch that I posted to list:
> >>> "vhost_net: stop guest notifiers after backend"?
> >>>
> >>> Please confirm.
> >>>
> >> OK, I will test with your patch "vhost_net: stop guest notifiers after 
> >> backend".
> >>
> > Unfortunately, after using the patch "vhost_net: stop guest notifiers after 
> > backend",
> > Linux VMs stopped by themselves a few minutes after they were started.
> >> @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
> >> *ncs,
> >> goto err;
> >> }
> >>
> >> +r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> >> +if (r < 0) {
> >> +error_report("Error binding guest notifier: %d", -r);
> >> +goto err;
> >> +}
> >> +
> >> for (i = 0; i < total_queues; i++) {
> >> r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
> >>
> >> @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
> >> *ncs,
> >> }
> >> }
> >>
> >> -r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> >> -if (r < 0) {
> >> -error_report("Error binding guest notifier: %d", -r);
> >> -goto err;
> >> -}
> >> -
> >> return 0;
> > I wonder if k->set_guest_notifiers should be called after "hdev->started = 
> > true;" in vhost_dev_start.
> 
> Michael, can we just remove those assertions? Since you may want to set
> guest notifiers before starting the backend.

Which assertions?

> Another question for virtio_pci_vector_poll(): why not using
> msix_notify() instead of msix_set_pending().

We can do that, but the effect will be the same since we know
the vector is masked.

> If so, there's no need to
> change the vhost_net_start() ?

Confused, don't see the connection.

> Zhang Jie, is this a regression? If yes, could you please do a bisection
> to find the first bad commit.
> 
> Thanks

Pretty sure it's the mq patch: a9f98bb5ebe6fb1869321dcc58e72041ae626ad8

Since we may have many vhost/net devices for a virtio-net device, the setting
of guest notifiers was moved out of the starting/stopping of a specific vhost
thread. The old vhost_net_{start|stop}() were renamed to
vhost_net_{start|stop}_one(), and a new vhost_net_{start|stop}() was introduced
to configure the guest notifiers and start/stop all vhost/vhost_net devices.

-- 
MST


Re: Query: Is it possible to lose interrupts between vhost and virtio_net during migration?

2014-08-14 Thread Jason Wang
On 08/07/2014 08:47 PM, Zhangjie (HZ) wrote:
> On 2014/8/5 20:14, Zhangjie (HZ) wrote:
>> On 2014/8/5 17:49, Michael S. Tsirkin wrote:
>>> On Tue, Aug 05, 2014 at 02:29:28PM +0800, Zhangjie (HZ) wrote:
 Jason is right, the new order is not the cause of the network being unreachable.
 Changing the order seems not to work. After about 40 times, the problem occurs 
 again.
 Maybe there are other hidden reasons for that.
>> I modified the code to change the order myself yesterday.
>> This result is about my code.
>>> To make sure, you tested the patch that I posted to list:
>>> "vhost_net: stop guest notifiers after backend"?
>>>
>>> Please confirm.
>>>
>> OK, I will test with your patch "vhost_net: stop guest notifiers after 
>> backend".
>>
> Unfortunately, after using the patch "vhost_net: stop guest notifiers after 
> backend",
> Linux VMs stopped by themselves a few minutes after they were started.
>> @@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
>> *ncs,
>> goto err;
>> }
>>
>> +r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>> +if (r < 0) {
>> +error_report("Error binding guest notifier: %d", -r);
>> +goto err;
>> +}
>> +
>> for (i = 0; i < total_queues; i++) {
>> r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
>>
>> @@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
>> *ncs,
>> }
>> }
>>
>> -r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
>> -if (r < 0) {
>> -error_report("Error binding guest notifier: %d", -r);
>> -goto err;
>> -}
>> -
>> return 0;
> I wonder if k->set_guest_notifiers should be called after "hdev->started = 
> true;" in vhost_dev_start.

Michael, can we just remove those assertions? Since you may want to set
guest notifiers before starting the backend.

Another question for virtio_pci_vector_poll(): why not using
msix_notify() instead of msix_set_pending(). If so, there's no need to
change the vhost_net_start() ?

Zhang Jie, is this a regression? If yes, could you please do a bisection
to find the first bad commit.

Thanks


[questions] about using vfio to assign sr-iov vf to vm

2014-08-14 Thread Zhang Haoyu
Hi, all
I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
the 82599 PF and its VFs belong to the same iommu_group, but I want to assign
some VFs to one VM, some other VFs to another VM, and so on.
How do I unbind only (some of) the VFs, but not the PF?
I read the kernel doc vfio.txt; I'm not sure whether I should unbind all of the
devices that belong to one iommu_group.
If so, because the PF and its VFs belong to the same iommu_group, unbinding the
PF makes its VFs disappear as well.
I think I misunderstand something;
any advice?

Thanks,
Zhang Haoyu



Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-14 Thread Xiao Guangrong

Sorry, the title is not clear enough.

This is the v2 which fixes the issue pointed out by David:
" the generation number actually decreases."

Please review.

On 08/14/2014 03:01 PM, Xiao Guangrong wrote:
> We may cache the current mmio generation number and stale memslot info
> into the spte, as in this scenario:
> 
>    CPU 0                                    CPU 1
>  page fault:                              add a new memslot
>    read memslot, detect it's an mmio access
>                                           update memslots
>                                           update generation number
>    read generation number
>    cache the gpa and current gen number into the spte
> 
> So, if the guest accesses the gpa later, it will generate an incorrect
> mmio exit.
> 
> This patch fixes it by updating the generation number after
> synchronize_srcu_expedited(), which makes sure the generation
> number is updated only after the memslots update is finished.
> 
> Cc: sta...@vger.kernel.org
> Cc: David Matlack 
> Signed-off-by: Xiao Guangrong 
> ---
>  virt/kvm/kvm_main.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 33712fb..bb40df3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
> 
>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  static void update_memslots(struct kvm_memslots *slots,
> - struct kvm_memory_slot *new, u64 last_generation);
> + struct kvm_memory_slot *new);
> 
>  static void kvm_release_pfn_dirty(pfn_t pfn);
>  static void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>  }
> 
>  static void update_memslots(struct kvm_memslots *slots,
> - struct kvm_memory_slot *new,
> - u64 last_generation)
> + struct kvm_memory_slot *new)
>  {
>   if (new) {
>   int id = new->id;
> @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
>   if (new->npages != npages)
>   sort_memslots(slots);
>   }
> -
> - slots->generation = last_generation + 1;
>  }
> 
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct 
> kvm *kvm,
>  {
>   struct kvm_memslots *old_memslots = kvm->memslots;
> 
> - update_memslots(slots, new, kvm->memslots->generation);
> + /* ensure generation number is always increased. */
> + slots->generation = old_memslots->generation;
> + update_memslots(slots, new);
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> + slots->generation++;
> 
>   kvm_arch_memslots_updated(kvm);
> 



[PATCH 2/2] kvm: x86: fix stale mmio cache bug

2014-08-14 Thread Xiao Guangrong
From: David Matlack 

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

[ xiaoguangrong: adjust the code to make it simpler for stable-tree fix. ]

Cc: sta...@vger.kernel.org
Signed-off-by: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c  |  4 ++--
 arch/x86/kvm/mmu.h  |  2 ++
 arch/x86/kvm/x86.h  | 19 +++
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..58fa3ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ struct kvm_vcpu_arch {
u64 mmio_gva;
unsigned access;
gfn_t mmio_gfn;
+   unsigned int mmio_gen;
 
struct kvm_pmu pmu;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..e00fbfe 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,7 +234,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
@@ -3163,7 +3163,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
 
-   vcpu_clear_mmio_info(vcpu, ~0ul);
+   vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b982112..e2d902a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,6 +76,8 @@ enum {
 };
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
+unsigned int kvm_current_mmio_generation(struct kvm *kvm);
+
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
bool execonly);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 306a1b7..ffd03b7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -3,6 +3,7 @@
 
 #include 
 #include "kvm_cache_regs.h"
+#include "mmu.h"
 
 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
@@ -88,15 +89,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu 
*vcpu,
vcpu->arch.mmio_gva = gva & PAGE_MASK;
vcpu->arch.access = access;
vcpu->arch.mmio_gfn = gfn;
+   vcpu->arch.mmio_gen = kvm_current_mmio_generation(vcpu->kvm);
+}
+
+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.mmio_gen == kvm_current_mmio_generation(vcpu->kvm);
 }
 
 /*
  * Clear the mmio cache info for the given gva,
- * specially, if gva is ~0ul, we clear all mmio cache info.
 + * specially, if gva is MMIO_GVA_ANY, we clear all mmio cache info.
  */
+#define MMIO_GVA_ANY   ~((gva_t)0)
+
 static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
 {
-   if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+   if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
return;
 
vcpu->arch.mmio_gva = 0;
@@ -104,7 +113,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu 
*vcpu, gva_t gva)
 
 static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long 
gva)
 {
-   if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+   if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
+ vcpu->arch.mmio_gva == (gva & PAGE_MASK))
return true;
 
return false;
@@ -112,7 +122,8 @@ static inline bool vcpu_match_mmio_gva(struct kvm_vcpu 
*vcpu, unsigned long gva)
 
 static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
-   if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+   if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
+ vcpu->arch.mmio_gf

[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-14 Thread Xiao Guangrong
We may cache the current mmio generation number and stale memslot info
into the spte, as in this scenario:

   CPU 0                                    CPU 1
 page fault:                              add a new memslot
   read memslot, detect it's an mmio access
                                          update memslots
                                          update generation number
   read generation number
   cache the gpa and current gen number into the spte

So, if the guest accesses the gpa later, it will generate an incorrect
mmio exit.

This patch fixes it by updating the generation number after
synchronize_srcu_expedited(), which makes sure the generation
number is updated only after the memslots update is finished.

Cc: sta...@vger.kernel.org
Cc: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 virt/kvm/kvm_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..bb40df3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new->npages != npages)
sort_memslots(slots);
}
-
-   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm->memslots;
 
-   update_memslots(slots, new, kvm->memslots->generation);
+   /* ensure generation number is always increased. */
+   slots->generation = old_memslots->generation;
+   update_memslots(slots, new);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;
 
kvm_arch_memslots_updated(kvm);
 
-- 
1.8.1.4
