Re: [Intel-gfx] [PATCH v6 08/17] drm/ttm: use gem vma_node

2019-08-13 Thread Gerd Hoffmann
> Hi Gerd,
> 
> I've been seeing a regression on Nouveau with recent linux-next releases
> and git bisect points at this commit as the first bad one. If I revert
> it (there's a tiny conflict with a patch that was merged subsequently),
> things are back to normal.
> 
> I think the reason for this issue is that Nouveau doesn't use GEM
> objects for all buffer objects,

That shouldn't be a problem ...

> and even when it uses GEM objects, the
> code will not initialize the GEM object until after the buffer objects
> and the backing TTM objects have been created.

... but the initialization order is.

ttm_bo_uses_embedded_gem_object() assumes gem gets initialized first.

drm_gem_object_init() calling drm_vma_node_reset() again is
probably the root cause of the breakage.
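For reference, IIRC the helper just checks whether the embedded gem
object was initialized (drm_gem_object_init() sets base.dev), roughly
like this (quoting from memory, double-check the actual header):

    /* include/drm/ttm/ttm_bo_api.h, paraphrased */
    static inline bool ttm_bo_uses_embedded_gem_object(struct ttm_buffer_object *bo)
    {
        return bo->base.dev != NULL;
    }

With nouveau's ordering, ttm sets up _resv and vma_node first and
drm_gem_object_init() then resets them underneath it.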

> I tried to fix that by making sure drm_gem_object_init() gets called by
> Nouveau before ttm_bo_init(), but the changes are fairly involved and I
> was unable to get the GEM reference counting right. I can look into the
> proper fix some more, but it might be worth reverting this patch for
> now to get Nouveau working again.

Changing the order doesn't look hard.  Patch attached (untested, have no
test hardware).  But maybe I missed some detail ...

The other patch attached works around the issue with a flag, to avoid
drm_vma_node_reset() being called twice.

cheers,
  Gerd

From af43f933533140e2df58176a68df0c60ba082273 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Wed, 14 Aug 2019 07:23:31 +0200
Subject: [PATCH 1/2] try unbreak nouveau #1

---
 include/drm/drm_gem.h        | 11 +++++++++++
 drivers/gpu/drm/drm_gem.c    |  6 ++++--
 drivers/gpu/drm/ttm/ttm_bo.c |  3 ++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index ae693c0666cd..24e8fc58a3e1 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -297,6 +297,17 @@ struct drm_gem_object {
 *
 */
const struct drm_gem_object_funcs *funcs;
+
+   /**
+* @ttm_init: indicate ttm has initialized _resv and vma_node fields.
+*
+* ttm_bo_uses_embedded_gem_object() assumes gem is
+* initialized before ttm, nouveau does it the other way
+* around though.
+*
+* This is a temporary stopgap to handle that case.
+*/
+   bool ttm_init;
 };
 
 /**
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index afc38cece3f5..0a75d8cf7ac7 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -159,11 +159,13 @@ void drm_gem_private_object_init(struct drm_device *dev,
	kref_init(&obj->refcount);
	obj->handle_count = 0;
	obj->size = size;
-	reservation_object_init(&obj->_resv);
	if (!obj->resv)
		obj->resv = &obj->_resv;

-	drm_vma_node_reset(&obj->vma_node);
+	if (!obj->ttm_init) {
+		reservation_object_init(&obj->_resv);
+		drm_vma_node_reset(&obj->vma_node);
+	}
 }
 EXPORT_SYMBOL(drm_gem_private_object_init);
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 10a861a1690c..83b389fc117e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -160,7 +160,7 @@ static void ttm_bo_release_list(struct kref *list_kref)
ttm_tt_destroy(bo->ttm);
	atomic_dec(&bo->bdev->glob->bo_count);
	dma_fence_put(bo->moving);
-	if (!ttm_bo_uses_embedded_gem_object(bo))
+	if (bo->base.ttm_init)
		reservation_object_fini(&bo->base._resv);
mutex_destroy(>wu_mutex);
bo->destroy(bo);
@@ -1344,6 +1344,7 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
 */
	reservation_object_init(&bo->base._resv);
	drm_vma_node_reset(&bo->base.vma_node);
+   bo->base.ttm_init = true;
}
	atomic_inc(&bo->bdev->glob->bo_count);
 
-- 
2.18.1

From 3e36d5819ed5330068340e78c7a1bf35451b1dad Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Wed, 14 Aug 2019 07:41:05 +0200
Subject: [PATCH 2/2] try unbreak nouveau #2

---
 drivers/gpu/drm/nouveau/nouveau_bo.h    |  4 ++--
 drivers/gpu/drm/nouveau/dispnv04/crtc.c |  2 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c    | 11 ++-
 drivers/gpu/drm/nouveau/nouveau_chan.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c  |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   | 10 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c | 10 +-
 drivers/gpu/drm/nouveau/nv17_fence.c    |  2 +-
 drivers/gpu/drm/nouveau/nv50_fence.c    |  2 +-
 drivers/gpu/drm/nouveau/nv84_fence.c    |  2 +-
 11 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index d675efe8e7f9..4c268f299226 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -73,7 +73,7 @@ extern struct ttm_bo_driver nouveau_bo_driver;
 void 

Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-08-13 Thread 冉 jiang

On 2019/8/13 18:55, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 12:05:03PM +, 冉 jiang wrote:



On 2019/7/20 0:13, Michael S. Tsirkin wrote:


On Fri, Jul 19, 2019 at 03:31:29PM +, 冉 jiang wrote:


On 2019/7/19 22:29, Jiang wrote:


On 2019/7/19 10:36, Jason Wang wrote:


On 2019/7/18 下午10:43, Michael S. Tsirkin wrote:


On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:


On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:


On 2019/7/18 下午9:04, Michael S. Tsirkin wrote:


On Thu, Jul 18, 2019 at 12:55:50PM +, 冉 jiang wrote:


This change makes the ring buffer reclaim threshold num_free
configurable for better performance; it is hard coded as
1/2 * queue now.
According to our test with qemu + dpdk, packet dropping happens
when
the guest is not able to provide free buffers in the avail ring in time.
A smaller num_free value does decrease the number of packets
dropped
during our test, as it makes virtio_net reclaim buffers earlier.

At least, we should make the value changeable by the user while
keeping the default of 1/2 * queue.

Signed-off-by: jiangkidd


That would be one reason, but I suspect it's not the
true one. If you need more buffers due to jitter
then just increase the queue size. Would be cleaner.


However are you sure this is the reason for
packet drops? Do you see them dropped by dpdk
due to lack of space in the ring? As opposed to
by guest?




Besides those, this patch depends on the user choosing a suitable
threshold,
which is not good. You need either a good value with demonstrated
numbers or
something smarter.

Thanks


I do however think that we have a problem right now: try_fill_recv can
take a long time, during which the net stack does not run at all.
Imagine
a 1K queue - we are talking 512 packets. That's excessive.



Yes, we will starve a fast host in this case.




napi poll
weight solves a similar problem, so it might make sense to cap this at
napi_poll_weight.

Which will allow tweaking it through a module parameter as a
side effect :) Maybe just do NAPI_POLL_WEIGHT.


Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
experiment, measure performance and let the list know



Need to be careful though: queues can also be small and I don't
think we
want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
Definitely must not exceed the full queue size.
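
Something like this, I guess (untested sketch, reusing the names from
virtnet_receive; min_t because budget is an int while the ring size is
unsigned):

	unsigned int vring_size = virtqueue_get_vring_size(rq->vq);
	/* refill early, but never demand more slack than the ring has */
	unsigned int thresh = min_t(unsigned int, NAPI_POLL_WEIGHT,
				    vring_size / 2);

	if (rq->vq->num_free > thresh) {
		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
			schedule_delayed_work(&vi->refill, 0);
	}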



Looking at intel, it uses 16 and i40e uses 32.  It looks to me
NAPI_POLL_WEIGHT/2 is better.

Jiang, want to try that and post a new patch?

Thanks




--
MST


We have completed several rounds of testing with the value set to
budget (64 as the default value). It does improve things a lot when
pps is below 400pps for a single stream. Let me consolidate the data
and I will send it soon. Actually, we are confident that it runs out
of free buffers in the avail ring when packet dropping happens, per
the below systemtap:

Just a snippet:

probe module("virtio_ring").function("virtqueue_get_buf")
{
 x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)-
(@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one
to verify if the queue is full, which means guest is not able to take
buffer from the queue timely

 if (x<0 && (x+65535)<4096)
 x = x+65535

 if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback ==
callback_addr)
 netrxcount[x] <<< gettimeofday_s()
}


probe module("virtio_ring").function("virtqueue_add_inbuf")
{
 y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)-
(@cast($vq, "vring_virtqueue")->vring->used->idx) ---> we use this one
to verify if we run out of free buffer in avail ring
 if (y<0 && (y+65535)<4096)
 y = y+65535

 if(@2=="debugon")
 {
 if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback ==
callback_addr)
 {
 netrxfreecount[y] <<< gettimeofday_s()

 printf("no avail ring left seen, printing most recent 5
num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
 for(i=recentfreecount; i!=((recentfreecount+4) % 5);
i=((i+1) % 5))
 {
 printf("index: %d, num free: %d\n", i, recentfree[$vq,
i])
 }

 printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
 //exit()
 }
 }
}


probe module("virtio_net").statement("virtnet_receive@drivers/net/virtio_net.c:732")
{
    # record num_free for the last 5 calls to virtnet_receive, so we
    # can see if lowering the bar helps
    recentfreecount++
    recentfreecount = recentfreecount % 5
    recentfree[$rq->vq, recentfreecount] = $rq->vq->num_free
}
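
For completeness, we invoke the script roughly as below; the file name
is ours and the first argument (consumed as callback_addr in the full
script) is the address of the rx callback on the test machine, so both
values are setup-specific:

    stap virtio_ring_mon.stp 0xffffffffc0123456 debugon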


Here is the result:

no avail ring left seen, printing most recent 5 num free, vq:
9c13c120, current index: 1
index: 1, num free: 561
index: 2, num free: 305
index: 3, num free: 369
index: 4, num free: 433
index: 0, num free: 497
no avail ring left seen, printing most 

[PATCH] virtio-net: lower min ring num_free for efficiency

2019-08-13 Thread 冉 jiang
This change lowers the ring buffer reclaim threshold from 1/2*queue to
budget for better performance. According to our test with qemu + dpdk,
packet dropping happens when the guest is not able to provide free
buffers in the avail ring in time with the default 1/2*queue. The value
in the patch has been tested and does show better performance.

Signed-off-by: jiangkidd 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d4115c9e20b..bc08be7925eb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
}
}
 
-	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
+	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
			schedule_delayed_work(&vi->refill, 0);
}
-- 
2.11.0



Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-13 Thread Paolo Bonzini
On 13/08/19 17:01, Sean Christopherson wrote:
>>> It's a bit unclear how, but we'll try to get rid of the refcount object,
>>> which will remove a lot of code, indeed.
>> You can keep it for now.  It may become clearer how to fix it after the
>> event loop is cleaned up.
> By event loop, do you mean the per-vCPU jobs list?

Yes, I meant event handling (which involves the jobs list).

Paolo


Re: [RFC PATCH v6 06/92] kvm: introspection: add KVMI_CONTROL_CMD_RESPONSE

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 11:15:34 +0200, Paolo Bonzini  wrote:
> On 09/08/19 17:59, Adalbert Lazăr wrote:
> > +If `now` is 1, the command reply is enabled/disabled (according to
> > +`enable`) starting with the current command. For example, `enable=0`
> > +and `now=1` means that the reply is disabled for this command too,
> > +while `enable=0` and `now=0` means that a reply will be sent for this
> > +command, but not for the next ones (until enabled again with another
> > +*KVMI_CONTROL_CMD_RESPONSE*).
> > +
> > +This command is used by the introspection tool to disable the replies
> > +for commands returning an error code only (eg. *KVMI_SET_REGISTERS*)
> > +when an error is less likely to happen. For example, the following
> > +commands can be used to reply to an event with a single `write()` call:
> > +
> > +   KVMI_CONTROL_CMD_RESPONSE enable=0 now=1
> > +   KVMI_SET_REGISTERS vcpu=N
> > +   KVMI_EVENT_REPLY   vcpu=N
> > +   KVMI_CONTROL_CMD_RESPONSE enable=1 now=0
> 
> I don't understand the usage.  Is there any case where you want now == 1
> actually?  Can you just say that KVMI_CONTROL_CMD_RESPONSE never has a
> reply, or to make now==enable?

The enable=1 now=1 case is for pausing the VM:

KVMI_CONTROL_CMD_RESPONSE enable=0 now=1
KVMI_PAUSE_VCPU 0
KVMI_PAUSE_VCPU 1
...
KVMI_CONTROL_CMD_RESPONSE enable=1 now=1

We wait for a reply to make sure the vCPUs were stopped without waiting
for their pause events.

We can work around it from userspace, if you like:

KVMI_CONTROL_CMD_RESPONSE enable=0 now=1
KVMI_PAUSE_VCPU 0
KVMI_PAUSE_VCPU 1
...
KVMI_PAUSE_VCPU N-2
KVMI_CONTROL_CMD_RESPONSE enable=1 now=0
KVMI_PAUSE_VCPU N-1

> 
> > +   if (err)
> > +   kvmi_warn(ikvm, "Error code %d discarded for message id %d\n",
> > + err, msg->id);
> > +
> 
> Would it make sense to even close the socket if there is an error?
> 
> Paolo

Sure.

Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> 
> > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > notifier in this series.
> 
> A GUP that can't sleep can't pagefault which makes it a really weird
> pattern

get_user_pages/get_user_pages_fast must not be called under a spinlock.
We have the somewhat misnamed __get_user_page_fast that just does a
lookup for existing pages and never faults for a few places that need
to do that lookup from contexts where we can't sleep.


Re: [RFC PATCH v6 70/92] kvm: x86: filter out access rights only when tracked by the introspection tool

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 11:08:39 +0200, Paolo Bonzini  wrote:
> On 09/08/19 18:00, Adalbert Lazăr wrote:
> > It should complete the commit fd34a9518173 ("kvm: x86: consult the page 
> > tracking from kvm_mmu_get_page() and __direct_map()")
> > 
> > Signed-off-by: Adalbert Lazăr 
> > ---
> >  arch/x86/kvm/mmu.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 65b6acba82da..fd64cf1115da 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2660,6 +2660,9 @@ static void clear_sp_write_flooding_count(u64 *spte)
> >  static unsigned int kvm_mmu_page_track_acc(struct kvm_vcpu *vcpu, gfn_t 
> > gfn,
> >unsigned int acc)
> >  {
> > +   if (!kvmi_tracked_gfn(vcpu, gfn))
> > +   return acc;
> > +
> > if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_PREREAD))
> > acc &= ~ACC_USER_MASK;
> > if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_PREWRITE) ||
> > 
> 
> If this patch is always needed, then the function should be named
> something like kvm_mmu_apply_introspection_access and kvmi_tracked_gfn
> should be tested from the moment it is introduced.
> 
> But the commit message says nothing about _why_ it is needed, so I
> cannot guess.  I would very much avoid it however.  Is it just an
> optimization?
> 
> Paolo

We'll retest to see if we still need kvm_mmu_page_track_acc().
The kvmi_tracked_gfn() check was used to keep the KVM code flow
"unchanged" as much as possible. Probably, we can get ride of it.

RE: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-13 Thread Ram Pai
On Wed, Aug 14, 2019 at 12:24:39AM +1000, David Gibson wrote:
> On Tue, Aug 13, 2019 at 03:26:17PM +0200, Christoph Hellwig wrote:
> > On Mon, Aug 12, 2019 at 07:51:56PM +1000, David Gibson wrote:
> > > AFAICT we already kind of abuse this for the VIRTIO_F_IOMMU_PLATFORM,
> > > because to handle for cases where it *is* a device limitation, we
> > > assume that if the hypervisor presents VIRTIO_F_IOMMU_PLATFORM then
> > > the guest *must* select it.
> > > 
> > > What we actually need here is for the hypervisor to present
> > > VIRTIO_F_IOMMU_PLATFORM as available, but not required.  Then we need
> > > a way for the platform core code to communicate to the virtio driver
> > > that *it* requires the IOMMU to be used, so that the driver can select
> > > or not the feature bit on that basis.
> > 
> > I agree with the above, but that just brings us back to the original
> > issue - the whole bypass of the DMA OPS should be an option that the
> > device can offer, not the other way around.  And we really need to
> > fix that root cause instead of doctoring around it.
> 
> I'm not exactly sure what you mean by "device" in this context.  Do
> you mean the hypervisor (qemu) side implementation?
> 
> You're right that this was the wrong way around to begin with, but as
> well as being hard to change now, I don't see how it really addresses
> the current problem.  The device could default to IOMMU and allow
> bypass, but the driver would still need to get information from the
> platform to know that it *can't* accept that option in the case of a
> secure VM.  Reversed sense, but the same basic problem.
> 
> The hypervisor does not, and can not be aware of the secure VM
> restrictions - only the guest side platform code knows that.

This statement is almost entirely right. I will rephrase it to make it
entirely right.   

The hypervisor does not, and can not be aware of the secure VM
requirement that it needs to do some special processing that has nothing
to do with DMA address translation - only the guest side platform code
know that.

BTW: I do not consider 'bounce buffering' to be 'DMA address translation'.
DMA address translation translates a CPU address to a DMA address.  Bounce
buffering moves the data from one buffer at a given CPU address to
another buffer at a different CPU address.  Unfortunately the current
DMA ops conflate the two.  The need to do 'DMA address translation'
is something the device can enforce.  But the need to do bounce
buffering is something that the device should not be aware of; it should be
entirely a decision made locally by the kernel/driver in the secure VM.
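
To make the distinction concrete, illustrative prototypes only
(neither function exists in the kernel as such):

	/*
	 * DMA address translation: CPU address in, bus address out.
	 * A device or its IOMMU can legitimately require this step.
	 */
	dma_addr_t translate_for_device(struct device *dev, phys_addr_t cpu_addr);

	/*
	 * Bounce buffering: data is copied between two CPU buffers,
	 * e.g. from encrypted guest memory into a shared pool.  The
	 * device never observes the copy, so it has no business
	 * mandating or forbidding it -- that is the secure guest's call.
	 */
	void bounce_to_shared(void *shared_buf, const void *orig_buf, size_t len);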

RP

> 
> -- 
> David Gibson  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
> _other_
>   | _way_ _around_!
> http://www.ozlabs.org/~dgibson



-- 
Ram Pai



Re: [RFC PATCH v6 16/92] kvm: introspection: handle events and event replies

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 10:55:21 +0200, Paolo Bonzini  wrote:
> On 09/08/19 17:59, Adalbert Lazăr wrote:
> > 
> > +reply->padding2);
> > +
> > +   ivcpu->reply_waiting = false;
> > +   return expected->error;
> > +}
> > +
> >  /*
> 
> Is this missing a wakeup?
> 
> >  
> > +static bool need_to_wait(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvmi_vcpu *ivcpu = IVCPU(vcpu);
> > +
> > +   return ivcpu->reply_waiting;
> > +}
> > +
> 
> Do you actually need this function?  It seems to me that everywhere you
> call it you already have an ivcpu, so you can just access the field.
> 
> Also, "reply_waiting" means "there is a reply that is waiting".  What
> you mean is "waiting_for_reply".

In an older version, handle_event_reply() was executed from the receiving
thread (under a different name) and it contained a wakeup call. Now,
indeed, 'waiting_for_reply' is the right name.
 
> The overall structure of the jobs code is confusing.  The same function
> kvm_run_jobs_and_wait is an infinite loop before and gets a "break"
> later.  It is also not clear why kvmi_job_wait is called through a job.
>  Can you have instead just kvm_run_jobs in KVM_REQ_INTROSPECTION, and
> something like this instead when sending an event:
> 
> int kvmi_wait_for_reply(struct kvm_vcpu *vcpu)
> {
>   struct kvmi_vcpu *ivcpu = IVCPU(vcpu);
> 
>   while (ivcpu->waiting_for_reply) {
>   kvmi_run_jobs(vcpu);
> 
>   err = swait_event_killable(*wq,
>   !ivcpu->waiting_for_reply ||
>   !list_empty(&ivcpu->job_list));
> 
>   if (err)
>   return -EINTR;
>   }
> 
>   return 0;
> }
> 
> ?
> 
> Paolo

Much better :) Thank you.


Re: [Intel-gfx] [PATCH v6 08/17] drm/ttm: use gem vma_node

2019-08-13 Thread Thierry Reding
On Mon, Aug 05, 2019 at 04:01:10PM +0200, Gerd Hoffmann wrote:
> Drop vma_node from ttm_buffer_object, use the gem struct
> (base.vma_node) instead.
> 
> Signed-off-by: Gerd Hoffmann 
> Reviewed-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
>  drivers/gpu/drm/qxl/qxl_object.h   | 2 +-
>  drivers/gpu/drm/radeon/radeon_object.h | 2 +-
>  drivers/gpu/drm/virtio/virtgpu_drv.h   | 2 +-
>  include/drm/ttm/ttm_bo_api.h   | 4 
>  drivers/gpu/drm/drm_gem_vram_helper.c  | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_display.c  | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_gem.c  | 2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c   | 8 
>  drivers/gpu/drm/ttm/ttm_bo_util.c  | 2 +-
>  drivers/gpu/drm/ttm/ttm_bo_vm.c| 9 +
>  drivers/gpu/drm/virtio/virtgpu_prime.c | 3 ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 4 ++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_surface.c| 4 ++--
>  14 files changed, 21 insertions(+), 27 deletions(-)

Hi Gerd,

I've been seeing a regression on Nouveau with recent linux-next releases
and git bisect points at this commit as the first bad one. If I revert
it (there's a tiny conflict with a patch that was merged subsequently),
things are back to normal.

I think the reason for this issue is that Nouveau doesn't use GEM
objects for all buffer objects, and even when it uses GEM objects, the
code will not initialize the GEM object until after the buffer objects
and the backing TTM objects have been created.

I tried to fix that by making sure drm_gem_object_init() gets called by
Nouveau before ttm_bo_init(), but the changes are fairly involved and I
was unable to get the GEM reference counting right. I can look into the
proper fix some more, but it might be worth reverting this patch for
now to get Nouveau working again.

Thierry

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 645a189d365c..113fb2feb437 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -191,7 +191,7 @@ static inline unsigned 
> amdgpu_bo_gpu_page_alignment(struct amdgpu_bo *bo)
>   */
>  static inline u64 amdgpu_bo_mmap_offset(struct amdgpu_bo *bo)
>  {
> -	return drm_vma_node_offset_addr(&bo->tbo.vma_node);
> +	return drm_vma_node_offset_addr(&bo->tbo.base.vma_node);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/qxl/qxl_object.h 
> b/drivers/gpu/drm/qxl/qxl_object.h
> index b812d4ae9d0d..8ae54ba7857c 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.h
> +++ b/drivers/gpu/drm/qxl/qxl_object.h
> @@ -60,7 +60,7 @@ static inline unsigned long qxl_bo_size(struct qxl_bo *bo)
>  
>  static inline u64 qxl_bo_mmap_offset(struct qxl_bo *bo)
>  {
> -	return drm_vma_node_offset_addr(&bo->tbo.vma_node);
> +	return drm_vma_node_offset_addr(&bo->tbo.base.vma_node);
>  }
>  
>  static inline int qxl_bo_wait(struct qxl_bo *bo, u32 *mem_type,
> diff --git a/drivers/gpu/drm/radeon/radeon_object.h 
> b/drivers/gpu/drm/radeon/radeon_object.h
> index 9ffd8215d38a..e5554bf9140e 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.h
> +++ b/drivers/gpu/drm/radeon/radeon_object.h
> @@ -116,7 +116,7 @@ static inline unsigned 
> radeon_bo_gpu_page_alignment(struct radeon_bo *bo)
>   */
>  static inline u64 radeon_bo_mmap_offset(struct radeon_bo *bo)
>  {
> -	return drm_vma_node_offset_addr(&bo->tbo.vma_node);
> +	return drm_vma_node_offset_addr(&bo->tbo.base.vma_node);
>  }
>  
>  extern int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index f4ecea6054ba..e28829661724 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -396,7 +396,7 @@ static inline void virtio_gpu_object_unref(struct 
> virtio_gpu_object **bo)
>  
>  static inline u64 virtio_gpu_object_mmap_offset(struct virtio_gpu_object *bo)
>  {
> -	return drm_vma_node_offset_addr(&bo->tbo.vma_node);
> +	return drm_vma_node_offset_addr(&bo->tbo.base.vma_node);
>  }
>  
>  static inline int virtio_gpu_object_reserve(struct virtio_gpu_object *bo,
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index fa050f0328ab..7ffc50a3303d 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -152,7 +152,6 @@ struct ttm_tt;
>   * @ddestroy: List head for the delayed destroy list.
>   * @swap: List head for swap LRU list.
>   * @moving: Fence set when BO is moving
> - * @vma_node: Address space manager node.
>   * @offset: The current GPU offset, which can have different meanings
>   * depending on the memory type. For SYSTEM type memory, it should be 0.
>   * @cur_placement: Hint of current placement.
> @@ -219,9 +218,6 @@ struct ttm_buffer_object {
>*/
>  
>   struct dma_fence *moving;
> -
> - struct drm_vma_offset_node vma_node;
> -
>   unsigned 

Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-13 Thread Sean Christopherson
On Tue, Aug 13, 2019 at 02:09:51PM +0200, Paolo Bonzini wrote:
> On 13/08/19 13:57, Adalbert Lazăr wrote:
> >> The refcounting approach seems a bit backwards, and AFAICT is driven by
> >> implementing unhook via a message, which also seems backwards.  I assume
> >> hook and unhook are relatively rare events and not performance critical,
> >> so make those the restricted/slow flows, e.g. force userspace to quiesce
> >> the VM by making unhook() mutually exclusive with every vcpu ioctl() and
> >> maybe anything that takes kvm->lock. 
> >>
>> Then kvmi_ioctl_unhook() can use kthread_stop() and kvmi_recv() just needs
> >> to check kthread_should_stop().
> >>
> >> That way kvmi doesn't need to be refcounted since it's guaranteed to be
> >> alive if the pointer is non-null.  Eliminating the refcounting will clean
> >> up a lot of the code by eliminating calls to kvmi_{get,put}(), e.g.
> >> wrappers like kvmi_breakpoint_event() just check vcpu->kvmi, or maybe
> >> even get dropped altogether.
> > 
> > The unhook event has been added to cover the following case: while the
> > introspection tool runs in another VM, both VMs, the virtual appliance
> > and the introspected VM, could be paused by the user. We needed a way
> > to signal this to the introspection tool and give it time to unhook
> > (the introspected VM has to run and execute the introspection commands
> > during this phase). The receiving threads quits when the socket is closed
> > (by QEMU or by the introspection tool).

Why does closing the socket require destroying the kvmi object?  E.g. can
it be marked as defunct or whatever and only fully removed on a synchronous
unhook from userspace?  Re-hooking could either require said unhook, or
maybe reuse the existing kvmi object with a new socket.

> > It's a bit unclear how, but we'll try to get ride of the refcount object,
> > which will remove a lot of code, indeed.
> 
> You can keep it for now.  It may become clearer how to fix it after the
> event loop is cleaned up.

By event loop, do you mean the per-vCPU jobs list?

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-13 Thread David Gibson
On Tue, Aug 13, 2019 at 03:26:17PM +0200, Christoph Hellwig wrote:
> On Mon, Aug 12, 2019 at 07:51:56PM +1000, David Gibson wrote:
> > AFAICT we already kind of abuse this for the VIRTIO_F_IOMMU_PLATFORM,
> > because to handle for cases where it *is* a device limitation, we
> > assume that if the hypervisor presents VIRTIO_F_IOMMU_PLATFORM then
> > the guest *must* select it.
> > 
> > What we actually need here is for the hypervisor to present
> > VIRTIO_F_IOMMU_PLATFORM as available, but not required.  Then we need
> > a way for the platform core code to communicate to the virtio driver
> > that *it* requires the IOMMU to be used, so that the driver can select
> > or not the feature bit on that basis.
> 
> I agree with the above, but that just brings us back to the original
> issue - the whole bypass of the DMA OPS should be an option that the
> device can offer, not the other way around.  And we really need to
> fix that root cause instead of doctoring around it.

I'm not exactly sure what you mean by "device" in this context.  Do
you mean the hypervisor (qemu) side implementation?

You're right that this was the wrong way around to begin with, but as
well as being hard to change now, I don't see how it really addresses
the current problem.  The device could default to IOMMU and allow
bypass, but the driver would still need to get information from the
platform to know that it *can't* accept that option in the case of a
secure VM.  Reversed sense, but the same basic problem.

The hypervisor does not, and can not be aware of the secure VM
restrictions - only the guest side platform code knows that.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [RFC PATCH v6 14/92] kvm: introspection: handle introspection commands before returning to guest

2019-08-13 Thread Paolo Bonzini
On 13/08/19 15:54, Adalbert Lazăr wrote:
> Leaving kvm_vcpu_block() in order to handle a request such as 'pause',
> would cause the vCPU to enter the guest when resumed. Most of the
> time this does not appear to be an issue, but during early boot it
> can happen for a non-boot vCPU to start executing code from areas that
> first needed to be set up by vCPU #0.
> 
> In a particular case, vCPU #1 executed code which resided in an area
> not covered by a memslot, which caused an EPT violation that got
> turned in mmu_set_spte() into a MMIO request that required emulation.
> Unfortunately, the emulator tripped, exited to userspace and the VM
> was aborted.

Okay, this makes sense.  Maybe you want to handle KVM_REQ_INTROSPECTION
in vcpu_run rather than vcpu_enter_guest?
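
Something along these lines, perhaps (sketch only; the surrounding
logic of vcpu_run() is elided):

	static int vcpu_run(struct kvm_vcpu *vcpu)
	{
		int r;

		for (;;) {
			if (kvm_vcpu_running(vcpu))
				r = vcpu_enter_guest(vcpu);
			else
				r = vcpu_block(vcpu->kvm, vcpu);

			/* handle introspection after any blocking,
			 * before the next guest entry */
			if (kvm_check_request(KVM_REQ_INTROSPECTION, vcpu))
				kvmi_handle_requests(vcpu);

			if (r <= 0)
				break;

			/* ... signal/pending work checks elided ... */
		}

		return r;
	}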

Paolo

Re: [RFC PATCH v6 75/92] kvm: x86: disable gpa_available optimization in emulator_read_write_onepage()

2019-08-13 Thread Paolo Bonzini
On 13/08/19 16:33, Adalbert Lazăr wrote:
> On Tue, 13 Aug 2019 10:47:34 +0200, Paolo Bonzini  wrote:
>> On 09/08/19 18:00, Adalbert Lazăr wrote:
>>> If the EPT violation was caused by an execute restriction imposed by the
>>> introspection tool, gpa_available will point to the instruction pointer,
>>> not the to the read/write location that has to be used to emulate the
>>> current instruction.
>>>
>>> This optimization should be disabled only when the VM is introspected,
>>> not just because the introspection subsystem is present.
>>>
>>> Signed-off-by: Adalbert Lazăr 
>>
>> The right thing to do is to not set gpa_available for fetch failures in 
>> kvm_mmu_page_fault instead:
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 24843cf49579..1bdca40fa831 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -5364,8 +5364,12 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t 
>> cr2, u64 error_code,
>>  enum emulation_result er;
>>  bool direct = vcpu->arch.mmu->direct_map;
>>  
>> -/* With shadow page tables, fault_address contains a GVA or nGPA.  */
>> -if (vcpu->arch.mmu->direct_map) {
>> +/*
>> + * With shadow page tables, fault_address contains a GVA or nGPA.
>> + * On a fetch fault, fault_address contains the instruction pointer.
>> + */
>> +if (vcpu->arch.mmu->direct_map &&
>> +    likely(!(error_code & PFERR_FETCH_MASK))) {
>>  vcpu->arch.gpa_available = true;
>>  vcpu->arch.gpa_val = cr2;
>>  }
>
> Sure, but I think we'll have to extend the check.
> 
> Searching the logs I've found:
> 
> kvm/x86: re-translate broken translation that caused EPT violation
> 
> Signed-off-by: Mircea Cirjaliu 
> 
>  arch/x86/kvm/x86.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> /home/b/kvmi@9cad844~1/arch/x86/kvm/x86.c:4757,4762 - 
> /home/b/kvmi@9cad844/arch/x86/kvm/x86.c:4757,4763
>*/
>   if (vcpu->arch.gpa_available &&
>   emulator_can_use_gpa(ctxt) &&
> + (vcpu->arch.error_code & PFERR_GUEST_FINAL_MASK) &&
>   (addr & ~PAGE_MASK) == (vcpu->arch.gpa_val & ~PAGE_MASK)) {
>   gpa = vcpu->arch.gpa_val;
>   ret = vcpu_is_mmio_gpa(vcpu, addr, gpa, write);
> 

Yes, adding that check makes sense as well (still in kvm_mmu_page_fault).

Paolo

Re: [RFC PATCH v6 75/92] kvm: x86: disable gpa_available optimization in emulator_read_write_onepage()

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 10:47:34 +0200, Paolo Bonzini  wrote:
> On 09/08/19 18:00, Adalbert Lazăr wrote:
> > If the EPT violation was caused by an execute restriction imposed by the
> > introspection tool, gpa_available will point to the instruction pointer,
> > not the to the read/write location that has to be used to emulate the
> > current instruction.
> > 
> > This optimization should be disabled only when the VM is introspected,
> > not just because the introspection subsystem is present.
> > 
> > Signed-off-by: Adalbert Lazăr 
> 
> The right thing to do is to not set gpa_available for fetch failures in 
> kvm_mmu_page_fault instead:
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 24843cf49579..1bdca40fa831 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5364,8 +5364,12 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t 
> cr2, u64 error_code,
>   enum emulation_result er;
>   bool direct = vcpu->arch.mmu->direct_map;
>  
> - /* With shadow page tables, fault_address contains a GVA or nGPA.  */
> - if (vcpu->arch.mmu->direct_map) {
> + /*
> +  * With shadow page tables, fault_address contains a GVA or nGPA.
> +  * On a fetch fault, fault_address contains the instruction pointer.
> +  */
> + if (vcpu->arch.mmu->direct_map &&
> +     likely(!(error_code & PFERR_FETCH_MASK))) {
>   vcpu->arch.gpa_available = true;
>   vcpu->arch.gpa_val = cr2;
>   }
> 
> 
> Paolo
> 
> > ---
> >  arch/x86/kvm/x86.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 965c4f0108eb..3975331230b9 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5532,7 +5532,7 @@ static int emulator_read_write_onepage(unsigned long 
> > addr, void *val,
> >  * operation using rep will only have the initial GPA from the NPF
> >  * occurred.
> >  */
> > -   if (vcpu->arch.gpa_available &&
> > +   if (vcpu->arch.gpa_available && !kvmi_is_present() &&
> > emulator_can_use_gpa(ctxt) &&
> > (addr & ~PAGE_MASK) == (vcpu->arch.gpa_val & ~PAGE_MASK)) {
> > gpa = vcpu->arch.gpa_val;
> > 
> 

Sure, but I think we'll have to extend the check.

Searching the logs I've found:

kvm/x86: re-translate broken translation that caused EPT violation

Signed-off-by: Mircea Cirjaliu 

 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

/home/b/kvmi@9cad844~1/arch/x86/kvm/x86.c:4757,4762 - 
/home/b/kvmi@9cad844/arch/x86/kvm/x86.c:4757,4763
 */
if (vcpu->arch.gpa_available &&
emulator_can_use_gpa(ctxt) &&
+   (vcpu->arch.error_code & PFERR_GUEST_FINAL_MASK) &&
(addr & ~PAGE_MASK) == (vcpu->arch.gpa_val & ~PAGE_MASK)) {
gpa = vcpu->arch.gpa_val;
ret = vcpu_is_mmio_gpa(vcpu, addr, gpa, write);

Re: [RFC PATCH v6 02/92] kvm: introspection: add basic ioctls (hook/unhook)

2019-08-13 Thread Adalbert Lazăr
We'll do.

On Tue, 13 Aug 2019 10:44:28 +0200, Paolo Bonzini  wrote:
> On 09/08/19 17:59, Adalbert Lazăr wrote:
> > +static int kvmi_recv(void *arg)
> > +{
> > +   struct kvmi *ikvm = arg;
> > +
> > +   kvmi_info(ikvm, "Hooking VM\n");
> > +
> > +   while (kvmi_msg_process(ikvm))
> > +   ;
> > +
> > +   kvmi_info(ikvm, "Unhooking VM\n");
> > +
> > +   kvmi_end_introspection(ikvm);
> > +
> > +   return 0;
> > +}
> > +
> 
> Rename this to kvmi_recv_thread instead, please.
> 
> > +
> > +   /*
> > +* Make sure all the KVM/KVMI structures are linked and no pointer
> > +* is read as NULL after the reference count has been set.
> > +*/
> > +   smp_mb__before_atomic();
> 
> This is an smp_wmb(), not an smp_mb__before_atomic().  Add a comment
> that it pairs with the refcount_inc_not_zero in kvmi_get.
> 
> +   refcount_set(&kvm->kvmi_ref, 1);
> > +
> 
> 
> > @@ -57,8 +183,27 @@ void kvmi_destroy_vm(struct kvm *kvm)
> > if (!ikvm)
> > return;
> >  
> > +   /* trigger socket shutdown - kvmi_recv() will start shutdown process */
> > +   kvmi_sock_shutdown(ikvm);
> > +
> > kvmi_put(kvm);
> >  
> > /* wait for introspection resources to be released */
> > wait_for_completion_killable(&kvm->kvmi_completed);
> >  }
> > +
> 
> This addition means that kvmi_destroy_vm should have called
> kvmi_end_introspection instead.  In patch 1, kvmi_end_introspection
> should have been just kvmi_put, now this patch can add kvmi_sock_shutdown.
> 
> Paolo

Re: [RFC PATCH v6 13/92] kvm: introspection: make the vCPU wait even when its jobs list is empty

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 10:43:52 +0200, Paolo Bonzini  wrote:
> On 09/08/19 17:59, Adalbert Lazăr wrote:
> > +void kvmi_handle_requests(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvmi *ikvm;
> > +
> > +   ikvm = kvmi_get(vcpu->kvm);
> > +   if (!ikvm)
> > +   return;
> > +
> > +   for (;;) {
> > +   int err = kvmi_run_jobs_and_wait(vcpu);
> > +
> > +   if (err)
> > +   break;
> > +   }
> > +
> > +   kvmi_put(vcpu->kvm);
> > +}
> > +
> 
> Using kvmi_run_jobs_and_wait from two places (here and kvmi_send_event)
> is very confusing.  Does kvmi_handle_requests need to do this, or can it
> just use kvmi_run_jobs?

I think I've added this wait to block vCPUs during single-step.
A 'wait_until_single_step_finished' job will do, I guess, so we could
use kvmi_run_jobs() here.

[PATCH v4 0/2] virtio_console: fix replug of virtio console port

2019-08-13 Thread Pankaj Gupta
This patch series fixes the issue with unplug/replug of a port in the
virtio console driver, which fails with the error "Error allocating
inbufs\n".

Patch 1 updates the next avail index for the packed ring code.
Patch 2 makes use of the 'virtqueue_detach_unused_buf' function to
detach the unused buffers at port hot-unplug time.

Tested the packed ring code with the qemu virtio 1.1 device code posted
here [1]. Also, sent a patch to document the behavior in virtio spec as
suggested by Michael.

Changes from v3
- Swap patch 1 with patch 2  - [Michael]
- Added acked-by tag by Jason in patch 1
- Add reference to spec change
Changes from v2
- Better change log in patch2 - [Greg]
Changes from v1[2]
- Make virtio packed ring code compatible with split ring - [Michael]

[1]  https://marc.info/?l=qemu-devel&m=156471883703948&w=2
[2]  https://lkml.org/lkml/2019/3/4/517
[3]  https://lists.oasis-open.org/archives/virtio-dev/201908/msg00055.html

Pankaj Gupta (2):
  virtio: decrement avail idx with buffer detach for packed ring
  virtio_console: free unused buffers with port delete

 char/virtio_console.c |   14 +++++++++++---
 virtio/virtio_ring.c  |    6 ++++++
 2 files changed, 17 insertions(+), 3 deletions(-)
-- 
2.21.0



[PATCH v4 2/2] virtio_console: free unused buffers with port delete

2019-08-13 Thread Pankaj Gupta
The commit a7a69ec0d8e4 ("virtio_console: free buffers after reset")
deferred detaching of unused buffers to virtio device unplug time.
This causes unplug/replug of a single port in the virtio device to
fail with the error "Error allocating inbufs\n", as we don't free the
unused buffers attached to the port. Re-plugging the same port tries
to allocate new buffers in the virtqueue and results in this error if
the queue is full. This patch reverts that commit by removing the
unused buffers from the vqs when we unplug the port.

Reported-by: Xiaohui Li 
Cc: sta...@vger.kernel.org
Signed-off-by: Pankaj Gupta 
---
 drivers/char/virtio_console.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7270e7b69262..e8be82f1bae9 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1494,15 +1494,25 @@ static void remove_port(struct kref *kref)
kfree(port);
 }
 
+static void remove_unused_bufs(struct virtqueue *vq)
+{
+   struct port_buffer *buf;
+
+   while ((buf = virtqueue_detach_unused_buf(vq)))
+   free_buf(buf, true);
+}
+
 static void remove_port_data(struct port *port)
 {
	spin_lock_irq(&port->inbuf_lock);
	/* Remove unused data this port might have received. */
	discard_port_data(port);
+	remove_unused_bufs(port->in_vq);
	spin_unlock_irq(&port->inbuf_lock);

	spin_lock_irq(&port->outvq_lock);
	reclaim_consumed_buffers(port);
+	remove_unused_bufs(port->out_vq);
	spin_unlock_irq(&port->outvq_lock);
 }
 
@@ -1938,11 +1948,9 @@ static void remove_vqs(struct ports_device *portdev)
struct virtqueue *vq;
 
virtio_device_for_each_vq(portdev->vdev, vq) {
-   struct port_buffer *buf;
 
flush_bufs(vq, true);
-   while ((buf = virtqueue_detach_unused_buf(vq)))
-   free_buf(buf, true);
+   remove_unused_bufs(vq);
}
portdev->vdev->config->del_vqs(portdev->vdev);
kfree(portdev->in_vqs);
-- 
2.21.0



[PATCH v4 1/2] virtio: decrement avail idx with buffer detach for packed ring

2019-08-13 Thread Pankaj Gupta
This patch decrements the 'next_avail_idx' count when detaching a
buffer from the vq for the packed ring code. The split ring code
already does this in the virtqueue_detach_unused_buf_split function.
This updates 'next_avail_idx' to the previous correct index after an
unused buffer is detached from the vq.
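
For comparison, the split ring counterpart already does the equivalent
rollback; quoting virtqueue_detach_unused_buf_split() from memory
(abridged, please double-check against the tree):

	/* detach_buf_split clears data, so grab it now. */
	buf = vq->split.desc_state[i].data;
	detach_buf_split(vq, i, NULL);
	vq->split.avail_idx_shadow--;
	vq->split.vring.avail->idx =
		cpu_to_virtio16(_vq->vdev, vq->split.avail_idx_shadow);

The packed ring has no avail index to publish, but it does have the
wrap counter, hence the extra flip below when the index wraps past 0.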

Acked-by: Jason Wang 
Signed-off-by: Pankaj Gupta 
---
 drivers/virtio/virtio_ring.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4f5b55..7c69181113e2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1537,6 +1537,12 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
/* detach_buf clears data, so grab it now. */
buf = vq->packed.desc_state[i].data;
detach_buf_packed(vq, i, NULL);
+   vq->packed.next_avail_idx--;
+   if (vq->packed.next_avail_idx < 0) {
+   vq->packed.next_avail_idx = vq->packed.vring.num - 1;
+   vq->packed.avail_wrap_counter ^= 1;
+   }
+
END_USE(vq);
return buf;
}
-- 
2.20.1



Re: [RFC PATCH v6 14/92] kvm: introspection: handle introspection commands before returning to guest

2019-08-13 Thread Adalbert Lazăr
On Tue, 13 Aug 2019 10:26:29 +0200, Paolo Bonzini  wrote:
> On 09/08/19 17:59, Adalbert Lazăr wrote:
> > +   prepare_to_swait_exclusive(&vcpu->wq, &wait,
> > +  TASK_INTERRUPTIBLE);
> > +
> > +   if (kvm_vcpu_check_block(vcpu) < 0)
> > +   break;
> > +
> > +   waited = true;
> > +   schedule();
> > +
> > +   if (kvm_check_request(KVM_REQ_INTROSPECTION, vcpu)) {
> > +   do_kvmi_work = true;
> > +   break;
> > +   }
> > +   }
> >  
> > -   waited = true;
> > -   schedule();
> > +   finish_swait(&vcpu->wq, &wait);
> > +
> > +   if (do_kvmi_work)
> > +   kvmi_handle_requests(vcpu);
> > +   else
> > +   break;
> > }
> 
> Is this needed?  Or can it just go back to KVM_RUN and handle
> KVM_REQ_INTROSPECTION there (in which case it would be basically
> premature optimization)?
> 

It might still be needed, unless we can get back to this function.

The original commit message for this change was this:

kvm: do not abort kvm_vcpu_block() in order to handle KVMI requests

Leaving kvm_vcpu_block() in order to handle a request such as 'pause',
would cause the vCPU to enter the guest when resumed. Most of the
time this does not appear to be an issue, but during early boot it
can happen for a non-boot vCPU to start executing code from areas that
first needed to be set up by vCPU #0.

In a particular case, vCPU #1 executed code which resided in an area
not covered by a memslot, which caused an EPT violation that got
turned in mmu_set_spte() into a MMIO request that required emulation.
Unfortunately, the emulator tripped, exited to userspace and the VM
was aborted.

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-08-13 Thread Christoph Hellwig
On Mon, Aug 12, 2019 at 07:51:56PM +1000, David Gibson wrote:
> AFAICT we already kind of abuse this for the VIRTIO_F_IOMMU_PLATFORM,
> because to handle for cases where it *is* a device limitation, we
> assume that if the hypervisor presents VIRTIO_F_IOMMU_PLATFORM then
> the guest *must* select it.
> 
> What we actually need here is for the hypervisor to present
> VIRTIO_F_IOMMU_PLATFORM as available, but not required.  Then we need
> a way for the platform core code to communicate to the virtio driver
> that *it* requires the IOMMU to be used, so that the driver can select
> or not the feature bit on that basis.

I agree with the above, but that just brings us back to the original
issue - the whole bypass of the DMA OPS should be an option that the
device can offer, not the other way around.  And we really need to
fix that root cause instead of doctoring around it.


Re: [PATCH v4 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 08:09:26PM +0800, Tom Murphy wrote:
> Hi Christoph,
> 
> I quit my job and am having a great time traveling South East Asia.

Enjoy!  I just returned from my vacation.

> I definitely don't want this work to go to waste and I hope to repost it
> later this week but I can't guarantee it.
> 
> Let me know if you need this urgently.

It isn't in any strict sense urgent.  I just have various DMA API plans
that I'd rather just implement in dma-direct and dma-iommu rather than
also in two additional commonly used iommu drivers.  So on the one hand
the sooner the better, on the other hand no real urgency.


Re: [RFC PATCH v6 64/92] kvm: introspection: add single-stepping

2019-08-13 Thread Adalbert Lazăr
On Mon, 12 Aug 2019 13:50:39 -0700, Sean Christopherson 
 wrote:
> On Fri, Aug 09, 2019 at 07:00:19PM +0300, Adalbert Lazăr wrote:
> > From: Nicușor Cîțu 
> > 
> > This would be used either if the introspection tool request it as a
> > reply to a KVMI_EVENT_PF event or to cope with instructions that cannot
> > be handled by the x86 emulator during the handling of a VMEXIT. In
> > these situations, all other vCPU-s are kicked and held, the EPT-based
> > protection is removed and the guest is single stepped by the vCPU that
> > triggered the initial VMEXIT. Upon completion the EPT-based protection
> > is reinstalled and all vCPU-s are allowed to return to the guest.
> > 
> > This is a rather slow workaround that kicks in occasionally. In the
> > future, the most frequently single-stepped instructions should be added
> > to the emulator (usually, stores to and from memory - SSE/AVX).
> > 
> > For the moment it works only on Intel.
> > 
> > CC: Jim Mattson 
> > CC: Sean Christopherson 
> > CC: Joerg Roedel 
> > Signed-off-by: Nicușor Cîțu 
> > Co-developed-by: Mihai Donțu 
> > Signed-off-by: Mihai Donțu 
> > Co-developed-by: Adalbert Lazăr 
> > Signed-off-by: Adalbert Lazăr 
> > ---
> >  arch/x86/include/asm/kvm_host.h |   3 +
> >  arch/x86/kvm/kvmi.c |  47 ++-
> >  arch/x86/kvm/svm.c  |   5 ++
> >  arch/x86/kvm/vmx/vmx.c  |  17 
> >  arch/x86/kvm/x86.c  |  19 +
> >  include/linux/kvmi.h|   4 +
> >  virt/kvm/kvmi.c | 145 +++-
> >  virt/kvm/kvmi_int.h |  16 
> >  8 files changed, 253 insertions(+), 3 deletions(-)
> > 

[...] We'll do.

> > diff --git a/virt/kvm/kvmi_int.h b/virt/kvm/kvmi_int.h
> > index d7f9858d3e97..1550fe33ed48 100644
> > --- a/virt/kvm/kvmi_int.h
> > +++ b/virt/kvm/kvmi_int.h
> > @@ -126,6 +126,9 @@ struct kvmi_vcpu {
> > DECLARE_BITMAP(high, KVMI_NUM_MSR);
> > } msr_mask;
> >  
> > +   bool ss_owner;
> 
> Why is single-stepping mutually exclusive across all vCPUs?  Does that
> always have to be the case?

I never thought to single-step multiple vCPUs at the same time.

If one vCPU will relax the access to a guest page while a second one,
finishing single-stepping, restores the 'r--' flags, the first one
will get another page fault and relax the page access again. It might
be doable, but before starting single-stepping a vCPU we might replace
guest memory (as requested by the introspection tool) and we will have
to use a lock for this.

However, we would like to use alternate EPT views with single-step.
So, we might replace this patch.

> > +   bool ss_requested;
> > +
> > struct list_head job_list;
> > spinlock_t job_lock;
> >  
> > @@ -151,6 +154,15 @@ struct kvmi {
> > DECLARE_BITMAP(event_allow_mask, KVMI_NUM_EVENTS);
> > DECLARE_BITMAP(vm_ev_mask, KVMI_NUM_EVENTS);
> >  
> > +#define SINGLE_STEP_MAX_DEPTH 8
> > +   struct {
> > +   gfn_t gfn;
> > +   u8 old_access;
> > +   u32 old_write_bitmap;
> > +   } ss_context[SINGLE_STEP_MAX_DEPTH];
> > +   u8 ss_level;
> > +   atomic_t ss_active;

Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-13 Thread Paolo Bonzini
On 13/08/19 13:57, Adalbert Lazăr wrote:
>> The refcounting approach seems a bit backwards, and AFAICT is driven by
>> implementing unhook via a message, which also seems backwards.  I assume
>> hook and unhook are relatively rare events and not performance critical,
>> so make those the restricted/slow flows, e.g. force userspace to quiesce
>> the VM by making unhook() mutually exclusive with every vcpu ioctl() and
>> maybe anything that takes kvm->lock. 
>>
>> Then kvmi_ioctl_unhook() can use kthread_stop() and kvmi_recv() just needs
>> to check kthread_should_stop().
>>
>> That way kvmi doesn't need to be refcounted since it's guaranteed to be
>> alive if the pointer is non-null.  Eliminating the refcounting will clean
>> up a lot of the code by eliminating calls to kvmi_{get,put}(), e.g.
>> wrappers like kvmi_breakpoint_event() just check vcpu->kvmi, or maybe
>> even get dropped altogether.
> 
> The unhook event has been added to cover the following case: while the
> introspection tool runs in another VM, both VMs, the virtual appliance
> and the introspected VM, could be paused by the user. We needed a way
> to signal this to the introspection tool and give it time to unhook
> (the introspected VM has to run and execute the introspection commands
> during this phase). The receiving threads quits when the socket is closed
> (by QEMU or by the introspection tool).
> 
> It's a bit unclear how, but we'll try to get rid of the refcount object,
> which will remove a lot of code, indeed.

You can keep it for now.  It may become clearer how to fix it after the
event loop is cleaned up.

>>
>>> +void kvmi_create_vm(struct kvm *kvm)
>>> +{
>>> +   init_completion(&kvm->kvmi_completed);
>>> +   complete(&kvm->kvmi_completed);
>> Pretty sure you don't want to be calling complete() here.
> The intention was to stop the hooking ioctl until the VM is
> created. A better name for 'kvmi_completed' would have been
> 'ready_to_be_introspected', as kvmi_hook() will wait for it.
> 
> We'll see how we can get rid of the completion object.

The ioctls are not accessible while kvm_create_vm runs (only after
kvm_dev_ioctl_create_vm calls fd_install).  Even if it were, however,
you should have placed init_completion much earlier, otherwise
wait_for_completion would access uninitialized memory.
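
I.e. the invariant you need is (sketch):

	/* in kvm_create_vm(), before fd_install() publishes the fd */
	init_completion(&kvm->kvmi_completed);

	/* later, the hook ioctl can then safely do */
	wait_for_completion(&kvm->kvmi_completed);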

Paolo

Re: [PATCH v4 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-08-13 Thread Tom Murphy
Hi Christoph,

I quit my job and am having a great time traveling South East Asia.

I definitely don't want this work to go to waste and I hope to repost it
later this week but I can't guarantee it.

Let me know if you need this urgently.

Thanks,
Tom

On Sat 10 Aug 2019, 3:20 p.m. Christoph Hellwig,  wrote:

> On Sun, Jun 23, 2019 at 11:19:45PM -0700, Christoph Hellwig wrote:
> > Tom,
> >
> > next time please cc Joerg as the AMD IOMMU maintainer.
> >
> > Joerg, any chance you could review this?  Toms patches to convert the
> > AMD and Intel IOMMU drivers to the dma-iommu code are going to make my
> > life in DMA land significantly easier, so I have a vested interest
> > in this series moving forward :)
>
> Tom, can you repost the series?  Seems like there hasn't been any
> news for a month.
>

Re: DANGER WILL ROBINSON, DANGER

2019-08-13 Thread Paolo Bonzini
On 13/08/19 13:24, Matthew Wilcox wrote:
>>>
>>> This is an awfully big patch to the memory management code, buried in
>>> the middle of a gigantic series which almost guarantees nobody would
>>> look at it.  I call shenanigans.
>> Are you calling shenanigans on the patch submitter (which is gratuitous)
>> or on the KVM maintainers/reviewers?
>
> On the patch submitter, of course.  How can I possibly be criticising you
> for something you didn't do?

No idea.  "Nobody would look at it" definitely includes me though.

In any case, water under the bridge.  The submitter did duly mark the
series as RFC, I don't see anything wrong in what he did apart from not
having testcases. :)

Paolo


Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-13 Thread Jason Gunthorpe
On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:

> What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> notifier in this series.

A GUP that can't sleep can't pagefault which makes it a really weird
pattern

> Btw, back to the original question. May I know why synchronize_rcu() is not
> suitable? Consider:

We already went over this. You'd need to determine it doesn't somehow
deadlock the mm on reclaim paths. Maybe it is OK, the rcu_gp_wq is
marked WQ_MEM_RECLAIM at least..

I also think Michael was concerned about the latency spikes a long RCU
delay would cause.

Jason


Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-13 Thread Adalbert Lazăr
On Mon, 12 Aug 2019 13:20:30 -0700, Sean Christopherson wrote:
> On Fri, Aug 09, 2019 at 06:59:16PM +0300, Adalbert Lazăr wrote:
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 72fa955f4a15..f70a6a1b6814 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -96,6 +96,13 @@ config KVM_MMU_AUDIT
> >  This option adds a R/W kVM module parameter 'mmu_audit', which allows
> >  auditing of KVM MMU events at runtime.
> >  
> > +config KVM_INTROSPECTION
> > +   bool "VM Introspection"
> > +   depends on KVM && (KVM_INTEL || KVM_AMD)
> > +   help
> > +This option enables functions to control the execution of VM-s, query
> > +the state of the vCPU-s (GPR-s, MSR-s etc.).
> 
> This does a lot more than enable functions, it allows userspace to do all
> of these things *while the VM is running*.  Everything above can already
> be done by userspace.

First of all, thanks for helping us with this patch series.

Do you mean something like this?

This option enables an introspection app to control any running
VM if userspace/QEMU allows it.

> 
> The "-s" syntax is difficult to read and unnecessary, e.g. at first I
> thought VM-s was referring to a new subsystem or feature introduced by
> introspection.  VMs, vCPUs, GPRs, MSRs, etc...
> 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c38cc5eb7e73..582b0187f5a4 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -455,6 +455,10 @@ struct kvm {
> > struct srcu_struct srcu;
> > struct srcu_struct irq_srcu;
> > pid_t userspace_pid;
> > +
> > +   struct completion kvmi_completed;
> > +   refcount_t kvmi_ref;
> 
> The refcounting approach seems a bit backwards, and AFAICT is driven by
> implementing unhook via a message, which also seems backwards.  I assume
> hook and unhook are relatively rare events and not performance critical,
> so make those the restricted/slow flows, e.g. force userspace to quiesce
> the VM by making unhook() mutually exclusive with every vcpu ioctl() and
> maybe anything that takes kvm->lock. 
> 
> Then kvmi_ioctl_unhook() can use thread_stop() and kvmi_recv() just needs
> to check kthread_should_stop().
> 
> That way kvmi doesn't need to be refcounted since it's guaranteed to be
> alive if the pointer is non-null.  Eliminating the refcounting will clean
> up a lot of the code by eliminating calls to kvmi_{get,put}(), e.g.
> wrappers like kvmi_breakpoint_event() just check vcpu->kvmi, or maybe
> even get dropped altogether.

The unhook event has been added to cover the following case: while the
introspection tool runs in another VM, both VMs, the virtual appliance
and the introspected VM, could be paused by the user. We needed a way
to signal this to the introspection tool and give it time to unhook
(the introspected VM has to run and execute the introspection commands
during this phase). The receiving thread quits when the socket is closed
(by QEMU or by the introspection tool).

It's a bit unclear how, but we'll try to get rid of the refcount object,
which will remove a lot of code, indeed.

> 
> > +   void *kvmi;
> 
> Why is this a void*?  Just forward declare struct kvmi in kvmi.h.
> 
> IMO this should be 'struct kvm_introspection *introspection', similar to
> 'struct kvm_vcpu_arch arch' and 'struct kvm_vmx'.  Ditto for the vCPU
> flavor.  Local variables could be kvmi+vcpui, kvm_i+vcpu_i, or maybe
> a longer form if someone can come up with a good abbreviation?
> 
> Using 'ikvm' as the local variable name when everything else refers to
> introspection as 'kvmi' is especially funky.

Will do.

> 
> >  };
> >  
> >  #define kvm_err(fmt, ...) \
> > diff --git a/include/linux/kvmi.h b/include/linux/kvmi.h
> > new file mode 100644
> > index ..e36de3f9f3de
> > --- /dev/null
> > +++ b/include/linux/kvmi.h
> > @@ -0,0 +1,23 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __KVMI_H__
> > +#define __KVMI_H__
> > +
> > +#define kvmi_is_present() IS_ENABLED(CONFIG_KVM_INTROSPECTION)
> 
> Peeking forward a few patches, introspection should have a module param.

Like kvm.introspection=true/false ?

> The code is also inconsistent in its usage of kvmi_is_present() versus
> #ifdef CONFIG_KVM_INTROSPECTION.
> 
> And maybe kvm_is_instrospection_enabled() so that the gating function has
> a more descriptive name for first-time readers?

Right.

> > diff --git a/include/uapi/linux/kvmi.h b/include/uapi/linux/kvmi.h
> > new file mode 100644
> > index ..dbf63ad0862f
> > --- /dev/null
> > +++ b/include/uapi/linux/kvmi.h
> > @@ -0,0 +1,68 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +#ifndef _UAPI__LINUX_KVMI_H
> > +#define _UAPI__LINUX_KVMI_H
> > +
> > +/*
> > + * KVMI structures and definitions
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#define KVMI_VERSION 0x0001
> > +
> > +enum {
> > +   KVMI_EVENT_REPLY   = 0,
> > +   

Re: DANGER WILL ROBINSON, DANGER

2019-08-13 Thread Matthew Wilcox
On Tue, Aug 13, 2019 at 11:29:07AM +0200, Paolo Bonzini wrote:
> On 09/08/19 18:24, Matthew Wilcox wrote:
> > On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
> >> +++ b/include/linux/page-flags.h
> >> @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
> >>   */
> >>  #define PAGE_MAPPING_ANON 0x1
> >>  #define PAGE_MAPPING_MOVABLE  0x2
> >> +#define PAGE_MAPPING_REMOTE   0x4
> > Uh.  How do you know page->mapping would otherwise have bit 2 clear?
> > Who's guaranteeing that?
> > 
> > This is an awfully big patch to the memory management code, buried in
> > the middle of a gigantic series which almost guarantees nobody would
> > look at it.  I call shenanigans.
> 
> Are you calling shenanigans on the patch submitter (which is gratuitous)
> or on the KVM maintainers/reviewers?

On the patch submitter, of course.  How can I possibly be criticising you
for something you didn't do?


Re: DANGER WILL ROBINSON, DANGER

2019-08-13 Thread Adalbert Lazăr
On Fri, 9 Aug 2019 09:24:44 -0700, Matthew Wilcox  wrote:
> On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
> > +++ b/include/linux/page-flags.h
> > @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
> >   */
> >  #define PAGE_MAPPING_ANON  0x1
> >  #define PAGE_MAPPING_MOVABLE   0x2
> > +#define PAGE_MAPPING_REMOTE0x4
> 
> Uh.  How do you know page->mapping would otherwise have bit 2 clear?
> Who's guaranteeing that?
> 
> This is an awfully big patch to the memory management code, buried in
> the middle of a gigantic series which almost guarantees nobody would
> look at it.  I call shenanigans.
> 
> > @@ -1021,7 +1022,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
> >   * __page_set_anon_rmap - set up new anonymous rmap
> >   * @page:  Page or Hugepage to add to rmap
> >   * @vma:   VM area to add page to.
> > - * @address:   User virtual address of the mapping 
> > + * @address:   User virtual address of the mapping
> 
> And mixing in fluff changes like this is a real no-no.  Try again.
> 

No bad intentions, just overzealous.
I didn't want to hide anything from our patches.
Once we advance with the introspection patches related to KVM we'll be
back with the remote mapping patch, split and cleaned.

Thanks

Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive

2019-08-13 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 12:05:03PM +, 冉 jiang wrote:
> 
> On 2019/7/20 0:13, Michael S. Tsirkin wrote:
> > On Fri, Jul 19, 2019 at 03:31:29PM +, 冉 jiang wrote:
> >> On 2019/7/19 22:29, Jiang wrote:
> >>> On 2019/7/19 10:36, Jason Wang wrote:
>  On 2019/7/18 10:43 PM, Michael S. Tsirkin wrote:
> > On Thu, Jul 18, 2019 at 10:42:47AM -0400, Michael S. Tsirkin wrote:
> >> On Thu, Jul 18, 2019 at 10:01:05PM +0800, Jason Wang wrote:
>  On 2019/7/18 9:04 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 12:55:50PM +, 冉 jiang wrote:
> > This change makes ring buffer reclaim threshold num_free
> > configurable
> > for better performance, while it's hard coded as 1/2 * queue now.
> > According to our test with qemu + dpdk, packet dropping happens
> > when
> > the guest is not able to provide free buffer in avail ring timely.
> > Smaller value of num_free does decrease the number of packet
> > dropping
> > during our test as it makes virtio_net reclaim buffer earlier.
> >
> > At least, we should leave the value changeable to user while the
> > default value as 1/2 * queue is kept.
> >
> > Signed-off-by: jiangkidd
>  That would be one reason, but I suspect it's not the
>  true one. If you need more buffer due to jitter
>  then just increase the queue size. Would be cleaner.
> 
> 
>  However are you sure this is the reason for
>  packet drops? Do you see them dropped by dpdk
>  due to lack of space in the ring? As opposed to
>  by guest?
> 
> 
> >>> Besides those, this patch depends on the user to choose a suitable
> >>> threshold
> >>> which is not good. You need either a good value with demonstrated
> >>> numbers or
> >>> something smarter.
> >>>
> >>> Thanks
> >> I do however think that we have a problem right now: try_fill_recv can
> >> take up a long time during which the net stack does not run at all.
> >> Imagine a 1K queue - we are talking 512 packets. That's excessive.
> 
>  Yes, we will starve a fast host in this case.
> 
> 
> >>     napi poll
> >> weight solves a similar problem, so it might make sense to cap this at
> >> napi_poll_weight.
> >>
> >> Which will allow tweaking it through a module parameter as a
> >> side effect :) Maybe just do NAPI_POLL_WEIGHT.
> > Or maybe NAPI_POLL_WEIGHT/2 like we do at half the queue ;). Please
> > experiment, measure performance and let the list know
> >
> >> Need to be careful though: queues can also be small and I don't
> >> think we
> >> want to exceed queue size / 2, or maybe queue size - napi_poll_weight.
> >> Definitely must not exceed the full queue size.
> 
>  Looking at intel, it uses 16 and i40e uses 32.  It looks to me
>  NAPI_POLL_WEIGHT/2 is better.
> 
>  Jiang, want to try that and post a new patch?
> 
>  Thanks
> 
> 
> >> -- 
> >> MST
> >>> We have completed several rounds of tests with the value set to
> >>> budget (64 as the default value). It does improve a lot when pps is
> >>> below 400pps for a single stream. Let me consolidate the data and I
> >>> will send it soon. Actually, we are confident that we run out of free
> >>> buffers in the avail ring when packet dropping happens, per the
> >>> systemtap probes below:
> >>>
> >>> Just a snippet:
> >>>
> >>> probe module("virtio_ring").function("virtqueue_get_buf")
> >>> {
> >>>      x = (@cast($_vq, "vring_virtqueue")->vring->used->idx)-
> >>> (@cast($_vq, "vring_virtqueue")->last_used_idx) ---> we use this one
> >>> to verify if the queue is full, which means the guest is not able to
> >>> take buffers from the queue in time
> >>>
> >>>      if (x<0 && (x+65535)<4096)
> >>>          x = x+65535
> >>>
> >>>      if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback ==
> >>> callback_addr)
> >>>          netrxcount[x] <<< gettimeofday_s()
> >>> }
> >>>
> >>>
> >>> probe module("virtio_ring").function("virtqueue_add_inbuf")
> >>> {
> >>>      y = (@cast($vq, "vring_virtqueue")->vring->avail->idx)-
> >>> (@cast($vq, "vring_virtqueue")->vring->used->idx) ---> we use this one
> >>> to verify if we run out of free buffers in the avail ring
> >>>      if (y<0 && (y+65535)<4096)
> >>>          y = y+65535
> >>>
> >>>      if(@2=="debugon")
> >>>      {
> >>>          if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback ==
> >>> callback_addr)
> >>>          {
> >>>              netrxfreecount[y] <<< gettimeofday_s()
> >>>
> >>>              printf("no avail ring left seen, printing most recent 5
> >>> num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
> >>>              for(i=recentfreecount; i!=((recentfreecount+4) % 5);
> >>> i=((i+1) % 5))
> >>>              {
> >>>                  printf("index: %d, num free: %d\n", 

Re: [RFC PATCH v6 00/92] VM introspection

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> 
> Patches 1-20: unroll a big part of the KVM introspection subsystem,
> sent in one patch in the previous versions.
> 
> Patches 21-24: extend the current page tracking code.
> 
> Patches 25-33: make use of page tracking to support the
> KVMI_SET_PAGE_ACCESS introspection command and the KVMI_EVENT_PF event
> (on EPT violations caused by the tracking settings).
> 
> Patches 34-42: include the SPP feature (Enable Sub-page
> Write Protection Support), already sent to KVM list:
> 
>   
> https://lore.kernel.org/lkml/20190717133751.12910-1-weijiang.y...@intel.com/
> 
> Patches 43-46: add the commands needed to use SPP.
> 
> Patches 47-63: unroll almost all the rest of the introspection code.
> 
> Patches 64-67: add single-stepping, mostly as a way to overcome the
> unimplemented instructions, but also as a feature for the introspection
> tool.
> 
> Patches 68-70: cover more cases related to EPT violations.
> 
> Patches 71-73: add the remote mapping feature, allowing the introspection
> tool to map into its address space a page from guest memory.
> 
> Patch 74: add a fix to hypercall emulation.
> 
> Patches 75-76: disable some features/optimizations when the introspection
> code is present.
> 
> Patches 77-78: add trace functions for the introspection code and change
> some of the existing ones related to interrupt/exception injection.
> 
> Patches 79-92: new instructions for the x86 emulator, including cmpxchg
> fixes.

Thanks for the very good explanation.  Apart from the complicated flow
of KVM request handling and KVM reply, the main issue is the complete
lack of testcases.  There should be a kvmi_test in
tools/testing/selftests/kvm, and each patch adding a new ioctl or event
should add a new testcase.
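
For reference, a bare-bones skeleton such a test could start from
(vm_create_default() and kvm_vm_free() are existing selftest helpers;
the hook step is a placeholder for whatever ioctl the series ends up
exposing):

	#include "test_util.h"
	#include "kvm_util.h"

	#define VCPU_ID 1

	static void guest_code(void)
	{
		for (;;)
			GUEST_SYNC(0);
	}

	int main(int argc, char *argv[])
	{
		struct kvm_vm *vm = vm_create_default(VCPU_ID, 0, guest_code);

		/* placeholder: hook the introspection socket here, then
		 * exercise one command/event pair per testcase */

		kvm_vm_free(vm);
		return 0;
	}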

Paolo

Re: DANGER WILL ROBINSON, DANGER

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:24, Matthew Wilcox wrote:
> On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
>> +++ b/include/linux/page-flags.h
>> @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
>>   */
>>  #define PAGE_MAPPING_ANON   0x1
>>  #define PAGE_MAPPING_MOVABLE0x2
>> +#define PAGE_MAPPING_REMOTE 0x4
> Uh.  How do you know page->mapping would otherwise have bit 2 clear?
> Who's guaranteeing that?
> 
> This is an awfully big patch to the memory management code, buried in
> the middle of a gigantic series which almost guarantees nobody would
> look at it.  I call shenanigans.

Are you calling shenanigans on the patch submitter (which is gratuitous)
or on the KVM maintainers/reviewers?

It's not true that nobody would look at it.  Of course no one from
linux-mm is going to look at it, but the maintainer that looks at the
gigantic series is very much expected to look at it and explain to the
submitter that this patch is unacceptable as is.

In fact I shouldn't have to explain this to you; you know better than
believing that I would try to sneak it past the mm folks.  I am puzzled.

Paolo

Re: [RFC PATCH v6 74/92] kvm: x86: do not unconditionally patch the hypercall instruction during emulation

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:00, Adalbert Lazăr wrote:
> From: Mihai Donțu 
> 
> It can happen that we end up emulating the VMCALL instruction as a
> result of handling an EPT write fault. In this situation, the
> emulator will try to unconditionally patch the correct hypercall opcode
> bytes using emulator_write_emulated(). However, this last call uses the
> fault GPA (if available) or, otherwise, walks the guest page tables at
> RIP. The trouble begins when using KVMI, where we forbid the use of
> the fault GPA and fall back to the guest page-table walk: in Windows
> (8.1 and newer) the page that we try to write into is marked
> read-execute, and as such emulator_write_emulated() fails and we inject
> a write #PF, leading to a guest crash.
> 
> The fix is rather simple: check the existing instruction bytes before
> doing the patching. This does not change the normal KVM behaviour, but
> does help when using KVMI as we no longer inject a write #PF.

Fixing the hypercall is just an optimization.  Can we just hush and
return to the guest if emulator_write_emulated returns
X86EMUL_PROPAGATE_FAULT?
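
Concretely, something along these lines (sketch only, on top of the
function as patched above):

	static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
	{
		struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
		char instruction[3];
		unsigned long rip = kvm_rip_read(vcpu);
		int err;

		kvm_x86_ops->patch_hypercall(vcpu, instruction);
		err = emulator_write_emulated(ctxt, rip, instruction, 3,
					      &ctxt->exception);
		if (err == X86EMUL_PROPAGATE_FAULT)
			/* patching is only an optimization; retry in guest */
			err = X86EMUL_CONTINUE;
		return err;
	}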

Paolo

> CC: Joerg Roedel 
> Signed-off-by: Mihai Donțu 
> Signed-off-by: Adalbert Lazăr 
> ---
>  arch/x86/kvm/x86.c | 23 ---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 04b1d2916a0a..965c4f0108eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7363,16 +7363,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
>  
> +#define KVM_HYPERCALL_INSN_LEN 3
> +
>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
>  {
> + int err;
>   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
> - char instruction[3];
> + char buf[KVM_HYPERCALL_INSN_LEN];
> + char instruction[KVM_HYPERCALL_INSN_LEN];
>   unsigned long rip = kvm_rip_read(vcpu);
>  
> + err = emulator_read_emulated(ctxt, rip, buf, sizeof(buf),
> +  &ctxt->exception);
> + if (err != X86EMUL_CONTINUE)
> + return err;
> +
>   kvm_x86_ops->patch_hypercall(vcpu, instruction);
> + if (!memcmp(instruction, buf, sizeof(instruction)))
> + /*
> +  * The hypercall instruction is the correct one. Retry
> +  * its execution; maybe we got here as a result of an
> +  * event other than #UD which has been resolved in the
> +  * meantime.
> +  */
> + return X86EMUL_CONTINUE;
>  
> - return emulator_write_emulated(ctxt, rip, instruction, 3,
> - &ctxt->exception);
> + return emulator_write_emulated(ctxt, rip, instruction,
> +            sizeof(instruction), &ctxt->exception);
>  }
>  
>  static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu)
> 


Re: [RFC PATCH v6 76/92] kvm: x86: disable EPT A/D bits if introspection is present

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:00, Adalbert Lazăr wrote:
> Signed-off-by: Adalbert Lazăr 
> ---
>  arch/x86/kvm/vmx/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index dc648ba47df3..152c58b63f69 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7718,7 +7718,7 @@ static __init int hardware_setup(void)
>   !cpu_has_vmx_invept_global())
>   enable_ept = 0;
>  
> - if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
> + if (!cpu_has_vmx_ept_ad_bits() || !enable_ept || kvmi_is_present())
>   enable_ept_ad_bits = 0;
>  
>   if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
> 

Why?

Paolo

Re: [RFC PATCH v6 79/92] kvm: x86: emulate movsd xmm, m64

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:00, Adalbert Lazăr wrote:
> From: Mihai Donțu 
> 
> This is needed in order to be able to support guest code that uses movsd to
> write into pages that are marked for write tracking.
> 
> Signed-off-by: Mihai Donțu 
> Signed-off-by: Adalbert Lazăr 
> ---
>  arch/x86/kvm/emulate.c | 32 +++-
>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 34431cf31f74..9d38f892beea 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1177,6 +1177,27 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
>   return X86EMUL_CONTINUE;
>  }
>  
> +static u8 simd_prefix_to_bytes(const struct x86_emulate_ctxt *ctxt,
> +int simd_prefix)
> +{
> + u8 bytes;
> +
> + switch (ctxt->b) {
> + case 0x11:
> + /* movsd xmm, m64 */
> + /* movups xmm, m128 */
> + if (simd_prefix == 0xf2) {
> + bytes = 8;
> + break;
> + }
> + /* fallthrough */
> + default:
> + bytes = 16;
> + break;
> + }
> + return bytes;
> +}
> +
>  static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
>   struct operand *op)
>  {
> @@ -1187,7 +1208,7 @@ static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
>  
>   if (ctxt->d & Sse) {
>   op->type = OP_XMM;
> - op->bytes = 16;
> + op->bytes = ctxt->op_bytes;
>   op->addr.xmm = reg;
>   read_sse_reg(ctxt, >vec_val, reg);
>   return;
> @@ -1238,7 +1259,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>   ctxt->d & ByteOp);
>   if (ctxt->d & Sse) {
>   op->type = OP_XMM;
> - op->bytes = 16;
> + op->bytes = ctxt->op_bytes;
>   op->addr.xmm = ctxt->modrm_rm;
>   read_sse_reg(ctxt, >vec_val, ctxt->modrm_rm);
>   return rc;
> @@ -4529,7 +4550,7 @@ static const struct gprefix pfx_0f_2b = {
>  };
>  
>  static const struct gprefix pfx_0f_10_0f_11 = {
> - I(Unaligned, em_mov), I(Unaligned, em_mov), N, N,
> + I(Unaligned, em_mov), I(Unaligned, em_mov), I(Unaligned, em_mov), N,
>  };
>  
>  static const struct gprefix pfx_0f_28_0f_29 = {
> @@ -5097,7 +5118,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
>  {
>   int rc = X86EMUL_CONTINUE;
>   int mode = ctxt->mode;
> - int def_op_bytes, def_ad_bytes, goffset, simd_prefix;
> + int def_op_bytes, def_ad_bytes, goffset, simd_prefix = 0;
>   bool op_prefix = false;
>   bool has_seg_override = false;
>   struct opcode opcode;
> @@ -5320,7 +5341,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
>   ctxt->op_bytes = 4;
>  
>   if (ctxt->d & Sse)
> - ctxt->op_bytes = 16;
> + ctxt->op_bytes = simd_prefix_to_bytes(ctxt,
> +   simd_prefix);
>   else if (ctxt->d & Mmx)
>   ctxt->op_bytes = 8;
>   }
> 

Please submit all these emulator patches as a separate series, complete
with testcases for kvm-unit-tests.

Paolo

Re: [RFC PATCH v6 07/92] kvm: introspection: honor the reply option when handling the KVMI_GET_VERSION command

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> Obviously, the KVMI_GET_VERSION command must not be used when the command
> reply is disabled by a previous KVMI_CONTROL_CMD_RESPONSE command.
> 
> This commit changes the code path in order to check the reply option
> (enabled/disabled) before trying to reply to this command. If the command
> reply is disabled it will return an error to the caller. In the end, the
> receiving worker will finish and the introspection socket will be closed.
> 
> Signed-off-by: Adalbert Lazăr 
> ---
>  virt/kvm/kvmi_msg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvmi_msg.c b/virt/kvm/kvmi_msg.c
> index ea5c7e23669a..2237a6ed25f6 100644
> --- a/virt/kvm/kvmi_msg.c
> +++ b/virt/kvm/kvmi_msg.c
> @@ -169,7 +169,7 @@ static int handle_get_version(struct kvmi *ikvm,
>   memset(&rpl, 0, sizeof(rpl));
>   rpl.version = KVMI_VERSION;
>  
> - return kvmi_msg_vm_reply(ikvm, msg, 0, &rpl, sizeof(rpl));
> + return kvmi_msg_vm_maybe_reply(ikvm, msg, 0, &rpl, sizeof(rpl));
>  }
>  
>  static bool is_command_allowed(struct kvmi *ikvm, int id)
> 

Go ahead and squash this in the previous patch.

Paolo

Re: [RFC PATCH v6 06/92] kvm: introspection: add KVMI_CONTROL_CMD_RESPONSE

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> +If `now` is 1, the command reply is enabled/disabled (according to
> +`enable`) starting with the current command. For example, `enable=0`
> +and `now=1` means that the reply is disabled for this command too,
> +while `enable=0` and `now=0` means that a reply will be sent for this
> +command, but not for the next ones (until enabled back with another
> +*KVMI_CONTROL_CMD_RESPONSE*).
> +
> +This command is used by the introspection tool to disable the replies
> +for commands returning an error code only (e.g. *KVMI_SET_REGISTERS*)
> +when an error is less likely to happen. For example, the following
> +commands can be used to reply to an event with a single `write()` call:
> +
> + KVMI_CONTROL_CMD_RESPONSE enable=0 now=1
> + KVMI_SET_REGISTERS vcpu=N
> + KVMI_EVENT_REPLY   vcpu=N
> + KVMI_CONTROL_CMD_RESPONSE enable=1 now=0

I don't understand the usage.  Is there any case where you want now == 1
actually?  Can you just say that KVMI_CONTROL_CMD_RESPONSE never has a
reply, or to make now==enable?

> + if (err)
> + kvmi_warn(ikvm, "Error code %d discarded for message id %d\n",
> +   err, msg->id);
> +

Would it make sense to even close the socket if there is an error?

Paolo

Re: [RFC PATCH v6 01/92] kvm: introduce KVMI (VM introspection subsystem)

2019-08-13 Thread Paolo Bonzini
On 12/08/19 22:20, Sean Christopherson wrote:
> The refcounting approach seems a bit backwards, and AFAICT is driven by
> implementing unhook via a message, which also seems backwards.  I assume
> hook and unhook are relatively rare events and not performance critical,
> so make those the restricted/slow flows, e.g. force userspace to quiesce
> the VM by making unhook() mutually exclusive with every vcpu ioctl() and
> maybe anything that takes kvm->lock. 

The reason for the unhook event, as far as I understand, is that the
introspection appliance can poke int3 into the guest and needs an
opportunity to undo that.

I don't have a big problem with that and the refcounting, at least for
this first iteration---it can be tackled later, once the general event
loop is simplified---however I agree with the other comments that Sean
made.  Fortunately it should not be hard to apply them to the whole
patchset with search and replace on the patches themselves.

Paolo


Re: [RFC PATCH v6 70/92] kvm: x86: filter out access rights only when tracked by the introspection tool

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:00, Adalbert Lazăr wrote:
> It completes commit fd34a9518173 ("kvm: x86: consult the page tracking
> from kvm_mmu_get_page() and __direct_map()")
> 
> Signed-off-by: Adalbert Lazăr 
> ---
>  arch/x86/kvm/mmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 65b6acba82da..fd64cf1115da 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2660,6 +2660,9 @@ static void clear_sp_write_flooding_count(u64 *spte)
>  static unsigned int kvm_mmu_page_track_acc(struct kvm_vcpu *vcpu, gfn_t gfn,
>  unsigned int acc)
>  {
> + if (!kvmi_tracked_gfn(vcpu, gfn))
> + return acc;
> +
>   if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_PREREAD))
>   acc &= ~ACC_USER_MASK;
>   if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_PREWRITE) ||
> 

If this patch is always needed, then the function should be named
something like kvm_mmu_apply_introspection_access and kvmi_tracked_gfn
should be tested from the moment it is introduced.

But the commit message says nothing about _why_ it is needed, so I
cannot guess.  I would very much avoid it however.  Is it just an
optimization?

Paolo

Re: [RFC PATCH v6 27/92] kvm: introspection: use page track

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> +
> + /*
> +  * This function uses kvm->mmu_lock so it's not allowed to be
> +  * called under kvmi_put(). It can reach a deadlock if called
> +  * from kvm_mmu_load -> kvmi_tracked_gfn -> kvmi_put.
> +  */
> + kvmi_clear_mem_access(kvm);

kvmi_tracked_gfn does not exist yet.

More in general, this comment says why you are calling this here, but it
says nothing about the split of responsibility between
kvmi_end_introspection and kvmi_release.  Please add a comment for this
as soon as you add kvmi_end_introspection (which according to my earlier
review should be patch 1).

Paolo

[PATCH v4] drm/virtio: use virtio_max_dma_size

2019-08-13 Thread Gerd Hoffmann
We must make sure our scatterlist segments are not too big, otherwise
we might see swiotlb failures (happens with sev, also reproducible with
swiotlb=force).

Suggested-by: Laszlo Ersek 
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_object.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index b2da31310d24..09b526518f5a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -204,6 +204,7 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev,
.interruptible = false,
.no_wait_gpu = false
};
+   size_t max_segment;
 
/* wtf swapping */
if (bo->pages)
@@ -215,8 +216,13 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev,
if (!bo->pages)
goto out;
 
-   ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
-   nr_pages << PAGE_SHIFT, GFP_KERNEL);
+   max_segment = virtio_max_dma_size(qdev->vdev);
+   max_segment &= PAGE_MASK;
+   if (max_segment > SCATTERLIST_MAX_SEGMENT)
+   max_segment = SCATTERLIST_MAX_SEGMENT;
+   ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
+ nr_pages << PAGE_SHIFT,
+ max_segment, GFP_KERNEL);
if (ret)
goto out;
return 0;
-- 
2.18.1



Re: [RFC PATCH v6 16/92] kvm: introspection: handle events and event replies

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> 
> +  reply->padding2);
> +
> + ivcpu->reply_waiting = false;
> + return expected->error;
> +}
> +
>  /*

Is this missing a wakeup?

>  
> +static bool need_to_wait(struct kvm_vcpu *vcpu)
> +{
> + struct kvmi_vcpu *ivcpu = IVCPU(vcpu);
> +
> + return ivcpu->reply_waiting;
> +}
> +

Do you actually need this function?  It seems to me that everywhere you
call it you already have an ivcpu, so you can just access the field.

Also, "reply_waiting" means "there is a reply that is waiting".  What
you mean is "waiting_for_reply".

The overall structure of the jobs code is confusing.  The same function
kvm_run_jobs_and_wait is an infinite loop before and gets a "break"
later.  It is also not clear why kvmi_job_wait is called through a job.
 Can you have instead just kvm_run_jobs in KVM_REQ_INTROSPECTION, and
something like this instead when sending an event:

int kvmi_wait_for_reply(struct kvm_vcpu *vcpu)
{
	struct kvmi_vcpu *ivcpu = IVCPU(vcpu);
	struct swait_queue_head *wq = &vcpu->wq;
	int err;

	while (ivcpu->waiting_for_reply) {
		kvmi_run_jobs(vcpu);

		err = swait_event_killable(*wq,
			!ivcpu->waiting_for_reply ||
			!list_empty(&ivcpu->job_list));

		if (err)
			return -EINTR;
	}

	return 0;
}

?

Paolo

[PATCH v3] drm/virtio: use virtio_max_dma_size

2019-08-13 Thread Gerd Hoffmann
We must make sure our scatterlist segments are not too big, otherwise
we might see swiotlb failures (happens with sev, also reproducible with
swiotlb=force).

Suggested-by: Laszlo Ersek 
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_object.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index b2da31310d24..3d86e4b3de58 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -204,6 +204,7 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev,
.interruptible = false,
.no_wait_gpu = false
};
+   unsigned max_segment;
 
/* wtf swapping */
if (bo->pages)
@@ -215,8 +216,13 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev,
if (!bo->pages)
goto out;
 
-   ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
-   nr_pages << PAGE_SHIFT, GFP_KERNEL);
+   max_segment = virtio_max_dma_size(qdev->vdev);
+   max_segment &= PAGE_MASK;
+   if (max_segment > SCATTERLIST_MAX_SEGMENT)
+   max_segment = SCATTERLIST_MAX_SEGMENT;
+   ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
+ nr_pages << PAGE_SHIFT,
+ max_segment, GFP_KERNEL);
if (ret)
goto out;
return 0;
-- 
2.18.1



Re: [RFC PATCH v6 75/92] kvm: x86: disable gpa_available optimization in emulator_read_write_onepage()

2019-08-13 Thread Paolo Bonzini
On 09/08/19 18:00, Adalbert Lazăr wrote:
> If the EPT violation was caused by an execute restriction imposed by the
> introspection tool, gpa_available will point to the instruction pointer,
> not the to the read/write location that has to be used to emulate the
> current instruction.
> 
> This optimization should be disabled only when the VM is introspected,
> not just because the introspection subsystem is present.
> 
> Signed-off-by: Adalbert Lazăr 

The right thing to do is to not set gpa_available for fetch failures in 
kvm_mmu_page_fault instead:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24843cf49579..1bdca40fa831 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5364,8 +5364,12 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
enum emulation_result er;
bool direct = vcpu->arch.mmu->direct_map;
 
-   /* With shadow page tables, fault_address contains a GVA or nGPA.  */
-   if (vcpu->arch.mmu->direct_map) {
+   /*
+* With shadow page tables, fault_address contains a GVA or nGPA.
+* On a fetch fault, fault_address contains the instruction pointer.
+*/
+   if (vcpu->arch.mmu->direct_map &&
+	    likely(!(error_code & PFERR_FETCH_MASK))) {
vcpu->arch.gpa_available = true;
vcpu->arch.gpa_val = cr2;
}


Paolo

> ---
>  arch/x86/kvm/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 965c4f0108eb..3975331230b9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5532,7 +5532,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
>* operation using rep will only have the initial GPA from the NPF
>* occurred.
>*/
> - if (vcpu->arch.gpa_available &&
> + if (vcpu->arch.gpa_available && !kvmi_is_present() &&
>   emulator_can_use_gpa(ctxt) &&
>   (addr & ~PAGE_MASK) == (vcpu->arch.gpa_val & ~PAGE_MASK)) {
>   gpa = vcpu->arch.gpa_val;
> 


Re: [RFC PATCH v6 02/92] kvm: introspection: add basic ioctls (hook/unhook)

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> +static int kvmi_recv(void *arg)
> +{
> + struct kvmi *ikvm = arg;
> +
> + kvmi_info(ikvm, "Hooking VM\n");
> +
> + while (kvmi_msg_process(ikvm))
> + ;
> +
> + kvmi_info(ikvm, "Unhooking VM\n");
> +
> + kvmi_end_introspection(ikvm);
> +
> + return 0;
> +}
> +

Rename this to kvmi_recv_thread instead, please.

> +
> + /*
> +  * Make sure all the KVM/KVMI structures are linked and no pointer
> +  * is read as NULL after the reference count has been set.
> +  */
> + smp_mb__before_atomic();

This is an smp_wmb(), not an smp_mb__before_atomic().  Add a comment
that it pairs with the refcount_inc_not_zero in kvmi_get.
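
Concretely, something like this (sketch; the publish helper is made up,
the fields are the ones introduced by this patch):

	static void kvmi_publish(struct kvm *kvm, struct kvmi *ikvm)
	{
		kvm->kvmi = ikvm;
		/* pairs with refcount_inc_not_zero() in kvmi_get() */
		smp_wmb();
		refcount_set(&kvm->kvmi_ref, 1);
	}

	static struct kvmi *kvmi_get(struct kvm *kvm)
	{
		if (refcount_inc_not_zero(&kvm->kvmi_ref))
			return kvm->kvmi;	/* published fields are visible */
		return NULL;
	}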

> + refcount_set(&kvm->kvmi_ref, 1);
> +


> @@ -57,8 +183,27 @@ void kvmi_destroy_vm(struct kvm *kvm)
>   if (!ikvm)
>   return;
>  
> + /* trigger socket shutdown - kvmi_recv() will start shutdown process */
> + kvmi_sock_shutdown(ikvm);
> +
>   kvmi_put(kvm);
>  
>   /* wait for introspection resources to be released */
>   wait_for_completion_killable(&kvm->kvmi_completed);
>  }
> +

This addition means that kvmi_destroy_vm should have called
kvmi_end_introspection instead.  In patch 1, kvmi_end_introspection
should have been just kvmi_put; now this patch can add kvmi_sock_shutdown.

Paolo

Re: [RFC PATCH v6 13/92] kvm: introspection: make the vCPU wait even when its jobs list is empty

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> +void kvmi_handle_requests(struct kvm_vcpu *vcpu)
> +{
> + struct kvmi *ikvm;
> +
> + ikvm = kvmi_get(vcpu->kvm);
> + if (!ikvm)
> + return;
> +
> + for (;;) {
> + int err = kvmi_run_jobs_and_wait(vcpu);
> +
> + if (err)
> + break;
> + }
> +
> + kvmi_put(vcpu->kvm);
> +}
> +

Using kvmi_run_jobs_and_wait from two places (here and kvmi_send_event)
is very confusing.  Does kvmi_handle_requests need to do this, or can it
just use kvmi_run_jobs?

Paolo

Re: [RFC PATCH v6 14/92] kvm: introspection: handle introspection commands before returning to guest

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> + prepare_to_swait_exclusive(&vcpu->wq, &wait,
> +    TASK_INTERRUPTIBLE);
> +
> + if (kvm_vcpu_check_block(vcpu) < 0)
> + break;
> +
> + waited = true;
> + schedule();
> +
> + if (kvm_check_request(KVM_REQ_INTROSPECTION, vcpu)) {
> + do_kvmi_work = true;
> + break;
> + }
> + }
>  
> - waited = true;
> - schedule();
> + finish_swait(&vcpu->wq, &wait);
> +
> + if (do_kvmi_work)
> + kvmi_handle_requests(vcpu);
> + else
> + break;
>   }

Is this needed?  Or can it just go back to KVM_RUN and handle
KVM_REQ_INTROSPECTION there (in which case it would be basically
premature optimization)?
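
That would amount to checking it like any other request in
vcpu_enter_guest(), e.g. (minimal sketch):

	if (kvm_check_request(KVM_REQ_INTROSPECTION, vcpu))
		kvmi_handle_requests(vcpu);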

Paolo

[PATCH 1/2] drm/virtio: cleanup queue functions

2019-08-13 Thread Gerd Hoffmann
Make the queue functions return void, none of
the call sites checks the return value.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 41 ++---
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 7ac20490e1b4..ca91e83ffaef 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -252,8 +252,8 @@ void virtio_gpu_dequeue_cursor_func(struct work_struct *work)
 	wake_up(&vgdev->cursorq.ack_queue);
 }
 
-static int virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
-					       struct virtio_gpu_vbuffer *vbuf)
+static void virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
+						struct virtio_gpu_vbuffer *vbuf)
 	__releases(&vgdev->ctrlq.qlock)
 	__acquires(&vgdev->ctrlq.qlock)
 {
@@ -263,7 +263,7 @@ static int virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
 	int ret;
 
 	if (!vgdev->vqs_ready)
-		return -ENODEV;
+		return;
 
 	sg_init_one(&vcmd, vbuf->buf, vbuf->size);
 	sgs[outcnt + incnt] = &vcmd;
@@ -294,30 +294,22 @@ static int virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
 
 		virtqueue_kick(vq);
 	}
-
-	if (!ret)
-		ret = vq->num_free;
-	return ret;
 }
 
-static int virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev,
-					struct virtio_gpu_vbuffer *vbuf)
+static void virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev,
+					 struct virtio_gpu_vbuffer *vbuf)
 {
-	int rc;
-
 	spin_lock(&vgdev->ctrlq.qlock);
-	rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
+	virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
 	spin_unlock(&vgdev->ctrlq.qlock);
-	return rc;
 }
 
-static int virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
-					       struct virtio_gpu_vbuffer *vbuf,
-					       struct virtio_gpu_ctrl_hdr *hdr,
-					       struct virtio_gpu_fence *fence)
+static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
+						struct virtio_gpu_vbuffer *vbuf,
+						struct virtio_gpu_ctrl_hdr *hdr,
+						struct virtio_gpu_fence *fence)
 {
 	struct virtqueue *vq = vgdev->ctrlq.vq;
-	int rc;
 
 again:
 	spin_lock(&vgdev->ctrlq.qlock);
@@ -338,13 +330,12 @@ static int virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
 
 	if (fence)
 		virtio_gpu_fence_emit(vgdev, hdr, fence);
-	rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
+	virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
 	spin_unlock(&vgdev->ctrlq.qlock);
-	return rc;
 }
 
-static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
-				   struct virtio_gpu_vbuffer *vbuf)
+static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
+				    struct virtio_gpu_vbuffer *vbuf)
 {
 	struct virtqueue *vq = vgdev->cursorq.vq;
 	struct scatterlist *sgs[1], ccmd;
@@ -352,7 +343,7 @@ static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 	int outcnt;
 
 	if (!vgdev->vqs_ready)
-		return -ENODEV;
+		return;
 
 	sg_init_one(&ccmd, vbuf->buf, vbuf->size);
 	sgs[0] = &ccmd;
@@ -374,10 +365,6 @@ static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 	}
 
 	spin_unlock(&vgdev->cursorq.qlock);
-
-	if (!ret)
-		ret = vq->num_free;
-	return ret;
 }
 
 /* just create gem objects for userspace and long lived objects,
-- 
2.18.1



[PATCH 2/2] drm/virtio: notify virtqueues without holding spinlock

2019-08-13 Thread Gerd Hoffmann
Split virtqueue_kick() call into virtqueue_kick_prepare(), which
requires serialization, and virtqueue_notify(), which does not.  Move
the virtqueue_notify() call out of the critical section protected by the
queue lock.  This avoids triggering a vmexit while holding the lock and
thereby fixes a rather bad spinlock contention.

Suggested-by: Chia-I Wu 
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index ca91e83ffaef..e41c96143342 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -252,7 +252,7 @@ void virtio_gpu_dequeue_cursor_func(struct work_struct *work)
 	wake_up(&vgdev->cursorq.ack_queue);
 }
 
-static void virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
+static bool virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
 						struct virtio_gpu_vbuffer *vbuf)
 	__releases(&vgdev->ctrlq.qlock)
 	__acquires(&vgdev->ctrlq.qlock)
@@ -260,10 +260,11 @@ static void virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
 	struct virtqueue *vq = vgdev->ctrlq.vq;
 	struct scatterlist *sgs[3], vcmd, vout, vresp;
 	int outcnt = 0, incnt = 0;
+	bool notify = false;
 	int ret;
 
 	if (!vgdev->vqs_ready)
-		return;
+		return notify;
 
 	sg_init_one(&vcmd, vbuf->buf, vbuf->size);
 	sgs[outcnt + incnt] = &vcmd;
@@ -292,16 +293,21 @@ static void virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
 		trace_virtio_gpu_cmd_queue(vq,
 			(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-		virtqueue_kick(vq);
+		notify = virtqueue_kick_prepare(vq);
 	}
+	return notify;
 }
 
 static void virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev,
 					 struct virtio_gpu_vbuffer *vbuf)
 {
+	bool notify;
+
 	spin_lock(&vgdev->ctrlq.qlock);
-	virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
+	notify = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
 	spin_unlock(&vgdev->ctrlq.qlock);
+	if (notify)
+		virtqueue_notify(vgdev->ctrlq.vq);
 }
 
 static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
@@ -310,6 +316,7 @@ static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
 					struct virtio_gpu_fence *fence)
 {
 	struct virtqueue *vq = vgdev->ctrlq.vq;
+	bool notify;
 
 again:
 	spin_lock(&vgdev->ctrlq.qlock);
@@ -330,8 +337,10 @@ static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
 
 	if (fence)
 		virtio_gpu_fence_emit(vgdev, hdr, fence);
-	virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
+	notify = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
 	spin_unlock(&vgdev->ctrlq.qlock);
+	if (notify)
+		virtqueue_notify(vgdev->ctrlq.vq);
 }
 
 static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
@@ -339,6 +348,7 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 {
 	struct virtqueue *vq = vgdev->cursorq.vq;
 	struct scatterlist *sgs[1], ccmd;
+	bool notify;
 	int ret;
 	int outcnt;
 
@@ -361,10 +371,13 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 		trace_virtio_gpu_cmd_queue(vq,
 			(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-		virtqueue_kick(vq);
+		notify = virtqueue_kick_prepare(vq);
 	}
 
 	spin_unlock(&vgdev->cursorq.qlock);
+
+	if (notify)
+		virtqueue_notify(vq);
 }
 
 /* just create gem objects for userspace and long lived objects,
-- 
2.18.1



Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-13 Thread Jason Wang


On 2019/8/12 5:49 PM, Michael S. Tsirkin wrote:

On Mon, Aug 12, 2019 at 10:44:51AM +0800, Jason Wang wrote:

On 2019/8/11 1:52 AM, Michael S. Tsirkin wrote:

On Fri, Aug 09, 2019 at 01:48:42AM -0400, Jason Wang wrote:

Hi all:

This series tries to fix several issues introduced by the metadata
acceleration series. Please review.

Changes from V4:
- switch to using a spinlock to synchronize the MMU notifier with accessors

Changes from V3:
- remove the unnecessary patch

Changes from V2:
- use seqlock helper to synchronize MMU notifier with vhost worker

Changes from V1:
- try not to use RCU to synchronize MMU notifier with vhost worker
- set dirty pages after no readers
- return -EAGAIN only when we find the range is overlapped with
metadata

Jason Wang (9):
vhost: don't set uaddr for invalid address
vhost: validate MMU notifier registration
vhost: fix vhost map leak
vhost: reset invalidate_count in vhost_set_vring_num_addr()
vhost: mark dirty pages during map uninit
vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
vhost: do not use RCU to synchronize MMU notifier with worker
vhost: correctly set dirty pages in MMU notifiers callback
vhost: do not return -EAGAIN for non blocking invalidation too early

   drivers/vhost/vhost.c | 202 +-
   drivers/vhost/vhost.h |   6 +-
   2 files changed, 122 insertions(+), 86 deletions(-)

This generally looks more solid.

But this amounts to a significant overhaul of the code.

At this point how about we revert 7f466032dc9e5a61217f22ea34b2df932786bbfc
for this release, and then re-apply a corrected version
for the next one?


If possible, considering we've actually disabled the feature, how about
just queueing those patches for the next release?

Thanks

Sorry if I was unclear. My idea is that
1. I revert the disabled code
2. You send a patch readding it with all the fixes squashed
3. Maybe optimizations on top right away?
4. We queue *that* for next and see what happens.

And the advantage over the patchy approach is that the current patches
are hard to review. E.g.  it's not reasonable to ask RCU guys to review
the whole of vhost for RCU usage but it's much more reasonable to ask
about a specific patch.



Ok. Then I agree to revert.

Thanks


Re: [RFC PATCH v6 26/92] kvm: x86: add kvm_mmu_nested_pagefault()

2019-08-13 Thread Paolo Bonzini
On 09/08/19 17:59, Adalbert Lazăr wrote:
> +static bool vmx_nested_pagefault(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->arch.exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)
> + return false;
> + return true;
> +}
> +

This hook is misnamed; it has nothing to do with nested virtualization.
Rather, it returns true if the failure happened while translating
the address of a guest page table.

SVM makes the same information available in EXITINFO[33].
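
An SVM counterpart could then look something like this (sketch only;
the function name is made up, the bit number follows the APM):

	static bool svm_fault_on_guest_page_table(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		/* EXITINFO1 bit 33: fault during guest page-table walk */
		return !!(svm->vmcb->control.exit_info_1 & BIT_ULL(33));
	}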

Paolo