Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction

2012-12-03 Thread Xiao Guangrong
Hi Marcelo,

Thanks for your patience. I have been reading your reply over and over again, and I
would like to argue the point a bit more :).
Please see below.

On 11/29/2012 08:21 AM, Marcelo Tosatti wrote:

 
 https://lkml.org/lkml/2012/11/17/75
 
 Does unshadowing work with large sptes at reexecute_instruction? That
 is, do we nuke any large read-only sptes that might be causing a certain
 gfn to be read-only?
 
 That is, following the sequence there, is the large read-only spte
 properly destroyed?

Actually, sptes cannot prevent a gfn from becoming writable; that is, the gfn
can become writable *even if* there exists a spte which maps to it.

What does prevent a gfn from becoming writable is the gfn having been shadowed:
the gfn cannot become writable if there exists a sp with sp.gfn == gfn.

Note, drop_spte does not remove any sp, so destroying or keeping the spte has
no effect on gfn write-protection.

If I am lucky and my point is right, the current code can be optimized as
below:

	if (level > PT_PAGE_TABLE_LEVEL &&
-	    has_wrprotected_page(vcpu->kvm, gfn, level)) {
-		ret = 1;
-		drop_spte(vcpu->kvm, sptep);
-		goto done;
-	}
+	    has_wrprotected_page(vcpu->kvm, gfn, level))
+		return 0;


1): we can return 0 instead of 1 to avoid unnecessary emulation; the vcpu will
refault again and kvm will then use a small page.

2): no change to the spte is needed.
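For illustration, the proposed fast path can be sketched as a small self-contained C program. PT_PAGE_TABLE_LEVEL mirrors the kernel constant, but gfn_is_shadowed() and map_gfn() are hypothetical stand-ins for has_wrprotected_page() and the set_spte() path, not the real KVM code:

```c
#include <assert.h>
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL 1   /* smallest (4K) mapping level, as in KVM */

/*
 * Hypothetical stand-in for has_wrprotected_page(): true when the gfn is
 * shadowed (i.e. an sp with sp.gfn == gfn exists) at the given level.
 */
static bool gfn_is_shadowed(int gfn, int level)
{
    return gfn == 42 && level > PT_PAGE_TABLE_LEVEL;
}

/*
 * Sketch of the proposed change: instead of emulating (ret = 1) and
 * dropping the spte, simply return 0 so the vcpu refaults and the fault
 * path retries with a small (PT_PAGE_TABLE_LEVEL) mapping.
 */
static int map_gfn(int gfn, int level)
{
    if (level > PT_PAGE_TABLE_LEVEL && gfn_is_shadowed(gfn, level))
        return 0;   /* no emulation; the refault will use a small page */
    return 1;       /* mapping installed at the requested level */
}
```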

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

2012-12-03 Thread Glauber Costa
On 11/30/2012 09:55 PM, Pavel Emelyanov wrote:
 Hello,
 
 This is an attempt to implement support for memory snapshot for the
 checkpoint-restore project (http://criu.org).
 
 To create a dump of an application(s) we save all the information about it
 to files. No surprise, the biggest part of such dump is the contents of tasks'
 memory. However, in some usage scenarios it's not required to get _all_ the
 task memory while creating a dump. For example, when doing periodic dumps
 it's required to take a full memory dump only at the first step, and then
 take incremental changes of memory. Another example is live migration. In the
 simplest form it looks like -- create a dump, copy it to the remote node, then
 restore tasks from the dump files. While all this dump-copy-restore activity
 is in progress, the processes must be stopped. However, if we can monitor how
 tasks change their memory, we can dump and copy it in smaller chunks,
 periodically updating it and thus freezing tasks only at the very end, for a
 very short time, to pick up the recent changes.
 
 That said, some help from kernel to watch how processes modify the contents of
 their memory is required. I'd like to propose one possible solution of this
 task -- with the help of page-faults and trace events.
 
 Briefly, the approach is -- remap some memory regions as read-only, get the #pf
 on a task's attempt to modify the memory, and issue a trace event for it. Since
 we're only interested in parts of the memory of some tasks, make it possible
 to mark the vmas we're interested in and issue events for them only. Also, to
 be aware of tasks unmapping the vmas being watched, issue an event when a
 marked vma is removed (and, for symmetry, an event when a vma is marked).
 
 What do you think about this approach? Is this way of supporting mem snapshot
 OK for you, or should we invent some better one?
 

The page fault mechanism is pretty obvious - anything that deals with
dirty pages will end up having to do this. So there is nothing crazy
about this.

What concerns me, however, is that should this go in, we'll have two
dirty memory loggers in the kernel: one to support CRIU, one to support
KVM. And the worst part: they have exactly the same purpose!

So to begin with, I think one thing to consider would be to generalize
KVM's dirty memory notification so it can work on a normal process
memory region. The KVM API requires a memory slot to be passed, something
we are unlikely to have. But KVM can easily keep its API and use an
alternate mechanism internally; that's trivial...

Generally speaking, KVM will do polling with this ioctl. I like your
tracing mechanism better. The only difference is that KVM tends to
transfer large chunks of memory under some workloads - in the high-gigabyte
range. So the proposed tracing API should be able to optionally batch
requests within a time frame.
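For comparison, KVM's existing interface (the KVM_GET_DIRTY_LOG ioctl) hands user space a per-slot dirty bitmap, one bit per page, which the consumer then walks in a polling fashion. A minimal sketch of just that consuming side (collect_dirty_pages() is a hypothetical helper, not a KVM API; the LSB-first word layout is an assumption for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical consumer of a KVM-style dirty bitmap (one bit per page,
 * least-significant bit first within each 64-bit word).  Records the
 * indices of the dirty pages into 'out' and returns how many were found.
 */
static size_t collect_dirty_pages(const uint64_t *bitmap, size_t npages,
                                  size_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++)
        if (bitmap[i / 64] & (1ULL << (i % 64)))
            out[n++] = i;
    return n;
}
```

This is exactly the polling style contrasted above with the proposed trace-event approach: the caller must rescan the whole bitmap each round, instead of being notified per write.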

It would also be good to hear what the KVM guys think of it.





Re: [PATCH v3 1/4] x86: PIT connects to pin 2 of IOAPIC

2012-12-03 Thread Gleb Natapov
On Mon, Dec 03, 2012 at 03:01:01PM +0800, Yang Zhang wrote:
 When the PIT connects to the IOAPIC, it routes to pin 2, not pin 0.
 
This hack didn't work for a long time and nobody complained. It
should be safe to get rid of it.

 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
 index cfb7e4d..166c450 100644
 --- a/virt/kvm/ioapic.c
 +++ b/virt/kvm/ioapic.c
 @@ -181,7 +181,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
 irq)
  
  #ifdef CONFIG_X86
   /* Always delivery PIT interrupt to vcpu 0 */
 - if (irq == 0) {
 + if (irq == 2) {
   irqe.dest_mode = 0; /* Physical mode. */
   /* need to read apic_id from apic regiest since
* it can be rewritten */
 -- 
 1.7.1

--
Gleb.


Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
 On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
  Jason Wang jasow...@redhat.com writes:
   +static const struct ethtool_ops virtnet_ethtool_ops;
   +
   +/*
   + * Converting between virtqueue no. and kernel tx/rx queue no.
   + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
   + */
   +static int vq2txq(struct virtqueue *vq)
   +{
   + int index = virtqueue_get_queue_index(vq);
   + return index == 1 ? 0 : (index - 2) / 2;
   +}
   +
   +static int txq2vq(int txq)
   +{
   + return txq ? 2 * txq + 2 : 1;
   +}
   +
   +static int vq2rxq(struct virtqueue *vq)
   +{
   + int index = virtqueue_get_queue_index(vq);
   + return index ? (index - 1) / 2 : 0;
   +}
   +
   +static int rxq2vq(int rxq)
   +{
   + return rxq ? 2 * rxq + 1 : 0;
   +}
   +
  
  I thought MST changed the proposed spec to make the control queue always
  the last one, so this logic becomes trivial.
 
But it may break support for legacy guests. If we boot a legacy single-queue
guest on a 2-queue virtio-net device, it may think vq 2 is the cvq, when it is
in fact rx1.

Legacy guest support should be handled by the host using feature
bits in the usual way: the host should detect a legacy guest
by checking the VIRTIO_NET_F_RFS feature.

If VIRTIO_NET_F_RFS is acked, the cvq is vq max_virtqueue_pairs * 2.
If it's not acked, the cvq is vq 2.
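The two numbering schemes under discussion can be compared in a small sketch: below are the queue-index helpers from Jason's patch, rewritten as plain functions of the raw vq index (instead of taking a struct virtqueue *), next to a hypothetical cvq_index() helper expressing the feature-bit rule described above:

```c
#include <assert.h>
#include <stdbool.h>

/* Patch layout: 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN.
 * These mirror vq2txq()/txq2vq()/vq2rxq()/rxq2vq() from the patch. */
static int vq2txq(int index) { return index == 1 ? 0 : (index - 2) / 2; }
static int txq2vq(int txq)   { return txq ? 2 * txq + 2 : 1; }
static int vq2rxq(int index) { return index ? (index - 1) / 2 : 0; }
static int rxq2vq(int rxq)   { return rxq ? 2 * rxq + 1 : 0; }

/*
 * Hypothetical helper expressing MST's rule: a guest that did not ack
 * VIRTIO_NET_F_RFS sees the legacy layout with the cvq at vq 2;
 * otherwise the cvq sits after all the rx/tx pairs.
 */
static int cvq_index(bool rfs_acked, int max_virtqueue_pairs)
{
    return rfs_acked ? max_virtqueue_pairs * 2 : 2;
}
```

The helpers round-trip for any queue number, which is the property the non-trivial `? :` logic exists to preserve around the fixed cvq slot.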



Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Jason Wang
On 12/03/2012 05:47 PM, Michael S. Tsirkin wrote:
 On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
 On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
 Jason Wang jasow...@redhat.com writes:
 +static const struct ethtool_ops virtnet_ethtool_ops;
 +
 +/*
 + * Converting between virtqueue no. and kernel tx/rx queue no.
 + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
 + */
 +static int vq2txq(struct virtqueue *vq)
 +{
 +  int index = virtqueue_get_queue_index(vq);
 +  return index == 1 ? 0 : (index - 2) / 2;
 +}
 +
 +static int txq2vq(int txq)
 +{
 +  return txq ? 2 * txq + 2 : 1;
 +}
 +
 +static int vq2rxq(struct virtqueue *vq)
 +{
 +  int index = virtqueue_get_queue_index(vq);
 +  return index ? (index - 1) / 2 : 0;
 +}
 +
 +static int rxq2vq(int rxq)
 +{
 +  return rxq ? 2 * rxq + 1 : 0;
 +}
 +
 I thought MST changed the proposed spec to make the control queue always
 the last one, so this logic becomes trivial.
 But it may break support for legacy guests. If we boot a legacy single-queue
 guest on a 2-queue virtio-net device, it may think vq 2 is the cvq, when it
 is in fact rx1.
 Legacy guest support should be handled by the host using feature
 bits in the usual way: the host should detect a legacy guest
 by checking the VIRTIO_NET_F_RFS feature.

 If VIRTIO_NET_F_RFS is acked, cvq is vq max_virtqueue_pairs * 2.
 If it's not acked, cvq is vq 2.


We could, but we wouldn't gain much from this. Furthermore, we would also need
to do dynamic creation/destruction of virtqueues during feature negotiation,
which does not seem to be supported in qemu now.


Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Michael S. Tsirkin
On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
 - if (!try_fill_recv(&vi->rq, GFP_KERNEL))
 - schedule_delayed_work(&vi->rq.refill, 0);
 + for (i = 0; i < vi->max_queue_pairs; i++)
 + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 + schedule_delayed_work(&vi->rq[i].refill, 0);
  
   mutex_lock(&vi->config_lock);
   vi->config_enable = true;
   mutex_unlock(&vi->config_lock);
  
 + BUG_ON(virtnet_set_queues(vi));
 +
   return 0;
  }
  #endif

Also, crashing on a device nack of the command is not nice.
In this case it seems we can just switch to
single-queue mode, which should always be safe.
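A rough sketch of that fallback shape, with the driver structures and virtnet_set_queues() reduced to hypothetical stubs (this only illustrates the error-handling pattern replacing the BUG_ON(), not the real driver code):

```c
#include <assert.h>
#include <stdbool.h>

struct virtnet_info_stub {
    int curr_queue_pairs;
    int max_queue_pairs;
    bool device_nacks;      /* stand-in for the ctrl-vq command failing */
};

/* Stub for virtnet_set_queues(): 0 on success, nonzero on device nack. */
static int virtnet_set_queues(struct virtnet_info_stub *vi, int pairs)
{
    if (vi->device_nacks && pairs > 1)
        return -1;
    vi->curr_queue_pairs = pairs;
    return 0;
}

/*
 * Instead of BUG_ON() when restoring multiqueue mode fails, fall back
 * to single-queue mode, which the device should always accept.
 */
static int virtnet_restore_queues(struct virtnet_info_stub *vi)
{
    if (virtnet_set_queues(vi, vi->max_queue_pairs) == 0)
        return 0;
    return virtnet_set_queues(vi, 1);
}
```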


Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Jason Wang
On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
 On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
  -  if (!try_fill_recv(&vi->rq, GFP_KERNEL))
  -  schedule_delayed_work(&vi->rq.refill, 0);
  +  for (i = 0; i < vi->max_queue_pairs; i++)
  +  if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
  +  schedule_delayed_work(&vi->rq[i].refill, 0);
   
 	mutex_lock(&vi->config_lock);
 	vi->config_enable = true;
 	mutex_unlock(&vi->config_lock);
   
  +  BUG_ON(virtnet_set_queues(vi));
  +
 return 0;
   }
   #endif
 Also, crashing on a device nack of the command is not nice.
 In this case it seems we can just switch to
 single-queue mode, which should always be safe.

I'm not sure it's safe; it depends on why this call fails. If we are left in
a state where the driver uses a single queue but the device uses multiple
queues, we may still lose the network.


Re: [PATCH v4 08/14] KVM: ARM: World-switch implementation

2012-12-03 Thread Marc Zyngier
On 30/11/12 18:49, Christoffer Dall wrote:
 On Fri, Nov 30, 2012 at 12:14 PM, Will Deacon will.dea...@arm.com wrote:
 On Fri, Nov 30, 2012 at 04:47:40PM +, Christoffer Dall wrote:
 On Fri, Nov 30, 2012 at 10:15 AM, Will Deacon will.dea...@arm.com wrote:
 At this point, VM1 is running and VM0:VCPU1 is running. VM0:VCPU0 is not
 running because physical CPU0 is handling an interrupt. The problem is that
 when VCPU0 *is* resumed, it will update the VMID of VM0 and could be
 scheduled in parallel with VCPU1 but with a different VMID.

 How do you avoid this in the current code?

 I don't. Nice catch. Please apply your interesting brain to the following 
 fix:)

 I'm far too sober to look at your patch right now, but I'll think about it
 over the weekend [I can't break it at a quick glance] :)

 In the meantime, can you think about whether the TLB operations need to run
 on every CPU please?

 they don't; we can invalidate the TLB and the icache using the inner
 shareability domain. Here's a patch:
 
 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index ad1390f..df1b753 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -146,6 +146,7 @@ struct kvm_one_reg;
  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
  u64 kvm_call_hyp(void *hypfn, ...);
 +void force_vm_exit(const cpumask_t *mask);
 
  #define KVM_ARCH_WANT_MMU_NOTIFIER
  struct kvm;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index c4f631e..674592e 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -405,9 +405,14 @@ int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
  	return v->mode == IN_GUEST_MODE;
  }
 
 -static void reset_vm_context(void *info)
 +/* Just ensure a guest exit from a particular CPU */
 +static void exit_vm_noop(void *info)
  {
 - kvm_call_hyp(__kvm_flush_vm_context);
 +}
 +
 +void force_vm_exit(const cpumask_t *mask)
 +{
 + smp_call_function_many(mask, exit_vm_noop, NULL, true);
  }

Care to update the do_nothing() call in emulate.c to use this as well?

M.
-- 
Jazz is not dead. It just smells funny...



Re: [PATCH v4 05/13] ARM: KVM: VGIC accept vcpu and dist base addresses from user space

2012-12-03 Thread Will Deacon
On Sat, Dec 01, 2012 at 02:52:13AM +, Christoffer Dall wrote:
 On Wed, Nov 28, 2012 at 8:11 AM, Will Deacon will.dea...@arm.com wrote:
  On Sat, Nov 10, 2012 at 03:44:51PM +, Christoffer Dall wrote:
  + kvm->arch.vgic.vgic_dist_base = addr;
  + break;
  + case KVM_VGIC_V2_ADDR_TYPE_CPU:
  + if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_cpu_base))
  + return -EEXIST;
  + if (addr + VGIC_CPU_SIZE < addr)
  + return -EINVAL;
  + kvm->arch.vgic.vgic_cpu_base = addr;
  + break;
  + default:
  + r = -ENODEV;
  + }
  +
  + if (vgic_ioaddr_overlap(kvm)) {
  + kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
  + kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
  + return -EINVAL;
 
  Perhaps we could put all the address checking in one place, so that the 
  wrapping
  round zero checks and the overlap checks can be in the same function?
 
 
 Like this (?):

Almost:

 diff --git a/Documentation/virtual/kvm/api.txt
 b/Documentation/virtual/kvm/api.txt
 index 7f057a2..0b6b95e 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2163,6 +2163,7 @@ Errors:
ENXIO:  Device not supported on current system
EEXIST: Address already set
E2BIG:  Address outside guest physical address space
 +  EBUSY:  Address overlaps with other device range
 
  struct kvm_device_address {
   __u64 id;
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index f697c14..660fe24 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -1230,11 +1230,28 @@ static bool vgic_ioaddr_overlap(struct kvm *kvm)
   phys_addr_t cpu = kvm->arch.vgic.vgic_cpu_base;
 
   if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
 - return false;
 + return 0;
   if ((dist <= cpu && dist + VGIC_DIST_SIZE > cpu) ||
   (cpu <= dist && cpu + VGIC_CPU_SIZE > dist))
 - return true;
 - return false;
 + return -EBUSY;
 + return 0;
 +}
 +
 +static int vgic_ioaddr_assign(struct kvm *kvm, phys_addr_t *ioaddr,
 +   phys_addr_t addr, phys_addr_t size)
 +{
 + int ret;
 +
 + if (!IS_VGIC_ADDR_UNDEF(*ioaddr))
 + return -EEXIST;
 + if (addr + size < addr)
 + return -EINVAL;
 +
 + ret = vgic_ioaddr_overlap(kvm);
 + if (ret)
 + return ret;
 + *ioaddr = addr;
 + return ret;
  }
 
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
 @@ -1251,29 +1268,21 @@ int kvm_vgic_set_addr(struct kvm *kvm,
 unsigned long type, u64 addr)
 mutex_lock(&kvm->lock);
   switch (type) {
   case KVM_VGIC_V2_ADDR_TYPE_DIST:
 - if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_dist_base))
 - return -EEXIST;
 - if (addr + VGIC_DIST_SIZE < addr)
 - return -EINVAL;
 - kvm->arch.vgic.vgic_dist_base = addr;
 + r = vgic_ioaddr_assign(kvm, &vgic->vgic_dist_base,
 +addr, VGIC_DIST_SIZE);
   break;
   case KVM_VGIC_V2_ADDR_TYPE_CPU:
   if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_cpu_base))
   return -EEXIST;
   if (addr + VGIC_CPU_SIZE < addr)
   return -EINVAL;
 - kvm->arch.vgic.vgic_cpu_base = addr;
 + r = vgic_ioaddr_assign(kvm, &vgic->vgic_cpu_base,
 +addr, VGIC_CPU_SIZE);

You've left the checking in here, so now everything is checked twice.
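Folded together, both case arms would reduce to a single call into the helper. The sketch below is a compilable userspace rendition of the consolidated logic (the types, constants, and global base addresses are stand-ins for the kernel ones); note it tentatively assigns the address before the overlap check and rolls back on failure, since the overlap test needs to see the new value:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define VGIC_ADDR_UNDEF       ((phys_addr_t)-1)
#define IS_VGIC_ADDR_UNDEF(a) ((a) == VGIC_ADDR_UNDEF)
#define VGIC_DIST_SIZE        0x1000   /* illustrative sizes only */
#define VGIC_CPU_SIZE         0x2000

static phys_addr_t dist_base = VGIC_ADDR_UNDEF;
static phys_addr_t cpu_base  = VGIC_ADDR_UNDEF;

static int vgic_ioaddr_overlap(void)
{
    if (IS_VGIC_ADDR_UNDEF(dist_base) || IS_VGIC_ADDR_UNDEF(cpu_base))
        return 0;
    if ((dist_base <= cpu_base && dist_base + VGIC_DIST_SIZE > cpu_base) ||
        (cpu_base <= dist_base && cpu_base + VGIC_CPU_SIZE > dist_base))
        return -EBUSY;
    return 0;
}

/* All the checking lives here; the switch arms do nothing but call this. */
static int vgic_ioaddr_assign(phys_addr_t *ioaddr,
                              phys_addr_t addr, phys_addr_t size)
{
    if (!IS_VGIC_ADDR_UNDEF(*ioaddr))
        return -EEXIST;
    if (addr + size < addr)         /* wrap around zero */
        return -EINVAL;

    *ioaddr = addr;                 /* tentative, so overlap sees it */
    if (vgic_ioaddr_overlap()) {
        *ioaddr = VGIC_ADDR_UNDEF;  /* roll back on overlap */
        return -EBUSY;
    }
    return 0;
}
```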

Also, you should probably touch base with Marc, as I'm under the impression
that both of you are looking at addressing my comments, so it would be good
to avoid tripping over each other on this.

Will


[PATCHv5] virtio-spec: virtio network device RFS support

2012-12-03 Thread Michael S. Tsirkin
Add RFS support to virtio network device.
Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
packet steering for unidirectional protocols.

---

Changes from v5:
- Address Rusty's comments.
  Changes are only in the text, not the ideas.
- Some minor formatting changes.

Changes from v4:
- address Jason's comments
- have configuration specify the number of VQ pairs and not pairs - 1

Changes from v3:
- rename multiqueue - rfs this is what we support
- Be more explicit about what driver should do.
- Simplify layout making VQs functionality depend on feature.
- Remove unused commands, only leave in programming # of queues

Changes from v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
   this is both clearer and easier to support.
   It does not look like we need a separate steering command
   since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2

Changes from Jason's rfc:
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
   looks nicer to me.
- documented packet steering, added a generalized steering programming
   command. Current modes are single queue and host driven multiqueue,
   but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
   but makes it more efficient to switch between modes as
   we can avoid this causing packet reordering.

Rusty, could you please take a look and comment soon?
If this looks OK to everyone, we can proceed with finalizing the
implementation. Would be nice to try and put it in 3.8.

 virtio-spec.lyx | 310 ++--
 1 file changed, 301 insertions(+), 9 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 83f2771..119925c 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
 \author -608949062 Rusty Russell,,, 
 \author -385801441 Cornelia Huck cornelia.h...@de.ibm.com
 \author 1531152142 Paolo Bonzini,,, 
+\author 1986246365 Michael S. Tsirkin 
 \end_header
 
 \begin_body
@@ -4170,9 +4171,46 @@ ID 1
 \end_layout
 
 \begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+ 
+\change_inserted 1986246365 1352742952
+.
+ 
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531595
+N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is derived from 
+\emph on
+max_virtqueue_pairs
+\emph default
+ control
+\emph on
+ 
+\emph default
+field.
+ 
+\end_layout
+
+\end_inset
+
+: receivqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
 \begin_inset Foot
 status open
 
@@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_RFS(22) Device supports Receive Flow Steering.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4355,11 +4403,45 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1354531413
+Three
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: 
VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1354531470
+ The following read-only field, 
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_RFS is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively; 
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs - 1
+\emph default
+) that can be configured once VIRTIO_NET_F_RFS is negotiated.
+ Legal values for this field are 1 to 8000h.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4392,6 +4474,17 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
 u16 status;
+\change_inserted 1986246365 1354531427
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531437
+
+u16 max_virtqueue_pairs;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4410,7 +4503,24 @@ Device Initialization
 
 

Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 06:30:49PM +0800, Jason Wang wrote:
 On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
  On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
   -if (!try_fill_recv(&vi->rq, GFP_KERNEL))
   -schedule_delayed_work(&vi->rq.refill, 0);
   +for (i = 0; i < vi->max_queue_pairs; i++)
   +if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
   +schedule_delayed_work(&vi->rq[i].refill, 0);

 mutex_lock(&vi->config_lock);
 vi->config_enable = true;
 mutex_unlock(&vi->config_lock);

   +BUG_ON(virtnet_set_queues(vi));
   +
return 0;
}
#endif
  Also, crashing on a device nack of the command is not nice.
  In this case it seems we can just switch to
  single-queue mode, which should always be safe.
 
 I'm not sure it's safe; it depends on why this call fails. If we are left
 in a state where the driver uses a single queue but the device uses
 multiple queues, we may still lose the network.

Not the way the driver is currently written: you'll happily
process incoming packets from all queues, so no problem?

-- 
MST


Re: [PATCH v4 09/14] KVM: ARM: Emulation framework and CP15 emulation

2012-12-03 Thread Will Deacon
On Fri, Nov 30, 2012 at 08:22:28PM +, Christoffer Dall wrote:
 On Mon, Nov 19, 2012 at 10:01 AM, Will Deacon will.dea...@arm.com wrote:
  On Sat, Nov 10, 2012 at 03:43:13PM +, Christoffer Dall wrote:
  This should probably be 0xff and also use the macros that Lorenzo is
  introducing:
 

  http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/132977.html
 
 
 in the A15 TRM bits [7:2] are reserved, so we really only care
 about bits [1:0], and this file is A15-specific, but if you prefer, I
 can merge this:

The upper bits are RAZ, so yes, please do merge something like this.

 diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
 index 55cb4a3..685063a 100644
 --- a/arch/arm/kvm/coproc_a15.c
 +++ b/arch/arm/kvm/coproc_a15.c
 @@ -24,8 +24,6 @@
  #include asm/kvm_coproc.h
  #include linux/init.h
 
 -#define MPIDR_CPUID	0x3
 -
  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
  {
   /*
 @@ -35,8 +33,8 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const
 struct coproc_reg *r)
* revealing the underlying hardware properties is likely to
* be the best choice).
*/
 - vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_CPUID)
 - | (vcpu->vcpu_id & MPIDR_CPUID);
 + vcpu->arch.cp15[c0_MPIDR] = (read_cpuid_mpidr() & ~MPIDR_LEVEL_MASK)
 + | (vcpu->vcpu_id & MPIDR_LEVEL_MASK);
  }
 
  #include coproc.h

Will


Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 06:01:58PM +0800, Jason Wang wrote:
 On 12/03/2012 05:47 PM, Michael S. Tsirkin wrote:
  On Mon, Dec 03, 2012 at 02:05:27PM +0800, Jason Wang wrote:
  On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
  Jason Wang jasow...@redhat.com writes:
  +static const struct ethtool_ops virtnet_ethtool_ops;
  +
  +/*
  + * Converting between virtqueue no. and kernel tx/rx queue no.
  + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
  + */
  +static int vq2txq(struct virtqueue *vq)
  +{
  +int index = virtqueue_get_queue_index(vq);
  +return index == 1 ? 0 : (index - 2) / 2;
  +}
  +
  +static int txq2vq(int txq)
  +{
  +return txq ? 2 * txq + 2 : 1;
  +}
  +
  +static int vq2rxq(struct virtqueue *vq)
  +{
  +int index = virtqueue_get_queue_index(vq);
  +return index ? (index - 1) / 2 : 0;
  +}
  +
  +static int rxq2vq(int rxq)
  +{
  +return rxq ? 2 * rxq + 1 : 0;
  +}
  +
  I thought MST changed the proposed spec to make the control queue always
  the last one, so this logic becomes trivial.
   But it may break support for legacy guests. If we boot a legacy
   single-queue guest on a 2-queue virtio-net device, it may think vq 2 is
   the cvq, when it is in fact rx1.
   Legacy guest support should be handled by the host using feature
   bits in the usual way: the host should detect a legacy guest
   by checking the VIRTIO_NET_F_RFS feature.
 
  If VIRTIO_NET_F_RFS is acked, cvq is vq max_virtqueue_pairs * 2.
  If it's not acked, cvq is vq 2.
 
 
 We could, but we wouldn't gain much from this.

It just seems cleaner and easier to understand.

 Furthermore, we would also need to do dynamic creation/destruction of
 virtqueues during feature negotiation, which does not seem to be
 supported in qemu now.

It's not *done* in qemu now, but it seems easy: just call
virtio_add_queue for vq 2 and onwards from virtio_net_set_features.
As features can be modified multiple times, we
should add virtio_del_queue and call that beforehand
to get to a known state (two vqs).

-- 
MST


Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 01:15:01PM +0800, Jason Wang wrote:
   +
   + /* Work struct for refilling if we run low on memory. */
   + struct delayed_work refill;
 
  I can't really see the justification for a refill per queue. Just have
  one work iterate all the queues if it happens, unless it happens often
  (in which case, we need to look harder at this anyway).
 
 But during this kind of iteration, we may need to enable/disable napi
 regardless of whether the receive queue has much to refill. This may add
 extra latency.

We are running from the timer, so latency is not a concern I think.
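Rusty's single-work-item suggestion might look roughly like the sketch below, with the queue structures stubbed out (MIN_CREDITS and the refill policy are invented for illustration; the per-queue napi disable/enable cost Jason mentions is only noted in a comment):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rq_stub {
    int credits;            /* buffers currently posted to the device */
};

#define MIN_CREDITS 16      /* invented threshold for "running low" */

/* Stub for try_fill_recv(): repost buffers; may fail under memory pressure. */
static bool try_fill_recv(struct rq_stub *rq)
{
    rq->credits = 64;
    return true;
}

/*
 * One work item iterating every receive queue and refilling only those
 * that ran low.  In the real driver, refilling a queue would have to be
 * bracketed by napi_disable()/napi_enable() on that queue -- the extra
 * latency Jason is worried about.
 */
static int refill_work(struct rq_stub *rqs, size_t nqueues)
{
    int refilled = 0;

    for (size_t i = 0; i < nqueues; i++) {
        if (rqs[i].credits >= MIN_CREDITS)
            continue;               /* this queue needs no refill */
        if (try_fill_recv(&rqs[i]))
            refilled++;
    }
    return refilled;
}
```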

-- 
MST


[RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support

2012-12-03 Thread Dong Aisheng
From: Dong Aisheng dong.aish...@linaro.org

Add vgic state save and restore via KVM_SET_IRQCHIP/KVM_GET_IRQCHIP.

Signed-off-by: Dong Aisheng dong.aish...@linaro.org
---
The patch mainly implements the vgic state save and restore functions.
I'm not sure the current implementation method is the most proper way,
so I'd like to send out the patch as an RFC on the direction and the issues I
met during the implementation.

There are mainly two ways to implement such functionality, and each has its
own issues.
1) Firstly I tried using the ONE_REG interface, through which user space could
read all the virtual gic dist registers and gic virtual interface control
registers for saving gic state. (The virtual cpu interface registers
are not exported since their state can be obtained from the GICH_VMCR register,
which already reflects the basic cpu interface state.)
One way to do this is to manually emulate MMIO accesses in the
kernel to reuse the vgic_handle_mmio code, since the vgic code in the kernel
does not maintain gic state in register format and we cannot simply memcpy
the kernel gic registers to user space.

The problem with this approach is that emulating register accesses looks
somewhat inefficient, and via the current ONE_REG interface we do not know
which CPU is performing the register access, so banked registers are not
supported well, e.g. some banked dist registers and banked cpu virtual
interface control registers.
So I'm not sure we should still follow this way.

2) Another way is, as on x86, to use the standard KVM_SET_IRQCHIP/KVM_GET_IRQCHIP
ioctls to save and restore irq chip state, e.g. arch/x86/kvm/x86.c.
This method does not have the banked-register issue since it can all be
handled in software, and it looks simpler than the first way.
The problem is that the x86 kvm irqchip driver uses almost the same data
structures as user space to represent irqchip state, which can then easily
be copied to user space. But the arm vgic driver uses different data
structures: e.g. the qemu gic common code uses gic_state, while the in-kernel
vgic code uses a number of vgic_bitmaps.
We could either read all the in-kernel vgic_bitmaps from user space and
manually convert them to gic_state, so that user space can reuse the existing
common state-handling code in qemu, or convert to the gic_state format in the
kernel and then export that to user space; but neither way looks good, and
both look inefficient, so I cannot see enough reason to do so.

The current implementation in this patch does not do any conversion
between the two different state formats; it simply exports what we have
in the kernel to represent gic state to user space via KVM_GET_IRQCHIP,
which means we do not use much of the common state-handling code in
arm_gic_common.c in qemu.

However, compared to the first method, the disadvantage of the second method
is that it has a side effect: once the kernel vgic state code changes, the
user space code may also need to change accordingly, since user space
uses the kernel's state definition.
Another known issue in the current patch: VGIC_MAX_CPUS, which defines the
size of the CPU data in the dist state, is used in the header file
arch/arm/include/asm/kvm.h (which will also be used by qemu) but is dynamically
defined in the kernel. It looks like we should not export it to user space, and
we need to find a way to fix that. Besides that, there are also some header-file
cross-inclusion issues that need to be resolved.

I'd like to know your suggestions on which way is better and your comments
on the issues I mentioned above.
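As a rough illustration of the conversion being weighed in option 2, here is a Python sketch. All names are hypothetical: the kernel side keeps per-property vgic_bitmaps (one bit per IRQ), while a qemu-style gic_state view wants one record per interrupt.

```python
# Toy model of the bitmap <-> flat-state conversion discussed above.
# Purely illustrative; not the kernel's or qemu's actual data layout.

def bitmaps_to_flat_state(nr_irqs, enabled_bits, pending_bits, active_bits):
    """Expand per-property bitmaps into a per-IRQ list of records."""
    state = []
    for irq in range(nr_irqs):
        mask = 1 << irq
        state.append({
            "irq": irq,
            "enabled": bool(enabled_bits & mask),
            "pending": bool(pending_bits & mask),
            "active": bool(active_bits & mask),
        })
    return state

def flat_state_to_bitmaps(state):
    """Inverse conversion, as a restore path would need."""
    enabled = pending = active = 0
    for rec in state:
        mask = 1 << rec["irq"]
        if rec["enabled"]:
            enabled |= mask
        if rec["pending"]:
            pending |= mask
        if rec["active"]:
            active |= mask
    return enabled, pending, active
```

Whichever side performs it (kernel or user space), the conversion round-trips, which is what a save/restore path ultimately needs.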
---
 Documentation/virtual/kvm/api.txt |6 +-
 arch/arm/include/asm/kvm.h|   11 +++
 arch/arm/include/asm/kvm_vgic.h   |   47 +
 arch/arm/kernel/asm-offsets.c |   16 ++--
 arch/arm/kvm/arm.c|   83 ++-
 arch/arm/kvm/interrupts_head.S|6 +-
 arch/arm/kvm/vgic.c   |  134 ++--
 include/linux/kvm.h   |3 +
 include/linux/kvm_host.h  |3 +-
 9 files changed, 213 insertions(+), 96 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7695425..e736689 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -654,7 +654,7 @@ struct kvm_irq_level {
 4.26 KVM_GET_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irqchip (in/out)
 Returns: 0 on success, -1 on error
@@ -669,6 +669,7 @@ struct kvm_irqchip {
char dummy[512];  /* reserving space */
struct kvm_pic_state pic;
struct kvm_ioapic_state ioapic;
+   struct kvm_vgic_state vgic;
} chip;
 };
 
@@ -676,7 +677,7 @@ struct kvm_irqchip {
 4.27 KVM_SET_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irqchip (in)
 Returns: 0 on success, -1 on error
@@ -691,6 +692,7 @@ 

Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 02:09:28PM +0800, Jason Wang wrote:
 On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote:
  On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote:
   This patch implements the {set|get}_channels methods of ethtool to allow the
   user to change the number of queues dynamically while the device is running.
   This lets the user configure it on demand.
   
   Signed-off-by: Jason Wang jasow...@redhat.com
   ---
   
drivers/net/virtio_net.c |   41 +
1 files changed, 41 insertions(+), 0 deletions(-)
   
   diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
   index bcaa6e5..f08ec2a 100644
   --- a/drivers/net/virtio_net.c
   +++ b/drivers/net/virtio_net.c
   @@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = {
   
#endif
};
   
   +/* TODO: Eliminate OOO packets during switching */
   +static int virtnet_set_channels(struct net_device *dev,
   + struct ethtool_channels *channels)
   +{
   + struct virtnet_info *vi = netdev_priv(dev);
    + u16 queue_pairs = channels->combined_count;

by the way shouldn't this be combined_count / 2?

And below channels->max_combined = vi->max_queue_pairs * 2; ?

   +
   + /* We don't support separate rx/tx channels.
   +  * We don't allow setting 'other' channels.
   +  */
    + if (channels->rx_count || channels->tx_count || channels->other_count)
   + return -EINVAL;
   +
    + /* Only two modes are supported currently */
    + if (queue_pairs != vi->max_queue_pairs && queue_pairs != 1)
   + return -EINVAL;
   +
  
  Why the limitation?
 
 Not sure a value between 1 and max_queue_pairs is useful. But anyway, I can 
 remove this limitation.

I guess a reasonable value would be number of active CPUs,
which can change with CPU hotplug.

  Also how does userspace discover what the legal values are?
 
 Userspace only checks whether the value is greater than max_queue_pairs.

Exactly. One can query max combined but the fact that configuring
any less is impossible might be surprising.
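The constraint under debate can be sketched in Python: a model of the checks in the patch as posted (before the 1..max restriction is lifted), with EINVAL as a stand-in error code.

```python
EINVAL = 22  # stand-in for the kernel's -EINVAL

def virtnet_set_channels_check(combined, rx, tx, other, max_queue_pairs):
    """Model of the validation in the proposed virtnet_set_channels()."""
    # Separate rx/tx channels and 'other' channels are not supported.
    if rx or tx or other:
        return -EINVAL
    # As posted, only 1 or max_queue_pairs is accepted -- any value in
    # between (e.g. the number of online CPUs) is rejected, which is the
    # surprising behaviour being pointed out.
    if combined != 1 and combined != max_queue_pairs:
        return -EINVAL
    return 0
```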

  
    + vi->curr_queue_pairs = queue_pairs;
   + BUG_ON(virtnet_set_queues(vi));
   +
    + netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
    + netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
   +
   + return 0;
   +}
   +
   +static void virtnet_get_channels(struct net_device *dev,
   +  struct ethtool_channels *channels)
   +{
   + struct virtnet_info *vi = netdev_priv(dev);
   +
    + channels->combined_count = vi->curr_queue_pairs;
    + channels->max_combined = vi->max_queue_pairs;
    + channels->max_other = 0;
    + channels->rx_count = 0;
    + channels->tx_count = 0;
    + channels->other_count = 0;
   +}
   +
   
static const struct ethtool_ops virtnet_ethtool_ops = {

 .get_drvinfo = virtnet_get_drvinfo,
 .get_link = ethtool_op_get_link,
 .get_ringparam = virtnet_get_ringparam,
   
   + .set_channels = virtnet_set_channels,
   + .get_channels = virtnet_get_channels,
   
};

static int __init init(void)
  
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support

2012-12-03 Thread Gleb Natapov
On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
 Virtual interrupt delivery avoids the need for KVM to inject vAPIC interrupts
 manually; this is fully taken care of by the hardware. It needs
 some special awareness in the existing interrupt injection path:
 
 - for pending interrupt, instead of direct injection, we may need
   update architecture specific indicators before resuming to guest.
 
 - A pending interrupt which is masked by ISR should also be
   considered in the above update action, since the hardware will decide
   when to inject it at the right time. Currently has_interrupt and
   get_interrupt only return a valid vector from the injection p.o.v.
 
Most of my previous comments still apply.

 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 Signed-off-by: Kevin Tian kevin.t...@intel.com
 ---
  arch/x86/include/asm/kvm_host.h |4 +
  arch/x86/include/asm/vmx.h  |   11 +++
  arch/x86/kvm/irq.c  |   53 ++-
  arch/x86/kvm/lapic.c|   56 +---
  arch/x86/kvm/lapic.h|6 ++
  arch/x86/kvm/svm.c  |   19 +
  arch/x86/kvm/vmx.c  |  140 
 ++-
  arch/x86/kvm/x86.c  |   34 --
  virt/kvm/ioapic.c   |1 +
  9 files changed, 291 insertions(+), 33 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index dc87b65..e5352c8 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -697,6 +697,10 @@ struct kvm_x86_ops {
   void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
   void (*enable_irq_window)(struct kvm_vcpu *vcpu);
   void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
 + void (*update_irq)(struct kvm_vcpu *vcpu);
 + void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector,
 + int trig_mode, int always_set);
   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
   int (*get_tdp_level)(void);
   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index 21101b6..1003341 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -62,6 +62,7 @@
  #define EXIT_REASON_MCE_DURING_VMENTRY  41
  #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
  #define EXIT_REASON_APIC_ACCESS 44
 +#define EXIT_REASON_EOI_INDUCED 45
  #define EXIT_REASON_EPT_VIOLATION   48
  #define EXIT_REASON_EPT_MISCONFIG   49
  #define EXIT_REASON_WBINVD  54
 @@ -143,6 +144,7 @@
  #define SECONDARY_EXEC_WBINVD_EXITING0x0040
  #define SECONDARY_EXEC_UNRESTRICTED_GUEST0x0080
  #define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200
  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING0x0400
  #define SECONDARY_EXEC_ENABLE_INVPCID0x1000
  
 @@ -180,6 +182,7 @@ enum vmcs_field {
   GUEST_GS_SELECTOR   = 0x080a,
   GUEST_LDTR_SELECTOR = 0x080c,
   GUEST_TR_SELECTOR   = 0x080e,
 + GUEST_INTR_STATUS   = 0x0810,
   HOST_ES_SELECTOR= 0x0c00,
   HOST_CS_SELECTOR= 0x0c02,
   HOST_SS_SELECTOR= 0x0c04,
 @@ -207,6 +210,14 @@ enum vmcs_field {
   APIC_ACCESS_ADDR_HIGH   = 0x2015,
   EPT_POINTER = 0x201a,
   EPT_POINTER_HIGH= 0x201b,
 + EOI_EXIT_BITMAP0= 0x201c,
 + EOI_EXIT_BITMAP0_HIGH   = 0x201d,
 + EOI_EXIT_BITMAP1= 0x201e,
 + EOI_EXIT_BITMAP1_HIGH   = 0x201f,
 + EOI_EXIT_BITMAP2= 0x2020,
 + EOI_EXIT_BITMAP2_HIGH   = 0x2021,
 + EOI_EXIT_BITMAP3= 0x2022,
 + EOI_EXIT_BITMAP3_HIGH   = 0x2023,
   GUEST_PHYSICAL_ADDRESS  = 0x2400,
   GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401,
   VMCS_LINK_POINTER   = 0x2800,
 diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
 index 7e06ba1..f782788 100644
 --- a/arch/x86/kvm/irq.c
 +++ b/arch/x86/kvm/irq.c
 @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
   */
  int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
  {
 - struct kvm_pic *s;
 -
 - if (!irqchip_in_kernel(v->kvm))
 - return v->arch.interrupt.pending;
 
 - if (kvm_apic_has_interrupt(v) == -1) {  /* LAPIC */
 - if (kvm_apic_accept_pic_intr(v)) {
 - s = pic_irqchip(v->kvm);/* PIC */
 - return s->output;
 - } else
 - return 0;
 - }
 + if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */
 + return 

Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support

2012-12-03 Thread Peter Maydell
On 3 December 2012 10:36, Dong Aisheng b29...@freescale.com wrote:
 The patch is mainly for implementing the vgic state save and restore function.
 I'm not sure the current implementation method is the most proper way.
 So I'd like to send out the patch for RFC on the direction and issues I met
 during the implementation.

 There are mainly two ways to implement such a function, and the different
 ways have different issues respectively.
 1) Firstly I tried using the ONE_REG interface so that user space could
 read all virtual gic dist registers and gic virtual interface control
 registers for saving gic state. (The virtual CPU interface registers
 are not exported since their state can be got from the GICH_VMCR register,
 which already reflects the basic cpu interface state.)
 The way to do this could be to manually emulate MMIO access in the
 kernel to re-use the vgic_handle_mmio code, since the vgic code in the kernel
 does not maintain gic state in register format and we cannot simply memcpy
 the kernel gic registers to user space.

 For this way, the problem is that it looks a bit inefficient to emulate
 register access,

Efficiency is really not a significant concern here, because state
save and restore only happens when doing VM migration, which is
already a slow process (mostly dominated by the costs of moving the
VM memory from one machine to another).

 and via the current ONE_REG interface we do not know which CPU
 is performing the register access, so the banked registers are not
 supported well,

Actually you do, because it's a vcpu ioctl. That does raise a
different question, though. ONE_REG is currently aimed as a vcpu
ioctl for CPU state save/restore -- how does it need to change
to handle device state save/restore where the device is not per-cpu?

 e.g. some banked dist registers and banked cpu virtual interface
 control registers.
 So I'm not sure we should still follow this way.

You can handle banked registers the same way we handle the cache
ID registers in the main CPU. In general ONE_REG is not supposed
to expose direct access to registers which behave like the hardware
registers: it is supposed to expose access to registers which
merely store state, ie which can be idempotently saved and loaded.
Sometimes this means the kernel has to invent its own registers which
don't have side effects.

 2) Another way is, as on x86, to use the standard KVM_SET_IRQCHIP/KVM_GET_IRQCHIP
 ioctls to save and restore irq chip state, e.g. arch/x86/kvm/x86.c.
 This method does not have the banked-register issue since it can all be
 handled by SW, and it looks simpler than the first way.
 The problem is that the x86 kvm irqchip driver uses almost the same data
 structure as user space to represent irqchip state, which can then be easily
 copied to user space. But the ARM vgic driver uses a different data structure:
 qemu's common gic code uses gic_state while the in-kernel vgic code uses
 a number of vgic_bitmaps.
 We could either read all the in-kernel vgic_bitmaps from user space and
 manually convert them to gic_state in user space, to re-use the existing
 common state handling code in qemu, or convert to the gic_state format in the
 kernel and then export that to user space, but neither way looks good and both
 look inefficient, so I cannot see enough reasons to do so.

Somebody needs to do this conversion, or TCG-to-KVM migrations won't work.
I don't have a strong opinion whether it ought to be userspace or kernel;
I think the deciding factor is probably going to be which is the easiest
for the kernel to support into the future as a stable ABI (even as GIC
versions change etc).

 The current implementation in this patch does not do any conversion
 between the two different state formats; it simply exports what we have
 in the kernel to represent gic state to user space via KVM_GET_IRQCHIP,
 which means we do not use much of the common state handling code in
 arm_gic_common.c in qemu.

This is definitely the wrong approach, I'm afraid. I still strongly
feel we should be using the standard ONE_REG interfaces here (though
there is perhaps an argument to be made against this, along the lines
of my remarks above about it being a vcpu ioctl).

 However, compared to the first method, the disadvantage of the second method
 is that it has a side effect: once the kernel vgic state code changes, the
 user space code may also need to change accordingly, since user space
 uses the kernel state definition.

Any approach that doesn't maintain binary compatibility across kernel
versions is definitely not going to work. In particular you have to
be able to accept state that may have been saved out by a different
older kernel version (for doing VM migrations between host machines
with different kernels installed).

By far the largest part of the save/restore work here is figuring out
what the right state to expose to userspace is so we can retain that
compatibility guarantee. Whether we then use ONE_REG or a single struct
as 

Re: KVM VMX: register state after reset violates spec

2012-12-03 Thread Julian Stecklina
Thus spake Gleb Natapov g...@redhat.com:

 On Thu, Nov 29, 2012 at 03:07:38PM +0100, Julian Stecklina wrote:
 Hello,
 
 we have noticed that at least on 3.6.8 with VMX after a VCPU has been
 reset via the INIT-SIPI-SIPI sequence its register state violates
 Intel's specification.
[...]
 Shouldn't vmx_vcpu_reset actively clear those registers? And from a
 quick glance at the SVM code the problem might exist there, too.
 
 It should, so why not move the fix to kvm_vcpu_reset() so it will work
 for both. Also what about R8-R15? Intel SDM says nothing about them in
 the section you mention, but in Volume 1 section 3.4.1.1 is says:
[...]
 I take it that they are undefined on the first transition to 64-bit mode
 too. AMD spec says that they should be zeroed on reset, so lets do that.
 Also SVM does not set EDX to correct value on reset.

I'll post a revised patch later today.
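The fix being discussed amounts to something like the following toy model (purely illustrative, not KVM's actual code): on reset, zero the general-purpose registers including R8-R15, and leave the CPU signature in EDX; 0x600 here is only a placeholder value for that signature.

```python
def reset_gprs(cpu_signature=0x600):
    """Toy model of GPR state after an INIT reset: every GPR zeroed,
    except EDX which carries a CPU signature (placeholder value)."""
    names = (["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"]
             + ["r%d" % i for i in range(8, 16)])
    regs = {name: 0 for name in names}   # zero everything, R8-R15 included
    regs["rdx"] = cpu_signature          # EDX holds the signature
    return regs
```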

Julian
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/14] KVM: ARM: Handle guest faults in KVM

2012-12-03 Thread Will Deacon
On Fri, Nov 30, 2012 at 09:40:37PM +, Christoffer Dall wrote:
 On Mon, Nov 19, 2012 at 10:07 AM, Will Deacon will.dea...@arm.com wrote:
 
  Why are PIPT caches affected by this? The virtual address is irrelevant.
 
 
 The comment is slightly misleading, and I'll update it. Just so we're
 clear, this is the culprit:
 
 1. guest uses page X, containing instruction A
 2. page X gets swapped out
 3. host uses page X, containing instruction B
 4. instruction B enters i-cache at page X's cache line
 5. page X gets swapped out
 6. guest swaps page X back in
 7. guest executes instruction B from cache, should execute instruction A

Ok, that's clearer. Thanks for the explanation.
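The sequence above can be modelled with a toy physically-indexed i-cache (illustrative only, nothing like real hardware): the cache line filled while the host owned the page survives the guest page being swapped back in, so the guest's fetch hits the stale instruction until that one page is invalidated.

```python
class ToyPIPTICache:
    """Toy i-cache indexed by physical address; lines persist until
    explicitly invalidated, which is the hazard described above."""
    def __init__(self):
        self.lines = {}

    def fetch(self, pa, memory):
        if pa not in self.lines:       # miss: fill from memory
            self.lines[pa] = memory[pa]
        return self.lines[pa]          # hit: may return stale contents

    def invalidate(self, pa):          # the per-page flush for PIPT
        self.lines.pop(pa, None)

mem = {0x1000: "A"}                    # guest instruction A on page X
icache = ToyPIPTICache()

mem[0x1000] = "B"                      # host reuses page X (instruction B)
icache.fetch(0x1000, mem)              # B enters the i-cache
mem[0x1000] = "A"                      # guest page swapped back in
stale = icache.fetch(0x1000, mem)      # guest fetch hits stale "B"
icache.invalidate(0x1000)              # flush just that page...
fresh = icache.fetch(0x1000, mem)      # ...and the fetch sees "A" again
```

With a VIPT cache the line is found by virtual index, so a single-page flush by one address is not enough and the whole i-cache must be nuked, which is exactly the distinction the patch makes.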

 The point is that with PIPT we can flush only that page from the
 icache using the host virtual address, as the MMU will do the
 translation on the fly. In the VIPT we have to nuke the whole thing
 (unless we .

Unless we what? Could we flush using the host VA + all virtual aliases
instead?

  +* (or another VM) may have used this page at the same virtual 
  address
  +* as this guest, and we read incorrect data from the icache.  If
  +* we're using a PIPT cache, we can invalidate just that page, but 
  if
  +* we are using a VIPT cache we need to invalidate the entire 
  icache -
  +* damn shame - as written in the ARM ARM (DDI 0406C - Page 
  B3-1384)
  +*/
  +   if (icache_is_pipt()) {
  +   unsigned long hva = gfn_to_hva(kvm, gfn);
  +   __cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
  +   } else if (!icache_is_vivt_asid_tagged()) {
  +   /* any kind of VIPT cache */
  +   __flush_icache_all();
  +   }
 
  so what if it *is* vivt_asid_tagged? Surely that necessitates nuking the
  thing, unless it's VMID tagged as well (does that even exist?).
 
 
 see page B3-1392 in the ARM ARM, if it's vivt_asid_tagged it is also
 vmid tagged.

Great.

  +   write_fault = false;
  +   else
  +   write_fault = true;
  +
  +   if (fault_status == FSC_PERM && !write_fault) {
  +   kvm_err(Unexpected L2 read permission error\n);
  +   return -EFAULT;
  +   }
  +
  +   /* We need minimum second+third level pages */
  +   ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
  +   if (ret)
  +   return ret;
  +
  +   mmu_seq = vcpu->kvm->mmu_notifier_seq;
  +   smp_rmb();
 
  What's this barrier for and why isn't there a write barrier paired with
  it?
 
 
 The read barrier is to ensure that mmu_notifier_seq is read before we
 call gfn_to_pfn_prot (which is essentially get_user_pages), so that we
 don't get a page which is unmapped by an MMU notifier before we grab
 the spinlock that we would never see. I also added a comment
 explaining it in the patch below.
 
 There is a write barrier paired with it, see virt/kvm/kvm_main.c,
 specifically kvm_mmu_notifier_invalidate_page (the spin_unlock), and
 kvm_mmu_notifier_invalidate_range_end.

Ok, so it sounds like a homebrew seqlock implementation. Any reason KVM
isn't just using those directly?
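For reference, the pattern being described is essentially a sequence-count retry check; a small Python model (names hypothetical) of how a fault path detects a racing invalidation:

```python
class MmuNotifierSeq:
    """Toy model of the mmu_notifier_seq race-detection pattern."""
    def __init__(self):
        self.seq = 0

    def invalidate(self):
        # Write side: an MMU-notifier invalidation bumps the count
        # (under a lock in the real code).
        self.seq += 1

    def fault(self, do_gup, race=None):
        # Read side: sample the count *before* the slow page lookup
        # (this ordering is what the smp_rmb() enforces), then recheck.
        start = self.seq
        pfn = do_gup()          # stands in for gfn_to_pfn_prot()
        if race:
            race()              # simulate an invalidation during the lookup
        if self.seq != start:
            return None         # page may be stale: caller must retry
        return pfn
```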

Will


[Bug 50891] The smp_affinity cannot work correctly on guest os when PCI passthrough device using msi/msi-x with KVM

2012-12-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=50891


liyi yiliker...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||INVALID




--- Comment #14 from liyi yiliker...@gmail.com  2012-12-03 13:18:21 ---
Sorry to disturb all of you. This issue is fixed in the latest qemu-kvm version.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.
You are watching the assignee of the bug.


Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support

2012-12-03 Thread Peter Maydell
On 3 December 2012 12:02, Peter Maydell peter.mayd...@linaro.org wrote:
 By far the largest part of the save/restore work here is figuring out
 what the right state to expose to userspace is so we can retain that
 compatibility guarantee.

Some further thoughts on this...

What we're really providing the guest here is a hardware-accelerated
software emulation of a no-virtualization GICv2. So it's better for
the state we expose to userspace to be the state of that emulated
hardware, and not to expose details of exactly what that hardware
acceleration is. The hw accel is a property of the host CPU,
not of the guest VM environment, so it could be different on
different kernels or host CPUs, which would make migration potentially
tedious. For instance, consider migration where the source machine
has a VGIC with more list registers than the destination machine.

Obviously you could convert between different representations
in a number of places (source kernel, source qemu, destination
qemu, destination kernel), but I think it makes sense for the
canonical representation to be the guest-visible representation
of the emulated hardware, rather than privileging any one
acceleration implementation above another.

-- PMM


Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-12-03 Thread Will Deacon
Hi Marc,

I've managed to look at some more of the vgic code, so here is some more
feedback. I've still not got to the end of the series, but there's light at
the end of the tunnel...

On Sat, Nov 10, 2012 at 03:45:05PM +, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com
 
 Add VGIC virtual CPU interface code, picking pending interrupts
 from the distributor and stashing them in the VGIC control interface
 list registers.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |   41 +++
  arch/arm/kvm/vgic.c |  226 
 +++
  2 files changed, 266 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
 index 9e60b1d..7229324 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -193,8 +193,45 @@ struct vgic_dist {
  };
 
  struct vgic_cpu {
 +#ifdef CONFIG_KVM_ARM_VGIC
 +   /* per IRQ to LR mapping */
 +   u8  vgic_irq_lr_map[VGIC_NR_IRQS];

per IRQ?

 +
 +   /* Pending interrupts on this VCPU */
 +   DECLARE_BITMAP( pending, VGIC_NR_IRQS);
 +
 +   /* Bitmap of used/free list registers */
 +   DECLARE_BITMAP( lr_used, 64);
 +
 +   /* Number of list registers on this CPU */
 +   int nr_lr;
 +
 +   /* CPU vif control registers for world switch */
 +   u32 vgic_hcr;
 +   u32 vgic_vmcr;
 +   u32 vgic_misr;  /* Saved only */
 +   u32 vgic_eisr[2];   /* Saved only */
 +   u32 vgic_elrsr[2];  /* Saved only */
 +   u32 vgic_apr;
 +   u32 vgic_lr[64];/* A15 has only 4... */
 +#endif
  };

Looks like we should have a #define for the maximum number of list registers,
so we keep vgic_lr and lr_used in sync.

 
 +#define VGIC_HCR_EN            (1 << 0)
 +#define VGIC_HCR_UIE           (1 << 1)
 +
 +#define VGIC_LR_VIRTUALID      (0x3ff << 0)
 +#define VGIC_LR_PHYSID_CPUID   (7 << 10)
 +#define VGIC_LR_STATE          (3 << 28)
 +#define VGIC_LR_PENDING_BIT    (1 << 28)
 +#define VGIC_LR_ACTIVE_BIT     (1 << 29)
 +#define VGIC_LR_EOI            (1 << 19)
 +
 +#define VGIC_MISR_EOI          (1 << 0)
 +#define VGIC_MISR_U            (1 << 1)
 +
 +#define LR_EMPTY               0xff
 +

Could stick these in asm/hardware/gic.h. I know they're not used by the gic
driver, but they're the same piece of architecture so it's probably worth
keeping in one place.

You'd probably also want a s/VGIC/GICH/

  struct kvm;
  struct kvm_vcpu;
  struct kvm_run;
 @@ -202,9 +239,13 @@ struct kvm_exit_mmio;
 
  #ifdef CONFIG_KVM_ARM_VGIC
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 +void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
 +void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
   struct kvm_exit_mmio *mmio);
 
 +#define irqchip_in_kernel(k)   (!!((k)->arch.vgic.vctrl_base))
  #else
  static inline int kvm_vgic_hyp_init(void)
  {
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index 82feee8..d7cdec5 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -587,7 +587,25 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 
 reg)
 
  static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
  {
 -   return 0;
 +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +   unsigned long *pending, *enabled, *pend;
 +   int vcpu_id;
 +
 +   vcpu_id = vcpu->vcpu_id;
 +   pend = vcpu->arch.vgic_cpu.pending;
 +
 +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
 +   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
 +   bitmap_and(pend, pending, enabled, 32);

pend and pending! vcpu_pending and dist_pending?
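The masking under review can be modelled in Python, with integers standing in for the kernel's bitmaps: an interrupt is pending for a vcpu only if it is raised, enabled, and — for shared interrupts — targeted at that vcpu.

```python
def compute_pending_for_cpu(private_state, private_enabled,
                            shared_state, shared_enabled, shared_target):
    """Toy model of compute_pending_for_cpu(): True if any interrupt
    should be delivered to this vcpu."""
    # Private (per-cpu) interrupts: raised AND enabled.
    private = private_state & private_enabled
    # Shared interrupts additionally need this vcpu in their target mask.
    shared = shared_state & shared_enabled & shared_target
    return bool(private or shared)
```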

 +
 +   pending = vgic_bitmap_get_shared_map(&dist->irq_state);
 +   enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled);
 +   bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS);
 +   bitmap_and(pend + 1, pend + 1,
 +  vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]),
 +  VGIC_NR_SHARED_IRQS);
 +
 +   return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS);
  }
 
  /*
 @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm)
 }
  }
 
 +#define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)

Is VGIC_LR_PHYSID_CPUID wide enough for this? The CPUID is only 3 bits, but
the interrupt ID could be larger. Or do you not support hardware interrupt
forwarding? (in which case, LR_PHYSID is a misleading name).

 +#define MK_LR_PEND(src, irq)   (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq))
 +/*
 + * Queue an interrupt to a CPU virtual interface. Return true on success,
 + * or false if it wasn't possible 

Re: [PATCH v4 08/13] ARM: KVM: vgic: retire queued, disabled interrupts

2012-12-03 Thread Will Deacon
On Sat, Nov 10, 2012 at 03:45:11PM +, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com
 
 An interrupt may have been disabled after being made pending on the
 CPU interface (the classic case is a timer running while we're
 rebooting the guest - the interrupt would kick as soon as the CPU
 interface gets enabled, with deadly consequences).
 
 The solution is to examine already active LRs, and check the
 interrupt is still enabled. If not, just retire it.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/kvm/vgic.c |   30 ++
  1 file changed, 30 insertions(+)
 
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index d7cdec5..dda5623 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -633,6 +633,34 @@ static void vgic_update_state(struct kvm *kvm)
  
 #define LR_PHYSID(lr)(((lr) & VGIC_LR_PHYSID_CPUID) >> 10)
 #define MK_LR_PEND(src, irq) (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq))
 +
 +/*
 + * An interrupt may have been disabled after being made pending on the
 + * CPU interface (the classic case is a timer running while we're
 + * rebooting the guest - the interrupt would kick as soon as the CPU
 + * interface gets enabled, with deadly consequences).
 + *
 + * The solution is to examine already active LRs, and check the
 + * interrupt is still enabled. If not, just retire it.
 + */
 +static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 +{
 + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 + struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 + int lr;
 +
 + for_each_set_bit(lr, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
 + int irq = vgic_cpu->vgic_lr[lr] & VGIC_LR_VIRTUALID;
 +
 + if (!vgic_bitmap_get_irq_val(&dist->irq_enabled,
 +  vcpu->vcpu_id, irq)) {
 + vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
 + clear_bit(lr, vgic_cpu->lr_used);
 + vgic_cpu->vgic_lr[lr] &= ~VGIC_LR_STATE;
 + }
 + }
 +}
 +
  /*
   * Queue an interrupt to a CPU virtual interface. Return true on success,
   * or false if it wasn't possible to queue it.
 @@ -696,6 +724,8 @@ static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu)
  
  vcpu_id = vcpu->vcpu_id;
  
 + vgic_retire_disabled_irqs(vcpu);

Wouldn't it be better to do this when the interrupt is disabled, rather
than do the checking in the sync_to_cpu path?

Will
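For readers following along, the retire logic being questioned can be modelled in Python (dicts and sets standing in for the kernel's arrays and bitmaps; constants copied from the patch):

```python
LR_EMPTY = 0xff
VGIC_LR_VIRTUALID = 0x3ff
LR_STATE_MASK = 0b11 << 28   # pending | active bits

def retire_disabled_irqs(lr_used, vgic_lr, irq_lr_map, irq_enabled):
    """Walk the in-use list registers and retire any entry whose
    interrupt has been disabled since it was queued."""
    for lr in sorted(lr_used):               # iterate over a snapshot
        irq = vgic_lr[lr] & VGIC_LR_VIRTUALID
        if not irq_enabled.get(irq, False):
            irq_lr_map[irq] = LR_EMPTY       # unmap the IRQ
            lr_used.discard(lr)              # free the list register
            vgic_lr[lr] &= ~LR_STATE_MASK    # clear pending/active state
```

Will's question stands either way: this scan could also be triggered at the moment the interrupt is disabled rather than on every sync-to-cpu pass.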


Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection

2012-12-03 Thread Will Deacon
On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com
 
 Plug the interrupt injection code. Interrupts can now be generated
 from user space.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |8 +++
  arch/arm/kvm/arm.c  |   29 +
  arch/arm/kvm/vgic.c |   90 
 +++
  3 files changed, 127 insertions(+)
 
 diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
 index 7229324..6e3d303 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -241,6 +241,8 @@ struct kvm_exit_mmio;
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
  void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
  void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 + bool level);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 struct kvm_exit_mmio *mmio);
 @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu 
 *vcpu) {}
  static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
  static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}
  
 +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid,
 +   const struct kvm_irq_level *irq)
 +{
 + return 0;
 +}
 +
  static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
  {
   return 0;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 3ac1aab..f43da01 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
 kvm_irq_level *irq_level)
  
   switch (irq_type) {
   case KVM_ARM_IRQ_TYPE_CPU:
 + if (irqchip_in_kernel(kvm))
 + return -ENXIO;
 +
  if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
   return -EINVAL;
  
   return vcpu_interrupt_line(vcpu, irq_num, level);
 +#ifdef CONFIG_KVM_ARM_VGIC
 + case KVM_ARM_IRQ_TYPE_PPI:
 + if (!irqchip_in_kernel(kvm))
 + return -ENXIO;
 +
 + if (irq_num < 16 || irq_num > 31)
 + return -EINVAL;

It's our favourite two numbers again! :)
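Those magic numbers map onto the GIC interrupt ID space; a small Python helper makes the check explicit (KVM_ARM_IRQ_GIC_MAX here is only a stand-in bound, the real one comes from the ABI headers):

```python
KVM_ARM_IRQ_GIC_MAX = 256  # stand-in upper bound for illustration

def classify_gic_irq(irq_num):
    """GIC ID space: 0-15 SGI, 16-31 PPI (per-cpu), 32+ SPI (shared)."""
    if 0 <= irq_num <= 15:
        return "SGI"      # software-generated interrupt
    if 16 <= irq_num <= 31:
        return "PPI"      # private peripheral interrupt, per-cpu
    if 32 <= irq_num < KVM_ARM_IRQ_GIC_MAX:
        return "SPI"      # shared peripheral interrupt
    raise ValueError("IRQ number out of range")
```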

 +
 + return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
 + case KVM_ARM_IRQ_TYPE_SPI:
 + if (!irqchip_in_kernel(kvm))
 + return -ENXIO;
 +
 + if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX)
 + return -EINVAL;
 +
 + return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
 +#endif
   }
  
   return -EINVAL;
 @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
   void __user *argp = (void __user *)arg;
  
   switch (ioctl) {
 +#ifdef CONFIG_KVM_ARM_VGIC
 + case KVM_CREATE_IRQCHIP: {
 + if (vgic_present)
 + return kvm_vgic_create(kvm);
 + else
 + return -EINVAL;

ENXIO? At least, that's what you use when setting the GIC addresses.

 + }
 +#endif
   case KVM_SET_DEVICE_ADDRESS: {
   struct kvm_device_address dev_addr;
  
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index dda5623..70040bb 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -75,6 +75,7 @@
  #define ACCESS_WRITE_MASK(x) ((x)  (3  1))
  
  static void vgic_update_state(struct kvm *kvm);
 +static void vgic_kick_vcpus(struct kvm *kvm);
  static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
  
  static inline int vgic_irq_is_edge(struct vgic_dist *dist, int irq)
 @@ -542,6 +543,9 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
 kvm_run *run, struct kvm_exi
   kvm_prepare_mmio(run, mmio);
   kvm_handle_mmio_return(vcpu, run);
  
 + if (updated_state)
+ vgic_kick_vcpus(vcpu->kvm);
 +
   return true;
  }
  
 @@ -867,6 +871,92 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
   return test_bit(vcpu->vcpu_id, &dist->irq_pending_on_cpu);
  }
  
 +static void vgic_kick_vcpus(struct kvm *kvm)
 +{
 + struct kvm_vcpu *vcpu;
 + int c;
 +
 + /*
 +  * We've injected an interrupt, time to find out who deserves
 +  * a good kick...
 +  */
 + kvm_for_each_vcpu(c, vcpu, kvm) {
 + if (kvm_vgic_vcpu_pending_irq(vcpu))
 + kvm_vcpu_kick(vcpu);
 + }
 +}
 +
 +static bool vgic_update_irq_state(struct kvm *kvm, int cpuid,
 +   unsigned int irq_num, bool level)
 +{
+ struct vgic_dist *dist = &kvm->arch.vgic;
 + struct kvm_vcpu *vcpu;
 + int is_edge, is_level, state;
 + int enabled;
+ bool ret = true;

Re: [PATCH v4 10/13] ARM: KVM: VGIC control interface world switch

2012-12-03 Thread Will Deacon
On Sat, Nov 10, 2012 at 03:45:25PM +0000, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com
 
 Enable the VGIC control interface to be save-restored on world switch.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_arm.h |   12 +++
  arch/arm/kernel/asm-offsets.c  |   12 +++
  arch/arm/kvm/interrupts_head.S |   68 
 
  3 files changed, 92 insertions(+)
 
 diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
 index 246afd7..8f5dd22 100644
 --- a/arch/arm/include/asm/kvm_arm.h
 +++ b/arch/arm/include/asm/kvm_arm.h
 @@ -192,4 +192,16 @@
  #define HSR_EC_DABT  (0x24)
  #define HSR_EC_DABT_HYP  (0x25)
  
 +/* GICH offsets */
 +#define GICH_HCR 0x0
 +#define GICH_VTR 0x4
 +#define GICH_VMCR0x8
 +#define GICH_MISR0x10
 +#define GICH_EISR0   0x20
 +#define GICH_EISR1   0x24
 +#define GICH_ELRSR0  0x30
 +#define GICH_ELRSR1  0x34
 +#define GICH_APR 0xf0
 +#define GICH_LR0 0x100

Similar thing to the other new gic defines -- they're probably better off
in gic.h

 +
  #endif /* __ARM_KVM_ARM_H__ */
 diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
 index 95cab37..39b6221 100644
 --- a/arch/arm/kernel/asm-offsets.c
 +++ b/arch/arm/kernel/asm-offsets.c
 @@ -167,6 +167,18 @@ int main(void)
DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.hxfar));
DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.hpfar));
DEFINE(VCPU_HYP_PC,offsetof(struct kvm_vcpu, arch.hyp_pc));
 +#ifdef CONFIG_KVM_ARM_VGIC
 +  DEFINE(VCPU_VGIC_CPU,  offsetof(struct kvm_vcpu, 
 arch.vgic_cpu));
 +  DEFINE(VGIC_CPU_HCR,   offsetof(struct vgic_cpu, vgic_hcr));
 +  DEFINE(VGIC_CPU_VMCR,  offsetof(struct vgic_cpu, vgic_vmcr));
 +  DEFINE(VGIC_CPU_MISR,  offsetof(struct vgic_cpu, vgic_misr));
 +  DEFINE(VGIC_CPU_EISR,  offsetof(struct vgic_cpu, vgic_eisr));
 +  DEFINE(VGIC_CPU_ELRSR, offsetof(struct vgic_cpu, vgic_elrsr));
 +  DEFINE(VGIC_CPU_APR,   offsetof(struct vgic_cpu, vgic_apr));
 +  DEFINE(VGIC_CPU_LR,offsetof(struct vgic_cpu, vgic_lr));
 +  DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
 +  DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
 +#endif
DEFINE(KVM_VTTBR,  offsetof(struct kvm, arch.vttbr));
  #endif
return 0; 
 diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
 index 2ac8b4a..c2423d8 100644
 --- a/arch/arm/kvm/interrupts_head.S
 +++ b/arch/arm/kvm/interrupts_head.S
 @@ -341,6 +341,45 @@
   * @vcpup: Register pointing to VCPU struct
   */
  .macro save_vgic_state   vcpup
 +#ifdef CONFIG_KVM_ARM_VGIC
 + /* Get VGIC VCTRL base into r2 */
 + ldr r2, [\vcpup, #VCPU_KVM]
 + ldr r2, [r2, #KVM_VGIC_VCTRL]
 + cmp r2, #0
 + beq 2f
 +
 + /* Compute the address of struct vgic_cpu */
 + add r11, \vcpup, #VCPU_VGIC_CPU

Given that we're dealing with constants, it would be more efficient to
express this addition as part of the immediate offset and let gas spit
out the final computed address for the stores below.

 +
 + /* Save all interesting registers */
 + ldr r3, [r2, #GICH_HCR]
 + ldr r4, [r2, #GICH_VMCR]
 + ldr r5, [r2, #GICH_MISR]
 + ldr r6, [r2, #GICH_EISR0]
 + ldr r7, [r2, #GICH_EISR1]
 + ldr r8, [r2, #GICH_ELRSR0]
 + ldr r9, [r2, #GICH_ELRSR1]
 + ldr r10, [r2, #GICH_APR]
 +
 + str r3, [r11, #VGIC_CPU_HCR]
 + str r4, [r11, #VGIC_CPU_VMCR]
 + str r5, [r11, #VGIC_CPU_MISR]
 + str r6, [r11, #VGIC_CPU_EISR]
 + str r7, [r11, #(VGIC_CPU_EISR + 4)]
 + str r8, [r11, #VGIC_CPU_ELRSR]
 + str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
 + str r10, [r11, #VGIC_CPU_APR]
 +
 + /* Save list registers */
 + add r2, r2, #GICH_LR0
 + add r3, r11, #VGIC_CPU_LR
 + ldr r4, [r11, #VGIC_CPU_NR_LR]
 +1:   ldr r6, [r2], #4
 + str r6, [r3], #4
 + subs r4, r4, #1
 + bne 1b
 +2:
 +#endif
  .endm

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
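An editorial aside for readers following the review: the PPI/SPI range checks questioned above ("our favourite two numbers") are easy to model and test in isolation. The sketch below is a plain user-space illustration, not the kernel code; `MODEL_GIC_MAX` and `irq_num_valid` are hypothetical stand-ins (the real bound is `KVM_ARM_IRQ_GIC_MAX`, whose value is not shown in the patch).

```c
#include <assert.h>

/* Illustrative model of the irq_num range checks in kvm_vm_ioctl_irq_line:
 * PPIs occupy interrupt IDs 16..31, SPIs 32..GIC maximum. MODEL_GIC_MAX is
 * a stand-in for KVM_ARM_IRQ_GIC_MAX. */
#define MODEL_GIC_MAX 127

enum irq_model_type { MODEL_PPI, MODEL_SPI };

static int irq_num_valid(enum irq_model_type type, int irq_num)
{
	switch (type) {
	case MODEL_PPI:
		/* mirrors the patch: reject irq_num < 16 || irq_num > 31 */
		return !(irq_num < 16 || irq_num > 31);
	case MODEL_SPI:
		/* mirrors the patch: reject irq_num < 32 || irq_num > max */
		return !(irq_num < 32 || irq_num > MODEL_GIC_MAX);
	}
	return 0;
}
```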


Re: [PATCH v9 1/2] x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary

2012-12-03 Thread Gleb Natapov
On Tue, Nov 27, 2012 at 02:18:47PM +0200, Gleb Natapov wrote:
 Eric, can you ACK it?
 
Eric, ping.

 On Tue, Nov 27, 2012 at 11:26:02AM +0800, Zhang Yanfei wrote:
  This patch provides a way to VMCLEAR VMCSs related to guests
  on all cpus before executing the VMXOFF when doing kdump. This
  is used to ensure the VMCSs in the vmcore updated and
  non-corrupted.
  
  Signed-off-by: Zhang Yanfei zhangyan...@cn.fujitsu.com
  ---
   arch/x86/include/asm/kexec.h |2 ++
   arch/x86/kernel/crash.c  |   25 +
   2 files changed, 27 insertions(+), 0 deletions(-)
  
  diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
  index 317ff17..28feeba 100644
  --- a/arch/x86/include/asm/kexec.h
  +++ b/arch/x86/include/asm/kexec.h
  @@ -163,6 +163,8 @@ struct kimage_arch {
   };
   #endif
   
  +extern void (*crash_vmclear_loaded_vmcss)(void);
  +
   #endif /* __ASSEMBLY__ */
   
   #endif /* _ASM_X86_KEXEC_H */
  diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
  index 13ad899..4a2a12f 100644
  --- a/arch/x86/kernel/crash.c
  +++ b/arch/x86/kernel/crash.c
  @@ -16,6 +16,7 @@
    #include <linux/delay.h>
    #include <linux/elf.h>
    #include <linux/elfcore.h>
   +#include <linux/module.h>
    
    #include <asm/processor.h>
    #include <asm/hardirq.h>
   @@ -29,6 +30,20 @@
    #include <asm/virtext.h>
   
   int in_crash_kexec;
  + 
  +/*
  + * This is used to VMCLEAR all VMCSs loaded on the
  + * processor. And when loading kvm_intel module, the
  + * callback function pointer will be assigned.
  + */
  +void (*crash_vmclear_loaded_vmcss)(void) = NULL;
  +EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
  +
  +static inline void cpu_emergency_vmclear_loaded_vmcss(void)
  +{
  +   if (crash_vmclear_loaded_vmcss)
  +   crash_vmclear_loaded_vmcss();
  +}
   
    #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
   
  @@ -46,6 +61,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs 
  *regs)
   #endif
  crash_save_cpu(regs, cpu);
   
  +   /*
  +* VMCLEAR VMCSs loaded on all cpus if needed.
  +*/
  +   cpu_emergency_vmclear_loaded_vmcss();
  +
  /* Disable VMX or SVM if needed.
   *
   * We need to disable virtualization on all CPUs.
  @@ -88,6 +108,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
   
  kdump_nmi_shootdown_cpus();
   
  +   /*
  +* VMCLEAR VMCSs loaded on this cpu if needed.
  +*/
  +   cpu_emergency_vmclear_loaded_vmcss();
  +
  /* Booting kdump kernel with VMX or SVM enabled won't work,
   * because (among other limitations) we can't disable paging
   * with the virt flags.
  -- 
  1.7.1
 
 --
   Gleb.

--
Gleb.
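The hook mechanism in the patch above is a common kernel pattern worth spelling out: a NULL-initialized global function pointer that a module assigns on load, with the crash path calling it only when set. A toy user-space sketch (the names `crash_hook`, `fake_vmclear_all`, and `cpu_emergency_run_hook` are illustrative, not the kernel symbols; the real code must also guard against the module unloading concurrently, which this sketch ignores):

```c
#include <assert.h>
#include <stddef.h>

/* NULL until a "module" registers its callback. */
static void (*crash_hook)(void);
static int hook_ran;

/* Stand-in for the kvm_intel callback that would VMCLEAR loaded VMCSs. */
static void fake_vmclear_all(void) { hook_ran = 1; }

/* Mirrors cpu_emergency_vmclear_loaded_vmcss(): a harmless no-op when no
 * module has registered, a real call otherwise. */
static void cpu_emergency_run_hook(void)
{
	if (crash_hook)
		crash_hook();
}
```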


[PATCH v2] KVM: x86: Make register state after reset conform to specification

2012-12-03 Thread Julian Stecklina
Floating point initialization is moved to kvm_arch_vcpu_init. Added general 
purpose
register clearing to the same function. SVM code now properly initializes
EDX.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/cpuid.c |  1 +
 arch/x86/kvm/svm.c   |  7 +--
 arch/x86/kvm/vmx.c   |  7 +--
 arch/x86/kvm/x86.c   | 12 +++-
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0595f13..aa468c2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -659,6 +659,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
} else
*eax = *ebx = *ecx = *edx = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..642213a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -20,6 +20,7 @@
 #include "mmu.h"
 #include "kvm_cache_regs.h"
 #include "x86.h"
+#include "cpuid.h"
 
 #include <linux/module.h>
 #include <linux/mod_devicetable.h>
@@ -1190,6 +1191,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   u32 dummy, eax;
 
init_vmcb(svm);
 
@@ -1198,8 +1200,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
 svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
 }
-   vcpu->arch.regs_avail = ~0;
-   vcpu->arch.regs_dirty = ~0;
+
+   kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
return 0;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ff66a3b..363fb03 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3942,7 +3942,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
 
-   vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP));
+   vcpu->arch.regs_avail = ~(1 << VCPU_REGS_RIP);
 
vmx-rmode.vm86_active = 0;
 
@@ -3955,10 +3955,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
msr |= MSR_IA32_APICBASE_BSP;
kvm_set_apic_base(&vmx->vcpu, msr);
 
-   ret = fx_init(&vmx->vcpu);
-   if (ret != 0)
-   goto out;
-
vmx_segment_cache_clear(vmx);
 
seg_setup(VCPU_SREG_CS);
@@ -3999,7 +3995,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_DR7, 0x400);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2966c84..9640ea5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6045,6 +6045,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 {
+   int ret;
+
atomic_set(&vcpu->arch.nmi_queued, 0);
vcpu->arch.nmi_pending = 0;
vcpu->arch.nmi_injected = false;
@@ -6066,7 +6068,15 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 
kvm_pmu_reset(vcpu);
 
-   return kvm_x86_ops->vcpu_reset(vcpu);
+   memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
+   vcpu->arch.regs_avail = ~0;
+   vcpu->arch.regs_dirty = ~0;
+
+   if ((ret = fx_init(vcpu)) != 0) goto out;
+
+   ret = kvm_x86_ops->vcpu_reset(vcpu);
+out:
+   return ret;
 }
 
 int kvm_arch_hardware_enable(void *garbage)
-- 
1.7.11.7



Re: KVM VMX: register state after reset violates spec

2012-12-03 Thread Julian Stecklina
Thus spake Gleb Natapov g...@redhat.com:

 It should, so why not move the fix to kvm_vcpu_reset() so it will work
 for both. Also what about R8-R15? Intel SDM says nothing about them in
 the section you mention, but in Volume 1 section 3.4.1.1 it says:
[...]
 I take it that they are undefined on the first transition to 64-bit mode
 too. AMD spec says that they should be zeroed on reset, so lets do that.
 Also SVM does not set EDX to correct value on reset. It should be:

I have posted a new version of the patch taking your suggestions into
account. The VMX version is working for me. I could not test it on AMD
hardware, though.

Julian


Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-12-03 Thread Marc Zyngier
On 03/12/12 13:23, Will Deacon wrote:
 Hi Marc,
 
 I've managed to look at some more of the vgic code, so here is some more
 feedback. I've still not got to the end of the series, but there's light at
 the end of the tunnel...
 
 On Sat, Nov 10, 2012 at 03:45:05PM +0000, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Add VGIC virtual CPU interface code, picking pending interrupts
 from the distributor and stashing them in the VGIC control interface
 list registers.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |   41 +++
  arch/arm/kvm/vgic.c |  226 
 +++
  2 files changed, 266 insertions(+), 1 deletion(-)

 diff --git a/arch/arm/include/asm/kvm_vgic.h 
 b/arch/arm/include/asm/kvm_vgic.h
 index 9e60b1d..7229324 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -193,8 +193,45 @@ struct vgic_dist {
  };

  struct vgic_cpu {
 +#ifdef CONFIG_KVM_ARM_VGIC
 +   /* per IRQ to LR mapping */
 +   u8  vgic_irq_lr_map[VGIC_NR_IRQS];
 
 per IRQ?

Yes. We need to track which IRQ maps to which LR (so we can piggyback a
pending interrupt on an active one).

 +
 +   /* Pending interrupts on this VCPU */
 +   DECLARE_BITMAP( pending, VGIC_NR_IRQS);
 +
 +   /* Bitmap of used/free list registers */
 +   DECLARE_BITMAP( lr_used, 64);
 +
 +   /* Number of list registers on this CPU */
 +   int nr_lr;
 +
 +   /* CPU vif control registers for world switch */
 +   u32 vgic_hcr;
 +   u32 vgic_vmcr;
 +   u32 vgic_misr;  /* Saved only */
 +   u32 vgic_eisr[2];   /* Saved only */
 +   u32 vgic_elrsr[2];  /* Saved only */
 +   u32 vgic_apr;
 +   u32 vgic_lr[64];/* A15 has only 4... */
 +#endif
  };
 
 Looks like we should have a #define for the maximum number of list registers,
 so we keep vgic_lr and lr_user in sync.

Indeed.


 +#define VGIC_HCR_EN            (1 << 0)
 +#define VGIC_HCR_UIE           (1 << 1)
 +
 +#define VGIC_LR_VIRTUALID      (0x3ff << 0)
 +#define VGIC_LR_PHYSID_CPUID   (7 << 10)
 +#define VGIC_LR_STATE          (3 << 28)
 +#define VGIC_LR_PENDING_BIT    (1 << 28)
 +#define VGIC_LR_ACTIVE_BIT     (1 << 29)
 +#define VGIC_LR_EOI            (1 << 19)
 +
 +#define VGIC_MISR_EOI          (1 << 0)
 +#define VGIC_MISR_U            (1 << 1)
 +
 +#define LR_EMPTY       0xff
 +
 
 Could stick these in asm/hardware/gic.h. I know they're not used by the gic
 driver, but they're the same piece of architecture so it's probably worth
 keeping in one place.

This is on my list of things to do once the GIC code is shared between
arm and arm64. Could do it earlier if that makes more sense.

 You'd probably also want a s/VGIC/GICH/

Sure.

  struct kvm;
  struct kvm_vcpu;
  struct kvm_run;
 @@ -202,9 +239,13 @@ struct kvm_exit_mmio;

  #ifdef CONFIG_KVM_ARM_VGIC
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 +void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
 +void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
   struct kvm_exit_mmio *mmio);

 +#define irqchip_in_kernel(k)   (!!((k)->arch.vgic.vctrl_base))
  #else
  static inline int kvm_vgic_hyp_init(void)
  {
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index 82feee8..d7cdec5 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -587,7 +587,25 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, 
 u32 reg)

  static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
  {
 -   return 0;
 +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +   unsigned long *pending, *enabled, *pend;
 +   int vcpu_id;
 +
 +   vcpu_id = vcpu->vcpu_id;
 +   pend = vcpu->arch.vgic_cpu.pending;
 +
 +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
 +   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
 +   bitmap_and(pend, pending, enabled, 32);
 
 pend and pending! vcpu_pending and dist_pending?

A lot of that code has already been reworked. See:
https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html

 +
 +   pending = vgic_bitmap_get_shared_map(&dist->irq_state);
 +   enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled);
 +   bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS);
 +   bitmap_and(pend + 1, pend + 1,
 +      vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]),
 +      VGIC_NR_SHARED_IRQS);
 +
 +   return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS);
  }

  /*
 @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm)
 }
  }

 +#define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)

Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection

2012-12-03 Thread Marc Zyngier
On 03/12/12 13:25, Will Deacon wrote:
 On Sat, Nov 10, 2012 at 03:45:18PM +0000, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Plug the interrupt injection code. Interrupts can now be generated
 from user space.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |8 +++
  arch/arm/kvm/arm.c  |   29 +
  arch/arm/kvm/vgic.c |   90 
 +++
  3 files changed, 127 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_vgic.h 
 b/arch/arm/include/asm/kvm_vgic.h
 index 7229324..6e3d303 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -241,6 +241,8 @@ struct kvm_exit_mmio;
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
  void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
  void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 +bool level);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_exit_mmio *mmio);
 @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu 
 *vcpu) {}
  static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
  static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}
  
 +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid,
 +  const struct kvm_irq_level *irq)
 +{
 +return 0;
 +}
 +
  static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
  {
  return 0;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 3ac1aab..f43da01 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
 kvm_irq_level *irq_level)
  
  switch (irq_type) {
  case KVM_ARM_IRQ_TYPE_CPU:
 +if (irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
   if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
  return -EINVAL;
  
  return vcpu_interrupt_line(vcpu, irq_num, level);
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_ARM_IRQ_TYPE_PPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 16 || irq_num > 31)
 +return -EINVAL;
 
 It's our favourite two numbers again! :)

I already fixed a number of them. Probably missed this one though.

 +
 +return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
 +case KVM_ARM_IRQ_TYPE_SPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX)
 +return -EINVAL;
 +
 +return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
 +#endif
  }
  
  return -EINVAL;
 @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
  void __user *argp = (void __user *)arg;
  
  switch (ioctl) {
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_CREATE_IRQCHIP: {
 +if (vgic_present)
 +return kvm_vgic_create(kvm);
 +else
 +return -EINVAL;
 
 ENXIO? At least, that's what you use when setting the GIC addresses.

-EINVAL seems to be one of the values other archs are using. -ENXIO is
not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but
for the sake of keeping userspace happy, I'm not really inclined to
change this.

Christoffer?

 +}
 +#endif
  case KVM_SET_DEVICE_ADDRESS: {
  struct kvm_device_address dev_addr;
  
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index dda5623..70040bb 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -75,6 +75,7 @@
 #define ACCESS_WRITE_MASK(x)    ((x) & (3 << 1))
  
  static void vgic_update_state(struct kvm *kvm);
 +static void vgic_kick_vcpus(struct kvm *kvm);
  static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
  
  static inline int vgic_irq_is_edge(struct vgic_dist *dist, int irq)
 @@ -542,6 +543,9 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
 kvm_run *run, struct kvm_exi
  kvm_prepare_mmio(run, mmio);
  kvm_handle_mmio_return(vcpu, run);
  
 +if (updated_state)
 +vgic_kick_vcpus(vcpu->kvm);
 +
  return true;
  }
  
 @@ -867,6 +871,92 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
  return test_bit(vcpu->vcpu_id, &dist->irq_pending_on_cpu);
  }
  
 +static void vgic_kick_vcpus(struct kvm *kvm)
 +{
 +struct kvm_vcpu *vcpu;
 +int c;
 +
 +/*
 + * We've injected an interrupt, time to find out who deserves
 + * a good kick...
 + */
 +kvm_for_each_vcpu(c, vcpu, kvm) {
 +if (kvm_vgic_vcpu_pending_irq(vcpu))
 +

Re: [PATCH v4 10/13] ARM: KVM: VGIC control interface world switch

2012-12-03 Thread Marc Zyngier
On 03/12/12 13:31, Will Deacon wrote:
 On Sat, Nov 10, 2012 at 03:45:25PM +0000, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Enable the VGIC control interface to be save-restored on world switch.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_arm.h |   12 +++
  arch/arm/kernel/asm-offsets.c  |   12 +++
  arch/arm/kvm/interrupts_head.S |   68 
 
  3 files changed, 92 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
 index 246afd7..8f5dd22 100644
 --- a/arch/arm/include/asm/kvm_arm.h
 +++ b/arch/arm/include/asm/kvm_arm.h
 @@ -192,4 +192,16 @@
  #define HSR_EC_DABT (0x24)
  #define HSR_EC_DABT_HYP (0x25)
  
 +/* GICH offsets */
 +#define GICH_HCR     0x0
 +#define GICH_VTR     0x4
 +#define GICH_VMCR    0x8
 +#define GICH_MISR    0x10
 +#define GICH_EISR0   0x20
 +#define GICH_EISR1   0x24
 +#define GICH_ELRSR0  0x30
 +#define GICH_ELRSR1  0x34
 +#define GICH_APR     0xf0
 +#define GICH_LR0     0x100
 
 Similar thing to the other new gic defines -- they're probably better off
 in gic.h

Agreed.

 +
  #endif /* __ARM_KVM_ARM_H__ */
 diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
 index 95cab37..39b6221 100644
 --- a/arch/arm/kernel/asm-offsets.c
 +++ b/arch/arm/kernel/asm-offsets.c
 @@ -167,6 +167,18 @@ int main(void)
   DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.hxfar));
   DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.hpfar));
   DEFINE(VCPU_HYP_PC, offsetof(struct kvm_vcpu, arch.hyp_pc));
 +#ifdef CONFIG_KVM_ARM_VGIC
 +  DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, 
 arch.vgic_cpu));
 +  DEFINE(VGIC_CPU_HCR,  offsetof(struct vgic_cpu, vgic_hcr));
 +  DEFINE(VGIC_CPU_VMCR, offsetof(struct vgic_cpu, vgic_vmcr));
 +  DEFINE(VGIC_CPU_MISR, offsetof(struct vgic_cpu, vgic_misr));
 +  DEFINE(VGIC_CPU_EISR, offsetof(struct vgic_cpu, vgic_eisr));
 +  DEFINE(VGIC_CPU_ELRSR, offsetof(struct vgic_cpu, vgic_elrsr));
 +  DEFINE(VGIC_CPU_APR,  offsetof(struct vgic_cpu, vgic_apr));
 +  DEFINE(VGIC_CPU_LR,   offsetof(struct vgic_cpu, vgic_lr));
 +  DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr));
 +  DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base));
 +#endif
DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
  #endif
return 0; 
 diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
 index 2ac8b4a..c2423d8 100644
 --- a/arch/arm/kvm/interrupts_head.S
 +++ b/arch/arm/kvm/interrupts_head.S
 @@ -341,6 +341,45 @@
   * @vcpup: Register pointing to VCPU struct
   */
  .macro save_vgic_state  vcpup
 +#ifdef CONFIG_KVM_ARM_VGIC
 +/* Get VGIC VCTRL base into r2 */
 +ldr r2, [\vcpup, #VCPU_KVM]
 +ldr r2, [r2, #KVM_VGIC_VCTRL]
 +cmp r2, #0
 +beq 2f
 +
 +/* Compute the address of struct vgic_cpu */
 +add r11, \vcpup, #VCPU_VGIC_CPU
 
 Given that we're dealing with constants, it would be more efficient to
 express this addition as part of the immediate offset and let gas spit
 out the final computed address for the stores below.

We had that for a while, and with the kvm_vcpu structure growing, we
ended up having fields out of the reach of an immediate offset. Is
kvm_vcpu too big? Yes.

 +
 +/* Save all interesting registers */
 +ldr r3, [r2, #GICH_HCR]
 +ldr r4, [r2, #GICH_VMCR]
 +ldr r5, [r2, #GICH_MISR]
 +ldr r6, [r2, #GICH_EISR0]
 +ldr r7, [r2, #GICH_EISR1]
 +ldr r8, [r2, #GICH_ELRSR0]
 +ldr r9, [r2, #GICH_ELRSR1]
 +ldr r10, [r2, #GICH_APR]
 +
 +str r3, [r11, #VGIC_CPU_HCR]
 +str r4, [r11, #VGIC_CPU_VMCR]
 +str r5, [r11, #VGIC_CPU_MISR]
 +str r6, [r11, #VGIC_CPU_EISR]
 +str r7, [r11, #(VGIC_CPU_EISR + 4)]
 +str r8, [r11, #VGIC_CPU_ELRSR]
 +str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
 +str r10, [r11, #VGIC_CPU_APR]
 +
 +/* Save list registers */
 +add r2, r2, #GICH_LR0
 +add r3, r11, #VGIC_CPU_LR
 +ldr r4, [r11, #VGIC_CPU_NR_LR]
 +1:  ldr r6, [r2], #4
 +str r6, [r3], #4
 +subs r4, r4, #1
 +bne 1b
 +2:
 +#endif
  .endm
 
 Will
 


-- 
Jazz is not dead. It just smells funny...



Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-12-03 Thread Will Deacon
On Mon, Dec 03, 2012 at 02:11:03PM +0000, Marc Zyngier wrote:
 On 03/12/12 13:23, Will Deacon wrote:
 
  +#define VGIC_HCR_EN            (1 << 0)
  +#define VGIC_HCR_UIE           (1 << 1)
  +
  +#define VGIC_LR_VIRTUALID      (0x3ff << 0)
  +#define VGIC_LR_PHYSID_CPUID   (7 << 10)
  +#define VGIC_LR_STATE          (3 << 28)
  +#define VGIC_LR_PENDING_BIT    (1 << 28)
  +#define VGIC_LR_ACTIVE_BIT     (1 << 29)
  +#define VGIC_LR_EOI            (1 << 19)
  +
  +#define VGIC_MISR_EOI          (1 << 0)
  +#define VGIC_MISR_U            (1 << 1)
  +
  +#define LR_EMPTY       0xff
  +
 
  Could stick these in asm/hardware/gic.h. I know they're not used by the gic
  driver, but they're the same piece of architecture so it's probably worth
  keeping in one place.
 
 This is on my list of things to do once the GIC code is shared between
 arm and arm64. Could do it earlier if that makes more sense.

Might as well as I found some others in a later patch too.

   static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
   {
  -   return 0;
  +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
  +   unsigned long *pending, *enabled, *pend;
  +   int vcpu_id;
  +
  +   vcpu_id = vcpu->vcpu_id;
  +   pend = vcpu->arch.vgic_cpu.pending;
  +
  +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
  +   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
  +   bitmap_and(pend, pending, enabled, 32);
 
  pend and pending! vcpu_pending and dist_pending?
 
 A lot of that code has already been reworked. See:
 https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html

Argh, too much code! Ok, as long as it's being looked at.

  +
  +   pending = vgic_bitmap_get_shared_map(&dist->irq_state);
  +   enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled);
  +   bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS);
  +   bitmap_and(pend + 1, pend + 1,
  +      vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]),
  +      VGIC_NR_SHARED_IRQS);
  +
  +   return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS);
   }
 
   /*
  @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm)
  }
   }
 
  +#define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)
 
  Is VGIC_LR_PHYSID_CPUID wide enough for this? The CPUID is only 3 bits, but
  the interrupt ID could be larger. Or do you not supported hardware interrupt
  forwarding? (in which case, LR_PHYSID is a misleading name).
 
 Hardware interrupt forwarding is not supported. PHYSID is the name of
 the actual field in the spec, hence the name of the macro. LR_CPUID?

Sure.

  +   kvm_debug("LR%d piggyback for IRQ%d %x\n", lr, irq, 
  vgic_cpu->vgic_lr[lr]);
  +   BUG_ON(!test_bit(lr, vgic_cpu->lr_used));
  +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_PENDING_BIT;
  +   if (is_level)
  +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
  +   return true;
  +   }
  +
  +   /* Try to use another LR for this interrupt */
  +   lr = find_first_bit((unsigned long *)vgic_cpu->vgic_elrsr,
  +      vgic_cpu->nr_lr);
  +   if (lr >= vgic_cpu->nr_lr)
  +   return false;
  +
  +   kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
  +   vgic_cpu->vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq);
  +   if (is_level)
  +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
  +
  +   vgic_cpu->vgic_irq_lr_map[irq] = lr;
  +   clear_bit(lr, (unsigned long *)vgic_cpu->vgic_elrsr);
  +   set_bit(lr, vgic_cpu->lr_used);
  +
  +   return true;
  +}
 
  I can't help but feel that this could be made cleaner by moving the
  level-specific EOI handling out into a separate function.
 
 Do you mean having two functions, one for edge and the other for level?
 Seems overkill to me. I could move the if (is_level) ... to a common
 spot though.

Indeed, you could just have something like vgic_eoi_irq and call that
in one place, letting that function do the level check.

  +
  +/*
  + * Fill the list registers with pending interrupts before running the
  + * guest.
  + */
  +static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu)
  +{
  +   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
  +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
  +   unsigned long *pending;
  +   int i, c, vcpu_id;
  +   int overflow = 0;
  +
  +   vcpu_id = vcpu->vcpu_id;
  +
  +   /*
  +* We may not have any pending interrupt, or the interrupts
  +* may have been serviced from another vcpu. In all cases,
  +* move along.
  +*/
  +   if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
  +   pr_debug("CPU%d has no pending interrupt\n", vcpu_id);
  +   goto epilog;
  +   }
  +
  +   /* SGIs */
  +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
  +   for_each_set_bit(i, 

Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-12-03 Thread Christoffer Dall
[...]


 +
 +   clear_bit(c, &sources);
 +   }
 +
 +   if (!sources)
 +   clear_bit(i, pending);

 What does this signify and how does it happen? An SGI without a source
 sounds pretty weird...

 See the clear_bit() just above. Once all the sources for this SGI are
 cleared, we can make the interrupt not pending anymore.


every time I read the code, I get completely bogged up on trying to
understand this case and I tell myself we should put a comment here,
then I understand why it happens and I think, oh it's obvious, no
comment needed, but now I (almost) forgot again. Could we add a
comment?

-Christoffer
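Since this SGI-source case keeps tripping readers up (as Christoffer notes above), the bookkeeping can be stated as a tiny stand-alone model: each pending SGI carries a bitmask of source CPUs, one injection consumes one source bit, and the SGI only stops being pending once every source has been consumed. The sketch below is a user-space illustration under those assumptions; `struct sgi_model` and `consume_source` are hypothetical names, not the vgic code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One pending SGI: a bitmask of source CPUs plus a pending flag. */
struct sgi_model {
	uint8_t sources;	/* bit c set: CPU c has raised this SGI */
	bool pending;
};

/* Consume the source bit for one CPU; clear pending only when no
 * sources remain (mirrors the clear_bit pair in compute/sync code). */
static void consume_source(struct sgi_model *sgi, int cpu)
{
	sgi->sources &= (uint8_t)~(1u << cpu);
	if (!sgi->sources)
		sgi->pending = false;
}
```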


Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection

2012-12-03 Thread Christoffer Dall
[...]

 +
 +static bool vgic_update_irq_state(struct kvm *kvm, int cpuid,
 +  unsigned int irq_num, bool level)
 +{
 +struct vgic_dist *dist = &kvm->arch.vgic;
 +struct kvm_vcpu *vcpu;
 +int is_edge, is_level, state;
 +int enabled;
 +bool ret = true;
 +
 +spin_lock(&dist->lock);
 +
 +is_edge = vgic_irq_is_edge(dist, irq_num);
 +is_level = !is_edge;
 +state = vgic_bitmap_get_irq_val(&dist->irq_state, cpuid, irq_num);
 +
 +/*
 + * Only inject an interrupt if:
 + * - level triggered and we change level
 + * - edge triggered and we have a rising edge
 + */
 +if ((is_level && !(state ^ level)) || (is_edge && (state || !level))) {
 +ret = false;
 +goto out;
 +}

 Eek, more of the edge/level combo. Can this be be restructured so that we
 have vgic_update_{edge,level}_irq_state, which are called from here
 appropriately?

 I'll have a look.

oh, you're no fun anymore. That if statement is one of the funniest
pieces of this code.
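For anyone decoding that condition, here is a hedged sketch of the per-trigger split suggested in the review (hypothetical helper names; the in-tree code keeps the combined expression). It states when an injection should happen, i.e. the negation of the quoted "skip" condition:

```c
#include <assert.h>
#include <stdbool.h>

/* Level-triggered: only a change of level is interesting;
 * !(state ^ level) in the original expression is the skip case. */
static bool level_irq_should_inject(bool state, bool level)
{
    return state != level;
}

/* Edge-triggered: only a rising edge (low -> high) injects;
 * (state || !level) in the original expression is the skip case. */
static bool edge_irq_should_inject(bool state, bool level)
{
    return !state && level;
}
```
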

-Christoffer


Re: [PATCH v4 13/14] KVM: ARM: Handle guest faults in KVM

2012-12-03 Thread Christoffer Dall
On Mon, Dec 3, 2012 at 8:06 AM, Will Deacon will.dea...@arm.com wrote:
 On Fri, Nov 30, 2012 at 09:40:37PM +, Christoffer Dall wrote:
 On Mon, Nov 19, 2012 at 10:07 AM, Will Deacon will.dea...@arm.com wrote:
 
  Why are PIPT caches affected by this? The virtual address is irrelevant.
 

 The comment is slightly misleading, and I'll update it. Just so we're
 clear, this is the culprit:

 1. guest uses page X, containing instruction A
 2. page X gets swapped out
 3. host uses page X, containing instruction B
 4. instruction B enters i-cache at page X's cache line
 5. page X gets swapped out
 6. guest swaps page X back in
 7. guest executes instruction B from cache, should execute instruction A

 Ok, that's clearer. Thanks for the explanation.

 The point is that with PIPT we can flush only that page from the
 icache using the host virtual address, as the MMU will do the
 translation on the fly. In the VIPT we have to nuke the whole thing
 (unless we .

 Unless we what? Could we flush using the host VA + all virtual aliases
 instead?

you'd have to know all the virtual addresses of the guest(s) mapping
that physical page and then flush all the aliases of those addresses,
which we don't know at this time.

What we can do (down the road) is to mark the pages as XN, and catch
the fault, get the virtual fault address, and then flush that single
page.

The tradeoffs need to be measured before implementing this imho, and
is an optimization we can add later.

-Christoffer


Re: [PATCH v4 08/14] KVM: ARM: World-switch implementation

2012-12-03 Thread Christoffer Dall
On Mon, Dec 3, 2012 at 5:33 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 30/11/12 18:49, Christoffer Dall wrote:
 On Fri, Nov 30, 2012 at 12:14 PM, Will Deacon will.dea...@arm.com wrote:
 On Fri, Nov 30, 2012 at 04:47:40PM +, Christoffer Dall wrote:
 On Fri, Nov 30, 2012 at 10:15 AM, Will Deacon will.dea...@arm.com wrote:
 At this point, VM1 is running and VM0:VCPU1 is running. VM0:VCPU0 is not
 running because physical CPU0 is handling an interrupt. The problem is 
 that
 when VCPU0 *is* resumed, it will update the VMID of VM0 and could be
 scheduled in parallel with VCPU1 but with a different VMID.

 How do you avoid this in the current code?

 I don't. Nice catch. Please apply your interesting brain to the following 
 fix:)

 I'm far too sober to look at your patch right now, but I'll think about it
 over the weekend [I can't break it at a quick glance] :)

 In the meantime, can you think about whether the TLB operations need to run
 on every CPU please?

 they don't; we can invalidate the TLB and the icache using the inner
 shareability domain. Here's a patch:

 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index ad1390f..df1b753 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -146,6 +146,7 @@ struct kvm_one_reg;
  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
  u64 kvm_call_hyp(void *hypfn, ...);
 +void force_vm_exit(const cpumask_t *mask);

  #define KVM_ARCH_WANT_MMU_NOTIFIER
  struct kvm;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index c4f631e..674592e 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -405,9 +405,14 @@ int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
   return v->mode == IN_GUEST_MODE;
  }

 -static void reset_vm_context(void *info)
 +/* Just ensure a guest exit from a particular CPU */
 +static void exit_vm_noop(void *info)
  {
 - kvm_call_hyp(__kvm_flush_vm_context);
 +}
 +
 +void force_vm_exit(const cpumask_t *mask)
 +{
 + smp_call_function_many(mask, exit_vm_noop, NULL, true);
  }

 Care to update the do_nothing() call in emulate.c to use this as well?

yes, was actually already done:

commit d1ff14417724e9062ec9d585bf3ef0537e12b54d
Author: Christoffer Dall c.d...@virtualopensystems.com
Date:   Fri Nov 30 13:37:01 2012 -0500

KVM: ARM: Reuse force_vm_exit from copy_current_insn

The VMID management code also needs to force VM exits, so reuse
that functionality from both places.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com

diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 60e7ec0..ad743b7 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -221,11 +221,6 @@ static bool copy_from_guest_va(struct kvm_vcpu *vcpu,
return true;
 }

-/* Just ensure we're not running the guest. */
-static void do_nothing(void *info)
-{
-}
-
 /*
  * We have to be very careful copying memory from a running (ie. SMP) guest.
  * Another CPU may remap the page (eg. swap out a userspace text page) as we
@@ -257,8 +252,9 @@ static bool copy_current_insn(struct kvm_vcpu
*vcpu, unsigned long *instr)
/* Kick out any which are still running. */
kvm_for_each_vcpu(i, v, vcpu-kvm) {
/* Guest could exit now, making cpu wrong. That's OK. */
-   if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE)
-   smp_call_function_single(v-cpu, do_nothing, NULL, 1);
+   if (kvm_vcpu_exiting_guest_mode(v) == IN_GUEST_MODE) {
+   force_vm_exit(get_cpu_mask(v-cpu));
+   }
}




Re: [PATCH v4 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-12-03 Thread Marc Zyngier
On 03/12/12 14:34, Will Deacon wrote:
 On Mon, Dec 03, 2012 at 02:11:03PM +, Marc Zyngier wrote:
 On 03/12/12 13:23, Will Deacon wrote:

 +#define VGIC_HCR_EN            (1 << 0)
 +#define VGIC_HCR_UIE           (1 << 1)
 +
 +#define VGIC_LR_VIRTUALID      (0x3ff << 0)
 +#define VGIC_LR_PHYSID_CPUID   (7 << 10)
 +#define VGIC_LR_STATE          (3 << 28)
 +#define VGIC_LR_PENDING_BIT    (1 << 28)
 +#define VGIC_LR_ACTIVE_BIT     (1 << 29)
 +#define VGIC_LR_EOI            (1 << 19)
 +
 +#define VGIC_MISR_EOI          (1 << 0)
 +#define VGIC_MISR_U            (1 << 1)
 +
 +#define LR_EMPTY   0xff
 +

 Could stick these in asm/hardware/gic.h. I know they're not used by the gic
 driver, but they're the same piece of architecture so it's probably worth
 keeping in one place.

 This is on my list of things to do once the GIC code is shared between
 arm and arm64. Could do it earlier if that makes more sense.
 
 Might as well as I found some others in a later patch too.
 
  static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
  {
 -   return 0;
 +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +   unsigned long *pending, *enabled, *pend;
 +   int vcpu_id;
 +
 +   vcpu_id = vcpu->vcpu_id;
 +   pend = vcpu->arch.vgic_cpu.pending;
 +
 +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
 +   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
 +   bitmap_and(pend, pending, enabled, 32);

 pend and pending! vcpu_pending and dist_pending?

 A lot of that code has already been reworked. See:
 https://lists.cs.columbia.edu/pipermail/kvmarm/2012-November/004138.html
 
 Argh, too much code! Ok, as long as it's being looked at.
 
 +
 +   pending = vgic_bitmap_get_shared_map(&dist->irq_state);
 +   enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled);
 +   bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS);
 +   bitmap_and(pend + 1, pend + 1,
 +      vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]),
 +      VGIC_NR_SHARED_IRQS);
 +
 +   return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS);
  }
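The function above boils down to AND-ing three bitmaps and checking whether any bit survives. A minimal self-contained model, with illustrative sizes and names in place of the kernel bitmap helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* An IRQ is deliverable to a CPU only if it is pending, enabled and
 * (for shared interrupts) targeted at that CPU -- modelled here with
 * plain 64-bit masks instead of the kernel bitmap API. */
static bool cpu_has_pending(unsigned long long pending,
                            unsigned long long enabled,
                            unsigned long long targeted)
{
    return (pending & enabled & targeted) != 0;
}
```
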

  /*
 @@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm)
 }
  }

 +#define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)

 Is VGIC_LR_PHYSID_CPUID wide enough for this? The CPUID is only 3 bits, but
 the interrupt ID could be larger. Or do you not supported hardware interrupt
 forwarding? (in which case, LR_PHYSID is a misleading name).

 Hardware interrupt forwarding is not supported. PHYSID is the name of
 the actual field in the spec, hence the name of the macro. LR_CPUID?
 
 Sure.
 
 +   kvm_debug("LR%d piggyback for IRQ%d %x\n",
 +             lr, irq, vgic_cpu->vgic_lr[lr]);
 +   BUG_ON(!test_bit(lr, vgic_cpu->lr_used));
 +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_PENDING_BIT;
 +   if (is_level)
 +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
 +   return true;
 +   }
 +
 +   /* Try to use another LR for this interrupt */
 +   lr = find_first_bit((unsigned long *)vgic_cpu->vgic_elrsr,
 +  vgic_cpu->nr_lr);
 +   if (lr >= vgic_cpu->nr_lr)
 +   return false;
 +
 +   kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
 +   vgic_cpu->vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq);
 +   if (is_level)
 +   vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
 +
 +   vgic_cpu->vgic_irq_lr_map[irq] = lr;
 +   clear_bit(lr, (unsigned long *)vgic_cpu->vgic_elrsr);
 +   set_bit(lr, vgic_cpu->lr_used);
 +
 +   return true;
 +}

 I can't help but feel that this could be made cleaner by moving the
 level-specific EOI handling out into a separate function.

 Do you mean having two functions, one for edge and the other for level?
 Seems overkill to me. I could move the if (is_level) ... to a common
 spot though.
 
 Indeed, you could just have something like vgic_eoi_irq and call that
 in one place, letting that function do the level check.
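If it helps picture the suggestion, such a helper might look like the sketch below. The name vgic_eoi_irq comes from the review; the bit value is taken from the defines quoted earlier in the thread, and the field layout is otherwise illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VGIC_LR_EOI (1 << 19)   /* from the defines quoted above */

/* Level-triggered interrupts request an EOI maintenance interrupt so
 * the pending state can be resampled; edge-triggered ones do not. */
static void vgic_eoi_irq(uint32_t *lr, bool is_level)
{
    if (is_level)
        *lr |= VGIC_LR_EOI;
}
```

Calling this in one place, as suggested, keeps the level check out of both LR-allocation paths.
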
 
 +
 +/*
 + * Fill the list registers with pending interrupts before running the
 + * guest.
 + */
 +static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu)
 +{
 +   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 +   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +   unsigned long *pending;
 +   int i, c, vcpu_id;
 +   int overflow = 0;
 +
 +   vcpu_id = vcpu->vcpu_id;
 +
 +   /*
 +* We may not have any pending interrupt, or the interrupts
 +* may have been serviced from another vcpu. In all cases,
 +* move along.
 +*/
 +   if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
 +   pr_debug("CPU%d has no pending interrupt\n", vcpu_id);
 +   goto epilog;
 +   }
 +
 +   /* SGIs */
 +   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
 +   for_each_set_bit(i, vgic_cpu->pending, 16) {
 +   unsigned long sources;
 +
 +  

KVM call agenda for 2012-12-04

2012-12-03 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.
- migration troubles from 1.2 -> 1.3 due to qemu-kvm integration

Later, Juan.


Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv

2012-12-03 Thread Scott Wood

On 12/01/2012 07:58:25 AM, Alexander Graf wrote:

In BookE, EPCR is defined and valid when either the HV or the 64bit
category are implemented. Reflect this in the field definition.

Today the only KVM target on 64bit is HV enabled, so there is no
change in actual source code, but this keeps the code closer to the
spec and doesn't build up artificial road blocks for a PR KVM
on 64bit.

[snip]

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h  
b/arch/powerpc/include/asm/kvm_host.h

index 62fbd38..3480526 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -405,14 +405,19 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOKE_HV
u32 host_mas4;
u32 host_mas6;
-   u32 shadow_epcr;
-   u32 epcr;
u32 shadow_msrp;
u32 eplc;
u32 epsc;
u32 oldpir;
 #endif

+#if defined(CONFIG_BOOKE)
+#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT)
+   u32 shadow_epcr;
+   u32 epcr;
+#endif
+#endif


On a PR-mode implementation, why would we have a shadow_epcr?  It  
would always be set based on the host kernel, just like when running  
any other userspace process.


-Scott


[PATCH] vhost-net: initialize zcopy packet counters

2012-12-03 Thread Michael S. Tsirkin
These packet counters are used to drive the zerocopy
selection heuristic so nothing too bad happens if they are off a bit -
and they are also reset once in a while.
But it's cleaner to clear them when backend is set so that
we start in a known state.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 67898fa..ff6c9199 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -823,6 +823,9 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
r = vhost_init_used(vq);
if (r)
goto err_vq;
+
+   n->tx_packets = 0;
+   n->tx_zcopy_err = 0;
}
 
 mutex_unlock(&vq->mutex);
-- 
MST


Re: [PATCH 1/2] vfio powerpc: enabled on powernv platform

2012-12-03 Thread Alex Williamson
On Mon, 2012-12-03 at 13:52 +1100, Alexey Kardashevskiy wrote:
 This patch initializes IOMMU groups based on the IOMMU
 configuration discovered during the PCI scan on POWERNV
 (POWER non virtualized) platform. The IOMMU groups are
 to be used later by VFIO driver (PCI pass through).
 
 It also implements an API for mapping/unmapping pages for
 guest PCI drivers and providing DMA window properties.
 This API is going to be used later by QEMU-VFIO to handle
 h_put_tce hypercalls from the KVM guest.
 
 Although this driver has been tested only on the POWERNV
 platform, it should work on any platform which supports
 TCE tables.
 
 To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
 option and configure VFIO as required.
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/include/asm/iommu.h |9 ++
  arch/powerpc/kernel/iommu.c  |  186 
 ++
  arch/powerpc/platforms/powernv/pci.c |  135 
  drivers/iommu/Kconfig|8 ++
  4 files changed, 338 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index cbfe678..5c7087a 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -76,6 +76,9 @@ struct iommu_table {
   struct iommu_pool large_pool;
   struct iommu_pool pools[IOMMU_NR_POOLS];
   unsigned long *it_map;   /* A simple allocation bitmap for now */
 +#ifdef CONFIG_IOMMU_API
 + struct iommu_group *it_group;
 +#endif
  };
  
  struct scatterlist;
 @@ -147,5 +150,11 @@ static inline void iommu_restore(void)
  }
  #endif
  
 +extern long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry,
 + unsigned long pages);
 +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry,
 + uint64_t tce, enum dma_data_direction direction,
 + unsigned long pages);
 +
  #endif /* __KERNEL__ */
  #endif /* _ASM_IOMMU_H */
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index ff5a6ce..2738aa4 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -44,6 +44,7 @@
  #include <asm/kdump.h>
  #include <asm/fadump.h>
  #include <asm/vio.h>
 +#include <asm/tce.h>
  
  #define DBG(...)
  
 @@ -856,3 +857,188 @@ void iommu_free_coherent(struct iommu_table *tbl, 
 size_t size,
   free_pages((unsigned long)vaddr, get_order(size));
   }
  }
 +
 +#ifdef CONFIG_IOMMU_API
 +/*
 + * SPAPR TCE API
 + */
 +
 +/*
 + * Returns the number of used IOMMU pages (4K) within
 + * the same system page (4K or 64K).
 + * bitmap_weight is not used as it does not support bigendian maps.
 + */
 +static int syspage_weight(unsigned long *map, unsigned long entry)
 +{
 + int ret = 0, nbits = PAGE_SIZE/IOMMU_PAGE_SIZE;
 +
 + /* Aligns TCE entry number to system page boundary */
 + entry &= PAGE_MASK >> IOMMU_PAGE_SHIFT;
 +
 + /* Count used 4K pages */
 + while (nbits--)
 + ret += (test_bit(entry++, map) == 0) ? 0 : 1;
 +
 + return ret;
 +}
 +
 +static void tce_flush(struct iommu_table *tbl)
 +{
 + /* Flush/invalidate TLB caches if necessary */
 + if (ppc_md.tce_flush)
 + ppc_md.tce_flush(tbl);
 +
 + /* Make sure updates are seen by hardware */
 + mb();
 +}
 +
 +/*
 + * iommu_clear_tces clears tces and returned the number of system pages
 + * which it called put_page() on
 + */
 +static long clear_tces_nolock(struct iommu_table *tbl, unsigned long entry,
 + unsigned long pages)
 +{
 + int i, retpages = 0;
 + unsigned long oldtce, oldweight;
 + struct page *page;
 +
 + for (i = 0; i < pages; ++i) {
 + oldtce = ppc_md.tce_get(tbl, entry + i);
 + ppc_md.tce_free(tbl, entry + i, 1);
 +
 + oldweight = syspage_weight(tbl->it_map, entry);
 + __clear_bit(entry - tbl->it_offset, tbl->it_map);
 +
 + if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
 + continue;

Could this happen earlier, above syspage_weight() and __clear_bit()?

 +
 + page = pfn_to_page(oldtce >> PAGE_SHIFT);
 +
 + WARN_ON(!page);
 + if (!page)
 + continue;
 +
 + if (oldtce & TCE_PCI_WRITE)
 + SetPageDirty(page);
 +
 + put_page(page);
 +
 + /* That was the last IOMMU page within the system page */
 + if ((oldweight == 1) && !syspage_weight(tbl->it_map, entry))
 + ++retpages;

If you used __test_and_clear_bit() above I think you could avoid this
2nd call to syspage_weight.  A minor optimization though.

 + }
 +
 + return retpages;
 +}
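A self-contained illustration of the optimization suggested above, on a generic bitmap with hypothetical helper names: a test-and-clear tells us whether the bit was previously set, so a single weight computation after the clear suffices to detect "last IOMMU page within the system page".

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_SYSPAGE 16   /* e.g. 64K system page / 4K IOMMU page */

/* Count bits still set in the group containing `entry`. */
static int group_weight(const unsigned char *map, int entry)
{
    int base = entry & ~(PAGES_PER_SYSPAGE - 1), w = 0;
    for (int i = 0; i < PAGES_PER_SYSPAGE; i++)
        w += (map[(base + i) / 8] >> ((base + i) % 8)) & 1;
    return w;
}

/* Clear one entry; return true when it was the last set bit of its
 * system-page group -- the condition the review wants detected with a
 * single weight computation after a test-and-clear. */
static bool clear_entry_was_last(unsigned char *map, int entry)
{
    bool was_set = (map[entry / 8] >> (entry % 8)) & 1;
    map[entry / 8] &= ~(1 << (entry % 8));
    return was_set && group_weight(map, entry) == 0;
}
```
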
 +
 +/*
 + * iommu_clear_tces clears tces and returned the number
 + / of released system pages
 + */

Something bad happened to your comments here.

 +long iommu_clear_tces(struct iommu_table *tbl, 

Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv

2012-12-03 Thread Alexander Graf

On 03.12.2012, at 17:47, Scott Wood wrote:

 On 12/01/2012 07:58:25 AM, Alexander Graf wrote:
 In BookE, EPCR is defined and valid when either the HV or the 64bit
 category are implemented. Reflect this in the field definition.
 Today the only KVM target on 64bit is HV enabled, so there is no
 change in actual source code, but this keeps the code closer to the
 spec and doesn't build up artificial road blocks for a PR KVM
 on 64bit.
 [snip]
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
 arch/powerpc/include/asm/kvm_host.h |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 62fbd38..3480526 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -405,14 +405,19 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOKE_HV
  u32 host_mas4;
  u32 host_mas6;
 -u32 shadow_epcr;
 -u32 epcr;
  u32 shadow_msrp;
  u32 eplc;
  u32 epsc;
  u32 oldpir;
 #endif
 +#if defined(CONFIG_BOOKE)
 +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT)
 +u32 shadow_epcr;
 +u32 epcr;
 +#endif
 +#endif
 
 On a PR-mode implementation, why would we be have a shadow_epcr?  It would 
 always be set based on the host kernel, just like when running any other 
 userspace process.

Right - we could simply set MSR_CM. I'll move shadow_epcr back into the HV only 
bit above.


Alex



Re: [PATCH 2/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-12-03 Thread Alex Williamson
On Mon, 2012-12-03 at 13:52 +1100, Alexey Kardashevskiy wrote:
 VFIO implements platform independent stuff such as
 a PCI driver, BAR access (via read/write on a file descriptor
 or direct mapping when possible) and IRQ signaling.
 
 The platform dependent part includes IOMMU initialization
 and handling. This patch implements an IOMMU driver for VFIO
 which does mapping/unmapping pages for the guest IO and
 provides information about DMA window (required by a POWERPC
 guest).
 
 The counterpart in QEMU is required to support this functionality.
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/Kconfig|6 +
  drivers/vfio/Makefile   |1 +
  drivers/vfio/vfio_iommu_spapr_tce.c |  350 
 +++
  include/linux/vfio.h|   26 +++
  4 files changed, 383 insertions(+)
  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
 
 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index 7cd5dec..b464687 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
   depends on VFIO
   default n
  
 +config VFIO_IOMMU_SPAPR_TCE
 + tristate
 + depends on VFIO && SPAPR_TCE_IOMMU
 + default n
 +
  menuconfig VFIO
   tristate VFIO Non-Privileged userspace driver framework
   depends on IOMMU_API
   select VFIO_IOMMU_TYPE1 if X86
 + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
   help
 VFIO provides a framework for secure userspace device drivers.
 See Documentation/vfio.txt for more details.
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 2398d4a..72bfabc 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -1,3 +1,4 @@
  obj-$(CONFIG_VFIO) += vfio.o
  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
 b/drivers/vfio/vfio_iommu_spapr_tce.c
 new file mode 100644
 index 000..806ad9f
 --- /dev/null
 +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
 @@ -0,0 +1,350 @@
 +/*
 + * VFIO: IOMMU DMA mapping support for TCE on POWER
 + *
 + * Copyright (C) 2012 IBM Corp.  All rights reserved.
 + * Author: Alexey Kardashevskiy a...@ozlabs.ru
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * Derived from original vfio_iommu_type1.c:
 + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
 + * Author: Alex Williamson alex.william...@redhat.com
 + */
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/slab.h>
 +#include <linux/uaccess.h>
 +#include <linux/err.h>
 +#include <linux/vfio.h>
 +#include <asm/iommu.h>
 +
 +#define DRIVER_VERSION  "0.1"
 +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
 +#define DRIVER_DESC     "VFIO IOMMU SPAPR TCE"
 +
 +static void tce_iommu_detach_group(void *iommu_data,
 + struct iommu_group *iommu_group);
 +
 +/*
 + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
 + */
 +
 +/*
 + * This code handles mapping and unmapping of user data buffers
 + * into DMA'ble space using the IOMMU
 + */
 +
 +#define NPAGE_TO_SIZE(npage) ((size_t)(npage) << PAGE_SHIFT)
 +
 +struct vwork {
 + struct mm_struct    *mm;
 + long                npage;
 + struct work_struct  work;
 +};
 +
 +/* delayed decrement/increment for locked_vm */
 +static void lock_acct_bg(struct work_struct *work)
 +{
 + struct vwork *vwork = container_of(work, struct vwork, work);
 + struct mm_struct *mm;
 +
 + mm = vwork->mm;
 + down_write(&mm->mmap_sem);
 + mm->locked_vm += vwork->npage;
 + up_write(&mm->mmap_sem);
 + mmput(mm);
 + kfree(vwork);
 +}
 +
 +static void lock_acct(long npage)
 +{
 + struct vwork *vwork;
 + struct mm_struct *mm;
 +
 + if (!current->mm)
 + return; /* process exited */
 +
 + if (down_write_trylock(&current->mm->mmap_sem)) {
 + current->mm->locked_vm += npage;
 + up_write(&current->mm->mmap_sem);
 + return;
 + }
 +
 + /*
 +  * Couldn't get mmap_sem lock, so must setup to update
 +  * mm->locked_vm later. If locked_vm were atomic, we
 +  * wouldn't need this silliness
 +  */
 + vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
 + if (!vwork)
 + return;
 + mm = get_task_mm(current);
 + if (!mm) {
 + kfree(vwork);
 + return;
 + }
 + INIT_WORK(&vwork->work, lock_acct_bg);
 + vwork->mm = mm;
 + vwork->npage = npage;
 + schedule_work(&vwork->work);
 +}
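The trylock-or-defer pattern in lock_acct() can be modelled outside the kernel in a few lines. This is an illustrative userspace sketch with pthreads standing in for mmap_sem and a plain field standing in for the queued vwork; it is not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>

/* Toy model of the lock_acct() fallback: if the lock is contended,
 * record the delta to be applied later instead of blocking (the kernel
 * version defers to a workqueue; here we just stash it). */
struct acct {
    pthread_mutex_t lock;
    long locked_vm;
    long deferred;      /* stands in for the queued vwork */
};

static void toy_lock_acct(struct acct *a, long npage)
{
    if (pthread_mutex_trylock(&a->lock) == 0) {
        a->locked_vm += npage;
        pthread_mutex_unlock(&a->lock);
    } else {
        a->deferred += npage;   /* applied later, like lock_acct_bg() */
    }
}
```
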
 +
 +/*
 + * The container descriptor supports only a single group per container.
 + * Required by the API as the container is not supplied with the IOMMU group
 + * at the moment of 

Re: [PATCH] vhost-net: initialize zcopy packet counters

2012-12-03 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Mon, 3 Dec 2012 19:31:51 +0200

 These packet counters are used to drive the zerocopy
 selection heuristic so nothing too bad happens if they are off a bit -
 and they are also reset once in a while.
 But it's cleaner to clear them when backend is set so that
 we start in a known state.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied to net-next, thanks.


Re: [PATCH v4 09/14] KVM: ARM: Emulation framework and CP15 emulation

2012-12-03 Thread Christoffer Dall
On Mon, Dec 3, 2012 at 6:05 AM, Will Deacon will.dea...@arm.com wrote:
 On Fri, Nov 30, 2012 at 08:22:28PM +, Christoffer Dall wrote:
 On Mon, Nov 19, 2012 at 10:01 AM, Will Deacon will.dea...@arm.com wrote:
  On Sat, Nov 10, 2012 at 03:43:13PM +, Christoffer Dall wrote:
  This should probably be 0xff and also use the macros that Lorenzo is
  introducing:
 

  http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/132977.html
 

 in the A15 TRM bits [7:2] are reserved, so we really only care
 about bits [1:0], and this file is A15 specific, but if you prefer, I
 can merge this:

 The upper bits are RAZ, so yes, please do merge something like this.

I did, thanks!


Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection

2012-12-03 Thread Christoffer Dall
On Mon, Dec 3, 2012 at 9:21 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 03/12/12 13:25, Will Deacon wrote:
 On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Plug the interrupt injection code. Interrupts can now be generated
 from user space.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |8 +++
  arch/arm/kvm/arm.c  |   29 +
  arch/arm/kvm/vgic.c |   90 
 +++
  3 files changed, 127 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_vgic.h 
 b/arch/arm/include/asm/kvm_vgic.h
 index 7229324..6e3d303 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -241,6 +241,8 @@ struct kvm_exit_mmio;
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
  void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
  void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 +bool level);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_exit_mmio *mmio);
 @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu 
 *vcpu) {}
  static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
  static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}

 +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid,
 +  const struct kvm_irq_level *irq)
 +{
 +return 0;
 +}
 +
  static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
  {
  return 0;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 3ac1aab..f43da01 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
 kvm_irq_level *irq_level)

  switch (irq_type) {
  case KVM_ARM_IRQ_TYPE_CPU:
 +if (irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
   if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
  return -EINVAL;

  return vcpu_interrupt_line(vcpu, irq_num, level);
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_ARM_IRQ_TYPE_PPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 16 || irq_num > 31)
 +return -EINVAL;

 It's our favourite two numbers again! :)

 I already fixed a number of them. Probably missed this one though.

 +
 +return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
 +case KVM_ARM_IRQ_TYPE_SPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX)
 +return -EINVAL;
 +
 +return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
 +#endif
  }

  return -EINVAL;
 @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
  void __user *argp = (void __user *)arg;

  switch (ioctl) {
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_CREATE_IRQCHIP: {
 +if (vgic_present)
 +return kvm_vgic_create(kvm);
 +else
 +return -EINVAL;

 ENXIO? At least, that's what you use when setting the GIC addresses.

 -EINVAL seems to be one of the values other archs are using. -ENXIO is
 not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but
 for the sake of keeping userspace happy, I'm not really inclined to
 change this.


We don't have user space code relying on this, and EINVAL is
misleading, so let's use ENXIO to be consistent with
SET_DEVICE_ADDRESS. No error values are specified in the API docs, so
we should use the most appropriate one.

You fix?

-Christoffer


Re: [PATCH v4 09/13] ARM: KVM: VGIC interrupt injection

2012-12-03 Thread Marc Zyngier
On 03/12/12 19:13, Christoffer Dall wrote:
 On Mon, Dec 3, 2012 at 9:21 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 03/12/12 13:25, Will Deacon wrote:
 On Sat, Nov 10, 2012 at 03:45:18PM +, Christoffer Dall wrote:
 From: Marc Zyngier marc.zyng...@arm.com

 Plug the interrupt injection code. Interrupts can now be generated
 from user space.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  arch/arm/include/asm/kvm_vgic.h |8 +++
  arch/arm/kvm/arm.c  |   29 +
  arch/arm/kvm/vgic.c |   90 
 +++
  3 files changed, 127 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_vgic.h 
 b/arch/arm/include/asm/kvm_vgic.h
 index 7229324..6e3d303 100644
 --- a/arch/arm/include/asm/kvm_vgic.h
 +++ b/arch/arm/include/asm/kvm_vgic.h
 @@ -241,6 +241,8 @@ struct kvm_exit_mmio;
  int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
  void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
  void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 +bool level);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_exit_mmio *mmio);
 @@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu 
 *vcpu) {}
  static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
  static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}

 +static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid,
 +  const struct kvm_irq_level *irq)
 +{
 +return 0;
 +}
 +
  static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
  {
  return 0;
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 3ac1aab..f43da01 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -764,10 +764,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
 kvm_irq_level *irq_level)

  switch (irq_type) {
  case KVM_ARM_IRQ_TYPE_CPU:
 +if (irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
  if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
  return -EINVAL;

  return vcpu_interrupt_line(vcpu, irq_num, level);
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_ARM_IRQ_TYPE_PPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 16 || irq_num > 31)
 +return -EINVAL;

 It's our favourite two numbers again! :)

 I already fixed a number of them. Probably missed this one though.

 +
 +return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
 +case KVM_ARM_IRQ_TYPE_SPI:
 +if (!irqchip_in_kernel(kvm))
 +return -ENXIO;
 +
 +if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX)
 +return -EINVAL;
 +
 +return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
 +#endif
  }

  return -EINVAL;
 @@ -849,6 +870,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
  void __user *argp = (void __user *)arg;

  switch (ioctl) {
 +#ifdef CONFIG_KVM_ARM_VGIC
 +case KVM_CREATE_IRQCHIP: {
 +if (vgic_present)
 +return kvm_vgic_create(kvm);
 +else
 +return -EINVAL;

 ENXIO? At least, that's what you use when setting the GIC addresses.

 -EINVAL seems to be one of the values other archs are using. -ENXIO is
 not one of them for KVM_CREATE_IRQCHIP. Doesn't mean they are right, but
 for the sake of keeping userspace happy, I'm not really inclined to
 change this.

 
 We don't have user space code relying on this, and EINVAL is
 misleading, so let's use ENXIO to be consistent with
 SET_DEVICE_ADDRESS. No error values are specified in the API docs, so
 we should use the most appropriate one.
 
 You fix?

Yes, I'll fix it as part the whole vgic series.

M.
-- 
Jazz is not dead. It just smells funny...



Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction

2012-12-03 Thread Marcelo Tosatti
On Mon, Dec 03, 2012 at 04:33:01PM +0800, Xiao Guangrong wrote:
 Hi Marcelo,
 
 Thanks for your patience. I have been reading your reply over and over
 again, and I would like to discuss it a bit more :).
 Please see below.
 
 On 11/29/2012 08:21 AM, Marcelo Tosatti wrote:
 
  
  https://lkml.org/lkml/2012/11/17/75
  
  Does unshadowing work with large sptes at reexecute_instruction? That
  is, do we nuke any large read-only sptes that might be causing a certain
  gfn to be read-only?
  
  That is, following the sequence there, is the large read-only spte
  properly destroyed?
 
 Actually, sptes cannot prevent a gfn from becoming writable; that means the gfn
 can become writable *even if* there exists a spte which maps to the gfn.
 
 The condition that can prevent a gfn from becoming writable is that the gfn has
 been shadowed; that means the gfn cannot become writable if there exists a sp
 with sp.gfn == gfn.
 
 Note, drop_spte does not remove any sp. So, either destroying the spte or
 keeping the spte does not have any effect on gfn write-protection.
 
 If I am lucky enough to be right, the current code can do some optimizations
 as below:
 
 if (level > PT_PAGE_TABLE_LEVEL &&
 -   has_wrprotected_page(vcpu->kvm, gfn, level)) {
 -   ret = 1;
 -   drop_spte(vcpu->kvm, sptep);
 -   goto done;
 -   }
 +   has_wrprotected_page(vcpu->kvm, gfn, level))
 +   return 0;
 
 
 1): we can return 0 instead of 1 to avoid unnecessary emulation. The vcpu
 will refault again, and then kvm will use a small page.

So on refault the large spte is nuked. That works, yes.

 2): we do not need to make any change to the spte.



Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-12-03 Thread Marcelo Tosatti
On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:
 On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:
 
 Don't understand the reasoning behind why 3 is a good choice.
 
 Here is where I came from. (explaining from scratch for
 completeness, forgive me :))
 In moderate overcommit, we can falsely exit from the ple handler even when
 we have a preempted task of the same VM waiting on other cpus. To reduce this
 problem, we try a few times before exiting.
 The problem boils down to:
 what is the probability that we exit the ple handler even when we have more
 than 1 task on other cpus. The theoretical worst case should be around 1.5x
 overcommit (as also pointed out by Andrew Theurer). [But the practical
 worst case may be around 2x,3x overcommit, as indicated by the results
 for the patch series.]
 
 So if p is the probability of finding rq length one on a particular cpu,
 and if we do n tries, then the probability of exiting the ple handler is:
 
  p^(n+1) [ because we would have come across one source with rq length
 1 and n target cpu rqs with length 1 ]
 
 so
 num tries: probability of aborting ple handler (1.5x overcommit)
  1 1/4
  2 1/8
  3 1/16
 
 We can reduce this probability further with more tries, but the problem is
 the overhead.
 Also, if we have tried three times, that means we would have iterated
 over 3 good eligible vcpus along with many non-eligible candidates. In
 the worst case, if we iterate over all the vcpus, we lose 1x performance
 and overcommit performance gets hit. [ as in the results ].
 
 I have tried num_tries = 1, 2, 3 and n already (not 4 yet). So I
 concluded 3 is enough.
 
 In fact, I have also run kernbench and hackbench, which are showing 5-20%
 improvement.
 
 [ As a side note, I also thought: how about having num_tries = f(n) =
 ceil ( log(num_online_cpus)/2 )? But I thought the calculation is too much
 overhead, and also there is no point in making it dependent on
 online cpus ]
 
 Please let me know if you are happy with this rationale, or correct me
 if you foresee some problem. (In fact, Avi's and Rik's concern about false
 exits is what made me arrive at the 'try' logic, which I did not have
 earlier.)
 
 I am currently trying out the 1.5x overcommit case and will post the
 result.

Raghavendra

Makes sense to me. Thanks.


Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

2012-12-03 Thread Marcelo Tosatti
On Mon, Dec 03, 2012 at 12:36:33PM +0400, Glauber Costa wrote:
 On 11/30/2012 09:55 PM, Pavel Emelyanov wrote:
  Hello,
  
  This is an attempt to implement support for memory snapshot for the the
  checkpoint-restore project (http://criu.org).
  
  To create a dump of an application(s) we save all the information about it
  to files. No surprise, the biggest part of such a dump is the contents of
  the tasks' memory. However, in some usage scenarios it's not required to
  get _all_ the task memory while creating a dump. For example, when doing
  periodical dumps, a full memory dump is only required at the first step;
  after that we can take incremental changes of memory. Another example is
  live migration. In the simplest form it looks like -- create a dump, copy
  it to the remote node, then restore the tasks from the dump files. While
  all this dump-copy-restore activity goes on, the processes must be stopped.
  However, if we can monitor how tasks change their memory, we can dump and
  copy it in smaller chunks, periodically updating it, and thus freeze the
  tasks only at the very end, for a very short time, to pick up the most
  recent changes.
  
  That said, some help from the kernel to watch how processes modify the
  contents of their memory is required. I'd like to propose one possible
  solution to this task -- with the help of page faults and trace events.
  
  Briefly, the approach is -- remap some memory regions as read-only, get the
  #pf on the task's attempt to modify the memory, and issue a trace event for
  it. Since we're only interested in parts of the memory of some tasks, make
  it possible to mark the vmas we're interested in and issue events for them
  only. Also, to be aware of tasks unmapping the vma-s being watched, issue
  an event when a watched vma is removed (and, for symmetry -- an event when
  a vma is marked).
  
  What do you think about this approach? Is this way of supporting memory
  snapshots OK for you, or should we invent some better one?
  
 
 The page fault mechanism is pretty obvious - anything that deals with
 dirty pages will end up having to do this. So there is nothing crazy
 about this.
 
 What concerns me, however, is that should this go in, we'll have two
 dirty-memory loggers in the kernel: one to support CRIU, one to support
 KVM. And the worst part: they have the exact same purpose!!
 
 So to begin with, I think one thing to consider, would be to generalize
 KVM's dirty memory notification so it can work on a normal process
 memory region. KVM api requires a memory slot to be passed, something
 we are unlikely to have. But KVM can easily keep its API and use an
 alternate mechanics, that's trivial...
 
 Generally speaking, KVM will do polling with this ioctl. I like your
 tracing mechanism better. The only difference is that KVM tends to
 transfer large chunks of memory in some loads - in the high-gigabyte range.
 So the proposed tracing API should be able to optionally batch requests
 within a time frame.
 
 It would also be good to hear what the KVM guys think of it as well.

There are significant differences. KVM's dirty logging works for
guest translations (NPT/shadow) and is optimized for specific use cases.

Above is about dirty logging of userspace memory areas.



[PATCH] KVM: MMU: optimize for set_spte

2012-12-03 Thread Xiao Guangrong
There are two cases where we need to adjust the page size in set_spte:
1): another vcpu creates a new sp in the window between mapping_level()
and acquiring mmu-lock.
2): the new sp is created by the vcpu itself (page-fault path) when the
guest uses the target gfn as its page table.

In the current code, set_spte drops the spte and emulates the access in
these cases, which does not work well:
- for case 1, it may destroy the mapping established by the other vcpu, and
  do expensive instruction emulation.
- for case 2, it may emulate the access even if the guest is accessing
  pages that are not used as page tables. For example, if 0~2M is used as a
  huge page in the guest and, within this huge page, only page 3 is used as
  a page table, then guest reads/writes on the other pages cause instruction
  emulation.

Both of these cases can be fixed by allowing the guest to retry the access;
it will refault, and then we can establish the mapping by using a small page.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/mmu.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b875a9e..01d7c2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2382,12 +2382,20 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|| (!vcpu->arch.mmu.direct_map && write_fault
 && !is_write_protection(vcpu) && !user_fault)) {

+   /*
+* There are two cases:
+* - one is that another vcpu creates a new sp in the window
+*   between mapping_level() and acquiring mmu-lock.
+* - the other is that the new sp is created by the vcpu itself
+*   (page-fault path) when the guest uses the target gfn as
+*   its page table.
+* Both of these cases can be fixed by allowing the guest to
+* retry the access; it will refault, and then we can establish
+* the mapping by using a small page.
+*/
if (level > PT_PAGE_TABLE_LEVEL &&
-   has_wrprotected_page(vcpu->kvm, gfn, level)) {
-   ret = 1;
-   drop_spte(vcpu->kvm, sptep);
+   has_wrprotected_page(vcpu->kvm, gfn, level))
goto done;
-   }

spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

-- 
1.7.7.6



[RFC PATCH 5/6] kvm: Re-introduce memslots-nmemslots

2012-12-03 Thread Alex Williamson
struct kvm_memory_slot is currently 52 bytes (LP64), not counting the
arch data.  On x86 this means the memslot array to support a tiny 32+3
entries (user+private) is over 2k.  We'd like to support more slots
so that we can support more assigned devices, but it doesn't make
sense to penalize everyone by using a statically allocated array.
This allows us to start introducing a grow-able array.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c |2 +-
 arch/powerpc/kvm/book3s_hv.c |2 +-
 arch/x86/kvm/vmx.c   |1 +
 arch/x86/kvm/x86.c   |4 +++-
 include/linux/kvm_host.h |9 ++---
 virt/kvm/kvm_main.c  |   10 ++
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 012e5dd..96401b5 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1836,7 +1836,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
memslot = id_to_memslot(kvm->memslots, log->slot);
r = -ENOENT;
-   if (!memslot->dirty_bitmap)
+   if (!memslot || !memslot->dirty_bitmap)
goto out;
 
kvm_ia64_sync_dirty_log(kvm, memslot);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 56067db..0417190 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1267,7 +1267,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
 
memslot = id_to_memslot(kvm->memslots, log->slot);
r = -ENOENT;
-   if (!memslot->dirty_bitmap)
+   if (!memslot || !memslot->dirty_bitmap)
goto out;
 
n = kvm_dirty_bitmap_bytes(memslot);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2bb9157..07fdd90 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2751,6 +2751,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 
slots = kvm_memslots(kvm);
slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
+   BUG_ON(!slot);
base_gfn = slot->base_gfn + slot->npages - 3;

return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8765485..53fe9b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3139,9 +3139,11 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
goto out;
 
memslot = id_to_memslot(kvm->memslots, log->slot);
+   r = -ENOENT;
+   if (!memslot)
+   goto out;
 
dirty_bitmap = memslot->dirty_bitmap;
-   r = -ENOENT;
if (!dirty_bitmap)
goto out;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7b3d5c4..1955a4e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -313,6 +313,7 @@ struct kvm_irq_routing_table {};
  * to get the memslot by its id.
  */
 struct kvm_memslots {
+   int nmemslots;
u64 generation;
struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
 };
@@ -397,7 +398,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
*kvm, int i)
 
 #define kvm_for_each_memslot(memslot, slots)   \
for (memslot = slots-memslots[0]; \
- memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
+ memslot < slots->memslots + slots->nmemslots && memslot->npages;\
memslot++)
 
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
@@ -424,10 +425,12 @@ static inline struct kvm_memslots *kvm_memslots(struct 
kvm *kvm)
 static inline struct kvm_memory_slot *
 id_to_memslot(struct kvm_memslots *slots, int id)
 {
-   int index = slots->memslots[id].id_to_index;
struct kvm_memory_slot *slot;

-   slot = &slots->memslots[index];
+   if (id >= slots->nmemslots)
+   return NULL;
+
+   slot = &slots->memslots[slots->memslots[id].id_to_index];

WARN_ON(slot->id != id);
return slot;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3ce2664..ebd3960 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -444,7 +444,9 @@ static void kvm_init_memslots_id(struct kvm *kvm)
int i;
struct kvm_memslots *slots = kvm->memslots;
 
-   for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+   slots->nmemslots = KVM_MEM_SLOTS_NUM;
+
+   for (i = 0; i < kvm->memslots->nmemslots; i++)
slots->memslots[i].id_to_index = slots->memslots[i].id = i;
 }
 
@@ -658,10 +660,10 @@ static void sort_memslots(struct kvm_memslots *slots)
 {
int i;
 
-   sort(slots->memslots, KVM_MEM_SLOTS_NUM,
+   sort(slots->memslots, slots->nmemslots,
 sizeof(struct kvm_memory_slot), cmp_memslot, NULL);

-   for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
+   for (i = 0; i < slots->nmemslots; i++)
slots->memslots[slots->memslots[i].id].id_to_index = i;
 }
 
@@ -898,7 +900,7 @@ int kvm_get_dirty_log(struct kvm 

[RFC PATCH 6/6] kvm: Allow memory slots to grow

2012-12-03 Thread Alex Williamson
Start with zero and grow up to KVM_MEM_SLOTS_NUM.  A modest guest
without device assignment likely uses around 1/4 of the total
entries.  We don't attempt to shrink the array when slots are
released.  Both x86 and powerpc still have some statically sized
elements elsewhere, but this covers the bulk of the memory used.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 include/linux/kvm_host.h |2 +
 virt/kvm/kvm_main.c  |   62 +++---
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1955a4e..effc800 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -315,7 +315,7 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
int nmemslots;
u64 generation;
-   struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
+   struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ebd3960..fa4df50 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -439,17 +439,6 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 
 #endif /* CONFIG_MMU_NOTIFIER  KVM_ARCH_WANT_MMU_NOTIFIER */
 
-static void kvm_init_memslots_id(struct kvm *kvm)
-{
-   int i;
-   struct kvm_memslots *slots = kvm->memslots;
-
-   slots->nmemslots = KVM_MEM_SLOTS_NUM;
-
-   for (i = 0; i < kvm->memslots->nmemslots; i++)
-   slots->memslots[i].id_to_index = slots->memslots[i].id = i;
-}
-
 static struct kvm *kvm_create_vm(unsigned long type)
 {
int r, i;
@@ -475,7 +464,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
if (!kvm->memslots)
goto out_err_nosrcu;
-   kvm_init_memslots_id(kvm);
if (init_srcu_struct(&kvm->srcu))
goto out_err_nosrcu;
for (i = 0; i < KVM_NR_BUSES; i++) {
@@ -696,6 +684,40 @@ static int check_memory_region_flags(struct 
kvm_userspace_memory_region *mem)
return 0;
 }
 
+struct kvm_memslots *__kvm_dup_and_grow_memslots(struct kvm_memslots *oldslots,
+int slot)
+{
+   int nmemslots;
+   struct kvm_memslots *newslots;
+
+   nmemslots = (slot >= oldslots->nmemslots) ?
+   slot + 1 : oldslots->nmemslots;
+
+   newslots = kmalloc(sizeof(struct kvm_memslots) +
+   nmemslots * sizeof(struct kvm_memory_slot), GFP_KERNEL);
+   if (!newslots)
+   return NULL;
+
+   memcpy(newslots, oldslots, sizeof(struct kvm_memslots) +
+  oldslots->nmemslots * sizeof(struct kvm_memory_slot));
+
+   if (nmemslots != oldslots->nmemslots) {
+   int i;
+   memset(&newslots->memslots[oldslots->nmemslots], 0,
+  (nmemslots - oldslots->nmemslots) *
+  sizeof(struct kvm_memory_slot));
+
+   for (i = oldslots->nmemslots; i < nmemslots; i++) {
+   newslots->memslots[i].id_to_index = i;
+   newslots->memslots[i].id = i;
+   }
+
+   newslots->nmemslots = nmemslots;
+   }
+
+   return newslots;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -711,8 +733,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
int r;
gfn_t base_gfn;
unsigned long npages;
-   struct kvm_memory_slot *memslot, *slot;
-   struct kvm_memory_slot old, new;
+   struct kvm_memory_slot *memslot = NULL, *slot;
+   struct kvm_memory_slot old = {}, new = {};
struct kvm_memslots *slots, *old_memslots;
 
r = check_memory_region_flags(mem);
@@ -737,7 +759,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
goto out;
 
-   memslot = id_to_memslot(kvm->memslots, mem->slot);
base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -748,7 +769,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (!npages)
mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
-   new = old = *memslot;
+   if (mem->slot < kvm->memslots->nmemslots) {
+   memslot = id_to_memslot(kvm->memslots, mem->slot);
+   new = old = *memslot;
+   }
 
new.id = mem->slot;
new.base_gfn = base_gfn;
@@ -796,8 +820,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot *slot;
 
r = -ENOMEM;
-   slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-   GFP_KERNEL);
+   slots = __kvm_dup_and_grow_memslots(kvm->memslots, mem->slot);
if (!slots)
goto out_free;
slot = id_to_memslot(slots, mem->slot);
@@ -832,8 +855,7 @@ int 

[RFC PATCH 0/6] kvm: Growable memory slot array

2012-12-03 Thread Alex Williamson
Memory slots are currently a fixed resource with a relatively small
limit.  When using PCI device assignment in a qemu guest it's fairly
easy to exhaust the number of available slots.  I posted patches
exploring growing the number of memory slots a while ago, but it was
prior to caching memory slot array misses and therefore had potentially
poor performance.  Now that we do that, Avi seemed receptive to
increasing the memory slot array to arbitrary lengths.  I think we
still don't want to impose unnecessary kernel memory consumptions on
guests not making use of this, so I present again a growable memory
slot array.

A couple notes/questions; in the previous version we had a
kvm_arch_flush_shadow() call when we increased the number of slots.
I'm not sure if this is still necessary.  I had also made the x86
specific slot_bitmap dynamically grow as well and switch between a
direct bitmap and indirect pointer to a bitmap.  That may have
contributed to needing the flush.  I haven't done that yet here
because it seems like an unnecessary complication if we have a max
on the order of 512 or 1024 entries.  A bit per slot isn't a lot of
overhead.  If we want to go more, maybe we should make it switch.
That leads to the final question, we need an upper bound since this
does allow consumption of extra kernel memory, what should it be?  A
PCI bus filled with assigned devices can theoretically use up to 2048
slots (32 devices * 8 functions * (6 BARs + ROM + possibly split
MSI-X BAR)).  For this RFC, I don't change the max, just make it
grow up to 32 user slots.  Untested on anything but x86 so far.
Thanks,

Alex
---

Alex Williamson (6):
  kvm: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS
  kvm: Make KVM_PRIVATE_MEM_SLOTS optional
  kvm: Merge id_to_index into memslots
  kvm: Move private memory slots to start of memslots array
  kvm: Re-introduce memslots-nmemslots
  kvm: Allow memory slots to grow


 arch/ia64/include/asm/kvm_host.h|4 --
 arch/ia64/kvm/kvm-ia64.c|6 +--
 arch/powerpc/include/asm/kvm_host.h |6 +--
 arch/powerpc/kvm/book3s_hv.c|4 +-
 arch/s390/include/asm/kvm_host.h|4 --
 arch/x86/include/asm/kvm_host.h |8 ++-
 arch/x86/include/asm/vmx.h  |6 +--
 arch/x86/kvm/vmx.c  |3 +
 arch/x86/kvm/x86.c  |   10 +++-
 include/linux/kvm_host.h|   20 +---
 virt/kvm/kvm_main.c |   83 ---
 11 files changed, 93 insertions(+), 61 deletions(-)


[RFC PATCH 1/6] kvm: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS

2012-12-03 Thread Alex Williamson
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM.  One is
the user accessible slots and the other is user + private.  Make this
more obvious.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 arch/ia64/include/asm/kvm_host.h|2 +-
 arch/ia64/kvm/kvm-ia64.c|2 +-
 arch/powerpc/include/asm/kvm_host.h |4 ++--
 arch/powerpc/kvm/book3s_hv.c|2 +-
 arch/s390/include/asm/kvm_host.h|2 +-
 arch/x86/include/asm/kvm_host.h |4 ++--
 arch/x86/include/asm/vmx.h  |6 +++---
 arch/x86/kvm/x86.c  |6 +++---
 include/linux/kvm_host.h|2 +-
 virt/kvm/kvm_main.c |8 
 10 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 6d6a5ac..48d7b0e 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -23,7 +23,7 @@
 #ifndef __ASM_KVM_HOST_H
 #define __ASM_KVM_HOST_H
 
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8b3a9c0..f1a46bd 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
mutex_lock(&kvm->slots_lock);

r = -EINVAL;
-   if (log->slot >= KVM_MEMORY_SLOTS)
+   if (log->slot >= KVM_USER_MEM_SLOTS)
goto out;

memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 28e8f5e..5eb1dd8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -37,10 +37,10 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 721d460..75ce80e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
mutex_lock(&kvm->slots_lock);

r = -EINVAL;
-   if (log->slot >= KVM_MEMORY_SLOTS)
+   if (log->slot >= KVM_USER_MEM_SLOTS)
goto out;

memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b784154..ac33432 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -20,7 +20,7 @@
 #include asm/cpu.h
 
 #define KVM_MAX_VCPUS 64
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..e619519 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -31,10 +31,10 @@
 
 #define KVM_MAX_VCPUS 254
 #define KVM_SOFT_MAX_VCPUS 160
-#define KVM_MEMORY_SLOTS 32
+#define KVM_USER_MEM_SLOTS 32
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #define KVM_MMIO_SIZE 16
 
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 36ec21c..72932d2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -427,9 +427,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT(KVM_MEMORY_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   (KVM_MEMORY_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT(KVM_USER_MEM_SLOTS + 0)
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   (KVM_USER_MEM_SLOTS + 1)
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2)
 
#define VMX_NR_VPIDS   (1 << 16)
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f76417..1aa3fae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2204,7 +2204,7 @@ int kvm_dev_ioctl_check_extension(long ext)
r = KVM_MAX_VCPUS;
break;
case KVM_CAP_NR_MEMSLOTS:
-   r = KVM_MEMORY_SLOTS;
+   r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_PV_MMU:/* obsolete */
 

[RFC PATCH 2/6] kvm: Make KVM_PRIVATE_MEM_SLOTS optional

2012-12-03 Thread Alex Williamson
Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used.  Even x86 only uses 3 of the 4.  These
aren't exposed so there's no need to add padding.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 arch/ia64/include/asm/kvm_host.h|2 --
 arch/powerpc/include/asm/kvm_host.h |4 +---
 arch/s390/include/asm/kvm_host.h|2 --
 arch/x86/include/asm/kvm_host.h |4 ++--
 include/linux/kvm_host.h|4 
 5 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 48d7b0e..cfa7498 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -24,8 +24,6 @@
 #define __ASM_KVM_HOST_H
 
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5eb1dd8..23ca70d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,9 +38,7 @@
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
-#define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
+#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ac33432..711c5ab 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -21,8 +21,6 @@
 
 #define KVM_MAX_VCPUS 64
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
 
 struct sca_entry {
atomic_t scn;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e619519..ce8b037 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -32,8 +32,8 @@
 #define KVM_MAX_VCPUS 254
 #define KVM_SOFT_MAX_VCPUS 160
 #define KVM_USER_MEM_SLOTS 32
-/* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 4
+/* memory slots that are not exposed to userspace */
+#define KVM_PRIVATE_MEM_SLOTS 3
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #define KVM_MMIO_SIZE 16
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fb9354d..bf8380f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -298,6 +298,10 @@ struct kvm_irq_routing_table {};
 
 #endif
 
+#ifndef KVM_PRIVATE_MEM_SLOTS
+#define KVM_PRIVATE_MEM_SLOTS 0
+#endif
+
 #ifndef KVM_MEM_SLOTS_NUM
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 #endif



[RFC PATCH 3/6] kvm: Merge id_to_index into memslots

2012-12-03 Thread Alex Williamson
This allows us to resize this structure and therefore the number of
memslots as part of the RCU update.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 include/linux/kvm_host.h |5 ++---
 virt/kvm/kvm_main.c  |4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bf8380f..7b3d5c4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -257,6 +257,7 @@ struct kvm_memory_slot {
unsigned long userspace_addr;
int user_alloc;
int id;
+   int id_to_index;
 };
 
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot 
*memslot)
@@ -314,8 +315,6 @@ struct kvm_irq_routing_table {};
 struct kvm_memslots {
u64 generation;
struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
-   /* The mapping table from slot id to the index in memslots[]. */
-   int id_to_index[KVM_MEM_SLOTS_NUM];
 };
 
 struct kvm {
@@ -425,7 +424,7 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm 
*kvm)
 static inline struct kvm_memory_slot *
 id_to_memslot(struct kvm_memslots *slots, int id)
 {
-   int index = slots->id_to_index[id];
+   int index = slots->memslots[id].id_to_index;
struct kvm_memory_slot *slot;
 
	slot = slots->memslots[index];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48d3ede..4f8ae4b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -445,7 +445,7 @@ static void kvm_init_memslots_id(struct kvm *kvm)
struct kvm_memslots *slots = kvm-memslots;
 
	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-   slots->id_to_index[i] = slots->memslots[i].id = i;
+   slots->memslots[i].id_to_index = slots->memslots[i].id = i;
 }
 
 static struct kvm *kvm_create_vm(unsigned long type)
@@ -662,7 +662,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  sizeof(struct kvm_memory_slot), cmp_memslot, NULL);
 
	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-   slots->id_to_index[slots->memslots[i].id] = i;
+   slots->memslots[slots->memslots[i].id].id_to_index = i;
 }
 
 void update_memslots(struct kvm_memslots *slots, struct kvm_memory_slot *new)



[RFC PATCH 4/6] kvm: Move private memory slots to start of memslots array

2012-12-03 Thread Alex Williamson
In order to make the memslots array grow on demand, move the private
slots to the lower indexes of the array.  The private slots are
assumed likely to be in use, so if we didn't do this we'd end up
allocating the full memslots array all the time.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c |4 ++--
 arch/powerpc/kvm/book3s_hv.c |2 +-
 arch/x86/include/asm/vmx.h   |6 +++---
 arch/x86/kvm/vmx.c   |2 +-
 arch/x86/kvm/x86.c   |2 +-
 virt/kvm/kvm_main.c  |   15 ++-
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index f1a46bd..012e5dd 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -949,7 +949,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = -EFAULT;
	if (copy_from_user(&kvm_mem, argp, sizeof kvm_mem))
goto out;
-   kvm_userspace_mem.slot = kvm_mem.slot;
+   kvm_userspace_mem.slot = kvm_mem.slot + KVM_PRIVATE_MEM_SLOTS;
kvm_userspace_mem.flags = kvm_mem.flags;
kvm_userspace_mem.guest_phys_addr =
kvm_mem.guest_phys_addr;
@@ -1831,7 +1831,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
	mutex_lock(&kvm->slots_lock);
 
r = -EINVAL;
-   if (log->slot >= KVM_USER_MEM_SLOTS)
+   if (log->slot >= KVM_MEM_SLOTS_NUM)
goto out;
 
	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 75ce80e..56067db 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1262,7 +1262,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
	mutex_lock(&kvm->slots_lock);
 
r = -EINVAL;
-   if (log->slot >= KVM_USER_MEM_SLOTS)
+   if (log->slot >= KVM_MEM_SLOTS_NUM)
goto out;
 
	memslot = id_to_memslot(kvm->memslots, log->slot);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 72932d2..97bcd7d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -427,9 +427,9 @@ enum vmcs_field {
 
 #define AR_RESERVD_MASK 0xfffe0f00
 
-#define TSS_PRIVATE_MEMSLOT(KVM_USER_MEM_SLOTS + 0)
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   (KVM_USER_MEM_SLOTS + 1)
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2)
+#define TSS_PRIVATE_MEMSLOT0
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT   1
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT 2
 
 #define VMX_NR_VPIDS   (1 << 16)
 #define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f858159..2bb9157 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2750,7 +2750,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
gfn_t base_gfn;
 
slots = kvm_memslots(kvm);
-   slot = id_to_memslot(slots, 0);
+   slot = id_to_memslot(slots, KVM_PRIVATE_MEM_SLOTS);
	base_gfn = slot->base_gfn + slot->npages - 3;
 
	return base_gfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1aa3fae..8765485 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6449,7 +6449,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
 
/* Prevent internal slot pages from being moved by fork()/COW. */
-   if (memslot->id >= KVM_USER_MEM_SLOTS)
+   if (memslot->id < KVM_PRIVATE_MEM_SLOTS)
map_flags = MAP_SHARED | MAP_ANONYMOUS;
 
/*To keep backward compatibility with older userspace,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f8ae4b..3ce2664 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -761,7 +761,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
/* Check for overlaps */
r = -EEXIST;
	kvm_for_each_memslot(slot, kvm->memslots) {
-   if (slot->id >= KVM_USER_MEM_SLOTS || slot == memslot)
+   if (slot->id < KVM_PRIVATE_MEM_SLOTS || slot == memslot)
		continue;
	if (!((base_gfn + npages <= slot->base_gfn) ||
	  (base_gfn >= slot->base_gfn + slot->npages)))
@@ -879,7 +879,7 @@ int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
   kvm_userspace_memory_region *mem,
   int user_alloc)
 {
-   if (mem->slot >= KVM_USER_MEM_SLOTS)
+   if (mem->slot >= KVM_MEM_SLOTS_NUM)
return -EINVAL;
return kvm_set_memory_region(kvm, mem, user_alloc);
 }
@@ -893,7 +893,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
unsigned long any = 0;
 
r = -EINVAL;
-   if (log->slot >= KVM_USER_MEM_SLOTS)
+   if 

Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool

2012-12-03 Thread Ben Hutchings
On Mon, 2012-12-03 at 13:25 +0200, Michael S. Tsirkin wrote:
 On Mon, Dec 03, 2012 at 02:09:28PM +0800, Jason Wang wrote:
  On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote:
   On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote:
This patch implements the {set|get}_channels methods of ethtool to allow
the user to change the number of queues dynamically while the device is
running. This lets the user configure it on demand.

Signed-off-by: Jason Wang jasow...@redhat.com
---

 drivers/net/virtio_net.c |   41 
+
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bcaa6e5..f08ec2a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = 
{

 #endif
 };

+/* TODO: Eliminate OOO packets during switching */
+static int virtnet_set_channels(struct net_device *dev,
+   struct ethtool_channels *channels)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   u16 queue_pairs = channels->combined_count;
 
 by the way shouldn't this be combined_count / 2?
 
 And below channels->max_combined = vi->max_queue_pairs * 2; ?
[...]

In this ethtool API, 'channel' means an IRQ and set of queues that
trigger it.  So each ethtool-channel will correspond to one queue-pair
and not one virtio channel.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [RFC PATCH 1/1] KVM: ARM: add vgic state save and restore support

2012-12-03 Thread Rusty Russell
Peter Maydell peter.mayd...@linaro.org writes:
 On 3 December 2012 10:36, Dong Aisheng b29...@freescale.com wrote:
 and via current ONE_REG interface we do not know which CPU
 is performing the register access, so the banked registers are not
 suppported well,

 Actually you do, because it's a vcpu ioctl. That does raise a
 different question, though. ONE_REG is currently aimed as a vcpu
 ioctl for CPU state save/restore -- how does it need to change
 to handle device state save/restore where the device is not per-cpu?

Good question.  I'd prefer to stretch the existing interface, than
create an identical one for non-per-cpu resources, but I could be swayed
if there were many other cases.

The simplest method is to give it a new type and mirror the non-per-cpu
regs on all vcpus.  Then a completely naive implementation save/restore
will Just Work, with redundant data.  Or we could pick a CPU, but that
means your discovery logic becomes more complex unless you know which
CPU it is (lowest number?).

Cheers,
Rusty.


[PATCH] Maybe avoid a IPI between the cpu cores

2012-12-03 Thread yi li
The CPU injects the interrupt into the vcpu whose vcpu->cpu matches the
current CPU, which may avoid an IPI between the CPU cores.

Signed-off-by: Yi Li yiliker...@gmail.com

--- linux/virt/kvm/irq_comm.c   2012-12-04 10:14:57.711024619 +0800
+++ linux/virt/kvm/irq_comm.c   2012-12-04 11:01:27.728859597 +0800
@@ -64,9 +64,10 @@ inline static bool kvm_is_dm_lowest_prio
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq)
 {
-   int i, r = -1;
+   int i, cpu, r = -1;
struct kvm_vcpu *vcpu, *lowest = NULL;

+   cpu = get_cpu();
	if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
		kvm_is_dm_lowest_prio(irq)) {
		printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
@@ -89,13 +90,17 @@ int kvm_irq_delivery_to_apic(struct kvm
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
} else if (kvm_lapic_enabled(vcpu)) {
-   if (!lowest)
+   if (vcpu->cpu == cpu) {
+   lowest = vcpu;
+   break;
+   }
+   else if (!lowest)
lowest = vcpu;
	else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
lowest = vcpu;
}
}
-
+   put_cpu();
if (lowest)
r = kvm_apic_set_irq(lowest, irq);


YiLi
Thanks


Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info

2012-12-03 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 On Monday, December 03, 2012 12:25:42 PM Rusty Russell wrote:
  +
  +  /* Work struct for refilling if we run low on memory. */
  +  struct delayed_work refill;
 
 I can't really see the justificaiton for a refill per queue.  Just have
 one work iterate all the queues if it happens, unless it happens often
 (in which case, we need to look harder at this anyway).

 But during this kind of iteration, we may need enable/disable the napi 
 regardless of whether the receive queue has lots to be refilled. This may add 
 extra latency. 

Sure, but does it actually happen?  We only use the work when we run out
of memory.  If this happens in normal behaviour we need to change
something else...

Thanks,
Rusty.


Re: [PATCH] Maybe avoid a IPI between the cpu cores

2012-12-03 Thread yi li
Maybe the patch has a risk of breaking the lapic handling; it just tries
to avoid an IPI between the CPU cores asap.

Thanks
YiLi

2012/12/4 yi li yiliker...@gmail.com:
 The CPU injects the interrupt into the vcpu whose vcpu->cpu matches the
 current CPU, which may avoid an IPI between the CPU cores.

 Signed-off-by: Yi Li yiliker...@gmail.com

 --- linux/virt/kvm/irq_comm.c   2012-12-04 10:14:57.711024619 +0800
 +++ linux/virt/kvm/irq_comm.c   2012-12-04 11:01:27.728859597 +0800
 @@ -64,9 +64,10 @@ inline static bool kvm_is_dm_lowest_prio
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 struct kvm_lapic_irq *irq)
  {
 -   int i, r = -1;
 +   int i, cpu, r = -1;
 struct kvm_vcpu *vcpu, *lowest = NULL;

 +   cpu = get_cpu();
 if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
 kvm_is_dm_lowest_prio(irq)) {
 printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
 @@ -89,13 +90,17 @@ int kvm_irq_delivery_to_apic(struct kvm
 r = 0;
 r += kvm_apic_set_irq(vcpu, irq);
 } else if (kvm_lapic_enabled(vcpu)) {
 -   if (!lowest)
 +   if (vcpu->cpu == cpu) {
 +   lowest = vcpu;
 +   break;
 +   }
 +   else if (!lowest)
 lowest = vcpu;
 else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
 lowest = vcpu;
 }
 }
 -
 +   put_cpu();
 if (lowest)
 r = kvm_apic_set_irq(lowest, irq);


 YiLi
 Thanks


RE: [PATCH v3 1/4] x86: PIT connects to pin 2 of IOAPIC

2012-12-03 Thread Zhang, Yang Z
Gleb Natapov wrote on 2012-12-03:
 On Mon, Dec 03, 2012 at 03:01:01PM +0800, Yang Zhang wrote:
 When PIT connects to IOAPIC, it route to pin 2 not pin 0.
 
 This hack didn't work for a long time and nobody complained. It
 should be safe to get rid of it.
This hack is used to work around an issue with RHEL4. Not sure whether anyone 
still uses RHEL4.

 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  virt/kvm/ioapic.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index
 cfb7e4d..166c450 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c
 @@ -181,7 +181,7 @@ static int ioapic_deliver(struct kvm_ioapic
 *ioapic, int irq)
 
  #ifdef CONFIG_X86
  /* Always delivery PIT interrupt to vcpu 0 */
 -if (irq == 0) {
 +if (irq == 2) {
  irqe.dest_mode = 0; /* Physical mode. */
  /* need to read apic_id from apic regiest since
   * it can be rewritten */
 --
 1.7.1
 
 --
   Gleb.


Best regards,
Yang




[PATCH 0/2] Remove unused bits checking for EPT logic.

2012-12-03 Thread Xiantao Zhang
From: Zhang Xiantao xiantao.zh...@intel.com

These two bits {bit 6 in exit_qualification, and bit 24 in VMX_EPT_VPID_CAP_MSR}
are not defined in the SDM; remove them for consistency.

Zhang Xiantao (2):
  kvm: remove unnecessary bit checking for ept violation
  kvm: don't use bit24 for detecting address-specific invalidation
capability

 arch/x86/include/asm/vmx.h |3 +--
 arch/x86/kvm/vmx.c |   21 -
 2 files changed, 1 insertions(+), 23 deletions(-)



[PATCH 1/2] kvm: remove unnecessary bit checking for ept violation

2012-12-03 Thread Xiantao Zhang
From: Zhang Xiantao xiantao.zh...@intel.com

Bit 6 in the EPT vmexit's exit qualification is not defined in the SDM, so remove it.

Signed-off-by: Zhang Xiantao xiantao.zh...@intel.com
---
 arch/x86/kvm/vmx.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ad6b1dd..d077996 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4827,11 +4827,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
-   if (exit_qualification & (1 << 6)) {
-   printk(KERN_ERR "EPT: GPA exceeds GAW!\n");
-   return -EINVAL;
-   }
-
	gla_validity = (exit_qualification >> 7) & 0x3;
	if (gla_validity != 0x3 && gla_validity != 0x1 && gla_validity != 0) {
	printk(KERN_ERR "EPT: Handling EPT violation failed!\n");
-- 
1.7.1



[PATCH 2/2] kvm: don't use bit24 for detecting address-specific invalidation capability

2012-12-03 Thread Xiantao Zhang
From: Zhang Xiantao xiantao.zh...@intel.com

Bit 24 in VMX_EPT_VPID_CAP_MSR is not used for reporting the address-specific
invalidation capability, so remove it from KVM to avoid conflicts in the
future.

Signed-off-by: Zhang Xiantao xiantao.zh...@intel.com
---
 arch/x86/include/asm/vmx.h |3 +--
 arch/x86/kvm/vmx.c |   16 
 2 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 36ec21c..c2d56b3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -445,8 +445,7 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT                (1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT           (1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT           (1ull << 17)
-#define VMX_EPT_AD_BIT                 (1ull << 21)
-#define VMX_EPT_EXTENT_INDIVIDUAL_BIT  (1ull << 24)
+#define VMX_EPT_AD_BIT                 (1ull << 21)
 #define VMX_EPT_EXTENT_CONTEXT_BIT     (1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT      (1ull << 26)
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d077996..81b1aa6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -802,11 +802,6 @@ static inline bool cpu_has_vmx_ept_ad_bits(void)
	return vmx_capability.ept & VMX_EPT_AD_BIT;
 }
 
-static inline bool cpu_has_vmx_invept_individual_addr(void)
-{
-   return vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT;
-}
-
 static inline bool cpu_has_vmx_invept_context(void)
 {
	return vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT;
@@ -1051,17 +1046,6 @@ static inline void ept_sync_context(u64 eptp)
}
 }
 
-static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
-{
-   if (enable_ept) {
-   if (cpu_has_vmx_invept_individual_addr())
-   __invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR,
-   eptp, gpa);
-   else
-   ept_sync_context(eptp);
-   }
-}
-
 static __always_inline unsigned long vmcs_readl(unsigned long field)
 {
unsigned long value;
-- 
1.7.1



RE: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support

2012-12-03 Thread Zhang, Yang Z
Gleb Natapov wrote on 2012-12-03:
 On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
 Virtual interrupt delivery frees KVM from injecting vAPIC interrupts
 manually; that is fully taken care of by the hardware. This needs
 some special awareness in the existing interrupt injection path:
 
 - for a pending interrupt, instead of direct injection, we may need to
   update architecture-specific indicators before resuming to the guest.
 - A pending interrupt which is masked by the ISR should also be
   considered in the above update action, since the hardware will decide
   when to inject it at the right time. The current has_interrupt and
   get_interrupt only return a valid vector from the injection p.o.v.
 Most of my previous comments still apply.
 
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 Signed-off-by: Kevin Tian kevin.t...@intel.com
 ---
  arch/x86/include/asm/kvm_host.h |    4 +
  arch/x86/include/asm/vmx.h      |   11 +++
  arch/x86/kvm/irq.c              |   53 ++-
  arch/x86/kvm/lapic.c            |   56 +---
  arch/x86/kvm/lapic.h            |    6 ++
  arch/x86/kvm/svm.c              |   19 +
  arch/x86/kvm/vmx.c              |  140 ++-
  arch/x86/kvm/x86.c              |   34 --
  virt/kvm/ioapic.c               |    1 +
  9 files changed, 291 insertions(+), 33 deletions(-)
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index dc87b65..e5352c8 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -697,6 +697,10 @@ struct kvm_x86_ops {
  void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
  void (*enable_irq_window)(struct kvm_vcpu *vcpu);
  void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 +int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
 +void (*update_irq)(struct kvm_vcpu *vcpu);
 +void (*set_eoi_exitmap)(struct kvm_vcpu *vcpu, int vector,
 +int trig_mode, int always_set);
  int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
  int (*get_tdp_level)(void);
  u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index 21101b6..1003341 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -62,6 +62,7 @@
  #define EXIT_REASON_MCE_DURING_VMENTRY  41 #define
  EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS
  44 +#define EXIT_REASON_EOI_INDUCED 45 #define
  EXIT_REASON_EPT_VIOLATION   48 #define EXIT_REASON_EPT_MISCONFIG  
  49 #define EXIT_REASON_WBINVD  54 @@ -143,6 +144,7 @@
  #define SECONDARY_EXEC_WBINVD_EXITING   0x0040 #define
  SECONDARY_EXEC_UNRESTRICTED_GUEST   0x0080 #define
  SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100 +#define
  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200 #define
  SECONDARY_EXEC_PAUSE_LOOP_EXITING   0x0400 #define
  SECONDARY_EXEC_ENABLE_INVPCID   0x1000
 @@ -180,6 +182,7 @@ enum vmcs_field {
  GUEST_GS_SELECTOR   = 0x080a,   GUEST_LDTR_SELECTOR   
   = 0x080c,  GUEST_TR_SELECTOR   = 0x080e,
  +   GUEST_INTR_STATUS   = 0x0810,   HOST_ES_SELECTOR 
= 0x0c00, HOST_CS_SELECTOR= 0x0c02,
  HOST_SS_SELECTOR= 0x0c04, @@ -207,6 +210,14 @@
  enum vmcs_field {   APIC_ACCESS_ADDR_HIGH   = 0x2015,   
 EPT_POINTER  
= 0x201a, EPT_POINTER_HIGH=
  0x201b,
 +EOI_EXIT_BITMAP0= 0x201c,
 +EOI_EXIT_BITMAP0_HIGH   = 0x201d,
 +EOI_EXIT_BITMAP1= 0x201e,
 +EOI_EXIT_BITMAP1_HIGH   = 0x201f,
 +EOI_EXIT_BITMAP2= 0x2020,
 +EOI_EXIT_BITMAP2_HIGH   = 0x2021,
 +EOI_EXIT_BITMAP3= 0x2022,
 +EOI_EXIT_BITMAP3_HIGH   = 0x2023,
  GUEST_PHYSICAL_ADDRESS  = 0x2400,
  GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401,
  VMCS_LINK_POINTER   = 0x2800,
 diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
 index 7e06ba1..f782788 100644
 --- a/arch/x86/kvm/irq.c
 +++ b/arch/x86/kvm/irq.c
 @@ -43,45 +43,64 @@ EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
   */
  int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
  {
 -struct kvm_pic *s;
 -
 if (!irqchip_in_kernel(v->kvm))
 return v->arch.interrupt.pending;
 -if (kvm_apic_has_interrupt(v) == -1) {  /* LAPIC */
 -if (kvm_apic_accept_pic_intr(v)) {
 -s = pic_irqchip(v->kvm);/* PIC */
 -return s->output;
 -} else
 -return 0;
 -}
 +if (kvm_apic_has_interrupt(v) == -1) /* LAPIC */
 +return kvm_cpu_has_extint(v); /* non-APIC */
  return 

Re: [net-next rfc v7 2/3] virtio_net: multiqueue support

2012-12-03 Thread Michael S. Tsirkin
On Mon, Dec 03, 2012 at 06:30:49PM +0800, Jason Wang wrote:
 On 12/03/2012 06:14 PM, Michael S. Tsirkin wrote:
  On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
   -if (!try_fill_recv(&vi->rq, GFP_KERNEL))
   -schedule_delayed_work(&vi->rq.refill, 0);
   +for (i = 0; i < vi->max_queue_pairs; i++)
   +if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
   +schedule_delayed_work(&vi->rq[i].refill, 0);

	mutex_lock(&vi->config_lock);
	vi->config_enable = true;
	mutex_unlock(&vi->config_lock);

   +BUG_ON(virtnet_set_queues(vi));
   +
return 0;
}
#endif
  Also crashing on device nack of command is also not nice.
  In this case it seems we can just switch to
  single-queue mode which should always be safe.
 
 Not sure it's safe. It depends on the reason why this call fails. If we
 are left in a state where the driver only uses a single queue but the
 device uses multiple queues, we may still lose the network.

Looks like we won't: napi will stay enabled on all queues
so we will process incoming packets.

-- 
MST


Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes

2012-12-03 Thread Glauber Costa
On 12/04/2012 12:16 AM, Marcelo Tosatti wrote:
 On Mon, Dec 03, 2012 at 12:36:33PM +0400, Glauber Costa wrote:
 On 11/30/2012 09:55 PM, Pavel Emelyanov wrote:
 Hello,

 This is an attempt to implement support for memory snapshot for the
 checkpoint-restore project (http://criu.org).

 To create a dump of an application(s) we save all the information about it
 to files. No surprise, the biggest part of such dump is the contents of 
 tasks'
 memory. However, in some usage scenarios it's not required to get _all_ the
 task memory while creating a dump. For example, when doing periodical dumps
 it's only required to take full memory dump only at the first step and then
 take incremental changes of memory. Another example is live migration. In 
 the
 simplest form it looks like -- create dump, copy it on the remote node then
 restore tasks from dump files. While all this dump-copy-restore thing goes 
 all
 the process must be stopped. However, if we can monitor how tasks change 
 their
 memory, we can dump and copy it in smaller chunks, periodically updating it 
 and thus freezing tasks only at the very end for the very short time to pick
 up the recent changes.

 That said, some help from kernel to watch how processes modify the contents 
 of
 their memory is required. I'd like to propose one possible solution of this
 task -- with the help of page-faults and trace events.

 Briefly the approach is -- remap some memory regions as read-only, get the 
 #pf
 on task's attempt to modify the memory and issue a trace event of that. 
 Since
 we're only interested in parts of memory of some tasks, make it possible to 
 mark
 the vmas we're interested in and issue events for them only. Also, to be 
 aware
 of tasks unmapping the vma-s being watched, also issue an event when the 
 marked
 vma is removed (and for symmetry -- an event when a vma is marked).

 What do you think about this approach? Is this way of supporting mem 
 snapshot
 OK for you, or should we invent some better one?


 The page fault mechanism is pretty obvious - anything that deals with
 dirty pages will end up having to do this. So there is nothing crazy
 about this.

 What concerns me, however, is that should this go in, we'll have two
 dirty mem loggers in the kernel: one to support CRIU, one to support
 KVM. And the worst part: They have exactly the same purpose!!

 So to begin with, I think one thing to consider, would be to generalize
 KVM's dirty memory notification so it can work on a normal process
 memory region. KVM api requires a memory slot to be passed, something
 we are unlikely to have. But KVM can easily keep its API and use an
 alternate mechanics, that's trivial...

 Generally speaking, KVM will do polling with this ioctl. I prefer your
 tracing mechanism better. The only difference, is that KVM tends to
 transfer large chunks of memory in some loads - in the high gigs range.
 So the proposal tracing API should be able to optionally batch requests
 within a time frame.

 It would also be good to hear what does the KVM guys think of it as well
 
 There are significant differences. KVM's dirty logging works for
 guest translations (NPT/shadow) and is optimized for specific use cases.
 
 Above is about dirty logging of userspace memory areas.

This is envelope.

At the end, KVM is tracking pages, regardless of their format, and wants
to know when pages are dirty, and which pages are dirty.

What you are saying, for example, is that if Linux had dirty page
tracking for userspace memory, KVM could not make use of it? IIRC, for
qemu's migration algorithm, a notification of dirty user pages - plus
some eventual extra trickery - would do just fine.





Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv

2012-12-03 Thread Scott Wood

On 12/01/2012 07:58:25 AM, Alexander Graf wrote:

In BookE, EPCR is defined and valid when either the HV or the 64bit
category are implemented. Reflect this in the field definition.

Today the only KVM target on 64bit is HV enabled, so there is no
change in actual source code, but this keeps the code closer to the
spec and doesn't build up artificial road blocks for a PR KVM
on 64bit.

[snip]

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h  
b/arch/powerpc/include/asm/kvm_host.h

index 62fbd38..3480526 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -405,14 +405,19 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOKE_HV
u32 host_mas4;
u32 host_mas6;
-   u32 shadow_epcr;
-   u32 epcr;
u32 shadow_msrp;
u32 eplc;
u32 epsc;
u32 oldpir;
 #endif

+#if defined(CONFIG_BOOKE)
+#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT)
+   u32 shadow_epcr;
+   u32 epcr;
+#endif
+#endif


On a PR-mode implementation, why would we have a shadow_epcr?  It
would always be set based on the host kernel, just like when running  
any other userspace process.


-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Make EPCR a valid field for booke64 and bookehv

2012-12-03 Thread Alexander Graf

On 03.12.2012, at 17:47, Scott Wood wrote:

 On 12/01/2012 07:58:25 AM, Alexander Graf wrote:
 In BookE, EPCR is defined and valid when either the HV or the 64bit
 category are implemented. Reflect this in the field definition.
 Today the only KVM target on 64bit is HV enabled, so there is no
 change in actual source code, but this keeps the code closer to the
 spec and doesn't build up artificial road blocks for a PR KVM
 on 64bit.
 [snip]
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
 arch/powerpc/include/asm/kvm_host.h |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 62fbd38..3480526 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -405,14 +405,19 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOKE_HV
  u32 host_mas4;
  u32 host_mas6;
 -u32 shadow_epcr;
 -u32 epcr;
  u32 shadow_msrp;
  u32 eplc;
  u32 epsc;
  u32 oldpir;
 #endif
 +#if defined(CONFIG_BOOKE)
 +#if defined(CONFIG_KVM_BOOKE_HV) || defined(CONFIG_64BIT)
 +u32 shadow_epcr;
 +u32 epcr;
 +#endif
 +#endif
 
 On a PR-mode implementation, why would we have a shadow_epcr?  It would 
 always be set based on the host kernel, just like when running any other 
 userspace process.

Right - we could simply set MSR_CM. I'll move shadow_epcr back into the HV only 
bit above.


Alex
