Re: vhost-scsi support for ANY_LAYOUT

2015-01-27 Thread Paolo Bonzini


On 27/01/2015 07:35, Nicholas A. Bellinger wrote:
 Hi MST & Paolo,
 
 So I'm currently working on vhost-scsi support for ANY_LAYOUT, and
 wanted to verify some assumptions based upon your earlier emails..
 
 *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
 virtio-scsi request + response headers will (always..?) be within a
 single iovec.
 
 *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
 virtio-scsi request + response headers may be (but not always..?)
 combined with data-out + data-in payloads into a single iovec.
 
 *) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected
 that PI and data payloads for data-out + data-in may be (but not
 always..?) within the same iovec.  Consequently, both headers + PI +
 data-payloads may also be within a single iovec.

s/expected/possible/g

Any split between header and payload is possible.  It's also possible to
split the header in multiple parts, e.g. put the CDB or sense buffer in
a separate iovec.  Even a single field could be split across two iovecs.

 *) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc()
 in order to determine the data_direction...?  If not, what's the
 preferred way of determining this information for get_user_pages_fast()
 permission bits and target_submit_cmd_map_sgls()..?

No, it's not safe.  What QEMU does is to check if the output buffers are
collectively larger than the request header, and if the input buffers
are collectively larger than the response header.  This lets you compute
the data direction.
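
In vhost terms that check could look roughly like this (a minimal sketch
with hypothetical names; out_len/in_len are the summed lengths of the
guest-readable and guest-writable iovecs, and the T10_PI header variant
is ignored):

/* Sketch only: derive the direction from total lengths, never from the
 * number of iovec entries, since ANY_LAYOUT allows arbitrary splits. */
static enum dma_data_direction vs_data_direction(size_t out_len, size_t in_len)
{
	if (out_len > sizeof(struct virtio_scsi_cmd_req))
		return DMA_TO_DEVICE;	/* data-out follows the request header */
	if (in_len > sizeof(struct virtio_scsi_cmd_resp))
		return DMA_FROM_DEVICE;	/* data-in follows the response header */
	return DMA_NONE;		/* header-only command */
}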

 Also, what is required on the QEMU side in order to start generating
 ANY_LAYOUT style iovecs to verify the WIP changes..? 

Nothing on the QEMU side.  You could test any-layout payloads by
changing the kernel side.  Splitting the sense buffer into its own iovec
for example is a trivial patch to drivers/scsi/virtio_scsi.c.
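
For instance, something along these lines in the guest driver would make
the response header arrive split in two (untested sketch; only the
scatterlist setup is shown, layout per struct virtio_scsi_cmd_resp where
the sense buffer is the trailing field):

/* Sketch: describe the response header and the trailing sense buffer as
 * two scatterlist entries instead of one, so the host sees a split. */
struct scatterlist resp_hdr, resp_sense;
size_t hdr_len = offsetof(struct virtio_scsi_cmd_resp, sense);

sg_init_one(&resp_hdr, &cmd->resp.cmd, hdr_len);
sg_init_one(&resp_sense, cmd->resp.cmd.sense,
	    sizeof(cmd->resp.cmd) - hdr_len);
/* ...then hand both entries to virtqueue_add_sgs() in the "in"
 * direction where a single entry for &cmd->resp.cmd was used before. */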

It also helps to adhere to coding conventions.  For example in QEMU I'm
never using iov[N], and I'm restricting usage of iovcnt to (iov, iovcnt)
parameter pairs.  This was suggested by Michael and found a couple bugs
indeed.

Paolo


Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches

2015-01-27 Thread Marc Zyngier
On 27/01/15 13:17, Christoffer Dall wrote:
 On Tue, Jan 27, 2015 at 11:21:38AM +, Marc Zyngier wrote:
 On 26/01/15 22:58, Christoffer Dall wrote:
 On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote:
 Trying to emulate the behaviour of set/way cache ops is fairly
 pointless, as there are too many ways we can end up missing stuff.
 Also, there are some system caches out there that simply ignore
 set/way operations.

 So instead of trying to implement them, let's convert it to VA ops,
 and use them as a way to re-enable the trapping of VM ops. That way,
 we can detect the point when the MMU/caches are turned off, and do
 a full VM flush (which is what the guest was trying to do anyway).

 This allows a 32bit zImage to boot on the APM thingy, and will
 probably help bootloaders in general.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com

 This had some conflicts with dirty page logging.  I fixed it up here,
 and also removed some trailing white space and mixed spaces/tabs that
 patch complained about:

 http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes

 Thanks for doing so.

 ---
  arch/arm/include/asm/kvm_emulate.h   | 10 +
  arch/arm/include/asm/kvm_host.h  |  3 --
  arch/arm/include/asm/kvm_mmu.h   |  3 +-
  arch/arm/kvm/arm.c   | 10 -
  arch/arm/kvm/coproc.c| 64 ++
  arch/arm/kvm/coproc_a15.c|  2 +-
  arch/arm/kvm/coproc_a7.c |  2 +-
  arch/arm/kvm/mmu.c   | 70 
 -
  arch/arm/kvm/trace.h | 39 +++
  arch/arm64/include/asm/kvm_emulate.h | 10 +
  arch/arm64/include/asm/kvm_host.h|  3 --
  arch/arm64/include/asm/kvm_mmu.h |  3 +-
  arch/arm64/kvm/sys_regs.c| 75 
 +---
  13 files changed, 155 insertions(+), 139 deletions(-)

 diff --git a/arch/arm/include/asm/kvm_emulate.h 
 b/arch/arm/include/asm/kvm_emulate.h
 index 66ce176..7b01523 100644
 --- a/arch/arm/include/asm/kvm_emulate.h
 +++ b/arch/arm/include/asm/kvm_emulate.h
 @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
   vcpu->arch.hcr = HCR_GUEST_MASK;
  }

 +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
 +{
 + return vcpu->arch.hcr;
 +}
 +
 +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
 +{
 + vcpu->arch.hcr = hcr;
 +}
 +
  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
  {
   return 1;
 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index 254e065..04b4ea0 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
* Anything that is not used directly from assembly code goes
* here.
*/
 - /* dcache set/way operation pending */
 - int last_pcpu;
 - cpumask_t require_dcache_flush;

   /* Don't run the guest on this vcpu */
   bool pause;
 diff --git a/arch/arm/include/asm/kvm_mmu.h 
 b/arch/arm/include/asm/kvm_mmu.h
 index 63e0ecc..286644c 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
 @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct 
 kvm_vcpu *vcpu, hva_t hva,

  #define kvm_virt_to_phys(x)  virt_to_idmap((unsigned long)(x))

 -void stage2_flush_vm(struct kvm *kvm);
 +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

  #endif   /* !__ASSEMBLY__ */

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 2d6d910..0b0d58a 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
 cpu)
   vcpu->cpu = cpu;
   vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);

 - /*
 -  * Check whether this vcpu requires the cache to be flushed on
 -  * this physical CPU. This is a consequence of doing dcache
 -  * operations by set/way on this vcpu. We do it here to be in
 -  * a non-preemptible section.
 -  */
 - if (cpumask_test_and_clear_cpu(cpu, 
 &vcpu->arch.require_dcache_flush))
 - flush_cache_all(); /* We'd really want v7_flush_dcache_all() 
 */
 -
   kvm_arm_set_running_vcpu(vcpu);
  }

 @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);

   vcpu->mode = OUTSIDE_GUEST_MODE;
 - vcpu->arch.last_pcpu = smp_processor_id();
   kvm_guest_exit();
   trace_kvm_exit(*vcpu_pc(vcpu));
   /*
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 7928dbd..0afcc00 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
   return true;
  }

 

Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Mikhail Sennikovskii

Hi all,

I've posted the below mail to the qemu-devel mailing list, but I've got no 
response there.
That's why I decided to re-post it here, and besides that I 
think this could be a kvm-specific issue as well.


Some additional thing to note:
I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of 
the default 30 ms.
I also noticed that the issue happens much more rarely if I increase 
the migration bandwidth, i.e. like


diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
 MIG_STATE_COMPLETED,
 };

-#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional 
information.


Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

Hi all,

I'm running a slightly modified migration over tcp test in virt-test, 
which does a migration from one smp=2 VM to another on the same host 
over TCP,
and exposes some dummy CPU load inside the GUEST during migration, and 
after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT 
BSOD inside the guest,

which happens when

An expected clock interrupt was not received on a secondary processor 
in an
MP system within the allocated interval. This indicates that the 
specified

processor is hung and not processing interrupts.


This seems to happen with any qemu version I've tested (1.2 and above, 
including upstream),
and I was testing it with 3.13.0-44-generic kernel on my Ubuntu 
14.04.1 LTS with SMP4 host, as well as on 3.12.26-1 kernel with Debian 
6 with SMP6 host.


One thing I noticed is that exposing a dummy CPU load on the HOST 
(like running multiple instances of the while true; do false; done 
script) in parallel with doing migration makes the issue quite 
easily reproducible.



Looking inside the windows crash dump, the second CPU is just running 
at IRQL 0, and it is apparently not hung, as Windows is able to save its 
state in the crash dump correctly, which implies running some code on it.
So this apparently seems to be some timing issue (like the host scheduler 
does not schedule the thread executing the secondary CPU's code in time).


Could you give me some insight on this, i.e. is there a way to 
customize QEMU/KVM to avoid such an issue?


If you think this might be a qemu/kvm issue, I can provide you any 
info, like windows crash dumps, or the test-case to reproduce this.



qemu is started as:

from-VM:

qemu-system-x86_64 \
-S  \
-name 'virt-tests-vm1'  \
-sandbox off  \
-M pc-1.0  \
-nodefaults  \
-vga std  \
-chardev 
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait 
\

-mon chardev=qmp_id_qmp1,mode=control  \
-chardev 
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait 
\

-device isa-serial,chardev=serial_id_serial0  \
-chardev 
socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait 
\
-device 
isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 
\

-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device 
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 
\
-device 
virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 
\
-netdev 
user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \

-m 2G  \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
-cpu phenom \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=none  \
-boot order=cdn,once=c,menu=off \
-enable-kvm

to-VM:

qemu-system-x86_64 \
-S  \
-name 'virt-tests-vm1'  \
-sandbox off  \
-M pc-1.0  \
-nodefaults  \
-vga std  \
-chardev 
socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait 
\

-mon chardev=qmp_id_qmp1,mode=control  \
-chardev 
socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait 
\

-device isa-serial,chardev=serial_id_serial0  \
-chardev 
socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait 
\
-device 
isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 
\

-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device 
virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 
\
-device 
virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 
\
-netdev 

Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit

2015-01-27 Thread Luis R. Rodriguez
On Tue, Jan 27, 2015 at 10:06:44AM +, David Vrabel wrote:
 On 27/01/15 08:35, Jan Beulich wrote:
  On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:
  
  Even if David told you this would be acceptable, I have to question
  an abstract model of fixing issues on only 64-bit kernels - this may
  be acceptable for distro purposes, but seems hardly the right
  approach for upstream. If 32-bit ones are to become deliberately
  broken, the XEN config option should become dependent on !X86_32.
 
 I'd rather have something omitted (keeping the current behaviour) than
 something that has not been tested at all.
 
 Obviously it would be preferable to fix both 32-bit and 64-bit x86
 (and ARM as well) but I'm not going to block an important bug fix for
 the majority use case (64-bit x86).

The hunk for 32-bit should indeed only go in once we get a Tested-by.
Please let me know if there is anything else needed for this patch.

  Luis


Re: [PATCH v3 0/2] x86/arm64: add xenconfig

2015-01-27 Thread Luis R. Rodriguez
On Fri, Jan 23, 2015 at 03:19:25PM +, Stefano Stabellini wrote:
 On Fri, 23 Jan 2015, Luis R. Rodriguez wrote:
  On Wed, Jan 14, 2015 at 11:33:45AM -0800, Luis R. Rodriguez wrote:
   From: Luis R. Rodriguez mcg...@suse.com
   
   This v3 addresses Stefano's feedback from the v2 series, namely
   moving PCI stuff to x86 as its all x86 specific and also just
   removing the CONFIG_TCG_XEN=m from the general config. To be
   clear the changes from the v2 series are below.
   
   Luis R. Rodriguez (2):
 x86, platform, xen, kconfig: clarify kvmconfig is for kvm
 x86, arm, platform, xen, kconfig: add xen defconfig helper
   
arch/x86/configs/xen.config | 10 ++
kernel/configs/xen.config   | 26 ++
scripts/kconfig/Makefile|  7 ++-
3 files changed, 42 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/configs/xen.config
create mode 100644 kernel/configs/xen.config
  
  Who could these changes go through?
 
 I would be OK with taking it in the Xen tree, but I would feel more
 comfortable doing that if you had an ack from Yann Morin (CC'ed).

*Poke*

  Luis


Re: vhost-scsi support for ANY_LAYOUT

2015-01-27 Thread Michael S. Tsirkin
On Tue, Jan 27, 2015 at 09:42:22AM +0100, Paolo Bonzini wrote:
 
 
 On 27/01/2015 07:35, Nicholas A. Bellinger wrote:
  Hi MST & Paolo,
  
  So I'm currently working on vhost-scsi support for ANY_LAYOUT, and
  wanted to verify some assumptions based upon your earlier emails..
  
  *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
  virtio-scsi request + response headers will (always..?) be within a
  single iovec.
  
  *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
  virtio-scsi request + response headers may be (but not always..?)
  combined with data-out + data-in payloads into a single iovec.
  
  *) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected
  that PI and data payloads for data-out + data-in may be (but not
  always..?) within the same iovec.  Consequently, both headers + PI +
  data-payloads may also be within a single iovec.
 
 s/expected/possible/g
 
 Any split between header and payload is possible.  It's also possible to
 split the header in multiple parts, e.g. put the CDB or sense buffer in
 a separate iovec.  Even a single field could be split across two iovecs.
 
  *) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc()
  in order to determine the data_direction...?  If not, what's the
  preferred way of determining this information for get_user_pages_fast()
  permission bits and target_submit_cmd_map_sgls()..?
 
 No, it's not safe.  What QEMU does is to check if the output buffers are
 collectively larger than the request header, and if the input buffers
 are collectively larger than the response header.  This lets you compute
 the data direction.

Generally true, but I'd like to clarify.

I think what we can say is that if there's an in value,
the host is expected to write data, and if there's an out value,
to read data.

But this includes the headers, so generally you need to get the total length
for in and/or out (separately) and subtract the header length.
But if you know there's e.g. no in header, then you can use the in value
to figure out the gup flags.
Or if you see there's no in data, you know you can use gup with read-only
flags.
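
As a fragment (sketch; non-PI header size, with vq/out/in as returned by
vhost_get_vq_desc() and the remaining gup arguments elided):

/* Sketch: total "in" length minus the response header is what the host
 * will write into guest memory, so it decides the gup write flag. */
size_t in_len = iov_length(&vq->iov[out], in);
int write = in_len > sizeof(struct virtio_scsi_cmd_resp);

ret = get_user_pages_fast(uaddr, nr_pages, write, pages);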


  Also, what is required on the QEMU side in order to start generating
  ANY_LAYOUT style iovecs to verify the WIP changes..? 
 
 Nothing on the QEMU side.  You could test any-layout payloads by
 changing the kernel side.  Splitting the sense buffer into its own iovec
 for example is a trivial patch to drivers/scsi/virtio_scsi.c.
 
 It also helps to adhere to coding conventions.  For example in QEMU I'm
 never using iov[N], and I'm restricting usage of iovcnt to (iov, iovcnt)
 parameter pairs.  This was suggested by Michael and found a couple bugs
 indeed.
 
 Paolo


Re: vhost-scsi support for ANY_LAYOUT

2015-01-27 Thread Paolo Bonzini


On 27/01/2015 09:42, Paolo Bonzini wrote:
  *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
  virtio-scsi request + response headers may be (but not always..?)
  combined with data-out + data-in payloads into a single iovec.

Note that request and data-out can be combined, and response + data-in
can be combined, but there still have to be separate iovecs for at least
outgoing and incoming buffers (so 2 buffers for the control and request
queues; 1 buffer for the event queue since it's unidirectional).
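
Illustratively (not a spec excerpt):

/*
 * Minimal ANY_LAYOUT layouts per request (illustrative only):
 *
 *   write (data-out):  iov[0]: req hdr + data-out    guest -> host
 *                      iov[1]: resp hdr              host  -> guest
 *
 *   read (data-in):    iov[0]: req hdr               guest -> host
 *                      iov[1]: resp hdr + data-in    host  -> guest
 *
 *   event queue:       iov[0]: event                 host  -> guest
 */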

Paolo


Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit

2015-01-27 Thread David Vrabel
On 27/01/15 08:35, Jan Beulich wrote:
 On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:
 
 Even if David told you this would be acceptable, I have to question
 an abstract model of fixing issues on only 64-bit kernels - this may
 be acceptable for distro purposes, but seems hardly the right
 approach for upstream. If 32-bit ones are to become deliberately
 broken, the XEN config option should become dependent on !X86_32.

I'd rather have something omitted (keeping the current behaviour) than
something that has not been tested at all.

Obviously it would be preferable to fix both 32-bit and 64-bit x86
(and ARM as well) but I'm not going to block an important bug fix for
the majority use case (64-bit x86).

David


Re: [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit

2015-01-27 Thread Jan Beulich
 On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:

Even if David told you this would be acceptable, I have to question
an abstract model of fixing issues on only 64-bit kernels - this may
be acceptable for distro purposes, but seems hardly the right
approach for upstream. If 32-bit ones are to become deliberately
broken, the XEN config option should become dependent on !X86_32.

Jan



Re: [Xen-devel] [PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit

2015-01-27 Thread Andrew Cooper
On 27/01/15 08:35, Jan Beulich wrote:
 On 27.01.15 at 02:51, mcg...@do-not-panic.com wrote:
 Even if David told you this would be acceptable, I have to question
 an abstract model of fixing issues on only 64-bit kernels - this may
 be acceptable for distro purposes, but seems hardly the right
 approach for upstream. If 32-bit ones are to become deliberately
 broken, the XEN config option should become dependent on !X86_32.

There are still legitimate reasons to prefer 32bit PV guests over 64bit
ones.  Previous versions of this patch had 32bit support as well.  Why
did you drop it?

~Andrew


Re: vhost-scsi support for ANY_LAYOUT

2015-01-27 Thread Michael S. Tsirkin
On Mon, Jan 26, 2015 at 10:35:17PM -0800, Nicholas A. Bellinger wrote:
 Hi MST & Paolo,
 
 So I'm currently working on vhost-scsi support for ANY_LAYOUT, and
 wanted to verify some assumptions based upon your earlier emails..
 
 *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
 virtio-scsi request + response headers will (always..?) be within a
 single iovec.

Do you mean a single struct iovec - as opposed to an array?
If so this is not a valid assumption I think, it can
be split across entries as well.
For example:
struct virtio_scsi_cmd_req {
__u8 lun[8];/* Logical Unit Number */
__virtio64 tag; /* Command identifier */
__u8 task_attr; /* Task attribute */
__u8 prio;  /* SAM command priority field */
__u8 crn;
__u8 cdb[VIRTIO_SCSI_CDB_SIZE];
} __attribute__((packed));

We get struct iovec *iov and len 4, then
lun[0-3] might be in iov[0], lun[4-7], tag, task_attr, prio might be in
iov[1], crn, cdb and data-out might be in iov[2].
data-in must be separate in iov[3].

But if it makes sense, you can optimize for current
guest behaviour.
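
One way to cope with such splits on the vhost side is to pull the header
through an iov_iter instead of assuming it sits whole in iov[0]; a sketch
(function name hypothetical, out_len being the summed length of the out
iovecs):

static int vs_get_req(struct vhost_virtqueue *vq, unsigned out, size_t out_len,
		      struct virtio_scsi_cmd_req *req)
{
	struct iov_iter out_iter;

	/* WRITE: these iovecs were written by the guest, the host reads them */
	iov_iter_init(&out_iter, WRITE, vq->iov, out, out_len);
	if (copy_from_iter(req, sizeof(*req), &out_iter) != sizeof(*req))
		return -EFAULT;	/* header shorter than advertised */
	return 0;
}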



 *) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that
 virtio-scsi request + response headers may be (but not always..?)
 combined with data-out + data-in payloads into a single iovec.

No. From virtio POV, you have to separate out (written by guest,
read by host) and in (written by host, read by guest).
Including both header and data.

The rule is that
1. out comes before in
2. out and in entries are separate
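
In iovec terms (sketch of the invariant, not normative text):

/* After vhost_get_vq_desc() returns the out and in counts,
 * iov[0..out-1] are guest->host and iov[out..out+in-1] are host->guest,
 * no matter how headers and payloads are split among the entries. */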


 *) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected
 that PI and data payloads for data-out + data-in may be (but not
 always..?) within the same iovec.  Consequently, both headers + PI +
 data-payloads may also be within a single iovec.
 
 *) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc()
 in order to determine the data_direction...?  If not, what's the
 preferred way of determining this information for get_user_pages_fast()
 permission bits and target_submit_cmd_map_sgls()..?
 
 Also, what is required on the QEMU side in order to start generating
 ANY_LAYOUT style iovecs to verify the WIP changes..?  I see
 hw/scsi/virtio-scsi.c has been converted to accept any_layout=1, but
 AFAICT the changes where only related to code not shared between
 hw/scsi/vhost-scsi.c.

QEMU needs to white-list this feature bit.
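
Presumably that means adding the bit to the vhost feature table, e.g.
(untested sketch, assuming the kernel_feature_bits array in QEMU's
hw/scsi/vhost-scsi.c):

/* Sketch: white-list ANY_LAYOUT so vhost feature negotiation can pass
 * the bit through to the guest (hypothetical hunk). */
static const int kernel_feature_bits[] = {
    VIRTIO_F_NOTIFY_ON_EMPTY,
    VIRTIO_RING_F_INDIRECT_DESC,
    VIRTIO_RING_F_EVENT_IDX,
    VIRTIO_SCSI_F_HOTPLUG,
    VIRTIO_F_ANY_LAYOUT,        /* new */
    VHOST_INVALID_FEATURE_BIT
};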

 Thank you,
 
 --nab


Re: [PATCH v2 2/5] KVM: ARM: on IO mem abort - route the call to KVM MMIO bus

2015-01-27 Thread Christoffer Dall
On Sat, Jan 24, 2015 at 03:02:33AM +0200, Nikolay Nikolaev wrote:
 On Mon, Jan 12, 2015 at 7:09 PM, Eric Auger eric.au...@linaro.org wrote:
  Hi Nikolay,
  On 12/07/2014 10:37 AM, Nikolay Nikolaev wrote:
  On IO memory abort, try to handle the MMIO access through the KVM
  registered read/write callbacks. This is done by invoking the relevant
  kvm_io_bus_* API.
 
  Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
  ---
   arch/arm/kvm/mmio.c |   33 +
   1 file changed, 33 insertions(+)
 
  diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
  index 4cb5a93..e42469f 100644
  --- a/arch/arm/kvm/mmio.c
  +++ b/arch/arm/kvm/mmio.c
  @@ -162,6 +162,36 @@ static int decode_hsr(struct kvm_vcpu *vcpu, 
  phys_addr_t fault_ipa,
return 0;
   }
 
  +/**
  + * handle_kernel_mmio - handle an in-kernel MMIO access
  + * @vcpu:pointer to the vcpu performing the access
  + * @run: pointer to the kvm_run structure
  + * @mmio:pointer to the data describing the access
  + *
  + * returns true if the MMIO access has been performed in kernel space,
  + * and false if it needs to be emulated in user space.
  + */
  +static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  + struct kvm_exit_mmio *mmio)
  +{
  + int ret;
  +
  + if (mmio->is_write) {
  + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
  + mmio->len, mmio->data);
  +
  + } else {
  + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, mmio->phys_addr,
  + mmio->len, mmio->data);
  + }
  + if (!ret) {
  + kvm_prepare_mmio(run, mmio);
  + kvm_handle_mmio_return(vcpu, run);
  + }
  +
  + return !ret;
  in case ret < 0 (-EOPNOTSUPP = -95) aren't we returning true too? return
  (ret == 0)?
 
  +}
  +
   int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 phys_addr_t fault_ipa)
   {
  @@ -200,6 +230,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
  *run,
if (vgic_handle_mmio(vcpu, run, mmio))
return 1;
 
  + if (handle_kernel_mmio(vcpu, run, mmio))
  + return 1;
  +
kvm_prepare_mmio(run, mmio);
return 0;
  currently the io_mem_abort returned value is not used by mmu.c code. I
  think this should be handled in kvm_handle_guest_abort. What do you think?
 
 You're right that the returned value is not handled further after we
 exit io_mem_abort, it's just passed up the call stack.
 However I'm not sure how to handle it better. If you have ideas, please share.
 
I'm confused: the return value from io_mem_abort is assigned to a
variable 'ret' in kvm_handle_guest_abort and that determines if we
should run the VM again or return to userspace (with some work for
userspace to do or with an error).

-Christoffer


Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.

2015-01-27 Thread Paolo Bonzini


On 24/01/2015 11:21, Wincy Van wrote:
 +   memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

Most bytes are always 0xff.  It's better to initialize it to 0xff once,
and set the bit here if !nested_cpu_has_virt_x2apic_mode(vmcs12).

 +   if (nested_cpu_has_virt_x2apic_mode(vmcs12))

Please add braces here, because of the /* */ comment below.

 +   /* TPR is allowed */
 +   nested_vmx_disable_intercept_for_msr(msr_bitmap,
 +   vmx_msr_bitmap_nested,
 +   APIC_BASE_MSR + (APIC_TASKPRI >> 4),
 +   MSR_TYPE_R | MSR_TYPE_W);
 
 +static inline int nested_vmx_check_virt_x2apic(struct kvm_vcpu *vcpu,
 +  struct vmcs12 *vmcs12)
 +{
 +   if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
 +   return -EINVAL;

No need for this function and nested_cpu_has_virt_x2apic_mode.  Just
inline them in their caller(s).  Same for other cases throughout the series.

Paolo

 +   return 0;
 +}
 +
 +static int nested_vmx_check_apicv_controls(struct kvm_vcpu *vcpu,
 +  struct vmcs12 *vmcs12)
 +{
 +   int r;
 +
 +   if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
 +   return 0;
 +
 +   r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
 +   if (r)
 +   goto fail;
 +


Re: [PATCH v3 0/6] KVM: nVMX: Enable nested apicv support.

2015-01-27 Thread Paolo Bonzini


On 24/01/2015 11:18, Wincy Van wrote:
 v2 --- v3:
   1. Add a new field in nested_vmx to avoid the spin lock in v2.
   2. Drop send eoi to L1 when doing nested interrupt delivery.
   3. Use hardware MSR bitmap to enable nested virtualize x2apic
  mode.

I think the patches are mostly okay.  I made a few comments.

One of the things to do on top could be to avoid rebuilding the whole
vmcs02 on every entry.  Recomputing the MSR bitmap on every vmentry is
not particularly nice, for example.  It is not necessary unless the
execution controls have changed.

Paolo


Re: [PATCH v2] kvm: iommu: Add cond_resched to legacy device assignment code

2015-01-27 Thread Paolo Bonzini


On 27/01/2015 11:57, Joerg Roedel wrote:
 From: Joerg Roedel jroe...@suse.de
 
 When assigning devices to large memory guests (>=128GB guest
 memory in the failure case) the functions to create the
 IOMMU page-tables for the whole guest might run for a very
 long time. On non-preemptible kernels this might cause
 Soft-Lockup warnings. Fix these by adding a cond_resched()
 to the mapping and unmapping loops.
 
 Signed-off-by: Joerg Roedel jroe...@suse.de
 ---
  arch/x86/kvm/iommu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
 index 17b73ee..7dbced3 100644
 --- a/arch/x86/kvm/iommu.c
 +++ b/arch/x86/kvm/iommu.c
 @@ -138,7 +138,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
 kvm_memory_slot *slot)
  
   gfn += page_size >> PAGE_SHIFT;
  
 -
 + cond_resched();
   }
  
   return 0;
 @@ -306,6 +306,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
   kvm_unpin_pages(kvm, pfn, unmap_pages);
  
   gfn += unmap_pages;
 +
 + cond_resched();
   }
  }
  
 

Applying to kvm/queue, thanks.

Paolo


Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.

2015-01-27 Thread Paolo Bonzini


On 24/01/2015 11:21, Wincy Van wrote:
 +static void nested_vmx_disable_intercept_for_msr(unsigned long 
 *msr_bitmap_l1,
 +  unsigned long 
 *msr_bitmap_nested,
 +  u32 msr, int type)
 +{
 +   int f = sizeof(unsigned long);
 +
 +   if (!cpu_has_vmx_msr_bitmap())
 +   return;
 +

Also, make this a WARN_ON.

Paolo


[kvm-unit-tests][PATCH] x86: hypercall: fix compile error on i386

2015-01-27 Thread Chris J Arges
There is a failure to build on 32-bit hosts:
x86/hypercall.c: In function ‘test_edge’:
x86/hypercall.c:42:2: error: ‘test_rip’ undeclared (first use in this function)
  test_rip = 0;
  ^

This patch fixes this issue.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/hypercall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86/hypercall.c b/x86/hypercall.c
index 1548421..3ac5ff9 100644
--- a/x86/hypercall.c
+++ b/x86/hypercall.c
@@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
 }
 
 
-#ifdef __x86_64__
 volatile unsigned long test_rip;
+#ifdef __x86_64__
 extern void gp_tss(void);
 asm ("gp_tss: \n\t"
	"add $8, %rsp\n\t"	// discard error code
-- 
1.9.1



Re: [kvm-unit-tests][PATCH] x86: hypercall: fix compile error on i386

2015-01-27 Thread Paolo Bonzini


On 27/01/2015 21:23, Chris J Arges wrote:
 There is a failure to build on 32-bit hosts:
 x86/hypercall.c: In function ‘test_edge’:
 x86/hypercall.c:42:2: error: ‘test_rip’ undeclared (first use in this 
 function)
   test_rip = 0;
   ^
 
 This patch fixes this issue.
 
 Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
 ---
  x86/hypercall.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/x86/hypercall.c b/x86/hypercall.c
 index 1548421..3ac5ff9 100644
 --- a/x86/hypercall.c
 +++ b/x86/hypercall.c
 @@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
  }
  
  
 -#ifdef __x86_64__
  volatile unsigned long test_rip;
 +#ifdef __x86_64__
  extern void gp_tss(void);
  asm ("gp_tss: \n\t"
	"add $8, %rsp\n\t"	// discard error code
 

Thanks, applied.

Paolo


[kvm-unit-tests][PATCH] x86: hypercall: a better fix for the compiler error

2015-01-27 Thread Chris J Arges
The last patch moves the ifdef in a way that causes a compiler warning.
Fix the ifdefs to isolate the x86_64-only functions and variables for the test.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/hypercall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86/hypercall.c b/x86/hypercall.c
index 3ac5ff9..d134146 100644
--- a/x86/hypercall.c
+++ b/x86/hypercall.c
@@ -24,8 +24,8 @@ static inline long kvm_hypercall0_amd(unsigned int nr)
 }
 
 
-volatile unsigned long test_rip;
 #ifdef __x86_64__
+volatile unsigned long test_rip;
 extern void gp_tss(void);
 asm ("gp_tss: \n\t"
	"add $8, %rsp\n\t"	// discard error code
@@ -34,7 +34,6 @@ asm ("gp_tss: \n\t"
	"iretq\n\t"
	"jmp gp_tss\n\t"
 );
-#endif
 
 static inline int
 test_edge(void)
@@ -47,6 +46,7 @@ test_edge(void)
	printf("Return from int 13, test_rip = %lx\n", test_rip);
	return test_rip == (1ul << 47);
 }
+#endif
 
 int main(int ac, char **av)
 {
-- 
1.9.1



Re: [PATCH v3 6/6] KVM: nVMX: Enable nested posted interrupt processing.

2015-01-27 Thread Paolo Bonzini


On 24/01/2015 11:24, Wincy Van wrote:
 if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 !nested_cpu_has_apic_reg_virt(vmcs12) &&
 -   !nested_cpu_has_vid(vmcs12))
 +   !nested_cpu_has_vid(vmcs12) &&
 +   !nested_cpu_has_posted_intr(vmcs12))
 return 0;
 
 if (nested_cpu_has_virt_x2apic_mode(vmcs12))
 r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
 if (nested_cpu_has_vid(vmcs12))
 r |= nested_vmx_check_vid(vcpu, vmcs12);
 +   if (nested_cpu_has_posted_intr(vmcs12))
 +   r |= nested_vmx_check_posted_intr(vcpu, vmcs12);

These ifs are always true.

Paolo


Re: [PATCH] kvm: update_memslots: clean flags for invalid memslots

2015-01-27 Thread Paolo Bonzini


On 09/01/2015 09:29, Tiejun Chen wrote:
 Indeed, any invalid memslots should have new->npages = 0,
 new->base_gfn = 0 and new->flags = 0 at the same time.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
 Paolo,
 
 This is just a small cleanup to follow up commit efbeec7098ee, "fix
 sorting of memslots with base_gfn == 0".
 
 Tiejun
 
  virt/kvm/kvm_main.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 1cc6e2e..369c759 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -673,6 +673,7 @@ static void update_memslots(struct kvm_memslots *slots,
   if (!new->npages) {
   WARN_ON(!mslots[i].npages);
   new->base_gfn = 0;
  + new->flags = 0;
   if (mslots[i].npages)
   slots->used_slots--;
   } else {
 

Applied to kvm/queue.

Paolo


Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Jidong Xiao
On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
mikhail.sennikovs...@profitbricks.com wrote:
 Hi all,

 I've posted the below mail to the qemu-devel mailing list, but I've got no
 response there.
 That's why I decided to re-post it here as well, and besides that I think
 this could be a kvm-specific issue as well.

 Some additional thing to note:
 I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
 well.
 I would typically use a max_downtime adjusted to 1 second instead of default
 30 ms.
 I also noticed that the issue happens much more rarely if I increase the
 migration bandwidth, i.e. like

 diff --git a/migration.c b/migration.c
 index 26f4b65..d2e3b39 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -36,7 +36,7 @@ enum {
  MIG_STATE_COMPLETED,
  };

 -#define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
 +#define MAX_THROTTLE  (90 << 20)  /* Migration speed throttling */

 Like I said below, I would be glad to provide you with any additional
 information.

 Thanks,
 Mikhail

Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not
happen, right?

-Jidong

 On 23.01.2015 15:03, Mikhail Sennikovskii wrote:

 Hi all,

 I'm running a slightly modified migration over tcp test in virt-test, which
 does a migration from one smp=2 VM to another on the same host over TCP,
 and exposes some dummy CPU load inside the GUEST during migration, and
 after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT BSOD
 inside the guest,
 which happens when
 
 An expected clock interrupt was not received on a secondary processor in
 an
 MP system within the allocated interval. This indicates that the specified
 processor is hung and not processing interrupts.
 

 This seems to happen with any qemu version I've tested (1.2 and above,
 including upstream),
 and I was testing it with 3.13.0-44-generic kernel on my Ubuntu 14.04.1
 LTS with SMP4 host, as well as on 3.12.26-1 kernel with Debian 6 with SMP6
 host.

 One thing I noticed is that exposing a dummy CPU load on the HOST (like
 running multiple instances of the while true; do false; done script) in
 parallel with doing migration makes the issue to be quite easily
 reproducible.


 Looking inside the windows crash dump, the second CPU is just running at
 IRQL 0, and it is apparently not hung, as Windows is able to save its state in
 the crash dump correctly, which implies running some code on it.
 So this apparently seems to be some timing issue (like the host scheduler does
 not schedule the thread executing the secondary CPU's code in time).

 Could you give me some insight on this, i.e. is there a way to customize
 QEMU/KVM to avoid such issue?

 If you think this might be a qemu/kvm issue, I can provide you any info,
 like windows crash dumps, or the test-case to reproduce this.


 qemu is started as:

 from-VM:

 qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
 socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
 \
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
 socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
 \
 -device isa-serial,chardev=serial_id_serial0  \
 -chardev
 socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
 \
 -device
 isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
 -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
 -device
 virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
 -device
 virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
 \
 -netdev
 user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
 -m 2G  \
 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
 -cpu phenom \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
 -vnc :0  \
 -rtc base=localtime,clock=host,driftfix=none  \
 -boot order=cdn,once=c,menu=off \
 -enable-kvm

 to-VM:

 qemu-system-x86_64 \
 -S  \
 -name 'virt-tests-vm1'  \
 -sandbox off  \
 -M pc-1.0  \
 -nodefaults  \
 -vga std  \
 -chardev
 socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
 \
 -mon chardev=qmp_id_qmp1,mode=control  \
 -chardev
 socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
 \
 -device isa-serial,chardev=serial_id_serial0  \
 -chardev
 socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait
 \
 -device
 isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
 -drive 

Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches

2015-01-27 Thread Marc Zyngier
On 26/01/15 22:58, Christoffer Dall wrote:
 On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote:
 Trying to emulate the behaviour of set/way cache ops is fairly
 pointless, as there are too many ways we can end up missing stuff.
 Also, there are some system caches out there that simply ignore
 set/way operations.

 So instead of trying to implement them, let's convert it to VA ops,
 and use them as a way to re-enable the trapping of VM ops. That way,
 we can detect the point when the MMU/caches are turned off, and do
 a full VM flush (which is what the guest was trying to do anyway).

 This allows a 32bit zImage to boot on the APM thingy, and will
 probably help bootloaders in general.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 
 This had some conflicts with dirty page logging.  I fixed it up here,
 and also removed some trailing white space and mixed spaces/tabs that
 patch complained about:
 
 http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes

Thanks for doing so.

 ---
  arch/arm/include/asm/kvm_emulate.h   | 10 +
  arch/arm/include/asm/kvm_host.h  |  3 --
  arch/arm/include/asm/kvm_mmu.h   |  3 +-
  arch/arm/kvm/arm.c   | 10 -
  arch/arm/kvm/coproc.c| 64 ++
  arch/arm/kvm/coproc_a15.c|  2 +-
  arch/arm/kvm/coproc_a7.c |  2 +-
  arch/arm/kvm/mmu.c   | 70 -
  arch/arm/kvm/trace.h | 39 +++
  arch/arm64/include/asm/kvm_emulate.h | 10 +
  arch/arm64/include/asm/kvm_host.h|  3 --
  arch/arm64/include/asm/kvm_mmu.h |  3 +-
  arch/arm64/kvm/sys_regs.c| 75 
 +---
  13 files changed, 155 insertions(+), 139 deletions(-)

 diff --git a/arch/arm/include/asm/kvm_emulate.h 
 b/arch/arm/include/asm/kvm_emulate.h
 index 66ce176..7b01523 100644
 --- a/arch/arm/include/asm/kvm_emulate.h
 +++ b/arch/arm/include/asm/kvm_emulate.h
 @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
   vcpu->arch.hcr = HCR_GUEST_MASK;
  }

 +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
 +{
 + return vcpu->arch.hcr;
 +}
 +
 +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
 +{
 + vcpu->arch.hcr = hcr;
 +}
 +
  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
  {
   return 1;
 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index 254e065..04b4ea0 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
* Anything that is not used directly from assembly code goes
* here.
*/
 - /* dcache set/way operation pending */
 - int last_pcpu;
 - cpumask_t require_dcache_flush;

   /* Don't run the guest on this vcpu */
   bool pause;
 diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
 index 63e0ecc..286644c 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
 @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct 
 kvm_vcpu *vcpu, hva_t hva,

  #define kvm_virt_to_phys(x)  virt_to_idmap((unsigned long)(x))

 -void stage2_flush_vm(struct kvm *kvm);
 +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

  #endif   /* !__ASSEMBLY__ */

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 2d6d910..0b0d58a 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
   vcpu->cpu = cpu;
   vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);

 - /*
 -  * Check whether this vcpu requires the cache to be flushed on
 -  * this physical CPU. This is a consequence of doing dcache
 -  * operations by set/way on this vcpu. We do it here to be in
 -  * a non-preemptible section.
 -  */
 - if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
 - flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
 -
   kvm_arm_set_running_vcpu(vcpu);
  }

 @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);

   vcpu->mode = OUTSIDE_GUEST_MODE;
 - vcpu->arch.last_pcpu = smp_processor_id();
   kvm_guest_exit();
   trace_kvm_exit(*vcpu_pc(vcpu));
   /*
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 7928dbd..0afcc00 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
   return true;
  }

 -/* See note at ARM ARM B1.14.4 */
 +/*
 + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily 
 

[PATCH v2] kvm: iommu: Add cond_resched to legacy device assignment code

2015-01-27 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (>=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 arch/x86/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
index 17b73ee..7dbced3 100644
--- a/arch/x86/kvm/iommu.c
+++ b/arch/x86/kvm/iommu.c
@@ -138,7 +138,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)
 
	gfn += page_size >> PAGE_SHIFT;
 
-
+   cond_resched();
}
 
return 0;
@@ -306,6 +306,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);
 
gfn += unmap_pages;
+
+   cond_resched();
}
 }
 
-- 
1.9.1



[PATCH 3/6] KVM: MMU: Explicitly set D-bit for writable spte.

2015-01-27 Thread Kai Huang
This patch avoids unnecessary dirty GPA logging to PML buffer in EPT violation
path by setting D-bit manually prior to the occurrence of the write from guest.

We only set D-bit manually in set_spte, and leave fast_page_fault path
unchanged, as fast_page_fault is very unlikely to happen in case of PML.

For the hva -> pa change case, the spte is updated to either read-only (host
pte is read-only) or be dropped (host pte is writeable), and both cases will be
handled by above changes, therefore no change is necessary.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/kvm/mmu.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c438224..fb35535 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2597,8 +2597,14 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
}
 
-   if (pte_access & ACC_WRITE_MASK)
+   if (pte_access & ACC_WRITE_MASK) {
	mark_page_dirty(vcpu->kvm, gfn);
+   /*
+* Explicitly set dirty bit. It is used to eliminate unnecessary
+* dirty GPA logging in case of PML is enabled on VMX.
+*/
+   spte |= shadow_dirty_mask;
+   }
 
 set_pte:
if (mmu_spte_update(sptep, spte))
@@ -2914,6 +2920,16 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
 */
	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
 
+   /*
+* Theoretically we could also set dirty bit (and flush TLB) here in
+* order to eliminate the unnecessary PML logging. See comments in
+* set_spte. But as in case of PML, fast_page_fault is very unlikely to
+* happen so we leave it unchanged. This might result in the same GPA
+* to be logged in PML buffer again when the write really happens, and
+* eventually to be called by mark_page_dirty twice. But it's also no
+* harm. This also avoids the TLB flush needed after setting dirty bit
+* so non-PML cases won't be impacted.
+*/
if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
	mark_page_dirty(vcpu->kvm, gfn);
 
-- 
2.1.0



[PATCH 1/6] KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty

2015-01-27 Thread Kai Huang
We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/arm/kvm/mmu.c   | 18 --
 arch/x86/kvm/mmu.c   | 21 +++--
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c  |  2 +-
 4 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 74aeaba..6034697 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1081,7 +1081,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 }
 
 /**
- * kvm_arch_mmu_write_protect_pt_masked() - write protect dirty pages
+ * kvm_mmu_write_protect_pt_masked() - write protect dirty pages
  * @kvm:   The KVM pointer
  * @slot:  The memory slot associated with mask
  * @gfn_offset:The gfn offset in memory slot
@@ -1091,7 +1091,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
  * Walks bits set in mask write protects the associated pte's. Caller must
  * acquire kvm_mmu_lock.
  */
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
 {
@@ -1102,6 +1102,20 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm 
*kvm,
stage2_wp_range(kvm, start, end);
 }
 
+/*
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * dirty pages.
+ *
+ * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
+ * enable dirty logging for them.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+   struct kvm_memory_slot *slot,
+   gfn_t gfn_offset, unsigned long mask)
+{
+   kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0ed9f79..b18e65c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1216,7 +1216,7 @@ static bool __rmap_write_protect(struct kvm *kvm, 
unsigned long *rmapp,
 }
 
 /**
- * kvm_arch_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
  * @kvm: kvm instance
  * @slot: slot to protect
  * @gfn_offset: start of the BITS_PER_LONG pages we care about
@@ -1225,7 +1225,7 @@ static bool __rmap_write_protect(struct kvm *kvm, 
unsigned long *rmapp,
  * Used when we do not need to care about huge page mappings: e.g. during dirty
  * logging we do not have any such mappings.
  */
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 struct kvm_memory_slot *slot,
 gfn_t gfn_offset, unsigned long mask)
 {
@@ -1241,6 +1241,23 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm 
*kvm,
}
 }
 
+/**
+ * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
+ * PT level pages.
+ *
+ * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
+ * enable dirty logging for them.
+ *
+ * Used when we do not need to care about huge page mappings: e.g. during dirty
+ * logging we do not have any such mappings.
+ */
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
+   struct kvm_memory_slot *slot,
+   gfn_t gfn_offset, unsigned long mask)
+{
+   kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+}
+
 static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
struct kvm_memory_slot *slot;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d67195..32d0575 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -615,7 +615,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 int kvm_get_dirty_log_protect(struct kvm *kvm,
struct kvm_dirty_log *log, bool *is_dirty);
 
-void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
+void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
unsigned long mask);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a8490f0..0c28176 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1059,7 +1059,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
dirty_bitmap_buffer[i] = mask;
 
offset = i * BITS_PER_LONG;
-   kvm_arch_mmu_write_protect_pt_masked(kvm, memslot, 

[PATCH 5/6] KVM: x86: Add new dirty logging kvm_x86_ops for PML

2015-01-27 Thread Kai Huang
This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
logging for a particular memory slot, and to flush potentially logged dirty GPAs
before reporting slot->dirty_bitmap to userspace.

kvm x86 common code calls these hooks when they are available so the PML logic
can be kept VMX-specific. Other ARCHs won't be impacted as these hooks are NULL
for them.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h | 25 +++
 arch/x86/kvm/mmu.c  |  6 +++-
 arch/x86/kvm/x86.c  | 71 -
 3 files changed, 93 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 67a98d7..57916ec 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -802,6 +802,31 @@ struct kvm_x86_ops {
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 
void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+
+   /*
+* Arch-specific dirty logging hooks. These hooks are only supposed to
+* be valid if the specific arch has hardware-accelerated dirty logging
+* mechanism. Currently only for PML on VMX.
+*
+*  - slot_enable_log_dirty:
+*  called when enabling log dirty mode for the slot.
+*  - slot_disable_log_dirty:
+*  called when disabling log dirty mode for the slot.
+*  also called when slot is created with log dirty disabled.
+*  - flush_log_dirty:
+*  called before reporting dirty_bitmap to userspace.
+*  - enable_log_dirty_pt_masked:
+*  called when reenabling log dirty for the GFNs in the mask after
+*  corresponding bits are cleared in slot->dirty_bitmap.
+*/
+   void (*slot_enable_log_dirty)(struct kvm *kvm,
+ struct kvm_memory_slot *slot);
+   void (*slot_disable_log_dirty)(struct kvm *kvm,
+  struct kvm_memory_slot *slot);
+   void (*flush_log_dirty)(struct kvm *kvm);
+   void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
+  struct kvm_memory_slot *slot,
+  gfn_t offset, unsigned long mask);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6c24af3..c5833ca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1335,7 +1335,11 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm 
*kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask)
 {
-   kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+   if (kvm_x86_ops->enable_log_dirty_pt_masked)
+   kvm_x86_ops->enable_log_dirty_pt_masked(kvm, slot, gfn_offset,
+   mask);
+   else
+   kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
 static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a7fcff..442ee7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3780,6 +3780,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
 
	mutex_lock(&kvm->slots_lock);
 
+   /*
+* Flush potentially hardware-cached dirty pages to dirty_bitmap.
+*/
+   if (kvm_x86_ops->flush_log_dirty)
+   kvm_x86_ops->flush_log_dirty(kvm);
+
r = kvm_get_dirty_log_protect(kvm, log, is_dirty);
 
/*
@@ -7533,6 +7539,56 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return 0;
 }
 
+static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
+struct kvm_memory_slot *new)
+{
+   /* Still write protect RO slot */
+   if (new->flags & KVM_MEM_READONLY) {
+   kvm_mmu_slot_remove_write_access(kvm, new);
+   return;
+   }
+
+   /*
+* Call kvm_x86_ops dirty logging hooks when they are valid.
+*
+* kvm_x86_ops->slot_disable_log_dirty is called when:
+*
+*  - KVM_MR_CREATE with dirty logging is disabled
+*  - KVM_MR_FLAGS_ONLY with dirty logging is disabled in new flag
+*
+* The reason is, in case of PML, we need to set D-bit for any slots
+* with dirty logging disabled in order to eliminate unnecessary GPA
+* logging in PML buffer (and potential PML buffer full VMEXIT). This
+* guarantees leaving PML enabled during guest's lifetime won't have
+* any additional overhead from PML when guest is running with dirty
+* logging disabled for memory slots.
+*
+* kvm_x86_ops->slot_enable_log_dirty is called when switching new slot
+* to dirty logging mode.
+*
+* If kvm_x86_ops dirty logging hooks are 

[PATCH 2/6] KVM: MMU: Add mmu help functions to support PML

2015-01-27 Thread Kai Huang
This patch adds new MMU layer functions to clear/set the D-bit for a memory
slot, and to write-protect superpages for a memory slot.

In case of PML, the CPU logs the dirty GPA automatically to the PML buffer when
it updates a D-bit from 0 to 1. Therefore we don't have to write protect 4K
pages; instead, we only need to clear the D-bit in order to log that GPA.

For superpages, we still write protect them and let the page fault code handle
dirty page logging, as we still need to split superpages into 4K pages for PML.

As PML is always enabled during the guest's lifetime, to eliminate unnecessary
PML GPA logging we set the D-bit manually for slots with dirty logging disabled.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |   9 ++
 arch/x86/kvm/mmu.c  | 195 
 2 files changed, 204 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843bea0..4f6369b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -835,6 +835,15 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 
accessed_mask,
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+  struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
+   struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_set_dirty(struct kvm *kvm,
+   struct kvm_memory_slot *memslot);
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+  struct kvm_memory_slot *slot,
+  gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b18e65c..c438224 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1215,6 +1215,60 @@ static bool __rmap_write_protect(struct kvm *kvm, 
unsigned long *rmapp,
return flush;
 }
 
+static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep)
+{
+   u64 spte = *sptep;
+
+   rmap_printk("rmap_clear_dirty: spte %p %llx\n", sptep, *sptep);
+
+   spte &= ~shadow_dirty_mask;
+
+   return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *sptep;
+   struct rmap_iterator iter;
+   bool flush = false;
+
+   for (sptep = rmap_get_first(*rmapp, iter); sptep;) {
+   BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+   flush |= spte_clear_dirty(kvm, sptep);
+   sptep = rmap_get_next(iter);
+   }
+
+   return flush;
+}
+
+static bool spte_set_dirty(struct kvm *kvm, u64 *sptep)
+{
+   u64 spte = *sptep;
+
+   rmap_printk("rmap_set_dirty: spte %p %llx\n", sptep, *sptep);
+
+   spte |= shadow_dirty_mask;
+
+   return mmu_spte_update(sptep, spte);
+}
+
+static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *sptep;
+   struct rmap_iterator iter;
+   bool flush = false;
+
+   for (sptep = rmap_get_first(*rmapp, iter); sptep;) {
+   BUG_ON(!(*sptep & PT_PRESENT_MASK));
+
+   flush |= spte_set_dirty(kvm, sptep);
+   sptep = rmap_get_next(iter);
+   }
+
+   return flush;
+}
+
 /**
  * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
  * @kvm: kvm instance
@@ -1242,6 +1296,32 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm 
*kvm,
 }
 
 /**
+ * kvm_mmu_clear_dirty_pt_masked - clear MMU D-bit for PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to clear D-bit
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should clear D-bit
+ *
+ * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
+ */
+void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+struct kvm_memory_slot *slot,
+gfn_t gfn_offset, unsigned long mask)
+{
+   unsigned long *rmapp;
+
+   while (mask) {
+   rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+ PT_PAGE_TABLE_LEVEL, slot);
+   __rmap_clear_dirty(kvm, rmapp);
+
+   /* clear the first set bit */
+   mask &= mask - 1;
+   }
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
+
+/**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
  * PT level pages.
  *
@@ -4368,6 +4448,121 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, 
int slot)
kvm_flush_remote_tlbs(kvm);
 }
 
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+   

[PATCH 4/6] KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access

2015-01-27 Thread Kai Huang
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
'slot id' to 'struct kvm_memory_slot *' to align with the kvm_x86_ops dirty
logging hooks, which will be introduced in a later patch.

A better way would be to change the second parameter of
kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region *' to
'struct kvm_memory_slot *new', but that requires changes to other non-x86
architectures too, so avoid it for now.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu.c  |  5 ++---
 arch/x86/kvm/x86.c  | 10 +++---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f6369b..67a98d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -834,7 +834,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
   struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fb35535..6c24af3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4410,14 +4410,13 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
init_kvm_mmu(vcpu);
 }
 
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
 {
-   struct kvm_memory_slot *memslot;
gfn_t last_gfn;
int i;
bool flush = false;
 
-   memslot = id_to_memslot(kvm->memslots, slot);
	last_gfn = memslot->base_gfn + memslot->npages - 1;

	spin_lock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e10e3f..3a7fcff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7538,7 +7538,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *old,
enum kvm_mr_change change)
 {
-
+   struct kvm_memory_slot *new;
int nr_mmu_pages = 0;
 
	if ((mem->slot >= KVM_USER_MEM_SLOTS) && (change == KVM_MR_DELETE)) {
@@ -7557,6 +7557,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
if (nr_mmu_pages)
kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
+
+   /* It's OK to get 'new' slot here as it has already been installed */
+   new = id_to_memslot(kvm->memslots, mem->slot);
+
/*
 * Write protect all pages for dirty logging.
 *
@@ -7566,8 +7570,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 *
 * See the comments in fast_page_fault().
 */
-   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_slot_remove_write_access(kvm, mem->slot);
+   if ((change != KVM_MR_DELETE) && (new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+   kvm_mmu_slot_remove_write_access(kvm, new);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
-- 
2.1.0



[PATCH 6/6] KVM: VMX: Add PML support in VMX

2015-01-27 Thread Kai Huang
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow the user to enable/disable it manually.

Signed-off-by: Kai Huang kai.hu...@linux.intel.com
---
 arch/x86/include/asm/vmx.h  |   4 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/trace.h|  18 
 arch/x86/kvm/vmx.c  | 195 +++-
 arch/x86/kvm/x86.c  |   1 +
 5 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 45afaee..da772ed 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -69,6 +69,7 @@
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID  0x00001000
 #define SECONDARY_EXEC_SHADOW_VMCS  0x00004000
+#define SECONDARY_EXEC_ENABLE_PML   0x00020000
 #define SECONDARY_EXEC_XSAVES  0x00100000
 
 
@@ -121,6 +122,7 @@ enum vmcs_field {
GUEST_LDTR_SELECTOR = 0x080c,
GUEST_TR_SELECTOR   = 0x080e,
GUEST_INTR_STATUS   = 0x0810,
+   GUEST_PML_INDEX = 0x0812,
HOST_ES_SELECTOR= 0x0c00,
HOST_CS_SELECTOR= 0x0c02,
HOST_SS_SELECTOR= 0x0c04,
@@ -140,6 +142,8 @@ enum vmcs_field {
VM_EXIT_MSR_LOAD_ADDR_HIGH  = 0x2009,
VM_ENTRY_MSR_LOAD_ADDR  = 0x200a,
VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x200b,
+   PML_ADDRESS = 0x200e,
+   PML_ADDRESS_HIGH= 0x200f,
TSC_OFFSET  = 0x2010,
TSC_OFFSET_HIGH = 0x2011,
VIRTUAL_APIC_PAGE_ADDR  = 0x2012,
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index ff2b8e2..c5f1a1d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -73,6 +73,7 @@
 #define EXIT_REASON_XSETBV  55
 #define EXIT_REASON_APIC_WRITE  56
 #define EXIT_REASON_INVPCID 58
+#define EXIT_REASON_PML_FULL62
 #define EXIT_REASON_XSAVES  63
 #define EXIT_REASON_XRSTORS 64
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 587149b..a139977 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -846,6 +846,24 @@ TRACE_EVENT(kvm_track_tsc,
   __print_symbolic(__entry->host_clock, host_clocks))
 );
 
+/*
+ * Tracepoint for PML full VMEXIT.
+ */
+TRACE_EVENT(kvm_pml_full,
+   TP_PROTO(unsigned int vcpu_id),
+   TP_ARGS(vcpu_id),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   vcpu_id )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id = vcpu_id;
+   ),
+
+   TP_printk("vcpu %d: PML full", __entry->vcpu_id)
+);
+
 #endif /* CONFIG_X86_64 */
 
 TRACE_EVENT(kvm_ple_window,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c987374..de5ce82 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -101,6 +101,9 @@ module_param(nested, bool, S_IRUGO);
 
 static u64 __read_mostly host_xss;
 
+static bool __read_mostly enable_pml = 1;
+module_param_named(pml, enable_pml, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
 #define KVM_VM_CR0_ALWAYS_ON   \
@@ -516,6 +519,10 @@ struct vcpu_vmx {
/* Dynamic PLE window. */
int ple_window;
bool ple_window_dirty;
+
+   /* Support for PML */
+#define PML_ENTITY_NUM 512
+   struct page *pml_pg;
 };
 
 enum segment_cache_field {
@@ -1068,6 +1075,11 @@ static inline bool cpu_has_vmx_shadow_vmcs(void)
SECONDARY_EXEC_SHADOW_VMCS;
 }
 
+static inline bool cpu_has_vmx_pml(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
+}
+
 static inline bool report_flexpriority(void)
 {
return flexpriority_enabled;
@@ -2924,7 +2936,8 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_APIC_REGISTER_VIRT |
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
SECONDARY_EXEC_SHADOW_VMCS |
-   SECONDARY_EXEC_XSAVES;
+   SECONDARY_EXEC_XSAVES |
+   SECONDARY_EXEC_ENABLE_PML;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
_cpu_based_2nd_exec_control)  0)
@@ -4355,6 +4368,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx 
*vmx)
   a current VMCS12
*/
	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+   /* PML is enabled/disabled in 

RE: [v3 00/26] Add VT-d Posted-Interrupts support

2015-01-27 Thread Wu, Feng


 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, January 28, 2015 11:44 AM
 To: Wu, Feng
 Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
 j...@8bytes.org; jiang@linux.intel.com; eric.au...@linaro.org;
 linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org;
 kvm@vger.kernel.org
 Subject: Re: [v3 00/26] Add VT-d Posted-Interrupts support
 
 On Wed, 2015-01-28 at 03:01 +, Wu, Feng wrote:
 
   -Original Message-
   From: Wu, Feng
   Sent: Wednesday, January 21, 2015 10:26 AM
   To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
 x...@kernel.org;
   g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
   j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
   Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
   io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
   Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
  
  
-Original Message-
From: Wu, Feng
Sent: Friday, December 12, 2014 11:15 PM
To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
   x...@kernel.org;
g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
Subject: [v3 00/26] Add VT-d Posted-Interrupts support
   
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.
   
You can find the VT-d Posted-Interrtups Spec. in the following URL:
   
  
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog
y/vt-directed-io-spec.html
   
v1-v2:
* Use VFIO framework to enable this feature, the VFIO part of this 
series is
  base on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control
* Rebase this patchset on
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain
 patches
provided
  by Jiang Liu jiang@linux.intel.com
   
v2-v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --
KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --
__KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo
   
This patch series is made of the following groups:
1-6: Some preparation changes in iommu and irq component, this is based
 on
the
 new hierarchy irqdomain logic.
7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature
detection,
  command line parameter.
10-17, 22-25: Changes related to KVM itself.
18-20: Changes in VFIO component, this part was previously sent out as
[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
Posted-Interrupts
21: x86 irq related changes
   
Feng Wu (26):
  genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
VCPU
  iommu: Add new member capability to struct irq_remap_ops
  iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
  iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Add intel_irq_remapping_capability() for Intel
  iommu, x86: define irq_remapping_cap()
  KVM: change struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Initialize VT-d Posted-Interrupts Descriptor
  KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
  KVM: add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
  KVM: x86: kvm-vfio: VT-d posted-interrupts setup
  x86, irq: Define a global vector for VT-d Posted-Interrupts
  KVM: Define a wakeup worker thread for vCPU
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Suppress posted-interrupt when 'SN' is set
  iommu/vt-d: 

Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

2015-01-27 Thread Zhang Haoyu

On 2015-01-28 03:10:23, Jidong Xiao wrote:
 On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
 mikhail.sennikovs...@profitbricks.com wrote:
  Hi all,

  I've posted the below mail to the qemu-dev mailing list, but I've got no
  response there.
  That's why I decided to re-post it here as well, and besides that I think
  this could be a kvm-specific issue as well.
 
  Some additional thing to note:
  I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
  well.
  I would typically use a max_downtime adjusted to 1 second instead of default
  30 ms.
  I also noticed that the issue happens much more rarely if I increase the
  migration bandwidth, i.e. like
 
  diff --git a/migration.c b/migration.c
  index 26f4b65..d2e3b39 100644
 --- a/migration.c
  +++ b/migration.c
  @@ -36,7 +36,7 @@ enum {
   MIG_STATE_COMPLETED,
   };
 
  -#define MAX_THROTTLE  (32  20)  /* Migration speed throttling */
  +#define MAX_THROTTLE  (90  20)  /* Migration speed throttling */
 
  Like I said below, I would be glad to provide you with any additional
  information.
 
  Thanks,
  Mikhail
 
 Hi, Mikhail,

 So if you choose to use one vcpu, instead of smp, this issue would not
 happen, right?
 
I think you can try cpu feature hv_relaxed, like
-cpu Haswell,hv_relaxed
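
As an aside, instead of patching MAX_THROTTLE you can usually raise the
migration bandwidth at runtime from the monitor (a sketch, assuming the
HMP monitor is available in your build):

(qemu) migrate_set_speed 90m
(qemu) migrate_set_downtime 1

which should be roughly equivalent to the source change above plus the
1 second max_downtime adjustment.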

 -Jidong
 
  On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
 
  Hi all,
 
   I'm running a slightly modified migration over tcp test in virt-test, which
  does a migration from one smp=2 VM to another on the same host over TCP,
  and exposes some dummy CPU load inside the GUEST while migration, and
   after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT BSOD
  inside the guest,
  which happens when
 
  An expected clock interrupt was not received on a secondary processor in
  an
  MP system within the allocated interval. This indicates that the specified
  processor is hung and not processing interrupts.
  
 
  This seems to happen with any qemu version I've tested (1.2 and above,
  including upstream),
  and I was testing it with 3.13.0-44-generic kernel on my Ubuntu 14.04.1
  LTS with SMP4 host, as well as on 3.12.26-1 kernel with Debian 6 with SMP6
  host.
 
  One thing I noticed is that exposing a dummy CPU load on the HOST (like
   running multiple instances of the "while true; do false; done" script) in
   parallel with doing migration makes the issue quite easily
  reproducible.
 
 
  Looking inside the windows crash dump, the second CPU is just running at
   IRQL 0, and it is apparently not hung, as Windows is able to save its state in
  the crash dump correctly, which assumes running some code on it.
   So this apparently seems to be some timing issue (like the host scheduler does
   not schedule the thread executing the secondary CPU's code in time).
 
  Could you give me some insight on this, i.e. is there a way to customize
   QEMU/KVM to avoid such an issue?
 
  If you think this might be a qemu/kvm issue, I can provide you any info,
  like windows crash dumps, or the test-case to reproduce this.
 
 
 qemu is started as:
 
  from-VM:
 
  qemu-system-x86_64 \
  -S  \
  -name 'virt-tests-vm1'  \
  -sandbox off  \
  -M pc-1.0  \
  -nodefaults  \
  -vga std  \
  -chardev
  socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
  \
  -mon chardev=qmp_id_qmp1,mode=control  \
  -chardev
 socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
  \
  -device isa-serial,chardev=serial_id_serial0  \
  -chardev
  socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
  \
  -device
  isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
  -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
  -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
  -device
  virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
  -device
  virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
  \
  -netdev
 user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
  -m 2G  \
  -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
  -cpu phenom \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
  -vnc :0  \
  -rtc base=localtime,clock=host,driftfix=none  \
  -boot order=cdn,once=c,menu=off \
  -enable-kvm
 
  to-VM:
 
  qemu-system-x86_64 \
  -S  \
  -name 'virt-tests-vm1'  \
  -sandbox off  \
 -M pc-1.0  \
  -nodefaults  \
  -vga std  \
  -chardev
  socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
  \
  -mon chardev=qmp_id_qmp1,mode=control  \
  -chardev
  socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
  \
  -device isa-serial,chardev=serial_id_serial0  \
  -chardev
  

Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.

2015-01-27 Thread Wincy Van
On Wed, Jan 28, 2015 at 5:37 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 24/01/2015 11:21, Wincy Van wrote:
 +   memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

 Most bytes are always 0xff.  It's better to initialize it to 0xff once,
 and set the bit here if !nested_cpu_has_virt_x2apic_mode(vmcs12).

Indeed, will do.
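
Something like this, I suppose (untested sketch based on your
suggestion):

	/* at setup time: intercept all MSRs for L2 by default */
	memset(vmx_msr_bitmap_nested, 0xff, PAGE_SIZE);

	/* per merge: only open holes when L1 enabled the feature */
	if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
		/* TPR is allowed */
		nested_vmx_disable_intercept_for_msr(msr_bitmap,
				vmx_msr_bitmap_nested,
				APIC_BASE_MSR + (APIC_TASKPRI >> 4),
				MSR_TYPE_R | MSR_TYPE_W);
	}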


 +   if (nested_cpu_has_virt_x2apic_mode(vmcs12))

 Please add braces here, because of the /* */ command below.

Will do.


 +   /* TPR is allowed */
 +   nested_vmx_disable_intercept_for_msr(msr_bitmap,
 +   vmx_msr_bitmap_nested,
 +   APIC_BASE_MSR + (APIC_TASKPRI >> 4),
 +   MSR_TYPE_R | MSR_TYPE_W);

 +static inline int nested_vmx_check_virt_x2apic(struct kvm_vcpu *vcpu,
 +  struct vmcs12 *vmcs12)
 +{
 +   if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
 +   return -EINVAL;

 No need for this function and nested_cpu_has_virt_x2apic_mode.  Just
 inline them in their caller(s).  Same for other cases throughout the series.


Do you mean that we should also inline the same functions in the other
patches of this patch set?
I think these functions keep the code tidy, just like existing
functions such as nested_cpu_has_preemption_timer, nested_cpu_has_ept, etc.


Re: [PATCH v3 0/6] KVM: nVMX: Enable nested apicv support.

2015-01-27 Thread Wincy Van
On Wed, Jan 28, 2015 at 6:06 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 24/01/2015 11:18, Wincy Van wrote:
 v2 --- v3:
   1. Add a new field in nested_vmx to avoid the spin lock in v2.
   2. Drop send eoi to L1 when doing nested interrupt delivery.
   3. Use hardware MSR bitmap to enable nested virtualize x2apic
  mode.

 I think the patches are mostly okay.  I made a few comments.


Thank you, Paolo and Yang. I can't accomplish this without your help.

 One of the things to do on top could be to avoid rebuilding the whole
 vmcs02 on every entry.  Recomputing the MSR bitmap on every vmentry is
 not particularly nice, for example.  It is not necessary unless the
 execution controls have changed.

Indeed, I planned to do that optimization after this patch set.


Thanks,
Wincy


 Paolo


Re: [PATCH v3 2/6] KVM: nVMX: Enable nested virtualize x2apic mode.

2015-01-27 Thread Wincy Van
On Wed, Jan 28, 2015 at 5:39 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 24/01/2015 11:21, Wincy Van wrote:
 +static void nested_vmx_disable_intercept_for_msr(unsigned long 
 *msr_bitmap_l1,
 +  unsigned long 
 *msr_bitmap_nested,
 +  u32 msr, int type)
 +{
 +   int f = sizeof(unsigned long);
 +
 +   if (!cpu_has_vmx_msr_bitmap())
 +   return;
 +

 Also, make this a WARN_ON.

WIll do.



Thanks,
Wincy



 Paolo


Re: [PATCH v3 6/6] KVM: nVMX: Enable nested posted interrupt processing.

2015-01-27 Thread Wincy Van
On Wed, Jan 28, 2015 at 5:55 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 24/01/2015 11:24, Wincy Van wrote:
 if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
 !nested_cpu_has_apic_reg_virt(vmcs12) &&
 -   !nested_cpu_has_vid(vmcs12))
 +   !nested_cpu_has_vid(vmcs12) &&
 +   !nested_cpu_has_posted_intr(vmcs12))
 return 0;

 if (nested_cpu_has_virt_x2apic_mode(vmcs12))
 r = nested_vmx_check_virt_x2apic(vcpu, vmcs12);
 if (nested_cpu_has_vid(vmcs12))
 r |= nested_vmx_check_vid(vcpu, vmcs12);
 +   if (nested_cpu_has_posted_intr(vmcs12))
 +   r |= nested_vmx_check_posted_intr(vcpu, vmcs12);

 These ifs are always true.


Why? L1 may configure these features separately, so we should check them one
by one. E.g. L1 may enable posted interrupt processing and virtual interrupt
delivery while leaving virtualize x2apic mode disabled; then
nested_cpu_has_virt_x2apic_mode will return false.

 Paolo


Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend

2015-01-27 Thread Nikolay Nikolaev
Hi Andre,

On Tue, Jan 27, 2015 at 3:31 PM, Andre Przywara andre.przyw...@arm.com wrote:

 Hi Nikolay,

 On 24/01/15 11:59, Nikolay Nikolaev wrote:
  In io_mem_abort remove the call to vgic_handle_mmio. The target is to have
  a single MMIO handling path - that is through the kvm_io_bus_ API.
 
  Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region.
  Both read and write calls are redirected to vgic_io_dev_access where
  kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio.
 
 
  Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
  ---
   arch/arm/kvm/mmio.c|3 -
   include/kvm/arm_vgic.h |3 -
   virt/kvm/arm/vgic.c|  123 
  
   3 files changed, 114 insertions(+), 15 deletions(-)
 
  diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
  index d852137..8dc2fde 100644
  --- a/arch/arm/kvm/mmio.c
  +++ b/arch/arm/kvm/mmio.c
  @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
  *run,
   fault_ipa, 0);
}
 
  - if (vgic_handle_mmio(vcpu, run, mmio))
  - return 1;
  -

 Why is this (whole patch) actually needed? Is that just to make it nicer
 by pulling everything under one umbrella?


It started from this mail from Christoffer:
https://lkml.org/lkml/2014/3/28/403


 For enabling ioeventfd you actually don't need this patch, right?
Yes, we don't need it.
 (I am asking because this breaks GICv3 emulation, see below)

if (handle_kernel_mmio(vcpu, run, mmio))
return 1;
 
  diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
  index 7c55dd5..60639b1 100644
  --- a/include/kvm/arm_vgic.h
  +++ b/include/kvm/arm_vgic.h
  @@ -237,6 +237,7 @@ struct vgic_dist {
unsigned long   *irq_pending_on_cpu;
 
struct vgic_vm_ops  vm_ops;
  + struct kvm_io_device*io_dev;
   #endif
   };
 
  @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
  unsigned int irq_num,
bool level);
   void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
   int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  -   struct kvm_exit_mmio *mmio);
 
   #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
   #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
  diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
  index 0cc6ab6..195d2ba 100644
  --- a/virt/kvm/arm/vgic.c
  +++ b/virt/kvm/arm/vgic.c
  @@ -31,6 +31,9 @@
   #include asm/kvm_emulate.h
   #include asm/kvm_arm.h
   #include asm/kvm_mmu.h
  +#include asm/kvm.h
  +
  +#include iodev.h
 
   /*
* How the whole thing works (courtesy of Christoffer Dall):
  @@ -77,6 +80,7 @@
 
   #include vgic.h
 
  +static int vgic_register_kvm_io_dev(struct kvm *kvm);
   static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
   static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
   static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
  @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
   int kvm_vgic_map_resources(struct kvm *kvm)
   {
  + vgic_register_kvm_io_dev(kvm);
	return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
   }
 
  @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, 
  struct kvm_run *run,
   }
 
   /**
  - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
  + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC 
  emulation
* @vcpu:  pointer to the vcpu performing the access
  - * @run:   pointer to the kvm_run structure
  - * @mmio:  pointer to the data describing the access
  + * @this:  pointer to the kvm_io_device structure
  + * @addr:  the MMIO address being accessed
  + * @len:   the length of the accessed data
  + * @val:   pointer to the value being written,
  + * or where the read operation will store its result
  + * @is_write:  flag to show whether a write access is performed
*
  - * returns true if the MMIO access has been performed in kernel space,
  - * and false if it needs to be emulated in user space.
  + * returns 0 if the MMIO access has been performed in kernel space,
  + * and 1 if it needs to be emulated in user space.
* Calls the actual handling routine for the selected VGIC model.
*/
  -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  -   struct kvm_exit_mmio *mmio)
  +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device 
  *this,
  + gpa_t addr, int len, void *val, bool is_write)
   {
   - if (!irqchip_in_kernel(vcpu->kvm))
  - return false;
  + struct kvm_exit_mmio mmio;
  + bool ret;
  +
  + mmio = (struct kvm_exit_mmio) {
  + .phys_addr = addr,
  + 

Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches

2015-01-27 Thread Christoffer Dall
On Tue, Jan 27, 2015 at 11:21:38AM +, Marc Zyngier wrote:
 On 26/01/15 22:58, Christoffer Dall wrote:
  On Wed, Jan 21, 2015 at 06:39:46PM +, Marc Zyngier wrote:
  Trying to emulate the behaviour of set/way cache ops is fairly
  pointless, as there are too many ways we can end-up missing stuff.
  Also, there is some system caches out there that simply ignore
  set/way operations.
 
  So instead of trying to implement them, let's convert it to VA ops,
  and use them as a way to re-enable the trapping of VM ops. That way,
  we can detect the point when the MMU/caches are turned off, and do
  a full VM flush (which is what the guest was trying to do anyway).
 
  This allows a 32bit zImage to boot on the APM thingy, and will
  probably help bootloaders in general.
 
  Signed-off-by: Marc Zyngier marc.zyng...@arm.com
  
  This had some conflicts with dirty page logging.  I fixed it up here,
  and also removed some trailing white space and mixed spaces/tabs that
  patch complained about:
  
  http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes
 
 Thanks for doing so.
 
  ---
   arch/arm/include/asm/kvm_emulate.h   | 10 +
   arch/arm/include/asm/kvm_host.h  |  3 --
   arch/arm/include/asm/kvm_mmu.h   |  3 +-
   arch/arm/kvm/arm.c   | 10 -
   arch/arm/kvm/coproc.c| 64 ++
   arch/arm/kvm/coproc_a15.c|  2 +-
   arch/arm/kvm/coproc_a7.c |  2 +-
   arch/arm/kvm/mmu.c   | 70 
  -
   arch/arm/kvm/trace.h | 39 +++
   arch/arm64/include/asm/kvm_emulate.h | 10 +
   arch/arm64/include/asm/kvm_host.h|  3 --
   arch/arm64/include/asm/kvm_mmu.h |  3 +-
   arch/arm64/kvm/sys_regs.c| 75 
  +---
   13 files changed, 155 insertions(+), 139 deletions(-)
 
  diff --git a/arch/arm/include/asm/kvm_emulate.h 
  b/arch/arm/include/asm/kvm_emulate.h
  index 66ce176..7b01523 100644
  --- a/arch/arm/include/asm/kvm_emulate.h
  +++ b/arch/arm/include/asm/kvm_emulate.h
  @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
	vcpu->arch.hcr = HCR_GUEST_MASK;
   }
 
  +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
  +{
   + return vcpu->arch.hcr;
  +}
  +
  +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
  +{
   + vcpu->arch.hcr = hcr;
  +}
  +
   static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
   {
return 1;
  diff --git a/arch/arm/include/asm/kvm_host.h 
  b/arch/arm/include/asm/kvm_host.h
  index 254e065..04b4ea0 100644
  --- a/arch/arm/include/asm/kvm_host.h
  +++ b/arch/arm/include/asm/kvm_host.h
  @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
 * Anything that is not used directly from assembly code goes
 * here.
 */
  - /* dcache set/way operation pending */
  - int last_pcpu;
  - cpumask_t require_dcache_flush;
 
/* Don't run the guest on this vcpu */
bool pause;
  diff --git a/arch/arm/include/asm/kvm_mmu.h 
  b/arch/arm/include/asm/kvm_mmu.h
  index 63e0ecc..286644c 100644
  --- a/arch/arm/include/asm/kvm_mmu.h
  +++ b/arch/arm/include/asm/kvm_mmu.h
  @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct 
  kvm_vcpu *vcpu, hva_t hva,
 
   #define kvm_virt_to_phys(x)  virt_to_idmap((unsigned long)(x))
 
  -void stage2_flush_vm(struct kvm *kvm);
  +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
  +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
   #endif   /* !__ASSEMBLY__ */
 
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index 2d6d910..0b0d58a 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
  @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
  cpu)
	vcpu->cpu = cpu;
	vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
 
  - /*
  -  * Check whether this vcpu requires the cache to be flushed on
  -  * this physical CPU. This is a consequence of doing dcache
  -  * operations by set/way on this vcpu. We do it here to be in
  -  * a non-preemptible section.
  -  */
   - if (cpumask_test_and_clear_cpu(cpu,
   &vcpu->arch.require_dcache_flush))
  - flush_cache_all(); /* We'd really want v7_flush_dcache_all() 
  */
  -
kvm_arm_set_running_vcpu(vcpu);
   }
 
  @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
	vcpu->mode = OUTSIDE_GUEST_MODE;
   - vcpu->arch.last_pcpu = smp_processor_id();
kvm_guest_exit();
trace_kvm_exit(*vcpu_pc(vcpu));
/*
  diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
  index 7928dbd..0afcc00 100644
  --- a/arch/arm/kvm/coproc.c
  +++ b/arch/arm/kvm/coproc.c
  @@ 

Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend

2015-01-27 Thread Andre Przywara
Hi Nikolay,

On 24/01/15 11:59, Nikolay Nikolaev wrote:
 In io_mem_abort remove the call to vgic_handle_mmio. The target is to have
 a single MMIO handling path - that is through the kvm_io_bus_ API.
 
 Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region.
 Both read and write calls are redirected to vgic_io_dev_access where
 kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio.
 
 
 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/arm/kvm/mmio.c|3 -
  include/kvm/arm_vgic.h |3 -
  virt/kvm/arm/vgic.c|  123 
 
  3 files changed, 114 insertions(+), 15 deletions(-)
 
 diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
 index d852137..8dc2fde 100644
 --- a/arch/arm/kvm/mmio.c
 +++ b/arch/arm/kvm/mmio.c
 @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
 *run,
  fault_ipa, 0);
   }
  
 - if (vgic_handle_mmio(vcpu, run, mmio))
 - return 1;
 -

Why is this (whole patch) actually needed? Is that just to make it nicer
by pulling everything under one umbrella?
For enabling ioeventfd you actually don't need this patch, right?
(I am asking because this breaks GICv3 emulation, see below)

   if (handle_kernel_mmio(vcpu, run, mmio))
   return 1;
  
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 7c55dd5..60639b1 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -237,6 +237,7 @@ struct vgic_dist {
   unsigned long   *irq_pending_on_cpu;
  
   struct vgic_vm_ops  vm_ops;
 + struct kvm_io_device*io_dev;
  #endif
  };
  
 @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
 unsigned int irq_num,
   bool level);
  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 -   struct kvm_exit_mmio *mmio);
  
   #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
   #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 0cc6ab6..195d2ba 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -31,6 +31,9 @@
  #include asm/kvm_emulate.h
  #include asm/kvm_arm.h
  #include asm/kvm_mmu.h
 +#include asm/kvm.h
 +
 +#include iodev.h
  
  /*
   * How the whole thing works (courtesy of Christoffer Dall):
 @@ -77,6 +80,7 @@
  
  #include vgic.h
  
 +static int vgic_register_kvm_io_dev(struct kvm *kvm);
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq)
  
  int kvm_vgic_map_resources(struct kvm *kvm)
  {
 + vgic_register_kvm_io_dev(kvm);
    return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
  }
  
 @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, 
 struct kvm_run *run,
  }
  
  /**
 - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
 + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC emulation
   * @vcpu:  pointer to the vcpu performing the access
 - * @run:   pointer to the kvm_run structure
 - * @mmio:  pointer to the data describing the access
 + * @this:  pointer to the kvm_io_device structure
 + * @addr:  the MMIO address being accessed
 + * @len:   the length of the accessed data
 + * @val:   pointer to the value being written,
 + * or where the read operation will store its result
 + * @is_write:  flag to show whether a write access is performed
   *
 - * returns true if the MMIO access has been performed in kernel space,
 - * and false if it needs to be emulated in user space.
 + * returns 0 if the MMIO access has been performed in kernel space,
 + * and 1 if it needs to be emulated in user space.
   * Calls the actual handling routine for the selected VGIC model.
   */
 -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 -   struct kvm_exit_mmio *mmio)
 +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device 
 *this,
 + gpa_t addr, int len, void *val, bool is_write)
  {
  - if (!irqchip_in_kernel(vcpu->kvm))
 - return false;
 + struct kvm_exit_mmio mmio;
 + bool ret;
 +
 + mmio = (struct kvm_exit_mmio) {
 + .phys_addr = addr,
 + .len = len,
 + .is_write = is_write,
 + };
 +
 + if (is_write)
 + memcpy(mmio.data, val, len);
  
   /*
* This will currently call either vgic_v2_handle_mmio() or
* vgic_v3_handle_mmio(), which in turn will call
* vgic_handle_mmio_range() 

KVM call for agenda for 2015-02-03

2015-01-27 Thread Juan Quintela


Hi

Please, send any topic that you are interested in covering.

 Thanks, Juan.

 Call details:

By popular demand, a google calendar public entry with it

 
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

 (Let me know if you have any problems with the calendar entry.  I just
 gave up on getting the time right simultaneously for CEST, CET, EDT and DST).

If you need phone number details,  contact me privately

Thanks, Juan.


Re: [PATCH v3 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend

2015-01-27 Thread Andre Przywara
Hi,

On 27/01/15 17:26, Eric Auger wrote:
 On 01/27/2015 05:51 PM, Nikolay Nikolaev wrote:
 Hi Andre,

 On Tue, Jan 27, 2015 at 3:31 PM, Andre Przywara andre.przyw...@arm.com 
 wrote:

 Hi Nikolay,

 On 24/01/15 11:59, Nikolay Nikolaev wrote:
 In io_mem_abort remove the call to vgic_handle_mmio. The target is to have
 a single MMIO handling path - that is through the kvm_io_bus_ API.

 Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region.
 Both read and write calls are redirected to vgic_io_dev_access where
 kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio.


 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/arm/kvm/mmio.c|3 -
  include/kvm/arm_vgic.h |3 -
  virt/kvm/arm/vgic.c|  123 
 
  3 files changed, 114 insertions(+), 15 deletions(-)

 diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
 index d852137..8dc2fde 100644
 --- a/arch/arm/kvm/mmio.c
 +++ b/arch/arm/kvm/mmio.c
 @@ -230,9 +230,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
 *run,
  fault_ipa, 0);
   }

 - if (vgic_handle_mmio(vcpu, run, mmio))
 - return 1;
 -

 Why is this (whole patch) actually needed? Is that just to make it nicer
 by pulling everything under one umbrella?


 It started from this mail from Christoffer:
 https://lkml.org/lkml/2014/3/28/403
 Hi Nikolay, Andre,
 
 I also understood that the target was to handle all kernel mmio through
 the same API, hence the first patch. This patch shows that at least for
 GICv2 it was doable without upheavals in vgic code and it also serves
 ioeventfd, which is good. Andre, do you think the price to pay to integrate
 missing redistributors and forthcoming components is too high?

Hopefully not, actually I reckon that moving the upper level MMIO
dispatching out of vgic.c and letting the specific VGIC models register
what they need themselves (in their -emul.c files) sounds quite promising.
But this particular patch does not serve this purpose:
a) we replace two lines with a bunch of more layered code
b) we copy the MMIOed data to convert between the interfaces
c) we miss GICv3 emulation

So this needs to be addressed in a more general way (which I may give a
try). That being said, I don't see why we would need to do this
right now and hold back ioeventfd by this rather orthogonal issue.

Christoffer, what's your take on this?

Cheers,
Andre.

 Best Regards
 
 Eric
 
 


 For enabling ioeventfd you actually don't need this patch, right?
 Yes, we don't need it.
 (I am asking because this breaks GICv3 emulation, see below)

   if (handle_kernel_mmio(vcpu, run, mmio))
   return 1;

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 7c55dd5..60639b1 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -237,6 +237,7 @@ struct vgic_dist {
   unsigned long   *irq_pending_on_cpu;

   struct vgic_vm_ops  vm_ops;
 + struct kvm_io_device*io_dev;
  #endif
  };

 @@ -311,8 +312,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
 unsigned int irq_num,
   bool level);
  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 -   struct kvm_exit_mmio *mmio);

 #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 0cc6ab6..195d2ba 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -31,6 +31,9 @@
  #include asm/kvm_emulate.h
  #include asm/kvm_arm.h
  #include asm/kvm_mmu.h
 +#include asm/kvm.h
 +
 +#include iodev.h

  /*
   * How the whole thing works (courtesy of Christoffer Dall):
 @@ -77,6 +80,7 @@

  #include vgic.h

 +static int vgic_register_kvm_io_dev(struct kvm *kvm);
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 @@ -97,6 +101,7 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq)

  int kvm_vgic_map_resources(struct kvm *kvm)
  {
 + vgic_register_kvm_io_dev(kvm);
   return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
  }

 @@ -776,27 +781,123 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, 
 struct kvm_run *run,
  }

  /**
 - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC 
 emulation
 + * vgic_io_dev_access - handle an in-kernel MMIO access for the GIC 
 emulation
   * @vcpu:  pointer to the vcpu performing the access
 - * @run:   pointer to the kvm_run structure
 - * @mmio:  pointer to the data describing the access
 + * @this:  pointer to the kvm_io_device structure
 + * @addr:  the MMIO address being 

RE: [v3 00/26] Add VT-d Posted-Interrupts support

2015-01-27 Thread Wu, Feng


 -Original Message-
 From: Wu, Feng
 Sent: Wednesday, January 21, 2015 10:26 AM
 To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
 j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
 Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
 io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
 Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
 
 
  -Original Message-
  From: Wu, Feng
  Sent: Friday, December 12, 2014 11:15 PM
  To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
 x...@kernel.org;
  g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
  j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
  Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
  io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
  Subject: [v3 00/26] Add VT-d Posted-Interrupts support
 
  VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
  With VT-d Posted-Interrupts enabled, external interrupts from
  direct-assigned devices can be delivered to guests without VMM
  intervention when guest is running in non-root mode.
 
  You can find the VT-d Posted-Interrtups Spec. in the following URL:
 
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog
  y/vt-directed-io-spec.html
 
  v1-v2:
  * Use VFIO framework to enable this feature, the VFIO part of this series is
base on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control
  * Rebase this patchset on
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
then revise some irq logic based on the new hierarchy irqdomain patches
  provided
by Jiang Liu jiang@linux.intel.com
 
  v2-v3:
  * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
preempted or blocked.
  * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --
  KVM_DEV_VFIO_DEVICE_POST_IRQ
  * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --
  __KVM_HAVE_ARCH_KVM_VFIO_POST
  * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
can be used to change back to remapping mode.
  * Fix typo
 
  This patch series is made of the following groups:
  1-6: Some preparation changes in iommu and irq component, this is based on
  the
   new hierarchy irqdomain logic.
  7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature
  detection,
command line parameter.
  10-17, 22-25: Changes related to KVM itself.
  18-20: Changes in VFIO component, this part was previously sent out as
  [RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
  Posted-Interrupts
  21: x86 irq related changes
 
  Feng Wu (26):
genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
  VCPU
iommu: Add new member capability to struct irq_remap_ops
iommu, x86: Define new irte structure for VT-d Posted-Interrupts
iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
iommu, x86: Add cap_pi_support() to detect VT-d PI capability
iommu, x86: Add intel_irq_remapping_capability() for Intel
iommu, x86: define irq_remapping_cap()
KVM: change struct pi_desc for VT-d Posted-Interrupts
KVM: Add some helper functions for Posted-Interrupts
KVM: Initialize VT-d Posted-Interrupts Descriptor
KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
KVM: add interfaces to control PI outside vmx
KVM: Make struct kvm_irq_routing_table accessible
KVM: make kvm_set_msi_irq() public
KVM: kvm-vfio: User API for VT-d Posted-Interrupts
KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
KVM: x86: kvm-vfio: VT-d posted-interrupts setup
x86, irq: Define a global vector for VT-d Posted-Interrupts
KVM: Define a wakeup worker thread for vCPU
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
KVM: Suppress posted-interrupt when 'SN' is set
iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
 
   Documentation/kernel-parameters.txt|   1 +
   Documentation/virtual/kvm/devices/vfio.txt |   9 ++
   arch/x86/include/asm/entry_arch.h  |   2 +
   arch/x86/include/asm/hardirq.h |   1 +
   arch/x86/include/asm/hw_irq.h  |   2 +
   arch/x86/include/asm/irq_remapping.h   |  11 ++
   arch/x86/include/asm/irq_vectors.h |   1 +
   arch/x86/include/asm/kvm_host.h|  12 ++
   arch/x86/kernel/apic/msi.c |   1 +
   arch/x86/kernel/entry_64.S |   2 +
   arch/x86/kernel/irq.c  |  27 
   arch/x86/kernel/irqinit.c  |   2 +
   arch/x86/kvm/Makefile

[PATCH 0/6] KVM: VMX: Page Modification Logging (PML) support

2015-01-27 Thread Kai Huang
This patch series adds Page Modification Logging (PML) support in VMX.

1) Introduction

PML is a new feature on Intel's Broadwell server platform, targeted at reducing
the overhead of the dirty logging mechanism.

The specification can be found at:

http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

Currently, dirty logging is done by write protection: guest memory is
write-protected, and a dirty GFN is marked in dirty_bitmap on the subsequent
write fault. This works fine, except for the overhead of the additional write
fault for logging each dirty GFN. The overhead can be large if the write
operations from the guest are intensive.

PML is a hardware-assisted, efficient way of dirty logging. PML logs dirty GPAs
automatically to a 4K PML memory buffer when the CPU changes an EPT entry's
D-bit from 0 to 1. To support this, a new 4K PML buffer base address and a PML
index were added to the VMCS. Initially the PML index is set to 512 (8 bytes for
each GPA); the CPU decreases the PML index after logging one GPA, and eventually
a PML buffer full VMEXIT happens when the PML buffer is fully logged.

With PML, we don't have to use write protection, so the intensive write-fault
EPT violations can be avoided, at the cost of an additional PML buffer full
VMEXIT per 512 dirty GPAs. Theoretically, this reduces hypervisor overhead when
the guest is in dirty logging mode, so more CPU cycles can be allocated to the
guest; benchmarks in the guest are therefore expected to perform better compared
to non-PML.
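
For readers unfamiliar with the mechanism, here is a minimal sketch (not the
code in this series) of draining a PML buffer on VMEXIT; the names vmx->pml_pg,
PML_ENTITY_NUM and GUEST_PML_INDEX are taken from patch 6/6:

static void pml_drain_sketch(struct vcpu_vmx *vmx)
{
	u64 *pml_buf = page_address(vmx->pml_pg);
	u16 pml_idx = vmcs_read16(GUEST_PML_INDEX);

	/* index still at its initial value: nothing was logged */
	if (pml_idx == PML_ENTITY_NUM - 1)
		return;

	/* index points at the next free entry; it wraps when the buffer fills */
	pml_idx = (pml_idx >= PML_ENTITY_NUM) ? 0 : pml_idx + 1;

	/* each valid entry holds one dirty GPA */
	for (; pml_idx < PML_ENTITY_NUM; pml_idx++)
		mark_page_dirty(vmx->vcpu.kvm, pml_buf[pml_idx] >> PAGE_SHIFT);

	/* re-arm the index so another 512 GPAs can be logged */
	vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
}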

2) Design

a. Enable/Disable PML

PML is per-vcpu (per-VMCS), while the EPT table can be shared by vcpus, so we
need to enable/disable PML for all vcpus of the guest. A dedicated 4K page will
be allocated for each vcpu when PML is enabled for that vcpu.

Currently, we choose to always enable PML for the guest, which means we enable
PML when creating a VCPU and never disable it during the guest's lifetime. This
avoids the complicated logic of enabling PML on demand while the guest is
running. And to eliminate potentially unnecessary GPA logging in non-dirty
logging mode, we set the D-bit manually for the slots with dirty logging
disabled.

b. Flush PML buffer

When userspace queries dirty_bitmap, it's possible that there are GPAs logged in
a vcpu's PML buffer, but since the PML buffer is not full, no VMEXIT happens. In
this case, we'd better manually flush the PML buffers of all vcpus and update
the dirty GPAs to dirty_bitmap.

We do the PML buffer flush at the beginning of each VMEXIT; this keeps
dirty_bitmap more up to date, and also makes the logic of flushing the PML
buffers of all vcpus easier -- we only need to kick all vcpus out of the guest,
and the PML buffer of each vcpu will be flushed automatically.
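
In code terms, the flush-for-all-vcpus step reduces to a kick (a sketch using
standard KVM helpers; the real hook is VMX-specific):

static void flush_pml_buffers_sketch(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu;
	int i;

	/*
	 * A kick forces each vcpu out of guest mode; since the PML buffer
	 * is drained at the beginning of every VMEXIT, all pending dirty
	 * GPAs end up in dirty_bitmap as a side effect.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm)
		kvm_vcpu_kick(vcpu);
}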

3) Tests and benchmark results

I tested the specjbb benchmark, which is memory intensive, to measure PML. All
tests were done in the configuration below:

Machine (Broadwell server): 16 CPUs (1.4G) + 4G memory
Host Kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state,
S-state disabled. Swap disabled.

Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.

a. Compare score with and without PML enabled.

This is to make sure PML won't bring any performance regression, as it's always
enabled for the guest.

Booting guest with graphic window (no --nographic)

NOPML   PML

109755  109379
108786  109300
109234  109663
109257  107471
108514  108904
109740  107623

avg:    109214  108723

performance regression: (109214 - 108723) / 109214 = 0.45%

Booting guest without graphic window (--nographic)

NOPML   PML

109090  109686
109461  110533
110523  108550
109960  110775
109090  109802
110787  109192

avg:    109818  109756

performance regression: (109818 - 109756) / 109818 = 0.06%

So there's no noticeable performance regression leaving PML always enabled.

b. Compare specjbb score between PML and Write Protection.

This is used to see how much performance gain PML can bring when the guest is in
dirty logging mode.

I modified qemu by adding an additional monitoring thread to query dirty_bitmap
periodically (once per second). With this thread, we can measure the performance
gain of PML by comparing specjbb scores under the PML code path and the write
protection code path.

Again, I got scores both with and without the guest's graphic window.

Booting guest with graphic window (no --nographic)

PML WP  No monitoring thread

104748  101358
102934  99895
103525  98832
105331  100678
106038  99476
104776  99851

avg:    104558  100015  108723 (== PML score in test a)

percent: 96.17% 91.99%  100%

Re: [v3 00/26] Add VT-d Posted-Interrupts support

2015-01-27 Thread Alex Williamson
On Wed, 2015-01-28 at 03:01 +, Wu, Feng wrote:
 
  -Original Message-
  From: Wu, Feng
  Sent: Wednesday, January 21, 2015 10:26 AM
  To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
  g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
  j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
  Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
  io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
  Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
  
  
   -Original Message-
   From: Wu, Feng
   Sent: Friday, December 12, 2014 11:15 PM
   To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
  x...@kernel.org;
   g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
   j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
   Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
   io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
   Subject: [v3 00/26] Add VT-d Posted-Interrupts support
  
    VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
    With VT-d Posted-Interrupts enabled, external interrupts from
    direct-assigned devices can be delivered to guests without VMM
    intervention when the guest is running in non-root mode.
   
    You can find the VT-d Posted-Interrupts spec at the following URL:
   
    http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
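
The core data structure here is the 64-byte posted-interrupt descriptor that
the IOMMU writes and the CPU reads. A rough, simplified sketch of its layout
(consult the spec above for the authoritative bit positions; the series below
reworks KVM's struct pi_desc along these lines):

#include <stdint.h>

struct pi_desc {
        uint32_t pir[8];        /* posted-interrupt requests, one bit per vector */
        uint16_t on   : 1;      /* outstanding notification */
        uint16_t sn   : 1;      /* suppress notification */
        uint16_t rsvd : 14;
        uint8_t  nv;            /* notification vector */
        uint8_t  rsvd2;
        uint32_t ndst;          /* notification destination (target APIC ID) */
        uint32_t rsvd3[6];      /* pad to 64 bytes */
} __attribute__((aligned(64)));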
  
    v1-v2:
    * Use the VFIO framework to enable this feature; the VFIO part of this
      series is based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward
      control"
    * Rebase this patchset on
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise
      some irq logic based on the new hierarchy irqdomain patches provided
      by Jiang Liu jiang@linux.intel.com
   
    v2-v3:
    * Adjust the Posted-Interrupts Descriptor updating logic when the vCPU
      is preempted or blocked.
    * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -> KVM_DEV_VFIO_DEVICE_POST_IRQ
    * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -> __KVM_HAVE_ARCH_KVM_VFIO_POST
    * Add a KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irqs, which
      can be used to change back to remapping mode.
    * Fix typo
  
    This patch series is made up of the following groups:
    1-6: Preparation changes in the iommu and irq components, based on the
      new hierarchy irqdomain logic.
    7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as feature
      detection and a command line parameter.
    10-17, 22-25: Changes related to KVM itself.
    18-20: Changes in the VFIO component; this part was previously sent out
      as "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
      Posted-Interrupts"
    21: x86 irq related changes
  
   Feng Wu (26):
 genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
   VCPU
 iommu: Add new member capability to struct irq_remap_ops
 iommu, x86: Define new irte structure for VT-d Posted-Interrupts
 iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
 x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
 iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
 iommu, x86: Add cap_pi_support() to detect VT-d PI capability
 iommu, x86: Add intel_irq_remapping_capability() for Intel
 iommu, x86: define irq_remapping_cap()
 KVM: change struct pi_desc for VT-d Posted-Interrupts
 KVM: Add some helper functions for Posted-Interrupts
 KVM: Initialize VT-d Posted-Interrupts Descriptor
 KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
 KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
 KVM: add interfaces to control PI outside vmx
 KVM: Make struct kvm_irq_routing_table accessible
 KVM: make kvm_set_msi_irq() public
 KVM: kvm-vfio: User API for VT-d Posted-Interrupts
 KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
 KVM: x86: kvm-vfio: VT-d posted-interrupts setup
 x86, irq: Define a global vector for VT-d Posted-Interrupts
 KVM: Define a wakeup worker thread for vCPU
 KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
 KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
 KVM: Suppress posted-interrupt when 'SN' is set
 iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
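
As background for patch 1 in the list above, the genirq hook that eventually
landed in mainline has roughly the shape below; this is a hedged sketch
against the current API (the demo_* names are made up), and the v3 series may
differ in detail:

#include <linux/interrupt.h>
#include <linux/irq.h>

/* irq_chip callback: a real implementation (e.g. intel_ir_chip, patch 4)
 * would build a posted-interrupt IRTE from vcpu_info here. */
static int demo_irq_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
{
        return 0;
}

static struct irq_chip demo_ir_chip = {
        .name                  = "DEMO-IR",
        .irq_set_vcpu_affinity = demo_irq_set_vcpu_affinity,
};

/* Callers such as KVM/VFIO then target an interrupt at a vCPU through the
 * generic entry point; vcpu_info is opaque to the irq core. */
static int demo_post_irq(unsigned int irq, void *vcpu_info)
{
        return irq_set_vcpu_affinity(irq, vcpu_info);
}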
  
Documentation/kernel-parameters.txt        |   1 +
Documentation/virtual/kvm/devices/vfio.txt |   9 ++
arch/x86/include/asm/entry_arch.h          |   2 +
arch/x86/include/asm/hardirq.h             |   1 +
arch/x86/include/asm/hw_irq.h              |   2 +
arch/x86/include/asm/irq_remapping.h       |  11 ++
arch/x86/include/asm/irq_vectors.h         |   1 +
arch/x86/include/asm/kvm_host.h            |  12 ++
arch/x86/kernel/apic/msi.c                 |   1 +
arch/x86/kernel/entry_64.S