Re: [GIT PULL 0/5] KVM: s390: Let user space control the cpu states

2014-07-21 Thread Christian Borntraeger
On 18/07/14 16:49, Paolo Bonzini wrote:
 Il 15/07/2014 15:27, Christian Borntraeger ha scritto:
 Paolo,

 The following changes since commit 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23:

   arch: x86: kvm: x86.c: Cleaning up variable is set more than once 
 (2014-06-30 16:52:04 +0200)

 are available in the git repository at:

   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
 tags/kvm-s390-20140715

 for you to fetch changes up to 6352e4d2dd9a349024a41356148eced553e1dce4:

   KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control 
 (2014-07-10 14:11:17 +0200)

 
 This series enables the KVM_(S|G)ET_MP_STATE ioctls on s390 to make
 the cpu state settable by user space.

 This is necessary to avoid races in s390 SIGP/reset handling which
 happen because some SIGPs are handled in QEMU while others are
 handled in the kernel. Together with the busy conditions returned by
 SIGP, races happen especially in areas like starting and stopping of
 CPUs. (For example, the program 'cpuplugd', which runs on several
 s390 distros, does automatic onlining and offlining of cpus.)

 As soon as the MPSTATE interface is used, user space takes complete
 control of the cpu states. Otherwise the kernel will use the old way.

 Therefore, the new kernel continues to work fine with old QEMUs.

 
 David Hildenbrand (5):
   KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time
   KVM: s390: move finalization of SIGP STOP orders to kvm_s390_vcpu_stop
   KVM: s390: remove __cpu_is_stopped and expose is_vcpu_stopped
   KVM: prepare for KVM_(S|G)ET_MP_STATE on other architectures
   KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control

  Documentation/virtual/kvm/api.txt | 31 ++-
  arch/s390/include/asm/kvm_host.h  |  1 +
  arch/s390/kvm/diag.c  |  3 ++-
  arch/s390/kvm/intercept.c | 32 ++--
  arch/s390/kvm/kvm-s390.c  | 52 +++
  arch/s390/kvm/kvm-s390.h  | 10 ++--
  arch/s390/kvm/sigp.c  |  7 +-
  include/uapi/linux/kvm.h  |  7 +-
  8 files changed, 98 insertions(+), 45 deletions(-)

 
 Alex, what do you think about this patch series?  Does it make sense to use
 KVM_GET/SET_MP_STATE, or would the one-reg interface be a better match?
 
 It's a bit weird that running and halted map to the same mp_state on
 s390.  I would be more confident that KVM_GET/SET_MP_STATE is the right
 choice if it had at least the KVM_MP_STATE_RUNNABLE and
 KVM_MP_STATE_HALTED states.  Christian, where is the halted state stored?
 Is it in the PSW?

Yes, there is a bit in the PSW called the wait bit. It is quite similar to
the HLT instruction: the CPU does not continue execution, but it will
accept all interrupts that are not fenced off via control registers or the PSW.
It is mostly used for cpu_idle. KVM on s390 always does the wait in the
kernel (IOW, we always have something like halt_in_kernel), except for the
disabled wait, which boils down to no execution and all interrupts off. This
is used for error states of the OS and for one special case (we set the guest
into the panic state).

So having such a state won't buy us much. It would even be wrong, because
we want our MP_STATE defines to be a 1:1 match of the states that are defined
in the architecture as proper CPU states. Some of the SIGP calls return the
state of the target CPU, and that depends on the CPU state as defined in the
architecture. The wait bit has no influence on the return value.


So instead of modelling things as on x86, we actually want to model the
mp_states as defined for the architecture. From what I can see of the x86
defines, it's somewhat similar there: they match the x86 architecture and not
the QEMU model.


ONEREG would work as well (you can make almost every interface work),
but mp_state looks like a better fit to me, because it is an interface for
defining CPU states that are not directly tied to runtime registers.
Furthermore, the bits in the PSW and registers are only considered by the HW
if the CPU is in the operating state. With ONEREG, we would have a register
that does not follow that rule.

Christian

PS: See SA22-7832 chapter 4-1
(http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr009.pdf, warning: it's big)
---snip---
The stopped, operating, load, and check-stop states
are four mutually exclusive states of the CPU. When
the CPU is in the stopped state, instructions and
interruptions, other than the restart interruption, are
not executed. In the operating state, the CPU
executes instructions and takes interruptions, subject to
the control of the program-status word (PSW) and
control registers, and in the manner specified by the
setting of the operator-facility rate control. The CPU
is in the load state during the initial-program-loading

Re: [PATCH v2 2/2] docs: update ivshmem device spec

2014-07-21 Thread Claudio Fontana
On 20.07.2014 11:38, David Marchand wrote:
 Add some notes on the parts needed to use ivshmem devices: more specifically,
 explain the purpose of an ivshmem server and the basic concept to use the
 ivshmem devices in guests.
 Move some parts of the documentation and re-organise it.
 
 Signed-off-by: David Marchand david.march...@6wind.com

Reviewed-by: Claudio Fontana claudio.font...@huawei.com

 ---
  docs/specs/ivshmem_device_spec.txt |  124 +++-
  1 file changed, 93 insertions(+), 31 deletions(-)
 
 diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
 index 667a862..f5f2b95 100644
 --- a/docs/specs/ivshmem_device_spec.txt
 +++ b/docs/specs/ivshmem_device_spec.txt
 @@ -2,30 +2,103 @@
  Device Specification for Inter-VM shared memory device
  --
  
 -The Inter-VM shared memory device is designed to share a region of memory to
 -userspace in multiple virtual guests.  The memory region does not belong to any
 -guest, but is a POSIX memory object on the host.  Optionally, the device may
 -support sending interrupts to other guests sharing the same memory region.
 +The Inter-VM shared memory device is designed to share a memory region (created
 +on the host via the POSIX shared memory API) between multiple QEMU processes
 +running different guests. In order for all guests to be able to pick up the
 +shared memory area, it is modeled by QEMU as a PCI device exposing said memory
 +to the guest as a PCI BAR.
 +The memory region does not belong to any guest, but is a POSIX memory object on
 +the host. The host can access this shared memory if needed.
 +
 +The device also provides an optional communication mechanism between guests
 +sharing the same memory object. More details about that in the 'Guest to
 +guest communication' section.
  
  
  The Inter-VM PCI device
  ---
  
 -*BARs*
 +From the VM point of view, the ivshmem PCI device supports three BARs.
 +
 +- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
 +  not used.
 +- BAR1 is used for MSI-X when it is enabled in the device.
 +- BAR2 is used to access the shared memory object.
 +
 +It is your choice how to use the device, but you must choose between two
 +behaviors:
 +
 +- basically, if you only need the shared memory part, you will map BAR2.
 +  This way, you have access to the shared memory in the guest and can use it
 +  as you see fit (memnic, for example, uses it in userland:
 +  http://dpdk.org/browse/memnic).
 +
 +- BAR0 and BAR1 are used to implement an optional communication mechanism
 +  through interrupts in the guests. If you need an event mechanism between the
 +  guests accessing the shared memory, you will most likely want to write a
 +  kernel driver that will handle interrupts. See details in the 'Guest to
 +  guest communication' section.
 +
 +The behavior is chosen when starting your QEMU processes:
 +- if no communication mechanism is needed, the first QEMU to start creates the
 +  shared memory on the host, and subsequent QEMU processes will use it.
 +
 +- if a communication mechanism is needed, an ivshmem server must be started
 +  before any QEMU process, then each QEMU process connects to the server's
 +  unix socket.
 +
 +For more details on the QEMU ivshmem parameters, see the qemu-doc
 +documentation.
 +
 +
 +Guest to guest communication
 +----------------------------
 +
 +This section details the communication mechanism between the guests accessing
 +the ivshmem shared memory.
  
 -The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
 -registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2 is
 -used to map the shared memory object from the host.  The size of BAR2 is
 -specified when the guest is started and must be a power of 2 in size.
 +*ivshmem server*
  
 -*Registers*
 +This server code is available in qemu.git/contrib/ivshmem-server.
  
 -The device currently supports 4 registers of 32-bits each.  Registers
 -are used for synchronization between guests sharing the same memory object when
 -interrupts are supported (this requires using the shared memory server).
 +The server must be started on the host before any guest.
 +It creates a shared memory object, then waits for clients to connect on a unix
 +socket.
  
 -The server assigns each VM an ID number and sends this ID number to the QEMU
 -process when the guest starts.
 +For each client (QEMU process) that connects to the server:
 +- the server assigns an ID for this client and sends this ID to it as the first
 +  message,
 +- the server sends an fd for the shared memory object to this client,
 +- the server creates a new set of host eventfds associated with the new client
 +  and sends this set to all already connected clients,
 +- finally, the server sends all the eventfd sets for all clients to the new
 +  client.
 +
 +The server signals all 

Re: [GIT PULL 0/5] KVM: s390: Let user space control the cpu states

2014-07-21 Thread Paolo Bonzini
Il 21/07/2014 09:47, Christian Borntraeger ha scritto:
 So having such a state wont buy us much. It would be even wrong, because
 we want our MP_STATE defines to be a 1:1 match of the states that are defined
 in the architecture as proper CPU states. Some of the SIGP calls will return 
 the
 state of the target CPU and that depends on the CPU state as defined in the
 architecture. The wait bit does not have an influence on the return value.

Thanks for the explanation.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-all: Use 'tmpcpu' instead of 'cpu' in sub-looping to avoid 'cpu' be NULL

2014-07-21 Thread Paolo Bonzini
Il 19/07/2014 03:21, Chen Gang ha scritto:
 If kvm_arch_remove_sw_breakpoint() fails for every CPU in CPU_FOREACH(),
 the loop leaves 'cpu' NULL. Then the next kvm_arch_remove_sw_breakpoint()
 call in QTAILQ_FOREACH_SAFE() gets a NULL 'cpu' parameter.
 
 kvm_arch_remove_sw_breakpoint() assumes 'cpu' is never NULL, so we need
 an additional temporary variable instead of reusing 'cpu' for the inner
 loop.
 
 
 Signed-off-by: Chen Gang gang.chen.5...@gmail.com
 ---
  kvm-all.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 3ae30ee..1402f4f 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -2077,12 +2077,13 @@ void kvm_remove_all_breakpoints(CPUState *cpu)
  {
  struct kvm_sw_breakpoint *bp, *next;
  KVMState *s = cpu->kvm_state;
 +CPUState *tmpcpu;
  
  QTAILQ_FOREACH_SAFE(bp, &s->kvm_sw_breakpoints, entry, next) {
  if (kvm_arch_remove_sw_breakpoint(cpu, bp) != 0) {
  /* Try harder to find a CPU that currently sees the breakpoint. */
 -CPU_FOREACH(cpu) {
 -if (kvm_arch_remove_sw_breakpoint(cpu, bp) == 0) {
 +CPU_FOREACH(tmpcpu) {
 +if (kvm_arch_remove_sw_breakpoint(tmpcpu, bp) == 0) {
  break;
  }
  }
 

Applying to uq/master, thanks.

Paolo


RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail

2014-07-21 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+mihai.caraman=freescale@lists.ozlabs.org] On Behalf Of
 mihai.cara...@freescale.com
 Sent: Friday, July 18, 2014 12:06 PM
 To: Alexander Graf; kvm-...@vger.kernel.org
 Cc: linuxppc-...@lists.ozlabs.org; kvm@vger.kernel.org
 Subject: RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 17, 2014 5:21 PM
  To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
  Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
  Subject: Re: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to
 fail
 
 
  On 17.07.14 13:22, Mihai Caraman wrote:
   On book3e, the guest's last instruction is read on the exit path using the
   dedicated load-external-pid (lwepx) instruction. This load operation may
   fail due to TLB eviction and execute-but-not-read entries.
  
    This patch lays down the path for an alternative solution to read the
    guest last instruction, by allowing the kvmppc_get_last_inst() function
    to fail. Architecture specific implementations of kvmppc_load_last_inst()
    may read the last guest instruction and instruct the emulation layer to
    re-execute the guest in case of failure.
  
   Make kvmppc_get_last_inst() definition common between architectures.
  
   Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
   ---
 
 ...
 
   diff --git a/arch/powerpc/include/asm/kvm_ppc.h
  b/arch/powerpc/include/asm/kvm_ppc.h
   index e2fd5a1..7f9c634 100644
   --- a/arch/powerpc/include/asm/kvm_ppc.h
   +++ b/arch/powerpc/include/asm/kvm_ppc.h
   @@ -47,6 +47,11 @@ enum emulation_result {
 EMULATE_EXIT_USER,/* emulation requires exit to user-
 space */
 };
  
   +enum instruction_type {
   + INST_GENERIC,
   + INST_SC,/* system call */
   +};
   +
 extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu
  *vcpu);
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
 kvm_vcpu
  *vcpu);
 extern void kvmppc_handler_highmem(void);
   @@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run,
  struct kvm_vcpu *vcpu,
u64 val, unsigned int bytes,
int is_default_endian);
  
   +extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
   +  enum instruction_type type, u32 *inst);
   +
 extern int kvmppc_emulate_instruction(struct kvm_run *run,
   struct kvm_vcpu *vcpu);
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu
  *vcpu);
   @@ -234,6 +242,23 @@ struct kvmppc_ops {
 extern struct kvmppc_ops *kvmppc_hv_ops;
 extern struct kvmppc_ops *kvmppc_pr_ops;
  
   +static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
   + enum instruction_type type, u32 *inst)
   +{
   + int ret = EMULATE_DONE;
   +
   + /* Load the instruction manually if it failed to do so in the
   +  * exit path */
    + if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
    + ret = kvmppc_load_last_inst(vcpu, type, &vcpu->arch.last_inst);
    +
    + *inst = (ret == EMULATE_DONE && kvmppc_need_byteswap(vcpu)) ?
    + swab32(vcpu->arch.last_inst) : vcpu->arch.last_inst;
 
   This makes even less sense than the previous version. Either you treat
   inst as definitely overwritten or as preserving previous data on
   failure.
 
 Both v4 and v5 versions treat inst as definitely overwritten.
 
 
  So either you unconditionally swap like you did before
 
  Leaving aside its symmetry, KVM_INST_FETCH_FAILED is operated on in host
  endianness, so it doesn't need a byte swap.
  
  I agree with your reasoning if last_inst is initialized and compared with
  data in guest endianness, which is not yet the case for
  KVM_INST_FETCH_FAILED.

Alex, are you relying on the fact that the KVM_INST_FETCH_FAILED value is
symmetrical?
With a non-symmetrical value like 0xDEADBEEF, and considering a little-endian
guest on a big-endian host, we would need to fix the kvm logic to initialize
and compare last_inst with the swapped value 0xEFBEADDE.

Your suggestion to unconditionally swap makes sense only with the above fix;
otherwise inst may end up with the swapped value 0xEFBEADDE, which is wrong.

-Mike

[GIT PULL] KVM changes for 3.16-rc7

2014-07-21 Thread Paolo Bonzini
Linus,

The following changes since commit cd3de83f147601356395b57a8673e9c5ff1e59d1:

  Linux 3.16-rc4 (2014-07-06 12:37:51 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to bb18b526a9d8d4a3fe56f234d5013b9f6036978d:

  Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into 
kvm-master (2014-07-08 12:08:58 +0200)



These are mostly PPC changes for 3.16 (new things).  However, there is
an x86 change too, and it is a regression from 3.14.  As it only affects
nested virtualization and there were other changes in this area in 3.16,
I am not nominating it for 3.15-stable.


Alexander Graf (3):
  PPC: Add _GLOBAL_TOC for 32bit
  KVM: PPC: Book3S PR: Fix ABIv2 on LE
  KVM: PPC: RTAS: Do byte swaps explicitly

Aneesh Kumar K.V (1):
  KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

Anton Blanchard (1):
  KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()

Bandan Das (1):
  KVM: x86: Check for nested events if there is an injectable interrupt

Mihai Caraman (1):
  KVM: PPC: Book3E: Unlock mmu_lock when setting caching atttribute

Paolo Bonzini (1):
  Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into 
kvm-master

 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +-
 arch/powerpc/include/asm/ppc_asm.h   |  2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  7 +---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  2 +-
 arch/powerpc/kvm/book3s_interrupts.S |  4 ++
 arch/powerpc/kvm/book3s_rmhandlers.S |  6 ++-
 arch/powerpc/kvm/book3s_rtas.c   | 65 +---
 arch/powerpc/kvm/e500_mmu_host.c |  3 +-
 arch/x86/kvm/x86.c   | 12 ++
 10 files changed, 64 insertions(+), 58 deletions(-)


KVM call for agenda for 2014-07-22

2014-07-21 Thread Juan Quintela
Hi

Please, send any topic that you are interested in covering.

Thanks, Juan.

Call details:

15:00 CEST
13:00 UTC
09:00 EDT

Every two weeks

If you need phone number details, contact me privately.



[PATCH 2/2] KVM: nVMX: clean up nested_release_vmcs12 and code around it

2014-07-21 Thread Paolo Bonzini
Make nested_release_vmcs12 idempotent.

Tested-by: Wanpeng Li wanpeng...@linux.intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 462334eaa3c0..3300f4f2da48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6109,20 +6109,27 @@ static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
     u32 exec_control;
+    if (vmx->nested.current_vmptr == -1ull)
+        return;
+
+    /* current_vmptr and current_vmcs12 are always set/reset together */
+    if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
+        return;
+
     if (enable_shadow_vmcs) {
-        if (vmx->nested.current_vmcs12 != NULL) {
-            /* copy to memory all shadowed fields in case
-               they were modified */
-            copy_shadow_to_vmcs12(vmx);
-            vmx->nested.sync_shadow_vmcs = false;
-            exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
-            exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
-            vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
-            vmcs_write64(VMCS_LINK_POINTER, -1ull);
-        }
+        /* copy to memory all shadowed fields in case
+           they were modified */
+        copy_shadow_to_vmcs12(vmx);
+        vmx->nested.sync_shadow_vmcs = false;
+        exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+        exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+        vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+        vmcs_write64(VMCS_LINK_POINTER, -1ull);
     }
     kunmap(vmx->nested.current_vmcs12_page);
     nested_release_page(vmx->nested.current_vmcs12_page);
+    vmx->nested.current_vmptr = -1ull;
+    vmx->nested.current_vmcs12 = NULL;
 }
 
 /*
@@ -6133,12 +6140,9 @@ static void free_nested(struct vcpu_vmx *vmx)
 {
     if (!vmx->nested.vmxon)
         return;
+
     vmx->nested.vmxon = false;
-    if (vmx->nested.current_vmptr != -1ull) {
-        nested_release_vmcs12(vmx);
-        vmx->nested.current_vmptr = -1ull;
-        vmx->nested.current_vmcs12 = NULL;
-    }
+    nested_release_vmcs12(vmx);
     if (enable_shadow_vmcs)
         free_vmcs(vmx->nested.current_shadow_vmcs);
     /* Unpin physical memory we referred to in current vmcs02 */
@@ -6175,11 +6179,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
     if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMCLEAR, &vmptr))
         return 1;
 
-    if (vmptr == vmx->nested.current_vmptr) {
+    if (vmptr == vmx->nested.current_vmptr)
         nested_release_vmcs12(vmx);
-        vmx->nested.current_vmptr = -1ull;
-        vmx->nested.current_vmcs12 = NULL;
-    }
 
     page = nested_get_page(vcpu, vmptr);
     if (page == NULL) {
@@ -6521,9 +6522,8 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
         skip_emulated_instruction(vcpu);
         return 1;
     }
-    if (vmx->nested.current_vmptr != -1ull)
-        nested_release_vmcs12(vmx);
 
+    nested_release_vmcs12(vmx);
     vmx->nested.current_vmptr = vmptr;
     vmx->nested.current_vmcs12 = new_vmcs12;
     vmx->nested.current_vmcs12_page = page;
-- 
1.8.3.1



[PATCH 0/2] KVM: nVMX: fix lifetime issues for vmcs02 and shadow VMCS

2014-07-21 Thread Paolo Bonzini
I think that commit 26a865f4aa8e (KVM: VMX: fix use after free of
vmx->loaded_vmcs, 2014-01-03) was wrong, as it introduced a use of
a dangling vmcs02.

The first patch introduces what I think is the right fix, while the
second patch strengthens the invariants around nested_release_vmcs12.

Paolo Bonzini (2):
  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: nVMX: clean up nested_release_vmcs12 and code around it

 arch/x86/kvm/vmx.c | 91 --
 1 file changed, 54 insertions(+), 37 deletions(-)

-- 
1.8.3.1



[PATCH 1/2] KVM: nVMX: fix lifetime issues for vmcs02

2014-07-21 Thread Paolo Bonzini
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs.  However, this is not
available anymore after commit 26a865f4aa8e (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).

Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.

Reported-by: Wanpeng Li wanpeng...@linux.intel.com
Tested-by: Wanpeng Li wanpeng...@linux.intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7534a9f67cc8..462334eaa3c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5772,22 +5772,27 @@ static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
 
 /*
  * Free all VMCSs saved for this vcpu, except the one pointed by
- * vmx->loaded_vmcs. These include the VMCSs in vmcs02_pool (except the one
- * currently used, if running L2), and vmcs01 when running L2.
+ * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
+ * must be vmx->vmcs01.
  */
 static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
 {
     struct vmcs02_list *item, *n;
+
+    WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
     list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
-        if (vmx->loaded_vmcs != &item->vmcs02)
-            free_loaded_vmcs(&item->vmcs02);
+        /*
+         * Something will leak if the above WARN triggers.  Better than
+         * a use-after-free.
+         */
+        if (vmx->loaded_vmcs == &item->vmcs02)
+            continue;
+
+        free_loaded_vmcs(&item->vmcs02);
         list_del(&item->list);
         kfree(item);
+        vmx->nested.vmcs02_num--;
     }
-    vmx->nested.vmcs02_num = 0;
-
-    if (vmx->loaded_vmcs != &vmx->vmcs01)
-        free_loaded_vmcs(&vmx->vmcs01);
 }
 
 /*
@@ -7557,13 +7562,31 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx_complete_interrupts(vmx);
 }
 
+static void vmx_load_vmcs01(struct kvm_vcpu *vcpu)
+{
+    struct vcpu_vmx *vmx = to_vmx(vcpu);
+    int cpu;
+
+    if (vmx->loaded_vmcs == &vmx->vmcs01)
+        return;
+
+    cpu = get_cpu();
+    vmx->loaded_vmcs = &vmx->vmcs01;
+    vmx_vcpu_put(vcpu);
+    vmx_vcpu_load(vcpu, cpu);
+    vcpu->cpu = cpu;
+    put_cpu();
+}
+
 static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);
 
     free_vpid(vmx);
-    free_loaded_vmcs(vmx->loaded_vmcs);
+    leave_guest_mode(vcpu);
+    vmx_load_vmcs01(vcpu);
     free_nested(vmx);
+    free_loaded_vmcs(vmx->loaded_vmcs);
     kfree(vmx->guest_msrs);
     kvm_vcpu_uninit(vcpu);
     kmem_cache_free(kvm_vcpu_cache, vmx);
@@ -8721,7 +8744,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
                   unsigned long exit_qualification)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);
-    int cpu;
     struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
     /* trying to cancel vmlaunch/vmresume is a bug */
@@ -8746,12 +8768,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
                   vmcs12->vm_exit_intr_error_code,
                   KVM_ISA_VMX);
 
-    cpu = get_cpu();
-    vmx->loaded_vmcs = &vmx->vmcs01;
-    vmx_vcpu_put(vcpu);
-    vmx_vcpu_load(vcpu, cpu);
-    vcpu->cpu = cpu;
-    put_cpu();
+    vmx_load_vmcs01(vcpu);
 
     vm_entry_controls_init(vmx, vmcs_read32(VM_ENTRY_CONTROLS));
     vm_exit_controls_init(vmx, vmcs_read32(VM_EXIT_CONTROLS));
-- 
1.8.3.1




Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-07-21 Thread Gleb Natapov
On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote:
 Reload remote vcpus' MMU from the GET_DIRTY_LOG codepath, before
 deleting a pinned spte.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 ---
  arch/x86/kvm/mmu.c |   29 +++--
  1 file changed, 23 insertions(+), 6 deletions(-)
 
 Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
 ===
 --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c  2014-07-09 11:23:59.290744490 
 -0300
 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c   2014-07-09 11:24:58.449632435 
 -0300
 @@ -1208,7 +1208,8 @@
   *
   * Return true if tlb need be flushed.
   */
 -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect,
 +bool skip_pinned)
  {
   u64 spte = *sptep;
  
 @@ -1218,6 +1219,22 @@
  
  rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
  
 + if (is_pinned_spte(spte)) {
 + /* keep pinned spte intact, mark page dirty again */
 + if (skip_pinned) {
 + struct kvm_mmu_page *sp;
 + gfn_t gfn;
 +
 + sp = page_header(__pa(sptep));
 + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
 +
 + mark_page_dirty(kvm, gfn);
 + return false;
Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while
populating dirty_bitmap_buffer?

 + } else
 + mmu_reload_pinned_vcpus(kvm);
Can you explain why you need this?

 + }
 +
 +
   if (pt_protect)
   spte &= ~SPTE_MMU_WRITEABLE;
   spte = spte & ~PT_WRITABLE_MASK;
 @@ -1226,7 +1243,7 @@
  }
  
  static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 -  bool pt_protect)
 +  bool pt_protect, bool skip_pinned)
  {
   u64 *sptep;
   struct rmap_iterator iter;
 @@ -1235,7 +1252,7 @@
   for (sptep = rmap_get_first(*rmapp, iter); sptep;) {
   BUG_ON(!(*sptep & PT_PRESENT_MASK));
  
 - flush |= spte_write_protect(kvm, sptep, pt_protect);
 + flush |= spte_write_protect(kvm, sptep, pt_protect, 
 skip_pinned);
   sptep = rmap_get_next(iter);
   }
  
 @@ -1261,7 +1278,7 @@
   while (mask) {
   rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 PT_PAGE_TABLE_LEVEL, slot);
 - __rmap_write_protect(kvm, rmapp, false);
 + __rmap_write_protect(kvm, rmapp, false, true);
  
   /* clear the first set bit */
   mask = mask - 1;
 @@ -1280,7 +1297,7 @@
   for (i = PT_PAGE_TABLE_LEVEL;
  i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
   rmapp = __gfn_to_rmap(gfn, i, slot);
 - write_protected |= __rmap_write_protect(kvm, rmapp, true);
 + write_protected |= __rmap_write_protect(kvm, rmapp, true, 
 false);
   }
  
   return write_protected;
 @@ -4565,7 +4582,7 @@
  
   for (index = 0; index <= last_index; ++index, ++rmapp) {
   if (*rmapp)
 - __rmap_write_protect(kvm, rmapp, false);
 + __rmap_write_protect(kvm, rmapp, false, false);
  
   if (need_resched() || spin_needbreak(&kvm->mmu_lock))
   cond_resched_lock(&kvm->mmu_lock);
 
 

--
Gleb.


RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-21 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, July 03, 2014 3:21 PM
 To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
 Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
 SPE/FP/AltiVec int numbers
 
 
 On 30.06.14 17:34, Mihai Caraman wrote:
  Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
  which share the same interrupt numbers.
 
  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
  ---
  v2:
- remove outdated definitions
 
arch/powerpc/include/asm/kvm_asm.h|  8 
arch/powerpc/kvm/booke.c  | 17 +
arch/powerpc/kvm/booke.h  |  4 ++--
arch/powerpc/kvm/booke_interrupts.S   |  9 +
arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
arch/powerpc/kvm/e500.c   | 10 ++
arch/powerpc/kvm/e500_emulate.c   | 10 ++
7 files changed, 30 insertions(+), 32 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_asm.h
 b/arch/powerpc/include/asm/kvm_asm.h
  index 9601741..c94fd33 100644
  --- a/arch/powerpc/include/asm/kvm_asm.h
  +++ b/arch/powerpc/include/asm/kvm_asm.h
  @@ -56,14 +56,6 @@
/* E500 */
#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
  -/*
  - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
 defines
  - */
  -#define BOOKE_INTERRUPT_SPE_UNAVAIL
 BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_SPE_FP_DATA
 BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
  -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
 BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
  -   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
 
 I think I'd prefer to keep them separate.
 
#define BOOKE_INTERRUPT_SPE_FP_ROUND 34
#define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
#define BOOKE_INTERRUPT_DOORBELL 36
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index ab62109..3c86d9b 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct
 kvm_vcpu *vcpu,
  case BOOKE_IRQPRIO_ITLB_MISS:
  case BOOKE_IRQPRIO_SYSCALL:
  case BOOKE_IRQPRIO_FP_UNAVAIL:
  -   case BOOKE_IRQPRIO_SPE_UNAVAIL:
  -   case BOOKE_IRQPRIO_SPE_FP_DATA:
  +   case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
  +   case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
 
 #ifdef CONFIG_KVM_E500V2
case ...SPE:
 #else
case ..ALTIVEC:
 #endif
 
  case BOOKE_IRQPRIO_SPE_FP_ROUND:
  case BOOKE_IRQPRIO_AP_UNAVAIL:
  allowed = 1;
  @@ -977,18 +977,19 @@ int kvmppc_handle_exit(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
  break;
 
#ifdef CONFIG_SPE
  -   case BOOKE_INTERRUPT_SPE_UNAVAIL: {
  +   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: {
   if (vcpu->arch.shared->msr & MSR_SPE)
  kvmppc_vcpu_enable_spe(vcpu);
  else
  kvmppc_booke_queue_irqprio(vcpu,
  -  BOOKE_IRQPRIO_SPE_UNAVAIL);
  +   BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL);
  r = RESUME_GUEST;
  break;
  }
 
  -   case BOOKE_INTERRUPT_SPE_FP_DATA:
  -   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
  +   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
  +   kvmppc_booke_queue_irqprio(vcpu,
  +   BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
  r = RESUME_GUEST;
  break;
 
  @@ -997,7 +998,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
 kvm_vcpu *vcpu,
  r = RESUME_GUEST;
  break;
#else
  -   case BOOKE_INTERRUPT_SPE_UNAVAIL:
  +   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL:
  /*
   * Guest wants SPE, but host kernel doesn't support it.  Send
   * an unimplemented operation program check to the guest.
  @@ -1010,7 +1011,7 @@ int kvmppc_handle_exit(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
   * These really should never happen without CONFIG_SPE,
   * as we should never enable the real MSR[SPE] in the guest.
   */
  -   case BOOKE_INTERRUPT_SPE_FP_DATA:
  +   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
  case BOOKE_INTERRUPT_SPE_FP_ROUND:
   printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
  __func__, exit_nr, vcpu->arch.pc);
  diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
  index b632cd3..f182b32 100644
  --- a/arch/powerpc/kvm/booke.h
  +++ b/arch/powerpc/kvm/booke.h
  @@ -32,8 +32,8 @@
#define BOOKE_IRQPRIO_ALIGNMENT 2
#define BOOKE_IRQPRIO_PROGRAM 3
#define BOOKE_IRQPRIO_FP_UNAVAIL 4
  -#define BOOKE_IRQPRIO_SPE_UNAVAIL 5
  -#define BOOKE_IRQPRIO_SPE_FP_DATA 6
  +#define 

[PATCH] vhost: Add polling mode

2014-07-21 Thread Razya Ladelsky
Hello All,

When vhost is waiting for buffers from the guest driver (e.g., more packets
to send in vhost-net's transmit queue), it normally goes to sleep and waits
for the guest to kick it. This kick involves a PIO in the guest, and
therefore an exit (and possibly userspace involvement in translating this
PIO exit into a file descriptor event), all of which hurts performance.

If the system is under-utilized (has cpu time to spare), vhost can
continuously poll the virtqueues for new buffers, and avoid asking the
guest to kick us. This patch adds an optional polling mode to vhost, that
can be enabled via a kernel module parameter, poll_start_rate.

When polling is active for a virtqueue, the guest is asked to disable
notification (kicks), and the worker thread continuously checks for new
buffers. When it does discover new buffers, it simulates a kick by invoking
the underlying backend driver (such as vhost-net), which thinks it got a
real kick from the guest, and acts accordingly. If the underlying driver
asks not to be kicked, we disable polling on this virtqueue.

We start polling on a virtqueue when we notice it has work to do. Polling
on this virtqueue is later disabled after 3 seconds of polling turning up
no new work, as in this case we are better off returning to the exit-based
notification mechanism. The default timeout of 3 seconds can be changed
with the poll_stop_idle kernel module parameter.

This polling approach makes a lot of sense for new HW with posted interrupts,
for which we have exitless host-to-guest notifications. But even with support
for posted interrupts, guest-to-host communication still causes exits.
Polling adds the missing part.

When systems are overloaded, there won't be enough cpu time for the various
vhost threads to poll their guests' devices. For these scenarios, we plan
to add support for vhost threads that can be shared by multiple devices,
even of multiple vms.
Our ultimate goal is to implement the I/O acceleration features described in:
KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
https://www.youtube.com/watch?v=9EyweibHfEs
and
https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html

 
Comments are welcome, 
Thank you,
Razya

From: Razya Ladelsky ra...@il.ibm.com

Add an optional polling mode to continuously poll the virtqueues
for new buffers, and avoid asking the guest to kick us.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com
---
 drivers/vhost/net.c   |6 +-
 drivers/vhost/scsi.c  |5 +-
 drivers/vhost/vhost.c |  247 
+++--
 drivers/vhost/vhost.h |   37 +++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct 
file *f)
}
vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
+		   vqs[VHOST_NET_VQ_TX]);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
+		   vqs[VHOST_NET_VQ_RX]);
 
f-private_data = n;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4f4ffa4..56f0233 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, 
struct file *f)
if (!vqs)
goto err_vqs;
 
-   vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
-   vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work);
-
+   vhost_work_init(&vs->vs_completion_work, NULL, vhost_scsi_complete_cmd_work);
+   vhost_work_init(&vs->vs_event_work, NULL, tcm_vhost_evt_work);
    vs->vs_events_nr = 0;
    vs->vs_events_missed = false;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c90f437..678d766 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -24,9 +24,17 @@
 #include <linux/slab.h>
 #include <linux/kthread.h>
 #include <linux/cgroup.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 
 #include "vhost.h"
+static int poll_start_rate = 0;
+module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
+
+static int poll_stop_idle = 3*HZ; /* 3 seconds */
+module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
 
 enum {
VHOST_MEMORY_MAX_NREGIONS = 64,
@@ -58,27 +66,27 @@ static int vhost_poll_wakeup(wait_queue_t *wait, 
unsigned mode, int sync,
return 0;
 }
 

Re: [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread Eric Blake
On 07/20/2014 03:38 AM, David Marchand wrote:
 When using ivshmem devices, notifications between guests can be sent as
 interrupts using a ivshmem-server (typical use described in documentation).
 The client is provided as a debug tool.
 
 Signed-off-by: Olivier Matz olivier.m...@6wind.com
 Signed-off-by: David Marchand david.march...@6wind.com
 ---
  contrib/ivshmem-client/Makefile |   26 ++

 +++ b/contrib/ivshmem-client/Makefile
 @@ -0,0 +1,26 @@
 +# Copyright 2014 6WIND S.A.
 +# All rights reserved

This file has no other license, and is therefore incompatible with
GPLv2.  You'll need to resubmit under an appropriately open license.

 +++ b/contrib/ivshmem-client/ivshmem-client.h
 @@ -0,0 +1,238 @@
 +/*
 + * Copyright(c) 2014 6WIND S.A.
 + * All rights reserved.
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.

I'm not a lawyer, but to me, this license is self-contradictory.  You
can't have "All rights reserved" and still be GPL, because the point of
the GPL is that you are NOT reserving all rights, but explicitly
granting your user various rights (on condition that they likewise grant
those rights to others).  But you're not the only file in the qemu code
base with this questionable mix.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread Daniel P. Berrange
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote:
 On 07/20/2014 03:38 AM, David Marchand wrote:
  When using ivshmem devices, notifications between guests can be sent as
  interrupts using a ivshmem-server (typical use described in documentation).
  The client is provided as a debug tool.
  
  Signed-off-by: Olivier Matz olivier.m...@6wind.com
  Signed-off-by: David Marchand david.march...@6wind.com
  ---
   contrib/ivshmem-client/Makefile |   26 ++
 
  +++ b/contrib/ivshmem-client/Makefile
  @@ -0,0 +1,26 @@
  +# Copyright 2014 6WIND S.A.
  +# All rights reserved
 
 This file has no other license, and is therefore incompatible with
 GPLv2.  You'll need to resubmit under an appropriately open license.
 
  +++ b/contrib/ivshmem-client/ivshmem-client.h
  @@ -0,0 +1,238 @@
  +/*
  + * Copyright(c) 2014 6WIND S.A.
  + * All rights reserved.
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2.  See
  + * the COPYING file in the top-level directory.
 
 I'm not a lawyer, but to me, this license is self-contradictory.  You
 can't have All rights reserved and still be GPL, because the point of
 the GPL is that you are NOT reserving all rights, but explicitly
 granting your user various rights (on condition that they likewise grant
 those rights to others).  But you're not the only file in the qemu code
 base with this questionable mix.

In any case, adding the term 'All rights reserved' is said to be redundant
and obsolete these days:

  https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests] x86: Check DR6.RTM is writable

2014-07-21 Thread Paolo Bonzini
On 15/07/2014 16:41, Nadav Amit wrote:
 A recently discovered bug shows DR6.RTM is fixed to one. The bug is only
 apparent when the host emulates the MOV-DR instruction or when the host
 debugs the guest kernel. This patch tests whether DR6.RTM is indeed
 accessible according to RTM support as reported by cpuid.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  x86/emulator.c | 15 +++
  1 file changed, 15 insertions(+)
 
 diff --git a/x86/emulator.c b/x86/emulator.c
 index 1fd0ca6..f68882f 100644
 --- a/x86/emulator.c
 +++ b/x86/emulator.c
 @@ -3,6 +3,7 @@
  #include "libcflat.h"
  #include "desc.h"
  #include "types.h"
 +#include "processor.h"
  
  #define memset __builtin_memset
  #define TESTDEV_IO_PORT 0xe0
 @@ -870,6 +871,19 @@ static void test_nop(uint64_t *mem, uint8_t *insn_page,
   report("nop", outregs.rax == inregs.rax);
  }
  
 +static void test_mov_dr(uint64_t *mem, uint8_t *insn_page,
 +			uint8_t *alt_insn_page, void *insn_ram)
 +{
 +	bool rtm_support = cpuid(7).b & (1 << 11);
 +	unsigned long dr6_fixed_1 = rtm_support ? 0xfffe0ff0ul : 0xffff0ff0ul;
 +	inregs = (struct regs){ .rax = 0 };
 +	MK_INSN(mov_to_dr6, "movq %rax, %dr6\n\t");
 +	trap_emulator(mem, alt_insn_page, &insn_mov_to_dr6);
 +	MK_INSN(mov_from_dr6, "movq %dr6, %rax\n\t");
 +	trap_emulator(mem, alt_insn_page, &insn_mov_from_dr6);
 +	report("mov_dr6", outregs.rax == dr6_fixed_1);
 +}
 +
 +
  static void test_crosspage_mmio(volatile uint8_t *mem)
  {
  volatile uint16_t w, *pw;
 @@ -1072,6 +1086,7 @@ int main()
   test_movabs(mem, insn_page, alt_insn_page, insn_ram);
   test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram);
   test_nop(mem, insn_page, alt_insn_page, insn_ram);
 + test_mov_dr(mem, insn_page, alt_insn_page, insn_ram);
   test_crosspage_mmio(mem);
  
   test_string_io_mmio(mem);
 

Thanks, applying.

Paolo


Re: [PATCH] KVM: x86: DR6/7.RTM cannot be written

2014-07-21 Thread Paolo Bonzini
On 15/07/2014 16:37, Nadav Amit wrote:
 Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM
 is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in
 the current KVM implementation. This bug is apparent only if the MOV-DR
 instruction is emulated or the host also debugs the guest.
 
 This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared
 and set respectively. It also sets DR6.RTM upon every debug exception.
 Obviously, it is not a complete fix, as debugging of RTM is still
 unsupported.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/include/asm/kvm_host.h |  8 +---
  arch/x86/kvm/cpuid.h|  8 
  arch/x86/kvm/vmx.c  |  4 ++--
  arch/x86/kvm/x86.c  | 22 --
  4 files changed, 31 insertions(+), 11 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index b8a4480..a84eaf7 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -152,14 +152,16 @@ enum {
  
  #define DR6_BD	(1 << 13)
  #define DR6_BS	(1 << 14)
 -#define DR6_FIXED_1	0xffff0ff0
 -#define DR6_VOLATILE	0x0000e00f
 +#define DR6_RTM	(1 << 16)
 +#define DR6_FIXED_1	0xfffe0ff0
 +#define DR6_INIT	0xffff0ff0
 +#define DR6_VOLATILE	0x0001e00f
  
  #define DR7_BP_EN_MASK	0x000000ff
  #define DR7_GE	(1 << 9)
  #define DR7_GD	(1 << 13)
  #define DR7_FIXED_1	0x00000400
 -#define DR7_VOLATILE	0xffff23ff
 +#define DR7_VOLATILE	0xffff2bff
  
  /* apic attention bits */
  #define KVM_APIC_CHECK_VAPIC 0
 diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
 index f908731..a538059 100644
 --- a/arch/x86/kvm/cpuid.h
 +++ b/arch/x86/kvm/cpuid.h
 @@ -95,4 +95,12 @@ static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu *vcpu)
  	best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
  	return best && (best->edx & bit(X86_FEATURE_GBPAGES));
  }
 +
 +static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu)
 +{
 +	struct kvm_cpuid_entry2 *best;
 +
 +	best = kvm_find_cpuid_entry(vcpu, 7, 0);
 +	return best && (best->ebx & bit(X86_FEATURE_RTM));
 +}
  #endif
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 0c9569b..1fd3598 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -4892,7 +4892,7 @@ static int handle_exception(struct kvm_vcpu *vcpu)
  	if (!(vcpu->guest_debug &
  	      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
  		vcpu->arch.dr6 &= ~15;
 -		vcpu->arch.dr6 |= dr6;
 +		vcpu->arch.dr6 |= dr6 | DR6_RTM;
  		if (!(dr6 & ~DR6_RESERVED)) /* icebp */
  			skip_emulated_instruction(vcpu);
  
 @@ -5151,7 +5151,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
  		return 0;
  	} else {
  		vcpu->arch.dr7 &= ~DR7_GD;
 -		vcpu->arch.dr6 |= DR6_BD;
 +		vcpu->arch.dr6 |= DR6_BD | DR6_RTM;
  		vmcs_writel(GUEST_DR7, vcpu->arch.dr7);
  		kvm_queue_exception(vcpu, DB_VECTOR);
  		return 1;
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index f750b69..fae064f 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -759,6 +759,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu)
  	vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
  }
  
 +static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
 +{
 + u64 fixed = DR6_FIXED_1;
 +
 + if (!guest_cpuid_has_rtm(vcpu))
 + fixed |= DR6_RTM;
 + return fixed;
 +}
 +
  static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
  {
   switch (dr) {
 @@ -774,7 +783,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
  	case 6:
  		if (val & 0xffffffff00000000ULL)
  			return -1; /* #GP */
 -		vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
 +		vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
  		kvm_update_dr6(vcpu);
  		break;
  	case 5:
 @@ -5115,7 +5124,8 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags)
  	 */
  	if (unlikely(rflags & X86_EFLAGS_TF)) {
  		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 -			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
 +			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 |
 +						  DR6_RTM;
  			kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip;
  			kvm_run->debug.arch.exception = DB_VECTOR;
  			kvm_run->exit_reason = KVM_EXIT_DEBUG;
 @@ -5128,7 +5138,7 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu 
 *vcpu, 

Re: [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread David Marchand

Hello Eric,

On 07/21/2014 04:21 PM, Eric Blake wrote:

On 07/20/2014 03:38 AM, David Marchand wrote:

+# Copyright 2014 6WIND S.A.
+# All rights reserved


This file has no other license, and is therefore incompatible with
GPLv2.  You'll need to resubmit under an appropriately open license.


missed the makefiles ...




+++ b/contrib/ivshmem-client/ivshmem-client.h
@@ -0,0 +1,238 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.


I'm not a lawyer, but to me, this license is self-contradictory.  You
can't have All rights reserved and still be GPL, because the point of
the GPL is that you are NOT reserving all rights, but explicitly
granting your user various rights (on condition that they likewise grant
those rights to others).  But you're not the only file in the qemu code
base with this questionable mix.



Hum, ok, will update.


--
David Marchand


Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread Markus Armbruster
David Marchand david.march...@6wind.com writes:

 When using ivshmem devices, notifications between guests can be sent as
 interrupts using a ivshmem-server (typical use described in documentation).
 The client is provided as a debug tool.
[...]
 diff --git a/contrib/ivshmem-client/ivshmem-client.c
 b/contrib/ivshmem-client/ivshmem-client.c
 new file mode 100644
 index 000..32ef3ef
 --- /dev/null
 +++ b/contrib/ivshmem-client/ivshmem-client.c
 @@ -0,0 +1,418 @@
 +/*
 + * Copyright(c) 2014 6WIND S.A.
 + * All rights reserved.
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + */

Do you have a compelling reason why you can't license under GPLv2+?  If
yes, please explain it to us.  If no, please use

 * This work is licensed under the terms of the GNU GPL, version 2 or
 * later.  See the COPYING file in the top-level directory.

[...]


[GIT PULL 9/9] KVM: s390: add ipte to trace event decoding

2014-07-21 Thread Christian Borntraeger
IPTE intercept can happen, let's decode that.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/uapi/asm/sie.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/include/uapi/asm/sie.h b/arch/s390/include/uapi/asm/sie.h
index 5d9cc19..d4096fd 100644
--- a/arch/s390/include/uapi/asm/sie.h
+++ b/arch/s390/include/uapi/asm/sie.h
@@ -108,6 +108,7 @@
	exit_code_ipa0(0xB2, 0x17, "STETR"),	\
	exit_code_ipa0(0xB2, 0x18, "PC"),	\
	exit_code_ipa0(0xB2, 0x20, "SERVC"),	\
+	exit_code_ipa0(0xB2, 0x21, "IPTE"),	\
	exit_code_ipa0(0xB2, 0x28, "PT"),	\
	exit_code_ipa0(0xB2, 0x29, "ISKE"),	\
	exit_code_ipa0(0xB2, 0x2a, "RRBE"),	\
-- 
1.8.4.2



[GIT PULL 7/9] KVM: s390: document KVM_CAP_S390_IRQCHIP

2014-07-21 Thread Christian Borntraeger
From: Cornelia Huck cornelia.h...@de.ibm.com

Let's document that this is a capability that may be enabled per-vm.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7ab41e9..f1979c7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3023,3 +3023,12 @@ Parameters: args[0] is the XICS device fd
 args[1] is the XICS CPU number (server ID) for this vcpu
 
 This capability connects the vcpu to an in-kernel XICS device.
+
+6.8 KVM_CAP_S390_IRQCHIP
+
+Architectures: s390
+Target: vm
+Parameters: none
+
+This capability enables the in-kernel irqchip for s390. Please refer to
+4.24 KVM_CREATE_IRQCHIP for details.
-- 
1.8.4.2



[GIT PULL 6/9] KVM: document target of capability enablement

2014-07-21 Thread Christian Borntraeger
From: Cornelia Huck cornelia.h...@de.ibm.com

Capabilities can be enabled on a vcpu or (since recently) on a vm. Document
this and note for the existing capabilities whether they are per-vcpu or
per-vm.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index a41465b..7ab41e9 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2875,15 +2875,18 @@ The fields in each entry are defined as follows:
 6. Capabilities that can be enabled
 ---
 
-There are certain capabilities that change the behavior of the virtual CPU when
-enabled. To enable them, please see section 4.37. Below you can find a list of
-capabilities and what their effect on the vCPU is when enabling them.
+There are certain capabilities that change the behavior of the virtual CPU or
+the virtual machine when enabled. To enable them, please see section 4.37.
+Below you can find a list of capabilities and what their effect on the vCPU or
+the virtual machine is when enabling them.
 
 The following information is provided along with the description:
 
   Architectures: which instruction set architectures provide this ioctl.
   x86 includes both i386 and x86_64.
 
+  Target: whether this is a per-vcpu or per-vm capability.
+
   Parameters: what parameters are accepted by the capability.
 
   Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
@@ -2893,6 +2896,7 @@ The following information is provided along with the 
description:
 6.1 KVM_CAP_PPC_OSI
 
 Architectures: ppc
+Target: vcpu
 Parameters: none
 Returns: 0 on success; -1 on error
 
@@ -2907,6 +2911,7 @@ When this capability is enabled, KVM_EXIT_OSI can occur.
 6.2 KVM_CAP_PPC_PAPR
 
 Architectures: ppc
+Target: vcpu
 Parameters: none
 Returns: 0 on success; -1 on error
 
@@ -2926,6 +2931,7 @@ When this capability is enabled, KVM_EXIT_PAPR_HCALL can 
occur.
 6.3 KVM_CAP_SW_TLB
 
 Architectures: ppc
+Target: vcpu
 Parameters: args[0] is the address of a struct kvm_config_tlb
 Returns: 0 on success; -1 on error
 
@@ -2968,6 +2974,7 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and 
KVM_MMU_FSL_BOOKE_HV:
 6.4 KVM_CAP_S390_CSS_SUPPORT
 
 Architectures: s390
+Target: vcpu
 Parameters: none
 Returns: 0 on success; -1 on error
 
@@ -2979,9 +2986,13 @@ handled in-kernel, while the other I/O instructions are 
passed to userspace.
 When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
 SUBCHANNEL intercepts.
 
+Note that even though this capability is enabled per-vcpu, the complete
+virtual machine is affected.
+
 6.5 KVM_CAP_PPC_EPR
 
 Architectures: ppc
+Target: vcpu
 Parameters: args[0] defines whether the proxy facility is active
 Returns: 0 on success; -1 on error
 
@@ -3007,6 +3018,7 @@ This capability connects the vcpu to an in-kernel MPIC 
device.
 6.7 KVM_CAP_IRQ_XICS
 
 Architectures: ppc
+Target: vcpu
 Parameters: args[0] is the XICS device fd
 args[1] is the XICS CPU number (server ID) for this vcpu
 
-- 
1.8.4.2



[GIT PULL 1/9] KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block

2014-07-21 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

This patch cleans up the code in handle_wait by reusing the common code
function kvm_vcpu_block.

signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are
sufficient for checking whether we need to wake up that VCPU. kvm_vcpu_block
uses these functions, so no checks are lost.

The timer_due flag can be removed: kvm_cpu_has_pending_timer() tests whether
the timer is pending, thus the vcpu is correctly woken up.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/include/asm/kvm_host.h |  1 -
 arch/s390/kvm/interrupt.c| 41 +---
 arch/s390/kvm/kvm-s390.c |  3 +++
 3 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index c2ba020..b3acf28 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -305,7 +305,6 @@ struct kvm_s390_local_interrupt {
struct list_head list;
atomic_t active;
struct kvm_s390_float_interrupt *float_int;
-   int timer_due; /* event indicator for waitqueue below */
wait_queue_head_t *wq;
atomic_t *cpuflags;
unsigned int action_bits;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 90c8de2..5fd11ce 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -585,60 +585,32 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 {
u64 now, sltime;
-   DECLARE_WAITQUEUE(wait, current);
 
	vcpu->stat.exit_wait_state++;
-   if (kvm_cpu_has_interrupt(vcpu))
-   return 0;
 
-   __set_cpu_idle(vcpu);
-   spin_lock_bh(&vcpu->arch.local_int.lock);
-   vcpu->arch.local_int.timer_due = 0;
-   spin_unlock_bh(&vcpu->arch.local_int.lock);
+   /* fast path */
+   if (kvm_cpu_has_pending_timer(vcpu) || kvm_arch_vcpu_runnable(vcpu))
+   return 0;
 
if (psw_interrupts_disabled(vcpu)) {
		VCPU_EVENT(vcpu, 3, "%s", "disabled wait");
-   __unset_cpu_idle(vcpu);
return -EOPNOTSUPP; /* disabled wait */
}
 
+   __set_cpu_idle(vcpu);
if (!ckc_interrupts_enabled(vcpu)) {
		VCPU_EVENT(vcpu, 3, "%s", "enabled wait w/o timer");
goto no_timer;
}
 
	now = get_tod_clock_fast() + vcpu->arch.sie_block->epoch;
-   if (vcpu->arch.sie_block->ckc < now) {
-   	__unset_cpu_idle(vcpu);
-   	return 0;
-   }
-
	sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now);
-
	hrtimer_start(&vcpu->arch.ckc_timer, ktime_set(0, sltime),
		      HRTIMER_MODE_REL);
	VCPU_EVENT(vcpu, 5, "enabled wait via clock comparator: %llx ns",
		   sltime);
 no_timer:
	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-   spin_lock(&vcpu->arch.local_int.float_int->lock);
-   spin_lock_bh(&vcpu->arch.local_int.lock);
-   add_wait_queue(&vcpu->wq, &wait);
-   while (list_empty(&vcpu->arch.local_int.list) &&
-   	list_empty(&vcpu->arch.local_int.float_int->list) &&
-   	(!vcpu->arch.local_int.timer_due) &&
-   	!signal_pending(current) &&
-   	!kvm_s390_si_ext_call_pending(vcpu)) {
-   	set_current_state(TASK_INTERRUPTIBLE);
-   	spin_unlock_bh(&vcpu->arch.local_int.lock);
-   	spin_unlock(&vcpu->arch.local_int.float_int->lock);
-   	schedule();
-   	spin_lock(&vcpu->arch.local_int.float_int->lock);
-   	spin_lock_bh(&vcpu->arch.local_int.lock);
-   }
+   kvm_vcpu_block(vcpu);
	__unset_cpu_idle(vcpu);
-   __set_current_state(TASK_RUNNING);
-   remove_wait_queue(&vcpu->wq, &wait);
-   spin_unlock_bh(&vcpu->arch.local_int.lock);
-   spin_unlock(&vcpu->arch.local_int.float_int->lock);
	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
@@ -649,11 +621,8 @@ void kvm_s390_tasklet(unsigned long parm)
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
 
-   spin_lock(&vcpu->arch.local_int.lock);
-   vcpu->arch.local_int.timer_due = 1;
	if (waitqueue_active(&vcpu->wq))
		wake_up_interruptible(&vcpu->wq);
-   spin_unlock(&vcpu->arch.local_int.lock);
 }
 
 /*
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fdf88f7..ecb1357 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1068,6 +1068,9 @@ retry:
goto retry;
}
 
+   /* nothing to do, just clear the request */
+   clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+
return 0;
 }
 
-- 
1.8.4.2


[GIT PULL 5/9] KVM: s390: remove the tasklet used by the hrtimer

2014-07-21 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

We can get rid of the tasklet used for waking up a VCPU in the hrtimer
code and wake up the VCPU directly.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/include/asm/kvm_host.h |  1 -
 arch/s390/kvm/interrupt.c| 13 +
 arch/s390/kvm/kvm-s390.c |  2 --
 arch/s390/kvm/kvm-s390.h |  1 -
 4 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b3acf28..773bef7 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -366,7 +366,6 @@ struct kvm_vcpu_arch {
s390_fp_regs  guest_fpregs;
struct kvm_s390_local_interrupt local_int;
	struct hrtimer    ckc_timer;
-   struct tasklet_struct tasklet;
struct kvm_s390_pgm_info pgm;
union  {
struct cpuidcpu_id;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 65396e1..1be3d8d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -629,23 +629,12 @@ void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
}
 }
 
-void kvm_s390_tasklet(unsigned long parm)
-{
-   struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
-   kvm_s390_vcpu_wakeup(vcpu);
-}
-
-/*
- * low level hrtimer wake routine. Because this runs in hardirq context
- * we schedule a tasklet to do the real work.
- */
 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer)
 {
struct kvm_vcpu *vcpu;
 
vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer);
-   vcpu->preempted = true;
-   tasklet_schedule(&vcpu->arch.tasklet);
+   kvm_s390_vcpu_wakeup(vcpu);
 
return HRTIMER_NORESTART;
 }
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b29a031..dd902e6 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -649,8 +649,6 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
return rc;
}
	hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-   tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet,
-   	     (unsigned long) vcpu);
	vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup;
	get_cpu_id(&vcpu->arch.cpu_id);
	vcpu->arch.cpu_id.version = 0xff;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 665eacc..3862fa2 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -138,7 +138,6 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm 
*kvm)
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
 void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
-void kvm_s390_tasklet(unsigned long parm);
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
 void kvm_s390_deliver_pending_machine_checks(struct kvm_vcpu *vcpu);
 void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu);
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 2/9] KVM: s390: remove _bh locking from local_int.lock

2014-07-21 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

local_int.lock is not used in a bottom-half handler anymore, therefore we can
turn it into an ordinary spin_lock at all occurrences.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/interrupt.c | 32 
 arch/s390/kvm/kvm-s390.c  |  4 ++--
 arch/s390/kvm/sigp.c  | 20 ++--
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 5fd11ce..86575b4 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -544,13 +544,13 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu)
int rc = 0;
 
if (atomic_read(li-active)) {
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_for_each_entry(inti, li-list, list)
if (__interrupt_is_deliverable(vcpu, inti)) {
rc = 1;
break;
}
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
}
 
if ((!rc)  atomic_read(fi-active)) {
@@ -645,13 +645,13 @@ void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu)
struct kvm_s390_local_interrupt *li = vcpu-arch.local_int;
struct kvm_s390_interrupt_info  *n, *inti = NULL;
 
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_for_each_entry_safe(inti, n, li-list, list) {
list_del(inti-list);
kfree(inti);
}
atomic_set(li-active, 0);
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
 
/* clear pending external calls set by sigp interpretation facility */
atomic_clear_mask(CPUSTAT_ECALL_PEND, vcpu-arch.sie_block-cpuflags);
@@ -670,7 +670,7 @@ void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu 
*vcpu)
if (atomic_read(li-active)) {
do {
deliver = 0;
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_for_each_entry_safe(inti, n, li-list, list) {
if (__interrupt_is_deliverable(vcpu, inti)) {
list_del(inti-list);
@@ -681,7 +681,7 @@ void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu 
*vcpu)
}
if (list_empty(li-list))
atomic_set(li-active, 0);
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
if (deliver) {
__do_deliver_interrupt(vcpu, inti);
kfree(inti);
@@ -727,7 +727,7 @@ void kvm_s390_deliver_pending_machine_checks(struct 
kvm_vcpu *vcpu)
if (atomic_read(li-active)) {
do {
deliver = 0;
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_for_each_entry_safe(inti, n, li-list, list) {
if ((inti-type == KVM_S390_MCHK) 
__interrupt_is_deliverable(vcpu, inti)) {
@@ -739,7 +739,7 @@ void kvm_s390_deliver_pending_machine_checks(struct 
kvm_vcpu *vcpu)
}
if (list_empty(li-list))
atomic_set(li-active, 0);
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
if (deliver) {
__do_deliver_interrupt(vcpu, inti);
kfree(inti);
@@ -786,11 +786,11 @@ int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, 
u16 code)
 
VCPU_EVENT(vcpu, 3, inject: program check %d (from kernel), code);
trace_kvm_s390_inject_vcpu(vcpu-vcpu_id, inti-type, code, 0, 1);
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_add(inti-list, li-list);
atomic_set(li-active, 1);
BUG_ON(waitqueue_active(li-wq));
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
return 0;
 }
 
@@ -811,11 +811,11 @@ int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu,
 
inti-type = KVM_S390_PROGRAM_INT;
memcpy(inti-pgm, pgm_info, sizeof(inti-pgm));
-   spin_lock_bh(li-lock);
+   spin_lock(li-lock);
list_add(inti-list, li-list);
atomic_set(li-active, 1);
BUG_ON(waitqueue_active(li-wq));
-   spin_unlock_bh(li-lock);
+   spin_unlock(li-lock);
return 0;
 }
 
@@ -903,12 +903,12 @@ static int __inject_vm(struct kvm *kvm, struct 
kvm_s390_interrupt_info *inti)
}
dst_vcpu = kvm_get_vcpu(kvm, 

[GIT PULL 8/9] KVM: s390: advertise KVM_CAP_S390_IRQCHIP

2014-07-21 Thread Christian Borntraeger
From: Cornelia Huck cornelia.h...@de.ibm.com

We should advertise all capabilities, including those that can
be enabled.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dd902e6..339b34a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -166,6 +166,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_ENABLE_CAP_VM:
+   case KVM_CAP_S390_IRQCHIP:
case KVM_CAP_VM_ATTRIBUTES:
case KVM_CAP_MP_STATE:
r = 1;
-- 
1.8.4.2



[GIT PULL 3/9] KVM: s390: remove _bh locking from start_stop_lock

2014-07-21 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

The start_stop_lock is no longer acquired when in atomic context, therefore we
can convert it into an ordinary spin_lock.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a7bda18..b29a031 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1478,7 +1478,7 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 
trace_kvm_s390_vcpu_start_stop(vcpu-vcpu_id, 1);
/* Only one cpu at a time may enter/leave the STOPPED state. */
-   spin_lock_bh(vcpu-kvm-arch.start_stop_lock);
+   spin_lock(vcpu-kvm-arch.start_stop_lock);
online_vcpus = atomic_read(vcpu-kvm-online_vcpus);
 
for (i = 0; i  online_vcpus; i++) {
@@ -1504,7 +1504,7 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 * Let's play safe and flush the VCPU at startup.
 */
vcpu-arch.sie_block-ihcpu  = 0x;
-   spin_unlock_bh(vcpu-kvm-arch.start_stop_lock);
+   spin_unlock(vcpu-kvm-arch.start_stop_lock);
return;
 }
 
@@ -1518,7 +1518,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 
trace_kvm_s390_vcpu_start_stop(vcpu-vcpu_id, 0);
/* Only one cpu at a time may enter/leave the STOPPED state. */
-   spin_lock_bh(vcpu-kvm-arch.start_stop_lock);
+   spin_lock(vcpu-kvm-arch.start_stop_lock);
online_vcpus = atomic_read(vcpu-kvm-online_vcpus);
 
/* Need to lock access to action_bits to avoid a SIGP race condition */
@@ -1547,7 +1547,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
__enable_ibs_on_vcpu(started_vcpu);
}
 
-   spin_unlock_bh(vcpu-kvm-arch.start_stop_lock);
+   spin_unlock(vcpu-kvm-arch.start_stop_lock);
return;
 }
 
-- 
1.8.4.2



[GIT PULL 0/9] KVM: s390: Fixes and cleanups for 3.17

2014-07-21 Thread Christian Borntraeger
Paolo,

this should be the last bunch of s390 patches for 3.17 (on top of the
mp_state changes). Please consider to apply.

The following changes since commit 6352e4d2dd9a349024a41356148eced553e1dce4:

  KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control 
(2014-07-10 14:11:17 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
tags/kvm-s390-20140721

for you to fetch changes up to e59d120f96687a606db0513c427f10e30a427cc4:

  KVM: s390: add ipte to trace event decoding (2014-07-21 13:22:47 +0200)


Bugfixes

- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP

Cleanups

- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakeup processing


Christian Borntraeger (1):
  KVM: s390: add ipte to trace event decoding

Cornelia Huck (3):
  KVM: document target of capability enablement
  KVM: s390: document KVM_CAP_S390_IRQCHIP
  KVM: s390: advertise KVM_CAP_S390_IRQCHIP

David Hildenbrand (5):
  KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block
  KVM: s390: remove _bh locking from local_int.lock
  KVM: s390: remove _bh locking from start_stop_lock
  KVM: s390: move vcpu wakeup code to a central point
  KVM: s390: remove the tasklet used by the hrtimer

 Documentation/virtual/kvm/api.txt |  27 --
 arch/s390/include/asm/kvm_host.h  |   2 -
 arch/s390/include/uapi/asm/sie.h  |   1 +
 arch/s390/kvm/interrupt.c | 100 --
 arch/s390/kvm/kvm-s390.c  |  18 ---
 arch/s390/kvm/kvm-s390.h  |   2 +-
 arch/s390/kvm/sigp.c  |  36 ++
 7 files changed, 82 insertions(+), 104 deletions(-)



[GIT PULL 4/9] KVM: s390: move vcpu wakeup code to a central point

2014-07-21 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

Let's move the vcpu wakeup code to a central point.

We should set the vcpu->preempted flag only if the target is actually sleeping
and before the real wakeup happens. Otherwise the preempted flag might be set
when not necessary, which may result in immediate reschedules after schedule()
in some scenarios.

The wakeup code doesn't require the local_int.lock to be held.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/interrupt.c | 24 +++-
 arch/s390/kvm/kvm-s390.h  |  1 +
 arch/s390/kvm/sigp.c  | 20 ++--
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 86575b4..65396e1 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -617,12 +617,22 @@ no_timer:
return 0;
 }
 
+void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
+{
+   if (waitqueue_active(vcpu-wq)) {
+   /*
+* The vcpu gave up the cpu voluntarily, mark it as a good
+* yield-candidate.
+*/
+   vcpu-preempted = true;
+   wake_up_interruptible(vcpu-wq);
+   }
+}
+
 void kvm_s390_tasklet(unsigned long parm)
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
-
-   if (waitqueue_active(vcpu-wq))
-   wake_up_interruptible(vcpu-wq);
+   kvm_s390_vcpu_wakeup(vcpu);
 }
 
 /*
@@ -905,10 +915,8 @@ static int __inject_vm(struct kvm *kvm, struct 
kvm_s390_interrupt_info *inti)
li = dst_vcpu-arch.local_int;
spin_lock(li-lock);
atomic_set_mask(CPUSTAT_EXT_INT, li-cpuflags);
-   if (waitqueue_active(li-wq))
-   wake_up_interruptible(li-wq);
-   kvm_get_vcpu(kvm, sigcpu)-preempted = true;
spin_unlock(li-lock);
+   kvm_s390_vcpu_wakeup(kvm_get_vcpu(kvm, sigcpu));
 unlock_fi:
spin_unlock(fi-lock);
mutex_unlock(kvm-lock);
@@ -1059,11 +1067,9 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
if (inti-type == KVM_S390_SIGP_STOP)
li-action_bits |= ACTION_STOP_ON_STOP;
atomic_set_mask(CPUSTAT_EXT_INT, li-cpuflags);
-   if (waitqueue_active(vcpu-wq))
-   wake_up_interruptible(vcpu-wq);
-   vcpu-preempted = true;
spin_unlock(li-lock);
mutex_unlock(vcpu-kvm-lock);
+   kvm_s390_vcpu_wakeup(vcpu);
return 0;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 33a0e4b..665eacc 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -136,6 +136,7 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm 
*kvm)
 }
 
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
+void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
 void kvm_s390_tasklet(unsigned long parm);
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index 946992f..c6f1c2b 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -125,8 +125,9 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 
cpu_addr)
return rc ? rc : SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action)
+static int __inject_sigp_stop(struct kvm_vcpu *dst_vcpu, int action)
 {
+   struct kvm_s390_local_interrupt *li = dst_vcpu-arch.local_int;
struct kvm_s390_interrupt_info *inti;
int rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 
@@ -151,8 +152,7 @@ static int __inject_sigp_stop(struct 
kvm_s390_local_interrupt *li, int action)
atomic_set(li-active, 1);
li-action_bits |= action;
atomic_set_mask(CPUSTAT_STOP_INT, li-cpuflags);
-   if (waitqueue_active(li-wq))
-   wake_up_interruptible(li-wq);
+   kvm_s390_vcpu_wakeup(dst_vcpu);
 out:
spin_unlock(li-lock);
 
@@ -161,7 +161,6 @@ out:
 
 static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
 {
-   struct kvm_s390_local_interrupt *li;
struct kvm_vcpu *dst_vcpu = NULL;
int rc;
 
@@ -171,9 +170,8 @@ static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, 
int action)
dst_vcpu = kvm_get_vcpu(vcpu-kvm, cpu_addr);
if (!dst_vcpu)
return SIGP_CC_NOT_OPERATIONAL;
-   li = dst_vcpu-arch.local_int;
 
-   rc = __inject_sigp_stop(li, action);
+   rc = __inject_sigp_stop(dst_vcpu, action);
 
VCPU_EVENT(vcpu, 4, sent sigp stop to cpu %x, cpu_addr);
 
@@ -258,8 +256,7 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 
cpu_addr, u32 address,
 
list_add_tail(inti-list, li-list);
atomic_set(li-active, 1);
-   if (waitqueue_active(li-wq))
- 

Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)

2014-07-21 Thread Xiao Guangrong


Hi Marcelo,

On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:

 struct kvm_vcpu_arch {
   /*
* rip and regs accesses must go through
 @@ -392,6 +402,9 @@
   struct kvm_mmu_memory_cache mmu_page_cache;
   struct kvm_mmu_memory_cache mmu_page_header_cache;
 
 + struct list_head pinned_mmu_pages;
 + atomic_t nr_pinned_ranges;
 +

I'm not sure a per-vcpu pinned list is a good idea: since all VCPUs are
currently using the same page table, a per-vcpu list cannot reduce lock
contention. Why not make it global to the VM instead?

   struct fpu guest_fpu;
   u64 xcr0;
   u64 guest_supported_xcr0;
 Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
 ===
 --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c  2014-07-09 12:05:34.837161264 
 -0300
 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c   2014-07-09 12:09:21.856684314 
 -0300
 @@ -148,6 +148,9 @@
 
 #define SPTE_HOST_WRITEABLE   (1ULL  PT_FIRST_AVAIL_BITS_SHIFT)
 #define SPTE_MMU_WRITEABLE(1ULL  (PT_FIRST_AVAIL_BITS_SHIFT + 1))
 +#define SPTE_PINNED  (1ULL  (PT64_SECOND_AVAIL_BITS_SHIFT))
 +
 +#define SPTE_PINNED_BIT PT64_SECOND_AVAIL_BITS_SHIFT
 
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 @@ -327,6 +330,11 @@
   return pte  PT_PRESENT_MASK  !is_mmio_spte(pte);
 }
 
 +static int is_pinned_spte(u64 spte)
 +{
 + return spte  SPTE_PINNED  is_shadow_present_pte(spte);
 +}
 +
 static int is_large_pte(u64 pte)
 {
   return pte  PT_PAGE_SIZE_MASK;
 @@ -1176,6 +1184,16 @@
   kvm_flush_remote_tlbs(vcpu-kvm);
 }
 
 +static bool vcpu_has_pinned(struct kvm_vcpu *vcpu)
 +{
 + return atomic_read(vcpu-arch.nr_pinned_ranges);
 +}
 +
 +static void mmu_reload_pinned_vcpus(struct kvm *kvm)
 +{
 + make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, vcpu_has_pinned);
 +}
 +
 /*
 * Write-protect on the specified @sptep, @pt_protect indicates whether
 * spte write-protection is caused by protecting shadow page table.
 @@ -1268,7 +1286,8 @@
 }
 
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 -struct kvm_memory_slot *slot, unsigned long data)
 +struct kvm_memory_slot *slot, unsigned long data,
 +bool age)
 {
   u64 *sptep;
   struct rmap_iterator iter;
 @@ -1278,6 +1297,14 @@
   BUG_ON(!(*sptep  PT_PRESENT_MASK));
   rmap_printk(kvm_rmap_unmap_hva: spte %p %llx\n, sptep, 
 *sptep);
 
 + if (is_pinned_spte(*sptep)) {
 + /* don't nuke pinned sptes if page aging: return
 +  * young=yes instead.
 +  */
 + if (age)
 + return 1;
 + mmu_reload_pinned_vcpus(kvm);
 + }
   drop_spte(kvm, sptep);

This has a window between zapping the spte and re-pinning it, so the guest
will fail during that time.

   need_tlb_flush = 1;
   }
 @@ -1286,7 +1313,8 @@
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 -  struct kvm_memory_slot *slot, unsigned long data)
 +  struct kvm_memory_slot *slot, unsigned long data,
 +  bool age)
 {
   u64 *sptep;
   struct rmap_iterator iter;
 @@ -1304,6 +1332,9 @@
 
   need_flush = 1;
 
 + if (is_pinned_spte(*sptep))
 + mmu_reload_pinned_vcpus(kvm);
 +
   if (pte_write(*ptep)) {
   drop_spte(kvm, sptep);
   sptep = rmap_get_first(*rmapp, iter);
 @@ -1334,7 +1365,8 @@
   int (*handler)(struct kvm *kvm,
  unsigned long *rmapp,
  struct kvm_memory_slot *slot,
 -unsigned long data))
 +unsigned long data,
 +bool age))
 {
   int j;
   int ret = 0;
 @@ -1374,7 +1406,7 @@
   rmapp = __gfn_to_rmap(gfn_start, j, memslot);
 
   for (; idx = idx_end; ++idx)
 - ret |= handler(kvm, rmapp++, memslot, data);
 + ret |= handler(kvm, rmapp++, memslot, data, 
 false);
   }
   }
 
 @@ -1385,7 +1417,8 @@
 unsigned long data,
 int (*handler)(struct kvm *kvm, unsigned long *rmapp,
struct kvm_memory_slot *slot,
 -  unsigned long data))
 +  unsigned long data,
 +  bool age))
 {
   return kvm_handle_hva_range(kvm, hva, hva + 1, data, handler);
 }
 @@ -1406,7 +1439,8 @@
 }
 
 static int kvm_age_rmapp(struct kvm 

Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-07-21 Thread Xiao Guangrong

On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:

 Reload the remote vcpus' MMU from the GET_DIRTY_LOG codepath, before
 deleting a pinned spte.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 ---
 arch/x86/kvm/mmu.c |   29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)
 
 Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
 ===
 --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c  2014-07-09 11:23:59.290744490 
 -0300
 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c   2014-07-09 11:24:58.449632435 
 -0300
 @@ -1208,7 +1208,8 @@
 *
 * Return true if tlb need be flushed.
 */
 -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect,
 +bool skip_pinned)
 {
   u64 spte = *sptep;
 
 @@ -1218,6 +1219,22 @@
 
   rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep);
 
 + if (is_pinned_spte(spte)) {
 + /* keep pinned spte intact, mark page dirty again */
 + if (skip_pinned) {
 + struct kvm_mmu_page *sp;
 + gfn_t gfn;
 +
 + sp = page_header(__pa(sptep));
 + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt);
 +
 + mark_page_dirty(kvm, gfn);
 + return false;
 + } else
 + mmu_reload_pinned_vcpus(kvm);
 + }
 +
 +
   if (pt_protect)
   spte = ~SPTE_MMU_WRITEABLE;
   spte = spte  ~PT_WRITABLE_MASK;

There is also a window here between marking the spte read-only and re-pinning it.
IIUC, a pinned spte cannot be zapped or write-protected at any time.


Re: [patch 4/4] KVM: MMU: pinned sps are not candidates for deletion.

2014-07-21 Thread Xiao Guangrong

On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:

 Skip pinned shadow pages when selecting pages to zap.

It seems there is no way to prevent changing a pinned spte on the
zap-all path?

I am thinking we could move pinned shadow pages to another list (e.g.
pinned_shadow_pages) instead of the active list, so that they cannot be
touched by any other free paths. What do you think?




Re: [PATCH 1/2 V3] PCI: introduce device assignment interface and refactory related code

2014-07-21 Thread Alexander Duyck
On 07/11/2014 05:30 AM, Ethan Zhao wrote:
 This patch introduces two new device assignment functions
 
 pci_iov_assign_device(),
 pci_iov_deassign_device()
 
 along with the existed one
 
 pci_vfs_assigned()
 
 They construct the VF assignment management interface, used to assign/
 deassign a device to a VM and to query the VF reference counter, instead
 of manipulating the device flag directly.
 
 This patch also refactors the related code to make it atomic.
 
 v3: change the naming of device assignment helpers, because they work
 for all kind of PCI device, not only SR-IOV (david.vra...@citrix.com)
 
 v2: reorder the patchset and make it bisectable and atomic, steps clear
 between interface defination and implemenation according to the
 suggestion from alex.william...@redhat.com
 
 Signed-off-by: Ethan Zhao ethan.z...@oracle.com
 ---
  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   17 ++---
  drivers/pci/iov.c  |   20 
 
  drivers/xen/xen-pciback/pci_stub.c |4 ++--
  include/linux/pci.h|4 
  virt/kvm/assigned-dev.c|2 +-
  virt/kvm/iommu.c   |4 ++--
  6 files changed, 31 insertions(+), 20 deletions(-)
 
 diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
 b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
 index 02c11a7..781040e 100644
 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
 +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
 @@ -693,22 +693,9 @@ complete_reset:
  static bool i40e_vfs_are_assigned(struct i40e_pf *pf)
  {
   struct pci_dev *pdev = pf-pdev;
 - struct pci_dev *vfdev;
 -
 - /* loop through all the VFs to see if we own any that are assigned */
 - vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF , NULL);
 - while (vfdev) {
 - /* if we don't own it we don't care */
 - if (vfdev-is_virtfn  pci_physfn(vfdev) == pdev) {
 - /* if it is assigned we cannot release it */
 - if (vfdev-dev_flags  PCI_DEV_FLAGS_ASSIGNED)
 - return true;
 - }
  
 - vfdev = pci_get_device(PCI_VENDOR_ID_INTEL,
 -I40E_DEV_ID_VF,
 -vfdev);
 - }
 + if (pci_vfs_assigned(pdev))
 + return true;
  
   return false;
  }

This portion for i40e should be in one patch by itself.  It shouldn't be
included in the bits below.  Normally this would go through netdev.  The
rest of this below would go through linux-pci.

 diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
 index de7a747..090f827 100644
 --- a/drivers/pci/iov.c
 +++ b/drivers/pci/iov.c
 @@ -644,6 +644,26 @@ int pci_vfs_assigned(struct pci_dev *dev)
  EXPORT_SYMBOL_GPL(pci_vfs_assigned);
  
  /**
 + * pci_iov_assign_device - assign device to VM
 + * @pdev: the device to be assigned.
 + */
 +void pci_iov_assign_device(struct pci_dev *pdev)
 +{
 + pdev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 +}
 +EXPORT_SYMBOL_GPL(pci_iov_assign_device);
 +
 +/**
 + * pci_iov_deassign_device - deasign device from VM
 + * @pdev: the device to be deassigned.
 + */
 +void pci_iov_deassign_device(struct pci_dev *pdev)
 +{
 + pdev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 +}
 +EXPORT_SYMBOL_GPL(pci_iov_deassign_device);
 +
 +/**
   * pci_sriov_set_totalvfs -- reduce the TotalVFs available
   * @dev: the PCI PF device
   * @numvfs: number that should be used for TotalVFs supported

The two functions above don't have anything to do with IOV.  You can
directly assign a device that doesn't even support SR-IOV or MR-IOV.  You
might be better off defining this as something like
pci_set_flag_assigned/pci_clear_flag_assigned.  I would likely also
make them inline and possibly move them to pci.h since it would likely
result in less actual code after you consider the overhead to push
everything on the stack prior to making the call.

 diff --git a/drivers/xen/xen-pciback/pci_stub.c 
 b/drivers/xen/xen-pciback/pci_stub.c
 index 62fcd48..27e00d1 100644
 --- a/drivers/xen/xen-pciback/pci_stub.c
 +++ b/drivers/xen/xen-pciback/pci_stub.c
 @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref)
   xen_pcibk_config_free_dyn_fields(dev);
   xen_pcibk_config_free_dev(dev);
  
 - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 + pci_iov_deassign_device(dev);
   pci_dev_put(dev);
  
   kfree(psdev);
 @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev)
   dev_dbg(dev-dev, reset device\n);
   xen_pcibk_reset_device(dev);
  
 - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 + pci_iov_assign_device(dev);
   return 0;
  
  config_release:
 diff --git a/include/linux/pci.h b/include/linux/pci.h
 index aab57b4..5ece6d6 100644
 --- a/include/linux/pci.h
 +++ b/include/linux/pci.h
 @@ -1603,6 

buildbot failure in kvm on ia64

2014-07-21 Thread kvm
The Buildbot has detected a new failure on builder ia64 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/ia64/builds/1315

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed git

sincerely,
 -The Buildbot



Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)

2014-07-21 Thread Xiao Guangrong
On 07/22/2014 05:46 AM, Xiao Guangrong wrote:

 +if (is_pinned_spte(*sptep)) {
 +/* don't nuke pinned sptes if page aging: return
 + * young=yes instead.
 + */
 +if (age)
 +return 1;
 +mmu_reload_pinned_vcpus(kvm);
 +}
  drop_spte(kvm, sptep);
 
 This has a window between zapping spte and re-pin spte, so guest will fail
 at this time.

I got it, mmu_reload_pinned_vcpus will kick all vcpus out of guest and
pin the pages again... so it is ok. :)



Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Mike Qiu

On 06/25/2014 11:19 AM, Benjamin Herrenschmidt wrote:

On Wed, 2014-06-25 at 11:05 +0800, Mike Qiu wrote:

Here /sys/kernel/debug/powerpc/errinjct may be better, because it
will supply PCI_domain_nr in its parameters, so there is no need to
supply an errinjct entry for each PCI domain.

Another reason is that error injection is not only for PCI (in the
future), so it is better not to put it in the PCI domain entry.

Also, it is simpler for userland tools to have a fixed path.

I don't like this. I much prefer have dedicated error injection files
in their respective locations, something for PCI under the corresponding
PCI bridge etc...


So PowerNV error injection will be designed to rely on debugfs being
configured, right?


Thanks,
Mike


Cheers,
Ben.






--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail

2014-07-21 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+mihai.caraman=freescale@lists.ozlabs.org] On Behalf Of
 mihai.cara...@freescale.com
 Sent: Friday, July 18, 2014 12:06 PM
 To: Alexander Graf; kvm-ppc@vger.kernel.org
 Cc: linuxppc-...@lists.ozlabs.org; k...@vger.kernel.org
 Subject: RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, July 17, 2014 5:21 PM
  To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
  Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
  Subject: Re: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to
 fail
 
 
  On 17.07.14 13:22, Mihai Caraman wrote:
   On book3e, guest last instruction is read on the exit path using load
   external pid (lwepx) dedicated instruction. This load operation may
  fail
   due to TLB eviction and execute-but-not-read entries.
  
   This patch lay down the path for an alternative solution to read the
  guest
   last instruction, by allowing kvmppc_get_lat_inst() function to fail.
   Architecture specific implmentations of kvmppc_load_last_inst() may
  read
   last guest instruction and instruct the emulation layer to re-execute
  the
   guest in case of failure.
  
   Make kvmppc_get_last_inst() definition common between architectures.
  
   Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
   ---
 
 ...
 
   diff --git a/arch/powerpc/include/asm/kvm_ppc.h
  b/arch/powerpc/include/asm/kvm_ppc.h
   index e2fd5a1..7f9c634 100644
   --- a/arch/powerpc/include/asm/kvm_ppc.h
   +++ b/arch/powerpc/include/asm/kvm_ppc.h
   @@ -47,6 +47,11 @@ enum emulation_result {
 EMULATE_EXIT_USER,/* emulation requires exit to user-
 space */
 };
  
   +enum instruction_type {
   + INST_GENERIC,
   + INST_SC,/* system call */
   +};
   +
 extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu
  *vcpu);
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
 kvm_vcpu
  *vcpu);
 extern void kvmppc_handler_highmem(void);
   @@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run,
  struct kvm_vcpu *vcpu,
u64 val, unsigned int bytes,
int is_default_endian);
  
   +extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
   +  enum instruction_type type, u32 *inst);
   +
 extern int kvmppc_emulate_instruction(struct kvm_run *run,
   struct kvm_vcpu *vcpu);
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu
  *vcpu);
   @@ -234,6 +242,23 @@ struct kvmppc_ops {
 extern struct kvmppc_ops *kvmppc_hv_ops;
 extern struct kvmppc_ops *kvmppc_pr_ops;
  
   +static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
   + enum instruction_type type, u32 *inst)
   +{
   + int ret = EMULATE_DONE;
   +
   + /* Load the instruction manually if it failed to do so in the
   +  * exit path */
   + if (vcpu-arch.last_inst == KVM_INST_FETCH_FAILED)
   + ret = kvmppc_load_last_inst(vcpu, type, vcpu-
  arch.last_inst);
   +
   +
   + *inst = (ret == EMULATE_DONE  kvmppc_need_byteswap(vcpu)) ?
   + swab32(vcpu-arch.last_inst) : vcpu-arch.last_inst;
 
  This makes even less sense than the previous version. Either you treat
  inst as definitely overwritten or as preserves previous data on
  failure.
 
 Both v4 and v5 versions treat inst as definitely overwritten.
 
 
  So either you unconditionally swap like you did before
 
 If we make abstraction of its symmetry, KVM_INST_FETCH_FAILED is operated
 in host endianness, so it doesn't need byte swap.
 
 I agree with your reasoning if last_inst is initialized and compared with
 data in guest endianess, which is not the case yet for
 KVM_INST_FETCH_FAILED.

Alex, are you relying on the fact that KVM_INST_FETCH_FAILED value is 
symmetrical?
With a non-symmetrical value like 0xDEADBEEF, and considering a little-endian
guest on a big-endian host, we would need to fix the kvm logic to initialize
and compare last_inst with the swapped value 0xEFBEADDE.

Your suggestion to unconditionally swap makes sense only with the above fix;
otherwise inst may end up with the swapped value 0xEFBEADDE, which is wrong.

-Mike

RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-21 Thread mihai.cara...@freescale.com
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, July 03, 2014 3:21 PM
 To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
 Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
 Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
 SPE/FP/AltiVec int numbers
 
 
 On 30.06.14 17:34, Mihai Caraman wrote:
  Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
  which share the same interrupt numbers.
 
  Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
  ---
  v2:
- remove outdated definitions
 
arch/powerpc/include/asm/kvm_asm.h|  8 
arch/powerpc/kvm/booke.c  | 17 +
arch/powerpc/kvm/booke.h  |  4 ++--
arch/powerpc/kvm/booke_interrupts.S   |  9 +
arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
arch/powerpc/kvm/e500.c   | 10 ++
arch/powerpc/kvm/e500_emulate.c   | 10 ++
7 files changed, 30 insertions(+), 32 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_asm.h
 b/arch/powerpc/include/asm/kvm_asm.h
  index 9601741..c94fd33 100644
  --- a/arch/powerpc/include/asm/kvm_asm.h
  +++ b/arch/powerpc/include/asm/kvm_asm.h
  @@ -56,14 +56,6 @@
/* E500 */
#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
  -/*
  - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
 defines
  - */
  -#define BOOKE_INTERRUPT_SPE_UNAVAIL
 BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_SPE_FP_DATA
 BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
  -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
 BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
  -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
  -   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
 
 I think I'd prefer to keep them separate.
 
#define BOOKE_INTERRUPT_SPE_FP_ROUND 34
#define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
#define BOOKE_INTERRUPT_DOORBELL 36
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
  index ab62109..3c86d9b 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct
 kvm_vcpu *vcpu,
  case BOOKE_IRQPRIO_ITLB_MISS:
  case BOOKE_IRQPRIO_SYSCALL:
  case BOOKE_IRQPRIO_FP_UNAVAIL:
  -   case BOOKE_IRQPRIO_SPE_UNAVAIL:
  -   case BOOKE_IRQPRIO_SPE_FP_DATA:
  +   case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
  +   case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
 
 #ifdef CONFIG_KVM_E500V2
case ...SPE:
 #else
case ..ALTIVEC:
 #endif
 
  case BOOKE_IRQPRIO_SPE_FP_ROUND:
  case BOOKE_IRQPRIO_AP_UNAVAIL:
  allowed = 1;
  @@ -977,18 +977,19 @@ int kvmppc_handle_exit(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
  break;
 
#ifdef CONFIG_SPE
  -   case BOOKE_INTERRUPT_SPE_UNAVAIL: {
  +   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: {
   if (vcpu->arch.shared->msr & MSR_SPE)
  kvmppc_vcpu_enable_spe(vcpu);
  else
  kvmppc_booke_queue_irqprio(vcpu,
  -  BOOKE_IRQPRIO_SPE_UNAVAIL);
  +   BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL);
  r = RESUME_GUEST;
  break;
  }
 
  -   case BOOKE_INTERRUPT_SPE_FP_DATA:
  -   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
  +   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
  +   kvmppc_booke_queue_irqprio(vcpu,
  +   BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
  r = RESUME_GUEST;
  break;
 
  @@ -997,7 +998,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
 kvm_vcpu *vcpu,
  r = RESUME_GUEST;
  break;
#else
  -   case BOOKE_INTERRUPT_SPE_UNAVAIL:
  +   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL:
  /*
   * Guest wants SPE, but host kernel doesn't support it.  Send
   * an unimplemented operation program check to the guest.
  @@ -1010,7 +1011,7 @@ int kvmppc_handle_exit(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
   * These really should never happen without CONFIG_SPE,
   * as we should never enable the real MSR[SPE] in the guest.
   */
  -   case BOOKE_INTERRUPT_SPE_FP_DATA:
  +   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
  case BOOKE_INTERRUPT_SPE_FP_ROUND:
   printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
  __func__, exit_nr, vcpu->arch.pc);
  diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
  index b632cd3..f182b32 100644
  --- a/arch/powerpc/kvm/booke.h
  +++ b/arch/powerpc/kvm/booke.h
  @@ -32,8 +32,8 @@
#define BOOKE_IRQPRIO_ALIGNMENT 2
#define BOOKE_IRQPRIO_PROGRAM 3
#define BOOKE_IRQPRIO_FP_UNAVAIL 4
  -#define BOOKE_IRQPRIO_SPE_UNAVAIL 5
  -#define BOOKE_IRQPRIO_SPE_FP_DATA 6
  +#define 

Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Benjamin Herrenschmidt
On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
  I don't like this. I much prefer to have dedicated error injection files
  in their respective locations, something for PCI under the corresponding
  PCI bridge etc...
 
 So PowerNV error injection will be designed to rely on debugfs being
 configured, right?

Not necessarily. If we create a better debugfs layout for our PHBs, then
yes. It might be useful to provide more info in there for example access
to some of the counters ...

But on the other hand, for error injection in general, I wonder if we should
be under sysfs instead... something to study a bit.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Mike Qiu

On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:

On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:

I don't like this. I much prefer to have dedicated error injection files
in their respective locations, something for PCI under the corresponding
PCI bridge etc...

So PowerNV error injection will be designed to rely on debugfs being
configured, right?

Not necessarily. If we create a better debugfs layout for our PHBs, then
yes. It might be useful to provide more info in there for example access
to some of the counters ...

But on the other hand, for error injection in general, I wonder if we should
be under sysfs instead... something to study a bit.


In pHyp, general error injection uses a syscall:

#define __NR_rtas 255

I don't know if it is a good idea to reuse this syscall for PowerNV.

At least, it is another choice that doesn't rely on sysfs.

Thanks,
Mike



Cheers,
Ben.







Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Benjamin Herrenschmidt
On Tue, 2014-07-22 at 11:10 +0800, Mike Qiu wrote:
 On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:
  On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
  I don't like this. I much prefer to have dedicated error injection files
  in their respective locations, something for PCI under the corresponding
  PCI bridge etc...
  So PowerNV error injection will be designed to rely on debugfs being
  configured, right?
  Not necessarily. If we create a better debugfs layout for our PHBs, then
  yes. It might be useful to provide more info in there for example access
  to some of the counters ...
 
  But on the other hand, for error injection in general, I wonder if we should
  be under sysfs instead... something to study a bit.
 
 In pHyp, general error injection uses a syscall:
 
  #define __NR_rtas 255
 
 I don't know if it is a good idea to reuse this syscall for PowerNV.
 
 At least, it is another choice that doesn't rely on sysfs.

No, we certainly don't want that RTAS stuff. I thought Linux had some
kind of error injection infrastructure nowadays... somebody needs to
have a look.

Cheers,
Ben.




Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Gavin Shan
On Tue, Jul 22, 2014 at 11:10:42AM +0800, Mike Qiu wrote:
On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:
On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
I don't like this. I much prefer to have dedicated error injection files
in their respective locations, something for PCI under the corresponding
PCI bridge etc...
So PowerNV error injection will be designed to rely on debugfs being
configured, right?
Not necessarily. If we create a better debugfs layout for our PHBs, then
yes. It might be useful to provide more info in there for example access
to some of the counters ...

But on the other hand, for error injection in general, I wonder if we should
be under sysfs instead... something to study a bit.

In pHyp, general error injection uses a syscall:

#define __NR_rtas 255

I don't know if it is a good idea to reuse this syscall for PowerNV.

At least, it is another choice that doesn't rely on sysfs.


We won't use a syscall for routing the error injection on PowerNV any more.
Generally speaking, we will use ioctl commands or a subcode of the EEH ioctl
command, which was invented for EEH support for VFIO devices to support
QEMU. For the utility (errinjct) running on PowerNV, we will use debugfs
entries. I have preliminary code for that, but haven't had a chance to polish
it yet. Let me send it to you so that you can start working from there.

Thanks,
Gavin 



Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection

2014-07-21 Thread Mike Qiu

On 07/22/2014 11:26 AM, Gavin Shan wrote:

On Tue, Jul 22, 2014 at 11:10:42AM +0800, Mike Qiu wrote:

On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:

On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:

I don't like this. I much prefer to have dedicated error injection files
in their respective locations, something for PCI under the corresponding
PCI bridge etc...

So PowerNV error injection will be designed to rely on debugfs being
configured, right?

Not necessarily. If we create a better debugfs layout for our PHBs, then
yes. It might be useful to provide more info in there for example access
to some of the counters ...

But on the other hand, for error injection in general, I wonder if we should
be under sysfs instead... something to study a bit.

In pHyp, general error injection uses a syscall:

#define __NR_rtas 255

I don't know if it is a good idea to reuse this syscall for PowerNV.

At least, it is another choice that doesn't rely on sysfs.


We won't use a syscall for routing the error injection on PowerNV any more.
Generally speaking, we will use ioctl commands or a subcode of the EEH ioctl
command, which was invented for EEH support for VFIO devices to support
QEMU. For the utility (errinjct) running on PowerNV, we will use debugfs
entries. I have preliminary code for that, but haven't had a chance to polish
it yet. Let me send it to you so that you can start working from there.


OK, thanks

Thanks,
Gavin




