Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexey Kardashevskiy
On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.

 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in virtual
 mode in case the real mode handler fails for some reason.

 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.

 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.

 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 Only a few minor nits. Ben already commented on implementation details.
 

 ---
 Changelog:
 2013/06/05:
 * fixed typo about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number

 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with fallthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
 ---
 Documentation/virtual/kvm/api.txt   |   17 ++
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
 arch/powerpc/kvm/book3s_64_vio.c|  118 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  266 +++
 arch/powerpc/kvm/book3s_hv.c|   39 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
 arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
 arch/powerpc/kvm/powerpc.c  |3 +
 include/uapi/linux/kvm.h|1 +
 10 files changed, 473 insertions(+), 32 deletions(-)
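
(A minimal sketch of the validate-then-commit flow described in the changelog
above; names and the validation rule are illustrative, not the patch's actual
code:)

	#include <stddef.h>

	/* stand-ins for kvmppc_emulated_validate_tce()/kvmppc_emulated_put_tce() */
	static int validate_tce(unsigned long tce)
	{
		return (tce & 0xfffUL) ? -1 : 0;	/* e.g. reject unaligned entries */
	}

	static void put_tce(unsigned long *table, size_t idx, unsigned long tce)
	{
		table[idx] = tce;
	}

	/* Two passes: the whole batch is validated before anything is written,
	 * so a bad entry never leaves the TCE table half-updated. */
	static int put_tce_list(unsigned long *table, const unsigned long *list,
				size_t n)
	{
		for (size_t i = 0; i < n; i++)
			if (validate_tce(list[i]))
				return -1;		/* nothing written yet */
		for (size_t i = 0; i < n; i++)
			put_tce(table, i, list[i]);
		return 0;
	}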

 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 5f91eda..6c082ff 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to userspace to be
 handled.


 +4.83 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability tells the guest that multiple TCE entry add/remove hypercalls
 +handling is supported by the kernel. This significantly accelerates DMA
 +operations for PPC KVM guests.
 +
 +Unlike other capabilities in this section, this one does not have an ioctl.
 +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
 +H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed to
 +the guest. Otherwise it might be better for the guest to continue using the
 +H_PUT_TCE hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 

 While this describes perfectly well what the consequences are of the
 patches, it does not describe properly what the CAP actually expresses.
 The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls directly. All other consequences are nice to
 document, but the semantics of the CAP are missing.


? It expresses ability to handle 2 hcalls. What is missing?


 We also usually try to keep KVM behavior unchanged with regards to older
 versions until a CAP is enabled. In this case I don't think it matters
 all that much, so I'm fine with declaring it as enabled by default.
 Please document that this is a change in behavior versus older KVM
 versions though.


Ok!


 +
 +
 5. The kvm_run structure
 

 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index af326cd..85d8f26 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
  spinlock_t tbacct_lock;
  u64 busy_stolen;
  u64 busy_preempt;
 +
 +unsigned long *tce_tmp;	/* TCE cache for H_PUT_TCE_INDIRECT hcall */
 #endif
 };
 
 [...]


 
 [...]
 
 [...]
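
(A minimal sketch of the userspace side the commit message describes. The
spapr_add_hypertas() helper is hypothetical; KVM_CHECK_EXTENSION is the
standard capability probe, assuming headers that define KVM_CAP_PPC_MULTITCE:)

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	extern void spapr_add_hypertas(const char *fn);	/* hypothetical helper */

	/* Advertise "hcall-multi-tce" in ibm,hypertas-functions only when the
	 * kernel reports the capability; otherwise the guest keeps H_PUT_TCE. */
	static void probe_multitce(int kvm_fd)
	{
		if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MULTITCE) > 0)
			spapr_add_hypertas("hcall-multi-tce");
	}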

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:

 On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 [...]
 
 
 While this describes perfectly well what the consequences are of the
 patches, it does not describe properly what the CAP actually expresses.
 The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls directly. All other consequences are nice to
 document, but the semantics of the CAP are missing.
 
 
 ? It expresses ability to handle 2 hcalls. What is missing?

You don't describe the kvm -> qemu interface. You describe some decisions qemu
can take from this cap.

 
 
 [...]

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexey Kardashevskiy
On 06/17/2013 06:02 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 08:06 AM, Alexander Graf wrote:

 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:

 [...]


 While this describes perfectly well what the consequences are of the
 patches, it does not describe properly what the CAP actually expresses.
 The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls directly. All other consequences are nice to
 document, but the semantics of the CAP are missing.


 ? It expresses ability to handle 2 hcalls. What is missing?
 
 You don't describe the kvm -> qemu interface. You describe some decisions
 qemu can take from this cap.


This file does not mention qemu at all. And the interface is - qemu (or
kvmtool could do that) just adds hcall-multi-tce to
ibm,hypertas-functions but this is for pseries linux and AIX could always
do it (no idea about it). Does it really have to be in this file?



 [...]

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Benjamin Herrenschmidt
On Mon, 2013-06-17 at 17:55 +1000, Alexey Kardashevskiy wrote:
 David:
 ===
 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.
 ===
 
 Wrong?

The capability just tells qemu that the kernel supports it; it doesn't have
to depend on PAPR mode, qemu can sort things out, no?

Cheers,
Ben.




Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:

 On 06/17/2013 06:02 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 [...]
 
 
 ? It expresses ability to handle 2 hcalls. What is missing?
 
 You don't describe the kvm -> qemu interface. You describe some decisions
 qemu can take from this cap.
 
 
 This file does not mention qemu at all. And the interface is - qemu (or
 kvmtool could do that) just adds hcall-multi-tce to
 ibm,hypertas-functions but this is for pseries linux and AIX could always
 do it (no idea about it). Does it really have to be in this file?

Ok, let's go back a step. What does this CAP describe? Don't look at the
description you wrote above. Just write a new one. What exactly can user space
expect when it finds this CAP?

 
 
 
 [...]

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 17.06.2013, at 10:37, Benjamin Herrenschmidt wrote:

 On Mon, 2013-06-17 at 17:55 +1000, Alexey Kardashevskiy wrote:
 David:
 ===
 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.
 ===
 
 Wrong?
 
 The capability just tells qemu the kernel supports it, it doesn't have
 to depend on PAPR mode, qemu can sort things out no ?

Yes, this goes hand-in-hand with the documentation bit I'm trying to get
through to Alexey atm. The CAP merely says that, if in PAPR mode, the kernel
can handle hypercalls X and Y itself.

This is true for all book3s implementations as the patches stand. It is not
true for BookE as the patches stand. Hence the CAP should be limited to
book3s, regardless of its mode :).


Alex
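
(A hedged sketch of the gating Alex suggests -- limiting the CAP to book3s
regardless of PR/HV mode; illustrative only, not the actual patch:)

	/* queried per-VM, before any vcpu exists, so it cannot depend on the
	 * per-vcpu PAPR mode -- only on the architecture being book3s */
	static int multitce_supported(void)
	{
	#ifdef CONFIG_PPC_BOOK3S_64
		return 1;	/* in-kernel H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers */
	#else
		return 0;	/* BookE: not handled as the patches stand */
	#endif
	}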



Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexey Kardashevskiy
On 06/17/2013 06:40 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 06:02 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 [...]
 

 Ok, let's go back a step. What does this CAP describe? Don't look at the
 description you wrote above. Just write a new one.

The CAP means the kernel is capable of handling hcalls A and B without
passing those to user space. That accelerates DMA.


 What exactly can user space expect when it finds this CAP?

Re: [PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-06-17 Thread Alexey Kardashevskiy
On 06/16/2013 02:26 PM, Benjamin Herrenschmidt wrote:
 +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
 +int realmode_get_page(struct page *page)
 +{
 +	if (PageCompound(page))
 +		return -EAGAIN;
 +
 +	get_page(page);
 +
 +	return 0;
 +}
 +EXPORT_SYMBOL_GPL(realmode_get_page);
 +
 +int realmode_put_page(struct page *page)
 +{
 +	if (PageCompound(page))
 +		return -EAGAIN;
 +
 +	if (!atomic_add_unless(&page->_count, -1, 1))
 +		return -EAGAIN;
 +
 +	return 0;
 +}
 +EXPORT_SYMBOL_GPL(realmode_put_page);
 +#endif
 
 Several worries here, mostly that if the generic code ever changes
 (something gets added to get_page() that makes it no-longer safe for use
 in real mode for example, or some other condition gets added to
 put_page()), we go out of sync and potentially end up with very hard and
 very subtle bugs.
 
 It might be worth making sure that:
 
  - This is reviewed by some generic VM people (and make sure they
 understand why we need to do that)
 
  - A comment is added to get_page() and put_page() to make sure that if
 they are changed in any way, dbl check the impact on our
 realmode_get_page() (or ping us to make sure things are still ok).

After changing get_page() to get_page_unless_zero(), the page API I use is:
get_page_unless_zero() - basically atomic_inc_not_zero()
atomic_add_unless() - just operates on the counter
PageCompound() - checks whether it is a huge page.

No usage of get_page or put_page.

If any of those changes, I would expect it to hit us immediately, no?

So it may only make sense to add a comment to PageCompound(). But the
comment says PageCompound is generally not used in hot code paths, and
our path is hot. Heh.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..c70a654 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -329,7 +329,8 @@ static inline void set_page_writeback(struct page *page)
  * System with lots of page flags available. This allows separate
  * flags for PageHead() and PageTail() checks of compound pages so that bit
  * tests can be used in performance sensitive paths. PageCompound is
- * generally not used in hot code paths.
+ * generally not used in hot code paths except arch/powerpc/mm/init_64.c
+ * which uses it to detect huge pages and avoid handling those in real mode.
  */
 __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
 __PAGEFLAG(Tail, tail)


So?
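
(For context, a hedged kernel-style sketch of the calling pattern these
helpers enable -- H_TOO_HARD is the existing convention for deflecting a
real-mode hcall to the virtual-mode path; the function itself is made up:)

	/* in the real-mode hcall path */
	static long h_put_tce_rm(struct page *page)
	{
		if (realmode_get_page(page))	/* -EAGAIN on compound pages */
			return H_TOO_HARD;	/* retry in virtual mode instead */
		/* ... work that is safe in real mode ... */
		if (realmode_put_page(page))
			return H_TOO_HARD;
		return H_SUCCESS;
	}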


-- 
Alexey


access PCI card trouble from VM guest

2013-06-17 Thread David Cure
Hello,

I want to access this PCI card from one VM :

10:00.0 Serial controller: Moxa Technologies Co Ltd CP-132EL (2-port
RS-422/485 Smart PCI Express Serial Board)

This card is not handled by a driver on the host kernel.

When I start the VM with this command (generated by libvirt):

LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/
QEMU_AUDIO_DRV=none /usr/bin/kvm -name huahine -S -machine
pc-i440fx-1.5,accel=kvm,usb=off -cpu
SandyBridge,+pdpe1gb,+osxsave,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme,hv_relaxed
-m 4096 -smp 2,sockets=2,cores=1,threads=1 -uuid
efaa3133-eb5a-e9c3-5120-d6dd5e4551ca -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/huahine.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -drive
file=/dev/drbd2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
-drive
file=/Iso/virtio-win.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=23,id=hostnet0 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1f:ba:4e,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
-vnc 127.0.0.1:2 -k fr -vga std -device
pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


I get this error message :

qemu-system-x86_64: -device
pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6: No
IOMMU found.  Unable to assign device hostdev0
qemu-system-x86_64: -device
pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6:
Device initialization failed.
qemu-system-x86_64: -device
pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6:
Device 'kvm-pci-assign' could not be initialized
2013-06-11 16:01:15.131+: shutting down


	I checked in the BIOS and all virtualization technologies seem to
be enabled.

	If I grep for IOMMU in the boot log of the server (I don't know
whether it's related or not), I see this:

messages.1:Jun 11 14:22:54 futuna kernel: [0.028041] IOMMU 0:
reg_base_addr fbdfe000 ver 1:0 cap d2078c106f0462 ecap f020fe
messages.1:Jun 11 14:22:54 futuna kernel: [0.028049] IOMMU 1:
reg_base_addr f4ffe000 ver 1:0 cap d2078c106f0462 ecap f020fe


Any idea to debug this ?

David.



Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 06/17/2013 10:51 AM, Alexey Kardashevskiy wrote:

On 06/17/2013 06:40 PM, Alexander Graf wrote:

On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:


On 06/17/2013 06:02 PM, Alexander Graf wrote:

On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:


On 06/17/2013 08:06 AM, Alexander Graf wrote:

On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:


[...]

Ok, let's go back a step. What does this CAP describe? Don't look at the
description you wrote above. Just write a new one.

The CAP means the kernel is capable of handling hcalls A and B without
passing those to user space. That accelerates DMA.



What exactly can user space expect when it finds this CAP?

The user space can expect that its handlers for A and B are not 

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 06/17/2013 12:46 PM, Alexander Graf wrote:

On 06/17/2013 10:51 AM, Alexey Kardashevskiy wrote:

[...]

Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-17 Thread Xiao Guangrong
Sorry for the delay reply since i was on vacation.

On 06/15/2013 10:22 AM, Takuya Yoshikawa wrote:
 On Thu, 13 Jun 2013 21:08:21 -0300
 Marcelo Tosatti mtosa...@redhat.com wrote:
 
 On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
 
 - Where is the generation number increased?
 
 Looks like when a new slot is installed in update_memslots() because
 it's based on slots->generation.  This is not restricted to create
 and move.

Yes. It reuses slots->generation to avoid unnecessary synchronizations
(RCU, memory barrier).

Increasing the mmio generation number in the case of create and move
is ok - it is no additional work unless the mmio generation number overflows,
which is hardly ever triggered (since the valid mmio generation number space
is large enough and zap_all scales well now), and the mmio spte is updated
only when it is used in the future.

 
 - Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
 (picture guest with 512GB of RAM, even walking all those pages is
 expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
 - Is -13 enough to test wraparound? It's highly likely the guest has
 not begun executing by the time 13 kvm_set_memory calls are made
 (so no sptes around). Perhaps -2000 is more sensible (should confirm
 though).
 
 In the future, after we've tested enough, we should change the testing
 code to be executed only for some debugging configs.  Especially, if we
 change zap_mmio_sptes() to zap_all_shadows(), very common guests, even
 without huge memory like 512GB, can see the effect induced by sudden page
 faults unnecessarily.
 
 If necessary, developers can test the wraparound code by lowering the
 max_gen itself anyway.

I agree.

 
 - Why remove if (change == KVM_MR_CREATE) || (change
 ==  KVM_MR_MOVE) from kvm_arch_commit_memory_region?
 Its instructive.
 
 There may be a chance that we miss generation wraparounds if we don't
 check other cases: seems unlikely, but theoretically possible.
 
 In short, all memory slot changes make mmio sptes stored in shadow pages
 obsolete, or zapped for wraparounds, in the new way -- am I right?

Yes. You are definitely right. :)

Takuya-san, thank you very much for answering the questions for me, and
thanks to all of you for patiently reviewing my patches.

Marcelo, your points?
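
(A self-contained toy illustration of the generation-number scheme discussed
here -- mmio sptes are tagged with the memslots generation, and any slot
change bumps it so stale entries simply stop matching; names are made up:)

	#include <stdbool.h>

	struct mmio_spte { unsigned long gfn; unsigned int gen; };

	static unsigned int slots_generation;	/* bumped on every memslot change */

	static void memslot_changed(void) { slots_generation++; }

	static bool mmio_spte_valid(const struct mmio_spte *s)
	{
		/* mismatch: the spte predates a memslot update, treat as stale */
		return s->gen == slots_generation;
	}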



Re: [PATCH v3 03/13] nEPT: Add EPT tables support to paging_tmpl.h

2013-06-17 Thread Xiao Guangrong
On 06/11/2013 07:32 PM, Gleb Natapov wrote:
 On Tue, May 21, 2013 at 03:52:12PM +0800, Xiao Guangrong wrote:
 On 05/19/2013 12:52 PM, Jun Nakajima wrote:
 From: Nadav Har'El n...@il.ibm.com

 This is the first patch in a series which adds nested EPT support to KVM's
 nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
 EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
 to set its own cr3 and take its own page faults without either of L0 or L1
 getting involved. This often significantly improves L2's performance over the
 previous two alternatives (shadow page tables over EPT, and shadow page
 tables over shadow page tables).

 This patch adds EPT support to paging_tmpl.h.

 paging_tmpl.h contains the code for reading and writing page tables. The code
 for 32-bit and 64-bit tables is very similar, but not identical, so
 paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once
 with PTTYPE=64, and this generates the two sets of similar functions.

 There are subtle but important differences between the format of EPT tables
 and that of ordinary x86 64-bit page tables, so for nested EPT we need a
 third set of functions to read the guest EPT table and to write the shadow
 EPT table.

 So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions
 (prefixed with EPT) which correctly read and write EPT tables.

 Signed-off-by: Nadav Har'El n...@il.ibm.com
 Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
 Signed-off-by: Xinhao Xu xinhao...@intel.com
 ---
  arch/x86/kvm/mmu.c |  5 +
  arch/x86/kvm/paging_tmpl.h | 43 +--
  2 files changed, 46 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 117233f..6c1670f 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -3397,6 +3397,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte)
 	return mmu->last_pte_bitmap & (1 << index);
  }

 +#define PTTYPE_EPT 18 /* arbitrary */
 +#define PTTYPE PTTYPE_EPT
 +#include "paging_tmpl.h"
 +#undef PTTYPE
 +
  #define PTTYPE 64
  #include "paging_tmpl.h"
  #undef PTTYPE
 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index df34d4a..4c45654 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -50,6 +50,22 @@
 #define PT_LEVEL_BITS PT32_LEVEL_BITS
 #define PT_MAX_FULL_LEVELS 2
 #define CMPXCHG cmpxchg
 +#elif PTTYPE == PTTYPE_EPT
 +   #define pt_element_t u64
 +   #define guest_walker guest_walkerEPT
 +   #define FNAME(name) EPT_##name
 +   #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
 +   #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
 +   #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
 +   #define PT_INDEX(addr, level) PT64_INDEX(addr, level)
 +   #define PT_LEVEL_BITS PT64_LEVEL_BITS
 +   #ifdef CONFIG_X86_64
 +   #define PT_MAX_FULL_LEVELS 4
 +   #define CMPXCHG cmpxchg
 +   #else
 +   #define CMPXCHG cmpxchg64

 CMPXCHG is only used in FNAME(cmpxchg_gpte), but you commented it later.
 Do we really need it?

 +   #define PT_MAX_FULL_LEVELS 2

 And the SDM says:

 "It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure
 entries are accessed to translate a guest-physical address." Is my SDM obsolete?
 Which kind of process supports page-walk length = 2?

 It seems your patch is not able to handle the case where the guest uses
 walk-length = 2 while running on a host with walk-length = 4.
 (Please refer to how sp->role.quadrant is handled in FNAME(get_level1_sp_gpa)
 in the current code.)

 But since EPT always has 4 levels on all existing cpus it is not an issue and
 the only case that we should worry about is guest walk-length == host
 walk-length == 4, or have I misunderstood what you mean here?

Yes. I totally agree with you, but...

What confused me is that this patch defines "#define PT_MAX_FULL_LEVELS 2", so
I asked the question: which kind of process supports page-walk length = 2?
Sorry, there was a typo in my original comment: "process" should be "processor"
or "CPU".
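
(For readers unfamiliar with the trick: a tiny standalone illustration of the
include-template idiom paging_tmpl.h relies on -- one body compiled several
times under different macro definitions; the file and function names here are
made up, not the kernel's actual template:)

	/* walker_tmpl.h -- the "template" body */
	static int FNAME(max_levels)(void)
	{
		return PT_MAX_FULL_LEVELS;
	}

	/* consumer, e.g. mmu.c */
	#define FNAME(name) pt32_##name
	#define PT_MAX_FULL_LEVELS 2
	#include "walker_tmpl.h"
	#undef PT_MAX_FULL_LEVELS
	#undef FNAME

	#define FNAME(name) ept_##name
	#define PT_MAX_FULL_LEVELS 4
	#include "walker_tmpl.h"
	#undef PT_MAX_FULL_LEVELS
	#undef FNAME
	/* pt32_max_levels() and ept_max_levels() now both exist */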






Re: access PCI card trouble from VM guest

2013-06-17 Thread Alex Williamson
On Mon, 2013-06-17 at 12:00 +0200, David Cure wrote:
   Hello,
 
   I want to access this PCI card from one VM :
 
 10:00.0 Serial controller: Moxa Technologies Co Ltd CP-132EL (2-port
 RS-422/485 Smart PCI Express Serial Board)
 
  This card is not handled by a driver on the host kernel.
 
  When I start the VM with this command (generated by libvirt):
 
 [...]
 
   I get this error message :
 
 qemu-system-x86_64: -device
 pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6: No
 IOMMU found.  Unable to assign device hostdev0
 [...]
 
 
   I checked in the BIOS and all virtualization technologies seem to
 be enabled.
 
   If I grep for IOMMU in the boot log of the server (I don't know
 whether it's related or not), I see this:
 
 messages.1:Jun 11 14:22:54 futuna kernel: [0.028041] IOMMU 0:
 reg_base_addr fbdfe000 ver 1:0 cap d2078c106f0462 ecap f020fe
 messages.1:Jun 11 14:22:54 futuna kernel: [0.028049] IOMMU 1:
 reg_base_addr f4ffe000 ver 1:0 cap d2078c106f0462 ecap f020fe
 
 
   Any idea to debug this ?
 

Boot the host with intel_iommu=on, but if you want 10:00.0, why are you
assigning 2a:00.3?  Note that you can't/shouldn't assign bridges, assign
the actual end device.  Thanks,

Alex
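
(For reference -- distro specifics vary; one common way to make the option
permanent on a GRUB-based host:)

	# /etc/default/grub
	GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
	# then regenerate the config, e.g.:
	#   update-grub                               (Debian/Ubuntu)
	#   grub2-mkconfig -o /boot/grub2/grub.cfg    (Fedora/RHEL)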



Re: access PCI card trouble from VM guest

2013-06-17 Thread David Cure
On Mon, Jun 17, 2013 at 12:00:10PM +0200, David Cure wrote:
 
 10:00.0 Serial controller: Moxa Technologies Co Ltd CP-132EL (2-port
 RS-422/485 Smart PCI Express Serial Board)
 
 pci-assign,configfd=24,host=2a:00.3,id=hostdev0,bus=pci.0,addr=0x6
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

	the pci id for the host is a mistake (I tried to access the network
card too, to see if it works ... and no); I have the right PCI id
(10:00.0) for the card.

David.


Re: access PCI card trouble from VM guest

2013-06-17 Thread David Cure
Hello Alex,

On Mon, Jun 17, 2013 at 07:24:54AM -0600, Alex Williamson wrote:
 
 Boot the host with intel_iommu=on, 

	I added it, and now I can start the VM, thanks a lot. (Will now
check whether I can access it from inside the VM ;))

 but if you want 10:00.0, why are you
 assigning 2a:00.3?  

	just a copy/paste mistake: not the right log ;)

David.



Re: [PATCH] pci: Enable overrides for missing ACS capabilities

2013-06-17 Thread Don Dutile

On 05/30/2013 02:40 PM, Alex Williamson wrote:

PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that
allows us to control whether transactions are allowed to be redirected
in various subnodes of a PCIe topology.  For instance, if two
endpoints are below a root port or downstream switch port, the
downstream port may optionally redirect transactions between the
devices, bypassing upstream devices.  The same can happen internally
on multifunction devices.  The transaction may never be visible to the
upstream devices.

One upstream device that we particularly care about is the IOMMU.  If
a redirection occurs in the topology below the IOMMU, then the IOMMU
cannot provide isolation between devices.  This is why the PCIe spec
encourages topologies to include ACS support.  Without it, we have to
assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.

Unfortunately, far too many topologies do not support ACS to make this
a steadfast requirement.  Even the latest chipsets from Intel are only
sporadically supporting ACS.  We have trouble getting interconnect
vendors to include the PCIe spec required PCIe capability, let alone
suggested features.

Therefore, we need to add some flexibility.  The pcie_acs_override=
boot option lets users opt-in specific devices or sets of devices to
assume ACS support.  The downstream option assumes full ACS support
on root ports and downstream switch ports.  The multifunction
option assumes the subset of ACS features available on multifunction
endpoints and upstream switch ports are supported.  The id:<vid>:<did>
option enables ACS support on devices matching the provided vendor
and device IDs, allowing more strategic ACS overrides.  These options
may be combined in any order.  A maximum of 16 id specific overrides
are available.  It's suggested to use the most limited set of options
necessary to avoid completely disabling ACS across the topology.
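
(A hedged example of combining the options on the kernel command line; the
device IDs are made up, and the comma separator is an assumption, since the
parser excerpt below is cut off before the separator handling:)

	pcie_acs_override=downstream,multifunction,id:1234:abcd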

Note to hardware vendors, we have facilities to permanently quirk
specific devices which enforce isolation but not provide an ACS
capability.  Please contact me to have your devices added and save
your customers the hassle of this boot option.

Signed-off-by: Alex Williamsonalex.william...@redhat.com
---
  Documentation/kernel-parameters.txt |   10 +++
  drivers/pci/quirks.c|  102 +++
  2 files changed, 112 insertions(+)



Feel free to add my ack.

I like the fact that all of this code is in quirks, and not sprinkled between
pci core & quirks.
As we have discovered, even common, ACS-compatible x86 chipsets are out there
w/o ACS caps.
Additionally, an unclear area of the spec has some vendors providing 'null ACS'
caps, which is supposed to be interpreted as no peer-to-peer DMA; this latter
'feature' is being brought up to the PCI-SIG to get a note added to the SRIOV
spec in the ACS-related material to clarify.


diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 47bb23c..a60e6ad 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2349,6 +2349,16 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
nomsi   Do not use MSI for native PCIe PME signaling (this makes
all PCIe root ports use INTx for all services).

+   pcie_acs_override =
+   [PCIE] Override missing PCIe ACS support for:
+   downstream
+   All downstream ports - full ACS capabilities
+   multifunction
+   All multifunction devices - multifunction ACS subset
+   id:<vid>:<did>
+   Specific device - full ACS capabilities
+   Specified as vid:did (vendor/device ID) in hex
+
pcmv=   [HW,PCMCIA] BadgePAD 4

pd. [PARIDE]
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0369fb6..c7609f6 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3292,11 +3292,113 @@ struct pci_dev *pci_get_dma_source(struct pci_dev *dev)
return pci_dev_get(dev);
  }

+static bool acs_on_downstream;
+static bool acs_on_multifunction;
+
+#define NUM_ACS_IDS 16
+struct acs_on_id {
+   unsigned short vendor;
+   unsigned short device;
+};

At first, I wasn't sure if vid/did would be sufficient, but I convinced
myself that it is: if a sub-vid/did variant of a device adds or modifies
an ACS capability, the ACS-cap-existence test in pcie_acs_overrides()
will not exercise any override for that variant, so both cases -- newer,
sub-vid-modified devices with ACS, and the base vid/did without ACS --
will be handled properly.
So, another reason for the ack.


+static struct acs_on_id acs_on_ids[NUM_ACS_IDS];
+static u8 max_acs_id;
+
+static __init int pcie_acs_override_setup(char *p)
+{
+   if (!p)
+   return -EINVAL;
+
+   while (*p) {
+   if (!strncmp(p, "downstream", 10))
+  
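
The quoted parser is cut off above. As an illustration only, here is a
minimal sketch of how such a comma-separated option parser could
continue, reusing the declarations quoted above (acs_on_downstream,
acs_on_multifunction, acs_on_ids, max_acs_id); this is not the literal
patch text:

static __init int pcie_acs_override_setup(char *p)
{
	if (!p)
		return -EINVAL;

	while (*p) {
		if (!strncmp(p, "downstream", 10))
			acs_on_downstream = true;
		else if (!strncmp(p, "multifunction", 13))
			acs_on_multifunction = true;
		else if (!strncmp(p, "id:", 3) && max_acs_id < NUM_ACS_IDS) {
			unsigned short vid, did;

			/* "id:vid:did", both in hex, per the doc hunk above */
			if (sscanf(p + 3, "%hx:%hx", &vid, &did) == 2) {
				acs_on_ids[max_acs_id].vendor = vid;
				acs_on_ids[max_acs_id].device = did;
				max_acs_id++;
			}
		}
		/* advance to the next comma-separated token, if any */
		p += strcspn(p, ",");
		if (*p == ',')
			p++;
	}
	return 0;
}
early_param("pcie_acs_override", pcie_acs_override_setup);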

Re: [PATCH 0/6] KVM: s390: Patches for kvm-next.

2013-06-17 Thread Paolo Bonzini
On 12/06/2013 13:54, Cornelia Huck wrote:
 Hi,
 
 here are some patches that have accumulated in our kvm/s390 patch queue.
 
 There's now support for large pages in the guest, and perf samples can
 be attributed to either the kvm host or the guest. Other than that, some
 cleanup and a bugfix.
 
 Please apply.
 
 Christian Borntraeger (3):
   KVM: s390: Provide function for setting the guest storage key
   KVM: s390: guest large pages
   KVM: s390: Use common waitqueue
 
 Heinz Graalfs (1):
   KVM: s390,perf: Detect if perf samples belong to KVM host or guest
 
 Michael Mueller (1):
   KVM: s390: code cleanup to use common vcpu slab cache
 
 Thomas Huth (1):
   KVM: s390: Fix epsw instruction decoding
 
  arch/s390/include/asm/kvm_host.h   |  8 +++-
  arch/s390/include/asm/perf_event.h | 10 +
  arch/s390/include/asm/pgalloc.h|  3 ++
  arch/s390/kernel/entry64.S |  1 +
  arch/s390/kernel/perf_event.c  | 52 ++
  arch/s390/kernel/s390_ksyms.c  |  1 +
  arch/s390/kvm/interrupt.c  | 18 
  arch/s390/kvm/kvm-s390.c   | 15 ---
  arch/s390/kvm/kvm-s390.h   |  6 +++
  arch/s390/kvm/priv.c   | 89 --
  arch/s390/kvm/sigp.c   | 16 +++
  arch/s390/mm/pgtable.c | 48 
  12 files changed, 238 insertions(+), 29 deletions(-)
 

Applied, thanks.

Paolo


Re: [PATCH 0/2] kvm/mips: ABI fix for 3.10

2013-06-17 Thread Paolo Bonzini
On 10/06/2013 21:33, David Daney wrote:
 From: David Daney david.da...@cavium.com
 
 As requested by Gleb Natapov, we need to define and use KVM_REG_MIPS
 when using the GET_ONE_REG/SET_ONE_REG ioctl.  Since this is part of
 the MIPS kvm support that is new in 3.10, it should be merged before a
 bad ABI leaks out into an 'official' kernel release.
 
 David Daney (2):
   kvm: Add definition of KVM_REG_MIPS
   mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
 
  arch/mips/include/uapi/asm/kvm.h | 81 +++
  arch/mips/kvm/kvm_mips.c | 83 ++--
  include/uapi/linux/kvm.h |  1 +
  3 files changed, 94 insertions(+), 71 deletions(-)
 

CCed people probably already know, but anyway: this is already in
Linus's tree (commit af180b81a3f4ea925fae88878f367e676e99bf73).

Paolo


Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-06-17 Thread Paolo Bonzini
On 10/06/2013 21:52, Scott Wood wrote:
 On 06/09/2013 03:09:21 AM, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 07:16:28PM -0500, Scott Wood wrote:
  Most of these have been posted before, but I grouped them together as
  there are some contextual dependencies between them.
 
  Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
  if there's no objection over the next few days?
 
 Well we are at -rc5 now and Linus specifically said that if he sees one
 more cleanup he will be less than happy [1]. Looks like this patch
 series does have some cleanups that can be postponed to 3.11.
 Patches 1-4,7 look like 3.10 material to me. 5 and 6 are cleanups that can
 wait for 3.11. Not sure about 8; if 8 fixes a serious problem please
 specify it in the commit message.
 
 Agreed.
 
 8 did fix a BUG_ON before patch 7 came along, but now it looks
 non-critical.
 
 5 only affects IRQ tracing, and it's not a regression, so also probably
 not critical.  I'll resend patch 7 so that it applies without needing
 patch 5.
 
 6 is mainly doing things that we originally thought were a fix to lazy
 ee handling, until we noticed code elsewhere handling it in a hackier
 way.  There's still a bugfix in that previously kvm_guest_exit() was
 called in the wrong place which could occasionally mess up virtual time
 accounting, but that's also not a regression and not critical.

CCed people probably already know, but in any case patches 1-4 are
already in Linus's tree (commit af180b81).

Paolo


Re: [PATCH] KVM: x86: remove vcpu's CPL check in host invoked vcpu's xcr set process

2013-06-17 Thread Paolo Bonzini
On 14/06/2013 09:36, Zhanghaoyu (A) wrote:
 The __kvm_set_xcr function does the CPL check when setting an xcr.
 __kvm_set_xcr is called in two flows:
 one is invoked by the guest, call stack shown as below,
 handle_xsetbv (or xsetbv_interception)
   kvm_set_xcr
 __kvm_set_xcr
 the other one is invoked by the host (QEMU), call stack shown as below,
 kvm_arch_vcpu_ioctl
   kvm_vcpu_ioctl_x86_set_xcrs
 __kvm_set_xcr
 
 The former does need the CPL check, but the latter does not.
 
 Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com

What does this patch fix?  I suppose it is some kind of migration
problem since you mentioned QEMU, but I'd rather be sure.  I can fix the
commit message myself when applying.

Thanks,

Paolo


Re: [kvm-unit-test PATCH] kvmclock: serialize RDTSC

2013-06-17 Thread Paolo Bonzini
On 14/06/2013 23:30, Marcelo Tosatti wrote:
 
 Serialize RDTSC so it is executed inside the kvmclock_read
 section.
 
 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=922285
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 diff --git a/x86/kvmclock.c b/x86/kvmclock.c
 index 0624da3..5b831c5 100644
 --- a/x86/kvmclock.c
 +++ b/x86/kvmclock.c
 @@ -177,10 +177,10 @@ cycle_t pvclock_clocksource_read(struct 
 pvclock_vcpu_time_info *src)
  
   do {
   version = pvclock_get_time_values(shadow, src);
 - barrier();
 + mb();
   offset = pvclock_get_nsec_offset(shadow);
   ret = shadow.system_timestamp + offset;
 - barrier();
 + mb();
  } while (version != src->version);
 
   if ((valid_flags & PVCLOCK_RAW_CYCLE_BIT) ||

Applied, thanks.

Paolo
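
For readers wondering why the swap matters: barrier() is only a compiler
barrier, while mb() also emits a hardware fence, so the non-serializing
RDTSC executed while computing the offset can no longer be reordered
outside the version-checked section by the CPU. As a simplified sketch of
the usual x86 definitions (the kvm-unit-tests headers may spell them
slightly differently):

/* compiler barrier: forbids compiler reordering, not CPU reordering */
#define barrier()	asm volatile("" ::: "memory")

/* full memory barrier: mfence constrains the CPU's ordering as well */
#define mb()		asm volatile("mfence" ::: "memory")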


[Bug 59521] KVM linux guest reads uninitialized pvclock values before executing rdmsr MSR_KVM_WALL_CLOCK

2013-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59521





--- Comment #3 from Anonymous Emailer anonym...@kernel-bugs.osdl.org  
2013-06-17 15:50:27 ---
Reply-To: pbonz...@redhat.com

On 15/06/2013 19:17, bugzilla-dae...@bugzilla.kernel.org wrote:
 The problem is in cpu_init() which is called earlier.
 cpu_init() calls printk and possibly other stuff which can use timestamps.
 printk calls local_clock() to obtain a timestamp of a log message. On KVM
 guests the call sequence usually ends up in kvm_clock_read, but the needed
 rdmsr is executed only in x86_cpuinit.early_percpu_clock_init().
 
 I consider two approaches to fix the problem:
 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init();
 + Simple
 - We will get excessive restrictions on the operations which are allowed
 to be performed in early_percpu_clock_init(), because percpu-specific data
 is initialized only in cpu_init().

Considering how simple kvm_register_clock is, I think this is
preferable if it works.  Ironically, commit 7069ed6 (x86: kvmclock:
allocate pvclock shared memory area, 2012-11-27), which introduced the
regression, is what should make this simpler fix possible.

Paolo

 2. Return 0ULL from kvm_clock_read until it is initialized.
 + Simple too
 - Additional if statement inside kvm_clock_read (not serious even for
 performance paranoiacs)
 - Returning 0ULL looks ok because it is the same thing the kernel bootstrap
 CPU does in early boot stages. But I am not quite sure; better to ask the
 guys who maintain the relevant subsystem.
 
 I prefer the second way. It doesn't add complex restrictions to CPU bootup
 code. I'll send a patch soon which fixes the problem in the second way.
 
 I don't propagate such logic to levels higher than the KVM clocksource
 (pv_time_ops level for example) because of the following code:
 void __init kvmclock_init(void)
 ...
 263 pv_time_ops.sched_clock = kvm_clock_read;
 264 x86_platform.calibrate_tsc = kvm_get_tsc_khz;
 265 x86_platform.get_wallclock = kvm_get_wallclock;
 266 x86_platform.set_wallclock = kvm_set_wallclock;
 267 #ifdef CONFIG_X86_LOCAL_APIC
 268 x86_cpuinit.early_percpu_clock_init =
 269 kvm_setup_secondary_clock;
 270 #endif
 271 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
 272 x86_platform.restore_sched_clock_state =
 kvm_restore_sched_clock_state;
 
 To propagate the logic I need to make changes both in x86_platform and
 pv_time_ops also I should make a similar fix for ia64 arch. It needs some
 subsystems refactoring to make the changes clean. Dont' think that its worth 
 to
 fix the bug. Better to make a simple fix.
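
As a sketch of option 2, with kvm_clock_initialized as a hypothetical
flag (set once the MSR write has registered the per-cpu pvclock area;
the name is illustrative, not from any posted patch):

static cycle_t kvm_clock_read(void)
{
	struct pvclock_vcpu_time_info *src;
	cycle_t ret;

	/* Hypothetical flag: set after MSR_KVM_SYSTEM_TIME_NEW has been
	 * written for this CPU; before that the pvclock area is not valid. */
	if (!kvm_clock_initialized)
		return 0;	/* same as the bootstrap CPU reports very early */

	preempt_disable_notrace();
	src = &hv_clock[smp_processor_id()].pvti;
	ret = pvclock_clocksource_read(src);
	preempt_enable_notrace();
	return ret;
}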






Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-17 Thread Paolo Bonzini
On 16/06/2013 02:25, Stefan Pietsch wrote:
 Bisecting leads to
 
 git bisect bad 378a8b099fc207ddcb91b19a8c1457667e0af398
 git bisect good 007a3b547512d69f67ceb9641796d64552bd337e
 git bisect good 1f3141e80b149e7215313dff29e9a0c47811b1d1
 git bisect good 286da4156dc65c8a054580fdd96b7709132dce8d
 git bisect bad 25391454e73e3156202264eb3c473825afe4bc94
 git bisect good 218e763f458c44f30041c1b48b4371e130fd4317
 
 
 first bad commit: [25391454e73e3156202264eb3c473825afe4bc94]
 KVM: VMX: don't clobber segment AR of unusable segments.
 
 25391454e73e3156202264eb3c473825afe4bc94
 emulate_invalid_guest_state=0 - hangs and shows KVM: entry failed
 emulate_invalid_guest_state=1 - hangs
 
 Please note, I had to compile some revisions with
 3f0c3d0bb2bcc4b88b22452a7cf0073ee9a0f1e6 applied, because of
 9ae9febae9500a0a6f5ce29ee4b8d942b5332529.

Can you please execute "info registers" and "x/10i $pc" from the QEMU
monitor at the time of the hang, and include the output?  Using
"-monitor stdio" or the new GTK+ interface can help.

Also, can you run under tracing (for information on how to do this, see
http://www.linux-kvm.org/page/Tracing) and include the bottom of the log?

Thanks,

Paolo



Re: [PATCH 4/4] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-17 Thread Paolo Bonzini
On 05/06/2013 08:11, Alexey Kardashevskiy wrote:
 +/*
 + * The KVM guest can be backed with 16MB pages (qemu switch
 + * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/).

Nitpick: we try to avoid references to QEMU, so perhaps

s/qemu switch/for example, with QEMU you can use the command-line option/

Paolo
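
Applied to the quoted hunk, that substitution would make the comment read:

/*
 * The KVM guest can be backed with 16MB pages (for example, with QEMU
 * you can use the command-line option
 * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/).
 */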


Re: [PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated

2013-06-17 Thread Luiz Capitulino
On Wed, 5 Jun 2013 21:18:37 -0400
Luiz Capitulino lcapitul...@redhat.com wrote:

 The balloon_page_dequeue() function can return NULL. If it does for
 the first page being freed, then leak_balloon() will create a
 scatterlist with len=0, which in turn seems to generate an invalid
 virtio request.
 
 I didn't hit this in practice; I found it by code review. On the other
 hand, such an invalid virtio request will cause errors in QEMU, and
 fill_balloon() already performs the same check implemented by this commit.
 
 Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
 Acked-by: Rafael Aquini aqu...@redhat.com

Andrew, can you pick this one?

 ---
 
 o v2
 
  - Improve changelog
 
  drivers/virtio/virtio_balloon.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index bd3ae32..71af7b5 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t num)
* virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
* is true, we *have* to do it in this order
*/
  - tell_host(vb, vb->deflate_vq);
  + if (vb->num_pfns != 0)
  + tell_host(vb, vb->deflate_vq);
    mutex_unlock(&vb->balloon_lock);
   release_pages_by_pfn(vb->pfns, vb->num_pfns);
  }
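
For context, a paraphrased sketch of the whole function with the fix in
place, lightly simplified from the 3.x driver -- the dequeue loop can
break on its very first iteration, which is how num_pfns can
legitimately be 0:

static void leak_balloon(struct virtio_balloon *vb, size_t num)
{
	struct page *page;

	mutex_lock(&vb->balloon_lock);
	for (vb->num_pfns = 0; vb->num_pfns < num;
	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
		page = balloon_page_dequeue(vb->vb_dev_info);
		if (!page)
			break;	/* can happen immediately: num_pfns stays 0 */
		set_page_pfns(vb->pfns + vb->num_pfns, page);
		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
	}

	/* the fix: never hand the host a zero-length scatterlist */
	if (vb->num_pfns != 0)
		tell_host(vb, vb->deflate_vq);
	mutex_unlock(&vb->balloon_lock);
	release_pages_by_pfn(vb->pfns, vb->num_pfns);
}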



[Bug 59521] KVM linux guest reads uninitialized pvclock values before executing rdmsr MSR_KVM_WALL_CLOCK

2013-06-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59521





--- Comment #4 from Eugene Batalov eabatalo...@gmail.com  2013-06-17 21:29:08 ---
(In reply to comment #3)
 Reply-To: pbonz...@redhat.com
 
 On 15/06/2013 19:17, bugzilla-dae...@bugzilla.kernel.org wrote:
  The problem is in cpu_init() which is called earlier.
  cpu_init() calls printk and possibly other stuff which can use timestamps.
  printk calls local_clock() to obtain a timestamp of a log message. On KVM
  guests the call sequence usually ends up in kvm_clock_read, but the needed
  rdmsr is executed only in x86_cpuinit.early_percpu_clock_init().
  
  I consider two approaches to fix the problem:
  1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init();
  + Simple
  - We will get excessive restrictions on the operations which are allowed
  to be performed in early_percpu_clock_init(), because percpu-specific data
  is initialized only in cpu_init().
 
 Considering how simple kvm_register_clock is, I think this is
 preferable if it works.  Ironically, commit 7069ed6 (x86: kvmclock:
 allocate pvclock shared memory area, 2012-11-27), which introduced the
 regression, is what should make this simpler fix possible.
 
 Paolo

Understood your point. I'll test this fix and report the results.





Re: [PATCH] KVM: x86: remove vcpu's CPL check in host invoked vcpu's xcr set process

2013-06-17 Thread Zhanghaoyu (A)
 The __kvm_set_xcr function does the CPL check when setting an xcr.
 __kvm_set_xcr is called in two flows: one is invoked by the guest,
 call stack shown as below,
 handle_xsetbv (or xsetbv_interception)
   kvm_set_xcr
 __kvm_set_xcr
 the other one is invoked by the host (QEMU), call stack shown as below,
 kvm_arch_vcpu_ioctl
   kvm_vcpu_ioctl_x86_set_xcrs
 __kvm_set_xcr
 
 The former does need the CPL check, but the latter does not.
 
 Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com

What does this patch fix?  I suppose it is some kind of migration problem 
since you mentioned QEMU, but I'd rather be sure.  I can fix the commit 
message myself when applying.
This patch tries to fix a problem with system_reset as invoked by the qemu 
monitor command [system_reset] or the virsh command [virsh reset domain].
QEMU resets the domain on receiving the reset request from the qemu monitor 
or libvirtd; the reset flow is shown below:
main_loop_should_exit
|- pause_all_vcpus
|- cpu_synchronize_all_states
|- qemu_system_reset
|-- cpu_synchronize_all_post_reset
|--- cpu_synchronize_post_reset
| kvm_cpu_synchronize_post_reset
|- kvm_arch_put_registers
|-- kvm_put_xcrs
|--- kvm_vcpu_ioctl(CPU(cpu), KVM_SET_XCRS, xcrs)
The above ioctl syscall traps to kernel space, and the KVM hypervisor then 
deals with the xcr set request:
kvm_arch_vcpu_ioctl
|- kvm_vcpu_ioctl_x86_set_xcrs
|-- __kvm_set_xcr
|--- if (kvm_x86_ops->get_cpl(vcpu) != 0)
  return 1;
It can quite easily happen that the vcpu's CPL is 3 while the QEMU monitor is 
doing the reset operation, which is a normal case, so the CPL check should be 
removed from this path.

But __kvm_set_xcr is also called in the flow below, which is invoked by the guest:
handle_xsetbv(or xsetbv_interception)
|- kvm_set_xcr
|-- __kvm_set_xcr
In this case the vcpu's CPL check must be performed; if the vcpu's CPL != 0, a #GP 
exception should be injected into the VM.
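
A sketch of one way to express this split -- the CPL check lives only in
the guest-triggered wrapper, while the host ioctl path keeps calling
__kvm_set_xcr() directly (illustrative; the eventual patch may be
structured differently):

int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
	/* Guest-initiated xsetbv: privileged, so #GP unless CPL == 0.
	 * kvm_vcpu_ioctl_x86_set_xcrs() calls __kvm_set_xcr() directly
	 * and therefore skips this check. */
	if (kvm_x86_ops->get_cpl(vcpu) != 0 ||
	    __kvm_set_xcr(vcpu, index, xcr)) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}
	return 0;
}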

Thanks,
Zhang Haoyu


Thanks,

Paolo


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Alex Williamson
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security, they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to set up an IOMMU domain and do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel? I.e., I'm in KVM code and I get
 that vfio fd -- what do I do with it?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM however to call into VFIO directly
 right ?

Right, we don't want to create dependencies across modules.  I don't
have a vision for how this should work.  This is effectively a complete
side-band to vfio, so we're really just dealing in the iommu group
space.  Maybe there needs to be some kind of registration of ownership
for the group using some kind of token.  It would need to include some
kind of notification when that ownership ends.  That might also be a
convenient tag to toggle driver probing off for devices in the group.
Other ideas?  Thanks,

Alex
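
To make that shape concrete, a purely hypothetical sketch -- none of
these names exist in the kernel; it only illustrates the
token-plus-release-notification idea being floated:

/* Hypothetical API, for illustration only. */
struct iommu_group_ownership {
	void *token;			/* opaque proof of ownership */
	void (*released)(void *token);	/* called when ownership ends */
};

/* Claim a group on behalf of a user (e.g. a KVM VM); fails if the
 * group is already owned. Could also toggle driver probing off for
 * devices in the group. */
int iommu_group_claim(struct iommu_group *group,
		      struct iommu_group_ownership *owner);

/* Drop the claim; triggers owner->released() for any registered peer. */
void iommu_group_release(struct iommu_group *group);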



Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Benjamin Herrenschmidt
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

All of that smells nasty like it will need a pile of bloody
infrastructure which makes me think it's too complicated and not the
right approach.

How does access control work today on x86/VFIO? Can you give me a bit
more detail? I didn't get a good grasp of it from your previous email.

From the look of it, the VFIO file descriptor is what has the access
control to the underlying iommu, is this right ? So we somewhat need to
transfer (or copy) that ownership from the VFIO fd to the KVM VM.

I don't see a way to do that without some cross-layering here...

Rusty, are you aware of some kernel mechanism we can use for that ?

Cheers,
Ben.
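
For the record, the userspace side of that flow today looks roughly like
the following; holding the group fd (granted via permissions on
/dev/vfio/<group>) is what proves access, and group number 26 here is
just an example:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

void attach_group_example(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);	/* example group */

	/* Possession of the group fd is the proof of privilege; binding
	 * it to a container is what unlocks IOMMU domain setup and DMA
	 * mappings through the container fd. */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
}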




Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexey Kardashevskiy
On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.

 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in the virtual
 mode in the case if the real mode handler failed for some reason.

 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.

 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.

 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 Only a few minor nits. Ben already commented on implementation details.
 

 ---
 Changelog:
 2013/06/05:
 * fixed mistype about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number

 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with failthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
 ---
 Documentation/virtual/kvm/api.txt   |   17 ++
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
 arch/powerpc/kvm/book3s_64_vio.c|  118 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  266 
 +++
 arch/powerpc/kvm/book3s_hv.c|   39 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
 arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
 arch/powerpc/kvm/powerpc.c  |3 +
 include/uapi/linux/kvm.h|1 +
 10 files changed, 473 insertions(+), 32 deletions(-)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 5f91eda..6c082ff 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to 
 userspace to be
 handled.


 +4.83 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability tells the guest that multiple TCE entry add/remove 
 hypercalls
 +handling is supported by the kernel. This significantly accelerates DMA
 +operations for PPC KVM guests.
 +
 +Unlike other capabilities in this section, this one does not have an ioctl.
 +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
 +H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed 
 to
 +the guest. Otherwise it might be better for the guest to continue using 
 H_PUT_TCE
 +hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 

 While this describes perfectly well what the consequences are of the
 patches, it does not describe properly what the CAP actually expresses.
 The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls directly. All other consequences are nice to
 document, but the semantics of the CAP are missing.


? It expresses ability to handle 2 hcalls. What is missing?


 We also usually try to keep KVM behavior unchanged with regards to older
 versions until a CAP is enabled. In this case I don't think it matters
 all that much, so I'm fine with declaring it as enabled by default.
 Please document that this is a change in behavior versus older KVM
 versions though.


Ok!


 +
 +
 5. The kvm_run structure
 

 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index af326cd..85d8f26 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
  spinlock_t tbacct_lock;
  u64 busy_stolen;
  u64 busy_preempt;
 +
 +unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hcall */
 #endif
 };
 
 [...]


 
 [...]
 
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index 

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:

 On 06/17/2013 06:02 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.
 
 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in the virtual
 mode in the case if the real mode handler failed for some reason.
 
 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.
 
 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 Only a few minor nits. Ben already commented on implementation details.
 
 
 ---
 Changelog:
 2013/06/05:
 * fixed mistype about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number
 
 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with failthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
 ---
 Documentation/virtual/kvm/api.txt   |   17 ++
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
 arch/powerpc/kvm/book3s_64_vio.c|  118 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  266 
 +++
 arch/powerpc/kvm/book3s_hv.c|   39 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
 arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
 arch/powerpc/kvm/powerpc.c  |3 +
 include/uapi/linux/kvm.h|1 +
 10 files changed, 473 insertions(+), 32 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 5f91eda..6c082ff 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed 
 to userspace to be
 handled.
 
 
 +4.83 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability tells the guest that multiple TCE entry add/remove 
 hypercalls
 +handling is supported by the kernel. This significantly accelerates DMA
 +operations for PPC KVM guests.
 +
 +Unlike other capabilities in this section, this one does not have an 
 ioctl.
 +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
 +H_STUFF_TCE hypercalls are to be handled in the host kernel and not 
 passed to
 +the guest. Otherwise it might be better for the guest to continue using 
 H_PUT_TCE
 +hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
 While this describes perfectly well what the consequences are of the
 patches, it does not describe properly what the CAP actually expresses.
 The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls directly. All other consequences are nice to
 document, but the semantics of the CAP are missing.
 
 
 ? It expresses ability to handle 2 hcalls. What is missing?
 
 You don't describe the kvm <-> qemu interface. You describe some decisions 
 qemu can take from this cap.
 
 
 This file does not mention qemu at all. And the interface is: qemu (or
 kvmtool could do that) just adds "hcall-multi-tce" to
 "ibm,hypertas-functions", but this is for pseries Linux, and AIX could
 always do it (no idea about it). Does it really have to be in this file?

Ok, let's go back a step. What does this CAP describe? Don't look at the 
description you wrote above. Just write a new one. What exactly can user space 
expect when it finds this CAP?

 
 
 
 We also usually try to keep KVM behavior unchanged with regards to older
 versions until a CAP is enabled. In this case I don't think it matters
 all that 

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 17.06.2013, at 10:37, Benjamin Herrenschmidt wrote:

 On Mon, 2013-06-17 at 17:55 +1000, Alexey Kardashevskiy wrote:
 David:
 ===
 So, in the case of MULTITCE, that's not quite right.  PR KVM can
 emulate a PAPR system on a BookE machine, and there's no reason not to
 allow TCE acceleration as well.  We can't make it dependent on PAPR
 mode being selected, because that's enabled per-vcpu, whereas these
 capabilities are queried on the VM before the vcpus are created.
 ===
 
 Wrong?
 
 The capability just tells qemu that the kernel supports it; it doesn't have
 to depend on PAPR mode, qemu can sort things out, no?

Yes, this goes hand-in-hand with the documentation bit I'm trying to get 
through to Alexey atm. The CAP merely says that if in PAPR mode the kernel can 
handle hypercalls X and Y itself.

This is true for all book3s implementations as the patches stand. It is not 
true for BookE as the patches stand. Hence the CAP should be limited to book3s, 
regardless of its mode :).


Alex
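
In concrete terms, the expected userspace probe is a sketch like this
(assuming KVM_CAP_PPC_MULTITCE from this series is in the installed
headers; advertise_hcall_multi_tce() is a hypothetical helper standing
in for whatever adds "hcall-multi-tce" to ibm,hypertas-functions):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

void probe_multitce(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);

	/* >0 means the kernel itself handles H_PUT_TCE_INDIRECT and
	 * H_STUFF_TCE, so advertising the hcalls to the guest is safe. */
	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MULTITCE) > 0)
		advertise_hcall_multi_tce();	/* hypothetical helper */
}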



Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexey Kardashevskiy
On 06/17/2013 06:40 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 06:02 PM, Alexander Graf wrote:
 
 On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:
 
 On 06/17/2013 08:06 AM, Alexander Graf wrote:
 
 On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:
 
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and 
 H_STUFF_TCE hypercalls for QEMU emulated devices such as
 IBMVIO devices or emulated PCI.  These calls allow adding
 multiple entries (up to 512) into the TCE table in one call
 which saves time on transition to/from real mode.
 
 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs 
 (copied from user and verified) before writing the whole list
 into the TCE table. This cache will be utilized more in the
 upcoming VFIO/IOMMU support to continue TCE list processing in
 the virtual mode in the case if the real mode handler failed
 for some reason.
 
 This adds a guest physical to host real address converter and
 calls the existing H_PUT_TCE handler. The converting function 
 is going to be fully utilized by upcoming VFIO supporting
 patches.
 
 This also implements the KVM_CAP_PPC_MULTITCE capability, so
 in order to support the functionality of this patch, QEMU 
 needs to query for this capability and set the
 hcall-multi-tce hypertas property only if the capability is
 present, otherwise there will be serious performance
 degradation.
 
 Cc: David Gibson da...@gibson.dropbear.id.au Signed-off-by:
 Alexey Kardashevskiy a...@ozlabs.ru Signed-off-by: Paul
 Mackerras pau...@samba.org
 
 Only a few minor nits. Ben already commented on implementation
 details.
 
 
 --- Changelog: 2013/06/05: * fixed mistype about IBMVIO in the
 commit message * updated doc and moved it to another section *
 changed capability number
 
 2013/05/21: * added kvm_vcpu_arch::tce_tmp * removed cleanup
 if put_indirect failed, instead we do not even start writing
 to TCE table if we cannot get TCEs from the user and they are 
 invalid * kvmppc_emulated_h_put_tce is split to
 kvmppc_emulated_put_tce and kvmppc_emulated_validate_tce (for
 the previous item) * fixed bug with failthrough for H_IPI *
 removed all get_user() from real mode handlers *
 kvmppc_lookup_pte() added (instead of making lookup_linux_pte
 public) --- Documentation/virtual/kvm/api.txt   |   17 ++ 
 arch/powerpc/include/asm/kvm_host.h |2 + 
 arch/powerpc/include/asm/kvm_ppc.h  |   16 +- 
 arch/powerpc/kvm/book3s_64_vio.c|  118 ++ 
 arch/powerpc/kvm/book3s_64_vio_hv.c |  266
 +++ arch/powerpc/kvm/book3s_hv.c
 |   39 + arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 + 
 arch/powerpc/kvm/book3s_pr_papr.c   |   37 - 
 arch/powerpc/kvm/powerpc.c  |3 + 
 include/uapi/linux/kvm.h|1 + 10 files
 changed, 473 insertions(+), 32 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt
 b/Documentation/virtual/kvm/api.txt index 5f91eda..6c082ff
 100644 --- a/Documentation/virtual/kvm/api.txt +++
 b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,23 @@
 calls by the guest for that service will be passed to
 userspace to be handled.
 
 
 +4.83 KVM_CAP_PPC_MULTITCE + +Capability:
 KVM_CAP_PPC_MULTITCE +Architectures: ppc +Type: vm + +This
 capability tells the guest that multiple TCE entry add/remove
 hypercalls +handling is supported by the kernel. This
 significantly accelerates DMA +operations for PPC KVM guests. 
 + +Unlike other capabilities in this section, this one does
 not have an ioctl. +Instead, when the capability is present,
 the H_PUT_TCE_INDIRECT and +H_STUFF_TCE hypercalls are to be
 handled in the host kernel and not passed to +the guest.
 Otherwise it might be better for the guest to continue using
 H_PUT_TCE +hypercall (if KVM_CAP_SPAPR_TCE or
 KVM_CAP_SPAPR_TCE_IOMMU are present).
 
 
 While this describes perfectly well what the consequences are of
 the patches, it does not describe properly what the CAP actually
 expresses. The CAP only says this kernel is able to handle
 H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls directly. All
 other consequences are nice to document, but the semantics of
 the CAP are missing.
 
 
 ? It expresses ability to handle 2 hcalls. What is missing?
 
 You don't describe the kvm <-> qemu interface. You describe some
 decisions qemu can take from this cap.
 
 
 This file does not mention qemu at all. And the interface is - qemu
 (or kvmtool could do that) just adds hcall-multi-tce to 
 ibm,hypertas-functions but this is for pseries linux and AIX could
 always do it (no idea about it). Does it really have to be in this
 file?
 

 Ok, let's go back a step. What does this CAP describe? Don't look at the
 description you wrote above. Just write a new one.

The CAP means the kernel is capable of handling hcalls A and B without
passing them to user space. That accelerates DMA.


 What exactly can user space expect when it finds this 

Re: [PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-06-17 Thread Alexey Kardashevskiy
On 06/16/2013 02:26 PM, Benjamin Herrenschmidt wrote:
 +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
 +int realmode_get_page(struct page *page)
 +{
 +if (PageCompound(page))
 +return -EAGAIN;
 +
 +get_page(page);
 +
 +return 0;
 +}
 +EXPORT_SYMBOL_GPL(realmode_get_page);
 +
 +int realmode_put_page(struct page *page)
 +{
 +if (PageCompound(page))
 +return -EAGAIN;
 +
 +if (!atomic_add_unless(&page->_count, -1, 1))
 +return -EAGAIN;
 +
 +return 0;
 +}
 +EXPORT_SYMBOL_GPL(realmode_put_page);
 +#endif
 
 Several worries here, mostly that if the generic code ever changes
 (something gets added to get_page() that makes it no-longer safe for use
 in real mode for example, or some other condition gets added to
 put_page()), we go out of sync and potentially end up with very hard and
 very subtle bugs.
 
 It might be worth making sure that:
 
  - This is reviewed by some generic VM people (and make sure they
 understand why we need to do that)
 
  - A comment is added to get_page() and put_page() to make sure that if
 they are changed in any way, dbl check the impact on our
 realmode_get_page() (or ping us to make sure things are still ok).

After changing get_page() to get_page_unless_zero(), the get_page API I use is:
get_page_unless_zero() - basically atomic_inc_not_zero()
atomic_add_unless() - just operates on the counter
PageCompound() - checks whether it is a huge page.

No usage of get_page or put_page.

If any of those changes, I would expect it to hit us immediately, no?

So it may only make sense to add a comment to PageCompound(). But the
comment says PageCompound is generally not used in hot code paths, and
our path is hot. Heh.

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..c70a654 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -329,7 +329,8 @@ static inline void set_page_writeback(struct page *page)
  * System with lots of page flags available. This allows separate
  * flags for PageHead() and PageTail() checks of compound pages so that bit
  * tests can be used in performance sensitive paths. PageCompound is
- * generally not used in hot code paths.
+ * generally not used in hot code paths except arch/powerpc/mm/init_64.c
+ * which uses it to detect huge pages and avoid handling those in real mode.
  */
 __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
 __PAGEFLAG(Tail, tail)


So?
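
So, after that change, the helper presumably looks like this (a sketch
reconstructed from the description above, not a quoted patch):

int realmode_get_page(struct page *page)
{
	if (PageCompound(page))		/* huge page: fall back to virtual mode */
		return -EAGAIN;

	/* atomic_inc_not_zero(&page->_count): never resurrects a freed
	 * page, and is safe to execute in real mode. */
	if (!get_page_unless_zero(page))
		return -EAGAIN;

	return 0;
}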


-- 
Alexey


Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 06/17/2013 10:51 AM, Alexey Kardashevskiy wrote:

On 06/17/2013 06:40 PM, Alexander Graf wrote:

On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:


On 06/17/2013 06:02 PM, Alexander Graf wrote:

On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:


On 06/17/2013 08:06 AM, Alexander Graf wrote:

On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:


This adds real mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for QEMU emulated devices such as
IBMVIO devices or emulated PCI.  These calls allow adding
multiple entries (up to 512) into the TCE table in one call
which saves time on transition to/from real mode.

This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
(copied from user and verified) before writing the whole list
into the TCE table. This cache will be utilized more in the
upcoming VFIO/IOMMU support to continue TCE list processing in
the virtual mode in the case if the real mode handler failed
for some reason.

This adds a guest physical to host real address converter and
calls the existing H_PUT_TCE handler. The converting function
is going to be fully utilized by upcoming VFIO supporting
patches.

This also implements the KVM_CAP_PPC_MULTITCE capability, so
in order to support the functionality of this patch, QEMU
needs to query for this capability and set the
hcall-multi-tce hypertas property only if the capability is
present, otherwise there will be serious performance
degradation.

Cc: David Gibson <da...@gibson.dropbear.id.au>  Signed-off-by:
Alexey Kardashevskiy <a...@ozlabs.ru>  Signed-off-by: Paul
Mackerras <pau...@samba.org>

Only a few minor nits. Ben already commented on implementation
details.


--- Changelog: 2013/06/05: * fixed mistype about IBMVIO in the
commit message * updated doc and moved it to another section *
changed capability number

2013/05/21: * added kvm_vcpu_arch::tce_tmp * removed cleanup
if put_indirect failed, instead we do not even start writing
to TCE table if we cannot get TCEs from the user and they are
invalid * kvmppc_emulated_h_put_tce is split to
kvmppc_emulated_put_tce and kvmppc_emulated_validate_tce (for
the previous item) * fixed bug with failthrough for H_IPI *
removed all get_user() from real mode handlers *
kvmppc_lookup_pte() added (instead of making lookup_linux_pte
public) --- Documentation/virtual/kvm/api.txt   |   17 ++
arch/powerpc/include/asm/kvm_host.h |2 +
arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
arch/powerpc/kvm/book3s_64_vio.c|  118 ++
arch/powerpc/kvm/book3s_64_vio_hv.c |  266
+++ arch/powerpc/kvm/book3s_hv.c
|   39 + arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
arch/powerpc/kvm/powerpc.c  |3 +
include/uapi/linux/kvm.h|1 + 10 files
changed, 473 insertions(+), 32 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt index 5f91eda..6c082ff
100644 --- a/Documentation/virtual/kvm/api.txt +++
b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,23 @@
calls by the guest for that service will be passed to
userspace to be handled.


+4.83 KVM_CAP_PPC_MULTITCE + +Capability:
KVM_CAP_PPC_MULTITCE +Architectures: ppc +Type: vm + +This
capability tells the guest that multiple TCE entry add/remove
hypercalls +handling is supported by the kernel. This
significantly accelerates DMA +operations for PPC KVM guests.
+ +Unlike other capabilities in this section, this one does
not have an ioctl. +Instead, when the capability is present,
the H_PUT_TCE_INDIRECT and +H_STUFF_TCE hypercalls are to be
handled in the host kernel and not passed to +the guest.
Otherwise it might be better for the guest to continue using
H_PUT_TCE +hypercall (if KVM_CAP_SPAPR_TCE or
KVM_CAP_SPAPR_TCE_IOMMU are present).

While this describes perfectly well what the consequences are of
the patches, it does not describe properly what the CAP actually
expresses. The CAP only says this kernel is able to handle
H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls directly. All
other consequences are nice to document, but the semantics of
the CAP are missing.


? It expresses ability to handle 2 hcalls. What is missing?

You don't describe the kvm <-> qemu interface. You describe some
decisions qemu can take from this cap.


This file does not mention qemu at all. And the interface is - qemu
(or kvmtool could do that) just adds hcall-multi-tce to
ibm,hypertas-functions but this is for pseries linux and AIX could
always do it (no idea about it). Does it really have to be in this
file?

Ok, let's go back a step. What does this CAP describe? Don't look at the
description you wrote above. Just write a new one.

The CAP means the kernel is capable of handling hcalls A and B without
passing them to user space. That accelerates DMA.



What exactly can user space expect when it finds this CAP?

The user space can expect that its handlers for A and B are not 

Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-17 Thread Alexander Graf

On 06/17/2013 12:46 PM, Alexander Graf wrote:

On 06/17/2013 10:51 AM, Alexey Kardashevskiy wrote:

On 06/17/2013 06:40 PM, Alexander Graf wrote:

On 17.06.2013, at 10:34, Alexey Kardashevskiy wrote:


On 06/17/2013 06:02 PM, Alexander Graf wrote:

On 17.06.2013, at 09:55, Alexey Kardashevskiy wrote:


On 06/17/2013 08:06 AM, Alexander Graf wrote:

On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:


This adds real mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for QEMU emulated devices such as
IBMVIO devices or emulated PCI.  These calls allow adding
multiple entries (up to 512) into the TCE table in one call
which saves time on transition to/from real mode.

This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
(copied from user and verified) before writing the whole list
into the TCE table. This cache will be utilized more in the
upcoming VFIO/IOMMU support to continue TCE list processing in
the virtual mode in the case if the real mode handler failed
for some reason.

This adds a guest physical to host real address converter and
calls the existing H_PUT_TCE handler. The converting function
is going to be fully utilized by upcoming VFIO supporting
patches.

This also implements the KVM_CAP_PPC_MULTITCE capability, so
in order to support the functionality of this patch, QEMU
needs to query for this capability and set the
hcall-multi-tce hypertas property only if the capability is
present, otherwise there will be serious performance
degradation.

Cc: David Gibson <da...@gibson.dropbear.id.au>  Signed-off-by:
Alexey Kardashevskiy <a...@ozlabs.ru>  Signed-off-by: Paul
Mackerras <pau...@samba.org>

Only a few minor nits. Ben already commented on implementation
details.


--- Changelog: 2013/06/05: * fixed mistype about IBMVIO in the
commit message * updated doc and moved it to another section *
changed capability number

2013/05/21: * added kvm_vcpu_arch::tce_tmp * removed cleanup
if put_indirect failed, instead we do not even start writing
to TCE table if we cannot get TCEs from the user and they are
invalid * kvmppc_emulated_h_put_tce is split to
kvmppc_emulated_put_tce and kvmppc_emulated_validate_tce (for
the previous item) * fixed bug with failthrough for H_IPI *
removed all get_user() from real mode handlers *
kvmppc_lookup_pte() added (instead of making lookup_linux_pte
public) --- Documentation/virtual/kvm/api.txt   |   17 ++
arch/powerpc/include/asm/kvm_host.h |2 +
arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
arch/powerpc/kvm/book3s_64_vio.c|  118 ++
arch/powerpc/kvm/book3s_64_vio_hv.c |  266
+++ arch/powerpc/kvm/book3s_hv.c
|   39 + arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
arch/powerpc/kvm/powerpc.c  |3 +
include/uapi/linux/kvm.h|1 + 10 files
changed, 473 insertions(+), 32 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt index 5f91eda..6c082ff
100644 --- a/Documentation/virtual/kvm/api.txt +++
b/Documentation/virtual/kvm/api.txt @@ -2362,6 +2362,23 @@
calls by the guest for that service will be passed to
userspace to be handled.


+4.83 KVM_CAP_PPC_MULTITCE + +Capability:
KVM_CAP_PPC_MULTITCE +Architectures: ppc +Type: vm + +This
capability tells the guest that multiple TCE entry add/remove
hypercalls +handling is supported by the kernel. This
significantly accelerates DMA +operations for PPC KVM guests.
+ +Unlike other capabilities in this section, this one does
not have an ioctl. +Instead, when the capability is present,
the H_PUT_TCE_INDIRECT and +H_STUFF_TCE hypercalls are to be
handled in the host kernel and not passed to +the guest.
Otherwise it might be better for the guest to continue using
H_PUT_TCE +hypercall (if KVM_CAP_SPAPR_TCE or
KVM_CAP_SPAPR_TCE_IOMMU are present).

While this describes perfectly well what the consequences are of
the patches, it does not describe properly what the CAP actually
expresses. The CAP only says this kernel is able to handle
H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls directly. All
other consequences are nice to document, but the semantics of
the CAP are missing.


? It expresses ability to handle 2 hcalls. What is missing?

You don't describe the kvm <-> qemu interface. You describe some
decisions qemu can take from this cap.


This file does not mention qemu at all. And the interface is - qemu
(or kvmtool could do that) just adds hcall-multi-tce to
ibm,hypertas-functions but this is for pseries linux and AIX could
always do it (no idea about it). Does it really have to be in this
file?
Ok, let's go back a step. What does this CAP describe? Don't look at the
description you wrote above. Just write a new one.

The CAP means the kernel is capable of handling hcalls A and B without
passing them to user space. That accelerates DMA.



What exactly can user space expect when it finds this CAP?
The user space can 

Re: [PATCH 0/8] kvm/ppc: fixes for 3.10

2013-06-17 Thread Paolo Bonzini
Il 10/06/2013 21:52, Scott Wood ha scritto:
 On 06/09/2013 03:09:21 AM, Gleb Natapov wrote:
 On Thu, Jun 06, 2013 at 07:16:28PM -0500, Scott Wood wrote:
  Most of these have been posted before, but I grouped them together as
  there are some contextual dependencies between them.
 
  Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these
  if there's no objection over the next few days?
 
 Well we are at -rc5 now and Linus specifically said that if he sees one
 more cleanup he will be less then happy [1]. Looks like this patch
 series does have some cleanups that can be postponed to 3.11.
 Patches 1-4,7 looks like 3.10 material to me. 5 and 6 a cleanups that can
 wait for 3.11. Not sure about 8, if 8 fixes serious problem please
 specify it in the commit message.
 
 Agreed.
 
 8 did fix a BUG_ON before patch 7 came along, but now it looks
 non-critical.
 
 5 only affects IRQ tracing, and it's not a regression, so also probably
 not critical.  I'll resend patch 7 so that it applies without needing
 patch 5.
 
 6 is mainly doing things that we originally thought were a fix to lazy
 EE handling, until we noticed code elsewhere handling it in a hackier
 way.  There's still a bugfix in that previously kvm_guest_exit() was
 called in the wrong place, which could occasionally mess up virtual time
 accounting, but that's also not a regression and not critical.

CCed people probably already know, but in any case patches 1-4 are
already in Linus's tree (commit af180b81).

Paolo


Re: [PATCH 4/4] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-17 Thread Paolo Bonzini
On 05/06/2013 08:11, Alexey Kardashevskiy wrote:
 +/*
 + * The KVM guest can be backed with 16MB pages (qemu switch
 + * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/).

Nitpick: we try to avoid references to QEMU, so perhaps

s/qemu switch/for example, with QEMU you can use the command-line option/
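
i.e. the comment would then read (closing it for completeness):

    /*
     * The KVM guest can be backed with 16MB pages (for example, with
     * QEMU you can use the command-line option
     * -mem-path /var/lib/hugetlbfs/global/pagesize-16MB/).
     */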

Paolo


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Alex Williamson
On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
 
  IOMMU groups themselves don't provide security; they're accessed by
  interfaces like VFIO, which provide the security.  Given a brief look, I
  agree, this looks like a possible backdoor.  The typical VFIO way to
  handle this would be to pass a VFIO file descriptor here to prove that
  the process has access to the IOMMU group.  This is how /dev/vfio/vfio
  gains the ability to set up an IOMMU domain and do mappings with the
  SET_CONTAINER ioctl using a group fd.  Thanks,
 
 How do you envision that in the kernel? I.e., I'm in KVM code, I get
 that vfio fd, what do I do with it?
 
 Basically, KVM needs to know that the user is allowed to use that iommu
 group. I don't think we want KVM, however, to call into VFIO directly,
 right?

Right, we don't want to create dependencies across modules.  I don't
have a vision for how this should work.  This is effectively a complete
side-band to vfio, so we're really just dealing in the iommu group
space.  Maybe there needs to be some kind of registration of ownership
for the group using some kind of token.  It would need to include some
kind of notification when that ownership ends.  That might also be a
convenient tag to toggle driver probing off for devices in the group.
Other ideas?  Thanks,

Alex
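
For reference, the flow described above looks roughly like this from
userspace today (a minimal sketch, error handling omitted; the group
number 26 and the type1 IOMMU are placeholders for illustration):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);   /* placeholder group */

    /* Being able to open the group node is what proves access to the
     * IOMMU group; attaching it to a container unlocks the IOMMU ioctls. */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    /* Only now can DMA mappings be established. */
    void *buf = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (unsigned long)buf,
            .iova  = 0,
            .size  = 1 << 20,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

The open() on /dev/vfio/26 is the security boundary: a process that cannot
open the group's device node cannot program that group's IOMMU.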



Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-17 Thread Benjamin Herrenschmidt
On Mon, 2013-06-17 at 20:32 -0600, Alex Williamson wrote:

 Right, we don't want to create dependencies across modules.  I don't
 have a vision for how this should work.  This is effectively a complete
 side-band to vfio, so we're really just dealing in the iommu group
 space.  Maybe there needs to be some kind of registration of ownership
 for the group using some kind of token.  It would need to include some
 kind of notification when that ownership ends.  That might also be a
 convenient tag to toggle driver probing off for devices in the group.
 Other ideas?  Thanks,

All of that smells nasty, like it will need a pile of bloody
infrastructure, which makes me think it's too complicated and not the
right approach.

How does access control work today on x86/VFIO? Can you give me a bit
more details? I didn't get a good grasp from your previous email.

From the look of it, the VFIO file descriptor is what has the access
control to the underlying iommu, is this right? So we somewhat need to
transfer (or copy) that ownership from the VFIO fd to the KVM VM.

I don't see a way to do that without some cross-layering here...

Rusty, are you aware of some kernel mechanism we can use for that ?

Cheers,
Ben.
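
One possible shape for such a registration, with entirely hypothetical
names just to make the token idea concrete (nothing like this exists
today):

    /* A subsystem that owns a group (e.g. VFIO, on behalf of a user
     * process) registers an opaque token; KVM can later check a token it
     * was handed, and the release callback notifies the owner when
     * ownership ends. */
    struct iommu_group;

    int iommu_group_register_owner(struct iommu_group *grp, void *token,
                                   void (*release)(struct iommu_group *grp,
                                                   void *token));
    bool iommu_group_match_owner(struct iommu_group *grp, void *token);
    void iommu_group_unregister_owner(struct iommu_group *grp, void *token);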

