Re: [PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().

2014-09-09 Thread tangchen

Hi Gleb,

On 09/03/2014 11:04 PM, Gleb Natapov wrote:

On Wed, Sep 03, 2014 at 09:42:30AM +0800, tangchen wrote:

Hi Gleb,

On 09/03/2014 12:00 AM, Gleb Natapov wrote:

..
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+   /*
+* apic access page could be migrated. When the page is being migrated,
+* GUP will wait till the migrate entry is replaced with the new pte
+* entry pointing to the new page.
+*/
+   vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm,
+   APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+   kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+   page_to_phys(vcpu->kvm->arch.apic_access_page));
I am a little bit worried that here all vcpus write to 
vcpu->kvm->arch.apic_access_page
without any locking. It is probably benign since pointer write is atomic on 
x86. Paolo?

Do we even need apic_access_page? Why not call
  gfn_to_page(APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT)
  put_page()
on rare occasions we need to know its address?

Isn't it a necessary item defined in hardware spec ?


vcpu->kvm->arch.apic_access_page? No. This is internal kvm data structure.


I didn't read intel spec deeply, but according to the code, the page's
address is
written into vmcs. And it made me think that we cannnot remove it.


We cannot remove writing of apic page address into vmcs, but this is not done by
assigning to vcpu->kvm->arch.apic_access_page, but by vmwrite in 
set_apic_access_page_addr().


OK, I'll try to remove kvm->arch.apic_access_page and send a patch for 
it soon.


BTW, if you don't have objection to the first two patches, would you 
please help to

commit them first ?

Thanks.



--
Gleb.
.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v5] powerpc/kvm: support to handle sw breakpoint

2014-09-09 Thread Madhavan Srinivasan
On Monday 08 September 2014 06:35 PM, Alexander Graf wrote:
> 
> 
> On 07.09.14 18:31, Madhavan Srinivasan wrote:
>> This patch adds kernel side support for software breakpoint.
>> Design is that, by using an illegal instruction, we trap to hypervisor
>> via Emulation Assistance interrupt, where we check for the illegal 
>> instruction
>> and accordingly we return to Host or Guest. Patch also adds support for
>> software breakpoint in PR KVM.
>>
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h |  6 ++
>>  arch/powerpc/kvm/book3s.c  |  3 ++-
>>  arch/powerpc/kvm/book3s_hv.c   | 41 
>> ++
>>  arch/powerpc/kvm/book3s_pr.c   |  3 +++
>>  arch/powerpc/kvm/emulate.c | 18 +
>>  5 files changed, 66 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index fb86a22..dd83c9a 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -38,6 +38,12 @@
>>  #include 
>>  #endif
>>  
>> +/*
>> + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
>> + * for supporting software breakpoint.
>> + */
>> +#define KVMPPC_INST_SW_BREAKPOINT   0x0000
>> +
>>  enum emulation_result {
>>  EMULATE_DONE, /* no further processing */
>>  EMULATE_DO_MMIO,  /* kvm_run filled with MMIO request */
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index dd03f6b..00e9c9f 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>>  struct kvm_guest_debug *dbg)
>>  {
>> -return -EINVAL;
>> +vcpu->guest_debug = dbg->control;
>> +return 0;
>>  }
>>  
>>  void kvmppc_decrementer_func(unsigned long data)
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 27cced9..3a2414c 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
>>  return kvmppc_hcall_impl_hv_realmode(cmd);
>>  }
>>  
>> +static int kvmppc_emulate_debug_inst(struct kvm_run *run,
>> +struct kvm_vcpu *vcpu)
>> +{
>> +u32 last_inst;
>> +
>> +if(kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
>> +EMULATE_DONE) {
>> +/*
>> + * Fetch failed, so return to guest and
>> + * try executing it again.
>> + */
>> +return RESUME_GUEST;
> 
> Please end the if() here. In the kernel we usually treat abort
> situations as separate.
> 

OK. Will change it.

>> +} else {
>> +if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
>> +run->exit_reason = KVM_EXIT_DEBUG;
>> +run->debug.arch.address = kvmppc_get_pc(vcpu);
>> +return RESUME_HOST;
>> +} else {
>> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>> +return RESUME_GUEST;
>> +}
>> +}
>> +}
>> +
>>  static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>   struct task_struct *tsk)
>>  {
>> @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
>> struct kvm_vcpu *vcpu,
>>  break;
>>  /*
>>   * This occurs if the guest executes an illegal instruction.
>> - * We just generate a program interrupt to the guest, since
>> - * we don't emulate any guest instructions at this stage.
>> + * If the guest debug is disabled, generate a program interrupt
>> + * to the guest. If guest debug is enabled, we need to check
>> + * whether the instruction is a software breakpoint instruction.
>> + * Accordingly return to Guest or Host.
>>   */
>>  case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
>> -kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>> -r = RESUME_GUEST;
>> +if(vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
>> +r = kvmppc_emulate_debug_inst(run, vcpu);
>> +} else {
>> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>> +r = RESUME_GUEST;
>> +}
>>  break;
>>  /*
>>   * This occurs if the guest (kernel or userspace), does something that
>> @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
>> u64 id,
>>  long int i;
>>  
>>  switch (id) {
>> +case KVM_REG_PPC_DEBUG_INST:
>> +*val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
>> +break;
>>  case KVM_REG_PPC_HIOR:
>>  *val = get_reg_val(id, 0);
>>  break;
>> diff --git a/arch/p

[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge

2014-09-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #19 from Marti Raudsepp  ---
(In reply to Joel Schopp from comment #15)
> It's not clear to me which devices were being put in the same group.

Hi Joel, any updates on this? I posted my IOMMU groups in comment #17 in case
you missed it.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2 v5] powerpc/kvm: common sw breakpoint instr across ppc

2014-09-09 Thread Madhavan Srinivasan
On Monday 08 September 2014 06:39 PM, Alexander Graf wrote:
> 
> 
> On 07.09.14 18:31, Madhavan Srinivasan wrote:
>> This patch extends the use of illegal instruction as software
>> breakpoint instruction across the ppc platform. Patch extends
>> booke program interrupt code to support software breakpoint.
>>
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>>
>> Patch is only compile tested. Will really help if
>> someone can try it out and let me know comments.
>>
>>  arch/powerpc/kvm/booke.c | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index b4c89fa..1b84853 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -870,6 +870,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
>> kvm_vcpu *vcpu,
>>  case BOOKE_INTERRUPT_HV_PRIV:
>>  emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
>>  break;
>> +case BOOKE_INTERRUPT_PROGRAM:
>> +/*SW breakpoints arrive as illegal instructions on HV */
> 
> Is it my email client or is there a space missing again? ;)
> 

Facepalm. Will fix it.

> Also, please only fetch the last instruction if debugging is active.
> 

Will change it.

>> +emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
>> +break;
>>  default:
>>  break;
>>  }
>> @@ -947,7 +951,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
>> kvm_vcpu *vcpu,
>>  break;
>>  
>>  case BOOKE_INTERRUPT_PROGRAM:
>> -if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) {
>> +if ((vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) &&
>> +(last_inst == KVMPPC_INST_SW_BREAKPOINT)) {
> 
> I think this is changing the logic from "if the guest is in user mode or
> we're in HV, deflect" to "if the guest is in user mode or an HV guest
> and the instruction is a breakpoint, treat it as debug. Otherwise
> deflect". So you're essentially breaking PR KVM here from what I can tell.
> 
> Why don't you just split the whole thing out to the beginning of
> BOOKE_INTERRUPT_PROGRAM and check for
> 
>   a) debug is enabled
>   b) instruction is sw breakpoint
> 
This is what we pretty much do for the server side. Will changes it.

> instead?
> 
>> +/*
>> + * We are here because of an SW breakpoint instr,
>> + * so lets return to host to handle.
>> + */
>> +r = kvmppc_handle_debug(run, vcpu);
>> +run->exit_reason = KVM_EXIT_DEBUG;
>> +kvmppc_account_exit(vcpu, DEBUG_EXITS);
>> +break;
>> +} else {
>>  /*
>>   * Program traps generated by user-level software must
>>   * be handled by the guest kernel.
>> @@ -1505,7 +1519,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
>> struct kvm_one_reg *reg)
>>  val = get_reg_val(reg->id, vcpu->arch.tsr);
>>  break;
>>  case KVM_REG_PPC_DEBUG_INST:
>> -val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG);
> 
> Please also remove the definition of EHPRIV_DEBUG.
> 
OK. Will do.


Thanks for review
Maddy

> 
> Alex
> 
>> +val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT);
>>  break;
>>  case KVM_REG_PPC_VRSAVE:
>>  val = get_reg_val(reg->id, vcpu->arch.vrsave);
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Ard Biesheuvel
On 9 September 2014 11:35, Marc Zyngier  wrote:
> Hi Ard,
>
>
> On 2014-09-08 21:29, Ard Biesheuvel wrote:
>>
>> The ISS encoding for an exception from a Data Abort has a WnR
>> bit[6] that indicates whether the Data Abort was caused by a
>> read or a write instruction. While there are several fields
>> in the encoding that are only valid if the ISV bit[24] is set,
>> WnR is not one of them, so we can read it unconditionally.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>
>> This fixes an issue I observed with UEFI running under QEMU/KVM using
>> NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, where
>> NOR flash reads were mistaken for NOR flash writes, resulting in all read
>> accesses to go through the MMIO emulation layer.
>>
>>  arch/arm/include/asm/kvm_mmu.h   | 5 +
>>  arch/arm64/include/asm/kvm_mmu.h | 5 +
>>  2 files changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h
>> b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f5f72f..fad5648980ad 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned long
>> hsr)
>> unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
>> if (hsr_ec == HSR_EC_IABT)
>> return false;
>> -   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
>> -   return false;
>> -   else
>> -   return true;
>> +   return hsr & HSR_WNR;
>>  }
>>
>>  static inline void kvm_clean_pgd(pgd_t *pgd)
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 8e138c7c53ac..09fd9e4c13d8 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned long
>> esr)
>> if (esr_ec == ESR_EL2_EC_IABT)
>> return false;
>>
>> -   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
>> -   return false;
>> -
>> -   return true;
>> +   return esr & ESR_EL2_WNR;
>>  }
>>
>>  static inline void kvm_clean_pgd(pgd_t *pgd) {}
>
>
> Nice catch. One thing though.
>
> This is a case where code duplication has led to this glaring bug:
> On both arm and arm64, kvm_emulate.h has code that implements this
> correctly, just that we failed to use it. Blame me.
>
> I think this should be rewritten entierely in mmu.c, with something like
> this (fully untested, of course):
>
> static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> {
> if (kvm_vcpu_trap_is_iabt(vcpu))
> return false;
>
> return kvm_vcpu_dabt_iswrite(vcpu);
> }
>
> Care to respin it?
>

Will do.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Marc Zyngier

Hi Ard,

On 2014-09-08 21:29, Ard Biesheuvel wrote:

The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.

Signed-off-by: Ard Biesheuvel 
---

This fixes an issue I observed with UEFI running under QEMU/KVM using
NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, 
where
NOR flash reads were mistaken for NOR flash writes, resulting in all 
read

accesses to go through the MMIO emulation layer.

 arch/arm/include/asm/kvm_mmu.h   | 5 +
 arch/arm64/include/asm/kvm_mmu.h | 5 +
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h 
b/arch/arm/include/asm/kvm_mmu.h

index 5cc0b0f5f72f..fad5648980ad 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned 
long hsr)

unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
if (hsr_ec == HSR_EC_IABT)
return false;
-   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
-   return false;
-   else
-   return true;
+   return hsr & HSR_WNR;
 }

 static inline void kvm_clean_pgd(pgd_t *pgd)
diff --git a/arch/arm64/include/asm/kvm_mmu.h
b/arch/arm64/include/asm/kvm_mmu.h
index 8e138c7c53ac..09fd9e4c13d8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned 
long esr)

if (esr_ec == ESR_EL2_EC_IABT)
return false;

-   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
-   return false;
-
-   return true;
+   return esr & ESR_EL2_WNR;
 }

 static inline void kvm_clean_pgd(pgd_t *pgd) {}


Nice catch. One thing though.

This is a case where code duplication has led to this glaring bug:
On both arm and arm64, kvm_emulate.h has code that implements this 
correctly, just that we failed to use it. Blame me.


I think this should be rewritten entierely in mmu.c, with something 
like this (fully untested, of course):


static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
{
if (kvm_vcpu_trap_is_iabt(vcpu))
return false;

return kvm_vcpu_dabt_iswrite(vcpu);
}

Care to respin it?

Thanks,

M.
--
Fast, cheap, reliable. Pick two.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: arm64: add gic-400 compatible

2014-09-09 Thread Mark Rutland
On Mon, Sep 08, 2014 at 11:21:44PM +0100, Joel Schopp wrote:
> Add a one liner to identify the gic-400.  It's gicv2 with optional MSI 
> extensions.
> 
> Cc: Christoffer Dall 
> Signed-off-by: Joel Schopp 
> ---
>  virt/kvm/arm/vgic.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 73eba79..e81444e 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1550,6 +1550,7 @@ static struct notifier_block vgic_cpu_nb = {
>  
>  static const struct of_device_id vgic_ids[] = {
>   { .compatible = "arm,cortex-a15-gic", .data = vgic_v2_probe, },
> + { .compatible = "arm,gic-400", .data = vgic_v2_probe, },

Given this is documented and is supported by the GIC driver:

Acked-by: Mark Rutland 

Mark.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Ard Biesheuvel
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.

Instead of fixing both implementations of kvm_is_write_fault()
in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
which already does the right thing with respect to the WnR bit.
Also fix up the callers to pass 'vcpu'

Acked-by: Laszlo Ersek 
Signed-off-by: Ard Biesheuvel 
---
v2: reimplement locally in mmu.c and drop the 2 arch specific version
add ack

 arch/arm/include/asm/kvm_mmu.h   | 11 ---
 arch/arm/kvm/mmu.c   | 12 ++--
 arch/arm64/include/asm/kvm_mmu.h | 13 -
 3 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f5f72f..3f688b458143 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -78,17 +78,6 @@ static inline void kvm_set_pte(pte_t *pte, pte_t new_pte)
flush_pmd_entry(pte);
 }
 
-static inline bool kvm_is_write_fault(unsigned long hsr)
-{
-   unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
-   if (hsr_ec == HSR_EC_IABT)
-   return false;
-   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
-   return false;
-   else
-   return true;
-}
-
 static inline void kvm_clean_pgd(pgd_t *pgd)
 {
clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 747ef4c97d98..c68ec28f17c3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -746,6 +746,14 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
phys_addr_t *ipap)
return false;
 }
 
+static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+   if (kvm_vcpu_trap_is_iabt(vcpu))
+   return false;
+
+   return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -760,7 +768,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
 
-   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
+   write_fault = kvm_is_write_fault(vcpu);
if (fault_status == FSC_PERM && !write_fault) {
kvm_err("Unexpected L2 read permission error\n");
return -EFAULT;
@@ -886,7 +894,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
gfn = fault_ipa >> PAGE_SHIFT;
memslot = gfn_to_memslot(vcpu->kvm, gfn);
hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
-   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
+   write_fault = kvm_is_write_fault(vcpu);
if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
if (is_iabt) {
/* Prefetch Abort on I/O address */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8e138c7c53ac..737da742b293 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -93,19 +93,6 @@ void kvm_clear_hyp_idmap(void);
 #definekvm_set_pte(ptep, pte)  set_pte(ptep, pte)
 #definekvm_set_pmd(pmdp, pmd)  set_pmd(pmdp, pmd)
 
-static inline bool kvm_is_write_fault(unsigned long esr)
-{
-   unsigned long esr_ec = esr >> ESR_EL2_EC_SHIFT;
-
-   if (esr_ec == ESR_EL2_EC_IABT)
-   return false;
-
-   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
-   return false;
-
-   return true;
-}
-
 static inline void kvm_clean_pgd(pgd_t *pgd) {}
 static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
 static inline void kvm_clean_pte(pte_t *pte) {}
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-rng: complete have_data completion in removing device

2014-09-09 Thread Amos Kong
On Mon, Sep 08, 2014 at 11:29:51PM +0800, Amos Kong wrote:
> On Wed, Aug 06, 2014 at 01:55:29PM +0530, Amit Shah wrote:
> > On (Wed) 06 Aug 2014 [16:05:41], Amos Kong wrote:
> > > On Wed, Aug 06, 2014 at 01:35:15AM +0800, Amos Kong wrote:
> > > > When we try to hot-remove a busy virtio-rng device from QEMU monitor,
> > > > the device can't be hot-removed. Because virtio-rng driver hangs at
> > > > wait_for_completion_killable().
> > > > 
> > > > This patch fixed the hang by completing have_data completion before
> > > > unregistering a virtio-rng device.
> > > 
> > > Hi Amit,
> > > 
> > > Before applying this patch, it's blocking insider 
> > > wait_for_completion_killable() 
> > > Applied this patch, wait_for_completion_killable() returns 0,
> > > and vi->data_avail becomes 0, then rng_get_date() will return 0.
> > 
> > Thanks for checking this.
> > 
> > > Is it expected result?
> > 

Hi Amit

> > I think what will happen is vi->data_avail will be set to whatever it
> > was set last.  In case of a previous successful read request, the
> > data_avail will be set to whatever number of bytes the host gave.  On
> > doing a hot-unplug on the succeeding wait, the value in data_avail
> > will be re-used, and the hwrng core will wrongly take some bytes in
> > the buffer as input from the host.
> > 
> > So, I think we need to set vi->data_avail = 0; before calling
> > wait_event_completion_killable().

I finally understand content, we need to set vi->data_avail to 0
before returning virtio_read(), it might enter wait_event_completion_killable()
when we try to remove the device.

We call complete() in remove_common(), then wait_event_completion_killable()
will exit, but virtio_read() will be re-entered if the device still
isn't unregistered, then re-stuck inside wait_event_completion_killable().

I tested some complex condition (both quick/slow backend), I found
some problem in below two patches.

I will post another fix later. The test result is expected.

1. Hotplug remove virtio-rng0, dd process will exit with an error:
"dd: error reading ‘/dev/hwrng’: Operation not permitted"
   virtio-rng0 disappear from 'info pci'
 
2. Re-read by dd, hotplug virtio-rng1, dd process exit with same
   error, virtio-rng1 disappear


Thanks, Amos

> > 
> > Amit
> 
> In my latest debugging, I found the hang is caused by unexpected reading
> when we started to remove the device.
> 
> I have two draft fix, 1) is skip unexpected reading by checking a
> remove flag. 2) is unregistering device at the beginning of
> remove_common(). I think second patch is better if it won't cause
> new problem.
> 
> The original patch (complete in remove_common()) is still necessary.
> 
> Test results: hotplug issue disappeared (dd process will quit).
> 
> 
> diff --git a/drivers/char/hw_random/virtio-rng.c
> b/drivers/char/hw_random/virtio-rng.c
> index 2e3139e..028797c 100644
> --- a/drivers/char/hw_random/virtio-rng.c
> +++ b/drivers/char/hw_random/virtio-rng.c
> @@ -35,6 +35,7 @@ struct virtrng_info {
> unsigned int data_avail;
> int index;
> bool busy;
> +bool remove;
> bool hwrng_register_done;
>  };
>  
> @@ -68,6 +69,9 @@ static int virtio_read(struct hwrng *rng, void *buf,
> size_t size, bool wait)
> int ret;
> struct virtrng_info *vi = (struct virtrng_info *)rng->priv;
>  
> +if (vi->remove)
> +   return 0;
> +
> if (!vi->busy) {
> vi->busy = true;
> init_completion(&vi->have_data);
> @@ -137,6 +141,8 @@ static void remove_common(struct virtio_device
> *vdev)
>  {
> struct virtrng_info *vi = vdev->priv;
>  
> +   vi->remove = true;
> +   complete(&vi->have_data);
> vdev->config->reset(vdev);
> vi->busy = false;
> if (vi->hwrng_register_done)
> 
> 
> diff --git a/drivers/char/hw_random/virtio-rng.c
> b/drivers/char/hw_random/virtio-rng.c
> index 2e3139e..9b8c2ce 100644
> --- a/drivers/char/hw_random/virtio-rng.c
> +++ b/drivers/char/hw_random/virtio-rng.c
> @@ -137,10 +137,11 @@ static void remove_common(struct virtio_device
> *vdev)
>  {
> struct virtrng_info *vi = vdev->priv;
>  
> -   vdev->config->reset(vdev);
> -   vi->busy = false;
> if (vi->hwrng_register_done)
> hwrng_unregister(&vi->hwrng);
> +   complete(&vi->have_data);
> +   vdev->config->reset(vdev);
> +   vi->busy = false;
> vdev->config->del_vqs(vdev);
> ida_simple_remove(&rng_index_ida, vi->index);
> kfree(vi);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Marc Zyngier
[resending, as ARM email server seems to be in some mood]

On 09/09/14 11:27, Ard Biesheuvel wrote:
> The ISS encoding for an exception from a Data Abort has a WnR
> bit[6] that indicates whether the Data Abort was caused by a
> read or a write instruction. While there are several fields
> in the encoding that are only valid if the ISV bit[24] is set,
> WnR is not one of them, so we can read it unconditionally.
> 
> Instead of fixing both implementations of kvm_is_write_fault()
> in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
> which already does the right thing with respect to the WnR bit.
> Also fix up the callers to pass 'vcpu'
> 
> Acked-by: Laszlo Ersek 
> Signed-off-by: Ard Biesheuvel 

Because I like that kind of diffstat:
Acked-by: Marc Zyngier 

Christoffer, if you too are happy with that, I'll queue it right away.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device

2014-09-09 Thread Amos Kong
When we try to hot-remove a busy virtio-rng device from QEMU monitor,
the device can't be hot-removed. Because virtio-rng driver hangs at
wait_for_completion_killable().

This patch exits the waiting by completing have_data completion before
unregistering, resets data_avail to avoid the hwrng core use wrong
buffer bytes. Before real unregistering we should return -ENODEV for
reading.

When we hot-unplug the device, dd process in guest will exit.
  $ dd if=/dev/hwrng of=/dev/null
  dd: error reading ‘/dev/hwrng’: No such device

Signed-off-by: Amos Kong 
Cc: sta...@vger.kernel.org
---
V2: reset data_avail (Amit)
adjust unregister order
---
 drivers/char/hw_random/virtio-rng.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 2e3139e..e76433b 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -68,6 +68,10 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t 
size, bool wait)
int ret;
struct virtrng_info *vi = (struct virtrng_info *)rng->priv;
 
+   if (!vi->hwrng_register_done) {
+   return -ENODEV;
+   }
+
if (!vi->busy) {
vi->busy = true;
init_completion(&vi->have_data);
@@ -137,10 +141,14 @@ static void remove_common(struct virtio_device *vdev)
 {
struct virtrng_info *vi = vdev->priv;
 
+   vi->data_avail = 0;
+   complete(&vi->have_data);
vdev->config->reset(vdev);
vi->busy = false;
-   if (vi->hwrng_register_done)
+   if (vi->hwrng_register_done) {
+   vi->hwrng_register_done = false;
hwrng_unregister(&vi->hwrng);
+   }
vdev->config->del_vqs(vdev);
ida_simple_remove(&rng_index_ida, vi->index);
kfree(vi);
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

2014-09-09 Thread Venkateswara Rao Nandigam
I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor 
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz.

The Freebsd VM hangs at the "booting... " prompt.

If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I know 
Xeon V2 processors are having the smep feature.

Any ideas/solutions on how to boot Freebsd VM with smep option enabled in Host 
Kernel.

Thanks,
Venkatesh
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge

2014-09-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #20 from Joel Schopp  ---
What updates are you looking for?  Joerg's fix is now upstream.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

2014-09-09 Thread Nakajima, Jun
On Tue, Sep 9, 2014 at 4:12 AM, Venkateswara Rao Nandigam
 wrote:
> I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor 
> Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz.
>
> The Freebsd VM hangs at the "booting... " prompt.
>
> If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I 
> know Xeon V2 processors are having the smep feature.
>
> Any ideas/solutions on how to boot Freebsd VM with smep option enabled in 
> Host Kernel.
>

Does it boot on bare metal?


-- 
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mutex

2014-09-09 Thread Amos Kong
Hi Amit, Rusty

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062
steps:
- Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest
- check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*'

Result: cat process will get stuck, it will return if we kill dd process.

We have some static variables (eg, current_rng, data_avail, etc) in 
hw_random/core.c,
they are protected by rng_mutex. I try to workaround this issue by undelay(100)
after mutex_unlock() in rng_dev_read(). This gives chance for 
hwrng_attr_*_show()
to get mutex.

This patch also contains some cleanup, moving some code out of mutex
protection.

Do you have some suggestion? Thanks.


diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..fa69020 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
}
 
mutex_unlock(&rng_mutex);
+   udelay(100);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device 
*dev,
int err;
struct hwrng *rng;
 
+   err = -ENODEV;
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
-   err = -ENODEV;
list_for_each_entry(rng, &rng_list, list) {
if (strcmp(rng->name, buf) == 0) {
if (rng == current_rng) {
@@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
return -ERESTARTSYS;
if (current_rng)
name = current_rng->name;
-   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
mutex_unlock(&rng_mutex);
+   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 
return ret;
 }
@@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device 
*dev,
ssize_t ret = 0;
struct hwrng *rng;
 
+   buf[0] = '\0';
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
-   buf[0] = '\0';
list_for_each_entry(rng, &rng_list, list) {
strncat(buf, rng->name, PAGE_SIZE - ret - 1);
ret += strlen(rng->name);
strncat(buf, " ", PAGE_SIZE - ret - 1);
ret++;
}
+   mutex_unlock(&rng_mutex);
strncat(buf, "\n", PAGE_SIZE - ret - 1);
ret++;
-   mutex_unlock(&rng_mutex);
 
return ret;
 }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC virtio-rng: fail to read sysfs of a busy device

2014-09-09 Thread Amos Kong
(Resend to fix the subject)

Hi Amit, Rusty

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062
steps:
- Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest
- check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*'

Result: cat process will get stuck, it will return if we kill dd process.

We have some static variables (eg, current_rng, data_avail, etc) in 
hw_random/core.c,
they are protected by rng_mutex. I try to workaround this issue by undelay(100)
after mutex_unlock() in rng_dev_read(). This gives chance for 
hwrng_attr_*_show()
to get mutex.

This patch also contains some cleanup, moving some code out of mutex
protection.

Do you have some suggestion? Thanks.


diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..fa69020 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
}
 
mutex_unlock(&rng_mutex);
+   udelay(100);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device 
*dev,
int err;
struct hwrng *rng;
 
+   err = -ENODEV;
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
-   err = -ENODEV;
list_for_each_entry(rng, &rng_list, list) {
if (strcmp(rng->name, buf) == 0) {
if (rng == current_rng) {
@@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
return -ERESTARTSYS;
if (current_rng)
name = current_rng->name;
-   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
mutex_unlock(&rng_mutex);
+   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 
return ret;
 }
@@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device 
*dev,
ssize_t ret = 0;
struct hwrng *rng;
 
+   buf[0] = '\0';
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
-   buf[0] = '\0';
list_for_each_entry(rng, &rng_list, list) {
strncat(buf, rng->name, PAGE_SIZE - ret - 1);
ret += strlen(rng->name);
strncat(buf, " ", PAGE_SIZE - ret - 1);
ret++;
}
+   mutex_unlock(&rng_mutex);
strncat(buf, "\n", PAGE_SIZE - ret - 1);
ret++;
-   mutex_unlock(&rng_mutex);
 
return ret;
 }

-- 
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge

2014-09-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=81841

--- Comment #21 from Marti Raudsepp  ---
(In reply to Joel Schopp from comment #20)
> What updates are you looking for?  Joerg's fix is now upstream.

Yes, but there's still the issue with southbridge component isolation. You
requested more information from me in comment #15 that I provided in comment
#17.

For background see comment #9 from Alex Williamson:
> AMD would need to confirm it.  IOMMU groups are based on hardware advertised
> isolation via the PCIe ACS capability.  Without this, or a device specific
> quirk to take its place, IOMMU groups must assume that peer-to-peer between
> functions of a multi-function device is possible and therefore that the
> devices are not isolated. [...]

> I think the path forward is to get confirmation from AMD that these function
> are isolated from each other and add quirks to the kernel.  Then you won't
> have the device dependencies in vfio-pci.  The override patch allows you to
> do that with just a kernel boot parameter.  There's no gurantee that
> pci-assign will ever be fixed since it's being phased out.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

2014-09-09 Thread Venkateswara Rao Nandigam
I haven't tried on bare metal, but have seen some other list, where in some one 
tried on bare metal and that was running well, as their VM on KVM was still 
failing.

-Original Message-
From: Nakajima, Jun [mailto:jun.nakaj...@intel.com] 
Sent: Tuesday, September 09, 2014 8:37 PM
To: Venkateswara Rao Nandigam
Cc: KVM
Subject: Re: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) 
CPU E5-2620 v2 @ 2.10GHz

On Tue, Sep 9, 2014 at 4:12 AM, Venkateswara Rao Nandigam 
 wrote:
> I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor 
> Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz.
>
> The Freebsd VM hangs at the "booting... " prompt.
>
> If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I 
> know Xeon V2 processors are having the smep feature.
>
> Any ideas/solutions on how to boot Freebsd VM with smep option enabled in 
> Host Kernel.
>

Does it boot on bare metal?


--
Jun
Intel Open Source Technology Center
N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-09-09 Thread Marcelo Tosatti
On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote:
> On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote:
> > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before
> > deleting a pinned spte.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > ---
> >  arch/x86/kvm/mmu.c |   29 +++--
> >  1 file changed, 23 insertions(+), 6 deletions(-)
> > 
> > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > ===
> > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 
> > 11:23:59.290744490 -0300
> > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 
> > -0300
> > @@ -1208,7 +1208,8 @@
> >   *
> >   * Return true if tlb need be flushed.
> >   */
> > -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool 
> > pt_protect)
> > +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool 
> > pt_protect,
> > +  bool skip_pinned)
> >  {
> > u64 spte = *sptep;
> >  
> > @@ -1218,6 +1219,22 @@
> >  
> > rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
> >  
> > +   if (is_pinned_spte(spte)) {
> > +   /* keep pinned spte intact, mark page dirty again */
> > +   if (skip_pinned) {
> > +   struct kvm_mmu_page *sp;
> > +   gfn_t gfn;
> > +
> > +   sp = page_header(__pa(sptep));
> > +   gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
> > +
> > +   mark_page_dirty(kvm, gfn);
> > +   return false;
> Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while
> populating dirty_bitmap_buffer?

The pinned gfns are per-vcpu. Requires looping all vcpus (not
scalable).


> > +   } else
> > +   mmu_reload_pinned_vcpus(kvm);
> Can you explain why do you need this?

Because if skip_pinned = false, we want vcpus to exit (called
from enable dirty logging codepath).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/4] KVM: MMU: pinned sps are not candidates for deletion.

2014-09-09 Thread Marcelo Tosatti
On Tue, Jul 22, 2014 at 05:59:42AM +0800, Xiao Guangrong wrote:
> 
> On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:
> 
> > Skip pinned shadow pages when selecting pages to zap.
> 
> It seems there is no way to prevent changing pinned spte on
> zap-all path?

Xiao,

The way would be to reload remote mmus, forcing the vcpu to exit,
zapping a page, then vcpu will pagefault any necessary page via
kvm_mmu_pin_pages.

kvm_mmu_invalidate_zap_all_pages does: 

- spin_lock(mmu_lock)
- kvm_reload_remote_mmus
...
- spin_unlock(mmu_lock)

So its OK to change pinned spte on zap all path.

> I am thing if we could move pinned spte to another list (eg. 
> pinned_shadow_pages)
> instead of active list so that it can not be touched by any other free paths. 
> Your idea?

As mentioned it above, it is ok to zap pinned sptes as long w 
reload remote mmus request is performed.

That said, you still consider a separate list?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-09-09 Thread Marcelo Tosatti
On Tue, Jul 22, 2014 at 05:55:20AM +0800, Xiao Guangrong wrote:
> 
> On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:
> 
> > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before
> > deleting a pinned spte.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > ---
> > arch/x86/kvm/mmu.c |   29 +++--
> > 1 file changed, 23 insertions(+), 6 deletions(-)
> > 
> > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > ===
> > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 
> > 11:23:59.290744490 -0300
> > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 
> > -0300
> > @@ -1208,7 +1208,8 @@
> > *
> > * Return true if tlb need be flushed.
> > */
> > -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool 
> > pt_protect)
> > +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool 
> > pt_protect,
> > +  bool skip_pinned)
> > {
> > u64 spte = *sptep;
> > 
> > @@ -1218,6 +1219,22 @@
> > 
> > rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
> > 
> > +   if (is_pinned_spte(spte)) {
> > +   /* keep pinned spte intact, mark page dirty again */
> > +   if (skip_pinned) {
> > +   struct kvm_mmu_page *sp;
> > +   gfn_t gfn;
> > +
> > +   sp = page_header(__pa(sptep));
> > +   gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
> > +
> > +   mark_page_dirty(kvm, gfn);
> > +   return false;
> > +   } else
> > +   mmu_reload_pinned_vcpus(kvm);
> > +   }
> > +
> > +
> > if (pt_protect)
> > spte &= ~SPTE_MMU_WRITEABLE;
> > spte = spte & ~PT_WRITABLE_MASK;
> 
> This is also a window between marking spte readonly and re-ping…
> IIUC, I think all spte spte can not be zapped and write-protected at any time

It is safe because mmu_lock is held by kvm_mmu_slot_remove_write_access
?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: fix api documentation of KVM_GET_EMULATED_CPUID

2014-09-09 Thread Alex Bennée
It looks like when this was initially merged it got accidentally included
in the following section. I've just moved it back in the correct section
and re-numbered it as other ioctls have been added since.

Signed-off-by: Alex Bennée 

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 723d3f3..60c1582 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2620,6 +2620,76 @@ When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
 
+4.88 KVM_GET_EMULATED_CPUID
+
+Capability: KVM_CAP_EXT_EMUL_CPUID
+Architectures: x86
+Type: system ioctl
+Parameters: struct kvm_cpuid2 (in/out)
+Returns: 0 on success, -1 on error
+
+struct kvm_cpuid2 {
+   __u32 nent;
+   __u32 flags;
+   struct kvm_cpuid_entry2 entries[0];
+};
+
+The member 'flags' is used for passing flags from userspace.
+
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEXBIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC   BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
+
+struct kvm_cpuid_entry2 {
+   __u32 function;
+   __u32 index;
+   __u32 flags;
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+   __u32 padding[3];
+};
+
+This ioctl returns x86 cpuid features which are emulated by
+kvm.Userspace can use the information returned by this ioctl to query
+which features are emulated by kvm instead of being present natively.
+
+Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
+structure with the 'nent' field indicating the number of entries in
+the variable-size array 'entries'. If the number of entries is too low
+to describe the cpu capabilities, an error (E2BIG) is returned. If the
+number is too high, the 'nent' field is adjusted and an error (ENOMEM)
+is returned. If the number is just right, the 'nent' field is adjusted
+to the number of valid entries in the 'entries' array, which is then
+filled.
+
+The entries returned are the set CPUID bits of the respective features
+which kvm emulates, as returned by the CPUID instruction, with unknown
+or unsupported feature bits cleared.
+
+Features like x2apic, for example, may not be present in the host cpu
+but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
+emulated efficiently and thus not included here.
+
+The fields in each entry are defined as follows:
+
+  function: the eax value used to obtain the entry
+  index: the ecx value used to obtain the entry (for entries that are
+ affected by ecx)
+  flags: an OR of zero or more of the following:
+KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
+   if the index field is valid
+KVM_CPUID_FLAG_STATEFUL_FUNC:
+   if cpuid for this function returns different values for successive
+   invocations; there will be several entries with the same function,
+   all with this flag set
+KVM_CPUID_FLAG_STATE_READ_NEXT:
+   for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
+   the first entry to be read by a cpu
+   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
+ this function/index combination
+
 5. The kvm_run structure
 
 
@@ -2918,76 +2988,6 @@ and usually define the validity of a groups of 
registers. (e.g. one bit
 };
 
 
-4.81 KVM_GET_EMULATED_CPUID
-
-Capability: KVM_CAP_EXT_EMUL_CPUID
-Architectures: x86
-Type: system ioctl
-Parameters: struct kvm_cpuid2 (in/out)
-Returns: 0 on success, -1 on error
-
-struct kvm_cpuid2 {
-   __u32 nent;
-   __u32 flags;
-   struct kvm_cpuid_entry2 entries[0];
-};
-
-The member 'flags' is used for passing flags from userspace.
-
-#define KVM_CPUID_FLAG_SIGNIFCANT_INDEXBIT(0)
-#define KVM_CPUID_FLAG_STATEFUL_FUNC   BIT(1)
-#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
-
-struct kvm_cpuid_entry2 {
-   __u32 function;
-   __u32 index;
-   __u32 flags;
-   __u32 eax;
-   __u32 ebx;
-   __u32 ecx;
-   __u32 edx;
-   __u32 padding[3];
-};
-
-This ioctl returns x86 cpuid features which are emulated by
-kvm.Userspace can use the information returned by this ioctl to query
-which features are emulated by kvm instead of being present natively.
-
-Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
-structure with the 'nent' field indicating the number of entries in
-the variable-size array 'entries'. If the number of entries is too low
-to describe the cpu capabilities, an error (E2BIG) is returned. If the
-number is too high, the 'nent' field is adjusted and an error (ENOMEM)
-is returned. If the number is just right, the 'nent' field is adjusted
-to the number of valid entries in the 'entries' array, which is then
-filled.
-
-The entries returned are the set CPUID bits of the respective features
-which kvm em

[PATCH 0/2] KVM API documentation patches

2014-09-09 Thread Alex Bennée
Hi,

I'm preparing to add ARM KVM GDB support and I went to read the API
documentation and found it surprisingly mute on the subject ;-)

The first patch documents the "new" KVM_SET_GUEST_DEBUG ioctl based on
reviewing the code. I've included a long CC list of people who've
actually done the various implementations who I hope can sanity check
the write-up. The second is a trivial formatting fix for what looks
like a minor merge trip-up.

Alex Bennée (2):
  KVM: document KVM_SET_GUEST_DEBUG api
  KVM: fix api documentation of KVM_GET_EMULATED_CPUID

 Documentation/virtual/kvm/api.txt | 184 +++---
 1 file changed, 114 insertions(+), 70 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: document KVM_SET_GUEST_DEBUG api

2014-09-09 Thread Alex Bennée
In preparation for working on the ARM implementation I noticed the debug
interface was missing from the API document. I've pieced together the
expected behaviour from the code and commit messages written it up as
best I can.

Signed-off-by: Alex Bennée 

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index d3dde61..723d3f3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2575,6 +2575,50 @@ associated with the service will be forgotten, and 
subsequent RTAS
 calls by the guest for that service will be passed to userspace to be
 handled.
 
+4.87 KVM_SET_GUEST_DEBUG
+
+Capability: KVM_CAP_SET_GUEST_DEBUG
+Architectures: x86, s390, ppc
+Type: vcpu ioctl
+Parameters: struct kvm_guest_debug (in)
+Returns: 0 on success; -1 on error
+
+struct kvm_guest_debug {
+   __u32 control;
+   __u32 pad;
+   struct kvm_guest_debug_arch arch;
+};
+
+Set up the processor specific debug registers and configure vcpu for
+handling guest debug events. There are two parts to the structure, the
+first a control bitfield indicates the type of debug events to handle
+when running. Common control bits are:
+
+  - KVM_GUESTDBG_ENABLE:guest debugging is enabled
+  - KVM_GUESTDBG_SINGLESTEP:the next run should single-step
+
+The top 16 bits of the control field are architecture specific control
+flags which can include the following:
+
+  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
+  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
+  - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
+  - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
+  - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
+
+For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
+are enabled in memory so we need to ensure breakpoint exceptions are
+correctly trapped and the KVM run loop exits at the breakpoint and not
+running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP
+we need to ensure the guest vCPUs architecture specific registers are
+updated to the correct (supplied) values.
+
+The second part of the structure is architecture specific and
+typically contains a set of debug registers.
+
+When debug events exit the main run loop with the reason
+KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
+structure containing architecture specific debug information.
 
 5. The kvm_run structure
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2 v6] powerpc/kvm: common sw breakpoint instr across ppc

2014-09-09 Thread Madhavan Srinivasan
This patch extends the use of illegal instruction as software
breakpoint instruction across the ppc platform. Patch extends
booke program interrupt code to support software breakpoint.

Signed-off-by: Madhavan Srinivasan 
---
Patch is only compile tested. Will really help if
someone can try it out and let me know the comments.

 arch/powerpc/include/asm/kvm_booke.h |  2 --
 arch/powerpc/kvm/booke.c | 19 ++-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index f7aa5cc..ab7123a 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -30,8 +30,6 @@
 #define EHPRIV_OC_SHIFT11
 /* "ehpriv 1" : ehpriv with OC = 1 is used for debug emulation */
 #define EHPRIV_OC_DEBUG1
-#define KVMPPC_INST_EHPRIV_DEBUG   (KVMPPC_INST_EHPRIV | \
-(EHPRIV_OC_DEBUG << EHPRIV_OC_SHIFT))
 
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index b4c89fa..365e85d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -870,6 +870,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
case BOOKE_INTERRUPT_HV_PRIV:
emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
break;
+   case BOOKE_INTERRUPT_PROGRAM:
+   /* SW breakpoints arrive as illegal instructions on HV */
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
+   emulated = kvmppc_get_last_inst(vcpu, false, 
&last_inst);
+   break;
default:
break;
}
@@ -947,6 +952,18 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
break;
 
case BOOKE_INTERRUPT_PROGRAM:
+   if ((vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) &&
+   (last_inst == KVMPPC_INST_SW_BREAKPOINT)) {
+   /*
+* We are here because of an SW breakpoint instr,
+* so lets return to host to handle.
+*/
+   r = kvmppc_handle_debug(run, vcpu);
+   run->exit_reason = KVM_EXIT_DEBUG;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   break;
+   }
+
if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) {
/*
 * Program traps generated by user-level software must
@@ -1505,7 +1522,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
val = get_reg_val(reg->id, vcpu->arch.tsr);
break;
case KVM_REG_PPC_DEBUG_INST:
-   val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG);
+   val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT);
break;
case KVM_REG_PPC_VRSAVE:
val = get_reg_val(reg->id, vcpu->arch.vrsave);
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2 v6] powerpc/kvm: support to handle sw breakpoint

2014-09-09 Thread Madhavan Srinivasan
This patchset adds ppc64 server side support for software breakpoint
and extends the use of illegal instruction as software
breakpoint across ppc platform.

Patch 1, adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to
hypervisor via Emulation Assistance interrupt, where we check
for the illegal instruction and accordingly we return to Host
or Guest. Patch also adds support for software breakpoint
in PR KVM.

Patch 2,extends the use of illegal instruction as software
breakpoint instruction across the ppc platform. Patch extends
booke program interrupt code to support software breakpoint.

Patch 2 is only compile tested. Will really help if
someone can try it out and let me know comments.

Changes v5->v6:
 Fixed checkpatch.pl errors
 Added abort case as a seperate condition block in
  emulation function in book3s_hv
 Added debug active check for instruction emulation
 Modified return value to emulate_fail in else part
  of illegal instruction case in book3s_pr.c
 Removed KVMPPC_INST_EHPRIV_DEBUG instruction
 Moved the debug instruction as a separate block in
   program interrupt code

Changes v4->v5:
 Made changes to code comments and commit messages
 Added debugging active checks for illegal instr comparison
 Added debug instruction check in emulate code
 Extended SW breakpoint to booke

Changes v3->v4:
 Made changes to code comments and removed #define of zero opcode
 Added a new function to handle the debug instruction emulation in book3s_hv
 Rebased the code to latest upstream source.

Changes v2->v3:
 Changed the debug instructions. Using the all zero opcode in the instruction 
word
  as illegal instruction as mentioned in Power ISA instead of ABS
 Removed reg updated in emulation assist and added a call to
  kvmppc_emulate_instruction for reg update.

Changes v1->v2:

 Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also 
share it.
 Added code to use KVM get one reg infrastructure to get debug opcode.
 Updated emulate.c to include emulation of debug instruction incase of PR_KVM.
 Made changes to commit message.

Madhavan Srinivasan (2):
  powerpc/kvm: support to handle sw breakpoint
  powerpc/kvm: common sw breakpoint instr across ppc

 arch/powerpc/include/asm/kvm_booke.h |  2 --
 arch/powerpc/include/asm/kvm_ppc.h   |  6 ++
 arch/powerpc/kvm/book3s.c|  3 ++-
 arch/powerpc/kvm/book3s_hv.c | 41 
 arch/powerpc/kvm/book3s_pr.c |  3 +++
 arch/powerpc/kvm/booke.c | 19 -
 arch/powerpc/kvm/emulate.c   | 15 +
 7 files changed, 81 insertions(+), 8 deletions(-)

-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2 v6] powerpc/kvm: support to handle sw breakpoint

2014-09-09 Thread Madhavan Srinivasan
This patch adds kernel side support for software breakpoint.
Design is that, by using an illegal instruction, we trap to hypervisor
via Emulation Assistance interrupt, where we check for the illegal instruction
and accordingly we return to Host or Guest. Patch also adds support for
software breakpoint in PR KVM.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/kvm_ppc.h |  6 ++
 arch/powerpc/kvm/book3s.c  |  3 ++-
 arch/powerpc/kvm/book3s_hv.c   | 41 ++
 arch/powerpc/kvm/book3s_pr.c   |  3 +++
 arch/powerpc/kvm/emulate.c | 15 ++
 5 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..dd83c9a 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -38,6 +38,12 @@
 #include 
 #endif
 
+/*
+ * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
+ * for supporting software breakpoint.
+ */
+#define KVMPPC_INST_SW_BREAKPOINT  0x0000
+
 enum emulation_result {
EMULATE_DONE, /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with MMIO request */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index dd03f6b..00e9c9f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
 {
-   return -EINVAL;
+   vcpu->guest_debug = dbg->control;
+   return 0;
 }
 
 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 27cced9..000fbec 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
return kvmppc_hcall_impl_hv_realmode(cmd);
 }
 
+static int kvmppc_emulate_debug_inst(struct kvm_run *run,
+   struct kvm_vcpu *vcpu)
+{
+   u32 last_inst;
+
+   if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
+   EMULATE_DONE) {
+   /*
+* Fetch failed, so return to guest and
+* try executing it again.
+*/
+   return RESUME_GUEST;
+   }
+
+   if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
+   run->exit_reason = KVM_EXIT_DEBUG;
+   run->debug.arch.address = kvmppc_get_pc(vcpu);
+   return RESUME_HOST;
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   return RESUME_GUEST;
+   }
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 struct task_struct *tsk)
 {
@@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
break;
/*
 * This occurs if the guest executes an illegal instruction.
-* We just generate a program interrupt to the guest, since
-* we don't emulate any guest instructions at this stage.
+* If the guest debug is disabled, generate a program interrupt
+* to the guest. If guest debug is enabled, we need to check
+* whether the instruction is a software breakpoint instruction.
+* Accordingly return to Guest or Host.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-   r = RESUME_GUEST;
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
+   r = kvmppc_emulate_debug_inst(run, vcpu);
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   r = RESUME_GUEST;
+   }
break;
/*
 * This occurs if the guest (kernel or userspace), does something that
@@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 
id,
long int i;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, 0);
break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index faffb27..6d73708 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1319,6 +1319,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, 
u64 id,
int r = 0;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, to_book3s(vcpu)->h

Re: [Qemu-devel] [PATCH v3 2/2] docs: update ivshmem device spec

2014-09-09 Thread Eric Blake
On 09/01/2014 03:52 AM, David Marchand wrote:

>>> What about upgrading QEMU and ivshmem-server while you have existing
>>> guests?  You cannot restart ivshmem-server, and the new QEMU would have
>>> to talk to the old ivshmem-server.
>>
>> Version negotiation also helps avoid confusion if someone combines
>> ivshmem-server and QEMU from different origins (e.g. built from source
>> and distro packaged).

Don't underestimate the likelihood of this happening.  Any long-running
process (which an ivshmem-server will be) continues running at the old
version, even when a package upgrade installs a new qemu binary; the new
binary should still be able to manage connections to the already-running
server.

Even neater would be a solution where an existing ivshmem-server could
re-exec an updated ivshmem-server binary that resulted from a distro
upgrade, hand over all state required for the new server to take over
from the point managed by the old server, so that you aren't stuck
running the old binary forever.  But that's a lot trickier to write, so
it is not necessary for a first implementation; and if you do that, then
you have the reverse situation to worry about (the new server must still
accept communication from existing old qemu binaries).

Note that the goal here is to support upgrades; it is probably okay if
downgrading from a new binary back to an old doesn't work correctly
(because the new software was using a feature not present in the old).

>>
>> It's a safeguard to prevent hard-to-diagnose failures when the system is
>> misconfigured.
>>
> 
> Hum, so you want the code to be defensive against mis-use, why not.
> 
> I wanted to keep modifications on ivshmem as little as possible in a
> first phase (all the more so as there are potential ivshmem users out
> there that I think will be impacted by a protocol change).

Existing ivshmem users MUST be aware that they are using something that
is not yet polished, and be prepared to make the upgrade to the polished
version.  It's best to minimize the hassle by making them upgrade
exactly once to a fully-robust version, rather than to have them upgrade
to a slightly-more robust version only to find out we didn't plan ahead
well enough to make further extensions in a back-compatible manner.

> 
> Sending the version as the first "vm_id" with an associated fd to -1
> before sending the real client id should work with existing QEMU client
> code (hw/misc/ivshmem.c).
> 
> Do you have a better idea ?
> Is there a best practice in QEMU for "version negotiation" that could
> work with ivshmem protocol ?

QMP starts off with a mandatory "qmp_capabilities" handshake, although
we haven't yet had to define any capabilities where cross-versioned
communication differs as a result.  Migration is somewhat of an example,
except that it is one-directional (we don't have a feedback path), so it
is somewhat best effort.  The qcow2 v3 file format is an example of
declaring features, rather than version numbers, and making decisions
about whether a feature is compatible (older clients can safely ignore
the bit, without corrupting the image but possibly having worse
performance) vs. incompatible (older clients must reject the image,
because not handling the feature correctly would corrupt the image).
The best handshakes are bi-directional - both sides advertise their
version (or better, their features), and a well-defined algorithm for
settling on the common subset of advertised features then ensures that
the two sides know how to talk to each other, or give a reason for
either side to disconnect early because of a missing feature.

> 
> I have a v4 ready with this (and all the pending comments), I will send
> it later unless a better idea is exposed.
> 
> 
> Thanks.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] KVM: fix api documentation of KVM_GET_EMULATED_CPUID

2014-09-09 Thread Borislav Petkov
On Tue, Sep 09, 2014 at 05:27:19PM +0100, Alex Bennée wrote:
> It looks like when this was initially merged it got accidentally included
> in the following section.

Whoops.

> I've just moved it back in the correct section
> and re-numbered it as other ioctls have been added since.
> 
> Signed-off-by: Alex Bennée 

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v6] powerpc/kvm: support to handle sw breakpoint

2014-09-09 Thread Alexander Graf


On 09.09.14 19:07, Madhavan Srinivasan wrote:
> This patch adds kernel side support for software breakpoint.
> Design is that, by using an illegal instruction, we trap to hypervisor
> via Emulation Assistance interrupt, where we check for the illegal instruction
> and accordingly we return to Host or Guest. Patch also adds support for
> software breakpoint in PR KVM.
> 
> Signed-off-by: Madhavan Srinivasan 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |  6 ++
>  arch/powerpc/kvm/book3s.c  |  3 ++-
>  arch/powerpc/kvm/book3s_hv.c   | 41 
> ++
>  arch/powerpc/kvm/book3s_pr.c   |  3 +++
>  arch/powerpc/kvm/emulate.c | 15 ++
>  5 files changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index fb86a22..dd83c9a 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -38,6 +38,12 @@
>  #include 
>  #endif
>  
> +/*
> + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
> + * for supporting software breakpoint.
> + */
> +#define KVMPPC_INST_SW_BREAKPOINT0x0000
> +
>  enum emulation_result {
>   EMULATE_DONE, /* no further processing */
>   EMULATE_DO_MMIO,  /* kvm_run filled with MMIO request */
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index dd03f6b..00e9c9f 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg)
>  {
> - return -EINVAL;
> + vcpu->guest_debug = dbg->control;
> + return 0;
>  }
>  
>  void kvmppc_decrementer_func(unsigned long data)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 27cced9..000fbec 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
>   return kvmppc_hcall_impl_hv_realmode(cmd);
>  }
>  
> +static int kvmppc_emulate_debug_inst(struct kvm_run *run,
> + struct kvm_vcpu *vcpu)
> +{
> + u32 last_inst;
> +
> + if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
> + EMULATE_DONE) {
> + /*
> +  * Fetch failed, so return to guest and
> +  * try executing it again.
> +  */
> + return RESUME_GUEST;
> + }
> +
> + if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
> + run->exit_reason = KVM_EXIT_DEBUG;
> + run->debug.arch.address = kvmppc_get_pc(vcpu);
> + return RESUME_HOST;
> + } else {
> + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> + return RESUME_GUEST;
> + }
> +}
> +
>  static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
>struct task_struct *tsk)
>  {
> @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   break;
>   /*
>* This occurs if the guest executes an illegal instruction.
> -  * We just generate a program interrupt to the guest, since
> -  * we don't emulate any guest instructions at this stage.
> +  * If the guest debug is disabled, generate a program interrupt
> +  * to the guest. If guest debug is enabled, we need to check
> +  * whether the instruction is a software breakpoint instruction.
> +  * Accordingly return to Guest or Host.
>*/
>   case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
> - kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> - r = RESUME_GUEST;
> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
> + r = kvmppc_emulate_debug_inst(run, vcpu);
> + } else {
> + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> + r = RESUME_GUEST;
> + }
>   break;
>   /*
>* This occurs if the guest (kernel or userspace), does something that
> @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
> u64 id,
>   long int i;
>  
>   switch (id) {
> + case KVM_REG_PPC_DEBUG_INST:
> + *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
> + break;
>   case KVM_REG_PPC_HIOR:
>   *val = get_reg_val(id, 0);
>   break;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index faffb27..6d73708 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c

Very nice patch set :). The only thing we're missing now is that Book3S
PR does not allow sw breakpoints to arrive in user mode (MSR.PR == 1),
because there we're ne

Re: [PATCH 0/2 v6] powerpc/kvm: support to handle sw breakpoint

2014-09-09 Thread Alexander Graf


On 09.09.14 19:07, Madhavan Srinivasan wrote:
> This patchset adds ppc64 server side support for software breakpoint
> and extends the use of illegal instruction as software
> breakpoint across ppc platform.
> 
> Patch 1, adds kernel side support for software breakpoint.
> Design is that, by using an illegal instruction, we trap to
> hypervisor via Emulation Assistance interrupt, where we check
> for the illegal instruction and accordingly we return to Host
> or Guest. Patch also adds support for software breakpoint
> in PR KVM.
> 
> Patch 2,extends the use of illegal instruction as software
> breakpoint instruction across the ppc platform. Patch extends
> booke program interrupt code to support software breakpoint.

Thanks, applied to kvm-ppc-queue.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question about using "qemu-kvm" to create vm including "-mem-path"

2014-09-09 Thread Min Du
Hi everyone,

I was wondering if anyone could answer my confusion here. I am using qemu-kvm 
0.12.1 installed with centos-6.5. I would like to use large pages to back my 
guest vm, so I did it as this: http://www.linux-kvm.org/page/UsingLargePages, 
and added "-mem-path /hugetlbfs" in the command to create VM. When things went 
well, after creating vm, I type "tail /proc/meminfo", it would show like this, 
which means "HugePages" was used:
[root@mind domain]# tail /proc/meminfo 
VmallocChunk:   34359481380 kB
HardwareCorrupted: 0 kB
AnonHugePages:   1368064 kB
HugePages_Total: 512
HugePages_Free:  384
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:6144 kB
DirectMap2M:12552192 kB

However if I add "root=/dev/vda1" in the command to create vm, which looks like 
"-append root=/dev/vda1 console=ttyS0" in command line, then "-mem-path 
/hugetlbfs" doesn't work:
[root@mind domain]# tail /proc/meminfo 
VmallocChunk:   34359481380 kB
HardwareCorrupted: 0 kB
AnonHugePages:   1390592 kB
HugePages_Total: 512
HugePages_Free:  512
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:6144 kB
DirectMap2M:12552192 kB

Does anyone know how does "root=/dev/vda1" conflict with hugepages and how do I 
use both of them to create a VM? I am actually using an xml file to create VM, 
and those commands I mentioned above are shown in "ps aux | grep qemu".

Thanks,
Min--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device

2014-09-09 Thread Amit Shah
Hi Amos,

On (Tue) 09 Sep 2014 [19:14:02], Amos Kong wrote:
> When we try to hot-remove a busy virtio-rng device from QEMU monitor,
> the device can't be hot-removed. Because virtio-rng driver hangs at
> wait_for_completion_killable().
> 
> This patch exits the waiting by completing have_data completion before
> unregistering, resets data_avail to avoid the hwrng core use wrong
> buffer bytes. Before real unregistering we should return -ENODEV for
> reading.
> 
> When we hot-unplug the device, dd process in guest will exit.
>   $ dd if=/dev/hwrng of=/dev/null
>   dd: error reading ‘/dev/hwrng’: No such device
> 
> Signed-off-by: Amos Kong 
> Cc: sta...@vger.kernel.org
> ---
> V2: reset data_avail (Amit)
> adjust unregister order

Thanks, this looks good.

Can you please split into two parts, the complete() in one, and the
hwrng_register_done stuff into another?

Thanks,

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC virtio-rng: fail to read sysfs of a busy device

2014-09-09 Thread Amit Shah
On (Tue) 09 Sep 2014 [23:23:07], Amos Kong wrote:
> (Resend to fix the subject)
> 
> Hi Amit, Rusty
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062
> steps:
> - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest
> - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*'
> 
> Result: cat process will get stuck, it will return if we kill dd process.

How common is it going to be to have a long-running 'dd' process on
/dev/hwrng?

Also, with the new khwrng thread, reading from /dev/hwrng isn't
required -- just use /dev/random?

(This doesn't mean we shouldn't fix the issue here...)

> We have some static variables (eg, current_rng, data_avail, etc) in 
> hw_random/core.c,
> they are protected by rng_mutex. I try to workaround this issue by 
> undelay(100)
> after mutex_unlock() in rng_dev_read(). This gives chance for 
> hwrng_attr_*_show()
> to get mutex.
> 
> This patch also contains some cleanup, moving some code out of mutex
> protection.
> 
> Do you have some suggestion? Thanks.
> 
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index aa30a25..fa69020 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char 
> __user *buf,
>   }
>  
>   mutex_unlock(&rng_mutex);
> + udelay(100);

We have a need_resched() right below.  Why doesn't that work?

>   if (need_resched())
>   schedule_timeout_interruptible(1);
> @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device 
> *dev,
>   int err;
>   struct hwrng *rng;

The following hunk doesn't work:

> + err = -ENODEV;
>   err = mutex_lock_interruptible(&rng_mutex);

err is being set to another value in the next line!

>   if (err)
>   return -ERESTARTSYS;
> - err = -ENODEV;

And all usage of err below now won't have -ENODEV but some other value.

>   list_for_each_entry(rng, &rng_list, list) {
>   if (strcmp(rng->name, buf) == 0) {
>   if (rng == current_rng) {
> @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
>   return -ERESTARTSYS;
>   if (current_rng)
>   name = current_rng->name;
> - ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
>   mutex_unlock(&rng_mutex);
> + ret = snprintf(buf, PAGE_SIZE, "%s\n", name);

This looks OK...

>  
>   return ret;
>  }
> @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device 
> *dev,
>   ssize_t ret = 0;
>   struct hwrng *rng;
>  
> + buf[0] = '\0';
>   err = mutex_lock_interruptible(&rng_mutex);
>   if (err)
>   return -ERESTARTSYS;
>
> - buf[0] = '\0';
>   list_for_each_entry(rng, &rng_list, list) {
>   strncat(buf, rng->name, PAGE_SIZE - ret - 1);
>   ret += strlen(rng->name);
>   strncat(buf, " ", PAGE_SIZE - ret - 1);
>   ret++;
>   }
> + mutex_unlock(&rng_mutex);
>   strncat(buf, "\n", PAGE_SIZE - ret - 1);
>   ret++;
> - mutex_unlock(&rng_mutex);

But this isn't resulting in savings; the majority of the time is being
spent in the for loop, and that writes to the buffer.

BTW I don't expect strcat'ing to the buf in each of these scenarios is
a long operation, so this reworking doesn't strike to me as something
we should pursue.

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/2] virtio-rng: fix hotunplug

2014-09-09 Thread Amos Kong
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1121540

When we try to hot-unplugging a busy virtio-rng device, the device
can't be removed. And the reading process in guest gets stuck.

Those two patches fixed this issue by completing have_data completion
and preventing invalid reading.

Thanks for the help of Amit.

Cc: sta...@vger.kernel.org

V2: reset data_avail (Amit)
adjust unregister order
V3: split patch, update commitlog

Amos Kong (2):
  virtio-rng: fix stuck of hot-unplugging busy device
  virtio-rng: skip reading when we start to remove the device

 drivers/char/hw_random/virtio-rng.c | 7 +++
 1 file changed, 7 insertions(+)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] virtio-rng: fix stuck of hot-unplugging busy device

2014-09-09 Thread Amos Kong
When we try to hot-remove a busy virtio-rng device from QEMU monitor,
the device can't be hot-removed. Because virtio-rng driver hangs at
wait_for_completion_killable().

This patch exits the waiting by completing have_data completion before
unregistering, resets data_avail to avoid the hwrng core use wrong
buffer bytes.

Signed-off-by: Amos Kong 
Cc: sta...@vger.kernel.org
---
 drivers/char/hw_random/virtio-rng.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 2e3139e..849b228 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -137,6 +137,8 @@ static void remove_common(struct virtio_device *vdev)
 {
struct virtrng_info *vi = vdev->priv;
 
+   vi->data_avail = 0;
+   complete(&vi->have_data);
vdev->config->reset(vdev);
vi->busy = false;
if (vi->hwrng_register_done)
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] virtio-rng: skip reading when we start to remove the device

2014-09-09 Thread Amos Kong
Before we really unregister the hwrng device, reading will get stuck if
the virtio device is reset. We should return error for reading when we
start to remove the device.

Signed-off-by: Amos Kong 
Cc: sta...@vger.kernel.org
---
 drivers/char/hw_random/virtio-rng.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 849b228..132c9cc 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -36,6 +36,7 @@ struct virtrng_info {
int index;
bool busy;
bool hwrng_register_done;
+   bool hwrng_removed;
 };
 
 
@@ -68,6 +69,9 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t 
size, bool wait)
int ret;
struct virtrng_info *vi = (struct virtrng_info *)rng->priv;
 
+   if (vi->hwrng_removed)
+   return -ENODEV;
+
if (!vi->busy) {
vi->busy = true;
init_completion(&vi->have_data);
@@ -137,6 +141,7 @@ static void remove_common(struct virtio_device *vdev)
 {
struct virtrng_info *vi = vdev->priv;
 
+   vi->hwrng_removed = true;
vi->data_avail = 0;
complete(&vi->have_data);
vdev->config->reset(vdev);
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device

2014-09-09 Thread Amos Kong
On Wed, Sep 10, 2014 at 11:04:54AM +0530, Amit Shah wrote:
> Hi Amos,
> 
> On (Tue) 09 Sep 2014 [19:14:02], Amos Kong wrote:
> > When we try to hot-remove a busy virtio-rng device from QEMU monitor,
> > the device can't be hot-removed. Because virtio-rng driver hangs at
> > wait_for_completion_killable().
> > 
> > This patch exits the waiting by completing have_data completion before
> > unregistering, resets data_avail to avoid the hwrng core use wrong
> > buffer bytes. Before real unregistering we should return -ENODEV for
> > reading.
> > 
> > When we hot-unplug the device, dd process in guest will exit.
> >   $ dd if=/dev/hwrng of=/dev/null
> >   dd: error reading ‘/dev/hwrng’: No such device
> > 
> > Signed-off-by: Amos Kong 
> > Cc: sta...@vger.kernel.org
> > ---
> > V2: reset data_avail (Amit)
> > adjust unregister order
> 
> Thanks, this looks good.
> 
> Can you please split into two parts, the complete() in one, and the
> hwrng_register_done stuff into another?

I just posted the V3, split to two patches, and updated the commitlog
to describe why we need to return ENODEV for reading.
 
> Thanks,
> 
>   Amit

-- 
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC virtio-rng: fail to read sysfs of a busy device

2014-09-09 Thread Amos Kong
On Wed, Sep 10, 2014 at 11:22:12AM +0530, Amit Shah wrote:
> On (Tue) 09 Sep 2014 [23:23:07], Amos Kong wrote:
> > (Resend to fix the subject)
> > 
> > Hi Amit, Rusty
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062
> > steps:
> > - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest
> > - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*'
> > 
> > Result: cat process will get stuck, it will return if we kill dd process.
> 
> How common is it going to be to have a long-running 'dd' process on
> /dev/hwrng?

Not a common usage, but we have this strict testing.
 
> Also, with the new khwrng thread, reading from /dev/hwrng isn't
> required -- just use /dev/random?

Yes.
 
> (This doesn't mean we shouldn't fix the issue here...)

Completely agree :-)
 
> > We have some static variables (eg, current_rng, data_avail, etc) in 
> > hw_random/core.c,
> > they are protected by rng_mutex. I try to workaround this issue by 
> > undelay(100)
> > after mutex_unlock() in rng_dev_read(). This gives chance for 
> > hwrng_attr_*_show()
> > to get mutex.
> > 
> > This patch also contains some cleanup, moving some code out of mutex
> > protection.
> > 
> > Do you have some suggestion? Thanks.
> > 
> > 
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index aa30a25..fa69020 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char 
> > __user *buf,
> > }
> >  
> > mutex_unlock(&rng_mutex);
> > +   udelay(100);
> 
> We have a need_resched() right below.  Why doesn't that work?

need_resched() is giving chance for userspace to 
 
> > if (need_resched())

It never success in my debugging.

If we remove this check and always call schedule_timeout_interruptible(1),
problem also disappears.

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp,
char __user *buf,
 
mutex_unlock(&rng_mutex);
 
-   if (need_resched())
-   schedule_timeout_interruptible(1);
+   schedule_timeout_interruptible(1);
 
if (signal_pending(current)) {
err = -ERESTARTSYS;

> > schedule_timeout_interruptible(1);
> > @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device 
> > *dev,
> > int err;
> > struct hwrng *rng;
 
> The following hunk doesn't work:
> 
> > +   err = -ENODEV;
> > err = mutex_lock_interruptible(&rng_mutex);
> 
> err is being set to another value in the next line!
> 
> > if (err)
> > return -ERESTARTSYS;
> > -   err = -ENODEV;
> 
> And all usage of err below now won't have -ENODEV but some other value.

Oops!
 
> > list_for_each_entry(rng, &rng_list, list) {
> > if (strcmp(rng->name, buf) == 0) {
> > if (rng == current_rng) {
> > @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device 
> > *dev,
> > return -ERESTARTSYS;
> > if (current_rng)
> > name = current_rng->name;
> > -   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> > mutex_unlock(&rng_mutex);
> > +   ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> 
> This looks OK...
> 
> >  
> > return ret;
> >  }
> > @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct 
> > device *dev,
> > ssize_t ret = 0;
> > struct hwrng *rng;
> >  
> > +   buf[0] = '\0';
> > err = mutex_lock_interruptible(&rng_mutex);
> > if (err)
> > return -ERESTARTSYS;
> >
> > -   buf[0] = '\0';
> > list_for_each_entry(rng, &rng_list, list) {
> > strncat(buf, rng->name, PAGE_SIZE - ret - 1);
> > ret += strlen(rng->name);
> > strncat(buf, " ", PAGE_SIZE - ret - 1);
> > ret++;
> > }
> > +   mutex_unlock(&rng_mutex);
> > strncat(buf, "\n", PAGE_SIZE - ret - 1);
> > ret++;
> > -   mutex_unlock(&rng_mutex);
> 
> But this isn't resulting in savings; the majority of the time is being
> spent in the for loop, and that writes to the buffer.

Right
 
> BTW I don't expect strcat'ing to the buf in each of these scenarios is
> a long operation, so this reworking doesn't strike to me as something
> we should pursue.
> 
>   Amit

-- 
Amos.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)

2014-09-09 Thread Matt Mullins
On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 17:56, Matt Mullins ha scritto:
> >> > What host are you running?
> > What information do you want that I missed in my first email?
> 
> What version of QEMU?  Can you try the 12.04 qemu (which IIRC is 1.0) on
> top of the newer kernel?

I did reproduce this on qemu 1.0.1.

What would you like me to test next?  I've got both VM hosts currently running
3.17-rc4, so I'll know tomorrow if that works.

I've been looking into this off-and-on for a while now, and this time around I
may have found other folks experiencing the same issue:
https://issues.apache.org/jira/browse/CLOUDSTACK-6788
That one's a little empty on the details (do y'all know more about to whom that
bug was "known" than I do?), but
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218
sees similar symptoms due to running NTP on the host.  I may try disabling that
after the current round of testing-3.17-rc4 finishes up.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html