Re: Crash in intel_iommu_assign_device

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 07:52, Sheng Yang wrote:
> On Monday 01 November 2010 19:41:21 Jan Kiszka wrote:
>> Hi Sheng,
>>
>> I'm not claiming to understand the details, but this looks like use
>> (dereference of pte via dma_pte_addr) after release (free_pgtable_page
>> of dmar_domain->pgd aka pte) to me:
>>
>> static int intel_iommu_attach_device(struct iommu_domain *domain,
>>   struct device *dev)
>> {
>>  [...]
>>  pte = dmar_domain->pgd;
>>  if (dma_pte_present(pte)) {
>>  free_pgtable_page(dmar_domain->pgd);
>>  dmar_domain->pgd = (struct dma_pte *)
>>  phys_to_virt(dma_pte_addr(pte));
>>  }
>>
>> At least it crashes here right on pte->val access. Swap both lines?
> 
> I think code is right.
> 
> The comment above indicate the case: the code want to decrease the level of 
> page 
> table. Mostly it is a 4 level page table, and the code would turn it into 3 
> levels 
> pagetable. What the code did is just get the first entry of the old pagetable 
> level 
> 4, then free the level 4 pagetable's page, and make the pagetable to a level 
> 3 
> pagetable.
> 
> Seems it make no sense to swap the lines...

It fixes the crash here, and I'm convinced the current code is wrong.
See the patch I've just sent out.

Jan



signature.asc
Description: OpenPGP digital signature


[PATCH] intel-iommu: Fix use after release during device attach

2010-11-02 Thread Jan Kiszka
From: Jan Kiszka 

Obtail the new pgd pointer before releasing the page containing this
value.

Signed-off-by: Jan Kiszka 
---

Who is taking care of this? The kvm tree?

 drivers/pci/intel-iommu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 4789f8e..35463dd 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct iommu_domain 
*domain,
 
pte = dmar_domain->pgd;
if (dma_pte_present(pte)) {
-   free_pgtable_page(dmar_domain->pgd);
dmar_domain->pgd = (struct dma_pte *)
phys_to_virt(dma_pte_addr(pte));
+   free_pgtable_page(pte);
}
dmar_domain->agaw--;
}
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] kernel,cred,kvm,security - removing superfluous rcu_read_lock_held check

2010-11-02 Thread Jiri Olsa
On Mon, Nov 01, 2010 at 11:42:23PM +0100, Paolo Bonzini wrote:
> On 11/01/2010 08:15 PM, Jiri Olsa wrote:
> >diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >index a055742..a90a7e3 100644
> >--- a/include/linux/kvm_host.h
> >+++ b/include/linux/kvm_host.h
> >@@ -256,7 +256,6 @@ void kvm_put_kvm(struct kvm *kvm);
> >  static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
> >  {
> > return rcu_dereference_check(kvm->memslots,
> >-srcu_read_lock_held(&kvm->srcu)
> > || lockdep_is_held(&kvm->slots_lock));
> >  }
> >
> 
> This is an srcu_read_lock_held, which you don't touch here:
> 
> >diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> >index 9d8e8fb..0956a73 100644
> >--- a/kernel/rcutorture.c
> >+++ b/kernel/rcutorture.c
> >@@ -807,7 +807,6 @@ static void rcu_torture_timer(unsigned long unused)
> > idx = cur_ops->readlock();
> > completed = cur_ops->completed();
> > p = rcu_dereference_check(rcu_torture_current,
> >-  rcu_read_lock_held() ||
> >   rcu_read_lock_bh_held() ||
> >   rcu_read_lock_sched_held() ||
> >   srcu_read_lock_held(&srcu_ctl));
> 
> I guess the kvm hunk is the incorrect one?
ops, you're right. I overlooked it, please skip the kvm hunk..

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Xiao Guangrong
On 11/02/2010 02:56 PM, Gleb Natapov wrote:
> On Tue, Nov 02, 2010 at 10:30:10AM +0800, Xiao Guangrong wrote:
>> On 11/01/2010 08:55 PM, Gleb Natapov wrote:
>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 2cfdf2d..f7aed95 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -5295,8 +5295,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>> {
>>> switch(vcpu->arch.mp_state) {
>>> case KVM_MP_STATE_HALTED:
>>> -   vcpu->arch.mp_state =
>>> -   KVM_MP_STATE_RUNNABLE;
>>> +   if 
>>> (list_empty_careful(&vcpu->async_pf.done))
>>> +   vcpu->arch.mp_state =
>>> +   KVM_MP_STATE_RUNNABLE;
>>
>> if nmi/interrupt and apfs completed event occur at the same time, we will 
>> miss to
>> exit halt sate. Maybe we can check the pending event here, see below patch 
>> please.
>>
> No, we will not. If nmi/interrupt and apfs completed event occur at the same
> time kvm_vcpu_block() will exit with KVM_REQ_UNHALT set, but cpu will
> not be unhalted because of list_empty_careful(&vcpu->async_pf.done)
> check. Vcpu then will process pending apf completion and enter
> kvm_vcpu_block() again which will immediately exit because
> kvm_arch_vcpu_runnable() will return true since there is pending
> nmi/interrupt. This time vcpu will be unhalted.

Thanks for your explanation, but if it has nmi/interrupt pending,
kvm_arch_can_inject_async_page_present() always return false in PV guest case,
it can not process pending apf completion, so, the vcpu is remain halt state
forever?

Also, the pv guest can only handle an apf completion at one time, it can not 
ensure
vcpu->async_pf.done is empty after kvm_check_async_pf_completion()

> 
>>> case KVM_MP_STATE_RUNNABLE:
>>> vcpu->arch.apf.halted = false;
>>> break;
>>> @@ -6279,6 +6280,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu 
>>> *vcpu,
>>> vcpu->arch.fault.error_code = 0;
>>> vcpu->arch.fault.address = work->arch.token;
>>> kvm_inject_page_fault(vcpu);
>>> +   vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>>> }
>>>  }
>>
>> Have a stupid question, why we make the vcpu runnable here? Sorry i don't 
>> know
>> kvm pv guest to much. :-(
> Because kvm_arch_vcpu_runnable() does not check for pending exceptions.
> Since now we do not unhalt vcpu when apf completion happens we need to
> unhalt it here. But, as I said, the patch is untested.
> 

As i know, exception can not let guest exit halt state, only NMI/interruption 
can do it, yes? :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] intel-iommu: Fix use after release during device attach

2010-11-02 Thread Sheng Yang
On Tuesday 02 November 2010 15:05:51 Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> Obtail the new pgd pointer before releasing the page containing this
> value.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> Who is taking care of this? The kvm tree?
> 
>  drivers/pci/intel-iommu.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 4789f8e..35463dd 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct
> iommu_domain *domain,
> 
>   pte = dmar_domain->pgd;
>   if (dma_pte_present(pte)) {
> - free_pgtable_page(dmar_domain->pgd);
>   dmar_domain->pgd = (struct dma_pte *)
>   phys_to_virt(dma_pte_addr(pte));
> + free_pgtable_page(pte);
>   }
>   dmar_domain->agaw--;
>   }

Reviewed-by: Sheng Yang 

CC iommu mailing list and David.

OK, Jan, I got your meaning now. And it's not the exactly swap. :)

I think the old code is safe, seems it's broken(exposed) by: 

commit 1a8bd481bfba30515b54368d90a915db3faf302f
Author: David Woodhouse 
Date:   Tue Aug 10 01:38:53 2010 +0100

intel-iommu: Fix 32-bit build warning with __cmpxchg()

drivers/pci/intel-iommu.c: In function 'dma_pte_addr':
drivers/pci/intel-iommu.c:239: warning: passing argument 1 of '__cmpxchg64' 
from incompatible pointer typ

It seems that __cmpxchg64() now cares about the type of its pointer 
argument,
so give it a (uint64_t *) instead of a pointer to a structure which contains
only that.

Signed-off-by: David Woodhouse 

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index c9171be..603cdc0 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -236,7 +236,7 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
return pte->val & VTD_PAGE_MASK;
 #else
/* Must have a full atomic 64-bit read */
-   return  __cmpxchg64(pte, 0ULL, 0ULL) & VTD_PAGE_MASK;
+   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
 #endif
 }

Seems here is the only affected code?

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Gleb Natapov
On Tue, Nov 02, 2010 at 03:31:26PM +0800, Xiao Guangrong wrote:
> On 11/02/2010 02:56 PM, Gleb Natapov wrote:
> > On Tue, Nov 02, 2010 at 10:30:10AM +0800, Xiao Guangrong wrote:
> >> On 11/01/2010 08:55 PM, Gleb Natapov wrote:
> >>
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index 2cfdf2d..f7aed95 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -5295,8 +5295,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>   {
> >>>   switch(vcpu->arch.mp_state) {
> >>>   case KVM_MP_STATE_HALTED:
> >>> - vcpu->arch.mp_state =
> >>> - KVM_MP_STATE_RUNNABLE;
> >>> + if 
> >>> (list_empty_careful(&vcpu->async_pf.done))
> >>> + vcpu->arch.mp_state =
> >>> + KVM_MP_STATE_RUNNABLE;
> >>
> >> if nmi/interrupt and apfs completed event occur at the same time, we will 
> >> miss to
> >> exit halt sate. Maybe we can check the pending event here, see below patch 
> >> please.
> >>
> > No, we will not. If nmi/interrupt and apfs completed event occur at the same
> > time kvm_vcpu_block() will exit with KVM_REQ_UNHALT set, but cpu will
> > not be unhalted because of list_empty_careful(&vcpu->async_pf.done)
> > check. Vcpu then will process pending apf completion and enter
> > kvm_vcpu_block() again which will immediately exit because
> > kvm_arch_vcpu_runnable() will return true since there is pending
> > nmi/interrupt. This time vcpu will be unhalted.
> 
> Thanks for your explanation, but if it has nmi/interrupt pending,
> kvm_arch_can_inject_async_page_present() always return false in PV guest case,
> it can not process pending apf completion, so, the vcpu is remain halt state
> forever?
> 

kvm_event_needs_reinjection() checks for nmi/interrupts that
need reinjection (not injection).  Those are nmi/interrupts that
was injected but injection failed for some reason. For nmi, for
instance, kvm_arch_vcpu_runnable() checks vcpu->arch.nmi_pending,
but kvm_event_needs_reinjection() checks for vcpu->arch.nmi_injected.
NMI moves from nmi_pending to nmi_injected when it is injected into vcpu
for the first time. CPU cannot be halted in this state.

> Also, the pv guest can only handle an apf completion at one time, it can not 
> ensure
> vcpu->async_pf.done is empty after kvm_check_async_pf_completion()
> 
In case of PV guest it will be woken up by apf completion by
kvm_arch_async_page_present() below.

> > 
> >>>   case KVM_MP_STATE_RUNNABLE:
> >>>   vcpu->arch.apf.halted = false;
> >>>   break;
> >>> @@ -6279,6 +6280,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu 
> >>> *vcpu,
> >>>   vcpu->arch.fault.error_code = 0;
> >>>   vcpu->arch.fault.address = work->arch.token;
> >>>   kvm_inject_page_fault(vcpu);
> >>> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> >>>   }
> >>>  }
> >>
> >> Have a stupid question, why we make the vcpu runnable here? Sorry i don't 
> >> know
> >> kvm pv guest to much. :-(
> > Because kvm_arch_vcpu_runnable() does not check for pending exceptions.
> > Since now we do not unhalt vcpu when apf completion happens we need to
> > unhalt it here. But, as I said, the patch is untested.
> > 
> 
> As i know, exception can not let guest exit halt state, only NMI/interruption 
> can do it, yes? :-)
On physical HW exception cannot happen while cpu is in halt state, but
with PV we define what guest can and cannot expect, so when guest
explicitly enables apf it should be able to handle it during halt.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] intel-iommu: Fix use after release during device attach

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 08:31, Sheng Yang wrote:
> On Tuesday 02 November 2010 15:05:51 Jan Kiszka wrote:
>> From: Jan Kiszka 
>>
>> Obtail the new pgd pointer before releasing the page containing this
>> value.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>
>> Who is taking care of this? The kvm tree?
>>
>>  drivers/pci/intel-iommu.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>> index 4789f8e..35463dd 100644
>> --- a/drivers/pci/intel-iommu.c
>> +++ b/drivers/pci/intel-iommu.c
>> @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct
>> iommu_domain *domain,
>>
>>  pte = dmar_domain->pgd;
>>  if (dma_pte_present(pte)) {
>> -free_pgtable_page(dmar_domain->pgd);
>>  dmar_domain->pgd = (struct dma_pte *)
>>  phys_to_virt(dma_pte_addr(pte));
>> +free_pgtable_page(pte);
>>  }
>>  dmar_domain->agaw--;
>>  }
> 
> Reviewed-by: Sheng Yang 
> 
> CC iommu mailing list and David.
> 
> OK, Jan, I got your meaning now. And it's not the exactly swap. :)
> 
> I think the old code is safe, seems it's broken(exposed) by: 
> 
> commit 1a8bd481bfba30515b54368d90a915db3faf302f
> Author: David Woodhouse 
> Date:   Tue Aug 10 01:38:53 2010 +0100
> 
> intel-iommu: Fix 32-bit build warning with __cmpxchg()
> 
> drivers/pci/intel-iommu.c: In function 'dma_pte_addr':
> drivers/pci/intel-iommu.c:239: warning: passing argument 1 of 
> '__cmpxchg64' 
> from incompatible pointer typ
> 
> It seems that __cmpxchg64() now cares about the type of its pointer 
> argument,
> so give it a (uint64_t *) instead of a pointer to a structure which 
> contains
> only that.
> 
> Signed-off-by: David Woodhouse 
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index c9171be..603cdc0 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -236,7 +236,7 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
> return pte->val & VTD_PAGE_MASK;
>  #else
> /* Must have a full atomic 64-bit read */
> -   return  __cmpxchg64(pte, 0ULL, 0ULL) & VTD_PAGE_MASK;
> +   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
>  #endif
>  }
> 
> Seems here is the only affected code?

CONFIG_64BIT is on here, so this change did not make a difference for me.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] intel-iommu: Fix use after release during device attach

2010-11-02 Thread Sheng Yang
On Tuesday 02 November 2010 15:46:11 Jan Kiszka wrote:
> Am 02.11.2010 08:31, Sheng Yang wrote:
> > On Tuesday 02 November 2010 15:05:51 Jan Kiszka wrote:
> >> From: Jan Kiszka 
> >> 
> >> Obtail the new pgd pointer before releasing the page containing this
> >> value.
> >> 
> >> Signed-off-by: Jan Kiszka 
> >> ---
> >> 
> >> Who is taking care of this? The kvm tree?
> >> 
> >>  drivers/pci/intel-iommu.c |2 +-
> >>  1 files changed, 1 insertions(+), 1 deletions(-)
> >> 
> >> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> >> index 4789f8e..35463dd 100644
> >> --- a/drivers/pci/intel-iommu.c
> >> +++ b/drivers/pci/intel-iommu.c
> >> @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct
> >> iommu_domain *domain,
> >> 
> >>pte = dmar_domain->pgd;
> >>if (dma_pte_present(pte)) {
> >> 
> >> -  free_pgtable_page(dmar_domain->pgd);
> >> 
> >>dmar_domain->pgd = (struct dma_pte *)
> >>
> >>phys_to_virt(dma_pte_addr(pte));
> >> 
> >> +  free_pgtable_page(pte);
> >> 
> >>}
> >>dmar_domain->agaw--;
> >>
> >>}
> > 
> > Reviewed-by: Sheng Yang 
> > 
> > CC iommu mailing list and David.
> > 
> > OK, Jan, I got your meaning now. And it's not the exactly swap. :)
> > 
> > I think the old code is safe, seems it's broken(exposed) by:
> > 
> > commit 1a8bd481bfba30515b54368d90a915db3faf302f
> > Author: David Woodhouse 
> > Date:   Tue Aug 10 01:38:53 2010 +0100
> > 
> > intel-iommu: Fix 32-bit build warning with __cmpxchg()
> > 
> > drivers/pci/intel-iommu.c: In function 'dma_pte_addr':
> > drivers/pci/intel-iommu.c:239: warning: passing argument 1 of
> > '__cmpxchg64'
> > 
> > from incompatible pointer typ
> > 
> > It seems that __cmpxchg64() now cares about the type of its pointer
> > argument, so give it a (uint64_t *) instead of a pointer to a
> > structure which contains only that.
> > 
> > Signed-off-by: David Woodhouse 
> > 
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index c9171be..603cdc0 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -236,7 +236,7 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
> > 
> > return pte->val & VTD_PAGE_MASK;
> >  
> >  #else
> >  
> > /* Must have a full atomic 64-bit read */
> > 
> > -   return  __cmpxchg64(pte, 0ULL, 0ULL) & VTD_PAGE_MASK;
> > +   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> > 
> >  #endif
> >  }
> > 
> > Seems here is the only affected code?
> 
> CONFIG_64BIT is on here, so this change did not make a difference for me.

Oh...

Then it would due to most VT-d machine wouldn't run into while (iommu->agaw < 
dmar_domain->agaw). 

We have routing test for VT-d devices assignment, but seems we don't use this 
kind 
of VT-d machine for testing.

--
regards
Yang, Sheng


> 
> Jan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] intel-iommu: Fix use after release during device attach

2010-11-02 Thread Sheng Yang
On Tuesday 02 November 2010 15:31:22 Sheng Yang wrote:
> On Tuesday 02 November 2010 15:05:51 Jan Kiszka wrote:
> > From: Jan Kiszka 
> > 
> > Obtail the new pgd pointer before releasing the page containing this
> > value.
> > 
> > Signed-off-by: Jan Kiszka 
> > ---
> > 
> > Who is taking care of this? The kvm tree?
> > 
> >  drivers/pci/intel-iommu.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index 4789f8e..35463dd 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct
> > iommu_domain *domain,
> > 
> > pte = dmar_domain->pgd;
> > if (dma_pte_present(pte)) {
> > 
> > -   free_pgtable_page(dmar_domain->pgd);
> > 
> > dmar_domain->pgd = (struct dma_pte *)
> > 
> > phys_to_virt(dma_pte_addr(pte));
> > 
> > +   free_pgtable_page(pte);
> > 
> > }
> > dmar_domain->agaw--;
> > 
> > }
> 
> Reviewed-by: Sheng Yang 
> 
> CC iommu mailing list and David.
> 
> OK, Jan, I got your meaning now. And it's not the exactly swap. :)
> 
> I think the old code is safe, seems it's broken(exposed) by:
> 
> commit 1a8bd481bfba30515b54368d90a915db3faf302f
> Author: David Woodhouse 
> Date:   Tue Aug 10 01:38:53 2010 +0100
> 
> intel-iommu: Fix 32-bit build warning with __cmpxchg()

In fact this one shouldn't affect the result. Wrong guess...

--
regards
Yang, Sheng

> 
> drivers/pci/intel-iommu.c: In function 'dma_pte_addr':
> drivers/pci/intel-iommu.c:239: warning: passing argument 1 of
> '__cmpxchg64' from incompatible pointer typ
> 
> It seems that __cmpxchg64() now cares about the type of its pointer
> argument, so give it a (uint64_t *) instead of a pointer to a structure
> which contains only that.
> 
> Signed-off-by: David Woodhouse 
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index c9171be..603cdc0 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -236,7 +236,7 @@ static inline u64 dma_pte_addr(struct dma_pte *pte)
> return pte->val & VTD_PAGE_MASK;
>  #else
> /* Must have a full atomic 64-bit read */
> -   return  __cmpxchg64(pte, 0ULL, 0ULL) & VTD_PAGE_MASK;
> +   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
>  #endif
>  }
> 
> Seems here is the only affected code?
> 
> --
> regards
> Yang, Sheng
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Launch other test during migration

2010-11-02 Thread Michael Goldish
On 11/02/2010 07:34 AM, Jason Wang wrote:
> Michael Goldish writes:
>  > On 09/25/2010 11:36 AM, Jason Wang wrote:
>  > > We could give a further test of migration by launch test during 
> migartion. So
>  > > the following series implements:
>  > > 
>  > > - A simple class to run a specified test in the background which could 
> be used
>  > > to launch other test during migartion. Its design is rather simple and 
> its usage
>  > > is a little tricky, but it work well.
>  > > - Two sample tests which take advantages of the background class: Test 
> reboot
>  > > during guest migration and test file_transfer during guest migration.
>  > > 
>  > > In the future, we could even lauch autotest client test during guest 
> migation.
>  > > 
>  > > ---
>  > > 
>  > > Jason Wang (3):
>  > >   KVM Test: Introduce a helper class to run a test in the background
>  > >   KVM test: Test reboot during migration
>  > >   KVM test: Test the file transfer during migartion
>  > > 
>  > > 
>  > >  client/tests/kvm/kvm_test_utils.py |   44 
> +++
>  > >  .../kvm/tests/migration_with_file_transfer.py  |   59 
> 
>  > >  client/tests/kvm/tests/migration_with_reboot.py|   45 
> +++
>  > >  client/tests/kvm/tests_base.cfg.sample |   12 
>  > >  4 files changed, 159 insertions(+), 1 deletions(-)
>  > >  create mode 100644 
> client/tests/kvm/tests/migration_with_file_transfer.py
>  > >  create mode 100644 client/tests/kvm/tests/migration_with_reboot.py
>  > 
>  > It seems to me that this method is only applicable to tests/functions
>  > that don't require a VM object (i.e. that require only a shell session
>  > object).  kvm_test_utils.reboot() operates on a VM object, and the same
>  > VM is destroyed by migrate() which runs in the background, so eventually
>  > reboot() tries logging into a destroyed VM, which fails because
>  > vm.get_address() fails.  Any monitor operation will also fail.  If the
>  > autotest wrapper requires a VM object (currently it does) then it can't
>  > be used either.
>  > 
> 
> You are right and that's an issue when running test in parallel with
> migration, but reboot through shell should work. The aim of this kind
> of test is just to add some stress ( such as run_autotest) during
> migartion, so most (probably all) of the tests only needs a
> session. So I think it's not a very big issue in this situation.

Many tests need to be modified in order to require only a session.  I've
tried reboot and it doesn't work, and I can see that run_autotest() also
uses a VM.  Reboot needs the VM object in order to log in to make sure
the VM goes back up, and run_autotest() needs it for SCP and probably
is_alive().  I agree that some tests don't require a VM object, but the
ones that do are also interesting.

>  > An alternative (somewhat ugly) way to migrate in the background is to
>  > pass a boolean 'migrate' flag to various functions/tests, such as
>  > reboot() and run_autotest().  If 'migrate' is True, these functions will
>  > do something like
>  > 
>  > vm = kvm_test_utils.migrate(vm, ...)
>  > 
>  > in their waiting loops, where wait_for() is normally used.  This
>  > guarantees that 'vm' is always a valid VM object.  For example:
>  > 
>  > # Log in after reboot
>  > while time.time() < end_time:
>  > if migrate_in_bg:
>  > vm = kvm_test_utils.migrate(vm, ...)
>  > session = vm.remote_login()
>  > if session:
>  > break
>  > time.sleep(2)
>  > 
>  > This is much longer than the usual wait_for(...) but it does the job.
>  > What do you think?
> 
> This makes sense but it would let testcases need to care about the
> migration and it's also hard to put all related codes into a wrapper
> which would complicate the codes.
> 
> Despite the issue of vm object, all tests that only depends on the
> shell session should works well with my method and it's more easy to

We should also find a solution for tests that require a VM object.

Which other tests do you think it would be interesting to run during
migration, in addition to reboot(), run_autotest() and file_transfer()?

> be extended. Maybe we could just warn the user of its usage and adapt
> my method?

I think it's cleaner to just avoid passing a VM object to BackgroundTest.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gold Dust Availability

2010-11-02 Thread Harry Moore
Gold  Dust  Availability

Good day,

Firstly let me introduced myself to you because I searched you on the internet 
and decided to mail you about a business that might be of interest to you.

I am Harry Moore, the chief security officer of a logistic company located in 
London - the United Kingdom, the company deals with storing people and 
organizations’ valuable goods and properties. I will need you to assist me in 
claiming and selling a large quantity of gold dust worth an estimated two 
million US dollars (2,000,000 USD), which was abandoned in our custody for the 
past two years without prior claims.
 
I will give you the details of the gold dust as soon as you agreed to work with 
me. If we work diligently the entire project would be complete within 7 working 
days. Once the transaction is completed, the total amount of money realized 
after the sales of the gold dust will be shared in the percentage ratio of 
50-50.
 
Should you be interested in this project, please kindly reply me so that I can 
give you the full detail of the project. Your earlier reply will be highly 
appreciated, thanks for your consideration.
 
Regards, 

Harry Moore
harrymoo...@gmx.com
harrymoo...@mail.com



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Xiao Guangrong
On 11/02/2010 03:45 PM, Gleb Natapov wrote:

> kvm_event_needs_reinjection() checks for nmi/interrupts that
> need reinjection (not injection).  Those are nmi/interrupts that
> was injected but injection failed for some reason. For nmi, for
> instance, kvm_arch_vcpu_runnable() checks vcpu->arch.nmi_pending,
> but kvm_event_needs_reinjection() checks for vcpu->arch.nmi_injected.
> NMI moves from nmi_pending to nmi_injected when it is injected into vcpu
> for the first time. CPU cannot be halted in this state.
> 

Yeah, nmi is handled like this way, but for interrupt:
If interruption controller is in userspace (!irqchip_in_kernel(v->kvm)),
kvm_arch_vcpu_runnable() checks vcpu->arch.interrupt.pending and
kvm_event_needs_reinjection() also checks vcpu->arch.interrupt.pending.

Consider this case:

- Guest vcpu executes 'HLT'
- wakeup the vcpu and inject interrupt and apfs is completed at this time
- then the vcpu can't handle apf completion and .done list keeps nonempty.  

Can this case happen? Sorry if i missed it again.

>> Also, the pv guest can only handle an apf completion at one time, it can not 
>> ensure
>> vcpu->async_pf.done is empty after kvm_check_async_pf_completion()
>>
> In case of PV guest it will be woken up by apf completion by
> kvm_arch_async_page_present() below.

..

>> As i know, exception can not let guest exit halt state, only 
>> NMI/interruption can do it, yes? :-)
> On physical HW exception cannot happen while cpu is in halt state, but
> with PV we define what guest can and cannot expect, so when guest
> explicitly enables apf it should be able to handle it during halt.
> 

Ah, i see, thanks your very much.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Gleb Natapov
On Tue, Nov 02, 2010 at 05:09:42PM +0800, Xiao Guangrong wrote:
> On 11/02/2010 03:45 PM, Gleb Natapov wrote:
> 
> > kvm_event_needs_reinjection() checks for nmi/interrupts that
> > need reinjection (not injection).  Those are nmi/interrupts that
> > was injected but injection failed for some reason. For nmi, for
> > instance, kvm_arch_vcpu_runnable() checks vcpu->arch.nmi_pending,
> > but kvm_event_needs_reinjection() checks for vcpu->arch.nmi_injected.
> > NMI moves from nmi_pending to nmi_injected when it is injected into vcpu
> > for the first time. CPU cannot be halted in this state.
> > 
> 
> Yeah, nmi is handled like this way, but for interrupt:
> If interruption controller is in userspace (!irqchip_in_kernel(v->kvm)),
> kvm_arch_vcpu_runnable() checks vcpu->arch.interrupt.pending and
> kvm_event_needs_reinjection() also checks vcpu->arch.interrupt.pending.
> 
> Consider this case:
> 
> - Guest vcpu executes 'HLT'
> - wakeup the vcpu and inject interrupt and apfs is completed at this time
> - then the vcpu can't handle apf completion and .done list keeps nonempty.  
> 
> Can this case happen? Sorry if i missed it again.
> 
If irqchip is in userspace apf is disabled (see mmu.c:can_do_async_pf()).
The reason for this is that when irqchip_in_kernel(v->kvm) cpu sleeps in
userspace during halt, so all event that can cause it to be unhalted
should be generated in userspace too. This is also the reason you can't have
pit in kernel and irqchip in userpsace.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Xiao Guangrong
On 11/02/2010 05:14 PM, Gleb Natapov wrote:

> If irqchip is in userspace apf is disabled (see mmu.c:can_do_async_pf()).
> The reason for this is that when irqchip_in_kernel(v->kvm) cpu sleeps in
> userspace during halt, so all event that can cause it to be unhalted
> should be generated in userspace too. This is also the reason you can't have
> pit in kernel and irqchip in userpsace.
> 

Oh, thank you very much for answering so many questions, and your patch is
looks good for me! ;-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/7] KVM: handle more completed apfs if possible

2010-11-02 Thread Xiao Guangrong
If it's no need to inject async #PF to PV guest we can handle
more completed apfs at one time, so we can retry guest #PF
as early as possible

Signed-off-by: Xiao Guangrong 
---
 virt/kvm/async_pf.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 60df9e0..100c66e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -124,24 +124,24 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 {
struct kvm_async_pf *work;
 
-   if (list_empty_careful(&vcpu->async_pf.done) ||
-   !kvm_arch_can_inject_async_page_present(vcpu))
-   return;
-
-   spin_lock(&vcpu->async_pf.lock);
-   work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
-   list_del(&work->link);
-   spin_unlock(&vcpu->async_pf.lock);
+   while (!list_empty_careful(&vcpu->async_pf.done) &&
+ kvm_arch_can_inject_async_page_present(vcpu)) {
+   spin_lock(&vcpu->async_pf.lock);
+   work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
+ link);
+   list_del(&work->link);
+   spin_unlock(&vcpu->async_pf.lock);
 
-   if (work->page)
-   kvm_arch_async_page_ready(vcpu, work);
-   kvm_arch_async_page_present(vcpu, work);
+   if (work->page)
+   kvm_arch_async_page_ready(vcpu, work);
+   kvm_arch_async_page_present(vcpu, work);
 
-   list_del(&work->queue);
-   vcpu->async_pf.queued--;
-   if (work->page)
-   put_page(work->page);
-   kmem_cache_free(async_pf_cache, work);
+   list_del(&work->queue);
+   vcpu->async_pf.queued--;
+   if (work->page)
+   put_page(work->page);
+   kmem_cache_free(async_pf_cache, work);
+   }
 }
 
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 5/7] KVM: handle more completed apfs if possible

2010-11-02 Thread Gleb Natapov
On Tue, Nov 02, 2010 at 05:35:35PM +0800, Xiao Guangrong wrote:
> If it's no need to inject async #PF to PV guest we can handle
> more completed apfs at one time, so we can retry guest #PF
> as early as possible
> 
> Signed-off-by: Xiao Guangrong 
Acked-by: Gleb Natapov 

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 7/7] KVM: KVM: don't break vcpu 'halt' state due to apfs

2010-11-02 Thread Gleb Natapov
On Tue, Nov 02, 2010 at 05:30:16PM +0800, Xiao Guangrong wrote:
> On 11/02/2010 05:14 PM, Gleb Natapov wrote:
> 
> > If irqchip is in userspace apf is disabled (see mmu.c:can_do_async_pf()).
> > The reason for this is that when irqchip_in_kernel(v->kvm) cpu sleeps in
> > userspace during halt, so all event that can cause it to be unhalted
> > should be generated in userspace too. This is also the reason you can't have
> > pit in kernel and irqchip in userpsace.
> > 
> 
> Oh, thank you very much for answering so many questions, and your patch is
> looks good for me! ;-)
It is still not tested though :)

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: -pcidevice broken - fix or remove it?

2010-11-02 Thread Markus Armbruster
Jan Kiszka  writes:

> Hi,
>
> looks like the documented way to configure device assignment at qemu-kvm
> level is broken in 0.13 and git head:
>
>   # qemu-system-x86_64 -pcidevice host=0:1a.0
>   qemu-system-x86_64: Parameter 'id' expects an identifier
>   Identifiers consist of letters, digits, '-', '.', '_', starting with a 
> letter.
>   pcidevice argument parse error; please check the help text for usage
>
> "-device pci-assign" works, but only if specify "iommu=0" (otherwise: No
> IOMMU found.  Unable to assign device "(null)").
>
> Fix that legacy qemu-kvm switch or remove it at this chance? No one
> seems to have complained yet.
>
> Jan

Broken since June.  Xudong Hao (cc'ed) reported it then[1], and
Hidetoshi Seto (also cc'ed) attempted to get it patched [2,3].

Removing -pcidevice would be fine with me.  For what it's worth, it's
not in upstream qemu.


[1] http://lists.nongnu.org/archive/html/qemu-devel/2010-06/msg02858.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2010-06/msg03073.html
[3] http://lists.nongnu.org/archive/html/qemu-devel/2010-06/msg02858.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call agenda for Nov-2 cancelled

2010-11-02 Thread Juan Quintela

Hi

As several of the participants on the call are at plumbers, this call
got cancelled this week.

Later, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: -pcidevice broken - fix or remove it?

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 14:19, Markus Armbruster wrote:
> Jan Kiszka  writes:
> 
>> Hi,
>>
>> looks like the documented way to configure device assignment at qemu-kvm
>> level is broken in 0.13 and git head:
>>
>>   # qemu-system-x86_64 -pcidevice host=0:1a.0
>>   qemu-system-x86_64: Parameter 'id' expects an identifier
>>   Identifiers consist of letters, digits, '-', '.', '_', starting with a 
>> letter.
>>   pcidevice argument parse error; please check the help text for usage
>>
>> "-device pci-assign" works, but only if specify "iommu=0" (otherwise: No
>> IOMMU found.  Unable to assign device "(null)").
>>
>> Fix that legacy qemu-kvm switch or remove it at this chance? No one
>> seems to have complained yet.
>>
>> Jan
> 
> Broken since June.  Xudong Hao (cc'ed) reported it then[1], and
> Hidetoshi Seto (also cc'ed) attempted to get it patched [2,3].
> 
> Removing -pcidevice would be fine with me.  For what it's worth, it's
> not in upstream qemu.
> 

Patch queued.

While at it, I wondered if we should kill "pci_add ... host" as well.
But it looks like libvirt uses it - and should stumble over this
breakage (it does not specify a device name). I can fix it by removing
the silly auto-naming, or can libvirt live without it?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: -pcidevice broken - fix or remove it?

2010-11-02 Thread Daniel P. Berrange
On Tue, Nov 02, 2010 at 03:31:31PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 14:19, Markus Armbruster wrote:
> > Jan Kiszka  writes:
> > 
> >> Hi,
> >>
> >> looks like the documented way to configure device assignment at qemu-kvm
> >> level is broken in 0.13 and git head:
> >>
> >>   # qemu-system-x86_64 -pcidevice host=0:1a.0
> >>   qemu-system-x86_64: Parameter 'id' expects an identifier
> >>   Identifiers consist of letters, digits, '-', '.', '_', starting with a 
> >> letter.
> >>   pcidevice argument parse error; please check the help text for usage
> >>
> >> "-device pci-assign" works, but only if specify "iommu=0" (otherwise: No
> >> IOMMU found.  Unable to assign device "(null)").
> >>
> >> Fix that legacy qemu-kvm switch or remove it at this chance? No one
> >> seems to have complained yet.
> >>
> >> Jan
> > 
> > Broken since June.  Xudong Hao (cc'ed) reported it then[1], and
> > Hidetoshi Seto (also cc'ed) attempted to get it patched [2,3].
> > 
> > Removing -pcidevice would be fine with me.  For what it's worth, it's
> > not in upstream qemu.
> > 
> 
> Patch queued.
> 
> While at it, I wondered if we should kill "pci_add ... host" as well.
> But it looks like libvirt uses it - and should stumble over this
> breakage (it does not specify a device name). I can fix it by removing
> the silly auto-naming, or can libvirt live without it?

As of libvirt >= 0.8.1 & QEMU >= 0.12.x we use switched to using -device 
for everything. Older libvirt versions had rather broken checking for
PCI device topology, so I think it is fine to require libvirt >= 0.8.1
for latest QEMU releases if users want PCI dev assignment. Thus -pcidevice
and pci_add can both be killed from our POV.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] pci-assign: Allow to disable MSI perference for host IRQ

2010-11-02 Thread Jan Kiszka
Some devices (e.g. the ath9k) claim to support MSI but actually do not
work when this is enabled. We must not blindly switch such devices to
MSI but rather provide the user a way to pass control back to the
guest driver. This can be done by turning the new property "prefer_msi"
off (default remains on).

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |5 -
 hw/device-assignment.h |2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 349e864..73d8afd 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -964,7 +964,8 @@ static int assign_irq(AssignedDevice *dev)
 }
 
 assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
-if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+if (dev->features & ASSIGNED_DEVICE_PREFER_MSI_MASK &&
+dev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
 assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_MSI;
 else
 assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_INTX;
@@ -1541,6 +1542,8 @@ static PCIDeviceInfo assign_info = {
 DEFINE_PROP_BIT("iommu", AssignedDevice, features,
 ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
 DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
+DEFINE_PROP_BIT("prefer_msi", AssignedDevice, features,
+ASSIGNED_DEVICE_PREFER_MSI_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 4eb5835..56b7076 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -75,8 +75,10 @@ typedef struct {
 } AssignedDevRegion;
 
 #define ASSIGNED_DEVICE_USE_IOMMU_BIT  0
+#define ASSIGNED_DEVICE_PREFER_MSI_BIT 1
 
 #define ASSIGNED_DEVICE_USE_IOMMU_MASK (1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
+#define ASSIGNED_DEVICE_PREFER_MSI_MASK(1 << 
ASSIGNED_DEVICE_PREFER_MSI_BIT)
 
 typedef struct AssignedDevice {
 PCIDevice dev;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] pci-assign: Issue warning when running w/o IOMMU

2010-11-02 Thread Jan Kiszka
Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 73d8afd..d8c565c 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -913,6 +913,12 @@ static int assign_device(AssignedDevice *dev)
 #else
 dev->features &= ~ASSIGNED_DEVICE_USE_IOMMU_MASK;
 #endif
+if (!(dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK)) {
+fprintf(stderr,
+"WARNING: Assigning a device without IOMMU protection can "
+"cause host memory corruption if the device issues DMA write "
+"requests!\n");
+}
 
 r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
 if (r < 0) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] pci-assign: Convert iommu property to booleam

2010-11-02 Thread Jan Kiszka
Defining the iommu property as an integer opens the door for confusion
(Is it an index or a switch?). Fix this by converting it to a bit
property that can be on or off, nothing else.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |7 ---
 hw/device-assignment.h |6 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2605bd1..349e864 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -902,7 +902,7 @@ static int assign_device(AssignedDevice *dev)
 
 #ifdef KVM_CAP_IOMMU
 /* We always enable the IOMMU unless disabled on the command line */
-if (dev->use_iommu) {
+if (dev->features & ASSIGNED_DEVICE_USE_IOMMU_MASK) {
 if (!kvm_check_extension(kvm_state, KVM_CAP_IOMMU)) {
 fprintf(stderr, "No IOMMU found.  Unable to assign device 
\"%s\"\n",
 dev->dev.qdev.id);
@@ -911,7 +911,7 @@ static int assign_device(AssignedDevice *dev)
 assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
 }
 #else
-dev->use_iommu = 0;
+dev->features &= ~ASSIGNED_DEVICE_USE_IOMMU_MASK;
 #endif
 
 r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
@@ -1538,7 +1538,8 @@ static PCIDeviceInfo assign_info = {
 .config_write = assigned_dev_pci_write_config,
 .qdev.props   = (Property[]) {
 DEFINE_PROP("host", AssignedDevice, host, qdev_prop_hostaddr, 
PCIHostDevice),
-DEFINE_PROP_UINT32("iommu", AssignedDevice, use_iommu, 1),
+DEFINE_PROP_BIT("iommu", AssignedDevice, features,
+ASSIGNED_DEVICE_USE_IOMMU_BIT, true),
 DEFINE_PROP_STRING("configfd", AssignedDevice, configfd_name),
 DEFINE_PROP_END_OF_LIST(),
 },
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 2f5fa17..4eb5835 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -74,10 +74,14 @@ typedef struct {
 PCIRegion *region;
 } AssignedDevRegion;
 
+#define ASSIGNED_DEVICE_USE_IOMMU_BIT  0
+
+#define ASSIGNED_DEVICE_USE_IOMMU_MASK (1 << ASSIGNED_DEVICE_USE_IOMMU_BIT)
+
 typedef struct AssignedDevice {
 PCIDevice dev;
 PCIHostDevice host;
-uint32_t use_iommu;
+uint32_t features;
 int intpin;
 uint8_t debug_flags;
 AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] pci-assign: Remove broken -pcidevice and pci_add host

2010-11-02 Thread Jan Kiszka
These qemu-kvm-only interfaces were broken by b560a9ab9b, but no one
complained loud enough to get them fixed again. As we have properly
working "-device pci-assign"/"device_add" and we won't push this
upstream anyway, there is likely no point in restoring the interface.

Signed-off-by: Jan Kiszka 
CC: Markus Armbruster 
CC: Daniel P. Berrange 
---
 hmp-commands.hx|2 +-
 hw/device-assignment.c |   63 
 hw/device-assignment.h |7 -
 hw/ipf.c   |6 
 hw/pc.c|6 
 hw/pci-hotplug.c   |   22 
 qemu-options.hx|5 
 vl.c   |   14 --
 8 files changed, 1 insertions(+), 124 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9406496..ba6de28 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -806,7 +806,7 @@ ETEXI
 {
 .name   = "pci_add",
 .args_type  = "pci_addr:s,type:s,opts:s?",
-.params = "auto|[[:]:] nic|storage|host 
[[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]... 
[host=02:00.0[,name=string][,dma=none]",
+.params = "auto|[[:]:] nic|storage 
[[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]...",
 .help   = "hot-add PCI device",
 .mhandler.cmd = pci_device_hot_add,
 },
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index d8c565c..0bbf9d9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1561,69 +1561,6 @@ static void assign_register_devices(void)
 
 device_init(assign_register_devices)
 
-
-/*
- * Syntax to assign device:
- *
- * -pcidevice host=bus:dev.func[,dma=none][,name=Foo]
- *
- * Example:
- * -pcidevice host=00:13.0,dma=pvdma
- *
- * dma can currently only be 'none' to disable iommu support.
- */
-QemuOpts *add_assigned_device(const char *arg)
-{
-QemuOpts *opts = NULL;
-char host[64], id[64], dma[8];
-int r;
-
-r = get_param_value(host, sizeof(host), "host", arg);
-if (!r)
- goto bad;
-r = get_param_value(id, sizeof(id), "id", arg);
-if (!r)
-r = get_param_value(id, sizeof(id), "name", arg);
-if (!r)
-r = get_param_value(id, sizeof(id), "host", arg);
-
-opts = qemu_opts_create(qemu_find_opts("device"), id, 0);
-if (!opts)
-goto bad;
-qemu_opt_set(opts, "driver", "pci-assign");
-qemu_opt_set(opts, "host", host);
-
-#ifdef KVM_CAP_IOMMU
-r = get_param_value(dma, sizeof(dma), "dma", arg);
-if (r && !strncmp(dma, "none", 4))
-qemu_opt_set(opts, "iommu", "0");
-#endif
-qemu_opts_print(opts, NULL);
-return opts;
-
-bad:
-fprintf(stderr, "pcidevice argument parse error; "
-"please check the help text for usage\n");
-if (opts)
-qemu_opts_del(opts);
-return NULL;
-}
-
-void add_assigned_devices(PCIBus *bus, const char **devices, int n_devices)
-{
-QemuOpts *opts;
-int i;
-
-for (i = 0; i < n_devices; i++) {
-opts = add_assigned_device(devices[i]);
-if (opts == NULL) {
-fprintf(stderr, "Could not add assigned device %s\n", devices[i]);
-exit(1);
-}
-/* generic code will call qdev_device_add() for the device */
-}
-}
-
 /*
  * Scan the assigned devices for the devices that have an option ROM, and then
  * load the corresponding ROM data to RAM. If an error occurs while loading an
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 56b7076..c94a730 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -114,13 +114,6 @@ typedef struct AssignedDevice {
 QLIST_ENTRY(AssignedDevice) next;
 } AssignedDevice;
 
-QemuOpts *add_assigned_device(const char *arg);
-void add_assigned_devices(PCIBus *bus, const char **devices, int n_devices);
 void assigned_dev_update_irqs(void);
 
-#define MAX_DEV_ASSIGN_CMDLINE 8
-
-extern const char *assigned_devices[MAX_DEV_ASSIGN_CMDLINE];
-extern int assigned_devices_index;
-
 #endif  /* __DEVICE_ASSIGNMENT_H__ */
diff --git a/hw/ipf.c b/hw/ipf.c
index 21cff72..10c21eb 100644
--- a/hw/ipf.c
+++ b/hw/ipf.c
@@ -635,12 +635,6 @@ static void ipf_init1(ram_addr_t ram_size,
unit_id++;
}
 }
-
-#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
-if (kvm_enabled())
-   add_assigned_devices(pci_bus, assigned_devices, assigned_devices_index);
-#endif /* CONFIG_KVM_DEVICE_ASSIGNMENT */
-
 }
 
 static void ipf_init_pci(ram_addr_t ram_size,
diff --git a/hw/pc.c b/hw/pc.c
index b624873..3bf3862 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1105,10 +1105,4 @@ void pc_pci_device_init(PCIBus *pci_bus)
 
 extboot_init(info->bdrv);
 }
-
-#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
-if (kvm_enabled()) {
-add_assigned_devices(pci_bus, assigned_devices, 
assigned_devices_index);
-}
-#endif /* CONFIG_KVM_DEVICE_ASSIGNMENT */
 }
diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index b370b9b..a2d4bb2 10064

[PATCH 0/4] qemu-kvm: Fixes and improvements for device assigment

2010-11-02 Thread Jan Kiszka
Mostly usability fixes, but also a patch to allow using devices that
have problems with MSIs.

Jan Kiszka (4):
  pci-assign: Convert iommu property to booleam
  pci-assign: Allow to disable MSI perference for host IRQ
  pci-assign: Issue warning when running w/o IOMMU
  pci-assign: Remove broken -pcidevice and pci_add host

 hmp-commands.hx|2 +-
 hw/device-assignment.c |   81 ---
 hw/device-assignment.h |   15 -
 hw/ipf.c   |6 ---
 hw/pc.c|6 ---
 hw/pci-hotplug.c   |   22 -
 qemu-options.hx|5 ---
 vl.c   |   14 
 8 files changed, 22 insertions(+), 129 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 32-bit qemu on current x86-64 kernel segfauls very early

2010-11-02 Thread Christoph Hellwig
On Sun, Oct 31, 2010 at 09:06:29AM -0400, Christoph Hellwig wrote:
> With Linus' git tree from today I can't boot qemu when using kvm.  It
> seems to do fine, just glacially slow without -enable-kvm.  The command
> simplest command line that fails is:
> 
>   /opt/qemu/bin/qemu-system-x86_64 -enable-kvm

This issue was caused by commit 9581d442b9058d3699b4be568b6e5eae38a41493

"KVM: Fix fs/gs reload oops with invalid ldt"

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] KVM: Refactor IRQ names of assigned devices

2010-11-02 Thread Jan Kiszka
Cosmetic change, but it helps to correlate IRQs with PCI devices.

Signed-off-by: Jan Kiszka 
---
 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |   10 ++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a786419..46120da 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -475,6 +475,7 @@ struct kvm_assigned_dev_kernel {
struct pci_dev *dev;
struct kvm *kvm;
spinlock_t intx_lock;
+   char irq_name[32];
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index d0b07ea..ca402ed 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -231,7 +231,7 @@ static int assigned_device_enable_host_intx(struct kvm *kvm,
 * are going to be long delays in accepting, acking, etc.
 */
if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-0, "kvm_assigned_intx_device", (void *)dev))
+0, dev->irq_name, (void *)dev))
return -EIO;
return 0;
 }
@@ -250,7 +250,7 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 
dev->host_irq = dev->dev->irq;
if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-0, "kvm_assigned_msi_device", (void *)dev)) {
+0, dev->irq_name, (void *)dev)) {
pci_disable_msi(dev->dev);
return -EIO;
}
@@ -277,8 +277,7 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
for (i = 0; i < dev->entries_nr; i++) {
r = request_threaded_irq(dev->host_msix_entries[i].vector,
 NULL, kvm_assigned_dev_thread,
-0, "kvm_assigned_msix_device",
-(void *)dev);
+0, dev->irq_name, (void *)dev);
if (r)
goto err;
}
@@ -335,6 +334,9 @@ static int assign_host_irq(struct kvm *kvm,
if (dev->irq_requested_type & KVM_DEV_IRQ_HOST_MASK)
return r;
 
+   snprintf(dev->irq_name, sizeof(dev->irq_name), "kvm:%s",
+pci_name(dev->dev));
+
switch (host_irq_type) {
case KVM_DEV_IRQ_HOST_INTX:
r = assigned_device_enable_host_intx(kvm, dev);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/4] KVM: Clear assigned guest IRQ on release

2010-11-02 Thread Jan Kiszka
When we deassign a guest IRQ, clear the potentially asserted guest line.
There might be no chance for the guest to do this, specifically if we
switch from INTx to MSI mode.

Signed-off-by: Jan Kiszka 
---
 virt/kvm/assigned-dev.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..ecc4419 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -141,6 +141,9 @@ static void deassign_guest_irq(struct kvm *kvm,
kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
assigned_dev->ack_notifier.gsi = -1;
 
+   kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
+   assigned_dev->guest_irq, 0);
+
if (assigned_dev->irq_source_id != -1)
kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
assigned_dev->irq_source_id = -1;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] KVM: Switch assigned device IRQ forwarding to threaded handler

2010-11-02 Thread Jan Kiszka
This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lenghty operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, thus is renames to intx_lock.

Signed-off-by: Jan Kiszka 
---
 include/linux/kvm_host.h |   12 +
 virt/kvm/assigned-dev.c  |  106 ++---
 2 files changed, 35 insertions(+), 83 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 462b982..a786419 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -456,16 +456,8 @@ struct kvm_irq_ack_notifier {
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
-#define KVM_ASSIGNED_MSIX_PENDING  0x1
-struct kvm_guest_msix_entry {
-   u32 vector;
-   u16 entry;
-   u16 flags;
-};
-
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
-   struct work_struct interrupt_work;
struct list_head list;
int assigned_dev_id;
int host_segnr;
@@ -476,13 +468,13 @@ struct kvm_assigned_dev_kernel {
bool host_irq_disabled;
struct msix_entry *host_msix_entries;
int guest_irq;
-   struct kvm_guest_msix_entry *guest_msix_entries;
+   struct msix_entry *guest_msix_entries;
unsigned long irq_requested_type;
int irq_source_id;
int flags;
struct pci_dev *dev;
struct kvm *kvm;
-   spinlock_t assigned_dev_lock;
+   spinlock_t intx_lock;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ecc4419..d0b07ea 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -55,58 +55,31 @@ static int find_index_from_host_irq(struct 
kvm_assigned_dev_kernel
return index;
 }
 
-static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
+static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
 {
-   struct kvm_assigned_dev_kernel *assigned_dev;
-   int i;
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   u32 vector;
+   int index;
 
-   assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
-   interrupt_work);
+   if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
+   spin_lock(&assigned_dev->intx_lock);
+   disable_irq_nosync(irq);
+   assigned_dev->host_irq_disabled = true;
+   spin_unlock(&assigned_dev->intx_lock);
+   }
 
-   spin_lock_irq(&assigned_dev->assigned_dev_lock);
if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-   struct kvm_guest_msix_entry *guest_entries =
-   assigned_dev->guest_msix_entries;
-   for (i = 0; i < assigned_dev->entries_nr; i++) {
-   if (!(guest_entries[i].flags &
-   KVM_ASSIGNED_MSIX_PENDING))
-   continue;
-   guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
+   index = find_index_from_host_irq(assigned_dev, irq);
+   if (index >= 0) {
+   vector = assigned_dev->
+   guest_msix_entries[index].vector;
kvm_set_irq(assigned_dev->kvm,
-   assigned_dev->irq_source_id,
-   guest_entries[i].vector, 1);
+   assigned_dev->irq_source_id, vector, 1);
}
} else
kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
assigned_dev->guest_irq, 1);
 
-   spin_unlock_irq(&assigned_dev->assigned_dev_lock);
-}
-
-static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
-{
-   unsigned long flags;
-   struct kvm_assigned_dev_kernel *assigned_dev =
-   (struct kvm_assigned_dev_kernel *) dev_id;
-
-   spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
-   if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
-   int index = find_index_from_host_irq(assigned_dev, irq);
-   if (index < 0)
-   goto out;
-   assigned_dev->guest_msix_entries[index].flags |=
-   KVM_ASSIGNED_MSIX_PENDING;
-   }
-
-   schedule_work(&assigned_dev->interrupt_work);
-
-   if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_INTX) {
-   disable_irq_nosync(irq);
-   assigned_dev->host_irq_disabled = true;
-   }
-
-out:
-   spin_unlock_irqrestore(&assigned_dev->assign

[PATCH v2 0/4] KVM: Improve IRQ assignment for device passthrough

2010-11-02 Thread Jan Kiszka
Four patches to improve "classic" device assigment /wrt IRQs. Highlight
is the last one that resolves the host IRQ sharing issue for all PCI 2.3
devices. Quite essential when passing non-MSI-ready devices like many
USB host controllers.

Changes in v2:
 - Reworked IRQ forwarding path to use threaded IRQs (direct signalling
   from IRQ context does not work out of the box and may be too lengthy)
 - Refactored host IRQ naming of assigned devices (cosmetic change)
 - Avoid unmask on ack when the next IRQ is pending, rather reassert the
   guest line (PCI-2.3 patch)
 - Refactored PCI-2.3 patch (but still no control knob for shared mode -
   is that a must?)

Jan Kiszka (4):
  KVM: Clear assigned guest IRQ on release
  KVM: Switch assigned device IRQ forwarding to threaded handler
  KVM: Refactor IRQ names of assigned devices
  KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

 include/linux/kvm_host.h |   14 +--
 virt/kvm/assigned-dev.c  |  279 +
 2 files changed, 208 insertions(+), 85 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share IRQs of such devices between on the host side when
passing them to a guest.

Signed-off-by: Jan Kiszka 
---
 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |  194 ++
 2 files changed, 180 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 46120da..fdc2bd9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
unsigned int entries_nr;
int host_irq;
bool host_irq_disabled;
+   bool pci_2_3;
struct msix_entry *host_msix_entries;
int guest_irq;
struct msix_entry *guest_msix_entries;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ca402ed..91fe9c8 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
kvm_assigned_dev_kernel
return index;
 }
 
+/*
+ * Verify that the device supports Interrupt Disable bit in command register,
+ * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
+ * in PCI 2.2.
+ */
+static bool pci_2_3_supported(struct pci_dev *pdev)
+{
+   bool supported = false;
+   u16 orig, new;
+
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, &orig);
+   pci_write_config_word(pdev, PCI_COMMAND,
+ orig ^ PCI_COMMAND_INTX_DISABLE);
+   pci_read_config_word(pdev, PCI_COMMAND, &new);
+
+   /*
+* There's no way to protect against
+* hardware bugs or detect them reliably, but as long as we know
+* what the value should be, let's go ahead and check it.
+*/
+   if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
+   dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
+   "driver or HW bug?\n", orig, new);
+   goto out;
+   }
+   if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
+   dev_warn(&pdev->dev, "Device does not support "
+"disabling interrupts: unable to bind.\n");
+   goto out;
+   }
+   supported = true;
+
+   /* Now restore the original value. */
+   pci_write_config_word(pdev, PCI_COMMAND, orig);
+
+out:
+   pci_unblock_user_cfg_access(pdev);
+   return supported;
+}
+
+static unsigned int
+pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
+{
+   u32 cmd_status_dword;
+   u16 origcmd, newcmd;
+   unsigned int status;
+
+   /*
+* We do a single dword read to retrieve both command and status.
+* Document assumptions that make this possible.
+*/
+   BUILD_BUG_ON(PCI_COMMAND % 4);
+   BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
+
+   pci_block_user_cfg_access(dev);
+
+   /*
+* Read both command and status registers in a single 32-bit operation.
+* Note: we could cache the value for command and move the status read
+* out of the lock if there was a way to get notified of user changes
+* to command register through sysfs. Should be good for shared irqs.
+*/
+   pci_read_config_dword(dev, PCI_COMMAND, &cmd_status_dword);
+   origcmd = cmd_status_dword;
+   status = cmd_status_dword >> 16;
+
+   if (check_status) {
+   bool irq_pending = status & PCI_STATUS_INTERRUPT;
+
+   /*
+* Check interrupt status register to see whether our device
+* triggered the interrupt (when masking) or the next IRQ is
+* already pending (when unmasking).
+*/
+   if (!(mask == irq_pending))
+   goto done;
+   }
+
+   newcmd = origcmd & ~PCI_COMMAND_INTX_DISABLE;
+   if (mask)
+   newcmd |= PCI_COMMAND_INTX_DISABLE;
+   if (newcmd != origcmd)
+   pci_write_config_word(dev, PCI_COMMAND, newcmd);
+
+done:
+   pci_unblock_user_cfg_access(dev);
+   return status;
+}
+
+static void pci_2_3_irq_mask(struct pci_dev *dev)
+{
+   pci_2_3_set_irq_mask(dev, true, false);
+}
+
+static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
+{
+   return pci_2_3_set_irq_mask(dev, true, true);
+}
+
+static void pci_2_3_irq_unmask(struct pci_dev *dev)
+{
+   pci_2_3_set_irq_mask(dev, false, false);
+}
+
+static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev *dev)
+{
+   return pci_2_3_set_irq_mask(dev, false, true);
+}
+
+static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int ret = IRQ_WAKE_THREAD;
+
+   spin_lock(&assigned_dev->intx_lock);
+   if (assigned_dev->host_irq_disabled ||
+   !(pci_2_3_irq_check_and_mask(assigned_dev-

Re: 32-bit qemu on current x86-64 kernel segfauls very early

2010-11-02 Thread Avi Kivity

 On 11/02/2010 11:11 AM, Christoph Hellwig wrote:

On Sun, Oct 31, 2010 at 09:06:29AM -0400, Christoph Hellwig wrote:
>  With Linus' git tree from today I can't boot qemu when using kvm.  It
>  seems to do fine, just glacially slow without -enable-kvm.  The command
>  simplest command line that fails is:
>
>/opt/qemu/bin/qemu-system-x86_64 -enable-kvm

This issue was caused by commit 9581d442b9058d3699b4be568b6e5eae38a41493

"KVM: Fix fs/gs reload oops with invalid ldt"



Interesting, I guess we corrupt %fs on x86_64.

Intel or AMD?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 32-bit qemu on current x86-64 kernel segfauls very early

2010-11-02 Thread Christoph Hellwig
On Tue, Nov 02, 2010 at 11:59:48AM -0400, Avi Kivity wrote:
> > "KVM: Fix fs/gs reload oops with invalid ldt"
> >
> 
> Interesting, I guess we corrupt %fs on x86_64.
> 
> Intel or AMD?

Intel:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU T9600  @ 2.80GHz
stepping: 6
cpu MHz : 2133.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dts tpr_shadow vnmi 
flexpriority
bogomips: 5588.00
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/4] KVM: Improve IRQ assignment for device passthrough

2010-11-02 Thread Michael S. Tsirkin
>  - Refactored PCI-2.3 patch (but still no control knob for shared mode -
>is that a must?)

I think it's the prudent thing to do.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 04:49:20PM +0100, Jan Kiszka wrote:
> PCI 2.3 allows to generically disable IRQ sources at device level. This
> enables us to share IRQs of such devices between on the host side when
> passing them to a guest.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  include/linux/kvm_host.h |1 +
>  virt/kvm/assigned-dev.c  |  194 
> ++
>  2 files changed, 180 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 46120da..fdc2bd9 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
>   unsigned int entries_nr;
>   int host_irq;
>   bool host_irq_disabled;
> + bool pci_2_3;
>   struct msix_entry *host_msix_entries;
>   int guest_irq;
>   struct msix_entry *guest_msix_entries;
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index ca402ed..91fe9c8 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
> kvm_assigned_dev_kernel
>   return index;
>  }
>  
> +/*
> + * Verify that the device supports Interrupt Disable bit in command register,
> + * per PCI 2.3, by flipping this bit and reading it back: this bit was 
> readonly
> + * in PCI 2.2.
> + */
> +static bool pci_2_3_supported(struct pci_dev *pdev)
> +{
> + bool supported = false;
> + u16 orig, new;
> +
> + pci_block_user_cfg_access(pdev);
> + pci_read_config_word(pdev, PCI_COMMAND, &orig);
> + pci_write_config_word(pdev, PCI_COMMAND,
> +   orig ^ PCI_COMMAND_INTX_DISABLE);
> + pci_read_config_word(pdev, PCI_COMMAND, &new);
> +
> + /*
> +  * There's no way to protect against
> +  * hardware bugs or detect them reliably, but as long as we know
> +  * what the value should be, let's go ahead and check it.
> +  */
> + if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
> + dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
> + "driver or HW bug?\n", orig, new);
> + goto out;
> + }
> + if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
> + dev_warn(&pdev->dev, "Device does not support "
> +  "disabling interrupts: unable to bind.\n");
> + goto out;
> + }
> + supported = true;
> +
> + /* Now restore the original value. */
> + pci_write_config_word(pdev, PCI_COMMAND, orig);
> +
> +out:
> + pci_unblock_user_cfg_access(pdev);
> + return supported;
> +}
> +
> +static unsigned int
> +pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
> +{
> + u32 cmd_status_dword;
> + u16 origcmd, newcmd;
> + unsigned int status;
> +
> + /*
> +  * We do a single dword read to retrieve both command and status.
> +  * Document assumptions that make this possible.
> +  */
> + BUILD_BUG_ON(PCI_COMMAND % 4);
> + BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
> +
> + pci_block_user_cfg_access(dev);
> +
> + /*
> +  * Read both command and status registers in a single 32-bit operation.
> +  * Note: we could cache the value for command and move the status read
> +  * out of the lock if there was a way to get notified of user changes
> +  * to command register through sysfs. Should be good for shared irqs.
> +  */
> + pci_read_config_dword(dev, PCI_COMMAND, &cmd_status_dword);
> + origcmd = cmd_status_dword;
> + status = cmd_status_dword >> 16;
> +
> + if (check_status) {
> + bool irq_pending = status & PCI_STATUS_INTERRUPT;
> +
> + /*
> +  * Check interrupt status register to see whether our device
> +  * triggered the interrupt (when masking) or the next IRQ is
> +  * already pending (when unmasking).
> +  */
> + if (!(mask == irq_pending))

Same as mask != irq_pending?

> + goto done;
> + }
> +
> + newcmd = origcmd & ~PCI_COMMAND_INTX_DISABLE;
> + if (mask)
> + newcmd |= PCI_COMMAND_INTX_DISABLE;
> + if (newcmd != origcmd)
> + pci_write_config_word(dev, PCI_COMMAND, newcmd);
> +
> +done:
> + pci_unblock_user_cfg_access(dev);
> + return status;
> +}
> +
> +static void pci_2_3_irq_mask(struct pci_dev *dev)
> +{
> + pci_2_3_set_irq_mask(dev, true, false);
> +}
> +
> +static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
> +{
> + return pci_2_3_set_irq_mask(dev, true, true);
> +}
> +
> +static void pci_2_3_irq_unmask(struct pci_dev *dev)
> +{
> + pci_2_3_set_irq_mask(dev, false, false);
> +}
> +
> +static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev *dev)
> +{
> + return pci_2_3_set_irq_mask(dev, false, true);
> +}
> +

IMO this is not a terribly good interface: all users check the pending bit
(PCI_STAT

Re: [PATCH v2 2/4] KVM: Switch assigned device IRQ forwarding to threaded handler

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 04:49:18PM +0100, Jan Kiszka wrote:
> This improves the IRQ forwarding for assigned devices: By using the
> kernel's threaded IRQ scheme, we can get rid of the latency-prone work
> queue and simplify the code in the same run.

Interesting. Do you see any latency improvements from this?

> Moreover, we no longer have to hold assigned_dev_lock while raising the
> guest IRQ, which can be a lenghty operation as we may have to iterate
> over all VCPUs. The lock is now only used for synchronizing masking vs.
> unmasking of INTx-type IRQs, thus is renames to intx_lock.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  include/linux/kvm_host.h |   12 +
>  virt/kvm/assigned-dev.c  |  106 ++---
>  2 files changed, 35 insertions(+), 83 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 462b982..a786419 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -456,16 +456,8 @@ struct kvm_irq_ack_notifier {
>   void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
>  };
>  
> -#define KVM_ASSIGNED_MSIX_PENDING0x1
> -struct kvm_guest_msix_entry {
> - u32 vector;
> - u16 entry;
> - u16 flags;
> -};
> -
>  struct kvm_assigned_dev_kernel {
>   struct kvm_irq_ack_notifier ack_notifier;
> - struct work_struct interrupt_work;
>   struct list_head list;
>   int assigned_dev_id;
>   int host_segnr;
> @@ -476,13 +468,13 @@ struct kvm_assigned_dev_kernel {
>   bool host_irq_disabled;
>   struct msix_entry *host_msix_entries;
>   int guest_irq;
> - struct kvm_guest_msix_entry *guest_msix_entries;
> + struct msix_entry *guest_msix_entries;
>   unsigned long irq_requested_type;
>   int irq_source_id;
>   int flags;
>   struct pci_dev *dev;
>   struct kvm *kvm;
> - spinlock_t assigned_dev_lock;
> + spinlock_t intx_lock;
>  };
>  
>  struct kvm_irq_mask_notifier {
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index ecc4419..d0b07ea 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -55,58 +55,31 @@ static int find_index_from_host_irq(struct 
> kvm_assigned_dev_kernel
>   return index;
>  }
>  
> -static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
> +static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
>  {
> - struct kvm_assigned_dev_kernel *assigned_dev;
> - int i;
> + struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
> + u32 vector;
> + int index;
>  
> - assigned_dev = container_of(work, struct kvm_assigned_dev_kernel,
> - interrupt_work);
> + if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_INTX) {
> + spin_lock(&assigned_dev->intx_lock);
> + disable_irq_nosync(irq);
> + assigned_dev->host_irq_disabled = true;
> + spin_unlock(&assigned_dev->intx_lock);
> + }
>  
> - spin_lock_irq(&assigned_dev->assigned_dev_lock);
>   if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
> - struct kvm_guest_msix_entry *guest_entries =
> - assigned_dev->guest_msix_entries;
> - for (i = 0; i < assigned_dev->entries_nr; i++) {
> - if (!(guest_entries[i].flags &
> - KVM_ASSIGNED_MSIX_PENDING))
> - continue;
> - guest_entries[i].flags &= ~KVM_ASSIGNED_MSIX_PENDING;
> + index = find_index_from_host_irq(assigned_dev, irq);
> + if (index >= 0) {
> + vector = assigned_dev->
> + guest_msix_entries[index].vector;
>   kvm_set_irq(assigned_dev->kvm,
> - assigned_dev->irq_source_id,
> - guest_entries[i].vector, 1);
> + assigned_dev->irq_source_id, vector, 1);
>   }
>   } else
>   kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
>   assigned_dev->guest_irq, 1);
>  
> - spin_unlock_irq(&assigned_dev->assigned_dev_lock);
> -}
> -
> -static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
> -{
> - unsigned long flags;
> - struct kvm_assigned_dev_kernel *assigned_dev =
> - (struct kvm_assigned_dev_kernel *) dev_id;
> -
> - spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
> - if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
> - int index = find_index_from_host_irq(assigned_dev, irq);
> - if (index < 0)
> - goto out;
> - assigned_dev->guest_msix_entries[index].flags |=
> - KVM_ASSIGNED_MSIX_PENDING;
> - }
> -
> - schedule_work(&assigned_dev->interrupt_work);
> -
> - if (assigne

Re: [PATCH 1/3] cgroup - removing superfluous rcu_read_lock_held check

2010-11-02 Thread Li Zefan
On 2010年11月02日 03:15, Jiri Olsa wrote:
> hi,

This..

> the rcu_dereference_check is defined as
> 
>   #define rcu_dereference_check(p, c) \
>  __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)
> 
> so the caller does not need to specify rcu_read_lock_held() condition.
>
 
> wbr,
> jirka

and this should be excluded from the changelog.

> 
> 
> Signed-off-by: Jiri Olsa 

Reviewed-by: Li Zefan 

However a nitpick:

> ---
>  include/linux/cgroup.h |1 -
>  kernel/cgroup.c|6 ++
>  2 files changed, 2 insertions(+), 5 deletions(-)
...
> @@ -4544,7 +4542,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
>* it's unchanged until freed.
>*/
>   cssid = rcu_dereference_check(css->id,
> - rcu_read_lock_held() || atomic_read(&css->refcnt));
> + atomic_read(&css->refcnt));

Now the 2 lines can be made into one line and still fit into 80 chars.

>  
>   if (cssid)
>   return cssid->id;
> @@ -4557,7 +4555,7 @@ unsigned short css_depth(struct cgroup_subsys_state 
> *css)
>   struct css_id *cssid;
>  
>   cssid = rcu_dereference_check(css->id,
> - rcu_read_lock_held() || atomic_read(&css->refcnt));
> + atomic_read(&css->refcnt));

dito

>  
>   if (cssid)
>   return cssid->depth;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 18:41, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 04:49:20PM +0100, Jan Kiszka wrote:
>> PCI 2.3 allows to generically disable IRQ sources at device level. This
>> enables us to share IRQs of such devices between on the host side when
>> passing them to a guest.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  include/linux/kvm_host.h |1 +
>>  virt/kvm/assigned-dev.c  |  194 
>> ++
>>  2 files changed, 180 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 46120da..fdc2bd9 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
>>  unsigned int entries_nr;
>>  int host_irq;
>>  bool host_irq_disabled;
>> +bool pci_2_3;
>>  struct msix_entry *host_msix_entries;
>>  int guest_irq;
>>  struct msix_entry *guest_msix_entries;
>> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
>> index ca402ed..91fe9c8 100644
>> --- a/virt/kvm/assigned-dev.c
>> +++ b/virt/kvm/assigned-dev.c
>> @@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
>> kvm_assigned_dev_kernel
>>  return index;
>>  }
>>  
>> +/*
>> + * Verify that the device supports Interrupt Disable bit in command 
>> register,
>> + * per PCI 2.3, by flipping this bit and reading it back: this bit was 
>> readonly
>> + * in PCI 2.2.
>> + */
>> +static bool pci_2_3_supported(struct pci_dev *pdev)
>> +{
>> +bool supported = false;
>> +u16 orig, new;
>> +
>> +pci_block_user_cfg_access(pdev);
>> +pci_read_config_word(pdev, PCI_COMMAND, &orig);
>> +pci_write_config_word(pdev, PCI_COMMAND,
>> +  orig ^ PCI_COMMAND_INTX_DISABLE);
>> +pci_read_config_word(pdev, PCI_COMMAND, &new);
>> +
>> +/*
>> + * There's no way to protect against
>> + * hardware bugs or detect them reliably, but as long as we know
>> + * what the value should be, let's go ahead and check it.
>> + */
>> +if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
>> +dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
>> +"driver or HW bug?\n", orig, new);
>> +goto out;
>> +}
>> +if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
>> +dev_warn(&pdev->dev, "Device does not support "
>> + "disabling interrupts: unable to bind.\n");
>> +goto out;
>> +}
>> +supported = true;
>> +
>> +/* Now restore the original value. */
>> +pci_write_config_word(pdev, PCI_COMMAND, orig);
>> +
>> +out:
>> +pci_unblock_user_cfg_access(pdev);
>> +return supported;
>> +}
>> +
>> +static unsigned int
>> +pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
>> +{
>> +u32 cmd_status_dword;
>> +u16 origcmd, newcmd;
>> +unsigned int status;
>> +
>> +/*
>> + * We do a single dword read to retrieve both command and status.
>> + * Document assumptions that make this possible.
>> + */
>> +BUILD_BUG_ON(PCI_COMMAND % 4);
>> +BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
>> +
>> +pci_block_user_cfg_access(dev);
>> +
>> +/*
>> + * Read both command and status registers in a single 32-bit operation.
>> + * Note: we could cache the value for command and move the status read
>> + * out of the lock if there was a way to get notified of user changes
>> + * to command register through sysfs. Should be good for shared irqs.
>> + */
>> +pci_read_config_dword(dev, PCI_COMMAND, &cmd_status_dword);
>> +origcmd = cmd_status_dword;
>> +status = cmd_status_dword >> 16;
>> +
>> +if (check_status) {
>> +bool irq_pending = status & PCI_STATUS_INTERRUPT;
>> +
>> +/*
>> + * Check interrupt status register to see whether our device
>> + * triggered the interrupt (when masking) or the next IRQ is
>> + * already pending (when unmasking).
>> + */
>> +if (!(mask == irq_pending))
> 
> Same as mask != irq_pending?

Yes. Relict of various refactorings.

> 
>> +goto done;
>> +}
>> +
>> +newcmd = origcmd & ~PCI_COMMAND_INTX_DISABLE;
>> +if (mask)
>> +newcmd |= PCI_COMMAND_INTX_DISABLE;
>> +if (newcmd != origcmd)
>> +pci_write_config_word(dev, PCI_COMMAND, newcmd);
>> +
>> +done:
>> +pci_unblock_user_cfg_access(dev);
>> +return status;
>> +}
>> +
>> +static void pci_2_3_irq_mask(struct pci_dev *dev)
>> +{
>> +pci_2_3_set_irq_mask(dev, true, false);
>> +}
>> +
>> +static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
>> +{
>> +return pci_2_3_set_irq_mask(dev, true, true);
>> +}
>> +
>> +static void pci_2_3_irq_unmask(struct pci_dev *dev)
>> +{
>> +pci_2_3_set_irq_mask(dev, false, false);
>> +}
>> +
>> +static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev

Re: [PATCH v2 0/4] KVM: Improve IRQ assignment for device passthrough

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 18:11, Michael S. Tsirkin wrote:
>>  - Refactored PCI-2.3 patch (but still no control knob for shared mode -
>>is that a must?)
> 
> I think it's the prudent thing to do.
> 

What a pity...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] KVM: Switch assigned device IRQ forwarding to threaded handler

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 18:44, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 04:49:18PM +0100, Jan Kiszka wrote:
>> This improves the IRQ forwarding for assigned devices: By using the
>> kernel's threaded IRQ scheme, we can get rid of the latency-prone work
>> queue and simplify the code in the same run.
> 
> Interesting. Do you see any latency improvements from this?

Haven't measured, but I was primarily thinking of real-time scenarios,
ie. setups where you may want adjust the threaded IRQ priority -
something you cannot do with the shared kevent thread.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
> >> @@ -99,12 +228,23 @@ static void kvm_assigned_dev_ack_irq(struct 
> >> kvm_irq_ack_notifier *kian)
> >>/* The guest irq may be shared so this ack may be
> >> * from another device.
> >> */
> >> -  spin_lock(&dev->intx_lock);
> >> +  spin_lock_irq(&dev->intx_lock);
> >>if (dev->host_irq_disabled) {
> >> -  enable_irq(dev->host_irq);
> >> +  if (dev->pci_2_3) {
> >> +  if (pci_2_3_irq_check_and_unmask(dev->dev) &
> >> +  PCI_STATUS_INTERRUPT) {
> >> +  reassert = true;
> >> +  goto out;
> >> +  }
> >> +  } else
> >> +  enable_irq(dev->host_irq);
> > 
> > Or
> > 
> > if (!dev->pci_2_3)
> > enable_irq(dev->host_irq);
> > else if (pci_2_3_irq_check_and_unmask(dev->dev) & 
> > PCI_STATUS_INTERRUPT) {
> > reassert = true;
> > goto out;
> > }
> > 
> > to reduce nesting.
> 
> Yeah.
> 
> > 
> >>dev->host_irq_disabled = false;
> >>}
> >> -  spin_unlock(&dev->intx_lock);
> >> +out:
> >> +  spin_unlock_irq(&dev->intx_lock);
> >> +
> >> +  if (reassert)
> >> +  kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 1);
> > 
> > Hmm, I think this still has more overhead than it needs to have.
> > Instead of setting level to 0 and then back to 1, can't we just
> > avoid set to 1 in the first place? This would need a different
> > interface than pci_2_3_irq_check_and_unmask to avoid a race
> > where interrupt is received while we are acking another one:
> > 
> > block userspace access
> > check pending bit
> > if (!pending)
> > set irq (0)
> > clear pending
> > block userspace access
> > 
> > Would be worth it for high volume devices.
> 
> The problem is that we can't reorder guest IRQ line clearing and host
> IRQ line enabling without taking a lock across host IRQ disable + guest
> IRQ raise - and that is now distributed across hard and threaded IRQ
> handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.

Oh I think I confused you.
What I mean is:

block userspace access
check interrupt status bit
if (!interrupt status bit set)
set irq (0)
clear interrupt disable bit
block userspace access

This way we enable interrupt after set irq so not need for
extra locks I think.

Hmm one thing I noticed is that pci_block_user_cfg_access
will BUG_ON if it was already blocked. So I think we have
a bug here when interrupt handler kicks in right after
we unmask interrupts.

Probably need some kind of lock to protect against this.


> > 
> >>  }
> >>  
> >>  static void deassign_guest_irq(struct kvm *kvm,
> >> @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
> >>pci_disable_msix(assigned_dev->dev);
> >>} else {
> >>/* Deal with MSI and INTx */
> >> -  disable_irq(assigned_dev->host_irq);
> >> +  if (assigned_dev->pci_2_3) {
> >> +  pci_2_3_irq_mask(assigned_dev->dev);
> >> +  synchronize_irq(assigned_dev->host_irq);
> >> +  } else
> >> +  disable_irq(assigned_dev->host_irq);
> >>  
> >>free_irq(assigned_dev->host_irq, (void *)assigned_dev);
> >>  
> >> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
> >>  
> >>pci_reset_function(assigned_dev->dev);
> >>  
> >> +  /*
> >> +   * Unmask the IRQ at PCI level once the reset is done - the next user
> >> +   * may not expect the IRQ being masked.
> >> +   */
> >> +  if (assigned_dev->pci_2_3)
> >> +  pci_2_3_irq_unmask(assigned_dev->dev);
> >> +
> > 
> > Doesn't pci_reset_function clear mask bit? It seems to ...
> 
> I was left with non-functional devices for the host here if I was not
> doing this. Need to recheck, but I think it was required.

Interesting. Could you check why please?


> > 
> >>pci_release_regions(assigned_dev->dev);
> >>pci_disable_device(assigned_dev->dev);
> >>pci_dev_put(assigned_dev->dev);
> >> @@ -224,15 +375,29 @@ void kvm_free_all_assigned_devices(struct kvm *kvm)
> >>  static int assigned_device_enable_host_intx(struct kvm *kvm,
> >>struct kvm_assigned_dev_kernel *dev)
> >>  {
> >> +  irq_handler_t irq_handler = NULL;
> >> +  unsigned long flags = 0;
> >> +
> >>dev->host_irq = dev->dev->irq;
> >> -  /* Even though this is PCI, we don't want to use shared
> >> -   * interrupts. Sharing host devices with guest-assigned devices
> >> -   * on the same interrupt line is not a happy situation: there
> >> -   * are going to be long delays in accepting, acking, etc.
> >> +
> >> +  /*
> >> +   * We can only share the IRQ line with other host devices if we are
> >> +   * able to disable the IRQ source at device-

Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
dev->host_irq_disabled = false;
}
 -  spin_unlock(&dev->intx_lock);
 +out:
 +  spin_unlock_irq(&dev->intx_lock);
 +
 +  if (reassert)
 +  kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 1);
>>>
>>> Hmm, I think this still has more overhead than it needs to have.
>>> Instead of setting level to 0 and then back to 1, can't we just
>>> avoid set to 1 in the first place? This would need a different
>>> interface than pci_2_3_irq_check_and_unmask to avoid a race
>>> where interrupt is received while we are acking another one:
>>>
>>> block userspace access
>>> check pending bit
>>> if (!pending)
>>> set irq (0)
>>> clear pending
>>> block userspace access
>>>
>>> Would be worth it for high volume devices.
>>
>> The problem is that we can't reorder guest IRQ line clearing and host
>> IRQ line enabling without taking a lock across host IRQ disable + guest
>> IRQ raise - and that is now distributed across hard and threaded IRQ
>> handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
> 
> Oh I think I confused you.
> What I mean is:
> 
>   block userspace access
>   check interrupt status bit
>   if (!interrupt status bit set)
>   set irq (0)
>   clear interrupt disable bit
>   block userspace access
> 
> This way we enable interrupt after set irq so not need for
> extra locks I think.

OK. Would require some serious refactoring again.

But what about edge IRQs? Don't we need to toggle the bit for them? And
as we do not differentiate between level and edge, we currently have to
do this unconditionally.

> 
> Hmm one thing I noticed is that pci_block_user_cfg_access
> will BUG_ON if it was already blocked. So I think we have
> a bug here when interrupt handler kicks in right after
> we unmask interrupts.
> 
> Probably need some kind of lock to protect against this.
> 

Or an atomic counter. Will have a look.

Alex, does VFIO take care of this already?

> 
>>>
  }
  
  static void deassign_guest_irq(struct kvm *kvm,
 @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
pci_disable_msix(assigned_dev->dev);
} else {
/* Deal with MSI and INTx */
 -  disable_irq(assigned_dev->host_irq);
 +  if (assigned_dev->pci_2_3) {
 +  pci_2_3_irq_mask(assigned_dev->dev);
 +  synchronize_irq(assigned_dev->host_irq);
 +  } else
 +  disable_irq(assigned_dev->host_irq);
  
free_irq(assigned_dev->host_irq, (void *)assigned_dev);
  
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
  
pci_reset_function(assigned_dev->dev);
  
 +  /*
 +   * Unmask the IRQ at PCI level once the reset is done - the next user
 +   * may not expect the IRQ being masked.
 +   */
 +  if (assigned_dev->pci_2_3)
 +  pci_2_3_irq_unmask(assigned_dev->dev);
 +
>>>
>>> Doesn't pci_reset_function clear mask bit? It seems to ...
>>
>> I was left with non-functional devices for the host here if I was not
>> doing this. Need to recheck, but I think it was required.
> 
> Interesting. Could you check why please?
> 

Can't reproduce anymore. This was early code, maybe affected by some
bits or buts that no longer exist.

Spec says it's cleared on reset, so I removed those lines now.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] macvlan: Introduce a PASSTHRU mode to takeover the underlying device

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 11:19:13AM -0700, Sridhar Samudrala wrote:
> On Mon, 2010-11-01 at 10:28 +0200, Michael S. Tsirkin wrote:
> 
> On Tue, Oct 26, 2010 at 03:19:38PM -0700, Sridhar Samudrala wrote:
> > With the current default macvtap mode, a KVM guest using virtio with
> > macvtap backend has the following limitations.
> > - cannot change/add a mac address on the guest virtio-net
> > - cannot create a vlan device on the guest virtio-net
> > - cannot enable promiscuous mode on guest virtio-net
> >
> > This patch introduces a new mode called 'passthru' when creating a
> > macvlan device which allows takeover of the underlying device and
> > passing it to a guest using virtio with macvtap backend.
> >
> > Only one macvlan device is allowed in passthru mode and it inherits
> > the mac address from the underlying device and sets it in promiscuous
> > mode to receive and forward all the packets.
> >
> > Thanks
> > Sridhar
> 
> One concern with promisc mode is that for the common case,
> which is a single mac and no vlans, we will be getting
> extra packets that will get dropped in userspace/guest
> as compared to the case where same mac is programmed
> in hardware and by guest.
> 
> In the tap/bridge model also, the external i/f is put in promiscuous mode and
> the
> bridge does the filtering of extra packets.

Yes but
1. that is much cheaper than passing them all the way up to the guest.
2. it's pretty painful for management to have to decide between
   features and speed. Better give them both :)

> We could let userspace supply a list of mac/vlan addresses through
> an ioctl on macvtap, and then
> 1. for a single mac, program it in hardware
> 2. for other configurations, set promisc mode
> 
> tun already has TUNSETTXFILTER which might come in handy here.
> We don't pass in vlans with the filter now but maybe we should.
> How does this sound?
> 
> I guess this can be done. But i am not sure if we can extend the existing
> TUNSETTXFILTER
> to support vlans too. we may need a new ioctl.
> 
> Thanks
> Sridhar

OK. Maybe add it to tap too.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:40, Jan Kiszka wrote:
> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
>  
>   pci_reset_function(assigned_dev->dev);
>  
> + /*
> +  * Unmask the IRQ at PCI level once the reset is done - the next user
> +  * may not expect the IRQ being masked.
> +  */
> + if (assigned_dev->pci_2_3)
> + pci_2_3_irq_unmask(assigned_dev->dev);
> +

 Doesn't pci_reset_function clear mask bit? It seems to ...
>>>
>>> I was left with non-functional devices for the host here if I was not
>>> doing this. Need to recheck, but I think it was required.
>>
>> Interesting. Could you check why please?
>>
> 
> Can't reproduce anymore. This was early code, maybe affected by some
> bits or buts that no longer exist.
> 
> Spec says it's cleared on reset, so I removed those lines now.
> 

Hmpf, it just happened again: Guest was using my ath9k, I killed the
guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
config space after the reset, bringing the disable bit back?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:48, Jan Kiszka wrote:
> Am 02.11.2010 19:40, Jan Kiszka wrote:
>> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
>> *kvm,
>>  
>>  pci_reset_function(assigned_dev->dev);
>>  
>> +/*
>> + * Unmask the IRQ at PCI level once the reset is done - the 
>> next user
>> + * may not expect the IRQ being masked.
>> + */
>> +if (assigned_dev->pci_2_3)
>> +pci_2_3_irq_unmask(assigned_dev->dev);
>> +
>
> Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.
>>>
>>> Interesting. Could you check why please?
>>>
>>
>> Can't reproduce anymore. This was early code, maybe affected by some
>> bits or buts that no longer exist.
>>
>> Spec says it's cleared on reset, so I removed those lines now.
>>
> 
> Hmpf, it just happened again: Guest was using my ath9k, I killed the
> guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
> config space after the reset, bringing the disable bit back?

Or does the kernel cache the control word?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
>   dev->host_irq_disabled = false;
>   }
>  -spin_unlock(&dev->intx_lock);
>  +out:
>  +spin_unlock_irq(&dev->intx_lock);
>  +
>  +if (reassert)
>  +kvm_set_irq(dev->kvm, dev->irq_source_id, 
>  dev->guest_irq, 1);
> >>>
> >>> Hmm, I think this still has more overhead than it needs to have.
> >>> Instead of setting level to 0 and then back to 1, can't we just
> >>> avoid set to 1 in the first place? This would need a different
> >>> interface than pci_2_3_irq_check_and_unmask to avoid a race
> >>> where interrupt is received while we are acking another one:
> >>>
> >>>   block userspace access
> >>>   check pending bit
> >>>   if (!pending)
> >>>   set irq (0)
> >>>   clear pending
> >>>   block userspace access
> >>>
> >>> Would be worth it for high volume devices.
> >>
> >> The problem is that we can't reorder guest IRQ line clearing and host
> >> IRQ line enabling without taking a lock across host IRQ disable + guest
> >> IRQ raise - and that is now distributed across hard and threaded IRQ
> >> handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
> > 
> > Oh I think I confused you.
> > What I mean is:
> > 
> > block userspace access
> > check interrupt status bit
> > if (!interrupt status bit set)
> > set irq (0)
> > clear interrupt disable bit
> > block userspace access
> > 
> > This way we enable interrupt after set irq so not need for
> > extra locks I think.
> 
> OK. Would require some serious refactoring again.

That would also mean we can't just solve the nested block/unblock
problem with a simple lock.  Not sure this is worth the effort.

> But what about edge IRQs? Don't we need to toggle the bit for them? And
> as we do not differentiate between level and edge, we currently have to
> do this unconditionally.

AFAIK PCI IRQs are level, so I don't think we need to bother.

> > 
> > Hmm one thing I noticed is that pci_block_user_cfg_access
> > will BUG_ON if it was already blocked. So I think we have
> > a bug here when interrupt handler kicks in right after
> > we unmask interrupts.
> > 
> > Probably need some kind of lock to protect against this.
> > 
> 
> Or an atomic counter.

BTW block userspace access uses a global spinlock which will likely hurt
us on multi-CPU. Switching that to something more SMP friendly, e.g. a
per-device spinlock, might be a good idea: I don't see why that lock and
queue are global.

> Will have a look.

Need to also consider an interrupt running in parallel
with unmasking in thread.

> Alex, does VFIO take care of this already?

VFIO does this all under spin_lock_irq.

> > 
> >>>
>   }
>   
>   static void deassign_guest_irq(struct kvm *kvm,
>  @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
>   pci_disable_msix(assigned_dev->dev);
>   } else {
>   /* Deal with MSI and INTx */
>  -disable_irq(assigned_dev->host_irq);
>  +if (assigned_dev->pci_2_3) {
>  +pci_2_3_irq_mask(assigned_dev->dev);
>  +synchronize_irq(assigned_dev->host_irq);
>  +} else
>  +disable_irq(assigned_dev->host_irq);
>   
>   free_irq(assigned_dev->host_irq, (void *)assigned_dev);
>   
>  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
>  *kvm,
>   
>   pci_reset_function(assigned_dev->dev);
>   
>  +/*
>  + * Unmask the IRQ at PCI level once the reset is done - the 
>  next user
>  + * may not expect the IRQ being masked.
>  + */
>  +if (assigned_dev->pci_2_3)
>  +pci_2_3_irq_unmask(assigned_dev->dev);
>  +
> >>>
> >>> Doesn't pci_reset_function clear mask bit? It seems to ...
> >>
> >> I was left with non-functional devices for the host here if I was not
> >> doing this. Need to recheck, but I think it was required.
> > 
> > Interesting. Could you check why please?
> > 
> 
> Can't reproduce anymore. This was early code, maybe affected by some
> bits or buts that no longer exist.
> 
> Spec says it's cleared on reset, so I removed those lines now.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:48, Jan Kiszka wrote:
> > Am 02.11.2010 19:40, Jan Kiszka wrote:
> >> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
> >> *kvm,
> >>  
> >>pci_reset_function(assigned_dev->dev);
> >>  
> >> +  /*
> >> +   * Unmask the IRQ at PCI level once the reset is done - the 
> >> next user
> >> +   * may not expect the IRQ being masked.
> >> +   */
> >> +  if (assigned_dev->pci_2_3)
> >> +  pci_2_3_irq_unmask(assigned_dev->dev);
> >> +
> >
> > Doesn't pci_reset_function clear mask bit? It seems to ...
> 
>  I was left with non-functional devices for the host here if I was not
>  doing this. Need to recheck, but I think it was required.
> >>>
> >>> Interesting. Could you check why please?
> >>>
> >>
> >> Can't reproduce anymore. This was early code, maybe affected by some
> >> bits or buts that no longer exist.
> >>
> >> Spec says it's cleared on reset, so I removed those lines now.
> >>
> > 
> > Hmpf, it just happened again: Guest was using my ath9k, I killed the
> > guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
> > config space after the reset, bringing the disable bit back?
> 
> Or does the kernel cache the control word?
> 
> Jan

Maybe it does, need to dig in drivers/pci. If it does this
might have other implications.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] building fix - do not check unsigned var >= 0

2010-11-02 Thread Michael S. Tsirkin
On Wed, Oct 13, 2010 at 10:32:18AM +0800, Yang Ruirui wrote:
> fix following build error by do not check unsigned op >= 0: 
> 
>   CCx86_64-softmmu/tcg/tcg.o
> cc1: warnings being treated as errors
> /home/dave/git/qemu-kvm/tcg/tcg.c: In function ‘tcg_add_target_add_op_defs’:
> /home/dave/git/qemu-kvm/tcg/tcg.c:1030: error: comparison of unsigned 
> expression >= 0 is always true
> make[1]: *** [tcg/tcg.o] Error 1
> 
> Signed-off-by: Yang Ruirui 
> Signed-off-by: Dave Young 

I think Blue Swirl was replacing these with range macros from range.h

> ---
>  tcg/tcg.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- qemu-kvm.orig/tcg/tcg.c   2010-06-29 14:45:04.0 +0800
> +++ qemu-kvm/tcg/tcg.c2010-09-26 18:07:26.523339798 +0800
> @@ -1027,7 +1027,7 @@ void tcg_add_target_add_op_defs(const TC
>  if (tdefs->op == (TCGOpcode)-1)
>  break;
>  op = tdefs->op;
> -assert(op >= 0 && op < NB_OPS);
> +assert(op < NB_OPS);
>  def = &tcg_op_defs[op];
>  #if defined(CONFIG_DEBUG_TCG)
>  /* Duplicate entry in op definitions? */
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 32-bit qemu on current x86-64 kernel segfauls very early

2010-11-02 Thread Avi Kivity

On 11/02/2010 12:02 PM, Christoph Hellwig wrote:

On Tue, Nov 02, 2010 at 11:59:48AM -0400, Avi Kivity wrote:
>  >  "KVM: Fix fs/gs reload oops with invalid ldt"
>  >
>
>  Interesting, I guess we corrupt %fs on x86_64.
>
>  Intel or AMD?

Intel:


Thanks, reproduced and have a fix.  Will send it out later.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio: Extended capability fixes

2010-11-02 Thread Michael S. Tsirkin
On Mon, Nov 01, 2010 at 11:08:35PM -0600, Alex Williamson wrote:
> - Virtual channel position gets truncated as a u8
>  - Print the ecap that's unknown, not the last cap we saw
>  - Print actual config offset, which provides enough info to make
>some sense of the error.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  drivers/vfio/vfio_pci_config.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
> index 8af995d..8304316 100644
> --- a/drivers/vfio/vfio_pci_config.c
> +++ b/drivers/vfio/vfio_pci_config.c
> @@ -410,7 +410,7 @@ static int vfio_msi_cap_len(struct vfio_dev *vdev, u8 pos)
>   * Determine extended capability length for VC (2 & 9) and
>   * MFVC capabilities
>   */
> -static int vfio_vc_cap_len(struct vfio_dev *vdev, u8 pos)
> +static int vfio_vc_cap_len(struct vfio_dev *vdev, u16 pos)
>  {
>   struct pci_dev *pdev = vdev->pdev;
>   u32 dw;
> @@ -580,7 +580,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   printk(KERN_WARNING
>   "%s: pci config conflict at %x, "
>   "caps %x %x\n",
> - __func__, i, map[pos+i], cap);
> + __func__, pos+i, map[pos+i], cap);
>   map[pos+i] = cap;
>   }
>   ret = pci_read_config_byte(pdev, pos + PCI_CAP_LIST_NEXT, &pos);
> @@ -683,7 +683,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   if (len == 0 || len == 0xFF) {
>   printk(KERN_WARNING
>   "%s: unknown length for pci ext cap %x\n",
> - __func__, cap);
> + __func__, ecap);
>   len = PCI_CAP_SIZEOF;
>   }
>   for (i = 0; i < len; i++) {
> @@ -691,7 +691,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   printk(KERN_WARNING
>   "%s: pci config conflict at %x, "
>   "caps %x %x\n",
> - __func__, i, map[epos+i], ecap);
> + __func__, epos+i, map[epos+i], ecap);
>   map[epos+i] = ecap;

Not related to this patch, but I am surprised checkpatch does not
complain about lack of spaces around + here and elsewhere.
Or does it?

>   }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
>> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
>>> On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
>>  dev->host_irq_disabled = false;
>>  }
>> -spin_unlock(&dev->intx_lock);
>> +out:
>> +spin_unlock_irq(&dev->intx_lock);
>> +
>> +if (reassert)
>> +kvm_set_irq(dev->kvm, dev->irq_source_id, 
>> dev->guest_irq, 1);
>
> Hmm, I think this still has more overhead than it needs to have.
> Instead of setting level to 0 and then back to 1, can't we just
> avoid set to 1 in the first place? This would need a different
> interface than pci_2_3_irq_check_and_unmask to avoid a race
> where interrupt is received while we are acking another one:
>
>   block userspace access
>   check pending bit
>   if (!pending)
>   set irq (0)
>   clear pending
>   block userspace access
>
> Would be worth it for high volume devices.

 The problem is that we can't reorder guest IRQ line clearing and host
 IRQ line enabling without taking a lock across host IRQ disable + guest
 IRQ raise - and that is now distributed across hard and threaded IRQ
 handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
>>>
>>> Oh I think I confused you.
>>> What I mean is:
>>>
>>> block userspace access
>>> check interrupt status bit
>>> if (!interrupt status bit set)
>>> set irq (0)
>>> clear interrupt disable bit
>>> block userspace access
>>>
>>> This way we enable interrupt after set irq so not need for
>>> extra locks I think.
>>
>> OK. Would require some serious refactoring again.
> 
> That would also mean we can't just solve the nested block/unblock
> problem with a simple lock.  Not sure this is worth the effort.

Can't follow: what can be nested and how?

> 
>> But what about edge IRQs? Don't we need to toggle the bit for them? And
>> as we do not differentiate between level and edge, we currently have to
>> do this unconditionally.
> 
> AFAIK PCI IRQs are level, so I don't think we need to bother.

Ah, indeed.

> 
>>>
>>> Hmm one thing I noticed is that pci_block_user_cfg_access
>>> will BUG_ON if it was already blocked. So I think we have
>>> a bug here when interrupt handler kicks in right after
>>> we unmask interrupts.
>>>
>>> Probably need some kind of lock to protect against this.
>>>
>>
>> Or an atomic counter.
> 
> BTW block userspace access uses a global spinlock which will likely hurt
> us on multi-CPU. Switching that to something more SMP friendly, e.g. a
> per-device spinlock, might be a good idea: I don't see why that lock and
> queue are global.

Been through that code recently, hairy stuff. pci_lock also protects the
bus operation which can be overloaded (e.g. for error injection - if
that is not the only use case). So we need a per-bus lock, but that can
still cause contentions if devices on the same bus are handled on
different cpus.

I think the whole PCI config interface is simply not designed for
performance. It's considered a slow path, which it normally is.

> 
>> Will have a look.
> 
> Need to also consider an interrupt running in parallel
> with unmasking in thread.
> 
>> Alex, does VFIO take care of this already?
> 
> VFIO does this all under spin_lock_irq.

Everything has its pros and cons...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:11:31PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
> >> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
> >>dev->host_irq_disabled = false;
> >>}
> >> -  spin_unlock(&dev->intx_lock);
> >> +out:
> >> +  spin_unlock_irq(&dev->intx_lock);
> >> +
> >> +  if (reassert)
> >> +  kvm_set_irq(dev->kvm, dev->irq_source_id, 
> >> dev->guest_irq, 1);
> >
> > Hmm, I think this still has more overhead than it needs to have.
> > Instead of setting level to 0 and then back to 1, can't we just
> > avoid set to 1 in the first place? This would need a different
> > interface than pci_2_3_irq_check_and_unmask to avoid a race
> > where interrupt is received while we are acking another one:
> >
> > block userspace access
> > check pending bit
> > if (!pending)
> > set irq (0)
> > clear pending
> > block userspace access
> >
> > Would be worth it for high volume devices.
> 
>  The problem is that we can't reorder guest IRQ line clearing and host
>  IRQ line enabling without taking a lock across host IRQ disable + guest
>  IRQ raise - and that is now distributed across hard and threaded IRQ
>  handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
> >>>
> >>> Oh I think I confused you.
> >>> What I mean is:
> >>>
> >>>   block userspace access
> >>>   check interrupt status bit
> >>>   if (!interrupt status bit set)
> >>>   set irq (0)
> >>>   clear interrupt disable bit
> >>>   block userspace access
> >>>
> >>> This way we enable interrupt after set irq so not need for
> >>> extra locks I think.
> >>
> >> OK. Would require some serious refactoring again.
> > 
> > That would also mean we can't just solve the nested block/unblock
> > problem with a simple lock.  Not sure this is worth the effort.
> 
> Can't follow: what can be nested and how?

I just mean interrupt trying to block userspace when thread
did that already.

> > 
> >> But what about edge IRQs? Don't we need to toggle the bit for them? And
> >> as we do not differentiate between level and edge, we currently have to
> >> do this unconditionally.
> > 
> > AFAIK PCI IRQs are level, so I don't think we need to bother.
> 
> Ah, indeed.
> 
> > 
> >>>
> >>> Hmm one thing I noticed is that pci_block_user_cfg_access
> >>> will BUG_ON if it was already blocked. So I think we have
> >>> a bug here when interrupt handler kicks in right after
> >>> we unmask interrupts.
> >>>
> >>> Probably need some kind of lock to protect against this.
> >>>
> >>
> >> Or an atomic counter.
> > 
> > BTW block userspace access uses a global spinlock which will likely hurt
> > us on multi-CPU. Switching that to something more SMP friendly, e.g. a
> > per-device spinlock, might be a good idea: I don't see why that lock and
> > queue are global.
> 
> Been through that code recently, hairy stuff. pci_lock also protects the
> bus operation which can be overloaded (e.g. for error injection - if
> that is not the only use case). So we need a per-bus lock, but that can
> still cause contentions if devices on the same bus are handled on
> different cpus.

Right. So this lock got reused for blocking userspace, I do not suggest
we rip it all out, just make userspace blocking use
a finer-grained lock.

> I think the whole PCI config interface is simply not designed for
> performance. It's considered a slow path, which it normally is.

So I guess we'll need to fix that now, at least if we
want to make the 2.3 way the default.

> > 
> >> Will have a look.
> > 
> > Need to also consider an interrupt running in parallel
> > with unmasking in thread.
> > 
> >> Alex, does VFIO take care of this already?
> > 
> > VFIO does this all under spin_lock_irq.
> 
> Everything has its pros and cons...
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
>> Am 02.11.2010 19:48, Jan Kiszka wrote:
>>> Am 02.11.2010 19:40, Jan Kiszka wrote:
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
 *kvm,
  
pci_reset_function(assigned_dev->dev);
  
 +  /*
 +   * Unmask the IRQ at PCI level once the reset is done - the 
 next user
 +   * may not expect the IRQ being masked.
 +   */
 +  if (assigned_dev->pci_2_3)
 +  pci_2_3_irq_unmask(assigned_dev->dev);
 +
>>>
>>> Doesn't pci_reset_function clear mask bit? It seems to ...
>>
>> I was left with non-functional devices for the host here if I was not
>> doing this. Need to recheck, but I think it was required.
>
> Interesting. Could you check why please?
>

 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.

 Spec says it's cleared on reset, so I removed those lines now.

>>>
>>> Hmpf, it just happened again: Guest was using my ath9k, I killed the
>>> guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
>>> config space after the reset, bringing the disable bit back?
>>
>> Or does the kernel cache the control word?
>>
>> Jan
> 
> Maybe it does, need to dig in drivers/pci. If it does this
> might have other implications.

OK, that mystery is resolved now: pci_reset_function saves & restores
the device state.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Alex Williamson
On Tue, 2010-11-02 at 19:40 +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
>   dev->host_irq_disabled = false;
>   }
>  -spin_unlock(&dev->intx_lock);
>  +out:
>  +spin_unlock_irq(&dev->intx_lock);
>  +
>  +if (reassert)
>  +kvm_set_irq(dev->kvm, dev->irq_source_id, 
>  dev->guest_irq, 1);
> >>>
> >>> Hmm, I think this still has more overhead than it needs to have.
> >>> Instead of setting level to 0 and then back to 1, can't we just
> >>> avoid set to 1 in the first place? This would need a different
> >>> interface than pci_2_3_irq_check_and_unmask to avoid a race
> >>> where interrupt is received while we are acking another one:
> >>>
> >>>   block userspace access
> >>>   check pending bit
> >>>   if (!pending)
> >>>   set irq (0)
> >>>   clear pending
> >>>   block userspace access
> >>>
> >>> Would be worth it for high volume devices.
> >>
> >> The problem is that we can't reorder guest IRQ line clearing and host
> >> IRQ line enabling without taking a lock across host IRQ disable + guest
> >> IRQ raise - and that is now distributed across hard and threaded IRQ
> >> handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
> > 
> > Oh I think I confused you.
> > What I mean is:
> > 
> > block userspace access
> > check interrupt status bit
> > if (!interrupt status bit set)
> > set irq (0)
> > clear interrupt disable bit
> > block userspace access
> > 
> > This way we enable interrupt after set irq so not need for
> > extra locks I think.
> 
> OK. Would require some serious refactoring again.
> 
> But what about edge IRQs? Don't we need to toggle the bit for them? And
> as we do not differentiate between level and edge, we currently have to
> do this unconditionally.
> 
> > 
> > Hmm one thing I noticed is that pci_block_user_cfg_access
> > will BUG_ON if it was already blocked. So I think we have
> > a bug here when interrupt handler kicks in right after
> > we unmask interrupts.
> > 
> > Probably need some kind of lock to protect against this.
> > 
> 
> Or an atomic counter. Will have a look.
> 
> Alex, does VFIO take care of this already?

Yes, VFIO has a lock used by the interrupt handler and the EOI handler
that prevents them from both blocking user cfg access at the same time.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
> >> Am 02.11.2010 19:48, Jan Kiszka wrote:
> >>> Am 02.11.2010 19:40, Jan Kiszka wrote:
>  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
>  *kvm,
>   
>   pci_reset_function(assigned_dev->dev);
>   
>  +/*
>  + * Unmask the IRQ at PCI level once the reset is done - the 
>  next user
>  + * may not expect the IRQ being masked.
>  + */
>  +if (assigned_dev->pci_2_3)
>  +pci_2_3_irq_unmask(assigned_dev->dev);
>  +
> >>>
> >>> Doesn't pci_reset_function clear mask bit? It seems to ...
> >>
> >> I was left with non-functional devices for the host here if I was not
> >> doing this. Need to recheck, but I think it was required.
> >
> > Interesting. Could you check why please?
> >
> 
>  Can't reproduce anymore. This was early code, maybe affected by some
>  bits or buts that no longer exist.
> 
>  Spec says it's cleared on reset, so I removed those lines now.
> 
> >>>
> >>> Hmpf, it just happened again: Guest was using my ath9k, I killed the
> >>> guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
> >>> config space after the reset, bringing the disable bit back?
> >>
> >> Or does the kernel cache the control word?
> >>
> >> Jan
> > 
> > Maybe it does, need to dig in drivers/pci. If it does this
> > might have other implications.
> 
> OK, that mystery is resolved now: pci_reset_function saves & restores
> the device state.
> 
> Jan

Aha. I wonder what other state we need to clear.

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 20:14, Michael S. Tsirkin wrote:
>>> BTW block userspace access uses a global spinlock which will likely hurt
>>> us on multi-CPU. Switching that to something more SMP friendly, e.g. a
>>> per-device spinlock, might be a good idea: I don't see why that lock and
>>> queue are global.
>>
>> Been through that code recently, hairy stuff. pci_lock also protects the
>> bus operation which can be overloaded (e.g. for error injection - if
>> that is not the only use case). So we need a per-bus lock, but that can
>> still cause contentions if devices on the same bus are handled on
>> different cpus.
> 
> Right. So this lock got reused for blocking userspace, I do not suggest
> we rip it all out, just make userspace blocking use
> a finer-grained lock.
> 
>> I think the whole PCI config interface is simply not designed for
>> performance. It's considered a slow path, which it normally is.
> 
> So I guess we'll need to fix that now, at least if we
> want to make the 2.3 way the default.
> 

On many systems (those with "direct" PCI config access), there is
another lock down the road: pci_config_lock. That can't be broken up as
the protected resource is unique. So we do not gain much improving the
higher-level lock.

BTW, accessing the interrupt controller for IRQ line fiddling is not a
per-device thing either. So as long as the latency of the code under the
locks is not significantly worse with PCI-level masking, we don't lose
scalability here.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 20:53, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
>> Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
>>> On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:48, Jan Kiszka wrote:
> Am 02.11.2010 19:40, Jan Kiszka wrote:
>> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
>> *kvm,
>>  
>>  pci_reset_function(assigned_dev->dev);
>>  
>> +/*
>> + * Unmask the IRQ at PCI level once the reset is done - the 
>> next user
>> + * may not expect the IRQ being masked.
>> + */
>> +if (assigned_dev->pci_2_3)
>> +pci_2_3_irq_unmask(assigned_dev->dev);
>> +
>
> Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.
>>>
>>> Interesting. Could you check why please?
>>>
>>
>> Can't reproduce anymore. This was early code, maybe affected by some
>> bits or buts that no longer exist.
>>
>> Spec says it's cleared on reset, so I removed those lines now.
>>
>
> Hmpf, it just happened again: Guest was using my ath9k, I killed the
> guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
> config space after the reset, bringing the disable bit back?

 Or does the kernel cache the control word?

 Jan
>>>
>>> Maybe it does, need to dig in drivers/pci. If it does this
>>> might have other implications.
>>
>> OK, that mystery is resolved now: pci_reset_function saves & restores
>> the device state.
>>
>> Jan
> 
> Aha. I wonder what other state we need to clear.
> 

Maybe just save/restore before/after assigning the device?

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:58:36PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 20:53, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
> >> Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
>  Am 02.11.2010 19:48, Jan Kiszka wrote:
> > Am 02.11.2010 19:40, Jan Kiszka wrote:
> >> @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct 
> >> kvm *kvm,
> >>  
> >>pci_reset_function(assigned_dev->dev);
> >>  
> >> +  /*
> >> +   * Unmask the IRQ at PCI level once the reset is done - the 
> >> next user
> >> +   * may not expect the IRQ being masked.
> >> +   */
> >> +  if (assigned_dev->pci_2_3)
> >> +  pci_2_3_irq_unmask(assigned_dev->dev);
> >> +
> >
> > Doesn't pci_reset_function clear mask bit? It seems to ...
> 
>  I was left with non-functional devices for the host here if I was not
>  doing this. Need to recheck, but I think it was required.
> >>>
> >>> Interesting. Could you check why please?
> >>>
> >>
> >> Can't reproduce anymore. This was early code, maybe affected by some
> >> bits or buts that no longer exist.
> >>
> >> Spec says it's cleared on reset, so I removed those lines now.
> >>
> >
> > Hmpf, it just happened again: Guest was using my ath9k, I killed the
> > guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
> > config space after the reset, bringing the disable bit back?
> 
>  Or does the kernel cache the control word?
> 
>  Jan
> >>>
> >>> Maybe it does, need to dig in drivers/pci. If it does this
> >>> might have other implications.
> >>
> >> OK, that mystery is resolved now: pci_reset_function saves & restores
> >> the device state.
> >>
> >> Jan
> > 
> > Aha. I wonder what other state we need to clear.
> > 
> 
> Maybe just save/restore before/after assigning the device?
> 
> Jan
> 

Yea. Sounds good.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kvm unhandled exit 4400

2010-11-02 Thread Khaled El Mously
There seems to be a problem kvm-booting a 32-bit (Fedora) machine on a 32-bit 
Intel Q9450 (Ubuntu) machine.

The error displayed is:

kvm: unhandled exit 4400
kvm_run returned -22

I've checked google, the kvm website, forums, FAQs and IRC channel.

Anyone know what causes this?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio: Extended capability fixes

2010-11-02 Thread Tom Lyon
Applied. Thanks!

On Monday, November 01, 2010 10:08:35 pm Alex Williamson wrote:
> - Virtual channel position gets truncated as a u8
>  - Print the ecap that's unknown, not the last cap we saw
>  - Print actual config offset, which provides enough info to make
>some sense of the error.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  drivers/vfio/vfio_pci_config.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_pci_config.c
> b/drivers/vfio/vfio_pci_config.c index 8af995d..8304316 100644
> --- a/drivers/vfio/vfio_pci_config.c
> +++ b/drivers/vfio/vfio_pci_config.c
> @@ -410,7 +410,7 @@ static int vfio_msi_cap_len(struct vfio_dev *vdev, u8
> pos) * Determine extended capability length for VC (2 & 9) and
>   * MFVC capabilities
>   */
> -static int vfio_vc_cap_len(struct vfio_dev *vdev, u8 pos)
> +static int vfio_vc_cap_len(struct vfio_dev *vdev, u16 pos)
>  {
>   struct pci_dev *pdev = vdev->pdev;
>   u32 dw;
> @@ -580,7 +580,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   printk(KERN_WARNING
>   "%s: pci config conflict at %x, "
>   "caps %x %x\n",
> - __func__, i, map[pos+i], cap);
> + __func__, pos+i, map[pos+i], cap);
>   map[pos+i] = cap;
>   }
>   ret = pci_read_config_byte(pdev, pos + PCI_CAP_LIST_NEXT, &pos);
> @@ -683,7 +683,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   if (len == 0 || len == 0xFF) {
>   printk(KERN_WARNING
>   "%s: unknown length for pci ext cap %x\n",
> - __func__, cap);
> + __func__, ecap);
>   len = PCI_CAP_SIZEOF;
>   }
>   for (i = 0; i < len; i++) {
> @@ -691,7 +691,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
>   printk(KERN_WARNING
>   "%s: pci config conflict at %x, "
>   "caps %x %x\n",
> - __func__, i, map[epos+i], ecap);
> + __func__, epos+i, map[epos+i], ecap);
>   map[epos+i] = ecap;
>   }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio: Extended capability fixes

2010-11-02 Thread Tom Lyon
On Tuesday, November 02, 2010 12:11:08 pm Michael S. Tsirkin wrote:
> On Mon, Nov 01, 2010 at 11:08:35PM -0600, Alex Williamson wrote:
> > - Virtual channel position gets truncated as a u8
> > 
> >  - Print the ecap that's unknown, not the last cap we saw
> >  - Print actual config offset, which provides enough info to make
> >  
> >some sense of the error.
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> > 
> >  drivers/vfio/vfio_pci_config.c |8 
> >  1 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_pci_config.c
> > b/drivers/vfio/vfio_pci_config.c index 8af995d..8304316 100644
> > --- a/drivers/vfio/vfio_pci_config.c
> > +++ b/drivers/vfio/vfio_pci_config.c
> > @@ -410,7 +410,7 @@ static int vfio_msi_cap_len(struct vfio_dev *vdev, u8
> > pos)
> > 
> >   * Determine extended capability length for VC (2 & 9) and
> >   * MFVC capabilities
> >   */
> > 
> > -static int vfio_vc_cap_len(struct vfio_dev *vdev, u8 pos)
> > +static int vfio_vc_cap_len(struct vfio_dev *vdev, u16 pos)
> > 
> >  {
> >  
> > struct pci_dev *pdev = vdev->pdev;
> > u32 dw;
> > 
> > @@ -580,7 +580,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
> > 
> > printk(KERN_WARNING
> > 
> > "%s: pci config conflict at %x, "
> > "caps %x %x\n",
> > 
> > -   __func__, i, map[pos+i], cap);
> > +   __func__, pos+i, map[pos+i], cap);
> > 
> > map[pos+i] = cap;
> > 
> > }
> > ret = pci_read_config_byte(pdev, pos + PCI_CAP_LIST_NEXT, &pos);
> > 
> > @@ -683,7 +683,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
> > 
> > if (len == 0 || len == 0xFF) {
> > 
> > printk(KERN_WARNING
> > 
> > "%s: unknown length for pci ext cap %x\n",
> > 
> > -   __func__, cap);
> > +   __func__, ecap);
> > 
> > len = PCI_CAP_SIZEOF;
> > 
> > }
> > for (i = 0; i < len; i++) {
> > 
> > @@ -691,7 +691,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
> > 
> > printk(KERN_WARNING
> > 
> > "%s: pci config conflict at %x, "
> > "caps %x %x\n",
> > 
> > -   __func__, i, map[epos+i], ecap);
> > +   __func__, epos+i, map[epos+i], ecap);
> > 
> > map[epos+i] = ecap;
> 
> Not related to this patch, but I am surprised checkpatch does not
> complain about lack of spaces around + here and elsewhere.
> Or does it?
It did not complain.

> 
> > }
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> > in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Does KVM support VHD(Virtual Hard Disk)?

2010-11-02 Thread Ming Gao
Hi, all,

Anyone knows that if KVM support VHD(Virtual Hard Disk) or not?

Thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Launch other test during migration

2010-11-02 Thread Jason Wang
Michael Goldish writes:
 > On 11/02/2010 07:34 AM, Jason Wang wrote:
 > > Michael Goldish writes:
 > >  > On 09/25/2010 11:36 AM, Jason Wang wrote:
 > >  > > We could give a further test of migration by launch test during 
 > > migartion. So
 > >  > > the following series implements:
 > >  > > 
 > >  > > - A simple class to run a specified test in the background which 
 > > could be used
 > >  > > to launch other test during migartion. Its design is rather simple 
 > > and its usage
 > >  > > is a little tricky, but it work well.
 > >  > > - Two sample tests which take advantages of the background class: 
 > > Test reboot
 > >  > > during guest migration and test file_transfer during guest migration.
 > >  > > 
 > >  > > In the future, we could even lauch autotest client test during guest 
 > > migation.
 > >  > > 
 > >  > > ---
 > >  > > 
 > >  > > Jason Wang (3):
 > >  > >   KVM Test: Introduce a helper class to run a test in the 
 > > background
 > >  > >   KVM test: Test reboot during migration
 > >  > >   KVM test: Test the file transfer during migartion
 > >  > > 
 > >  > > 
 > >  > >  client/tests/kvm/kvm_test_utils.py |   44 
 > > +++
 > >  > >  .../kvm/tests/migration_with_file_transfer.py  |   59 
 > > 
 > >  > >  client/tests/kvm/tests/migration_with_reboot.py|   45 
 > > +++
 > >  > >  client/tests/kvm/tests_base.cfg.sample |   12 
 > >  > >  4 files changed, 159 insertions(+), 1 deletions(-)
 > >  > >  create mode 100644 
 > > client/tests/kvm/tests/migration_with_file_transfer.py
 > >  > >  create mode 100644 client/tests/kvm/tests/migration_with_reboot.py
 > >  > 
 > >  > It seems to me that this method is only applicable to tests/functions
 > >  > that don't require a VM object (i.e. that require only a shell session
 > >  > object).  kvm_test_utils.reboot() operates on a VM object, and the same
 > >  > VM is destroyed by migrate() which runs in the background, so eventually
 > >  > reboot() tries logging into a destroyed VM, which fails because
 > >  > vm.get_address() fails.  Any monitor operation will also fail.  If the
 > >  > autotest wrapper requires a VM object (currently it does) then it can't
 > >  > be used either.
 > >  > 
 > > 
 > > You are right and that's an issue when running test in parallel with
 > > migration, but reboot through shell should work. The aim of this kind
 > > of test is just to add some stress ( such as run_autotest) during
 > > migartion, so most (probably all) of the tests only needs a
 > > session. So I think it's not a very big issue in this situation.
 > 
 > Many tests need to be modified in order to require only a session.  I've
 > tried reboot and it doesn't work, and I can see that run_autotest() also
 > uses a VM.  Reboot needs the VM object in order to log in to make sure
 > the VM goes back up, and run_autotest() needs it for SCP and probably
 > is_alive().  I agree that some tests don't require a VM object, but the
 > ones that do are also interesting.
 > 

Looks like every test need a VM object and we could not expect the
execution order among autotest threads, so if the thread created by
BackgroundTest was executed after migration, it would always fail.

I know re-use the VM object after migration involves lots of
modification, but is that possible or valuable?

And I think we can just focus on the tests which is useful to run in
parallel with migration. Your suggestions looks good but it only works
with the tests which have a step to wait for something like test
results. For the tests who does not have such step, another method is
needed.

 > >  > An alternative (somewhat ugly) way to migrate in the background is to
 > >  > pass a boolean 'migrate' flag to various functions/tests, such as
 > >  > reboot() and run_autotest().  If 'migrate' is True, these functions will
 > >  > do something like
 > >  > 
 > >  > vm = kvm_test_utils.migrate(vm, ...)
 > >  > 
 > >  > in their waiting loops, where wait_for() is normally used.  This
 > >  > guarantees that 'vm' is always a valid VM object.  For example:
 > >  > 
 > >  > # Log in after reboot
 > >  > while time.time() < end_time:
 > >  > if migrate_in_bg:
 > >  > vm = kvm_test_utils.migrate(vm, ...)
 > >  > session = vm.remote_login()
 > >  > if session:
 > >  > break
 > >  > time.sleep(2)
 > >  > 
 > >  > This is much longer than the usual wait_for(...) but it does the job.
 > >  > What do you think?
 > > 
 > > This makes sense but it would let testcases need to care about the
 > > migration and it's also hard to put all related codes into a wrapper
 > > which would complicate the codes.
 > > 
 > > Despite the issue of vm object, all tests that only depends on the
 > > shell session should works well with my method and it's more easy to
 > 
 > We should also find a solution for tests that require a VM object.
 > 
 > Which other tests do you think it would be interesting to r

KVM hard disk partition

2010-11-02 Thread dinesh t
Hi

 I am new to KVM development.I assigned oneof my real hard disk
partition (/dev/sdb1) to one of my KVM virtual machine and install
fedora linux in that.I booted with that. It is fine.But while
rebooting the host It is started booting the guest.

I need to start the Host not guest. Could anybody give suggestions
regarding this issue?

Reg
Dinesh.T
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html